1) Message boards : Graphics cards (GPUs) : Maxwell now (Message 38629)
Posted 1 day ago by Profile skgiven
Thanks skgiven. Do you see an increase in performance on GPUGrid WUs when the memory clock is increased to 3500 MHz?

I think so, but I don't have much to go on so far. I was really just looking for a drop in memory controller load (MCL), which I found (~53% to ~45%).
To confirm any actual runtime improvement that results solely from the memory frequency increase I would really need to run several long same-type WUs at 3505MHz, then several at 3005MHz, all with the same GPU clock and BOINC settings. Ideally others would do the same to confirm the findings.
That will take two or three days, as there is a mixture of long task types and each takes 5 or 6 hours to run...
I think you would be less likely to spot a small performance change from running short WUs, as those only have an MCL of around 27%; it's not like we are overclocking here, just making sure the GDDR5 runs at the speed it's supposed to. Most of us run the long WUs anyway, so that's what we should focus on.
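As a rough way to score that comparison once the runtimes are in, something like the Python sketch below would do; the runtimes in it are placeholders, not measurements.

```python
# Minimal sketch: compare mean runtimes of same-type long WUs run at
# 3005MHz vs 3505MHz memory clock. Runtimes below are placeholders,
# not real measurements.
from statistics import mean, stdev

runtimes_3005 = [20520, 20710, 20390, 20660]  # seconds, hypothetical
runtimes_3505 = [19980, 20150, 19890, 20040]  # seconds, hypothetical

m_low, m_high = mean(runtimes_3005), mean(runtimes_3505)
change = (m_low - m_high) / m_low * 100

print(f"3005MHz: {m_low:.0f}s +/- {stdev(runtimes_3005):.0f}s")
print(f"3505MHz: {m_high:.0f}s +/- {stdev(runtimes_3505):.0f}s")
print(f"Runtime reduction from the memory clock increase: {change:.1f}%")
# If the change is within the run-to-run scatter, the memory clock
# increase isn't buying anything measurable.
```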
2) Message boards : Graphics cards (GPUs) : Maxwell now (Message 38604)
Posted 2 days ago by Profile skgiven
Using NVIDIA Inspector you can make sure the current GDDR5 clocks are high, but you have to match the P-State value on the overclocking panel to the state shown on the left. For me the P-State is P2, so in order to ensure 3505MHz is used I have to set the overclocking Performance Level to P2. Then I can push the memory clock to 3505MHz.
When I did this with the GPU clock at roughly 1406MHz, the MCL dropped to 45%.
While I can choose to unlock the clocks I cannot increase past 3505MHz; it just reverts. Hopefully this will allow for better performance and tuning...

For those with this issue, you might want to create a batch file that sets your required values via the command line and have it run at startup, or create a clocks shortcut from NVIDIA Inspector and either double-click it after every restart or set it to run automatically at startup.
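As a rough illustration of the scripted route, a small Python wrapper could re-apply the offsets at logon, as in the sketch below. The nvidiaInspector.exe path, the P-state index and the offset values are assumptions for my setup, and the -setMemoryClockOffset / -setBaseClockOffset switch names should be checked against your Inspector version (the clocks shortcut Inspector creates shows the exact command line for your card).

```python
# Rough sketch only: apply GPU/memory clock offsets at startup by calling
# NVIDIA Inspector's command line. Path, P-state index, switch names and
# offset values are assumptions -- check them against your own setup.
import subprocess

INSPECTOR = r"C:\Tools\nvidiaInspector\nvidiaInspector.exe"  # hypothetical path
GPU_INDEX = 0
PSTATE = 2              # P2 in my case; may differ per card/driver
MEM_OFFSET_MHZ = 500    # e.g. 3005MHz -> ~3505MHz, if the card honours it
CORE_OFFSET_MHZ = 0

subprocess.run([
    INSPECTOR,
    f"-setMemoryClockOffset:{GPU_INDEX},{PSTATE},{MEM_OFFSET_MHZ}",
    f"-setBaseClockOffset:{GPU_INDEX},{PSTATE},{CORE_OFFSET_MHZ}",
], check=True)
# Point a Task Scheduler "at log on" task (or a Startup-folder shortcut)
# at this script so the clocks are re-applied after every reboot.
```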
3) Message boards : Graphics cards (GPUs) : Maxwell now (Message 38584)
Posted 3 days ago by Profile skgiven
Is 53% MCL really a bottleneck?

That's the question I started out trying to answer - is the increased MCL really a bottleneck?
Our point of reference is that we know it was with some Keplers. While that picture was complicated by cache variations, the GTX650Ti Boost allowed us to determine that cache wasn't the only bottleneck and that the MCL was definitely a bottleneck in itself (for some other cards).

Shouldn't this bottleneck lower the GPU usage?

Depends on how GPU usage is being measured, but MCL should rise with GPU usage, as more bandwidth is required to support the GPU, and it appeared to do just that:
When I reduced CPU usage from 100% to 55% the GPU usage rose from 89% to 93% and the MCL increased from ~46% to 49%.
At 100% CPU usage both the GPU usage and MCL were also more erratic.

Also, when I increased the GPU clock the MCL increased:
1126MHz GPU - 45% MCL
1266MHz GPU - 49% MCL
1406MHz GPU - 53% MCL

So the signs are there.
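As a quick check on how closely the MCL tracks the GPU clock, the three data points above work out like this (trivial Python sketch):

```python
# How closely does MCL track the GPU clock? Uses the three data points
# above, with 1126MHz / 45% MCL as the reference point.
points = [(1126, 45), (1266, 49), (1406, 53)]  # (GPU MHz, MCL %)
base_clock, base_mcl = points[0]
for clock, mcl in points[1:]:
    clock_ratio = clock / base_clock
    mcl_ratio = mcl / base_mcl
    print(f"{clock}MHz: clock x{clock_ratio:.3f}, MCL x{mcl_ratio:.3f}")
# The MCL rises roughly in step with the GPU clock, which is the
# pattern described above.
```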

Being able to OC or boost the GDDR5 should offset the increase in MCL (it did with Keplers).

Did you try to lower the memory clock to measure the effect of this 'bottleneck'?

I tried, but I can't change the memory clock - the current clock remains at 3005MHz (the default). NVIDIA Inspector, GPU-Z (and previously MSI Afterburner) register my request to increase the GDDR5 clocks, but the clocks don't actually rise.

I've tried FurMark, and it seems to be limited by memory bandwidth, while GPUGrid seems to be limited by GPU speed.

I'm wondering if the measured MCL is actually measuring usage of the new compression system, and whether it really reflects a bottleneck or not. Increasing the GDDR5 clock would be the simple test, but that's a non-starter at the moment, which is another question in itself.

The only way to confirm whether the MCL increase is really a bottleneck is to run similar WUs at different GPU frequencies and plot the results, looking for diminishing returns. You would still expect to gain plenty from a GPU OC, but should see less gain as the MCL rises at higher GPU frequencies. Even with a frequency difference of 1406 vs 1126MHz (280MHz) the relative MCL difference is just ~18% (53% vs 45% load), but six or seven points down to around 1051MHz might be enough to spot the effect of an MCL bottleneck, if it exists.
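To make that concrete, a minimal sketch of how such a sweep could be evaluated is below; the 1126, 1266 and 1406MHz points are the ones measured above, while the intermediate clocks and all the runtimes are placeholders.

```python
# Sketch: look for diminishing returns across a GPU clock sweep.
# If performance were purely GPU-bound, throughput should scale almost
# linearly with clock; a growing shortfall at the top end would point to
# an MCL (bandwidth) bottleneck. Runtimes are placeholders, not data.
clocks_mhz = [1051, 1126, 1201, 1266, 1336, 1406]
runtimes_s = [22600, 21300, 20200, 19400, 18700, 18200]  # hypothetical

base_clock, base_runtime = clocks_mhz[0], runtimes_s[0]
for clock, runtime in zip(clocks_mhz, runtimes_s):
    speedup = base_runtime / runtime
    ideal = clock / base_clock
    efficiency = speedup / ideal * 100
    print(f"{clock}MHz: speedup {speedup:.3f} vs ideal {ideal:.3f} "
          f"({efficiency:.1f}% of linear scaling)")
# A steady fall in the last column as the clock rises is the signature
# of diminishing returns; flat ~100% would mean no MCL bottleneck yet.
```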
4) Message boards : Graphics cards (GPUs) : Maxwell now (Message 38581)
Posted 3 days ago by Profile skgiven
It's a Palit NE5X970014G2-2041F (1569), GM204-A Rev A1, with a default core clock of 1051MHz.
It uses an exhaust fan (blower), so while it's a Palit shell it's basically of reference design. I don't know of any board alterations from the reference design.
My understanding is that Palit uses GDDR5 from Elpida, Hynix and Samsung. This model has the Samsung GDDR5 and, like other Palit models, is supposed to operate at 3505MHz (7000MHz effective). However it seems fixed at 3005MHz. While I can set the clock to 3555MHz, the current clock remains at 3005MHz, and raising or lowering it does not change the MCL (so it appears my settings are being ignored).
So while it can run at ~110% power @ 1.212V (1406MHz), 64°C, fan @ 75%, I cannot reduce the MCL bottleneck (53% @ 1406MHz), which I would prefer to do.

http://www.palit.biz/palit/vgapro.php?id=2406
PN : NE5X970014G2-2041F
Memory : 4096MB / 256bit
DRAM Type : GDDR5
Clock : 1051MHz (Base) / 1178MHz (Boost)
Memory Clock : 3500 MHz (DDR 7000 MHz)
mHDMI / DVI / DisplayPort

biodoc, thanks for letting us know you are experiencing the same GDDR5 issue. Anyone else seeing this (or not)?
5) Message boards : Graphics cards (GPUs) : Maxwell now (Message 38576)
Posted 3 days ago by Profile skgiven
Reference-rated TDP wattage per Fermi 32-core SM / Kepler 192-core SMX / Maxwell 128-core SMM:

GTX580-244TDP [16SM/512cores] 15.25 watts per SM @ 0.47 watt per core

GTX680-195TDP [8SMX/1536cores] 24.37 watts per SMX @ 0.126 watt per core

GTX780-225TDP [12SMX/2304cores] 18.75 watts per SMX @ 0.097 watt per core

GTX780Ti-250TDP [15SMX/2880cores] 16.66 watts per SMX @ 0.086 watt per core

GTX750Ti-60TDP [5SMM/640cores] 12 watts per SMM @ 0.093 watt per core

GTX970-145TDP [13SMM/1664cores] 11.15 watts per SMM @ 0.087 watt per core

GTX980-170TDP [16SMM/2048cores] 10.62 watts per SMM @ 0.083 watt per core

GDDR5/VRM variations not included.

This reflects efficiency (GFlops/Watt) quite accurately and goes some way to explaining the design rationale.
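The per-SM and per-core figures above are just the reference TDP divided by the SM and core counts; a quick Python sketch reproduces them (small rounding differences aside):

```python
# Reproduce the per-SM / per-core figures above: TDP / SM count and
# TDP / core count (reference specs; GDDR5/VRM variations ignored).
cards = {
    # name: (TDP watts, SM/SMX/SMM count, CUDA cores)
    "GTX580":   (244, 16, 512),
    "GTX680":   (195,  8, 1536),
    "GTX780":   (225, 12, 2304),
    "GTX780Ti": (250, 15, 2880),
    "GTX750Ti": ( 60,  5, 640),
    "GTX970":   (145, 13, 1664),
    "GTX980":   (170, 16, 2048),
}

for name, (tdp, sms, cores) in cards.items():
    print(f"{name}: {tdp/sms:.2f} W per SM, {tdp/cores:.3f} W per core")
```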

I can boost the 970 core to 1400MHz but just can't shift the GDDR5, which for here would be more productive (with most tasks)!
I can also lower the core and tweak for efficiency; dropping the Power and Temp targets results in an automatic voltage drop. Even @ 1265MHz I can drop the Power and Temp targets to 90% without reducing throughput.
6) Message boards : Graphics cards (GPUs) : Maxwell now (Message 38573)
Posted 3 days ago by Profile skgiven
What was the improvement for GTX 680 compared to GTX580?

The 680 was eventually ~42% faster and had a TDP of 195W vs 244W for the 580.
Overall, that jump improved raw performance slightly more, whereas this jump has improved performance/Watt more.
7) Message boards : Graphics cards (GPUs) : Maxwell now (Message 38571)
Posted 4 days ago by Profile skgiven
With a 970, I’ve seen Memory Controller loads from 27% for short NOELIA_SH2 tasks to 50% for several different long task types.

Running a NOELIA_SH2 WU, the reference 970 boosted to 1265MHz straight out of the box and hit 27% MC load with the CPU over-used (100%); with less CPU usage the MC load went up to 31%.

With the GPU clocked @ 1354MHz, MC load reached 50% running long NOELIA_20MG2, SDOERR_BARNA5 and NOELIA_UNFOLD WUs.

Unfortunately I cannot OC the GDDR5 using Afterburner!

When the CPU was completely saturated (100%) my stock GPU performance was 29% less than with the CPU at 50%.

@ 1354MHz my 970 is ~30% faster than my 770 was at stock on the same system, so I would expect 970s to generally be about 20 to 30% faster than reference 770s.
8) Message boards : Graphics cards (GPUs) : GTX 750Ti Questions (Message 38553)
Posted 6 days ago by Profile skgiven
If you OC you should see some improvement, unless you are encountering recoverable errors. With the GTX750Ti there isn't a Memory Controller bottleneck, so you're probably not going to get much out of the GDDR5; it's likely you would see a greater improvement from OC'ing the GPU core.
For cards with Memory Controller bottlenecks it's the exact opposite.
9) Message boards : Graphics cards (GPUs) : Maxwell now (Message 38551)
Posted 7 days ago by Profile skgiven
The GTX970 Maxwell is only about 10% more energy efficient than a GTX750Ti Maxwell. Considering efficiency scales well with core count, this suggests an issue with the GTX900s.

WRT the GTX980 and the GTX970, for most people the GTX970 is the better choice; it's significantly cheaper than the GTX980 (it started out at half the price) and, as pointed out, comes close to matching its performance (initially thought to be 80%, but it looks more like 88% for here). Why? Both are Memory Controller constrained, but the 980 more so, while the 750Ti does not have such Memory Controller issues. We've seen this Memory Controller factor before, especially with the smaller Kepler GPUs.
This obviously suggests better performance would come from OC'ing the GTX900s' GDDR5, and it might even be worthwhile researching which memory chips various cards use before purchasing. It could also hint at what's to come, one way or another...
In the UK the GTX970 has increased in price from ~£250 at release to ~£263 (a 5% rise) while the GTX980 has fallen from ~£500 to £419.99 (a 19% drop). This mostly reflects their relative gaming value. It wouldn't surprise me if we found that the actual performance/Watt for the GTX970 here was slightly better than the GTX980's (by 2% or so)...
Anyway, unless you need quad Sli, the GTX980 is too pricey.
Presently in the UK three GTX970s would cost £789, while two GTX980s would cost £840. The three 970s would do 32% more work (assuming they actually perform at 88% of a GTX980 for here) and cost £51 less.
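For anyone checking the arithmetic, using the prices quoted above and the assumed 88% relative performance:

```python
# Quick check of the 3x GTX970 vs 2x GTX980 comparison, using the UK
# prices quoted above and assuming a 970 does 88% of a 980's work here.
price_970, price_980 = 263.0, 419.99   # GBP
rel_perf_970 = 0.88                    # 970 throughput vs one 980

cost_970x3 = 3 * price_970             # ~789
cost_980x2 = 2 * price_980             # ~840
work_970x3 = 3 * rel_perf_970          # 2.64 "980-equivalents"
work_980x2 = 2 * 1.0

print(f"3x970: GBP {cost_970x3:.0f}, throughput {work_970x3:.2f}")
print(f"2x980: GBP {cost_980x2:.0f}, throughput {work_980x2:.2f}")
print(f"Extra work from 3x970: {(work_970x3/work_980x2 - 1)*100:.0f}% "
      f"for GBP {cost_980x2 - cost_970x3:.0f} less")
```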
10) Message boards : News : GPUGRID statistics graphs (Message 38542)
Posted 7 days ago by Profile skgiven
The time scale is too short to draw meaning from; the compute capability distribution vs time should cover two or three months rather than 11 days.

