
Message boards : Number crunching : Ensuring GPU Overclock Stability

Jacob Klein
Joined: 11 Oct 08
Posts: 1067
Credit: 1,146,403,839
RAC: 1,089,717
Message 43617 - Posted: 28 May 2016 | 9:45:33 UTC
Last modified: 28 May 2016 | 9:51:59 UTC

I recently discovered a new tool that has helped me ensure that my overclock is stable, and I wanted to share the info. I have tried many tools for checking GPU overclock stability, and the following three are the best I've come across, including the new third one! Read on :)

Previously, my 2 tools were the following (which I still recommend!):
- Run the Heaven benchmark, at maximum resolution, antialiasing and tessellation, full-screen, overnight. If it crashes or leaves a TDR .dmp in C:\Windows\LiveKernelReports\WATCHDOG, then the overclock is unstable
- Run GPUGrid tasks, and inspect the results on the web; even look at the successful ones; if any task ever says "The simulation has become unstable", then the overclock is unstable
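
For the first check, here is a minimal Python sketch (not part of the original workflow) that lists any .dmp files in the WATCHDOG folder written since the benchmark started. The function name `dumps_since` is hypothetical, and it assumes the dumps use a lowercase .dmp extension and that the account running it can read that folder (which may need an elevated prompt).

```python
from datetime import datetime
from pathlib import Path

# Folder named in the post above; reading it may require administrator rights.
WATCHDOG_DIR = Path(r"C:\Windows\LiveKernelReports\WATCHDOG")


def dumps_since(start, folder=WATCHDOG_DIR):
    """Return the .dmp files in `folder` modified at or after `start` (a datetime)."""
    folder = Path(folder)
    if not folder.is_dir():
        # No folder (or not on Windows): nothing to report.
        return []
    cutoff = start.timestamp()
    return sorted(p for p in folder.glob("*.dmp") if p.stat().st_mtime >= cutoff)
```

Note the time before starting Heaven (for example `start = datetime.now()`), then call `dumps_since(start)` the next morning. Going by the rule above, any file in the result means the overclock is unstable.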

My new tool is to continuously run a complex PrimeGrid Genefer OpenCL calculation.

Specifically:
- Download the latest PrimeGrid Genefer OpenCL app
(current version is: http://www.primegrid.com/download/primegrid_genefer_3_3_1_3.13_windows_intelx86__OCLcudaGFNWR.exe)
- Open a Command Prompt window
- Run with the following arguments, including the double quotes:
primegrid_genefer_3_3_1_3.13_windows_intelx86__OCLcudaGFNWR.exe -q "43322502^131072+1" -d 0
- If the application completes successfully with a .log file that has "is composite. (RES=" in it, then you are short-term-stable.
- If you want to run on a different GPU, change the -d parameter
- Another good test is to replace the number in double quotes with this one: "43370168^131072+1"

So ... to REALLY use this method to ensure stability, I recommend putting it into a loop, and running it overnight.

You can create a .bat file with this in it:

REM Run both Genefer tests 200 times on GPU 0, saving each run's output to its own .txt file
FOR /L %%G IN (1,1,200) DO (
    primegrid_genefer_3_3_1_3.13_windows_intelx86__OCLcudaGFNWR.exe -q "43322502^131072+1" -d 0 > 43322502_d0_%%G.txt
    primegrid_genefer_3_3_1_3.13_windows_intelx86__OCLcudaGFNWR.exe -q "43370168^131072+1" -d 0 > 43370168_d0_%%G.txt
)


... then look at the .txt files afterwards to see if any runs crashed! If they did, it's time to downclock a bit!
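
Checking 400 .txt files by hand gets tedious, so here is a rough Python sketch that flags the suspicious ones. It assumes that the redirected output of a good run contains the same "is composite. (RES=" text that the post says appears in the .log file. That has not been verified here; if the app only writes that line to its .log, point the pattern at the log files instead. The function name `find_failed_runs` is made up for illustration.

```python
from pathlib import Path

# Text the post gives as the sign of a run that completed normally.
SUCCESS_MARKER = "is composite. (RES="


def find_failed_runs(folder=".", pattern="*_d0_*.txt", marker=SUCCESS_MARKER):
    """Return the names of output files that do NOT contain the success marker."""
    failed = []
    for path in sorted(Path(folder).glob(pattern)):
        # errors="replace" keeps odd console bytes from aborting the scan.
        text = path.read_text(errors="replace")
        if marker not in text:
            failed.append(path.name)
    return failed
```

Calling `find_failed_runs()` from the folder that holds the .bat output returns a list of file names. An empty list suggests every run finished, while any listed file (including an empty one from a run that died early) is worth opening.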

I have used this method to successfully determine that my brand new EVGA GTX 980 Ti FTW GPU .... was actually overclocked TOO MUCH by the factory! I have to apply "-20 MHz" to the Core Clock in MSI Afterburner, in order for it to run that .bat file overnight without error.
---shakes head at EVGA---

But overall, I'm pleased that this tool has helped me determine my max stable overclock, and I wanted to share :) Enjoy!

Jim1348
Joined: 28 Jul 12
Posts: 460
Credit: 1,130,761,180
RAC: 18,722
Message 43618 - Posted: 28 May 2016 | 12:14:55 UTC - in response to Message 43617.

I have used this method to successfully determine that my brand new EVGA GTX 980 Ti FTW GPU .... was actually overclocked TOO MUCH by the factory! I have to apply "-20 MHz" to the Core Clock in MSI Afterburner, in order for it to run that .bat file overnight without error.
---shakes head at EVGA---

It happens all the time. My very first Fermi card (a GTS 450) was factory overclocked too much, and I learned my lesson the hard way. Most people are willing to pay more for a greater factory overclock, and they expect the card to be stable. That may be true for games (maybe), but certainly not for this type of work, as you know quite well yourself.

I like the idea of tests, but the problem is that you never know when a harder GPUGrid work unit will crash your card anyway. But let us know how you make out when the new crop of work units comes along. There may be a Noelia II waiting in the wings (with all due respect - I liked her work).

Jacob Klein
Joined: 11 Oct 08
Posts: 1067
Credit: 1,146,403,839
RAC: 1,089,717
Message 43619 - Posted: 28 May 2016 | 12:23:07 UTC - in response to Message 43618.
Last modified: 28 May 2016 | 12:32:33 UTC

Yeah, I've previously had to downclock an EVGA GTX 660 Ti, due to factory overclocks that were too high. Sad, really. I didn't realize overclock stability was a concern, until it had crashed and ruined 2 of my long iRacing races. I learned a lot back then.

:) I have 2 GTX 980 Ti GPUs in my new system here; the other one is a reference-clocked Dell. I've used these 3 overclock stability tests to find its maximum stable overclock, and am extremely happy that I'm able to apply "+170 MHz" to the Core Clock and remain stable.

In fact .. my Dell GPU, at boost, runs at a faster clock than my pricey EVGA FTW! I'm even using the Dell to drive my single display for my games, because of that. I'm not sure if I got lucky, or if it's normal to be able to outclock a FTW, but I am still surprised.

For both of these Maxwell GTX 980 Ti GPUs, the PrimeGrid Genefer test was the most-stressful (and most useful) of the 3 tests. So much so, that I felt obligated to write the first post hehe. I'm looking forward to more GPUGrid tasks to test my new GPU overclocks with!
