Message boards : Number crunching : Validation error when switching between Maxwell & Kepler GPUs

ExtraTerrestrial Apes
Volunteer moderator
Volunteer tester
Joined: 17 Aug 08
Posts: 2705
Credit: 1,311,122,549
RAC: 0
Message 38916 - Posted: 15 Nov 2014 | 18:27:52 UTC

Hi guys,

I'm currently struggling with the stability of my system, after I added a GTX970 to the existing GTX660Ti. Because of this I was getting a few system crashes during the last week, which made the GPU-Grid WUs restart.

Here I observed something which may very well be a bug in 6.47 long runs: it seems that whenever one GPU starts a WU and the other one takes over later (after the restart due to a crash), the WU may complete fine but is marked as invalid. While the crash itself may cause this, at least some of them show no "computation has become unstable" entries in the log file.

I'm not certain that this happens all the time. But: in all cases when it happened, there is this switch between Kepler and Maxwell GPUs in the log. Here are the affected WUs currently in the database:

13385262
13377098
13364308
13363918
13356785

MrS
____________
Scanning for our furry friends since Jan 2002

Jim1348
Joined: 28 Jul 12
Posts: 819
Credit: 1,591,285,971
RAC: 0
Message 38917 - Posted: 15 Nov 2014 | 22:31:01 UTC - in response to Message 38916.
Last modified: 15 Nov 2014 | 22:36:17 UTC

I'm currently struggling with the stability of my system, after I added a GTX970 to the existing GTX660Ti.

I can't answer your question on the work units, but I had a devil of a time trying to get my two Asus GTX 750 Tis stable on a Haswell board (Gigabyte GA-Z97X-UD3H). The good news is that I found every conceivable hardware and software weakness and eliminated it. The bad news is that it still did not work without a BSOD (or freeze or hang on shutdown or startup) every few days. But the board worked fine with a pair of GTX 660s or a pair of HD 7790s, so I chose the latter. The GTX 750 Tis work fine on Biostar Z77 and Asrock Z87/97 boards, so that is where I use them.

Just yesterday I noticed a BIOS update for my Gigabyte motherboard, not on their website yet but reachable through a link on station-drivers.com, so I installed it. It claimed improved stability or compatibility without being specific. I noticed that my GPU cards are slightly slower now, so I expect that the board was running a little too fast for the Maxwell cards. Maybe there is something similar you could try on your board.

ExtraTerrestrial Apes
Volunteer moderator
Volunteer tester
Joined: 17 Aug 08
Posts: 2705
Credit: 1,311,122,549
RAC: 0
Message 38918 - Posted: 15 Nov 2014 | 22:56:48 UTC - in response to Message 38917.
Last modified: 22 Nov 2014 | 10:49:16 UTC

Thanks. BTW: almost as soon as I posted, I observed a WU with a switch but no validation error: 133862848

Edit, 22nd Nov: I had another switch between GPUs without a prompt error. So this was probably really related to the PC crashes.

MrS
____________
Scanning for our furry friends since Jan 2002
