Message boards : Graphics cards (GPUs) : Computation Error and ACEMD Crashes

David
Joined: 3 Jun 11
Posts: 2
Credit: 425,507
RAC: 0
Message 24159 - Posted: 26 Mar 2012 | 22:26:02 UTC
Last modified: 26 Mar 2012 | 22:26:20 UTC

Before you ask: yes, I read this FAQ: http://www.gpugrid.net/forum_thread.php?id=1314#12178

I've never had it working fully, so most of the "do not change unless needed" steps are not helpful :/

It does seem like 2 or 3 jobs can stay running, but so far only 1 of my assigned jobs has completed without a "Computation error" message. Often when I sit down at the workstation there is also a Windows error message saying that ACEMD or something similar has crashed.

You can probably see this already, but I'll include the stderr output from one of the jobs on my results page:


<core_client_version>6.12.34</core_client_version>
<![CDATA[
<message>
Incorrect function. (0x1) - exit code 1 (0x1)
</message>
<stderr_txt>
# Using device 2
# There are 4 devices supporting CUDA
# Device 0: "GeForce GTX 590"
# Clock rate: 1.22 GHz
# Total amount of global memory: 1610612736 bytes
# Number of multiprocessors: 16
# Number of cores: 128
# Device 1: "GeForce GTX 590"
# Clock rate: 1.22 GHz
# Total amount of global memory: 1610612736 bytes
# Number of multiprocessors: 16
# Number of cores: 128
# Device 2: "GeForce GTX 590"
# Clock rate: 1.22 GHz
# Total amount of global memory: 1610612736 bytes
# Number of multiprocessors: 16
# Number of cores: 128
# Device 3: "GeForce GTX 590"
# Clock rate: 1.22 GHz
# Total amount of global memory: 1610612736 bytes
# Number of multiprocessors: 16
# Number of cores: 128
MDIO: cannot open file "restart.coor"

</stderr_txt>
]]>
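
For anyone comparing several failed results, the selected device, the device list, and the error line can be pulled out of an stderr dump like the one above with a few lines of Python. This is only a sketch: it assumes the `# Using device N` and `# Device N: "name"` layout shown in this post, and the function name `parse_acemd_stderr` is made up for illustration; it is not part of BOINC or ACEMD.

```python
import re

# Abbreviated copy of the stderr text from the post above.
SAMPLE = '''# Using device 2
# There are 4 devices supporting CUDA
# Device 0: "GeForce GTX 590"
# Clock rate: 1.22 GHz
# Device 1: "GeForce GTX 590"
# Device 2: "GeForce GTX 590"
# Device 3: "GeForce GTX 590"
MDIO: cannot open file "restart.coor"
'''


def parse_acemd_stderr(text):
    """Return (selected_device, {index: name}, [non-comment lines])."""
    selected = None
    devices = {}
    messages = []
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        match = re.match(r'#\s*Using device (\d+)', line)
        if match:
            selected = int(match.group(1))
            continue
        match = re.match(r'#\s*Device (\d+): "([^"]*)"', line)
        if match:
            devices[int(match.group(1))] = match.group(2)
            continue
        if not line.startswith('#'):
            # Anything that is not a '#' info line is treated as a message,
            # e.g. the MDIO error about restart.coor.
            messages.append(line)
    return selected, devices, messages


selected, devices, messages = parse_acemd_stderr(SAMPLE)
print("task ran on device", selected, "of", len(devices))
for message in messages:
    print("message:", message)
```

Run against several results, this makes it easier to see whether failures cluster on one of the four GPUs or are spread across all of them.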


I have the NVIDIA 296.10 drivers installed. Yesterday I was running the previous version and upgraded in an attempt to fix the issue. As you can see from the output above, I run 2x GTX 590s, which is effectively quad SLI; however, I often keep SLI turned off in the NVIDIA Control Panel, since some games I play get very ornery when it's on. The SLI connector, however, is always attached. They are identical ASUS GTX 590s purchased from the same batch.

The FAQ referenced above says to disable screen savers and any kind of graphics processing, but I can sit here and watch the jobs fail after running for a short time. Another project I'm part of (for now) is Milkyway@home; it also uses CUDA, and its GPU tasks fail instantly, also with "Computation Error". The SETI@home and Collatz GPU tasks have completed without issue every time so far. I hope that if there is a fix it also applies to Milkyway@home, or I'll have to drop it, as I'm not doing much to help.

Thanks in advance.

-Dave

MarkJ
Volunteer moderator
Volunteer tester
Joined: 24 Dec 08
Posts: 738
Credit: 200,909,904
RAC: 0
Message 24161 - Posted: 27 Mar 2012 | 6:55:50 UTC - in response to Message 24159.
Last modified: 27 Mar 2012 | 6:59:34 UTC

Get rid of the 296.x driver. It has a bug where the monitor going into sleep mode kills any task being computed. The 295.x drivers have the same problem. The most recent usable drivers are the 290.x ones.

Some projects (Einstein) have even blocked anyone with those drivers from getting GPU work until NVIDIA fixes it.

Another option, if you really want to use either series of drivers, is to change your power settings to never turn off the monitor and set your screen saver to the same. That won't help with Einstein, though.
____________
BOINC blog
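
The rule of thumb in this reply (295.x and 296.x affected, 290.x usable) can be written down as a tiny check. This is a sketch based only on what is reported in this thread, not on an official NVIDIA list; the names `AFFECTED_SERIES` and `has_monitor_sleep_bug` are hypothetical.

```python
# Driver series reported in this thread as killing GPU tasks when the
# monitor goes to sleep. Assumption: the series is the part of the version
# string before the first dot (e.g. "296.10" -> 296).
AFFECTED_SERIES = {295, 296}


def driver_series(version):
    """Return the major series of a driver version string such as '296.10'."""
    return int(version.strip().split(".")[0])


def has_monitor_sleep_bug(version):
    """True if the version belongs to a series reported as affected here."""
    return driver_series(version) in AFFECTED_SERIES


print(has_monitor_sleep_bug("296.10"))
```

For the 296.10 driver mentioned in the first post, the check reports the driver as affected.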

David
Joined: 3 Jun 11
Posts: 2
Credit: 425,507
RAC: 0
Message 24167 - Posted: 27 Mar 2012 | 22:45:15 UTC - in response to Message 24161.

Thank you, I'll do that :)

david_alary
Joined: 16 Nov 08
Posts: 4
Credit: 10,286,292
RAC: 0
Message 24493 - Posted: 20 Apr 2012 | 15:00:36 UTC - in response to Message 24167.

I have the same problem as you, David. Did these solutions solve your problem? If so, which method did you choose?

5pot
Joined: 8 Mar 12
Posts: 411
Credit: 2,083,882,218
RAC: 0
Message 24494 - Posted: 20 Apr 2012 | 15:31:43 UTC
Last modified: 20 Apr 2012 | 15:32:24 UTC

The 301.24 beta gets rid of the monitor sleep bug. You will also need it in order to run the new beta app, which uses CUDA 4.2. If you wish to keep 295-296, all you have to do is change the power settings so that your computer never puts your monitor to sleep (it also shouldn't hibernate). If you turn your monitor off manually after changing these settings, everything works fine.

Again, the beta driver 301.24 has fixed this problem.
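
On Windows, the "never sleep" power settings described above can also be applied from a script instead of the Control Panel. The sketch below assumes the standard `powercfg /change` timeout names (`monitor-timeout-ac`, `standby-timeout-ac`, `hibernate-timeout-ac`), where a value of 0 means "never"; the helper names are made up, and by default it only prints the commands rather than running them.

```python
import subprocess
import sys

# Timeouts to disable while on AC power; 0 minutes means "never".
TIMEOUT_SETTINGS = ("monitor-timeout-ac", "standby-timeout-ac", "hibernate-timeout-ac")


def never_sleep_commands():
    """Build one powercfg command per timeout setting."""
    return [["powercfg", "/change", name, "0"] for name in TIMEOUT_SETTINGS]


def apply_never_sleep(dry_run=True):
    """Print the commands, or run them when dry_run is False on Windows."""
    for command in never_sleep_commands():
        if dry_run or sys.platform != "win32":
            print(" ".join(command))
        else:
            subprocess.run(command, check=True)


apply_never_sleep(dry_run=True)
```

Calling `apply_never_sleep(dry_run=False)` from an elevated prompt would actually change the active power plan, so it is worth checking the printed commands first.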

voss749
Joined: 27 Mar 11
Posts: 26
Credit: 307,452,808
RAC: 0
Message 24584 - Posted: 25 Apr 2012 | 15:46:20 UTC

Did this affect the Linux version of the NVIDIA driver, or just the Windows version?

Paul Hulme
Joined: 20 Dec 08
Posts: 1
Credit: 451,646,135
RAC: 0
Message 24823 - Posted: 9 May 2012 | 10:02:25 UTC

Hi everyone.
I have spent many months being very patient and following the advice given, especially regarding drivers. Nothing has reduced the continuous failure of WUs with the famous "computation error", which tells me nothing. I have returned to the latest NVIDIA drivers for my GTX 580. On the plus side, I do occasionally have a WU or two that finish successfully, but this does not make up for the many, many failures. When I check, many others appear to fail on the same WUs too. This is very disappointing, and I feel that I am wasting valuable GPU time.
I do not have this with SETI, where a failed WU is quite rare. Sadly, I may donate all my GPU time to SETI, as there does not appear to be a solution to this ongoing problem, which many others seem to suffer from too. :(

5pot
Joined: 8 Mar 12
Posts: 411
Credit: 2,083,882,218
RAC: 0
Message 24830 - Posted: 9 May 2012 | 15:13:30 UTC

One, I see you're using the 296 driver; I'm assuming you have your machine set to never sleep. Two, I see your card is overclocked: drop the core clock to stock, as well as your memory. If you're crunching on all CPU cores, leave one available for GPUGrid.
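
One way to leave a core free is the BOINC computing preference that limits CPU use to a percentage of the processors. The small sketch below works out a percentage for a given core count. It rounds up on the assumption that the client truncates the resulting core count, which is a guess about client behaviour rather than something stated in this thread; the function name is hypothetical.

```python
def cpu_percent_leaving_free(total_cores, free_cores=1):
    """Percentage of CPUs to allow so that `free_cores` stay unused.

    Rounds up to a whole percent so that truncating
    total_cores * percent / 100 still yields total_cores - free_cores.
    """
    if total_cores < 1:
        raise ValueError("total_cores must be at least 1")
    usable = max(total_cores - free_cores, 1)  # never go below one core
    return -(-usable * 100 // total_cores)     # integer ceiling division


for cores in (4, 8, 12):
    print(cores, "cores ->", cpu_percent_leaving_free(cores), "%")
```

On a machine where the GPU tasks still starve, freeing a second core (`free_cores=2`) is the obvious next thing to try.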

wiyosaya
Joined: 22 Nov 09
Posts: 114
Credit: 589,114,683
RAC: 0
Message 24874 - Posted: 10 May 2012 | 14:57:47 UTC - in response to Message 24823.

I'll make one recommendation: If you are using the BOINC screen saver, turn it off. I have noted that when using the BOINC screen saver, the number of GPUGrid WUs that fail with computation error is high. Having turned off the screen saver completely, I get almost no WUs that fail due to computation errors. BTW - I'm running a mix of Win 7 and Win XP machines, and I do not run GPUGrid on the one Linux machine I have.

Lastly, I recommend not using a screen saver at all. LCDs have no burn-in problems like CRTs did; given that, screen savers are eye candy and nothing more. For me, I power off my monitor and let the computer run when running any BOINC project for extended periods of time, and GPUGrid WUs failing due to computation errors is now very rare for me.