Message boards : Graphics cards (GPUs) : Compute error 1(0x1) on all units since last night

Alez
Joined: 17 Nov 12
Posts: 10
Credit: 185,958,753
RAC: 0
Level
Ile
Message 27896 - Posted: 1 Jan 2013 | 12:49:56 UTC

My GTX 660 Ti and GTX 650 just suddenly started erroring out on every task, all with the same error as far as I can tell.

Name 2x11_8-NOELIA_hfXA_long-0-2-RND7200_1
Workunit 3977330
Created 1 Jan 2013 | 5:44:59 UTC
Sent 1 Jan 2013 | 10:14:59 UTC
Received 1 Jan 2013 | 10:23:43 UTC
Server state Over
Outcome Computation error
Client state Compute error
Exit status 1 (0x1)
Computer ID 138949

<core_client_version>7.0.33</core_client_version>
<![CDATA[
<message>
Incorrect function. (0x1) - exit code 1 (0x1)
</message>
]]>

Everything was working fine until last night. NVIDIA driver 306.97.

Any ideas what's wrong?

Retvari Zoltan
Joined: 20 Jan 09
Posts: 2343
Credit: 16,201,255,749
RAC: 6,169
Level
Trp
Message 27897 - Posted: 1 Jan 2013 | 13:15:39 UTC - in response to Message 27896.

My GTX 660 Ti and GTX 650 just suddenly started erroring out on every task, all with the same error as far as I can tell.

<core_client_version>7.0.33</core_client_version>
<![CDATA[
<message>
Incorrect function. (0x1) - exit code 1 (0x1)
</message>
]]>

Everything was working fine until last night. NVIDIA driver 306.97.

Any ideas what's wrong?

Sometimes the card (or the driver, or the OS) gets stuck, and only a restart can resolve it.
Have you tried a system restart?
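
One quick check before rebooting: see whether the driver still answers at all. A rough sketch (assuming a recent NVIDIA driver with nvidia-smi on the path and Python 3.7+; older GeForce drivers may not report every field):

import subprocess

def gpu_responsive(timeout_s=10):
    """Query basic GPU telemetry; a hung driver typically times out
    or returns a non-zero exit code instead of answering."""
    try:
        result = subprocess.run(
            ["nvidia-smi",
             "--query-gpu=name,temperature.gpu,fan.speed,utilization.gpu",
             "--format=csv,noheader"],
            capture_output=True, text=True, timeout=timeout_s)
    except (subprocess.TimeoutExpired, FileNotFoundError):
        return False
    if result.returncode != 0:
        return False
    for line in result.stdout.strip().splitlines():
        print(line)   # e.g. "GeForce GTX 660 Ti, 63, 40 %, 98 %"
    return True

if __name__ == "__main__":
    if not gpu_responsive():
        print("No answer from the driver - a reboot is probably the next step")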

Alez
Joined: 17 Nov 12
Posts: 10
Credit: 185,958,753
RAC: 0
Level
Ile
Message 27898 - Posted: 1 Jan 2013 | 13:27:57 UTC

Just reset GPUGrid and am about to restart the system. I was wondering if there was a known error, as I've already trashed 32 units and didn't want to keep trashing more.

skgiven
Volunteer moderator
Volunteer tester
Joined: 23 Apr 09
Posts: 3968
Credit: 1,995,359,260
RAC: 0
Level
His
Message 27900 - Posted: 1 Jan 2013 | 15:10:42 UTC - in response to Message 27898.
Last modified: 1 Jan 2013 | 15:33:55 UTC

I appear to have had a similar problem. It started last night, just after midnight CET.
http://www.gpugrid.net/results.php?hostid=139265&offset=0&show_names=0&state=5&appid=
Long tasks just started failing, one after the other. Most failed after ~200 sec. They might have just been failing on my GTX660Ti and not my GTX470s; a task was running on it. After I restarted, the same task started to run on my GTX660Ti, and now seems to be progressing normally...
GPUGrid stopped sending me work, so I will have to run some jobs from other projects and wait for my rating to improve before getting new tasks (only ~4h if the one task I have completes and reports successfully).
As well as the possibility that this was caused by bad tasks, it could have been caused by a CPU BOINC project, by BOINC itself, or be down to the driver (306.97 in my case). W7 x64.

Of the failed WUs, two tasks also failed on other systems:
http://www.gpugrid.net/workunit.php?wuid=3977079
http://www.gpugrid.net/workunit.php?wuid=3977023

However, some resends ran successfully, suggesting it's not an issue with GPUGrid.

Alez
Joined: 17 Nov 12
Posts: 10
Credit: 185,958,753
RAC: 0
Level
Ile
Message 27902 - Posted: 1 Jan 2013 | 15:30:46 UTC

Reset the project, did a clean NVIDIA driver update to 310.70, and rebooted. So far I've got one task and it seems to be running to completion. 15% more to go and we will see...

skgiven
Volunteer moderator
Volunteer tester
Joined: 23 Apr 09
Posts: 3968
Credit: 1,995,359,260
RAC: 0
Level
His
Message 27906 - Posted: 1 Jan 2013 | 21:34:30 UTC - in response to Message 27902.

The task I had running failed!
http://www.gpugrid.net/result.php?resultid=6235285

Name 2x12_4-NOELIA_hfXA_long-0-2-RND6878_0
Workunit 3977346
Created 19 Dec 2012 | 20:37:02 UTC
Sent 1 Jan 2013 | 5:58:06 UTC
Received 1 Jan 2013 | 17:01:28 UTC
Server state Over
Outcome Computation error
Client state Compute error
Exit status 98 (0x62)
Computer ID 139265
Report deadline 6 Jan 2013 | 5:58:06 UTC
Run time 35,731.98
CPU time 30,914.86
Validate state Invalid
Credit 0.00
Application version Long runs (8-12 hours on fastest card) v6.16 (cuda42)

ERROR: file deven.cpp line 1106: # Energies have become nan

Perhaps it was one of the earlier tasks that failed on completion?
It wasn't resent.
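
For what it's worth, that message just means the app's per-step sanity check tripped: one of the energy terms stopped being a finite number, which usually points to the simulation blowing up (heat, overclocking, or a genuinely bad WU). A toy illustration of that kind of guard in Python, not the actual ACEMD code:

import math

def check_energies(energies, step):
    # Abort the run as soon as any energy term is NaN or infinite,
    # roughly the kind of check behind the deven.cpp error message.
    for name, value in energies.items():
        if math.isnan(value) or math.isinf(value):
            raise RuntimeError(
                "step %d: # Energies have become nan (%s=%r)" % (step, name, value))

# hypothetical per-step values from an integrator
check_energies({"bond": 1523.4, "angle": 987.1, "nonbonded": float("nan")}, step=402)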

Retvari Zoltan
Joined: 20 Jan 09
Posts: 2343
Credit: 16,201,255,749
RAC: 6,169
Level
Trp
Message 27912 - Posted: 2 Jan 2013 | 0:39:58 UTC - in response to Message 27906.

The task I had running failed!

ERROR: file deven.cpp line 1106: # Energies have become nan

Perhaps it was one of the earlier tasks that failed on completion?
It wasn't resent.

Since then it has been resent to another host, so we will see.
We have 17,880 unsent workunits (and as few as 2,174 in progress) at the moment, so a resend takes more time than usual.

skgiven
Volunteer moderator
Volunteer tester
Joined: 23 Apr 09
Posts: 3968
Credit: 1,995,359,260
RAC: 0
Level
His
Message 27915 - Posted: 2 Jan 2013 | 13:16:19 UTC - in response to Message 27912.

I have identified the root of the problem I was encountering: the GTX660Ti's fan was stuck at 40%. I had it on a profile so that fan speed would increase with temperature, but after updating MSI Afterburner a couple of days back the profile was no longer applied to the GTX660Ti; it only applied to the GTX470.

That's what I get for 'upgrading' software without any real need.
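
For anyone rebuilding a profile, the curve Afterburner applies is basically linear interpolation between (temperature, fan%) breakpoints. A quick Python sketch with made-up breakpoints (not my actual profile):

# (temperature in C, fan %) breakpoints, in the spirit of an MSI Afterburner
# custom fan curve; the numbers below are illustrative only
CURVE = [(40, 40), (60, 55), (70, 70), (80, 85), (90, 100)]

def fan_percent(temp_c):
    # Linearly interpolate the target fan speed; clamp outside the curve.
    if temp_c <= CURVE[0][0]:
        return CURVE[0][1]
    for (t0, f0), (t1, f1) in zip(CURVE, CURVE[1:]):
        if temp_c <= t1:
            return f0 + (f1 - f0) * (temp_c - t0) / (t1 - t0)
    return CURVE[-1][1]

print(fan_percent(62))   # 58.0 - roughly what the card stuck at 40% should have been running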

skgiven
Volunteer moderator
Volunteer tester
Joined: 23 Apr 09
Posts: 3968
Credit: 1,995,359,260
RAC: 0
Level
His
Message 27930 - Posted: 3 Jan 2013 | 11:13:56 UTC - in response to Message 27915.
Last modified: 3 Jan 2013 | 14:00:31 UTC

I've had another error on that system (GTX660Ti now at 62°C):
6286250 4012377 2 Jan 2013 | 13:14:43 UTC 2 Jan 2013 | 20:26:00 UTC Error while computing 18,859.67 1,537.28 --- Long runs (8-12 hours on fastest card) v6.16 (cuda42)

Stderr output

<core_client_version>7.0.42</core_client_version>
<![CDATA[
<message>
- exit code 98 (0x62)
</message>
<stderr_txt>
MDIO: cannot open file "restart.coor"
ERROR: file deven.cpp line 1106: # Energies have become nan

called boinc_finish

</stderr_txt>
]]>

It also failed on another system using the 3.1 app.
6285719 79738 2 Jan 2013 | 8:55:50 UTC 2 Jan 2013 | 10:36:53 UTC Error while computing 9.51 0.05 --- Long runs (8-12 hours on fastest card) v6.16 (cuda31)
6286250 139265 2 Jan 2013 | 13:14:43 UTC 2 Jan 2013 | 20:26:00 UTC Error while computing 18,859.67 1,537.28 --- Long runs (8-12 hours on fastest card) v6.16 (cuda42)
6287663 142106 2 Jan 2013 | 23:35:09 UTC 7 Jan 2013 | 23:35:09 UTC In progress --- --- --- Long runs (8-12 hours on fastest card) v6.16 (cuda42)

I went through earlier WU failures and, while most WUs eventually succeeded, most of the resends failed on at least one other system, some failing numerous times. The issue seems to be the same for Long and Short WUs:

http://www.gpugrid.net/results.php?hostid=139265&offset=0&show_names=0&state=5&appid=

While the errors are mostly early in the runs, some occur late into the run. It's also an issue for both apps (3.1 and 4.2), and there seem to be quite a few 'error while downloading' failures.
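
Tallying those outcomes by hand gets tedious, so here is a small Python sketch that counts outcomes and app builds from rows pasted off the results page (the column layout is assumed from the pastes above, so treat it as illustrative):

import re
from collections import Counter

# rows copied from the GPUGrid results page (layout assumed from this thread)
ROWS = [
    "6285719 79738 2 Jan 2013 | 8:55:50 UTC 2 Jan 2013 | 10:36:53 UTC Error while computing 9.51 0.05 --- Long runs (8-12 hours on fastest card) v6.16 (cuda31)",
    "6286250 139265 2 Jan 2013 | 13:14:43 UTC 2 Jan 2013 | 20:26:00 UTC Error while computing 18,859.67 1,537.28 --- Long runs (8-12 hours on fastest card) v6.16 (cuda42)",
    "6287663 142106 2 Jan 2013 | 23:35:09 UTC 7 Jan 2013 | 23:35:09 UTC In progress --- --- --- Long runs (8-12 hours on fastest card) v6.16 (cuda42)",
]

def classify(row):
    # Pick out the outcome and the cuda build from one pasted row.
    outcome = next((s for s in ("Error while computing", "Completed",
                                "In progress", "Unsent") if s in row), "Unknown")
    build = re.search(r"\(cuda\d+\)", row)
    return outcome, build.group(0) if build else "---"

print(Counter(classify(r) for r in ROWS))
# e.g. Counter({('Error while computing', '(cuda31)'): 1,
#               ('Error while computing', '(cuda42)'): 1,
#               ('In progress', '(cuda42)'): 1})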

skgiven
Volunteer moderator
Volunteer tester
Joined: 23 Apr 09
Posts: 3968
Credit: 1,995,359,260
RAC: 0
Level
His
Message 27940 - Posted: 5 Jan 2013 | 13:00:30 UTC - in response to Message 27930.

These probably belong in the 'Energies have become nan' thread, but:
6294105 4017217 139265 4 Jan 2013 | 23:14:14 UTC 5 Jan 2013 | 12:31:59 UTC Error while computing 42,955.90 3,395.41 --- Long runs (8-12 hours on fastest card) v6.17 (cuda42)

6293240 112581 4 Jan 2013 | 18:40:13 UTC 4 Jan 2013 | 18:49:20 UTC Error while computing 2.16 2.09 --- Long runs (8-12 hours on fastest card) v6.17 (cuda42)
6294105 139265 4 Jan 2013 | 23:14:14 UTC 5 Jan 2013 | 12:31:59 UTC Error while computing 42,955.90 3,395.41 --- Long runs (8-12 hours on fastest card) v6.17 (cuda42)
6296657 --- --- --- Unsent --- --- ---
