Message boards : Graphics cards (GPUs) : Workunit failures

ianmbaker
Message 5877 - Posted: 22 Jan 2009 | 19:25:27 UTC

Since my system downloaded version 6.61, all of the workunits have failed. The messages from BOINC say:

22/01/2009 18:43:19|GPUGRID|Sending scheduler request: To fetch work. Requesting 3021 seconds of work, reporting 1 completed tasks
22/01/2009 18:43:24|GPUGRID|Scheduler request completed: got 1 new tasks
22/01/2009 18:43:26|GPUGRID|Started download of kA28006-SH2_US_4-0-10-SH2_US_41140000-LICENSE
22/01/2009 18:43:26|GPUGRID|Started download of kA28006-SH2_US_4-0-10-SH2_US_41140000-COPYRIGHT
22/01/2009 18:43:27|GPUGRID|Finished download of kA28006-SH2_US_4-0-10-SH2_US_41140000-LICENSE
22/01/2009 18:43:27|GPUGRID|Finished download of kA28006-SH2_US_4-0-10-SH2_US_41140000-COPYRIGHT
22/01/2009 18:43:27|GPUGRID|Started download of kA28006-SH2_US_4-0-10-SH2_US_41140000-smd.1140000.coor
22/01/2009 18:43:27|GPUGRID|Started download of kA28006-SH2_US_4-0-10-SH2_US_41140000-smd.1140000.vel
22/01/2009 18:43:28|GPUGRID|Finished download of kA28006-SH2_US_4-0-10-SH2_US_41140000-smd.1140000.vel
22/01/2009 18:43:28|GPUGRID|Started download of kA28006-SH2_US_4-0-10-SH2_US_41140000-input.idx
22/01/2009 18:43:29|GPUGRID|Finished download of kA28006-SH2_US_4-0-10-SH2_US_41140000-input.idx
22/01/2009 18:43:29|GPUGRID|Started download of kA28006-SH2_US_4-0-10-SH2_US_41140000-complex_full.sol.ionized.pdb
22/01/2009 18:43:38|GPUGRID|Finished download of kA28006-SH2_US_4-0-10-SH2_US_41140000-smd.1140000.coor
22/01/2009 18:43:38|GPUGRID|Started download of kA28006-SH2_US_4-0-10-SH2_US_41140000-complex_full.sol.ionized.psf
22/01/2009 18:43:56|GPUGRID|Finished download of kA28006-SH2_US_4-0-10-SH2_US_41140000-complex_full.sol.ionized.pdb
22/01/2009 18:43:56|GPUGRID|Started download of kA28006-SH2_US_4-0-10-SH2_US_41140000-parameters
22/01/2009 18:43:59|GPUGRID|Finished download of kA28006-SH2_US_4-0-10-SH2_US_41140000-parameters
22/01/2009 18:43:59|GPUGRID|Started download of kA28006-SH2_US_4-0-10-SH2_US_41140000-SH2_US_41140000
22/01/2009 18:44:00|GPUGRID|Finished download of kA28006-SH2_US_4-0-10-SH2_US_41140000-SH2_US_41140000
22/01/2009 18:44:18|GPUGRID|Finished download of kA28006-SH2_US_4-0-10-SH2_US_41140000-complex_full.sol.ionized.psf
22/01/2009 18:44:20|GPUGRID|Starting kA28006-SH2_US_4-0-10-SH2_US_41140000_1
22/01/2009 18:44:20|GPUGRID|Starting task kA28006-SH2_US_4-0-10-SH2_US_41140000_1 using acemd version 661
22/01/2009 18:44:23|GPUGRID|Computation for task kA28006-SH2_US_4-0-10-SH2_US_41140000_1 finished
22/01/2009 18:44:23|GPUGRID|Output file kA28006-SH2_US_4-0-10-SH2_US_41140000_1_1 for task kA28006-SH2_US_4-0-10-SH2_US_41140000_1 absent
22/01/2009 18:44:23|GPUGRID|Output file kA28006-SH2_US_4-0-10-SH2_US_41140000_1_2 for task kA28006-SH2_US_4-0-10-SH2_US_41140000_1 absent
22/01/2009 18:44:23|GPUGRID|Output file kA28006-SH2_US_4-0-10-SH2_US_41140000_1_3 for task kA28006-SH2_US_4-0-10-SH2_US_41140000_1 absent
22/01/2009 18:44:25|GPUGRID|Started upload of kA28006-SH2_US_4-0-10-SH2_US_41140000_1_0
22/01/2009 18:44:27|GPUGRID|Finished upload of kA28006-SH2_US_4-0-10-SH2_US_41140000_1_0

Similar messages for every work unit which downloaded and failed.
There were no problems with the previous 6.55 version.

Any ideas on how I can get crunching again?

Ian
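
A note on reading the log in this post: the task is started at 18:44:20 and reported finished at 18:44:23, so it ran for roughly three seconds before the output files were reported absent. Below is a minimal Python sketch for pulling that out of a saved message log. It assumes the `timestamp|project|message` layout shown in the post; the function name `task_runtimes` is made up for illustration and is not part of BOINC.

```python
from datetime import datetime

def task_runtimes(log_lines):
    """Return {task_name: seconds between 'Starting task' and 'finished'}."""
    started, runtimes = {}, {}
    for line in log_lines:
        parts = line.strip().split("|", 2)
        if len(parts) != 3:
            continue  # not a "timestamp|project|message" line
        stamp, _project, message = parts
        try:
            when = datetime.strptime(stamp, "%d/%m/%Y %H:%M:%S")
        except ValueError:
            continue  # timestamp not in the assumed dd/mm/yyyy format
        words = message.split()
        if message.startswith("Starting task "):
            started[words[2]] = when
        elif message.startswith("Computation for task ") and message.endswith(" finished"):
            name = words[3]
            if name in started:
                runtimes[name] = (when - started[name]).total_seconds()
    return runtimes

# Two lines copied from the log in the post.
sample = [
    "22/01/2009 18:44:20|GPUGRID|Starting task kA28006-SH2_US_4-0-10-SH2_US_41140000_1 using acemd version 661",
    "22/01/2009 18:44:23|GPUGRID|Computation for task kA28006-SH2_US_4-0-10-SH2_US_41140000_1 finished",
]
print(task_runtimes(sample))
```

A runtime of a few seconds for a GPU workunit suggests the application stopped while loading its inputs rather than partway through the computation.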

ExtraTerrestrial Apes
Volunteer moderator
Volunteer tester
Message 5880 - Posted: 22 Jan 2009 | 20:31:13 UTC - in response to Message 5877.

Strange, your error message is

MDIO ERROR: read error for file "input.vel", byte number 4: number of atoms (1625495040) != (39910) expected
ERROR: mdioload.cu, line 146: Unable to read binvelfile


Sorry, I've never seen this message before and don't know what you could do.

MrS
____________
Scanning for our furry friends since Jan 2002

PeteS
Message 5882 - Posted: 22 Jan 2009 | 21:06:20 UTC - in response to Message 5877.

I also had all workunits failing; finally I was told that the daily quota of 8 WUs had been exceeded, and I was not allocated any more.

You can see the failed WU's from here:
http://www.gpugrid.net/results.php?userid=12774

UL1
Message 5884 - Posted: 22 Jan 2009 | 21:09:25 UTC - in response to Message 5880.

I've now got almost 15 WUs erroring out with exactly the same error message that ianmbaker got... but I am running Linux with app version 6.59.
Any ideas what's going wrong here?

Donnie
Message 5886 - Posted: 22 Jan 2009 | 22:55:41 UTC

I had 9 WUs also error out. They appear to have been back to back starting at 18:21:28 UTC thru 18:30:37 UTC.

I have 1 "good" 6.61 running at 69% completion.

Same error message as below:

MDIO ERROR: read error for file "input.vel", byte number 4: number of atoms (1078071040) != (39910) expected
ERROR: mdioload.cu, line 146: Unable to read binvelfile

Now I've reached my daily quota and will have two GTX 260 Core 216 cards sitting idle. I suppose I could do Folding@home.
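
One thing worth checking about the two atom counts reported in this thread, 1625495040 and 1078071040: both turn into round numbers when their four bytes are read in the opposite order. A small Python sketch of that check follows. It is only arithmetic on the numbers in the error text, not a confirmed diagnosis, and `swap32` is a made-up helper name.

```python
import struct

def swap32(value):
    """Reinterpret a 32-bit unsigned integer with its byte order reversed."""
    return struct.unpack(">I", struct.pack("<I", value))[0]

for reported in (1625495040, 1078071040):
    print(reported, "->", swap32(reported))
# 1625495040 -> 1500000
# 1078071040 -> 1000000
```

If that pattern is not a coincidence, the first four bytes of input.vel held something other than the 39910-atom count the application expected, which would fit a bad input file rather than a fault on the host. That is a guess from the numbers alone.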

K1atOdessa
Message 5888 - Posted: 22 Jan 2009 | 23:21:05 UTC

I've errored out on a bunch of WUs after completing one successfully. Looks like 6.61 needs to be withdrawn in favor of 6.55.

Paul D. Buck
Message 5891 - Posted: 22 Jan 2009 | 23:32:15 UTC

I have completed two so far, both a success ...

Michael Milan
Message 5895 - Posted: 23 Jan 2009 | 3:02:50 UTC

I have a strange workunit error:

http://www.gpugrid.net/result.php?resultid=238028

Does anyone know what this means?

"Cuda error: Kernel [frc_sum_kernel_bond] failed in file 'force.cu' in line 283 : unknown error."

Paul D. Buck
Message 5896 - Posted: 23 Jan 2009 | 3:11:53 UTC - in response to Message 5895.

> I have a strange workunit error:
>
> http://www.gpugrid.net/result.php?resultid=238028
>
> Anyone knows what this means?:
>
> "Cuda error: Kernel [frc_sum_kernel_bond] failed in file 'force.cu' in line 283 : unknown error."



Bad unknown things ...

Really bad, and really, really unknown things ...

But we know exactly where ...

:)

Sorry, I could not resist ... and it is the only sense of humor that I have ...

Scott Brown
Message 5897 - Posted: 23 Jan 2009 | 4:45:16 UTC - in response to Message 5896.



> Bad unknown things ...
>
> Really bad, and really, really unknown things ...
>
> But we know exactly where ...
>
> :)
>
> Sorry, I could not resist ... and it is the only sense of humor that I have ...




Well...there are different kinds of unknowns...

see http://www.youtube.com/watch?v=_RpSv3HjpEw

:)


X1900AIW
Message 5898 - Posted: 23 Jan 2009 | 6:11:46 UTC - in response to Message 5891.

> I have completed two so far, both a success ...

Same for me: both WUs (236301 and 236294) succeeded with no errors. Even so, I'm suspending further downloads for now and waiting for all the 6.61 results (5 of 7) in my task queue.

Paul D. Buck
Message 5899 - Posted: 23 Jan 2009 | 7:05:50 UTC - in response to Message 5897.

> Well...there are different kinds of unknowns...
>
> see http://www.youtube.com/watch?v=_RpSv3HjpEw
>
> :)




As an engineer I lived by those rules ... and it was almost always the unknown unknowns that got me ... which is why I tried so hard to find out what they might be ...

Chris S
Message 5905 - Posted: 23 Jan 2009 | 10:34:27 UTC

I had 2 with this error, but running a 6.61 OK now

MDIO ERROR: read error for file "input.vel", byte number 4: number of atoms (1625495040) != (39910) expected
ERROR: mdioload.cu, line 146: Unable to read binvelfile

ianmbaker
Message 5911 - Posted: 23 Jan 2009 | 11:58:07 UTC

It looks like the problem reported in thread http://www.gpugrid.net/forum_thread.php?id=671 was the cause. The failing work units I had were in that series.

Thanks to all who responded.

Ian



Nognlite
Message 5914 - Posted: 23 Jan 2009 | 15:14:34 UTC

Same here. I've had two successful WUs in the past 4 days. The rest of the time I maxed out my daily quota, and now it has been reduced to 1/day.

Now I sit idle (well not me at work but my computer). I have no problems with my 8800GT on Vista 64. All my issues are with 2 GTX280's on a Vista 64 machine.

Pat

X1900AIW
Message 5921 - Posted: 23 Jan 2009 | 18:51:09 UTC - in response to Message 5905.

> I had 2 with this error, but running a 6.61 OK now
>
> MDIO ERROR: read error for file "input.vel", byte number 4: number of atoms (1625495040) != (39910) expected
> ERROR: mdioload.cu, line 146: Unable to read binvelfile



Same error, first time a 6.61 crashed: 239257, Exit status 98 (0x62)

MDIO ERROR: read error for file "input.vel", byte number 4: number of atoms (1625495040) != (39910) expected
ERROR: mdioload.cu, line 146: Unable to read binvelfile

mikaok
Message 5922 - Posted: 23 Jan 2009 | 19:22:54 UTC - in response to Message 5921.
Last modified: 23 Jan 2009 | 19:24:14 UTC

> > I had 2 with this error, but running a 6.61 OK now
> >
> > MDIO ERROR: read error for file "input.vel", byte number 4: number of atoms (1625495040) != (39910) expected
> > ERROR: mdioload.cu, line 146: Unable to read binvelfile
>
> Same error, first time a 6.61 crashed: 239257, Exit status 98 (0x62)
>
> MDIO ERROR: read error for file "input.vel", byte number 4: number of atoms (1625495040) != (39910) expected
> ERROR: mdioload.cu, line 146: Unable to read binvelfile


Same problem here: exit code 98 for the last 11 WUs in a row. Mostly those were SH2_US_4 units, but also at least one SH2_US_5. Hope I don't get a penalty for these :D
