Message boards : Graphics cards (GPUs) : 13 hour task, 10K award (Yippeee)

Author Message
Paul D. Buck
Joined: 9 Jun 08
Posts: 1050
Credit: 37,321,185
RAC: 0
Level: Val
Message 9210 - Posted: 2 May 2009 | 19:01:35 UTC

This morning I woke to a task that took 13 hours to run, with a time per step of 47 ms. I note that the task had a 53 MB output file and took about double the normal time to run. Credit claimed was 8K, granted 10K (sweet!).

The task, named 10-KASHIF_HIVPR_dim_ba3-2-100-RND7725, failed for another participant (see the work unit page). Judging by the card he/she had, it looks to me like it should have been able to run the task.

The stdio from my system had the following lines:

# Amber: readparm : Reading parm file parameters
# PARM file in AMBER 7 format
# Encounter 10-12 H-bond term
WARNING: parameters.cu, line 568: Found zero 10-12 H-bond term.
WARNING: parameters.cu, line 568: Found zero 10-12 H-bond term.
MDIO ERROR: cannot open file "restart.coor"

I have no idea what this means, or whether it is an error or an expected outcome. I am only noting it here so that those of us who might see this happen again will be forewarned, and hopefully GDF or my friend ignasi will admit that it is all his fault again ... :)

X1900AIW
Joined: 12 Sep 08
Posts: 74
Credit: 23,566,124
RAC: 0
Level: Pro
Message 9215 - Posted: 2 May 2009 | 19:10:50 UTC - in response to Message 9210.

Same warning text, but different credits (claimed 4352, granted 5440): 598868, name 35-IBUCH_HIVPR_mon_ba8-1-100-RND8412_0

# Amber: readparm : Reading parm file parameters
# PARM file in AMBER 7 format
# Encounter 10-12 H-bond term
WARNING: parameters.cu, line 568: Found zero 10-12 H-bond term.
WARNING: parameters.cu, line 568: Found zero 10-12 H-bond term.
MDIO ERROR: cannot open file "restart.coor"
# Time per step: 30.302 ms
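Both long tasks reported in this thread were granted about 25% more credit than they claimed (Paul's 8K to 10K, the 4352 to 5440 above, and X1900AIW's later KASHIF report of 8076 to 10096). A minimal Python sketch of that arithmetic, using only the claimed/granted pairs posted here; it does not establish whether the project applies a fixed bonus, which would be an assumption.

```python
# Claimed vs. granted credit pairs quoted in this thread.
reports = {
    "19-KASHIF_HIVPR_dim_ba2-6-100-RND7091_0": (8076, 10096),
    "35-IBUCH_HIVPR_mon_ba8-1-100-RND8412_0": (4352, 5440),
}


def granted_ratio(claimed: float, granted: float) -> float:
    """Return granted credit as a multiple of claimed credit."""
    return granted / claimed


for name, (claimed, granted) in reports.items():
    print(f"{name}: granted/claimed = {granted_ratio(claimed, granted):.4f}")
```

Both ratios come out at roughly 1.25.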

mike047
Joined: 21 Dec 08
Posts: 47
Credit: 7,330,049
RAC: 0
Level: Ser
Message 9217 - Posted: 2 May 2009 | 19:43:58 UTC

I have had several of these; they work out to about the same points per hour as the shorter ones.

The upload is over 50 MB, though. It completely shuts down my DSL until it finishes uploading... completely.
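A 50+ MB result on an asymmetric DSL line keeps the uplink saturated for minutes, and other traffic on the line (including the acknowledgements that downloads depend on) has to queue behind it. A rough Python sketch of the upload time; the 512 kbit/s uplink is a hypothetical figure, since the thread does not give the actual line speed, and protocol overhead is ignored.

```python
def upload_seconds(size_mb: float, uplink_kbit_s: float) -> float:
    """Seconds needed to push size_mb megabytes through an uplink of uplink_kbit_s.

    Assumes decimal units (1 MB = 8,000 kbit) and a fully saturated link
    with no protocol overhead.
    """
    size_kbit = size_mb * 8_000
    return size_kbit / uplink_kbit_s


# Hypothetical 512 kbit/s DSL uplink; the real line speed is not stated in the thread.
minutes = upload_seconds(53, 512) / 60
print(f"~{minutes:.1f} min of saturated uplink for a 53 MB result")
```

At that assumed speed, the 53 MB file from the first post would hold the uplink for roughly a quarter of an hour.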
____________
mike

Paul D. Buck
Joined: 9 Jun 08
Posts: 1050
Credit: 37,321,185
RAC: 0
Level: Val
Message 9223 - Posted: 2 May 2009 | 20:59:06 UTC - in response to Message 9217.

> I have had several of these; they work out to about the same points per hour as the shorter ones.
>
> The upload is over 50 MB, though. It completely shuts down my DSL until it finishes uploading... completely.

This is the first one I THINK I have had; I did not go back and check exhaustively.

It seems to mess up the DCF (duration correction factor) a little bit; not sure why ...
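One plausible reason a single long task disturbs the DCF: the BOINC client is generally understood to raise the factor immediately when a task overruns its estimate, but to lower it only gradually afterwards. The Python sketch below is a simplified, hypothetical model of that behaviour; the update rule and its 1.1 / 0.9 / 0.1 constants are assumptions for illustration, not taken from the BOINC source, and it further assumes the long task carried the same runtime estimate as a normal ~6.5 h one.

```python
def update_dcf(dcf: float, raw_ratio: float) -> float:
    """One DCF update after a task finishes (simplified, hypothetical model).

    raw_ratio is actual runtime divided by the uncorrected runtime estimate.
    The asymmetric rule (jump up at once, decay slowly) and its constants
    are assumptions, not the BOINC source.
    """
    if raw_ratio > 1.1 * dcf:
        # Task ran much longer than predicted: adopt the new ratio immediately.
        return raw_ratio
    # Otherwise move only 10% of the way toward the observed ratio.
    return 0.9 * dcf + 0.1 * raw_ratio


dcf = 1.0
dcf = update_dcf(dcf, 2.0)  # one long KASHIF task: ~13 h against a ~6.5 h estimate
history = [dcf]
for _ in range(5):  # five normal tasks that match their estimate again
    dcf = update_dcf(dcf, 1.0)
    history.append(dcf)
print([round(x, 3) for x in history])
```

Under these assumptions the factor after n normal tasks is 1 + 0.9^n, so new work keeps being over-estimated for a while after one long task.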

ExtraTerrestrial Apes
Volunteer moderator
Volunteer tester
Joined: 17 Aug 08
Posts: 2705
Credit: 1,311,122,549
RAC: 0
Level: Met
Message 9249 - Posted: 3 May 2009 | 12:02:15 UTC

See here.

Don't know about the warning, though; it appears to be quite common. I guess it's related to the new Amber force-field method.

MrS
____________
Scanning for our furry friends since Jan 2002

uBronan
Joined: 1 Feb 09
Posts: 139
Credit: 575,023
RAC: 0
Level: Gly
Message 9262 - Posted: 3 May 2009 | 19:45:29 UTC

OMG, I got a huge one... I thought only the big guns would get them, lol.
I wasn't paying attention to what GPUGRID is doing.
Guess I was wrong. I wonder if I can finish it in time: I got a KASHIF_HIVPR one which has done 12:30 of work and still shows more than 26 hours to go.
If I am right, you guys did this one in 13 hours.
But you guys do a normal unit in 3 hours, which takes me about 20. Should I abort it?

ExtraTerrestrial Apes
Volunteer moderator
Volunteer tester
Joined: 17 Aug 08
Posts: 2705
Credit: 1,311,122,549
RAC: 0
Level: Met
Message 9263 - Posted: 3 May 2009 | 19:56:00 UTC - in response to Message 9262.

It's not 3h, more like a good 5h for the fastest cards. How much of that WU have you done after 12.5h? If it's ~30%, as BOINC thinks, you could just as well spend the other 24h. If your machine runs more than 8h a day, you should be able to make it before the deadline.
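The reasoning above is a linear extrapolation from the fraction done. A Python sketch, assuming progress so far is representative of the rest of the task; the report deadline is not given in the thread, so compare the result with your own task's deadline.

```python
def remaining_hours(elapsed_h: float, fraction_done: float) -> float:
    """Linear extrapolation: assumes progress so far is representative of the rest."""
    if not 0 < fraction_done <= 1:
        raise ValueError("fraction_done must be in (0, 1]")
    return elapsed_h * (1 - fraction_done) / fraction_done


def days_needed(hours_left: float, hours_per_day: float) -> float:
    """Calendar days of crunching needed if the host runs hours_per_day."""
    return hours_left / hours_per_day


left = remaining_hours(12.5, 0.30)  # ~30% done after 12.5 h, as discussed above
for per_day in (8, 24):
    print(f"{left:.1f} h left -> {days_needed(left, per_day):.1f} days at {per_day} h/day")
```

With 30% done after 12.5 h this projects roughly 29 h more, somewhat above the 26 h BOINC displayed.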

MrS

uBronan
Joined: 1 Feb 09
Posts: 139
Credit: 575,023
RAC: 0
Level: Gly
Message 9265 - Posted: 3 May 2009 | 20:20:02 UTC
Last modified: 3 May 2009 | 20:22:53 UTC

27% done after 13 hours now, so I'll try to see if I can finish it.
My machine runs 24/7 as a server :D. I see all the others errored out on this unit.

uBronan
Joined: 1 Feb 09
Posts: 139
Credit: 575,023
RAC: 0
Level: Gly
Message 9269 - Posted: 3 May 2009 | 21:40:47 UTC - in response to Message 9265.

Well, I don't have to worry anymore: it suddenly errored out, like it did for all the others. 13 hours gone.

Paul D. Buck
Joined: 9 Jun 08
Posts: 1050
Credit: 37,321,185
RAC: 0
Level: Val
Message 9271 - Posted: 3 May 2009 | 22:01:14 UTC

Ugh...

On 5 of my 6 cores running tasks, a normal task takes about 6.5 hours. The shorter ones obviously take a little less time. The KASHIF models seem to take 13-some hours to run.

The only good news is that the pay is comparable, or maybe a little better, on an hourly basis.

It is a shame that you got one that errored out on you. So far, I have not seen one die.

I have to admit that I prefer tasks that run in under 6 hours each; in fact, I dearly love those that take 1 hour or less, because if they fail you don't lose much.

uBronan
Joined: 1 Feb 09
Posts: 139
Credit: 575,023
RAC: 0
Level: Gly
Message 9277 - Posted: 3 May 2009 | 22:41:56 UTC
Last modified: 3 May 2009 | 22:45:12 UTC

The new one I got errored out as well, but much sooner: Unit2, with exit code 98 (0x62).

ignasi
Joined: 10 Apr 08
Posts: 254
Credit: 16,836,000
RAC: 0
Level: Pro
Message 9290 - Posted: 4 May 2009 | 9:15:16 UTC - in response to Message 9210.

> I have no idea what this means, or whether it is an error or an expected outcome. I am only noting it here so that those of us who might see this happen again will be forewarned, and hopefully GDF or my friend ignasi will admit that it is all his fault again ... :)


It is nobody's fault.
A new feature has been implemented in the scientific application (ACEMD) that allows the use of a different force field, called Amber.
This warning message is just that: a warning. It doesn't affect the output.

Thanks for caring,
ignasi

Paul D. Buck
Joined: 9 Jun 08
Posts: 1050
Credit: 37,321,185
RAC: 0
Level: Val
Message 9305 - Posted: 4 May 2009 | 18:51:53 UTC - in response to Message 9290.

I was only teasing ... :)

I just saw something unusual and reported it like a good boy ...

And I could not resist the tease ... sorry ... I will be a good boy now.

Snow Crash
Joined: 4 Apr 09
Posts: 450
Credit: 539,316,349
RAC: 0
Level: Lys
Message 9313 - Posted: 4 May 2009 | 20:57:44 UTC - in response to Message 9305.
Last modified: 4 May 2009 | 20:58:19 UTC

If I could clock my GTX 295 higher (or if the tasks got slightly smaller), it would be great to have tasks complete in 6 or 12 hours, as that makes for a nice WU-per-day calculation.

The highest I have successfully clocked my shaders is 1554 at stock voltage (I have not tried higher yet), but through my own meddling with CPU overclocking, driver updates, BOINC updates, etc., I have caused many compute errors lately (I apologize to the whole project team for that) and have vowed not to change anything for a while ... it is just so tempting ... up the shader ... grrr ... leaving it at stock ... 1274 for now :-(

Steve

X1900AIW
Joined: 12 Sep 08
Posts: 74
Credit: 23,566,124
RAC: 0
Level: Pro
Message 9318 - Posted: 5 May 2009 | 4:29:51 UTC - in response to Message 9313.

> The highest I have successfully clocked my shaders is 1554 at stock voltage ... 1274 for now :-(

I thought there were discrete clock rates in steps of 54 MHz: 1242, 1296, 1350, 1404, 1458, 1512, 1566, 1620 and so on. Test it with RivaTuner.

Snow Crash
Joined: 4 Apr 09
Posts: 450
Credit: 539,316,349
RAC: 0
Level: Lys
Message 9338 - Posted: 5 May 2009 | 17:12:06 UTC - in response to Message 9318.

I have an EVGA card and am using their "Precision" utility, which is actually quite nice and very easy to use. When I click the "Reset All" button, the shader goes to 1274. I can either key in numbers manually (it will accept anything) or move a slider, which steps in increments of either 7 or 9 (I am at work and don't remember exactly what the step sizes are).

Are you saying that the driver will take the settings I enter and internally adjust them into the appropriate +54 step bucket? Do you know at what point it decides to go up or down? What I mean is, if I set it to 1554, is it being adjusted to 1512 or 1566? I have tested with OCCT in the past, but it strongly suggests testing with SLI on, which is not how we crunch, so I have always wondered how applicable that test is to my crunching configuration. My guess was that if I was OK in OCCT with SLI, I would be even more stable with SLI disabled (the assumption being that SLI needs some overhead to work). I will take a look at RivaTuner tonight.

Steve

ExtraTerrestrial Apes
Volunteer moderator
Volunteer tester
Joined: 17 Aug 08
Posts: 2705
Credit: 1,311,122,549
RAC: 0
Level: Met
Message 9349 - Posted: 5 May 2009 | 21:33:37 UTC - in response to Message 9338.

> Are you saying that the driver will take the settings I enter and internally adjust them into the appropriate +54 step bucket?

Exactly. The driver will accept any setting and will adjust the real clock to multiples of 54 MHz... as soon as you're not watching.

> Do you know at what point it decides to go up or down? What I mean is, if I set it to 1554, is it being adjusted to 1512 or 1566?

Tell us if you find out! I would have assumed that 1554 already gets rounded up to 1566, but if you were stable at 1554 and not at, e.g., 1570, the actual clock would seem to be 1512.
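If the driver really snaps the shader clock to multiples of 54 MHz (every value in X1900AIW's list is one, e.g. 1242 = 23 × 54), each requested setting falls between two buckets. Which one the driver picks is exactly the open question here, so this Python sketch lists both candidates instead of guessing a rounding rule; the 54 MHz step is taken from the thread, not from NVIDIA documentation.

```python
from typing import Tuple

STEP_MHZ = 54  # step size given in the thread; assumed to apply to this card


def clock_buckets(requested_mhz: int, step: int = STEP_MHZ) -> Tuple[int, int]:
    """Return the nearest multiples of `step` at or below and at or above a request."""
    lower = (requested_mhz // step) * step
    upper = lower if lower == requested_mhz else lower + step
    return lower, upper


for requested in (1274, 1554):
    low, high = clock_buckets(requested)
    print(f"{requested} MHz requested -> {low} or {high} MHz actual")
```

For 1554 MHz the candidates are 1512 and 1566 MHz, the two options Steve asked about.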


MrS

X1900AIW
Joined: 12 Sep 08
Posts: 74
Credit: 23,566,124
RAC: 0
Level: Pro
Message 9372 - Posted: 6 May 2009 | 10:48:32 UTC - in response to Message 9210.

My first big work unit took 12+ hours. It was the missing piece in the variety of my task list and the final proof that my overclocking settings are reliable. I found BIOS flashing (including fan adjustment) considerably more stable and convenient than software tuning. 628175

Name 19-KASHIF_HIVPR_dim_ba2-6-100-RND7091_0
(...)
CPU time 1133.734
(...)
# Device 0: "GeForce GTX 260"
# Clock rate: 1512000 kilohertz
# Total amount of global memory: 939196416 bytes
# Number of multiprocessors: 27
# Number of cores: 216
# Amber: readparm : Reading parm file parameters
# PARM file in AMBER 7 format
# Encounter 10-12 H-bond term
WARNING: parameters.cu, line 568: Found zero 10-12 H-bond term.
WARNING: parameters.cu, line 568: Found zero 10-12 H-bond term.
# Time per step: 43.619 ms
# Approximate elapsed time for entire WU: 43618.990 s
(...)
Claimed credit 8076
Granted credit 10096
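The last two log lines are consistent with the elapsed-time estimate being the time per step multiplied by a fixed step count. A small Python sketch of that arithmetic; the step count is inferred from these two figures, not stated anywhere in the log, and applying it to Paul's 47 ms task assumes both KASHIF tasks had the same number of steps.

```python
def implied_steps(elapsed_s: float, ms_per_step: float) -> float:
    """Number of MD steps implied by an elapsed-time estimate and a time per step."""
    return elapsed_s / (ms_per_step / 1000.0)


def runtime_hours(ms_per_step: float, steps: float) -> float:
    """Projected wall-clock hours for `steps` steps at `ms_per_step`."""
    return ms_per_step / 1000.0 * steps / 3600.0


steps = implied_steps(43618.990, 43.619)  # figures from the log above
print(f"implied steps: {steps:,.0f}")
print(f"at 47 ms/step: {runtime_hours(47.0, steps):.1f} h")
```

The implied count is essentially one million steps, and at 47 ms per step that projects to about 13 hours, matching the runtime reported in the first post.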
