Message boards : Number crunching : To be or not to be...

Author Message
tomba
Joined: 21 Feb 09
Posts: 497
Credit: 700,690,702
RAC: 0
Level
Lys
Message 37405 - Posted: 25 Jul 2014 | 14:46:02 UTC

Here’s a funny one. Just had a storm pass through, which tripped the electric power to the house.

When I restarted BOINC, one GPUGrid WU gave a computation error. The other WU picked up where it was interrupted, six hours into processing, but the progress counter restarted from zero!



Should I let it continue, or will it fail?

Bedrich Hajek
Joined: 28 Mar 09
Posts: 467
Credit: 8,191,046,966
RAC: 10,543,257
Level
Tyr
Message 37406 - Posted: 25 Jul 2014 | 15:31:08 UTC - in response to Message 37405.
Last modified: 25 Jul 2014 | 15:31:54 UTC

Continue, it should finish ok. This happened to me a few times.

As for your failed unit, that has happened to me too. I wish these units were a little more robust.

tomba
Joined: 21 Feb 09
Posts: 497
Credit: 700,690,702
RAC: 0
Level
Lys
Message 37409 - Posted: 25 Jul 2014 | 16:25:34 UTC - in response to Message 37406.

Thanks for that, Bedrich. That WU continues...

Me too on WU robustness. I've had lots fail after an electricity interruption...

Bedrich Hajek
Joined: 28 Mar 09
Posts: 467
Credit: 8,191,046,966
RAC: 10,543,257
Level
Tyr
Message 38034 - Posted: 23 Sep 2014 | 23:01:18 UTC

Occasionally, when the computer is shut down abruptly and Windows doesn't shut down properly, otherwise perfectly good units, which would have completed successfully, crash on the next boot with the following error:


Stderr output
<core_client_version>7.2.42</core_client_version>
<![CDATA[
<message>
(unknown error) - exit code -52 (0xffffffcc)
</message>
<stderr_txt>
# GPU [GeForce GTX 690] Platform [Windows] Rev [3301M] VERSION [60]
# SWAN Device 1 :
# Name : GeForce GTX 690
# ECC : Disabled
# Global mem : 2048MB
# Capability : 3.0
# PCI ID : 0000:04:00.0
# Device clock : 1019MHz
# Memory clock : 3004MHz
# Memory width : 256bit
# Driver version : r337_00 : 33788
# GPU 0 : 57C
# GPU 1 : 65C
# GPU 2 : 66C
# GPU 3 : 66C
# GPU 0 : 59C
# GPU 1 : 67C
# GPU 3 : 68C
# GPU 0 : 60C
# GPU 2 : 67C
# GPU 0 : 61C
# GPU 0 : 62C
# GPU 0 : 63C
# GPU 0 : 64C
# GPU 1 : 68C
# GPU 3 : 69C
# GPU 0 : 65C
# GPU 2 : 68C
# GPU 3 : 70C
# GPU [GeForce GTX 690] Platform [Windows] Rev [3301M] VERSION [60]
# SWAN Device 0 :
# Name : GeForce GTX 690
# ECC : Disabled
# Global mem : 2048MB
# Capability : 3.0
# PCI ID : 0000:08:00.0
# Device clock : 1019MHz
# Memory clock : 3004MHz
# Memory width : 256bit
# Driver version : r337_00 : 33788
SWAN : FATAL Unable to load module .mshake_kernel.cu. (702)

</stderr_txt>
]]>


My request: can these units be made not to crash when this happens?

On my machines, this happens more often in Windows 7 than in XP.
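
A side note on the log above: the line "exit code -52 (0xffffffcc)" shows one value in two notations, not two different codes. Assuming the exit status is reported as a 32-bit integer, the hex form is just the two's-complement bit pattern of -52. The short Python sketch below checks that arithmetic; the helper names are made up for illustration and are not part of BOINC or the GPUGrid application.

```python
# Illustrative helpers (hypothetical names, not a BOINC API).
# Assumption: the exit status is a 32-bit two's-complement integer.

def to_unsigned32(code: int) -> int:
    """Return the 32-bit unsigned bit pattern of a signed exit code."""
    return code & 0xFFFFFFFF


def to_signed32(value: int) -> int:
    """Interpret a 32-bit unsigned bit pattern as a signed integer."""
    value &= 0xFFFFFFFF
    # If the top bit is set, the value is negative in two's complement.
    return value - 0x100000000 if value & 0x80000000 else value


def format_exit_code(code: int) -> str:
    """Render a code in the same 'signed (hex)' style as the log line."""
    return f"{to_signed32(code)} (0x{to_unsigned32(code):08x})"


if __name__ == "__main__":
    print(format_exit_code(-52))
    print(to_signed32(0xFFFFFFCC))
```

This only shows that the two numbers agree with each other. It does not say what the code means to the application.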


