Message boards : Number crunching : Strange computation error

5pot
Joined: 8 Mar 12
Posts: 411
Credit: 2,083,882,218
RAC: 0
Message 24470 - Posted: 18 Apr 2012 | 3:40:02 UTC
Last modified: 18 Apr 2012 | 3:52:59 UTC

I4R96-NATHAN_FAX4-16-100-RND2329_1 is the WU

<core_client_version>6.12.33</core_client_version>
<![CDATA[
<message>
process exited with code 193 (0xc1, -63)
</message>
<stderr_txt>
# Using device 0
# There is 1 device supporting CUDA
# Device 0: "GeForce GTX 570"
# Clock rate: 1.46 GHz
# Total amount of global memory: 1341718528 bytes
# Number of multiprocessors: 15
# Number of cores: 120
SIGABRT: abort called
Stack trace (13 frames):
../../projects/www.gpugrid.net/acemd.linux64.2352(boinc_catch_signal+0x4d)[0x482bed]
/lib/x86_64-linux-gnu/libc.so.6(+0x36420)[0x7ff0ec84c420]
/lib/x86_64-linux-gnu/libc.so.6(gsignal+0x35)[0x7ff0ec84c3a5]
/lib/x86_64-linux-gnu/libc.so.6(abort+0x17b)[0x7ff0ec84fb0b]
../../projects/www.gpugrid.net/acemd.linux64.2352[0x4935db]
../../projects/www.gpugrid.net/acemd.linux64.2352[0x434dd0]
../../projects/www.gpugrid.net/acemd.linux64.2352[0x4312d6]
../../projects/www.gpugrid.net/acemd.linux64.2352[0x4309e7]
../../projects/www.gpugrid.net/acemd.linux64.2352[0x414ef9]
../../projects/www.gpugrid.net/acemd.linux64.2352[0x407c9a]
../../projects/www.gpugrid.net/acemd.linux64.2352[0x40857e]
/lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xed)[0x7ff0ec83730d]
../../projects/www.gpugrid.net/acemd.linux64.2352[0x407a19]

One before me failed on it as well with this:

The extended attributes are inconsistent. (0xff) - exit code 255 (0xff)
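
For anyone reading the logs: the three numbers in "code 193 (0xc1, -63)" look like the same exit byte printed three ways (decimal, hex, and signed 8-bit). A minimal sketch of that conversion, assuming that reading of the message format is right; the helper name `describe_exit_code` is made up for illustration and is not part of BOINC:

```python
def describe_exit_code(code: int) -> str:
    """Format an exit status as decimal, hex, and signed 8-bit (assumed layout)."""
    byte = code & 0xFF                            # keep only the low 8 bits
    signed = byte - 256 if byte >= 128 else byte  # two's-complement view of the byte
    return f"{byte} (0x{byte:x}, {signed})"


print(describe_exit_code(193))  # 193 (0xc1, -63)
print(describe_exit_code(255))  # 255 (0xff, -1)
```

Under that assumption, 193 and -63 are the same value, and the 255 (0xff) from the earlier host is simply a different exit code, not another spelling of 193.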


Make that 2 in a row, on a machine that has never had a computation error:

I13R21-NATHAN_CB1_1-117-125-RND3121_0 is the WU

<core_client_version>6.12.33</core_client_version>
<![CDATA[
<message>
process exited with code 193 (0xc1, -63)
</message>
<stderr_txt>
# Using device 0
# There is 1 device supporting CUDA
# Device 0: "GeForce GTX 570"
# Clock rate: 1.46 GHz
# Total amount of global memory: 1341718528 bytes
# Number of multiprocessors: 15
# Number of cores: 120
SIGABRT: abort called
Stack trace (13 frames):
../../projects/www.gpugrid.net/acemd.linux64.2352(boinc_catch_signal+0x4d)[0x482bed]
/lib/x86_64-linux-gnu/libc.so.6(+0x36420)[0x7fbb9b623420]
/lib/x86_64-linux-gnu/libc.so.6(gsignal+0x35)[0x7fbb9b6233a5]
/lib/x86_64-linux-gnu/libc.so.6(abort+0x17b)[0x7fbb9b626b0b]
../../projects/www.gpugrid.net/acemd.linux64.2352[0x4935db]
../../projects/www.gpugrid.net/acemd.linux64.2352[0x434dd0]
../../projects/www.gpugrid.net/acemd.linux64.2352[0x4312d6]
../../projects/www.gpugrid.net/acemd.linux64.2352[0x4309e7]
../../projects/www.gpugrid.net/acemd.linux64.2352[0x414ef9]
../../projects/www.gpugrid.net/acemd.linux64.2352[0x407c9a]
../../projects/www.gpugrid.net/acemd.linux64.2352[0x40857e]
/lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xed)[0x7fbb9b60e30d]
../../projects/www.gpugrid.net/acemd.linux64.2352[0x407a19]

5pot
Message 24471 - Posted: 18 Apr 2012 | 3:54:32 UTC
Last modified: 18 Apr 2012 | 4:25:02 UTC

I2R58-NATHAN_FAX4-21-100-RND1777 and I2R42-NATHAN_FAX4-20-100-RND7945_0 failed as well.

SKGIVEN, if you happen to catch this: I saw you had a similar problem. Was it on your end?

I have suspended work on this project for this GPU until further notice, and am currently running Einstein to see whether the GPU fails there too. So far it has not.

Sorry for not posting this info earlier: I am on the 295.33 driver, and the update manager is telling me to switch to 295.40. Should I? Also, this card is currently not overclocked.

EDIT: One more thing. I am not a Linux user by nature, even though I love its speed on WUs for both GPUgrid and WCG. However, this is currently a dual-boot machine, and it would do me no harm to switch back to W7, as I already own it. If that is the simplest solution, then please do not hesitate to say so. I do not know if this is on me or GPUgrid, but I hate to suspend this project.

5pot
Message 24480 - Posted: 18 Apr 2012 | 14:40:31 UTC

I just uninstalled everything and switched back to Windows. Up and running again on GPUgrid, so whatever it was, I don't care anymore.

nate
Joined: 6 Jun 11
Posts: 124
Credit: 2,928,865
RAC: 0
Message 24483 - Posted: 19 Apr 2012 | 12:45:00 UTC

Glad you're up and running again. Does look like the same error.

With regard to this error, it may be related to some problem with BOINC and Ubuntu/Debian distros. We'll see if the update mentioned fixes it.

See here: http://setiathome.berkeley.edu/forum_thread.php?id=67670

5pot
Message 24484 - Posted: 19 Apr 2012 | 13:04:48 UTC

When searching through Google, I too saw that the error was popping up on SETI, with very few people responding about possible rig issues, fixes, etc. So as much as I love Linux, lately it has been giving me more and more headaches and wasting precious crunching time.

Windows it is, apparently (for me), but I'm glad it's running again here. Also glad I caught it when it was only 4 tasks in, and that it didn't happen before I went to bed. Wouldn't want a ton of errors, and all that wasted bandwidth too.

skgiven
Volunteer moderator
Volunteer tester
Joined: 23 Apr 09
Posts: 3968
Credit: 1,995,359,260
RAC: 0
Message 24490 - Posted: 19 Apr 2012 | 21:35:11 UTC - in response to Message 24484.

My rig's now behaving itself wrt Nate's tasks =)

As for why it wasn't, there are two possibilities:
1. Updates I installed messed with things, and subsequent updates fixed things - at which point I stopped updating.
2. Client issue, possibly triggered by having too many queued tasks or too high a cache of CPU tasks. This is more likely, as the other affected crunchers run projects similar to mine.
