Message boards : Graphics cards (GPUs) : Error units PAOLA

Author Message
Profile [AF>WildWildWest] Al Tarf
Joined: 22 Oct 10
Posts: 6
Credit: 10,043,483
RAC: 0
Level
Pro
Message 26214 - Posted: 7 Jul 2012 | 22:08:40 UTC

For the past three days, all my PAOLA units have ended in error. Where does the problem come from? Note that the IBUCH units compute without problems!
Sorry for my English, I am writing with Google Translate. Thank you.
My PC: Athlon II X3, GTX 550 Ti, Ubuntu.

Profile dskagcommunity
Joined: 28 Apr 11
Posts: 456
Credit: 817,865,789
RAC: 0
Level
Glu
Message 26216 - Posted: 7 Jul 2012 | 23:28:49 UTC

Ah, I'm not the only one with this problem ^^ Is it only with the cuda42 app, like for me?
____________
DSKAG Austria Research Team: http://www.research.dskag.at



Profile [AF>WildWildWest] Al Tarf
Joined: 22 Oct 10
Posts: 6
Credit: 10,043,483
RAC: 0
Level
Pro
Message 26218 - Posted: 8 Jul 2012 | 9:06:36 UTC - in response to Message 26216.
Last modified: 8 Jul 2012 | 9:08:42 UTC

Hi, yes, cuda42 PAOLA. My graphics card is compatible with CUDA 4.2, so I do not understand.

Profile Retvari Zoltan
Joined: 20 Jan 09
Posts: 2343
Credit: 16,201,255,749
RAC: 6,169
Level
Trp
Message 26219 - Posted: 8 Jul 2012 | 11:20:55 UTC - in response to Message 26218.
Last modified: 8 Jul 2012 | 11:23:56 UTC

I've checked the error messages of these 2HDQ_43_9-PAOLA_2HDQ workunits in their stderr output files, and I found that every one of them, on every host I checked, resulted in ERROR: Failed to parse input file. So I've come to the conclusion that the source of this error is not on your side. This batch of 2HDQ_43_9-PAOLA_2HDQ workunits is messed up somehow.

Profile [AF>WildWildWest] Al Tarf
Joined: 22 Oct 10
Posts: 6
Credit: 10,043,483
RAC: 0
Level
Pro
Message 26220 - Posted: 8 Jul 2012 | 11:25:49 UTC - in response to Message 26219.

Oh okay, thank you for your answer!

Profile [AF>WildWildWest] Al Tarf
Joined: 22 Oct 10
Posts: 6
Credit: 10,043,483
RAC: 0
Level
Pro
Message 26221 - Posted: 8 Jul 2012 | 13:47:24 UTC - in response to Message 26220.

It looks like things are sorting themselves out: I have a PAOLA unit that has been running without error for 27 minutes.

Profile Retvari Zoltan
Joined: 20 Jan 09
Posts: 2343
Credit: 16,201,255,749
RAC: 6,169
Level
Trp
Message 26222 - Posted: 8 Jul 2012 | 15:47:41 UTC - in response to Message 26221.

It looks like things are sorting themselves out: I have a PAOLA unit that has been running without error for 27 minutes.

This workunit comes from a different batch: 3EKO_8_10-PAOLA_3EKO_8LIGs

Profile skgiven
Volunteer moderator
Volunteer tester
Joined: 23 Apr 09
Posts: 3968
Credit: 1,995,359,260
RAC: 0
Level
His
Message 26228 - Posted: 8 Jul 2012 | 18:31:25 UTC - in response to Message 26222.

There are definitely problems with the PAOLA_2HDQ batch. These tasks fail on all cards no matter which app is used (3.1 or 4.2). Fortunately they all fail after a few seconds:

    Stderr output

    <core_client_version>7.0.24</core_client_version>
    <![CDATA[
    <message>
    process exited with code 98 (0x62, -158)
    </message>
    <stderr_txt>
    ERROR: file mdsim.cpp line 167: Failed to parse input file
    20:12:31 (5458): called boinc_finish

    </stderr_txt>
    ]]>
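
A minimal sketch of how the exit code and the ERROR line could be pulled out of a stderr dump like the one above, so that results from different hosts can be grouped by failure signature. The function name `summarize_stderr` and the regular expression are assumptions made for illustration; they are written against the text shown in this thread, not against any official BOINC format.

```python
import re

# Stderr text copied from the failed task shown above.
SAMPLE = """<core_client_version>7.0.24</core_client_version>
<![CDATA[
<message>
process exited with code 98 (0x62, -158)
</message>
<stderr_txt>
ERROR: file mdsim.cpp line 167: Failed to parse input file
20:12:31 (5458): called boinc_finish

</stderr_txt>
]]>"""


def summarize_stderr(text):
    """Return (exit_code, error_lines) for one stderr dump.

    exit_code is None when no 'exited with code N' / 'exit code N'
    phrase is found. Hypothetical helper, not a BOINC API.
    """
    match = re.search(r"exit(?:ed with)? code (-?\d+)", text)
    exit_code = int(match.group(1)) if match else None
    # Keep only the lines the application itself flagged as errors.
    errors = [line.strip() for line in text.splitlines()
              if line.strip().startswith("ERROR:")]
    return exit_code, errors


if __name__ == "__main__":
    print(summarize_stderr(SAMPLE))
```

If every host reports the same pair (same exit code, same ERROR line), that points at the workunit rather than at any one machine.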


http://www.gpugrid.net/workunit.php?wuid=3543365

name 2HDQ_32_2-PAOLA_2HDQ-0-100-RND5005
application ACEMD2: GPU molecular dynamics
created 4 Jul 2012 | 20:41:18 UTC
minimum quorum 1
initial replication 1
max # of error/total/success tasks 7, 10, 6
errors Too many errors (may have bug)

Task     Computer  Sent                       Time reported              Status                 Run time (sec)  CPU time (sec)  Credit  Application
5571497  103419    4 Jul 2012 | 21:52:29 UTC  4 Jul 2012 | 21:58:43 UTC  Error while computing  2.20            0.05            ---     ACEMD2: GPU molecular dynamics v6.16 (cuda31)
5572053  111371    4 Jul 2012 | 23:30:37 UTC  5 Jul 2012 | 1:35:58 UTC   Error while computing  2.20            0.12            ---     ACEMD2: GPU molecular dynamics v6.16 (cuda42)
5572983  58172     5 Jul 2012 | 3:42:30 UTC   5 Jul 2012 | 11:00:52 UTC  Error while computing  3.02            0.03            ---     ACEMD2: GPU molecular dynamics v6.16 (cuda42)
5575434  95154     5 Jul 2012 | 13:01:46 UTC  5 Jul 2012 | 13:08:38 UTC  Error while computing  2.55            0.12            ---     ACEMD2: GPU molecular dynamics v6.16 (cuda31)
5575986  122075    5 Jul 2012 | 14:20:47 UTC  5 Jul 2012 | 19:13:43 UTC  Error while computing  2.02            0.04            ---     ACEMD2: GPU molecular dynamics v6.16 (cuda42)
5577441  128121    5 Jul 2012 | 23:15:57 UTC  5 Jul 2012 | 23:22:37 UTC  Error while computing  2.30            0.11            ---     ACEMD2: GPU molecular dynamics v6.16 (cuda31)
5578198  113979    6 Jul 2012 | 1:41:12 UTC   6 Jul 2012 | 1:48:00 UTC   Error while computing  0.00            0.00            ---     ACEMD2: GPU molecular dynamics v6.16 (cuda42)
5578603  24523     6 Jul 2012 | 6:10:45 UTC   6 Jul 2012 | 11:21:31 UTC  Error while computing  43.44           0.25            ---     ACEMD2: GPU molecular dynamics v6.16 (cuda42)
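
To make the point about both apps failing concrete, the task rows above can be tallied by app version. This is only a sketch: `count_by_app` is a hypothetical helper, and the pattern simply looks for the `(cudaNN)` suffix seen at the end of these rows.

```python
import re
from collections import Counter

# Task rows copied from the workunit page quoted above.
ROWS = """\
5571497 103419 4 Jul 2012 | 21:52:29 UTC 4 Jul 2012 | 21:58:43 UTC Error while computing 2.20 0.05 --- ACEMD2: GPU molecular dynamics v6.16 (cuda31)
5572053 111371 4 Jul 2012 | 23:30:37 UTC 5 Jul 2012 | 1:35:58 UTC Error while computing 2.20 0.12 --- ACEMD2: GPU molecular dynamics v6.16 (cuda42)
5572983 58172 5 Jul 2012 | 3:42:30 UTC 5 Jul 2012 | 11:00:52 UTC Error while computing 3.02 0.03 --- ACEMD2: GPU molecular dynamics v6.16 (cuda42)
5575434 95154 5 Jul 2012 | 13:01:46 UTC 5 Jul 2012 | 13:08:38 UTC Error while computing 2.55 0.12 --- ACEMD2: GPU molecular dynamics v6.16 (cuda31)
5575986 122075 5 Jul 2012 | 14:20:47 UTC 5 Jul 2012 | 19:13:43 UTC Error while computing 2.02 0.04 --- ACEMD2: GPU molecular dynamics v6.16 (cuda42)
5577441 128121 5 Jul 2012 | 23:15:57 UTC 5 Jul 2012 | 23:22:37 UTC Error while computing 2.30 0.11 --- ACEMD2: GPU molecular dynamics v6.16 (cuda31)
5578198 113979 6 Jul 2012 | 1:41:12 UTC 6 Jul 2012 | 1:48:00 UTC Error while computing 0.00 0.00 --- ACEMD2: GPU molecular dynamics v6.16 (cuda42)
5578603 24523 6 Jul 2012 | 6:10:45 UTC 6 Jul 2012 | 11:21:31 UTC Error while computing 43.44 0.25 --- ACEMD2: GPU molecular dynamics v6.16 (cuda42)
"""


def count_by_app(rows):
    """Count failed tasks per app version, e.g. 'cuda31' or 'cuda42'."""
    counts = Counter()
    for line in rows.splitlines():
        # Only rows that report an error and end with a '(cudaNN)' tag count.
        match = re.search(r"\((cuda\d+)\)\s*$", line)
        if match and "Error while computing" in line:
            counts[match.group(1)] += 1
    return counts


if __name__ == "__main__":
    for app, n in sorted(count_by_app(ROWS).items()):
        print(app, n)
```

Failures on both the cuda31 and cuda42 apps are consistent with a bad input file rather than an app or driver problem.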


Thanks,
____________
FAQ's

HOW TO:
- Opt out of Beta Tests
- Ask for Help

Paola Bisignano
Joined: 9 May 12
Posts: 16
Credit: 8,100
RAC: 0
Level

Message 26245 - Posted: 9 Jul 2012 | 10:01:51 UTC - in response to Message 26228.

Hi dear volunteers,

There was a really stupid error in the input file :p I have fixed it, and I am now going to resubmit the system (500 WUs) on acemd short (12,500 credits per WU); the group is 2HDQbis.

thanks for your patience and your computing time :D

Paola

5pot
Joined: 8 Mar 12
Posts: 411
Credit: 2,083,882,218
RAC: 0
Level
Phe
Message 26269 - Posted: 10 Jul 2012 | 14:01:20 UTC

This is not a parse-file error, but this WU has also failed on 2 other rigs. One was with 295, so maybe it was their end, but the other was with a 470 on the 301.42 driver.

3EKO_36_9_step_13_20_7-PAOLA_ADAPT-10-20-RND9221


<core_client_version>7.0.25</core_client_version>
<![CDATA[
<message>
The system cannot find the path specified. (0x3) - exit code 3 (0x3)
</message>
<stderr_txt>
MDIO: cannot open file "restart.coor"
SWAN : FATAL : Cuda driver error 999 in file 'swanlibnv2.cpp' in line 1574.
Assertion failed: a, file swanlibnv2.cpp, line 59


Cheers

Profile dskagcommunity
Joined: 28 Apr 11
Posts: 456
Credit: 817,865,789
RAC: 0
Level
Glu
Message 26307 - Posted: 13 Jul 2012 | 5:38:00 UTC

There are still so many errors, and energy is being wasted on resent failed workunits :(



5pot
Joined: 8 Mar 12
Posts: 411
Credit: 2,083,882,218
RAC: 0
Level
Phe
Message 26308 - Posted: 13 Jul 2012 | 6:02:59 UTC

Gotta take the good with the bad. This is the only WU that has crashed for me.

Things happen.

Cheers.

Profile Raptures Riot
Joined: 30 Apr 11
Posts: 6
Credit: 220,588,795
RAC: 0
Level
Leu
Message 26316 - Posted: 14 Jul 2012 | 6:50:59 UTC - in response to Message 26308.

Agreed, you have to take the bad with the good. Just had a 1E2I fail after 7 hours. I get occasional driver crashes and permanently frozen displays with the 4.2's. All 4.2's seem 'touch and go' regardless of the author. Any further refinements would be much appreciated.

Profile skgiven
Volunteer moderator
Volunteer tester
Joined: 23 Apr 09
Posts: 3968
Credit: 1,995,359,260
RAC: 0
Level
His
Message 26317 - Posted: 14 Jul 2012 | 8:01:07 UTC - in response to Message 26316.
Last modified: 14 Jul 2012 | 8:10:15 UTC

Try this post or this post, among others, in the "FAQ - Why does my run fail? Some answers."

Profile [AF>WildWildWest] Al Tarf
Joined: 22 Oct 10
Posts: 6
Credit: 10,043,483
RAC: 0
Level
Pro
Message 26517 - Posted: 30 Jul 2012 | 16:13:56 UTC

I am getting errors with the new PAOLA units "1H46 RNP". Is there a known problem?

Speedy
Joined: 19 Aug 07
Posts: 42
Credit: 28,391,082
RAC: 0
Level
Val
Message 26557 - Posted: 8 Aug 2012 | 4:58:48 UTC

I had 1H46_19_9-PAOLA_1H46_RNP-6-100-RND5172_1 fail with

<core_client_version>6.10.60</core_client_version>
<![CDATA[
<message>
The system cannot find the path specified. (0x3) - exit code 3 (0x3)
</message>
<stderr_txt>
MDIO: cannot open file "restart.coor"
SWAN : FATAL : Cuda driver error 999 in file 'swanlibnv2.cpp' in line 1574.
Assertion failed: a, file swanlibnv2.cpp, line 59

This application has requested the Runtime to terminate it in an unusual way.
Please contact the application's support team for more information.

</stderr_txt>
]]>

Ran for 19.29 minutes

Profile [AF>EDLS]GuL
Joined: 7 Jan 09
Posts: 3
Credit: 160,687,223
RAC: 0
Level
Ile
Message 26616 - Posted: 16 Aug 2012 | 6:46:58 UTC

Hello,
Congratulations to the whole team on your work.

All current PAOLA units are exiting with error code 247; see 5747509 for instance.

Stderr output

<core_client_version>7.0.29</core_client_version>
<![CDATA[
<message>
process exited with code 247 (0xf7, -9)
</message>
<stderr_txt>

</stderr_txt>
]]>
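
The three numbers in the message line look like the same exit status written three ways: decimal, hexadecimal, and the decimal value minus 256. That reading is an observation from the two dumps in this thread (98 and 247), not a documented BOINC rule, and `exit_code_forms` is a hypothetical helper used only to check the arithmetic.

```python
def exit_code_forms(code):
    """Return (decimal, hex string, decimal - 256) for an exit status 0..255."""
    if not 0 <= code <= 255:
        raise ValueError("expected an 8-bit exit status")
    return code, hex(code), code - 256


if __name__ == "__main__":
    # The two exit codes quoted in this thread.
    for code in (98, 247):
        dec, hexa, shifted = exit_code_forms(code)
        print(f"process exited with code {dec} ({hexa}, {shifted})")
```

Both lines printed by this sketch match the message lines in the dumps, which suggests the third number carries no extra information beyond the exit code itself.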


All these units use cuda42. The only cuda31 unit I got was running fine until I made the mistake of using the GPU for something else at the same time.

My card is a GTX 260, on a freshly installed Fedora 17 system, with NVIDIA driver 304.32 and CUDA Toolkit 4.2.9. I followed the procedure at http://doc.fedora-fr.org/wiki/Cuda. The GTK toolkit is working fine, and so is PrimeGrid.

What are these errors due to?
Thank you for your help

Profile Stoneageman
Joined: 25 May 09
Posts: 224
Credit: 34,057,224,498
RAC: 190
Level
Trp
Message 26618 - Posted: 16 Aug 2012 | 8:05:19 UTC

200 series cards do not work with cuda 4.2 tasks under Linux!

Profile [AF>EDLS]GuL
Joined: 7 Jan 09
Posts: 3
Credit: 160,687,223
RAC: 0
Level
Ile
Message 26619 - Posted: 16 Aug 2012 | 9:41:37 UTC

OK, thanks for the answer. In that case, is there a way to get only cuda31 units?
Cheers

Profile skgiven
Volunteer moderator
Volunteer tester
Joined: 23 Apr 09
Posts: 3968
Credit: 1,995,359,260
RAC: 0
Level
His
Message 26623 - Posted: 16 Aug 2012 | 11:44:16 UTC - in response to Message 26619.

Use an older driver to avoid getting the CUDA 4.2 app:
Uninstall the present driver completely. Find a driver that predates CUDA 4.2 (or whatever the dll's actually are); something around 265 to 285 should probably be good. Install it, then reset the project, and you should only get the 3.1 app and thus 3.1 tasks.
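
A rough sketch of how the installed driver version could be checked against the 265 to 285 range suggested above. It assumes a Linux host where the NVIDIA kernel module reports its version in `/proc/driver/nvidia/version`; the sample line is illustrative (build date omitted), and `driver_version` and `in_suggested_range` are hypothetical helpers. The range is the rough guess from this post, not a rule taken from the project's scheduler.

```python
import re

# Illustrative line in the layout commonly seen in /proc/driver/nvidia/version.
SAMPLE = "NVRM version: NVIDIA UNIX x86_64 Kernel Module  304.32"


def driver_version(text):
    """Return (major, minor) parsed from the version text, or None."""
    match = re.search(r"Kernel Module\s+(\d+)\.(\d+)", text)
    if not match:
        return None
    return int(match.group(1)), int(match.group(2))


def in_suggested_range(version, low=265, high=285):
    """True if the major driver number falls in the range suggested above."""
    return version is not None and low <= version[0] <= high


if __name__ == "__main__":
    version = driver_version(SAMPLE)
    print(version, in_suggested_range(version))
```

On a real machine the text would come from reading `/proc/driver/nvidia/version` instead of `SAMPLE`; with the 304.32 driver mentioned earlier in the thread, the check reports that the driver is newer than the suggested range.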

Profile [AF>EDLS]GuL
Joined: 7 Jan 09
Posts: 3
Credit: 160,687,223
RAC: 0
Level
Ile
Message 26631 - Posted: 17 Aug 2012 | 10:58:11 UTC

Thanks for the advice, I will try this. Is it the same with all cuda42 applications, or is it specific to this project? I know that GTX 2XX cards only have compute capability 1.3, but I thought CUDA was backward compatible.

ExtraTerrestrial Apes
Volunteer moderator
Volunteer tester
Joined: 17 Aug 08
Posts: 2705
Credit: 1,311,122,549
RAC: 0
Level
Met
Message 26693 - Posted: 25 Aug 2012 | 11:08:01 UTC - in response to Message 26631.

It's only this project. In principle it's backwards compatible, but GPU-Grid is facing some bug in the new app, probably in some nVidia library, which only affects the older cards.

MrS
____________
Scanning for our furry friends since Jan 2002
