Message boards : News : Old Noelia WUs

GDF (Project administrator, Project developer, Project scientist)
Message 29045 - Posted: 7 Mar 2013 | 13:59:37 UTC

We have checked the error statistics and they are too high to be normal, so we are going to abort them.

They work perfectly over here, so it's not clear what the problem is. We might need to run a few in beta to try to understand it.

gdf

nate
Message 29046 - Posted: 7 Mar 2013 | 14:21:55 UTC

I have put up a whole bunch of simulations on the long queue to replace the cancelled ones. It is a system I have been meaning to run more simulations of to get better statistics. They should pose no problems, but I'll be keeping an eye on them. As always, please let us know if it turns out otherwise.

IFRS
Message 29048 - Posted: 7 Mar 2013 | 14:52:44 UTC

Thank you guys for the very fast intervention. I have 13 of the new NATHAN units processed halfway so far, and they are all running just perfectly. I will let you know in about two hours whether they all finish successfully here.

algabe
Message 29049 - Posted: 7 Mar 2013 | 16:04:28 UTC - in response to Message 29048.
Last modified: 7 Mar 2013 | 16:21:08 UTC

I am very disappointed with this project lately. Yesterday two NOELIA units of 8 hours each ended with faulty execution; today another two NOELIA units of 10 hours each had to be aborted by the user. This is unacceptable and not very professional.
Now I am processing two NATHAN units. If these errors persist with them, then, much as I regret it, I will definitely stop processing for this project. I am paying a lot of money in electricity bills, with the crisis here, and all that effort would be useless.


Greetings.
____________

microchip
Message 29050 - Posted: 7 Mar 2013 | 16:09:24 UTC - in response to Message 29049.

I am very disappointed with this project lately. Yesterday two NOELIA units of 8 hours each ended with faulty execution; today another two NOELIA units of 10 hours each had to be aborted by the user. This is unacceptable and not very professional.
Now I am processing two NATHAN units. If these errors persist with them, then, much as I regret it, I will definitely stop processing for this project. I am paying a lot of money in electricity bills, with the crisis here, and all that effort would be useless.


Greetings.


I agree with you. Lots of WU problems recently, even on the short queue. I don't know if I'll continue to support this project if these problems keep going.
____________

Team Belgium
The Cyberpunk Movies Database

IFRS
Message 29052 - Posted: 7 Mar 2013 | 17:05:35 UTC

The new NATHAN units are processing just fine. A small 23.88 MB result and 70,800 credits; very good ones.

Richard Haselgrove
Message 29053 - Posted: 7 Mar 2013 | 17:24:34 UTC

Yes, I've just reported my first completed one - task 6588199 - on the same host, same settings, same session (no reboot) as the one which failed a Noelia this morning.

IFRS
Message 29054 - Posted: 7 Mar 2013 | 17:35:25 UTC

All of my first 13 units were processed without any issue. The second batch is processing. I'd say we are back in business; I'm glad.

Beyond
Message 29061 - Posted: 7 Mar 2013 | 21:43:36 UTC - in response to Message 29046.
Last modified: 7 Mar 2013 | 21:44:06 UTC

I have put up a whole bunch of simulations on the long queue to replace the cancelled ones. It is a system I have been meaning to run more simulations of to get better statistics. They should pose no problems, but I'll be keeping an eye on them. As always, please let us know if it turns out otherwise.

Thanks Nate, I want to mention that these NATHAN WUs are running great on my 4 GTX 460 768 MB cards too, so I think you've done some honing. Thanks again, appreciate it.

mhhall
Message 29065 - Posted: 7 Mar 2013 | 22:12:20 UTC

I've been seeing multiple cases where new WUs are not checkpointing (or showing progress in BOINC Manager). I have aborted a couple without improvement. Running on client v7.0.27 under Linux x86.

IFRS
Message 29066 - Posted: 7 Mar 2013 | 22:21:56 UTC

Yummy, there are some TONIs in the pipe as well... tasty WUs, gimme gimme... I'll crunch them all and gladly pay the energy bills when everything works like a charm.

Bikermatt
Message 29070 - Posted: 8 Mar 2013 | 2:33:50 UTC - in response to Message 29065.

I've been seeing multiple cases where new WUs are not checkpointing (or showing progress in BOINC Manager). I have aborted a couple without improvement. Running on client v7.0.27 under Linux x86.


Yes, I am still having problems on Linux as well. I thought it was just the NOELIAs, but the new NATHANs are doing it too. The tasks will lock up or remain at 0%, and the system has to be rebooted for the GPU to work again on any project. It might be the new app; my Linux systems had not needed a reboot in months before this.

Mumak
Message 29072 - Posted: 8 Mar 2013 | 7:25:17 UTC

I switched to short ones only and recently got a short NOELIA task which after 24 hours was stuck at 0%. Too bad I was away from the machine and realized it too late.
So the short ones are problematic as well.

TJ
Message 29077 - Posted: 8 Mar 2013 | 16:27:43 UTC

The short Noelias run fine on my 550 Ti. They take a little longer than the previous 4.2 ones (100-200 sec more), use a little more CPU, and award less credit: 8,700 now versus 10,500 previously.
____________
Greetings from TJ

microchip
Message 29079 - Posted: 8 Mar 2013 | 17:54:35 UTC - in response to Message 29077.
Last modified: 8 Mar 2013 | 17:55:13 UTC

The short Noelias run fine on my 550 Ti. They take a little longer than the previous 4.2 ones (100-200 sec more), use a little more CPU, and award less credit: 8,700 now versus 10,500 previously.


Yup, they crunch "fine" on my GTX 560. I noticed, though, that they often crash the NV driver and also show CUDA errors in the task details, but they complete fine here and I get valid results.
____________

Team Belgium
The Cyberpunk Movies Database

Mumak
Message 29080 - Posted: 8 Mar 2013 | 18:40:01 UTC
Last modified: 8 Mar 2013 | 18:42:29 UTC

They did run well for a few days, but this one:
http://www.gpugrid.net/result.php?resultid=6583089
was stuck, so I tried to relaunch it, and after that I had to abort it.
Other computers returned errors with this WU too.

Wiyosaya
Message 29088 - Posted: 9 Mar 2013 | 3:46:23 UTC - in response to Message 29079.
Last modified: 9 Mar 2013 | 3:48:36 UTC

The short Noelias run fine on my 550 Ti. They take a little longer than the previous 4.2 ones (100-200 sec more), use a little more CPU, and award less credit: 8,700 now versus 10,500 previously.


Yup, they crunch "fine" on my GTX 560. I noticed, though, that they often crash the NV driver and also show CUDA errors in the task details, but they complete fine here and I get valid results.

Long queue Noelia's run fine on my GTX 460 and my GTX 580 both running 310.70 driver.
____________

microchip
Message 29094 - Posted: 9 Mar 2013 | 13:52:28 UTC - in response to Message 29088.
Last modified: 9 Mar 2013 | 13:52:54 UTC

The short Noelias run fine on my 550 Ti. They take a little longer than the previous 4.2 ones (100-200 sec more), use a little more CPU, and award less credit: 8,700 now versus 10,500 previously.


Yup, they crunch "fine" on my GTX 560. I noticed, though, that they often crash the NV driver and also show CUDA errors in the task details, but they complete fine here and I get valid results.

Long queue Noelia's run fine on my GTX 460 and my GTX 580 both running 310.70 driver.


Well, I've disabled the long queue for the time being, as the last 2 long WUs I crunched errored out, so I'm crunching only short ones at the moment. I mostly get short NOELIAs and, so far, so good. I'm able to report valid results.
____________

Team Belgium
The Cyberpunk Movies Database

John C MacAlister
Message 29096 - Posted: 9 Mar 2013 | 15:06:24 UTC
Last modified: 9 Mar 2013 | 15:07:49 UTC

All short NOELIA runs here on my two GTX 650 Ti GPUs, too: no problems.

Bedrich Hajek
Message 29100 - Posted: 9 Mar 2013 | 21:45:02 UTC - in response to Message 29046.
Last modified: 9 Mar 2013 | 21:45:38 UTC

I have put up a whole bunch of simulations on the long queue to replace the cancelled ones. It is a system I have been meaning to run more simulations of to get better statistics. They should pose no problems, but I'll be keeping an eye on them. As always, please let us know if it turns out otherwise.



These ran smoothly until yesterday evening when one failed:

http://www.gpugrid.net/result.php?resultid=6595932

It was the same thing that was happening with the last bunch of TONI units.

Today, I had an adventure with this unit:

http://www.gpugrid.net/result.php?resultid=6600488

It finished successfully, but barely. When it was about 25% done, I got an error message saying that acemd.2865.exe had failed and the unit wasn't crunching, so I suspended it before it produced a computation error in BOINC Manager. The video card's speed and settings were reset to a slower state, so I rebooted the computer, resumed the unit, and it continued to crunch. At 90%+ completion the computer froze, so I had to unplug it and restart. It still finished successfully!
But the subsequent unit refused to start crunching, and the video card's speed and settings were again reset to a slower state. I had to suspend that unit and reboot. It is running okay right now, and hopefully it won't crash.

IFRS
Message 29108 - Posted: 10 Mar 2013 | 23:37:34 UTC

True. Not 100%, but doable.

IFRS
Message 29109 - Posted: 10 Mar 2013 | 23:38:23 UTC
Last modified: 10 Mar 2013 | 23:38:41 UTC

If I babysit the machines, I mean... I will be traveling in two days, and then I expect the worst.

Dylan
Message 29110 - Posted: 11 Mar 2013 | 0:29:01 UTC - in response to Message 29109.
Last modified: 11 Mar 2013 | 0:29:08 UTC

If you travel, I would recommend getting an app on a mobile device to bring with you that will allow you to remote into the computers. An example would be teamviewer, which is free.


https://play.google.com/store/apps/details?id=com.teamviewer.teamviewer.market.mobile&hl=en

IFRS
Message 29111 - Posted: 11 Mar 2013 | 1:06:02 UTC - in response to Message 29110.

If you travel, I would recommend getting an app on a mobile device to bring with you that will allow you to remote into the computers. An example would be teamviewer, which is free.


https://play.google.com/store/apps/details?id=com.teamviewer.teamviewer.market.mobile&hl=en

Exactly what I do on my tablet. The problem is, when the big rig starts to reboot, I can't access it. I hope that won't happen.

Mumak
Message 29113 - Posted: 11 Mar 2013 | 8:43:15 UTC

Now I'm getting more problems, even with short Noelia tasks. They get stuck, cause errors or crash the app, and a reboot is needed to start a new GPU task.
I have ordered a new GPU for GPUGrid, but I think I'll suspend this whole project (and switch to another one) until these problems are solved.

STE\/E
Message 29114 - Posted: 11 Mar 2013 | 9:08:08 UTC

Same here. I've got 5 boxes running the shorter ones, and I think all 5 have hung WUs right now, one at 37 hours...
____________
STE\/E

John C MacAlister
Message 29115 - Posted: 11 Mar 2013 | 10:43:09 UTC
Last modified: 11 Mar 2013 | 10:47:18 UTC

No problems with short NOELIA tasks. I have not attempted any long NOELIAs for about a week.

PC #1 AMD 1090T with Acer GTX 650 Ti
PC #2 AMD A10 5800K with Acer GTX 650 Ti
____________
John

Ken_g6
Message 29116 - Posted: 11 Mar 2013 | 16:55:18 UTC

Short Noelias were going fine, until I had to abort this one, which was restarting repeatedly with error:

SWAN : FATAL : Cuda driver error 702 in file 'swanlibnv2.cpp' in line 1841.

Bedrich Hajek
Message 29122 - Posted: 12 Mar 2013 | 11:24:18 UTC
Last modified: 12 Mar 2013 | 11:34:37 UTC

226 (0xffffffffffffff1e) ERR_TOO_MANY_EXITS error on the latest beta units. This is a new one!


After running flawlessly, I got a few units with this error, on the latest set of betas.

http://www.gpugrid.net/result.php?resultid=6611952

http://www.gpugrid.net/result.php?resultid=6610530

http://www.gpugrid.net/result.php?resultid=6610707

nate
Message 29123 - Posted: 12 Mar 2013 | 11:47:08 UTC

SWAN : FATAL : Cuda driver error 702 in file 'swanlibnv2.cpp' in line 1841.


It looks like most of the major errors are gone (severe error % is good), but this one does seem to be occurring more frequently than we would like. We'll see if we can find a cause.

cciechad
Message 29124 - Posted: 12 Mar 2013 | 12:28:12 UTC - in response to Message 29123.

dmesg output from the beta WUs:

[400033.132826] NVRM: Xid (0000:01:00): 8, Channel 00000001
[400049.637834] NVRM: Xid (0000:01:00): 8, Channel 00000001
[400054.854423] NVRM: Xid (0000:01:00): 31, Ch 00000001, engmask 00000101, intr 10000000
[400066.358868] NVRM: Xid (0000:01:00): 8, Channel 00000001
[400082.863901] NVRM: Xid (0000:01:00): 8, Channel 00000001
[400099.368878] NVRM: Xid (0000:01:00): 8, Channel 00000001
[400115.873938] NVRM: Xid (0000:01:00): 8, Channel 00000001
[400119.305177] NVRM: Xid (0000:01:00): 31, Ch 00000001, engmask 00000101, intr 10000000
[400133.382624] NVRM: Xid (0000:01:00): 8, Channel 00000001
[400136.664677] NVRM: Xid (0000:01:00): 31, Ch 00000001, engmask 00000101, intr 10000000
[400149.890962] NVRM: Xid (0000:01:00): 8, Channel 00000001
[400166.399277] NVRM: Xid (0000:01:00): 8, Channel 00000001
[400182.904290] NVRM: Xid (0000:01:00): 8, Channel 00000001
[400198.412211] NVRM: Xid (0000:01:00): 8, Channel 00000001
[400215.917612] NVRM: Xid (0000:01:00): 8, Channel 00000001
[400220.224939] NVRM: Xid (0000:01:00): 31, Ch 00000001, engmask 00000101, intr 10000000
[400244.929342] NVRM: Xid (0000:01:00): 8, Channel 00000001
[400260.437256] NVRM: Xid (0000:01:00): 8, Channel 00000001
[400276.942267] NVRM: Xid (0000:01:00): 8, Channel 00000001
[400293.450605] NVRM: Xid (0000:01:00): 8, Channel 00000001
[400308.955195] NVRM: Xid (0000:01:00): 8, Channel 00000001
[400325.463524] NVRM: Xid (0000:01:00): 8, Channel 00000001
[400341.968561] NVRM: Xid (0000:01:00): 8, Channel 00000001
[400358.476864] NVRM: Xid (0000:01:00): 8, Channel 00000001
[400369.667884] NVRM: Xid (0000:01:00): 13, 0001 00000000 000090c0 00001b0c 00000000 00000000
[400382.174156] NVRM: Xid (0000:01:00): 8, Channel 00000001
[400397.678751] NVRM: Xid (0000:01:00): 8, Channel 00000001
[400414.183758] NVRM: Xid (0000:01:00): 8, Channel 00000001
[400430.692078] NVRM: Xid (0000:01:00): 8, Channel 00000001
[400446.196682] NVRM: Xid (0000:01:00): 8, Channel 00000001
[400461.704604] NVRM: Xid (0000:01:00): 8, Channel 00000001
[400464.387651] NVRM: Xid (0000:01:00): 31, Ch 00000001, engmask 00000101, intr 10000000
[400484.212040] NVRM: Xid (0000:01:00): 8, Channel 00000001
[400500.218499] NVRM: Xid (0000:01:00): 8, Channel 00000001
[400516.723568] NVRM: Xid (0000:01:00): 8, Channel 00000001
[400533.231872] NVRM: Xid (0000:01:00): 8, Channel 00000001
[400535.747891] NVRM: Xid (0000:01:00): 31, Ch 00000001, engmask 00000101, intr 10000000
[400555.739274] NVRM: Xid (0000:01:00): 8, Channel 00000001
[401174.487665] NVRM: Xid (0000:01:00): 8, Channel 00000001
[401189.992293] NVRM: Xid (0000:01:00): 8, Channel 00000001

I suspect I will have to reboot to recover from these.
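For other Linux crunchers: one way to catch these faults early is to scan dmesg for new "NVRM: Xid" lines before the machine needs a reboot. A minimal sketch is below, assuming the dmesg command is readable by the user running it; the poll interval is an arbitrary choice.

# Minimal sketch: periodically scan dmesg for new "NVRM: Xid" lines,
# which mark GPU faults like the ones pasted above.
# Assumes plain "dmesg" is available and readable without root.
import subprocess
import time

seen = set()
while True:
    out = subprocess.run(["dmesg"], capture_output=True, text=True).stdout
    for line in out.splitlines():
        if "NVRM: Xid" in line and line not in seen:
            seen.add(line)
            print("GPU fault:", line)
    time.sleep(60)  # assumption: poll once a minute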

Bedrich Hajek
Message 29125 - Posted: 12 Mar 2013 | 12:30:39 UTC - in response to Message 29122.

226 (0xffffffffffffff1e) ERR_TOO_MANY_EXITS error on the latest beta units. This is a new one!


After running flawlessly, I got a few units with this error, on the latest set of betas.

http://www.gpugrid.net/result.php?resultid=6611952

http://www.gpugrid.net/result.php?resultid=6610530

http://www.gpugrid.net/result.php?resultid=6610707


Is it my imagination, or did you change the error message for these units?


cciechad
Message 29126 - Posted: 12 Mar 2013 | 12:38:32 UTC - in response to Message 29124.

I've verified that the beta WUs hang the GPU in some manner. rmmod-ing and modprobe-ing the nvidia module does not resolve it; the system must be rebooted to recover from whatever the WU is causing. This is on NVIDIA 313.26.

Jacob Klein
Message 29127 - Posted: 12 Mar 2013 | 13:28:05 UTC
Last modified: 12 Mar 2013 | 13:38:17 UTC

I wanted to chime in to say I just had 12 NOELIA tasks fail hard on the "ACEMD beta version v6.49 (cuda42)" app, using Windows 8 Pro x64, BOINC v7.0.55 x64 beta, nVidia 314.14 beta drivers, GTX 660 Ti (which usually works on GPUGRID) and GTX 460 (which usually works on World Community Grid)

The tasks resulted in "Driver stopped responding" errors, and Windows restarted the drivers to recover. But the failures also appear to have caused other GPUs (which were working on entirely different projects, like World Community Grid)... to also fail.

I know this is the beta app, but...
Devs, do you run some of these tasks before issuing them to us? If not, you should, because when the bugged tasks get to us, the failures waste many more resources than they would if you tested them locally first.

ie: Many unnecessary communications, errors with unrelated projects, time spent reporting avoidable bugs, etc.

Looking for more stability, even in the beta app,
Jacob

================================================
PS: The 12 that failed were:

063ppx43-NOELIA_063pp_equ-1-2-RND4865_1
SWAN : FATAL : Cuda driver error 999 in file 'swanlibnv2.cpp' in line 1574.
Assertion failed: a, file swanlibnv2.cpp, line 59

148px44-NOELIA_148p_equ-1-2-RND1140_2
SWAN : FATAL : Cuda driver error 999 in file 'swanlibnv2.cpp' in line 1574.
Assertion failed: a, file swanlibnv2.cpp, line 59

216px20-NOELIA_216p_equ-1-2-RND7557_1
SWAN : FATAL : Cuda driver error 1 in file 'swanlibnv2.cpp' in line 1330.
Assertion failed: a, file swanlibnv2.cpp, line 59

041px45-NOELIA_041p_equ-1-2-RND6478_1
SWAN : FATAL : Cuda driver error 999 in file 'swanlibnv2.cpp' in line 1574.
Assertion failed: a, file swanlibnv2.cpp, line 59

041px33-NOELIA_041p_equ-1-2-RND8614_2
SWAN : FATAL : Cuda driver error 1 in file 'swanlibnv2.cpp' in line 1330.
Assertion failed: a, file swanlibnv2.cpp, line 59

255px9-NOELIA_255p_equ-1-2-RND6395_1
SWAN : FATAL : Cuda driver error 999 in file 'swanlibnv2.cpp' in line 1574.
Assertion failed: a, file swanlibnv2.cpp, line 59

063ppx29-NOELIA_063pp_equ-1-2-RND2517_1
SWAN : FATAL : Cuda driver error 1 in file 'swanlibnv2.cpp' in line 1330.
Assertion failed: a, file swanlibnv2.cpp, line 59

148nx39-NOELIA_148n_equ-1-2-RND5760_1
SWAN : FATAL : Cuda driver error 1 in file 'swanlibnv2.cpp' in line 1330.
Assertion failed: a, file swanlibnv2.cpp, line 59

063ppx16-NOELIA_063pp_equ-1-2-RND8732_1
The system cannot find the path specified.
(0x3) - exit code 3 (0x3)

063ppx18-NOELIA_063pp_equ-1-2-RND6787_0
SWAN : FATAL : Cuda driver error 999 in file 'swanlibnv2.cpp' in line 1574.
Assertion failed: a, file swanlibnv2.cpp, line 59

109nx31-NOELIA_109n_equ-1-2-RND1501_0
SWAN : FATAL : Cuda driver error 999 in file 'swanlibnv2.cpp' in line 1574.
Assertion failed: a, file swanlibnv2.cpp, line 59

148nx37-NOELIA_148n_equ-1-2-RND2228_0
SWAN : FATAL : Cuda driver error 999 in file 'swanlibnv2.cpp' in line 1574.
Assertion failed: a, file swanlibnv2.cpp, line 59

ETQuestor
Message 29128 - Posted: 12 Mar 2013 | 14:53:25 UTC

These NOELIA acemdbeta WUs are all hanging for me. They get stuck at a "Current CPU Time" of between 1 and 5 seconds. I had to abort them.


http://www.gpugrid.net/result.php?resultid=6610160
http://www.gpugrid.net/result.php?resultid=6610894

http://www.gpugrid.net/show_host_detail.php?hostid=43352

TJ
Message 29135 - Posted: 12 Mar 2013 | 23:28:41 UTC
Last modified: 12 Mar 2013 | 23:34:32 UTC

On my system (Vista 32-bit, BOINC 6.10.58, nVidia 314.7), the latest Noelia beta errored out after more than 11 hours. It is this one:
http://www.gpugrid.net/workunit.php?wuid=4248935
____________
Greetings from TJ

flashawk
Message 29137 - Posted: 13 Mar 2013 | 0:45:34 UTC - in response to Message 29127.


I know this is the beta app, but...
Devs, do you run some of these tasks before issuing them to us? If not, you should, because when the bugged tasks get to us, the failures waste many more resources than they would if you tested them locally first.

ie: Many unnecessary communications, errors with unrelated projects, time spent reporting avoidable bugs, etc.

Looking for more stability, even in the beta app


They would need 10 to 15 computers (dual-booting or virtual PCs) with every operating system on them, plus all the different versions of BOINC everyone's running, not to mention the different video cards. They'll never be able to please everyone. I always suspend other jobs or clear them out if I know I'm going to beta test, but that's just me, not 20/20 hindsight. What I'm trying to say is that even if they did some limited testing, who's to say which OS they would choose? It certainly wouldn't be Windows 8, which is turning out to be a flop and a real disappointment for Microsoft and their vendors. I don't want to sound too harsh (if I do, I apologize), but that's what beta testing is all about, right?

TJ
Message 29138 - Posted: 13 Mar 2013 | 10:34:33 UTC - in response to Message 29137.


I know this is the beta app, but...
Devs, do you run some of these tasks before issuing them to us? If not, you should, because when the bugged tasks get to us, the failures waste many more resources than they would if you tested them locally first.

ie: Many unnecessary communications, errors with unrelated projects, time spent reporting avoidable bugs, etc.

Looking for more stability, even in the beta app


They would need 10 to 15 computers (dual-booting or virtual PCs) with every operating system on them, plus all the different versions of BOINC everyone's running, not to mention the different video cards. They'll never be able to please everyone. I always suspend other jobs or clear them out if I know I'm going to beta test, but that's just me, not 20/20 hindsight. What I'm trying to say is that even if they did some limited testing, who's to say which OS they would choose? It certainly wouldn't be Windows 8, which is turning out to be a flop and a real disappointment for Microsoft and their vendors. I don't want to sound too harsh (if I do, I apologize), but that's what beta testing is all about, right?


I agree with you flashawk. We crunchers need to do the testing with all the different set-ups and platforms. Win8 is a pain indeed.

____________
Greetings from TJ

nate
Message 29139 - Posted: 13 Mar 2013 | 10:49:48 UTC

Devs, do you run some of these tasks before issuing them to us? If not, you should, because when the bugged tasks get to us, the failures waste many more resources than they would if you tested them locally first.


We do test them locally, to the extent we can. Part of the issue is that running locally for us is not comparable to running on BOINC. We do have an in-house fake BOINC project, but even that isn't exactly comparable to sending work to you users. Additionally, we have very limited ability to test on Windows. In the future we will improve there, but we have limited resources right now.

What we are thinking is that this might be related to the Windows application. Has anyone who experiences these problems seen them on a linux box? Is it only Windows? The more we know, the more quickly we can improve. The last thing we want is to crash your machines. A failed WU is one thing. Locking up cruncher machines is much, much worse. Please let us know so we can fix it.

Fred Bayliss
Message 29140 - Posted: 13 Mar 2013 | 10:57:02 UTC - in response to Message 29139.

I'm running these on Win7 with a GTX 670 and often get a Windows message that the Nvidia driver stopped working.
Hope this helps.

TJ
Message 29141 - Posted: 13 Mar 2013 | 11:01:13 UTC

The previous bunch of Noelia betas did well on my WinVista 32-bit PC (driver 314.7, BOINC 6.10.58). The batch from the last few days errors out after hours, with a message that the acemd driver stopped and has recovered from an unexpected error. I am now trying Nathan's long runs on my GTX 550 Ti.
____________
Greetings from TJ

Oktan
Message 29142 - Posted: 13 Mar 2013 | 11:18:51 UTC - in response to Message 29139.

Hi there, I'm having problems on my Linux box; I haven't been able to run any work at all for 3-4 days.

Mvh/ Oktan

Jacob Klein
Message 29143 - Posted: 13 Mar 2013 | 11:22:00 UTC - in response to Message 29139.
Last modified: 13 Mar 2013 | 11:31:32 UTC

Thanks for the reply, Nate. I'm glad to hear that you guys are looking to improve the testability for Windows, even before issuing tasks on the Beta application to us Beta users.

Regarding your request for info, my previously mentioned NOELIA task failures are happening on Windows 8 Pro x64, using BOINC v7.0.55 x64 beta, running nVidia drivers 314.14 beta, using 2 video cards, GTX 660 Ti and GTX 460.

It appears to me that, when a GPUGrid task causes the nVidia driver to stop responding, Windows catches the error and restarts the driver (instead of BSOD), giving a Taskbar balloon to the effect of "The nVidia driver had a problem and has been restarted successfully." (I'm not sure of the exact text). When this happens, in addition to the GPUGrid task erroring out on my main video card, crunching on my other GPU (which is usually doing World Community Grid Help Conquer Cancer work) also results in its tasks erroring out.

I believe the next tasks that get processed after that driver recovery are successful, unless another NOELIA task on the beta app causes an additional driver crash and recovery.

If you have any more resources to test these tasks out more, locally, it would save us a huge headache. I understand I signed up for these beta tasks, and I understand that seeing these errors is part of the gig, and so... If you find a way to replicate the error locally, then I'd politely ask that you also remove the bugged tasks from the beta queue. If you cannot yet reproduce the problem locally, then we'll keep erroring them for you, as part of our obligation.

Not sure if this much info helps, but that's the behavior I'm seeing on my Windows 8 x64 PC, and if you need anything more, feel free to ask.

Kind regards,
Jacob Klein

Richard Haselgrove
Message 29144 - Posted: 13 Mar 2013 | 11:24:51 UTC - in response to Message 29139.

What we are thinking is that this might be related to the Windows application. Has anyone who experiences these problems seen them on a linux box? Is it only Windows? The more we know, the more quickly we can improve. The last thing we want is to crash your machines. A failed WU is one thing. Locking up cruncher machines is much, much worse. Please let us know so we can fix it.

I've just aborted one of your long run tasks which looked as if it was going bad - http://www.gpugrid.net/workunit.php?wuid=4246107 (replication _6 is always a bad sign).

The first cruncher to try it was running Linux.

Killer 69
Message 29146 - Posted: 13 Mar 2013 | 15:50:08 UTC

All NOELIA tasks at the moment freeze my Linux machine almost completely, to the point that I have to restart the computer. What's worse, I de-selected beta tasks, but after the reboot BOINC downloads more of those tasks from the ACEMDBETA queue and I'm back in the reboot cycle.

Richard Haselgrove
Message 29147 - Posted: 13 Mar 2013 | 16:03:24 UTC - in response to Message 29146.
Last modified: 13 Mar 2013 | 16:04:34 UTC

All NOELIA tasks at the moment freeze my Linux machine almost completely, to the point that I have to restart the computer. What's worse, I de-selected beta tasks, but after the reboot BOINC downloads more of those tasks from the ACEMDBETA queue and I'm back in the reboot cycle.

Deselect

Run test applications?
This helps us develop applications, but may cause jobs to fail on your computer

as well.

Killer 69
Message 29150 - Posted: 13 Mar 2013 | 17:20:32 UTC - in response to Message 29147.

OK, I still had test applications selected. After deselecting that and resetting the project, I have now received a NATHAN long-run task, which is also pretty odd, because I have only short runs enabled at the moment.

Richard Haselgrove
Message 29151 - Posted: 13 Mar 2013 | 17:29:49 UTC - in response to Message 29150.

OK, I still had test applications selected. After deselecting that and resetting the project, I have now received a NATHAN long-run task, which is also pretty odd, because I have only short runs enabled at the moment.

There aren't any short run tasks available today. Might you have had

If no work for selected applications is available, accept work from other applications?

selected as well?

skgiven (Volunteer moderator, Project tester)
Message 29154 - Posted: 13 Mar 2013 | 18:14:17 UTC - in response to Message 29151.

109nx33-NOELIA_109n_equ-1-2-RND6949_0 4248581 139265 12 Mar 2013 | 8:04:33 UTC 13 Mar 2013 | 14:48:02 UTC Error while computing 58,742.39 1.73 --- ACEMD beta version v6.49 (cuda42)

This long WU hung after 16 h on a W7 system with a GTX 660 Ti. The GPU sat at zero usage and the app stayed running after it crashed, preventing new work units from starting or a backup GPU project from running. It also prevented an additional CPU core from being used by a CPU project. I saw the usual CUDA driver pop-up error.

Stderr output

<core_client_version>7.0.44</core_client_version>
<![CDATA[
<message>
The system cannot find the path specified. (0x3) - exit code 3 (0x3)
</message>
<stderr_txt>
MDIO: cannot open file "output.restart.coor"
SWAN : FATAL : Cuda driver error 999 in file 'swanlibnv2.cpp' in line 1574.
Assertion failed: a, file swanlibnv2.cpp, line 59

This application has requested the Runtime to terminate it in an unusual way.
Please contact the application's support team for more information.
MDIO: cannot open file "output.restart.coor"
SWAN : FATAL : Cuda driver error 999 in file 'swanlibnv2.cpp' in line 1574.
Assertion failed: a, file swanlibnv2.cpp, line 59

This application has requested the Runtime to terminate it in an unusual way.
Please contact the application's support team for more information.

</stderr_txt>
]]>

The next two WU's also failed:
148px38-NOELIA_148p_equ-1-2-RND3814_6 4249317 139265 13 Mar 2013 | 17:29:07 UTC 13 Mar 2013 | 17:32:46 UTC Error while computing 31.09 1.76 --- ACEMD beta version v6.49 (cuda42)
216px36-NOELIA_216p_equ-1-2-RND0721_0 4249016 139265 12 Mar 2013 | 9:48:07 UTC 13 Mar 2013 | 14:48:02 UTC Error while computing 12.60 1.81 --- ACEMD beta version v6.49 (cuda42)

I don't see the point in testing a WU 7 or more times, especially if it's one of a batch of hundreds.

Again, I suggest you start up an alpha project to test on properly - Beta testing shouldn't crash systems, hang drivers or banjax the OS!
____________
FAQ's

HOW TO:
- Opt out of Beta Tests
- Ask for Help

Martin Aliger
Message 29156 - Posted: 13 Mar 2013 | 20:07:36 UTC
Last modified: 13 Mar 2013 | 20:14:10 UTC

All of these betas failed on my machine. Moreover, I opted out of beta and updated, but I am still receiving them (and only them).

I also observed that my W7 always restarts the driver a few seconds after the acemd application is killed.

And all those WUs failed immediately. If you see non-trivial run times in the statistics, that's because there is a crash message box on screen which counts toward the running time. Sometimes it's on screen for hours...

Dylan
Message 29158 - Posted: 13 Mar 2013 | 20:18:33 UTC - in response to Message 29156.

Martin, did you follow this thread on how to completely opt out of beta tasks?


http://www.gpugrid.net/forum_thread.php?id=3272

Jim1348
Message 29159 - Posted: 13 Mar 2013 | 20:20:05 UTC
Last modified: 13 Mar 2013 | 20:22:23 UTC

Seven in a row of the 6.49 ACEMD beta NOELIAs failed for me also, all in 8 seconds or less, so I am giving it a rest for now. That was on a Kepler GTX 650 Ti card, and I will try a Fermi GTX 560 tomorrow to see if that does any better. This is on Win7 64-bit, and BOINC 7.0.56 x64. Those cards have been basically error free for the last several days, since the last Noelia errors.

Tsukiouji
Message 29164 - Posted: 14 Mar 2013 | 11:08:10 UTC

A few NOELIA WUs failed recently on my system too.
I'm running GTS450 (314.14, Win7 x64).

Jacob Klein
Message 29165 - Posted: 14 Mar 2013 | 12:39:22 UTC - in response to Message 29164.

Tsukiouji,
When I clicked your link, I got a page that says "No access". For your account settings, in GPUGRID preferences, do you have "Should GPUGRID show your computers on its web site?" set to yes?

nenym
Message 29166 - Posted: 14 Mar 2013 | 14:49:35 UTC - in response to Message 29165.

The problem is with the link. A filter such as "http://www.gpugrid.net/results.php?userid=94436" can be set, but results filtered by userid can be seen only by the owner. There is no problem with "host" filters, e.g. http://www.gpugrid.net/results.php?hostid=144019.

Jacob Klein
Message 29170 - Posted: 14 Mar 2013 | 15:54:42 UTC - in response to Message 29166.

Thanks for the explanation -- I was able to find the user's tasks by clicking on their name, and looking at the tasks for the only computer. Link: http://www.gpugrid.net/results.php?hostid=144019

Anyway... The way I look at this issue is...

The project admins have already decided whether they want the beta testers to suffer through and "process all these failures".

So what I do is look at the server status page here, http://www.gpugrid.net/server_status.php ... and just keep hoping that the "Unsent" task count for the "ACEMD beta version" app goes down quickly.

Good news: it's pretty much exhausted. Now maybe my system will be stable again!
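If anyone wants to watch that count without reloading the page by hand, here is a rough sketch. It just fetches the server status page and prints the rows that mention "beta"; the page layout is an assumption, so treat it as an illustration only.

# Rough sketch: fetch the GPUGRID server status page and print any lines
# that mention the beta app. No assumptions about exact HTML structure;
# tags are stripped crudely and a plain-text keyword search is used.
import re
import urllib.request

URL = "http://www.gpugrid.net/server_status.php"

html = urllib.request.urlopen(URL, timeout=30).read().decode("utf-8", "replace")
text = re.sub(r"<[^>]+>", " ", html)  # crude tag strip
for line in text.splitlines():
    if "beta" in line.lower():
        print(" ".join(line.split()))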

Martin Aliger
Message 29174 - Posted: 15 Mar 2013 | 4:14:14 UTC - in response to Message 29158.

Martin, did you follow this thread on how to completely opt out of beta tasks?


http://www.gpugrid.net/forum_thread.php?id=3272


No, but I'm in all other queues, so there is (plenty of) other work.

But the problem is solved now. Admins cancelled existing beta tasks and no others are waiting. I'll opt in to beta again to help test on Win platform.

AdamYusko
Message 29185 - Posted: 16 Mar 2013 | 23:53:19 UTC

Now I am not sure whether this is an error with my machine (it has been offline for a few weeks) or whether it is due to a bug in the Noelia tasks it got earlier today.

Both of the errored tasks it reported had issues with the file "restart.coor".

One had this output:

<message>
process exited with code 255 (0xff, -1)
</message>
<stderr_txt>
MDIO: cannot open file "restart.coor"
SWAN : FATAL : Cuda driver error 702 in file 'swanlibnv2.cpp' in line 1841.
MDIO: cannot open file "restart.coor"
SWAN : FATAL : Cuda driver error 702 in file 'swanlibnv2.cpp' in line 1841.
MDIO: cannot open file "restart.coor"
SWAN : FATAL : Cuda driver error 702 in file 'swanlibnv2.cpp' in line 1841.
MDIO: cannot open file "restart.coor"
SWAN : FATAL : Cuda driver error 702 in file 'swanlibnv2.cpp' in line 1841.
MDIO: cannot open file "restart.coor"
SWAN : FATAL : Cuda driver error 702 in file 'swanlibnv2.cpp' in line 1841.
MDIO: cannot open file "restart.coor"
SWAN : FATAL : Cuda driver error 702 in file 'swanlibnv2.cpp' in line 1841.
MDIO: cannot open file "restart.coor"
SWAN : FATAL : Cuda driver error 702 in file 'swanlibnv2.cpp' in line 1841.
MDIO: cannot open file "restart.coor"


the other had a much shorter but similar output of:

<message>
process exited with code 255 (0xff, -1)
</message>
<stderr_txt>
MDIO: cannot open file "restart.coor"

____________

IFRS
Message 29189 - Posted: 17 Mar 2013 | 12:36:08 UTC
Last modified: 17 Mar 2013 | 12:36:36 UTC

I'm still having the BSOD/reboot thing on my triple-690 rig every two days, even with the NATHAN long units. Aborting the full cache and fetching clean units solves it, but then in two days another one comes along.

idimitro
Message 29281 - Posted: 29 Mar 2013 | 13:01:30 UTC

After working for a few days, the Nathan packages are also crashing the application and my driver.
I like the cause of this project, but I simply cannot allow it to crash my computer and interrupt my work.
Hasta la vista.

IFRS
Message 29287 - Posted: 30 Mar 2013 | 15:37:56 UTC - in response to Message 29189.
Last modified: 30 Mar 2013 | 15:39:08 UTC

I'm still having the BSOD/reboot thing on my triple-690 rig every two days, even with the NATHAN long units. Aborting the full cache and fetching clean units solves it, but then in two days another one comes along.


On my end, I'm suspicious that one of the 690s is not that strong. Taking the overclock off it seems to improve the machine's stability. This issue is probably the machine's fault, because none of my other machines does it. Plus, no one else seems to have the same BSOD problem with the current units, so the problem is here. I just want to share it, because it is not the project's fault.
BTW, I would like to hear more news from the results front, so I can proudly share it with my family and friends and maybe find some more volunteers for the cause.

Typo edited*

Jorge Alberto Ramos Olive...
Message 29303 - Posted: 31 Mar 2013 | 23:40:02 UTC - in response to Message 29287.

I'm still having the BSOD/reboot thing on my triple-690 rig every two days, even with the NATHAN long units. Aborting the full cache and fetching clean units solves it, but then in two days another one comes along.


On my end, I'm suspicious that one of the 690s is not that strong. Taking the overclock off it seems to improve the machine's stability. This issue is probably the machine's fault, because none of my other machines does it. Plus, no one else seems to have the same BSOD problem with the current units, so the problem is here. I just want to share it, because it is not the project's fault.
BTW, I would like to hear more news from the results front, so I can proudly share it with my family and friends and maybe find some more volunteers for the cause.

Typo edited*


BSODs Strike Back!

I don't have my 690s OC'ed, and my system crashed today with NATHAN units, e.g. http://www.gpugrid.net/workunit.php?wuid=4313870 (I deactivated the project before error reports from this unit could be assembled, as the system BSODs before BOINC notices it).

I had been working through them for a month or so without a BSOD, after experiencing the same crash reports seen elsewhere around here (e.g. http://www.gpugrid.net/forum_thread.php?id=3308&nowrap=true#29090).

I will be crunching my backup project until this is fixed.

Bikermatt
Message 29330 - Posted: 6 Apr 2013 | 3:18:06 UTC

I just noticed I have two Noelia WUs on my Linux boxes for the first time in a few weeks. They were both stuck at 0%, and the boxes had to be rebooted to get the GPU running again.

flashawk
Message 29335 - Posted: 6 Apr 2013 | 7:36:26 UTC - in response to Message 29330.

I just noticed I have two Noelia WUs on my Linux boxes for the first time in a few weeks. They were both stuck at 0%, and the boxes had to be rebooted to get the GPU running again.


Exact same thing here, on Windows XP Pro 64-bit. I had 3 NOELIAs come through. I caught one at 0% after 5 1/2 hours of crunching on a GTX 680: the GPU was at 99%, but the memory controller was at 0%, along with the CPU usage for that GPU. The other 2 caused a 2685 error, and one NOELIA hosed a CPDN work unit that I had over 250 hours on. I am not signed up for beta testing; these came through the regular server (I also did a TONI without issue).

It's interesting that they slipped them through like this; it makes me feel like they don't trust us.
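A stall like that (memory controller stuck at 0%) can be caught automatically by polling nvidia-smi. Below is a minimal sketch, assuming a driver new enough to support nvidia-smi's --query-gpu flags; the poll interval and threshold are arbitrary assumptions, not project guidance.

# Minimal sketch: warn when a GPU's memory-controller utilization stays
# at 0% for several consecutive polls, which matches the stall patterns
# described above (GPU pegged or idle, memory controller at 0%).
import subprocess
import time

POLL_SECONDS = 300
STALL_POLLS = 3  # about 15 minutes at 0% before warning
zero_counts = {}

while True:
    out = subprocess.run(
        ["nvidia-smi", "--query-gpu=index,utilization.gpu,utilization.memory",
         "--format=csv,noheader,nounits"],
        capture_output=True, text=True).stdout
    for line in out.splitlines():
        idx, gpu_util, mem_util = [f.strip() for f in line.split(",")]
        zero_counts[idx] = zero_counts.get(idx, 0) + 1 if mem_util == "0" else 0
        if zero_counts[idx] >= STALL_POLLS:
            print("GPU %s: utilization %s%%, memory controller 0%% for %d polls; "
                  "the running task may be stuck." % (idx, gpu_util, STALL_POLLS))
    time.sleep(POLL_SECONDS)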

ExtraTerrestrial Apes (Volunteer moderator)
Message 29338 - Posted: 6 Apr 2013 | 8:34:30 UTC - in response to Message 29335.

Interesting that they slipped them through like this, makes me feel like they don't trust us.

No, the way I understand it is that Noelia is testing new functionality, which had been added in the recent app update but wasn't used in previous WUs (except the infamous Noelias).

To me it looks like there's more alpha and beta testing needed here. And serious debugging.

MrS
____________
Scanning for our furry friends since Jan 2002

Trotador
Message 29339 - Posted: 6 Apr 2013 | 8:45:23 UTC

Same here. This morning the machine (Ubuntu 64, 2x GTX 660 Ti) was hung. I rebooted to find a Noelia stuck at 0%, waited to see if it would progress... no way... a couple more reboots before I finally aborted it and got back to normal.

Weekends are not the best time for new trials, imho.

flashawk
Message 29340 - Posted: 6 Apr 2013 | 9:02:35 UTC
Last modified: 6 Apr 2013 | 9:04:36 UTC

Well, I guess you're getting information through the moderators' lounge. I seriously didn't see any post about those work units coming through, or I would have been on the lookout.

I guess I got a little complacent doing the NATHANs for the last month. I just can't wrap my mind around the fact that she (NOELIA) always has problems with her work units, and it's tough for anyone to figure out why.

skgiven (Volunteer moderator, Project tester)
Message 29341 - Posted: 6 Apr 2013 | 9:02:42 UTC - in response to Message 29339.
Last modified: 6 Apr 2013 | 9:07:21 UTC

On 30th March I had a short task sit for 18 h before I spotted it doing nothing: 47x2-NOELIA_TRYP_0-2-3-RND8854_6 (6.52 app). Since then I've had three Nathan tasks fail and one Noelia, 148nx9xBIS-NOELIA_148n-1-2-RND8819_1 (all 6.18 apps).

It bugs me too when tasks fail after 6h, run indefinitely or crash systems.

'moderators lounge' - ha!
____________
FAQ's

HOW TO:
- Opt out of Beta Tests
- Ask for Help

nate
Message 29360 - Posted: 6 Apr 2013 | 22:14:52 UTC

Nothing has changed with the NATHAN tasks. They have been running for weeks with historically low error rates, so they really shouldn't be a problem, as far as I can imagine. I know almost nothing at this point about the new NOELIA WUs, but I have suspended them for now considering the complaints.

flashawk
Message 29362 - Posted: 6 Apr 2013 | 23:36:25 UTC - in response to Message 29360.

Nothing has changed with the NATHAN tasks. They have been running for weeks with historically low error rates, so they really shouldn't be a problem, as far as I can imagine. I know almost nothing at this point about the new NOELIA WUs, but I have suspended them for now considering the complaints.


Ya buddy, you've got the touch. Maybe you can work your magic on rebuilding the NOELIAs; you seem to have the "Right Stuff". I admit I have no idea what goes into writing these WUs. Noelia must be doing something fundamentally different from the rest of the scientists at GPUGRID. I'm hoping she'll get it right soon and this will all have been worth it.

Mumak
Message 29363 - Posted: 7 Apr 2013 | 5:36:58 UTC

Please, NO MORE NEW LONG NOELIA tasks until they are really tested.
All tasks had been running well for me for a few weeks, but yesterday I got a new long Noelia and had the same result again: a hang.

skgiven (Volunteer moderator, Project tester)
Message 29364 - Posted: 7 Apr 2013 | 6:28:57 UTC - in response to Message 29360.
Last modified: 18 Apr 2013 | 12:25:17 UTC

There have been some really odd errors in the last couple of months, e.g. I11R10-NATHAN_dhfr36_3-26-32-RND2505_7:
Stderr output

<core_client_version>7.0.44</core_client_version>
<![CDATA[
<message>
- exit code 98 (0x62)
</message>
<stderr_txt>
MDIO: unexpected end-of-file for file "input.coor": reached end-of-file before reading 39350 coordinates
ERROR: file mdioload.cpp line 80: Unable to read bincoordfile

called boinc_finish

</stderr_txt>
]]>


I would like plenty of Noelia's NOELIA_Klebe_Equ WUs.
____________
FAQ's

HOW TO:
- Opt out of Beta Tests
- Ask for Help

Jacob Klein
Message 29395 - Posted: 9 Apr 2013 | 3:39:59 UTC - in response to Message 29360.

Nothing has changed with the NATHAN tasks. They have been running for weeks with historically low error rates, so they really shouldn't be a problem, as far as I can imagine. I know almost nothing at this point about the new NOELIA WUs, but I have suspended them for now considering the complaints.


Thank you Nate for suspending them. I really hope you guys can figure out the problems in your staging environment, before even sending them through the beta app. If there's anything I can do to help (like some sort of pre-Beta test, if possible), you can PM me. I really enjoy testing, especially when I know it might fail, but I expect the production apps to be near-error-free.

Regards,
Jacob

flashawk
Message 29406 - Posted: 11 Apr 2013 | 9:42:31 UTC

I just got another NOELIA long WU, and it gave me an error message after 30 seconds of run time; I had to reboot to get the GPU working again.

skgiven (Volunteer moderator, Project tester)
Message 29407 - Posted: 11 Apr 2013 | 11:28:36 UTC - in response to Message 29406.

Had a NOELIA beta fail this morning, 291px1x1BIS-NOELIA_291p_beta-1-2-RND9212

____________
FAQ's

HOW TO:
- Opt out of Beta Tests
- Ask for Help

The King's Own
Message 29409 - Posted: 11 Apr 2013 | 13:05:45 UTC

063ppx1xBIS-NOELIA_063pp_beta-0-2-RND4224_2
This WU has run for 8 hr 20 min, with another 8 hr 05 min projected.
That seems excessive on a GTX 580.
____________

Simba123
Message 29411 - Posted: 11 Apr 2013 | 13:17:21 UTC - in response to Message 29111.

If you travel, I would recommend getting an app on a mobile device to bring with you that will allow you to remote into the computers. An example would be teamviewer, which is free.


https://play.google.com/store/apps/details?id=com.teamviewer.teamviewer.market.mobile&hl=en

Exactly what I do on my tablet. The problem is, when the big rig starts to reboot, I can't access it. I hope that won't happen.



You can set TeamViewer to start with Windows and auto-login, so if the computer at home is set up this way and it reboots, you will still have access to it.

The King's Own
Message 29412 - Posted: 11 Apr 2013 | 15:57:06 UTC

Further to http://www.gpugrid.net/forum_thread.php?id=3318&nowrap=true#29409


063ppx1xBIS-NOELIA_063pp_beta-0-2-RND4224_2 crashed after 10+ hours Locking up whole system and requiring reboot.

The following error from tasks:


<core_client_version>7.0.31</core_client_version>
<![CDATA[
<message>
The system cannot find the path specified. (0x3) - exit code 3 (0x3)
</message>
<stderr_txt>
MDIO: cannot open file "restart.coor"
SWAN : FATAL : Cuda driver error 702 in file 'swanlibnv2.cpp' in line 1574.
Assertion failed: a, file swanlibnv2.cpp, line 59

This application has requested the Runtime to terminate it in an unusual way.
Please contact the application's support team for more information.

</stderr_txt>
]]>

____________

Richard Haselgrove
Send message
Joined: 11 Jul 09
Posts: 791
Credit: 1,427,941,620
RAC: 1,315,908
Level
Met
Scientific publications
watwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwat
Message 29413 - Posted: 11 Apr 2013 | 16:02:06 UTC

I aborted 063px1x1BIS-NOELIA_063p_beta-1-2-RND8034_1 after it had given the "acemd.2865P.exe has encountered a problem ..." popup error three times in succession.

flashawk
Send message
Joined: 18 Jun 12
Posts: 241
Credit: 1,694,533,797
RAC: 643,557
Level
His
Scientific publications
watwatwatwatwatwatwatwat
Message 29418 - Posted: 11 Apr 2013 | 19:20:51 UTC

I guess I should have clarified: the NOELIA that crashed on me came through the regular server. Richard, I always get the 2865P error; I thought it was a Windows XP thing.

flashawk
Send message
Joined: 18 Jun 12
Posts: 241
Credit: 1,694,533,797
RAC: 643,557
Level
His
Scientific publications
watwatwatwatwatwatwatwat
Message 29421 - Posted: 12 Apr 2013 | 2:48:56 UTC

I had another NOELIA sneak through on the non-beta long-run server. I didn't get the error message this time; it ran for 59 minutes and remained at 0%. The CPU usage was at 0%, the GPU usage was at 0% and the memory controller was at 0%, so I aborted it and had to reboot my computer to get my GTX680 working again.

Windows XP Pro x64

2x EVGA GTX680 2GB

Running CPDN on the other 6 cores.

Profile Stoneageman
Avatar
Send message
Joined: 25 May 09
Posts: 211
Credit: 12,279,345,996
RAC: 8,209,337
Level
Trp
Scientific publications
watwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwat
Message 29422 - Posted: 12 Apr 2013 | 8:41:04 UTC
Last modified: 12 Apr 2013 | 8:44:52 UTC

Also, I just noticed on one of my Linux machines that a NOELIA beta task must have been sent using the non-beta long-run server; it had stalled 24 hrs ago.

TJ
Send message
Joined: 26 Jun 09
Posts: 815
Credit: 1,469,215,105
RAC: 0
Level
Met
Scientific publications
watwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwat
Message 29562 - Posted: 25 Apr 2013 | 13:30:42 UTC

I got one Noelia on a Vista Ultimate x86 system with a GTX550Ti. It took 93,686.14 seconds to complete, but it finished and earned almost 95,000 credits.
So not all Noelia WUs error out!
____________
Greetings from TJ

Profile Mumak
Avatar
Send message
Joined: 7 Dec 12
Posts: 92
Credit: 225,897,225
RAC: 0
Level
Leu
Scientific publications
watwatwatwatwatwatwatwat
Message 29830 - Posted: 11 May 2013 | 19:40:15 UTC

Just had two Noelia tasks failed:
http://www.gpugrid.net/result.php?resultid=6852307
http://www.gpugrid.net/result.php?resultid=6849844

Others running well.

Jim1348
Send message
Joined: 28 Jul 12
Posts: 463
Credit: 1,131,150,255
RAC: 42,983
Level
Met
Scientific publications
watwatwatwatwatwatwatwatwat
Message 29898 - Posted: 13 May 2013 | 13:43:18 UTC - in response to Message 29830.
Last modified: 13 May 2013 | 14:00:32 UTC

Just had two Noelia tasks failed:
http://www.gpugrid.net/result.php?resultid=6852307
http://www.gpugrid.net/result.php?resultid=6849844

They both completed successfully on other machines after you posted, but I don't see any rhyme or reason for it. The machines that failed all did so quickly (in a few seconds). But they have a variety of GPU cards and operating systems, and I doubt they were all overclocked so much that they failed right away (though that is a possibility that should be checked), and they wouldn't have time to get too hot either.

I noticed though that my GTX 650 Ti would sometimes fail after only a few seconds, which I haven't yet seen on my GTX 660s (except those bad work units that everyone failed on). That suggests to me that some work units just won't run on some types of cards. I know that on Folding, it was found out a couple of years ago that some of the more complex work units would fail on cards with only 96 shaders, but would run fine with 192 shaders or more. I don't see that pattern here yet, but something else might become apparent.

GoodFodder
Send message
Joined: 4 Oct 12
Posts: 53
Credit: 333,467,496
RAC: 0
Level
Asp
Scientific publications
watwatwatwatwatwatwatwat
Message 29900 - Posted: 13 May 2013 | 14:56:06 UTC

Jim1348: I've not had any problems with Noelia's on either my 650's or my 670. I am running XP SP3 with the beta 320 drivers, which have been completely stable for me. Actually, I even noticed a small performance improvement over the 314's. Might be worth a try on one of your problematic machines?

Jim1348
Send message
Joined: 28 Jul 12
Posts: 463
Credit: 1,131,150,255
RAC: 42,983
Level
Met
Scientific publications
watwatwatwatwatwatwatwatwat
Message 29901 - Posted: 13 May 2013 | 15:06:27 UTC - in response to Message 29900.
Last modified: 13 May 2013 | 15:15:26 UTC

Jim1348: I've not had any problems with Noelia's on either my 650's or my 670. I am running XP SP3 with the beta 320 drivers, which have been completely stable for me. Actually, I even noticed a small performance improvement over the 314's. Might be worth a try on one of your problematic machines?

Not problematic; only an occasional failure at the outset on the GTX 650 Ti. But it was a factory-overclocked card, and I have now reduced the clock (and increased the core voltage) to the point where I don't think it gets even the occasional failure anymore.

But many of the cards are factory-overclocked now. As far as errors are concerned, that is the same as if you had used software to overclock the card; it is Nvidia's chip specs that determine the default clock rate. If the work units fail quickly, it is not much of a problem and you will gain points overall with the faster clocks.

The real problem comes when they fail after a couple of hours; then you should get out MSI Afterburner and start reducing the clocks, or check the cooling. You will be points ahead in the end. Also, the work units change in difficulty: what starts out as a stable card can easily start failing later when (not if) the harder ones come along. So I just don't overclock, which saves a lot of troubleshooting later.

wdiz
Send message
Joined: 4 Nov 08
Posts: 20
Credit: 871,871,594
RAC: 2
Level
Glu
Scientific publications
watwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwat
Message 29902 - Posted: 13 May 2013 | 16:52:28 UTC - in response to Message 29421.

I had another NOELIA sneak through on the non-beta long-run server. I didn't get the error message this time; it ran for 59 minutes and remained at 0%. The CPU usage was at 0%, the GPU usage was at 0% and the memory controller was at 0%, so I aborted it and had to reboot my computer to get my GTX680 working again.

Windows XP Pro x64

2x EVGA GTX680 2GB

Running CPDN on the other 6 cores.


Same problem here.
Noelia task is running for 10 hours and only 3% done !
No CPU load, no GPU load


Linux Arch - GTX680 - Driver 319.17

Profile Mumak
Avatar
Send message
Joined: 7 Dec 12
Posts: 92
Credit: 225,897,225
RAC: 0
Level
Leu
Scientific publications
watwatwatwatwatwatwatwat
Message 29990 - Posted: 16 May 2013 | 6:01:07 UTC

Another Noelia failed after 8:45 hours.
I think I'll take a break with GPUGrid...

GoodFodder
Send message
Joined: 4 Oct 12
Posts: 53
Credit: 333,467,496
RAC: 0
Level
Asp
Scientific publications
watwatwatwatwatwatwatwat
Message 30019 - Posted: 16 May 2013 | 17:55:18 UTC

Mumak: reading all the negative comments, yet I have not had any problems with Noelia's. I see you have two machines: one with a 650Ti which appears stable, the other with a 660Ti which is causing you to lose your hair.
Starting to see a pattern, and wondering if Nvidia's boost is causing stability issues here?
To state the obvious, as a test I'd suggest decreasing your 660Ti's power target to 72% and the clock to Nvidia's default (928, 1500) and seeing how it goes.

Profile Mumak
Avatar
Send message
Joined: 7 Dec 12
Posts: 92
Credit: 225,897,225
RAC: 0
Level
Leu
Scientific publications
watwatwatwatwatwatwatwat
Message 30023 - Posted: 16 May 2013 | 18:33:46 UTC

I had no issues with other tasks, but Noelia's failed on the 650Ti in the past too... It's not all of them that fail - I've currently got another one, so we'll see how that goes...

Profile dskagcommunity
Avatar
Send message
Joined: 28 Apr 11
Posts: 456
Credit: 810,073,458
RAC: 0
Level
Glu
Scientific publications
watwatwatwatwatwatwatwatwatwatwatwatwat
Message 30028 - Posted: 16 May 2013 | 19:22:27 UTC
Last modified: 16 May 2013 | 19:23:30 UTC

I only want to ask how many of you have tried raising the GPU voltage by about 25 mV, as described in some forum threads. It is still needed on some cards on GPUGrid with some types of work units. Perhaps it helps some of you.
____________
DSKAG Austria Research Team: http://www.research.dskag.at



Crunching for my deceased Dog who had "good" Braincancer..

Profile Stephen Yeathermon
Avatar
Send message
Joined: 29 Apr 09
Posts: 9
Credit: 338,904,942
RAC: 0
Level
Asp
Scientific publications
watwatwatwatwatwatwatwatwatwat
Message 30052 - Posted: 17 May 2013 | 15:24:14 UTC

Just wanted to add that I too had a Noelia WU that ran for almost 7 hrs and was only 5% complete on GTX660Ti. I had to abort it => 291px6x2-NOELIA_klebe_run2-0-3-RND9489 http://www.gpugrid.net/workunit.php?wuid=4459890

I'm not here to complain, problems are to be expected (I left Seti project after a month of solid outages/problems), and this to me is just minor, business as usual. I just wanted to post so that maybe it can get corrected. Initially I was concerned since it was the very first task I ran on a new Linux build, but I'm currently about 40% done with I2HDQ_35R5-SDOERR_2HDQd-0-4-RND7274 @ 3hrs 45mins, so everything is looking normal.

Steve
____________

Profile Stephen Yeathermon
Avatar
Send message
Joined: 29 Apr 09
Posts: 9
Credit: 338,904,942
RAC: 0
Level
Asp
Scientific publications
watwatwatwatwatwatwatwatwatwat
Message 30063 - Posted: 18 May 2013 | 12:55:22 UTC

I had another one which, if I had let it run, would have taken 100 hours to complete. 041px44x4-NOELIA_klebe_run2-1-3-RND4186 ran ~6hrs for ~6%, so I aborted again. I haven't seen this happening on any of my other machines. They are all Win7; this system is Linux, so maybe there's something wrong specific to the Linux platform? I'll have to investigate further when I get a chance, too many other things going on right now. http://www.gpugrid.net/workunit.php?wuid=4464500

Steve

Profile skgiven
Volunteer moderator
Project tester
Volunteer tester
Avatar
Send message
Joined: 23 Apr 09
Posts: 3968
Credit: 1,837,071,099
RAC: 365,113
Level
His
Scientific publications
watwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwat
Message 30064 - Posted: 18 May 2013 | 13:50:25 UTC - in response to Message 30063.

This might just be an issue with these specific WU's; perhaps they don't run well on Linux.
There are some downclocking and CPU usage possibilities that might cause this:

If you are not using coolbits to increase the fan speed and the GPU is getting too hot, it would downclock.

The GPU might get downclocked by the OS if the GPU usage isn't perceived as being high enough.

If the CPU's clocks drop (to 1400 MHz) it might starve the GPU enough to cause the GPU's clocks to drop.

The problem with Linux is finding out what's going on. I would really like to be able to fill out the Useful Tools area for Linux...
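For anyone who wants to keep an eye on this under Linux, here is a minimal monitoring sketch in Python. It assumes a driver recent enough that nvidia-smi supports --query-gpu, and the 600 MHz threshold is only an illustrative guess, not a project value - adjust it to your card's normal 3D clocks.

#!/usr/bin/env python
# Minimal sketch: poll nvidia-smi and flag a GPU that looks downclocked.
# The field names below are standard nvidia-smi query fields.
import subprocess
import time

QUERY = ["nvidia-smi",
         "--query-gpu=index,clocks.sm,temperature.gpu,utilization.gpu",
         "--format=csv,noheader,nounits"]

def sample():
    out = subprocess.check_output(QUERY).decode()
    for line in out.strip().splitlines():
        idx, sm_clock, temp, util = [x.strip() for x in line.split(",")]
        print("GPU %s: %s MHz, %s C, %s%% load" % (idx, sm_clock, temp, util))
        if int(sm_clock) < 600:  # illustrative threshold, not a project value
            print("  -> GPU %s may be downclocked" % idx)

if __name__ == "__main__":
    while True:
        sample()
        time.sleep(60)

If the shader clock sits at an idle P-state value while a WU is supposedly running, that points at one of the causes above.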
____________
FAQ's

HOW TO:
- Opt out of Beta Tests
- Ask for Help

ExtraTerrestrial Apes
Volunteer moderator
Volunteer tester
Avatar
Send message
Joined: 17 Aug 08
Posts: 2671
Credit: 753,908,224
RAC: 504,143
Level
Glu
Scientific publications
watwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwat
Message 30067 - Posted: 18 May 2013 | 16:21:49 UTC

Has anyone had success in running the current Noelias under Linux? So far I've read a few posts saying it wouldn't work at all.

MrS
____________
Scanning for our furry friends since Jan 2002

Profile skgiven
Volunteer moderator
Project tester
Volunteer tester
Avatar
Send message
Joined: 23 Apr 09
Posts: 3968
Credit: 1,837,071,099
RAC: 365,113
Level
His
Scientific publications
watwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwat
Message 30068 - Posted: 18 May 2013 | 16:25:38 UTC - in response to Message 30067.
Last modified: 18 May 2013 | 16:35:21 UTC

People tend to complain when things aren't working, rather than when things are working.

There are some successful (and normal) Linux runs for NOELIA_klebe WU's out there:

005px12x2-NOELIA_klebe_run-1-3-RND5943_1 4441647 16 May 2013 | 12:36:44 UTC 16 May 2013 | 23:06:31 UTC Completed and validated 36,427.18 15,679.52 127,800.00 Long runs (8-12 hours on fastest card) v6.18 (cuda42)
http://www.gpugrid.net/result.php?resultid=6878594

255px50x1-NOELIA_klebe_run2-0-3-RND5892_0 4458045 16 May 2013 | 17:37:23 UTC 17 May 2013 | 9:19:35 UTC Completed and validated 36,790.19 18,420.04 127,800.00 Long runs (8-12 hours on fastest card) v6.18 (cuda42)
http://www.gpugrid.net/workunit.php?wuid=4458045

306px36x4-NOELIA_klebe_run-2-3-RND3942_0 4447878 14 May 2013 | 2:20:57 UTC 14 May 2013 | 12:38:57 UTC Completed and validated 36,208.52 15,933.88 127,800.00 Long runs (8-12 hours on fastest card) v6.18 (cuda42)
http://www.gpugrid.net/workunit.php?wuid=4447878

All 3 on the same rig.


291px19x1-NOELIA_klebe_run2-0-3-RND1187_1 4459422 18 May 2013 | 1:41:50 UTC 18 May 2013 | 15:27:00 UTC Completed and validated 41,384.82 2,370.90 127,800.00 Long runs (8-12 hours on fastest card) v6.18 (cuda42)
http://www.gpugrid.net/workunit.php?wuid=4459422

148nx1x4-NOELIA_klebe_run2-0-3-RND6125_1 4455890 16 May 2013 | 21:26:28 UTC 17 May 2013 | 13:28:33 UTC Completed and validated 40,988.13 17,533.30 127,800.00 Long runs (8-12 hours on fastest card) v6.18 (cuda42)
http://www.gpugrid.net/workunit.php?wuid=4455890

290px25x1-NOELIA_klebe_run-2-3-RND9425_1 4444345 17 May 2013 | 16:11:58 UTC 18 May 2013 | 6:09:31 UTC Completed and validated 38,213.60 3,327.34 127,800.00 Long runs (8-12 hours on fastest card) v6.18 (cuda42)
http://www.gpugrid.net/workunit.php?wuid=4444345

3 more different Linux rigs, and different WU's, and enough for me.
____________
FAQ's

HOW TO:
- Opt out of Beta Tests
- Ask for Help

TJ
Send message
Joined: 26 Jun 09
Posts: 815
Credit: 1,469,215,105
RAC: 0
Level
Met
Scientific publications
watwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwat
Message 30077 - Posted: 18 May 2013 | 22:23:14 UTC - in response to Message 30068.

People tend to complain when things aren't working, rather than when things are working.

Sad but true, almost over the entire planet.

____________
Greetings from TJ

Profile Beyond
Avatar
Send message
Joined: 23 Nov 08
Posts: 1073
Credit: 4,498,672,554
RAC: 418,871
Level
Arg
Scientific publications
watwatwatwatwatwatwatwatwatwatwatwatwatwatwatwat
Message 30080 - Posted: 19 May 2013 | 4:39:08 UTC - in response to Message 30077.

People tend to complain when things aren't working, rather than when things are working.

Sad but true, almost over the entire planet.

Why would people complain when things are working?

Trotador
Send message
Joined: 25 Mar 12
Posts: 83
Credit: 1,071,383,799
RAC: 149,994
Level
Met
Scientific publications
watwatwatwatwatwatwatwatwatwat
Message 30082 - Posted: 19 May 2013 | 6:40:28 UTC
Last modified: 19 May 2013 | 6:41:50 UTC

I have successfully finished over 25 new NOELIAS in Linux on my PC with two GTX 660Ti's, but this morning I found one at 0% after six hours of processing. I stopped it and restarted it but still saw no progress, then rebooted and started boincmanager, but the machine quickly became unusable and I had to restart again and abort the unit. These messages were in the log:

Sun 19 May 2013 08:15:45 AM CEST GPUGRID Task 216px32x1-NOELIA_klebe_run2-0-3-RND9100_4: no shared memory segment
Sun 19 May 2013 08:15:45 AM CEST GPUGRID Task 216px32x1-NOELIA_klebe_run2-0-3-RND9100_4 exited with zero status but no 'finished' file

The first one I hadn't noticed before. The second one is common for suspended and restarted units, but before I aborted the unit it appeared many times, so it seems like the unit was restarting itself again and again.

I've seen that all other wingmen have computation errors.

regards

Profile skgiven
Volunteer moderator
Project tester
Volunteer tester
Avatar
Send message
Joined: 23 Apr 09
Posts: 3968
Credit: 1,837,071,099
RAC: 365,113
Level
His
Scientific publications
watwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwat
Message 30083 - Posted: 19 May 2013 | 9:55:17 UTC - in response to Message 30082.
Last modified: 19 May 2013 | 9:58:45 UTC

Trotador, I would suggest you abort it, if you haven't already.

Beyond, I have no idea why 51% of people do anything they do - I don't even ask anymore.
____________
FAQ's

HOW TO:
- Opt out of Beta Tests
- Ask for Help

Jim1348
Send message
Joined: 28 Jul 12
Posts: 463
Credit: 1,131,150,255
RAC: 42,983
Level
Met
Scientific publications
watwatwatwatwatwatwatwatwat
Message 30084 - Posted: 19 May 2013 | 11:24:37 UTC - in response to Message 30080.

Why would people complain when things are working?

That was worth getting up early to read.

Profile Beyond
Avatar
Send message
Joined: 23 Nov 08
Posts: 1073
Credit: 4,498,672,554
RAC: 418,871
Level
Arg
Scientific publications
watwatwatwatwatwatwatwatwatwatwatwatwatwatwatwat
Message 30085 - Posted: 19 May 2013 | 12:09:50 UTC - in response to Message 30083.
Last modified: 19 May 2013 | 12:14:35 UTC

Beyond, I have no idea why 51% of people do anything they do - I don't even ask anymore.

Like electing (sort of) gwb twice?

I've had a couple of WUs seem to stall lately, and when I VNC to the machine there's an error message saying the acemd app has had an error. If I close that box the WU restarts from zero, but if I shut down BOINC, then hit the X on the box, and then either restart BOINC or reboot the PC, the WU progresses normally. It seems better to reboot, because restarting BOINC sometimes causes the WU to progress at about 1/2 speed. A reboot gets the GPU running normally again. BTW, the order of the steps is important:

1) Shut down BOINC.
2) Hit the X on the error message.
3) Restart BOINC or (preferably) reboot.

BTW, all these boxes are Win7-64.

Profile Stephen Yeathermon
Avatar
Send message
Joined: 29 Apr 09
Posts: 9
Credit: 338,904,942
RAC: 0
Level
Asp
Scientific publications
watwatwatwatwatwatwatwatwatwat
Message 30118 - Posted: 20 May 2013 | 14:39:21 UTC - in response to Message 30068.

I'm sorry if you think that I was complaining, I was under the impression that maybe I'd get some help here. I've had 3 successful SDOERR tasks complete normally in the expected amount of time, and 3 NOELIA_klebe tasks that run painfully slow, most likely to end in error. I have a 4th NOELIA_klebe at 11% that's been running for 13hrs 15mins that I'm about to abort. I am in no way saying that they can't be successfully run on the Linux platform, just trying to find out what's going on so that I can get it corrected.

Here is a link to the problematic host's tasks
http://www.gpugrid.net/results.php?hostid=151979

Also, I did enable coolbits (GPU temps are around 41 degrees C) and set PowerMizer to prefer maximum performance. Also, I decided against aborting the current NOELIA_klebe task in hopes of using it for troubleshooting the problem. I've tried shutting down BOINC and rebooting, nothing's changed & still running slow.

Any suggestions?

Thanks,
Steve

Profile Beyond
Avatar
Send message
Joined: 23 Nov 08
Posts: 1073
Credit: 4,498,672,554
RAC: 418,871
Level
Arg
Scientific publications
watwatwatwatwatwatwatwatwatwatwatwatwatwatwatwat
Message 30123 - Posted: 20 May 2013 | 17:49:12 UTC - in response to Message 30118.

Also, I did enable coolbits (GPU temps are around 41 degrees C) and set PowerMizer to prefer maximum performance. Also, I decided against aborting the current NOELIA_klebe task in hopes of using it for troubleshooting the problem. I've tried shutting down BOINC and rebooting, nothing's changed & still running slow.

Any suggestions?

Thanks,
Steve

Hi Steve, the temp suggests to me that the WU has stopped. That happens now and then on windows too. See my post just above and see if that gets the WU moving again (with or without the error message). I'd try the reboot option as the GPU may have gone into an idle state (slow but still slightly processing). Shut down BOINC first and THEN reboot. Hope it works for you.

Bedrich Hajek
Send message
Joined: 28 Mar 09
Posts: 340
Credit: 3,825,500,609
RAC: 966,328
Level
Arg
Scientific publications
watwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwat
Message 30134 - Posted: 21 May 2013 | 2:31:51 UTC

The latest Noelias seem to take more time to finish than the earlier units.

Here is a unit completed just a little while ago. It completed in just over 12 hours:

http://www.gpugrid.net/workunit.php?wuid=4468403

While a unit completed on May 9 finished in a little over 9 hours:

http://www.gpugrid.net/workunit.php?wuid=4438103

Anybody else notice this?

So far, I have not experienced the blue screen with these units, and I had only about 4 or 5 error out, but those were the "Too many errors (may have bug)" units, so that doesn't worry me. At last count, I have 97 completed and valid.

TJ
Send message
Joined: 26 Jun 09
Posts: 815
Credit: 1,469,215,105
RAC: 0
Level
Met
Scientific publications
watwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwat
Message 30143 - Posted: 21 May 2013 | 8:24:55 UTC - in response to Message 30134.

The latest Noelias seem to take more time to finish than the earlier units.

Here is a unit completed just a little while ago. It completed in just over 12 hours:

http://www.gpugrid.net/workunit.php?wuid=4468403

While a unit completed on May 9 finished in a little over 9 hours:

http://www.gpugrid.net/workunit.php?wuid=4438103

Anybody else notice this?

So far, I have not experienced the blue screen with these units, and I had only about 4 or 5 error out, but those were the "Too many errors (may have bug)" units, so that doesn't worry me. At last count, I have 97 completed and valid.


On my GTX285 and GTX550Ti they take between 41-42 hours; the previous ones took around 30 hours. No errors, though my systems only crunch a few.
The ones from Stephen, SDOERR, take roughly 28 hours on my rigs, as yet without error as well.

____________
Greetings from TJ

ExtraTerrestrial Apes
Volunteer moderator
Volunteer tester
Avatar
Send message
Joined: 17 Aug 08
Posts: 2671
Credit: 753,908,224
RAC: 504,143
Level
Glu
Scientific publications
watwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwat
Message 30151 - Posted: 21 May 2013 | 10:02:03 UTC

Bedrich, your runtimes for the current Noelias vary from 33ks to 43ks on the one host I looked at. That's a lot.. I'd look at GPU utilization fluctuation, try to free some more CPU cores (if they're busy with other tasks) and see if GPU utilization stabilizes. I don't think this strong variation is inherent to the WUs.

MrS
____________
Scanning for our furry friends since Jan 2002

Profile Beyond
Avatar
Send message
Joined: 23 Nov 08
Posts: 1073
Credit: 4,498,672,554
RAC: 418,871
Level
Arg
Scientific publications
watwatwatwatwatwatwatwatwatwatwatwatwatwatwatwat
Message 30156 - Posted: 21 May 2013 | 13:52:26 UTC - in response to Message 30134.

The latest Noelias seem to take more time to finish than the earlier units.
Here is a unit completed just a little while ago. It completed in just over 12 hours:
While a unit completed on May 9 finished in a little over 9 hours:

Anybody else notice this?

So far, I have not experienced the blue screen with these units, and I had only about 4 or 5 error out, but those were the "Too many errors (may have bug)" units, so that doesn't worry me. At last count, I have 97 completed and valid.

I haven't noticed them getting longer lately but they're definitely longer than is comfortable for my GPUs. In fact I move 4 of my cards to different projects when NOELIAS are the only WUs available. Not sure if the length is necessary or just an arbitrary choice. Strongly wish they were shorter though.

My observations on NOELIA WUs on my GPUs:

1) They're the longest running WUs I've seen at GPUGrid.
2) They're the most troublesome WUs I've seen at GPUGrid.
3) They have the lowest credits/hour of any long WUs.

Something does not compute (pun intended).

Profile Beyond
Avatar
Send message
Joined: 23 Nov 08
Posts: 1073
Credit: 4,498,672,554
RAC: 418,871
Level
Arg
Scientific publications
watwatwatwatwatwatwatwatwatwatwatwatwatwatwatwat
Message 30169 - Posted: 21 May 2013 | 19:14:37 UTC - in response to Message 30156.
Last modified: 21 May 2013 | 19:15:58 UTC

My observations on NOELIA WUs on my GPUs:

1) They're the longest running WUs I've seen at GPUGrid.
2) They're the most troublesome WUs I've seen at GPUGrid.
3) They have the lowest credits/hour of any long WUs.

Something does not compute (pun intended).

Too late to edit my above post. New NATHANs just hit and they're SLOWER than anything I've seen yet at least on my 4 GTX 460/768MB cards. Not sure how the credits/hour will play out but since they won't make 24 hours it won't be pretty (at least on the 460s). Haven't hit the 650 Ti GPUs yet. But really, I'll ask again: Is there a good reason that the WUs have to be this long or is it just an arbitrary setting?

Profile skgiven
Volunteer moderator
Project tester
Volunteer tester
Avatar
Send message
Joined: 23 Apr 09
Posts: 3968
Credit: 1,837,071,099
RAC: 365,113
Level
His
Scientific publications
watwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwat
Message 30171 - Posted: 21 May 2013 | 20:55:31 UTC - in response to Message 30169.

Basically, the amount of information included in a model determines the runtime. The more info you put in, the longer it will take, but the more accurate and meaningful the results can be.
____________
FAQ's

HOW TO:
- Opt out of Beta Tests
- Ask for Help

Bedrich Hajek
Send message
Joined: 28 Mar 09
Posts: 340
Credit: 3,825,500,609
RAC: 966,328
Level
Arg
Scientific publications
watwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwat
Message 30176 - Posted: 21 May 2013 | 23:28:30 UTC - in response to Message 30151.

Bedrich, your runtimes for the current Noelias vary from 33ks to 43ks on the one host I looked at. That's a lot.. I'd look at GPU utilization fluctuation, try to free some more CPU cores (if they're busy with other tasks) and see if GPu utilization stabilizes. I don't think this strong variation is inherent to the WUs.

MrS


I have 1 cpu dedicated to 1 gpu, and I haven't changed that.

ExtraTerrestrial Apes
Volunteer moderator
Volunteer tester
Avatar
Send message
Joined: 17 Aug 08
Posts: 2671
Credit: 753,908,224
RAC: 504,143
Level
Glu
Scientific publications
watwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwat
Message 30223 - Posted: 22 May 2013 | 18:56:12 UTC

The way I understand it is that the complexity of the WU determines the time for each time step (in the range of ms). The number of time steps should be rather arbitrary and chosen so that the server is not overloaded.
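As a back-of-the-envelope illustration of that split (all the numbers below are invented, not taken from any actual WU):

# Illustrative only: per-step cost (set by WU complexity) times step count
# (chosen by the researchers) gives the total runtime.
ms_per_step = 3.0        # hypothetical cost of one time step on a given card
n_steps = 10000000       # hypothetical number of steps chosen for the WU
runtime_s = n_steps * ms_per_step / 1000.0
print("%.0f s (~%.1f h)" % (runtime_s, runtime_s / 3600.0))  # 30000 s, ~8.3 h

Doubling the step count doubles the runtime without changing the per-step cost at all.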

MrS
____________
Scanning for our furry friends since Jan 2002

Bedrich Hajek
Send message
Joined: 28 Mar 09
Posts: 340
Credit: 3,825,500,609
RAC: 966,328
Level
Arg
Scientific publications
watwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwat
Message 30232 - Posted: 22 May 2013 | 21:41:21 UTC

This particular unit was a nasty one for me.

http://www.gpugrid.net/workunit.php?wuid=4452849

It caused my computer to shut down and it caused 2 other units, which were running well, to crash.

http://www.gpugrid.net/workunit.php?wuid=4475286

http://www.gpugrid.net/workunit.php?wuid=4474620

Profile Stephen Yeathermon
Avatar
Send message
Joined: 29 Apr 09
Posts: 9
Credit: 338,904,942
RAC: 0
Level
Asp
Scientific publications
watwatwatwatwatwatwatwatwatwat
Message 30234 - Posted: 22 May 2013 | 22:34:24 UTC - in response to Message 30169.

My observations on NOELIA WUs on my GPUs:

1) They're the longest running WUs I've seen at GPUGrid.
2) They're the most troublesome WUs I've seen at GPUGrid.
3) They have the lowest credits/hour of any long WUs.

Something does not compute (pun intended).

Too late to edit my above post. New NATHANs just hit and they're SLOWER than anything I've seen yet at least on my 4 GTX 460/768MB cards. Not sure how the credits/hour will play out but since they won't make 24 hours it won't be pretty (at least on the 460s). Haven't hit the 650 Ti GPUs yet. But really, I'll ask again: Is there a good reason that the WUs have to be this long or is it just an arbitrary setting?


I've done 7 of the new Nathans in total and they're giving credit of 167,550. Six of the tasks ran 47K seconds on GTX660Ti's, and one ran 75K seconds on a GTX560Ti.

Bedrich Hajek
Send message
Joined: 28 Mar 09
Posts: 340
Credit: 3,825,500,609
RAC: 966,328
Level
Arg
Scientific publications
watwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwat
Message 30317 - Posted: 24 May 2013 | 20:50:12 UTC

Looks like I pulled this one out of the fire:

http://www.gpugrid.net/workunit.php?wuid=4473341


HA-SOFT, s.r.o.
Send message
Joined: 3 Oct 11
Posts: 100
Credit: 4,855,582,826
RAC: 0
Level
Arg
Scientific publications
watwatwatwatwatwatwatwatwatwatwatwatwatwatwatwat
Message 30366 - Posted: 25 May 2013 | 19:10:58 UTC - in response to Message 30317.

My 680 is crunching a Noelia task in about 300,000 s. Other tasks (SDOERR and NATHAN) are OK. I have updated the driver to the latest, 319.23.

Are they really that long?

Zdenek

flashawk
Send message
Joined: 18 Jun 12
Posts: 241
Credit: 1,694,533,797
RAC: 643,557
Level
His
Scientific publications
watwatwatwatwatwatwatwat
Message 30368 - Posted: 25 May 2013 | 19:35:16 UTC

My 680's are taking right around 29,000 seconds; that's at 1175MHz on Windows XP x64.

HA-SOFT, s.r.o.
Send message
Joined: 3 Oct 11
Posts: 100
Credit: 4,855,582,826
RAC: 0
Level
Arg
Scientific publications
watwatwatwatwatwatwatwatwatwatwatwatwatwatwatwat
Message 30369 - Posted: 25 May 2013 | 19:53:31 UTC - in response to Message 30368.

My 680's are taking right around 29,000 seconds; that's at 1175MHz on Windows XP x64.


I had times around 29,000 a week ago. Maybe it's something with the driver.

flashawk
Send message
Joined: 18 Jun 12
Posts: 241
Credit: 1,694,533,797
RAC: 643,557
Level
His
Scientific publications
watwatwatwatwatwatwatwat
Message 30370 - Posted: 25 May 2013 | 20:31:06 UTC

What clock speed is your 680 running at? To be honest, I think you're in the pipe 5x5 (just right).

Profile skgiven
Volunteer moderator
Project tester
Volunteer tester
Avatar
Send message
Joined: 23 Apr 09
Posts: 3968
Credit: 1,837,071,099
RAC: 365,113
Level
His
Scientific publications
watwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwat
Message 30372 - Posted: 25 May 2013 | 23:25:41 UTC - in response to Message 30366.

Zdenek, It might be a driver issue. Others have reported similar problems with 319.x on Linux.
____________
FAQ's

HOW TO:
- Opt out of Beta Tests
- Ask for Help

HA-SOFT, s.r.o.
Send message
Joined: 3 Oct 11
Posts: 100
Credit: 4,855,582,826
RAC: 0
Level
Arg
Scientific publications
watwatwatwatwatwatwatwatwatwatwatwatwatwatwatwat
Message 30384 - Posted: 26 May 2013 | 9:00:05 UTC - in response to Message 30372.
Last modified: 26 May 2013 | 9:00:28 UTC

Zdenek, It might be a driver issue. Others have reported similar problems with 319.x on Linux.


Yes, you are right. Moved back to 310 and all is ok.

I have a problem with my own distrrtgen app also. IMHO it gets stuck synchronizing between CPU and GPU in cudaDeviceSynchronize().

Profile skgiven
Volunteer moderator
Project tester
Volunteer tester
Avatar
Send message
Joined: 23 Apr 09
Posts: 3968
Credit: 1,837,071,099
RAC: 365,113
Level
His
Scientific publications
watwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwat
Message 30387 - Posted: 26 May 2013 | 9:11:15 UTC - in response to Message 30384.
Last modified: 26 May 2013 | 9:11:48 UTC

I think there have generally been issues with this since about CUDA 4.2 dev.
____________
FAQ's

HOW TO:
- Opt out of Beta Tests
- Ask for Help

HA-SOFT, s.r.o.
Send message
Joined: 3 Oct 11
Posts: 100
Credit: 4,855,582,826
RAC: 0
Level
Arg
Scientific publications
watwatwatwatwatwatwatwatwatwatwatwatwatwatwatwat
Message 30397 - Posted: 26 May 2013 | 9:47:34 UTC - in response to Message 30387.

I have found that 6xx and Titan have problems. 5xx looks ok.

John C MacAlister
Send message
Joined: 17 Feb 13
Posts: 178
Credit: 132,357,411
RAC: 1,373
Level
Cys
Scientific publications
watwatwatwatwatwatwatwat
Message 30408 - Posted: 26 May 2013 | 12:08:47 UTC

Two more NOELIA failures yesterday: one after 4 seconds, the other after 20,927 seconds. I will continue with 'short' tasks.


6895000 4475926 25 May 2013 | 16:07:19 UTC 25 May 2013 | 16:18:17 UTC Error while computing 4.10 3.21 --- Long runs (8-12 hours on fastest card) v6.18 (cuda42)
6894999 4475925 25 May 2013 | 16:07:19 UTC 25 May 2013 | 22:08:10 UTC Error while computing 20,927.20 9,396.42 --- Long runs (8-12 hours on fastest card) v6.18 (cuda42)
6894967 4475903 25 May 2013 | 16:18:17 UTC 25 May 2013 | 16:21:06 UTC Aborted by user 0.00 0.00 --- Long runs (8-12 hours on fastest card) v6.18 (cuda42)

GoodFodder
Send message
Joined: 4 Oct 12
Posts: 53
Credit: 333,467,496
RAC: 0
Level
Asp
Scientific publications
watwatwatwatwatwatwatwat
Message 31087 - Posted: 28 Jun 2013 | 8:48:17 UTC

Hi

Just completed what looks like a brand new Noelia

http://www.gpugrid.net/result.php?resultid=6992175

87k sec on a gtx 650 ti, poor gpu utilisation despite a reboot half way thinking there was a problem :(

Hope it's a one-off?

Profile Beyond
Avatar
Send message
Joined: 23 Nov 08
Posts: 1073
Credit: 4,498,672,554
RAC: 418,871
Level
Arg
Scientific publications
watwatwatwatwatwatwatwatwatwatwatwatwatwatwatwat
Message 31091 - Posted: 28 Jun 2013 | 12:56:11 UTC - in response to Message 31087.

Just completed what looks like a brand new Noelia

http://www.gpugrid.net/result.php?resultid=6992175

87k sec on a gtx 650 ti, poor gpu utilisation despite a reboot half way thinking there was a problem :(

Hope it's a one-off?

I got one of these too. The first guy aborted it and it took my OCed 650 Ti well over 24 hours (92,919.91 seconds) to run it in Win7-64 (vs yours in XP). GPU utilization was OK, but these are TOO LONG and to add insult to injury only give out about 1/2 the credits they should.

http://www.gpugrid.net/workunit.php?wuid=4527468

petebe
Send message
Joined: 19 Nov 12
Posts: 31
Credit: 1,549,545,867
RAC: 0
Level
His
Scientific publications
watwatwatwatwatwatwatwat
Message 31092 - Posted: 28 Jun 2013 | 12:56:30 UTC - in response to Message 31087.

I had one, too - http://www.gpugrid.net/result.php?resultid=6992593

I aborted it when I noticed it at 16+ hours, 54% completed.
The GPU load was at 95%, but Mem Controller load was only 4% on a 660ti.

GoodFodder
Send message
Joined: 4 Oct 12
Posts: 53
Credit: 333,467,496
RAC: 0
Level
Asp
Scientific publications
watwatwatwatwatwatwatwat
Message 31093 - Posted: 28 Jun 2013 | 13:13:57 UTC
Last modified: 28 Jun 2013 | 13:16:42 UTC

Hmm, does not sound promising - don't suppose anyone has noticed what the GPU memory utilisation is?
Was wondering if it went over 1GB, as I've seen a 680 complete one in a third of the time.
Agree these are too long; in my opinion they should be in a separate 'bucket' with a clear minimum hardware spec requirement. The long and short descriptions are far too vague.

flashawk
Send message
Joined: 18 Jun 12
Posts: 241
Credit: 1,694,533,797
RAC: 643,557
Level
His
Scientific publications
watwatwatwatwatwatwatwat
Message 31095 - Posted: 28 Jun 2013 | 14:20:05 UTC

There's one thing I'd like to say though: Nathan sure did a bang-up job on those NATHAN_KIDKIXc22's. I'm getting 98% GPU load and 35-38% memory controller utilization on my GTX680's. This Bud's for you, Nathan! You should have named them KIDKIX_BUTTc22. You really should give a clinic for the fellow researchers (I'm sure they'll get it sorted, not complaining).

petebe
Send message
Joined: 19 Nov 12
Posts: 31
Credit: 1,549,545,867
RAC: 0
Level
His
Scientific publications
watwatwatwatwatwatwatwat
Message 31097 - Posted: 28 Jun 2013 | 14:56:18 UTC - in response to Message 31093.

Hmm, Does not sound promising - don't suppose anyone has noticed what the gpu mem utilisation is?


Mine was around 1045 MB.

Profile dskagcommunity
Avatar
Send message
Joined: 28 Apr 11
Posts: 456
Credit: 810,073,458
RAC: 0
Level
Glu
Scientific publications
watwatwatwatwatwatwatwatwatwatwatwatwat
Message 31099 - Posted: 28 Jun 2013 | 16:19:12 UTC
Last modified: 28 Jun 2013 | 16:19:38 UTC

Wow, does that mean these units don't run on 1GB VRAM hardware? (didn't try it)
____________
DSKAG Austria Research Team: http://www.research.dskag.at



Crunching for my deceased Dog who had "good" Braincancer..

GoodFodder
Send message
Joined: 4 Oct 12
Posts: 53
Credit: 333,467,496
RAC: 0
Level
Asp
Scientific publications
watwatwatwatwatwatwatwat
Message 31100 - Posted: 28 Jun 2013 | 16:54:59 UTC

Thanks, at least we have a plausible explanation; not so sure ruling out the mainstream will be good for GPUGrid. Pity we can't isolate WUs, as I can think of an addition to Flashawk's naming convention - but let's not go there.

flashawk
Send message
Joined: 18 Jun 12
Posts: 241
Credit: 1,694,533,797
RAC: 643,557
Level
His
Scientific publications
watwatwatwatwatwatwatwat
Message 31101 - Posted: 28 Jun 2013 | 17:14:39 UTC

Are you asking about GPU utilization or the size of the work units? I take it that these WU's are from the short queue, and if the work unit size is larger than the amount of GDDR memory on the video card, that would not only cause a massive slowdown in crunching times, it would also make your computer almost unresponsive (mouse, keyboard and such).

Beyond knows what I'm talking about, and if that were the case, I'm sure he would have mentioned it. I am confused by petebe's response: is he talking about the work unit size? I haven't done any short queue tasks in some time, and I do know that Noelia's work units are set up differently than Nathan's, and her WU's typically have lower CPU and GPU utilization.

TJ
Send message
Joined: 26 Jun 09
Posts: 815
Credit: 1,469,215,105
RAC: 0
Level
Met
Scientific publications
watwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwat
Message 31103 - Posted: 28 Jun 2013 | 18:04:00 UTC - in response to Message 31101.

That's what I understood too. All the scientists are working on different projects/amino acids and use different algorithms, thus WU's differ. The latest one from Nathan seems almost optimal, as far as we can see, with error-free and rather fast cycles on the fastest cards.
But I'd also like to mention that I had very few problems with Noelia's WU's as well: only one beta failed, and one because Windows thought to update itself (this is now no longer possible).
____________
Greetings from TJ

petebe
Send message
Joined: 19 Nov 12
Posts: 31
Credit: 1,549,545,867
RAC: 0
Level
His
Scientific publications
watwatwatwatwatwatwatwat
Message 31104 - Posted: 28 Jun 2013 | 18:13:35 UTC - in response to Message 31101.

Flashhawk, I was referring to the "Memory Used" figure as reported by GPU-Z.
Memory Used was 1045 mb and Memory Controller Load was 4%.
In contrast, NATHANs usually run around 250-450 mb Memory Used and 35% Mem Controller Load.

I don't know how this relates to a WU size - sorry if this was confusing.

This particular 660ti does GPUGrid crunching only - it's not connected to a monitor. One HT CPU reserved per GPU.

Profile Beyond
Avatar
Send message
Joined: 23 Nov 08
Posts: 1073
Credit: 4,498,672,554
RAC: 418,871
Level
Arg
Scientific publications
watwatwatwatwatwatwatwatwatwatwatwatwatwatwatwat
Message 31106 - Posted: 28 Jun 2013 | 18:46:34 UTC - in response to Message 31101.

Are you asking about GPU utilization or the size of the work units? I take it that these wu's are from the short queue and if the work unit size is larger than the amount of GDDR memory on the video card, that would not only cause a massive slow down in crunching times it will also make your computer almost unresponsive (mouse, keyboard and such).

Beyond knows what I'm talking about and if that were the case, I'm sure he would have mentioned it. I am confused by petebe's response, is he talking about the work unit size? I haven't done any short queue tasks in sometime and I do know that Noelia's work units are setup differently than Nathans and her wu's typically have a lower CPU and GPU utilization.

They're long queue WUs yet they credit like the short queue. If I see any more I'll make like a Dalek: EXTERMINATE, EXTERMINATE!!!

BTW, like you mentioned: kudos to Nathan on the new KIX WUs. Nathan, give the other WU generators a class in WU design. Please?

Profile skgiven
Volunteer moderator
Project tester
Volunteer tester
Avatar
Send message
Joined: 23 Apr 09
Posts: 3968
Credit: 1,837,071,099
RAC: 365,113
Level
His
Scientific publications
watwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwat
Message 31107 - Posted: 28 Jun 2013 | 18:47:01 UTC - in response to Message 31104.

NOELIA_Mg WU's are Long runs. Most of Noelia's work has used >1GB GDDR and taken longer than other work.
207,850 credits would be about right in my opinion (including the 50% bonus).
____________
FAQ's

HOW TO:
- Opt out of Beta Tests
- Ask for Help

Profile Beyond
Avatar
Send message
Joined: 23 Nov 08
Posts: 1073
Credit: 4,498,672,554
RAC: 418,871
Level
Arg
Scientific publications
watwatwatwatwatwatwatwatwatwatwatwatwatwatwatwat
Message 31108 - Posted: 28 Jun 2013 | 18:51:56 UTC - in response to Message 31107.

NOELIA_Mg WU are Long runs. Most of Noelia's work has used >1GB GDDR and taken longer than other work.
207850 credits would be about right in my opinion (including the 50% bonus).

The ones listed above scored just 69,875. Including only 25% bonus though since they're SO LONG :-(
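Working backwards from the figures quoted in this thread (and assuming the bonus tiers mentioned here: +50% if returned inside 24 h, +25% inside 48 h), the gap is mostly in the base credit rather than the missed bonus tier - a quick sketch:

# Sketch using numbers from this thread; the tier function is an assumption
# based on the bonus percentages mentioned above.
def with_bonus(base, hours):
    if hours <= 24:
        return base * 1.50
    if hours <= 48:
        return base * 1.25
    return base

base_expected = 207850 / 1.50        # ~138,567 implied by the "about right" figure
print(with_bonus(base_expected, 36)) # ~173,208 even at only the 25% tier
print(69875 / 1.25)                  # ~55,900 base actually paid for the WU above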

flashawk
Send message
Joined: 18 Jun 12
Posts: 241
Credit: 1,694,533,797
RAC: 643,557
Level
His
Scientific publications
watwatwatwatwatwatwatwat
Message 31110 - Posted: 28 Jun 2013 | 19:12:28 UTC - in response to Message 31104.
Last modified: 28 Jun 2013 | 19:19:50 UTC

Flashhawk, I was referring to the "Memory Used" figure as reported by GPU-Z.
Memory Used was 1045 mb and Memory Controller Load was 4%.
In contrast, NATHANs usually run around 250-450 mb Memory Used and 35% Mem Controller Load.

I don't know how this relates to a WU size - sorry if this was confusing.

This particular 660ti does GPUGrid crunching only - it's not connected to a monitor. One HT CPU reserved per GPU.


No petebe, I wasn't confused by anything on your part; I was confused because more people aren't complaining about unresponsive computers. If someone is using an older card with only 1GB of onboard GDDR, then the system RAM or swap file would be used, slowing the computer to a crawl.

No, you're fine buddy, sorry for the confusion; I should have been a little clearer. Frankly, I'm shocked I haven't seen more of this in the forum - that's a huge WU for not much credit. I guess I'll have to turn on the short queue and check them out; it is odd they aren't in the long queue.

Edit: I understand now (I'm pretty slow sometimes), they're coming through the long queue; I haven't seen one yet.

ExtraTerrestrial Apes
Volunteer moderator
Volunteer tester
Avatar
Send message
Joined: 17 Aug 08
Posts: 2671
Credit: 753,908,224
RAC: 504,143
Level
Glu
Scientific publications
watwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwat
Message 31114 - Posted: 28 Jun 2013 | 20:56:00 UTC

GPU-Z "only" reports the overall memory used, which includes the GPU-Grid WU and anything else running. If a card with 1024 MB shows 1045 MB used, that won't slow the computer to a crawl. Everything except a whopping 21 MB still fits into the GPU memory. How often can this amount be transferred back and forth between system RAM and GPU at PCIe speeds? (rough answer: a damn lot)
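To put rough numbers on "a damn lot" (assuming something like 6 GB/s of effective PCIe 2.0 x16 bandwidth - an assumption, not a measurement):

# Rough arithmetic: how often could a ~21 MB overflow be shuttled across PCIe?
overflow_mb = 21.0
pcie_gb_per_s = 6.0                       # assumed effective PCIe 2.0 x16 rate
transfer_s = overflow_mb / (pcie_gb_per_s * 1024.0)
print("one transfer ~ %.1f ms" % (transfer_s * 1000.0))   # ~3.4 ms
print("~%d transfers per second" % (1.0 / transfer_s))    # roughly 290 per second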

It's only when the amount of memory needed significantly exceeds the amount of memory present on the card that things will become.. uncomfortable.

MrS
____________
Scanning for our furry friends since Jan 2002

TJ
Send message
Joined: 26 Jun 09
Posts: 815
Credit: 1,469,215,105
RAC: 0
Level
Met
Scientific publications
watwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwat
Message 31117 - Posted: 28 Jun 2013 | 22:16:45 UTC - in response to Message 31114.

GPU-Z "only" reports the overall memory used, which includes the GPU-Grid WU and anything else running. If a card with 1024 MB shows 1045 MB used that won't slow the computer to a crawl. Everything except a whopping 21 MB still fit into the GPU memory. How often can this amount be transferred back and forth between system RAM and GPU at PCIe speeds? (rough answer: a damn lot)

It's only when the amount of memory needed significantly exceeds the amount of memory present on the card that things will become.. uncomfortable.

MrS

It could also be other things. As some of you remember from another thread, I was having problems with my new GTX660 on an XFX MOBO, with a laggy system at times, 2GB RAM on the GPU and 12GB on the MOBO. Or the driver, or the driver in combination with another piece of software. I have put the GTX660 in another system and it works like a train (as we say in Dutch).

But how about the question from dskagcommunity about VRAM? His exact question:
Wow, does that mean these units don't run on 1GB VRAM hardware? (didn't try it)

If it can be swapped back and forth between system RAM and the GPU, then the message that these WU's won't run on 1GB VRAM cards would make no sense, or am I missing something important? I just want to learn here.


____________
Greetings from TJ

Profile skgiven
Volunteer moderator
Project tester
Volunteer tester
Avatar
Send message
Joined: 23 Apr 09
Posts: 3968
Credit: 1,837,071,099
RAC: 365,113
Level
His
Scientific publications
watwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwat
Message 31120 - Posted: 28 Jun 2013 | 23:42:04 UTC - in response to Message 31117.
Last modified: 28 Jun 2013 | 23:42:34 UTC

This has been discussed before, and at some length - these WU's are only going to be slow on 768MB cards and 512MB cards; a few GTX460's, GTX450's and GT440's. Generally speaking, the relative performance of Noelia's WU's on mid-range cards should be better, as they won't be burdened with low bus/bandwidths. More's the pity they don't have better credit and can't finish inside 24h...
____________
FAQ's

HOW TO:
- Opt out of Beta Tests
- Ask for Help

ExtraTerrestrial Apes
Volunteer moderator
Volunteer tester
Avatar
Send message
Joined: 17 Aug 08
Posts: 2671
Credit: 753,908,224
RAC: 504,143
Level
Glu
Scientific publications
watwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwat
Message 31124 - Posted: 29 Jun 2013 | 9:54:18 UTC - in response to Message 31117.

You're right, TJ, these WUs should run on cards with 1 GB VRAM. However, I think the signs are clear: nobody should buy or recommend a card with 1 GB for GPU-Grid any more.

And there's the issue of algorithm selection. GDF once said that they've got (at least) 2 different algorithms; one is faster but needs more VRAM. The app selects the faster one if possible, meaning cards with "low" VRAM may see reduced crunching speed even before running out of VRAM completely.
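This is not GPUGrid's actual code, just a sketch of the selection idea described above - prefer the faster, larger-footprint algorithm when the card has room for it, otherwise fall back. All the MB figures are invented:

# Hypothetical illustration of VRAM-based algorithm selection.
def pick_algorithm(free_vram_mb, fast_needs_mb=1200, slow_needs_mb=700):
    if free_vram_mb >= fast_needs_mb:
        return "fast path (larger VRAM footprint)"
    if free_vram_mb >= slow_needs_mb:
        return "slow path (smaller VRAM footprint)"
    return "cannot run: not enough VRAM"

print(pick_algorithm(2048))  # e.g. a 2 GB card
print(pick_algorithm(768))   # e.g. a GTX 460/768 MB card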

Oh, and regarding the reference to your strange GTX660 problem: the question here was not "what can make a system choppy" (lots of possibilities, I agree) but rather "would exceeding the available VRAM slightly make a system choppy".

MrS
____________
Scanning for our furry friends since Jan 2002

Profile dskagcommunity
Avatar
Send message
Joined: 28 Apr 11
Posts: 456
Credit: 810,073,458
RAC: 0
Level
Glu
Scientific publications
watwatwatwatwatwatwatwatwatwatwatwatwat
Message 31125 - Posted: 29 Jun 2013 | 11:23:52 UTC
Last modified: 29 Jun 2013 | 11:24:40 UTC

Ah, good to read. I hope the 1.28 GB on my 24h cruncher machines will keep working for a while longer without swapping; I bought them only a few months ago ^^
____________
DSKAG Austria Research Team: http://www.research.dskag.at



Crunching for my deceased Dog who had "good" Braincancer..

Profile Beyond
Avatar
Send message
Joined: 23 Nov 08
Posts: 1073
Credit: 4,498,672,554
RAC: 418,871
Level
Arg
Scientific publications
watwatwatwatwatwatwatwatwatwatwatwatwatwatwatwat
Message 31126 - Posted: 29 Jun 2013 | 12:27:32 UTC - in response to Message 31120.

This has been discussed before, and to some length - These WU's are only going to be slow on 768MB cards and 512MB cards; a few GTX460's, GTX450's and GT440's. Generally speaking the relative performance of Noelias' WU's on mid-range cards should be better as they won't be burdened with low bus/bandwidths. More the pity they don't have better credit and cant finish inside 24h...

EXCEPT that no one has been talking about these WUs on sub 1GB cards. The first reports were referring to 650 Ti GPUs and so far reports have been that they're running even worse on the 2GB 660 and 660 TI cards. BTW, you can add a 1280MB 570 to the list of GPUs that don't like these new NOELIAS...

IFRS
Send message
Joined: 12 Dec 11
Posts: 89
Credit: 2,656,811,083
RAC: 0
Level
Phe
Scientific publications
watwatwatwatwatwatwatwatwatwatwat
Message 31355 - Posted: 8 Jul 2013 | 21:42:34 UTC

WELL..... new Noelias are filling the cache.... let's see how these ones go and hope for the best.

Vagelis Giannadakis
Send message
Joined: 5 May 13
Posts: 187
Credit: 349,254,454
RAC: 0
Level
Asp
Scientific publications
watwatwatwatwatwatwat
Message 31361 - Posted: 9 Jul 2013 | 6:55:53 UTC - in response to Message 31355.

WELL..... new Noelias are filling the cache.... let's see how these ones go and hope for the best.

Well, my 650Ti doesn't seem to like them AT ALL! At least this one.

Slot: 0
Task: 063ppx8x1-NOELIA_klebe_run4-0-3-RND9577_0
Elapsed: 04:29
CPU time: 00:17
Percent done: 03.76
Estimated: 119:17
Remaining: 114:36

So, it will take something like 5 days to finish on my 650Ti! I wonder if there's a card out there that can finish these in the 24h window...
____________

Vagelis Giannadakis
Send message
Joined: 5 May 13
Posts: 187
Credit: 349,254,454
RAC: 0
Level
Asp
Scientific publications
watwatwatwatwatwatwat
Message 31362 - Posted: 9 Jul 2013 | 8:30:31 UTC

In Linux I can't check GPU utilization, so can't tell how well this NOELIA is using my card. Judging by the temperature of the card though (52C), utilization must be pretty low, as it normally goes up to 64-67C with NATHANs.

I'm wondering what to do, let it continue or abort it? It seems I'll finish it before its deadline, but looks like such a waste of both time and resources, doesn't it?
____________

HA-SOFT, s.r.o.
Send message
Joined: 3 Oct 11
Posts: 100
Credit: 4,855,582,826
RAC: 0
Level
Arg
Scientific publications
watwatwatwatwatwatwatwatwatwatwatwatwatwatwatwat
Message 31363 - Posted: 9 Jul 2013 | 9:17:15 UTC - in response to Message 30384.

Zdenek, It might be a driver issue. Others have reported similar problems with 319.x on Linux.


Yes, you are right. Moved back to 310 and all is ok.

I have problem with my own distrrtgen app also. IMHO It stucks on synchronizing between CPU and GPU in cudaDeviceSynchronize().


All drivers above 319 (incl. the 325 beta) under Linux still have problems with Noelia tasks on 6xx GPUs: very slow, with low CPU usage.

I recommend using 310 under Linux.
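A quick check of which driver a Linux host is actually running, so the 310-vs-319 difference is easy to confirm (assumes the proprietary NVIDIA driver is loaded, which provides /proc/driver/nvidia/version):

# Print the loaded NVIDIA driver version.
with open("/proc/driver/nvidia/version") as f:
    print(f.read().strip())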

Vagelis Giannadakis
Send message
Joined: 5 May 13
Posts: 187
Credit: 349,254,454
RAC: 0
Level
Asp
Scientific publications
watwatwatwatwatwatwat
Message 31364 - Posted: 9 Jul 2013 | 9:43:44 UTC - in response to Message 31363.

You're so right! I removed 319 and installed 310.44 and immediately CPU usage went up (40-45% from 15-20%) and the GPU temp is at the usual 65C! Also, estimated total time is dropping rapidly.

Thanks for the great tip!
____________

IFRS
Send message
Joined: 12 Dec 11
Posts: 89
Credit: 2,656,811,083
RAC: 0
Level
Phe
Scientific publications
watwatwatwatwatwatwatwatwatwatwat
Message 31365 - Posted: 9 Jul 2013 | 11:24:12 UTC - in response to Message 31361.
Last modified: 9 Jul 2013 | 11:25:02 UTC

WELL..... new Noelias are filling the cache.... let's see how these ones go and hope for the best.

Well, my 650Ti doesn't seem to like them AT ALL! At least this one.

Slot: 0
Task: 063ppx8x1-NOELIA_klebe_run4-0-3-RND9577_0
Elapsed: 04:29
CPU time: 00:17
Percent done: 03.76
Estimated: 119:17
Remaining: 114:36

So, it will take something like 5 days to finish on my 650Ti! I wonder if there's a card out there that can finish these in the 24h window...

They are ok here. Finishing without errors in the usual 8:30/9:00 hrs in my 690s and 770s. Driver 320.49

Vagelis Giannadakis
Send message
Joined: 5 May 13
Posts: 187
Credit: 349,254,454
RAC: 0
Level
Asp
Scientific publications
watwatwatwatwatwatwat
Message 31366 - Posted: 9 Jul 2013 | 12:43:22 UTC - in response to Message 31365.

Yeah, it seems it was the driver (319.17). I downgraded to 310.44 and it's working much better now. If my calculations are not off, it should take ~18.5h for my 650Ti to complete such a WU, which is very similar to NATHAN_KIDs. This particular NOELIA I'm crunching right now will of course take longer, as I did the first 6% very slowly.
____________

HA-SOFT, s.r.o.
Send message
Joined: 3 Oct 11
Posts: 100
Credit: 4,855,582,826
RAC: 0
Level
Arg
Scientific publications
watwatwatwatwatwatwatwatwatwatwatwatwatwatwatwat
Message 31369 - Posted: 9 Jul 2013 | 14:32:22 UTC - in response to Message 31365.

They are ok here. Finishing without errors in the usual 8:30/9:00 hrs in my 690s and 770s. Driver 320.49


It is a Linux-and-Noelia-tasks problem only. Windows is OK. Nathan tasks on Linux are OK also.

captainjack
Send message
Joined: 9 May 13
Posts: 114
Credit: 823,969,775
RAC: 980,854
Level
Glu
Scientific publications
watwatwatwatwatwatwatwatwat
Message 31370 - Posted: 9 Jul 2013 | 15:45:14 UTC

This Noelia task http://www.gpugrid.net/result.php?resultid=7034770 appeared to lock up my Linux box so I aborted it (after two reboots). It had previously failed on two Windows boxes.

Profile Beyond
Avatar
Send message
Joined: 23 Nov 08
Posts: 1073
Credit: 4,498,672,554
RAC: 418,871
Level
Arg
Scientific publications
watwatwatwatwatwatwatwatwatwatwatwatwatwatwatwat
Message 31371 - Posted: 9 Jul 2013 | 16:22:46 UTC

On all 3 of my GTX 460/768 GPUs:

CRASH!

Problem signature:
Problem Event Name: APPCRASH
Application Name: acemd.2865P.exe
Application Version: 0.0.0.0
Application Timestamp: 511b9dc5
Fault Module Name: acemd.2865P.exe
Fault Module Version: 0.0.0.0
Fault Module Timestamp: 511b9dc5
Exception Code: 40000015
Exception Offset: 00015ad1
OS Version: 6.1.7601.2.1.0.256.1
Locale ID: 1033
Additional Information 1: ef57
Additional Information 2: ef57694f685d7e60ac50a2030c6fbaf6
Additional Information 3: 907e
Additional Information 4: 907ef510ab2fa0efd4b93de2612b25ed

Seems they need at least 1GB, thanks for the heads up, not...

Profile Retvari Zoltan
Avatar
Send message
Joined: 20 Jan 09
Posts: 1844
Credit: 10,686,789,844
RAC: 10,151,546
Level
Trp
Scientific publications
watwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwat
Message 31373 - Posted: 9 Jul 2013 | 20:44:53 UTC

They have not failed on my hosts so far.
Their credit per second and their GPU usage are OK (on WinXP x64 and x32).
They still don't use a full CPU thread (with Kepler GPUs); however, it does not decrease their GPU usage.
No complaints from me this time. So far. :)

pvh
Send message
Joined: 17 Mar 10
Posts: 23
Credit: 432,598,631
RAC: 817,591
Level
Gln
Scientific publications
watwatwatwatwatwatwatwatwatwatwatwatwatwatwatwat
Message 31376 - Posted: 9 Jul 2013 | 22:02:47 UTC

I thought you were going to abort the current Noelia runs...?

I had one that was stuck at 6% and aborted it. But I keep getting other Noelia WUs as replacements. No Nathan runs anywhere in sight...

I am giving up on this project for now...

flashawk
Send message
Joined: 18 Jun 12
Posts: 241
Credit: 1,694,533,797
RAC: 643,557
Level
His
Scientific publications
watwatwatwatwatwatwatwat
Message 31377 - Posted: 10 Jul 2013 | 0:40:12 UTC - in response to Message 31376.

I have 1 box that doesn't like these NOELIAs, so I'm going to swap in my 2 GTX 670 backup cards and see if that works. I should just switch to Linux Debian now; I've been getting ready for some time. Microsoft is going to stop supporting Windows XP 32-bit in April and XP x64 in September 2014, even though XP is still running on 38% of the world's computers (Windows 7 is 44%).

Profile Stoneageman
Avatar
Send message
Joined: 25 May 09
Posts: 211
Credit: 12,279,345,996
RAC: 8,209,337
Level
Trp
Scientific publications
watwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwat
Message 31378 - Posted: 10 Jul 2013 | 1:35:10 UTC

So far no issues with the current Noelia tasks on Linux, but this beta had run for 11+ hrs before I noticed it and it required a reboot to get the 660 working again.

Vagelis Giannadakis
Send message
Joined: 5 May 13
Posts: 187
Credit: 349,254,454
RAC: 0
Level
Asp
Scientific publications
watwatwatwatwatwatwat
Message 31380 - Posted: 10 Jul 2013 | 8:57:28 UTC - in response to Message 31361.

WELL..... new Noelias are filling the cache.... let's see how these ones go and hope for the best.

Well, my 650Ti doesn't seem to like them AT ALL! At least this one.

Slot: 0
Task: 063ppx8x1-NOELIA_klebe_run4-0-3-RND9577_0
Elapsed: 04:29
CPU time: 00:17
Percent done: 03.76
Estimated: 119:17
Remaining: 114:36

So, it will take something like 5 days to finish on my 650Ti! I wonder if there's a card out there that can finish these in the 24h window...


Reporting back on this. It turned out (with the help of HA-SOFT, s.r.o., thanks!) that NOELIAs have some trouble with driver 319 under Linux. I downgraded to 310.44 and the NOELIA I was currently crunching started progressing at a much faster rate. It finished in ~25h (previously estimated at 119h!) and, of course, I missed the 24h bonus, but only because I had lost ~7 hours with the newer driver.

The new NOELIA_klebe_run I got has an estimated 18:09, which is about the same with NATHANs on my GTX 650Ti. What's sweet with these NOELIAs is the CPU usage, about 40-45% of my i7 870.
____________

Profile skgiven
Volunteer moderator
Project tester
Volunteer tester
Avatar
Send message
Joined: 23 Apr 09
Posts: 3968
Credit: 1,837,071,099
RAC: 365,113
Level
His
Scientific publications
watwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwat
Message 31381 - Posted: 10 Jul 2013 | 10:58:30 UTC - in response to Message 31380.

The 304.88 repository driver works just fine. In my opinion there are too many issues with the ~320 drivers for both Windows and Linux.

83equ-NOELIA_7mg_restraint-0-1-RND2660_2 4581443 9 Jul 2013 | 22:39:33 UTC 10 Jul 2013 | 8:39:45 UTC Completed and validated 13,625.77 4,782.22 38,025.00 ACEMD beta version v6.49 (cuda42)

The times look similar to last weeks:

53equ-NOELIA_1MG-0-1-RND1933_0 4565739 3 Jul 2013 | 19:01:52 UTC 4 Jul 2013 | 1:19:44 UTC Completed and validated 13,469.70 4,359.40 38,025.00 ACEMD beta version v6.49 (cuda42)

I like that the Beta credit is in line with Long WU's even though the Betas are quite short.
____________
FAQ's

HOW TO:
- Opt out of Beta Tests
- Ask for Help

Profile Beyond
Avatar
Send message
Joined: 23 Nov 08
Posts: 1073
Credit: 4,498,672,554
RAC: 418,871
Level
Arg
Scientific publications
watwatwatwatwatwatwatwatwatwatwatwatwatwatwatwat
Message 31384 - Posted: 10 Jul 2013 | 14:32:07 UTC - in response to Message 31380.

It turned out (with the help of HA-SOFT, s.r.o., thanks!) that NOELIAs have some trouble with driver 319 under Linux. I downgraded to 310.44 and the NOELIA I was currently crunching started progressing at a much faster rate.

Don't know what's going on with NV drivers lately. Had to switch 3 of my GPUs to other projects because of the NOELIAs and found that while 2 ran fine at SETI, the 3rd did not. Looked at them and sure enough the 3rd had a newer driver (all are Win7-64). Reverted to 310.90 and SETI ran like a charm. So it's not just Linux, it's Windows too with NVidia driver problems.

Profile dskagcommunity
Avatar
Send message
Joined: 28 Apr 11
Posts: 456
Credit: 810,073,458
RAC: 0
Level
Glu
Scientific publications
watwatwatwatwatwatwatwatwatwatwatwatwat
Message 31386 - Posted: 10 Jul 2013 | 15:33:37 UTC
Last modified: 10 Jul 2013 | 15:35:13 UTC

That's why the admins on several projects often say that the latest drivers are perhaps good for gaming but not always for crunching ;) I think the latest really stable, crunch-proof drivers are 310.xx. I'm very careful with driver updates because there have been too many problems often enough. But that's not only an NVidia thing; you can hit the ground just as hard with current ATI/AMD drivers like 13.x too ;)
____________
DSKAG Austria Research Team: http://www.research.dskag.at



Crunching for my deceased Dog who had "good" Braincancer..

John C MacAlister
Send message
Joined: 17 Feb 13
Posts: 178
Credit: 132,357,411
RAC: 1,373
Level
Cys
Scientific publications
watwatwatwatwatwatwatwat
Message 31387 - Posted: 10 Jul 2013 | 17:24:16 UTC
Last modified: 10 Jul 2013 | 17:25:04 UTC

My system builder has threatened me with at least death if I ever update an NVIDIA driver. He carefully selects the driver as he builds the machine and leaves it in place....

I am now running NOELIAs without a problem.

John

Jim1348
Send message
Joined: 28 Jul 12
Posts: 463
Credit: 1,131,150,255
RAC: 42,983
Level
Met
Scientific publications
watwatwatwatwatwatwatwatwat
Message 31388 - Posted: 10 Jul 2013 | 17:45:09 UTC - in response to Message 31387.

John,

What driver do you (and your system builder) like these days?
I haven't noticed any problems with recent Nvidia drivers, but I can't say that they don't occur either.

Vagelis Giannadakis
Send message
Joined: 5 May 13
Posts: 187
Credit: 349,254,454
RAC: 0
Level
Asp
Scientific publications
watwatwatwatwatwatwat
Message 31392 - Posted: 11 Jul 2013 | 7:45:13 UTC

Argh! These NOELIA_xMG_RUN WUs are taking too long on my 650Ti, around 44h!! I aborted two of them, hoping for a NOELIA_klebe or NATHAN, but nope, it was one of these beasts or nothing..

What times do you guys see for these WUs?
____________

Richard Haselgrove
Send message
Joined: 11 Jul 09
Posts: 791
Credit: 1,427,941,620
RAC: 1,315,908
Level
Met
Scientific publications
watwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwat
Message 31393 - Posted: 11 Jul 2013 | 8:16:03 UTC - in response to Message 31392.

Just completed 35x5-NOELIA_7MG_RUN-0-2-RND3709 - only a minor increase in runtime (less than 10%) compared to other recent tasks for host 132158.

But I did see that the final upload was ~150 MB - that's back to pre-compression sizes.

TJ
Send message
Joined: 26 Jun 09
Posts: 815
Credit: 1,469,215,105
RAC: 0
Level
Met
Scientific publications
watwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwat
Message 31394 - Posted: 11 Jul 2013 | 8:17:47 UTC

I have 3 Noelias running; the estimate is 12-13 hours and one will definitely finish in that time. All run on Windows (Vista and 7) with 320.18 drivers. I had some Noelia SRs in the previous days and they all finished okay.
Yesterday evening one resulted in an automatic system reboot. Checking WhoCrashed shows that it was the nVidia driver. I haven't changed the driver yet, as I want to see whether there are more crashes or bad results first.
____________
Greetings from TJ

Vagelis Giannadakis
Send message
Joined: 5 May 13
Posts: 187
Credit: 349,254,454
RAC: 0
Level
Asp
Scientific publications
watwatwatwatwatwatwat
Message 31395 - Posted: 11 Jul 2013 | 8:46:22 UTC
Last modified: 11 Jul 2013 | 8:51:23 UTC

Thanks for the responses guys!

You both have faster cards than my 650Ti, you have 660s and 670s.

TJ, it's one of your 660s that estimates to 12-13h, right?

Shouldn't my 650Ti estimate to about twice that, ~24h? Instead, my estimate is at 44h, which is more than 3x your 660's time!

I hope these NOELIAs don't have a problem with driver 310 under Linux!

Edit: TJ, I guess your estimates are from the BOINC manager, right? My 44h estimate is from a script I have that parses the slots' task state files. My BOINC manager shows ~24h estimate, as expected. Of course, the BOINC manager's estimates are almost always wrong for me...
____________
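For reference, a script of the kind Vagelis describes can be quite small. The sketch below is a minimal, illustrative Python version: it assumes the slot layout of a stock Linux BOINC client (the path varies by install) and that each slot's boinc_task_state.xml exposes <fraction_done> and <checkpoint_elapsed_time> elements; element names can differ between client versions, so treat it as a starting point rather than a drop-in tool.

#!/usr/bin/env python
# Rough per-slot progress report from BOINC task state files.
# Assumptions (adjust as needed): slots live under SLOTS_DIR, and
# boinc_task_state.xml contains <fraction_done> and <checkpoint_elapsed_time>.
import glob
import os
import xml.etree.ElementTree as ET

SLOTS_DIR = "/var/lib/boinc-client/slots"  # path varies by distro/install

for state_file in sorted(glob.glob(os.path.join(SLOTS_DIR, "*", "boinc_task_state.xml"))):
    root = ET.parse(state_file).getroot()
    frac = float(root.findtext("fraction_done", default="0") or 0)
    elapsed = float(root.findtext("checkpoint_elapsed_time", default="0") or 0)
    if frac <= 0:
        continue
    total = elapsed / frac              # linear extrapolation to 100%
    print("%s  %5.1f%% done  elapsed %5.1f h  est. total %5.1f h  remaining %5.1f h"
          % (os.path.basename(os.path.dirname(state_file)),
             frac * 100, elapsed / 3600, total / 3600, (total - elapsed) / 3600))

The extrapolation is the same one TJ spells out below: divide the elapsed time by the fraction done.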

TJ
Send message
Joined: 26 Jun 09
Posts: 815
Credit: 1,469,215,105
RAC: 0
Level
Met
Scientific publications
watwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwat
Message 31397 - Posted: 11 Jul 2013 | 9:21:59 UTC - in response to Message 31395.

Thanks for the responses guys!

You both have faster cards than my 650Ti, you have 660s and 670s.

TJ, it's one of your 660s that estimates to 12-13h, right?

Shouldn't my 650Ti estimate to about twice that, ~24h? Instead, my estimate is at 44h, which is more than 3x your 660's time!

I hope these NOELIAs don't have a problem with driver 310 under Linux!

Edit: TJ, I guess your estimates are from the BOINC manager, right? My 44h estimate is from a script I have that parses the slots' task state files. My BOINC manager shows ~24h estimate, as expected. Of course, the BOINC manager's estimates are almost always wrong for me...

Hello Vagelis,
Yes indeed, the 12-13 hour estimate is for the 660. I also have a 550Ti doing a Noelia, and that one will take about 46 hours! Already 36.5 hours done.
I do the estimates myself: I look at what percentage has been done in how much time and extrapolate to 100%. So: 100 divided by the percentage done, times the time it took to do that percentage.
You have to see whether a driver works by letting it do a few WUs. I don't switch drivers too often.
____________
Greetings from TJ
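Put as a formula, that is a straight linear extrapolation: estimated total runtime = elapsed time / fraction done (equivalently, 100 / percent done, times the elapsed time). As a quick check against the 550Ti figures above (36.5 h elapsed against an estimated ~46 h total, i.e. roughly 79% done), here is a minimal sketch with those numbers plugged in:

# Linear extrapolation of total runtime from progress so far.
# Numbers taken from the 550Ti example above: ~36.5 h elapsed at ~79% done.
elapsed_hours = 36.5
fraction_done = 0.79

total_hours = elapsed_hours / fraction_done        # same as 100/percent * elapsed
remaining_hours = total_hours - elapsed_hours
print("estimated total %.1f h, remaining %.1f h" % (total_hours, remaining_hours))
# prints: estimated total 46.2 h, remaining 9.7 h

The estimate only holds while the task progresses at a constant rate, which is exactly what breaks down when a WU stalls or the card downclocks.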

Kenneth Larsen
Send message
Joined: 11 Feb 09
Posts: 6
Credit: 162,114,203
RAC: 19
Level
Ile
Scientific publications
watwatwatwatwatwatwatwatwatwatwatwatwat
Message 31398 - Posted: 11 Jul 2013 | 9:24:10 UTC

I have been running GPUGrid for weeks without trouble. Suddenly, a few days ago, all the work units I try to run stopped utilizing the GPU as before. Initially they are estimated to run for about 13 hours, but after the 13 hours they have only reached about 15% and the time to completion starts rising. At this point I abort them, if not before, as they don't seem to utilize more than a small part of the GPU.
I'm running Boinc 7.0.65 on Linux, nvidia-drivers 319.23.
So far I've wasted about 2-3 days of electricity trying to crunch. Am I the only one experiencing this? Is it due to the new WUs mentioned?

Jim1348
Send message
Joined: 28 Jul 12
Posts: 463
Credit: 1,131,150,255
RAC: 42,983
Level
Met
Scientific publications
watwatwatwatwatwatwatwatwat
Message 31399 - Posted: 11 Jul 2013 | 9:39:25 UTC - in response to Message 31392.
Last modified: 11 Jul 2013 | 9:57:14 UTC

Argh! These NOELIA_xMG_RUN WUs are taking too long on my 650Ti, around 44h!! I aborted two of them, hoping for a NOELIA_klebe or NATHAN, but nope, it was one of these beasts or nothing..

What times do you guys see for these WUs?

A little under 19 hours on my 650 Ti (980 MHz), using Win7 64-bit. Two have completed successfully, and one is in progress. (The only crash was when I changed a cc_config file while a work unit was in progress; I think it would have completed normally otherwise.)

Did you leave a CPU core free to support the GPU?

TJ
Send message
Joined: 26 Jun 09
Posts: 815
Credit: 1,469,215,105
RAC: 0
Level
Met
Scientific publications
watwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwat
Message 31400 - Posted: 11 Jul 2013 | 9:44:11 UTC - in response to Message 31398.

I have been running GPUGrid for weeks without trouble. Suddenly, a few days ago, all the work units I try to run stopped utilizing the GPU as before. Initially they are estimated to run for about 13 hours, but after the 13 hours they have only reached about 15% and the time to completion starts rising. At this point I abort them, if not before, as they don't seem to utilize more than a small part of the GPU.
I'm running Boinc 7.0.65 on Linux, nvidia-drivers 319.23.
So far I've wasted about 2-3 days of electricity trying to crunch. Am I the only one experiencing this? Is it due to the new WUs mentioned?

Well, it's hard to say, I guess. They are all different Noelia WUs. You can also read in this thread that the driver you are using could be the issue. The klebe run seems to be in line with the long runs, but on my 550Ti it has already been working 36.6 hours for 79%. That 550Ti uses driver 320.18, but it could be an issue with this particular WU.
I regularly see that an SR takes about 3 times as long on the 550Ti as on the 660, and an LR twice as long.
Now the Noelia klebe is going to take 4 times as long. I'd like to see some klebe runs on my 660 first before deciding what to do with the driver.
____________
Greetings from TJ

Bedrich Hajek
Send message
Joined: 28 Mar 09
Posts: 340
Credit: 3,825,500,609
RAC: 966,328
Level
Arg
Scientific publications
watwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwat
Message 31401 - Posted: 11 Jul 2013 | 10:07:05 UTC

The NOELIA_xMG_RUN WUs have a very large output file, approximately 147 MB. Are you using the previous application version again? The units are otherwise running fine, taking between 10.5 and 11.5 hours to complete on my Windows 7 computer, so please don't cancel them like you did last time.

Jim1348
Send message
Joined: 28 Jul 12
Posts: 463
Credit: 1,131,150,255
RAC: 42,983
Level
Met
Scientific publications
watwatwatwatwatwatwatwatwat
Message 31402 - Posted: 11 Jul 2013 | 10:35:32 UTC - in response to Message 31401.

The units are otherwise running fine, taking between 10.5 and 11.5 hours to complete on my Windows 7 computer, so please don't cancel them like you did last time.

I agree. They are running fine on both of my 660s and my 650 Ti.
It is too early to see what the error rate is; it may be a little higher than the Nathans', but not by much thus far.

Kenneth Larsen
Send message
Joined: 11 Feb 09
Posts: 6
Credit: 162,114,203
RAC: 19
Level
Ile
Scientific publications
watwatwatwatwatwatwatwatwatwatwatwatwat
Message 31403 - Posted: 11 Jul 2013 | 10:50:29 UTC

How long should I let them run, then? If they were using the GPU at 100% I'd have let them run, but they aren't. That's why I cancelled them, fearing they would take days to run or eventually error out.
Unfortunately, the only way I can see how much they use the GPU is by watching the temperature; usually it stays around 60-62 degrees, while with the new WUs it's around 50. Idle is 35-40. Oh, and ambient temperature has been higher than normal the last few days (35-38 degrees), so that's not the issue.
I'm not going to downgrade the nvidia drivers, as I'm using the computers for many other things besides crunching, like games.

Anyway, I suppose I'll give it another go then and let it run to 100%, then report back here.
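For watching utilization directly rather than inferring it from temperature, nvidia-smi (typically installed alongside the NVIDIA driver) can help. Below is a minimal, illustrative Python wrapper; it assumes nvidia-smi is on the PATH, and note that on some GeForce cards or driver versions individual fields may be reported as N/A, and the exact labels in the -q output can vary, so the string matching is only a rough filter.

#!/usr/bin/env python
# Poll GPU utilization and temperature via nvidia-smi every INTERVAL seconds.
# Assumption: nvidia-smi is on the PATH; some fields may read "N/A" on
# certain GeForce cards / driver versions, and label text may differ.
import subprocess
import time

INTERVAL = 60  # seconds between polls

while True:
    out = subprocess.check_output(
        ["nvidia-smi", "-q", "-d", "UTILIZATION,TEMPERATURE"])
    for line in out.decode("utf-8", "replace").splitlines():
        line = line.strip()
        # Typical labels: "Gpu : 93 %", "Memory : 38 %", "GPU Current Temp : 66 C"
        if line.startswith("Gpu") or line.startswith("Memory") or "Current Temp" in line:
            print(line)
    time.sleep(INTERVAL)

Run alongside BOINC, a log like this makes it easier to spot the kind of under-utilization described above without resorting to reboots or driver changes.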

John C MacAlister
Send message
Joined: 17 Feb 13
Posts: 178
Credit: 132,357,411
RAC: 1,373
Level
Cys
Scientific publications
watwatwatwatwatwatwatwat
Message 31404 - Posted: 11 Jul 2013 | 10:55:23 UTC - in response to Message 31388.
Last modified: 11 Jul 2013 | 10:58:08 UTC

Hi, Jim:

Both my NVIDIA GTX 650 Ti GPUs show driver 320.18 dated 12 May 2013.

John

John,

What driver do you (and your system builder) like these days?
I haven't noticed any problems with recent Nvidia drivers, but I can't say that they don't occur either.

Jim1348
Send message
Joined: 28 Jul 12
Posts: 463
Credit: 1,131,150,255
RAC: 42,983
Level
Met
Scientific publications
watwatwatwatwatwatwatwatwat
Message 31405 - Posted: 11 Jul 2013 | 11:11:27 UTC - in response to Message 31404.

Hi, Jim:

Both my NVIDIA GTX 650 Ti GPUs show driver 320.18 dated 12 May 2013.

John

John,

What driver do you (and your system builder) like these days?
I haven't noticed any problems with recent Nvidia drivers, but I can't say that they don't occur either.

Thanks. I was using 320.49 with no problems on my 650 Ti, but thought I would go back to 310.90 as a test.
But in general (unlike AMD drivers), the Nvidia ones all work the same for me.

TJ
Send message
Joined: 26 Jun 09
Posts: 815
Credit: 1,469,215,105
RAC: 0
Level
Met
Scientific publications
watwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwat
Message 31406 - Posted: 11 Jul 2013 | 12:36:49 UTC - in response to Message 31403.

I don't see a difference in temperature: still around 66°C, just like Nathan's LRs.
But I have never seen any WU use 100% GPU load; the Noelias are now around 93%.
All still running fine, but a little slower than Nathan's. Keep in mind that Noelia's WUs use different functionality, so they can't be compared one to one, I guess.
____________
Greetings from TJ

Profile Beyond
Avatar
Send message
Joined: 23 Nov 08
Posts: 1073
Credit: 4,498,672,554
RAC: 418,871
Level
Arg
Scientific publications
watwatwatwatwatwatwatwatwatwatwatwatwatwatwatwat
Message 31407 - Posted: 11 Jul 2013 | 13:50:12 UTC - in response to Message 31392.

Argh! These NOELIA_xMG_RUN WUs are taking too long on my 650Ti, around 44h!! I aborted two of them, hoping for a NOELIA_klebe or NATHAN, but nope, it was one of these beasts or nothing..

What times do you guys see for these WUs?

Same here. I think Noelia has thrown us another curve without notice. Just as the NOELIA_klebe will not run on cards with less than 1GB, these NOELIA_xMG_RUN WUs look as if they run OK on 2GB cards but extremely slow on 1GB. Some of the earlier NATHAN WUs had a similar behavior on < 1GB GPUs and ran at 1/2 speed. He fixed them and the later NATHANs then ran fine on sub 1GB cards. I think maybe NOELIA has just knocked all 1GB GPUs off GPUGrid. Very sad indeed.

Jim1348
Send message
Joined: 28 Jul 12
Posts: 463
Credit: 1,131,150,255
RAC: 42,983
Level
Met
Scientific publications
watwatwatwatwatwatwatwatwat
Message 31408 - Posted: 11 Jul 2013 | 14:26:04 UTC - in response to Message 31407.

Same here. I think Noelia has thrown us another curve without notice. Just as the NOELIA_klebe will not run on cards with less than 1GB, these NOELIA_xMG_RUN WUs look as if they run OK on 2GB cards but extremely slow on 1GB. Some of the earlier NATHAN WUs had a similar behavior on < 1GB GPUs and ran at 1/2 speed. He fixed them and the later NATHANs then ran fine on sub 1GB cards. I think maybe NOELIA has just knocked all 1GB GPUs off GPUGrid. Very sad indeed.

My experiences above were only on the NOELIA_klebe, so I don't know what problems will occur on the NOELIA_xMG_RUN. But my 660s have 2GB, and my 650 Ti has 1GB, so I guess I will find out.

Maybe they should have an opt-in for these larger sizes? I am sure there are plenty of cards around that can do them, it is just a question of getting the right work unit on the right card.

Profile Beyond
Avatar
Send message
Joined: 23 Nov 08
Posts: 1073
Credit: 4,498,672,554
RAC: 418,871
Level
Arg
Scientific publications
watwatwatwatwatwatwatwatwatwatwatwatwatwatwatwat
Message 31409 - Posted: 11 Jul 2013 | 14:32:30 UTC - in response to Message 31408.

Same here. I think Noelia has thrown us another curve without notice. Just as the NOELIA_klebe will not run on cards with less than 1GB, these NOELIA_xMG_RUN WUs look as if they run OK on 2GB cards but extremely slow on 1GB. Some of the earlier NATHAN WUs had a similar behavior on < 1GB GPUs and ran at 1/2 speed. He fixed them and the later NATHANs then ran fine on sub 1GB cards. I think maybe NOELIA has just knocked all 1GB GPUs off GPUGrid. Very sad indeed.

My experiences above were only on the NOELIA_klebe, so I don't know what problems will occur on the NOELIA_xMG_RUN. But my 660s have 2GB, and my 650 Ti has 1GB, so I guess I will find out.

Maybe they should have an opt-in for these larger sizes? I am sure there are plenty of cards around that can do them, it is just a question of getting the right work unit on the right card.

We've asked and asked, and it should be simple to do. Maybe they don't know how, or don't care? Who knows.

Profile dskagcommunity
Avatar
Send message
Joined: 28 Apr 11
Posts: 456
Credit: 810,073,458
RAC: 0
Level
Glu
Scientific publications
watwatwatwatwatwatwatwatwatwatwatwatwat
Message 31410 - Posted: 11 Jul 2013 | 15:58:26 UTC
Last modified: 11 Jul 2013 | 16:00:20 UTC

These MG units are the first ones that run a bit differently from all the others before ^^ The single 560Ti (448 cores) in the Pentium 4 system runs MG units a bit faster than one of the two 570 cards in a Core2Duo system. I suspect it is the card in the x4 slot, but I had never before seen the 570 with a higher runtime than the 560. It seems to be the first time that a bit of PCIe bandwidth is needed.
____________
DSKAG Austria Research Team: http://www.research.dskag.at



Crunching for my deceased Dog who had "good" Braincancer..

Profile Beyond
Avatar
Send message
Joined: 23 Nov 08
Posts: 1073
Credit: 4,498,672,554
RAC: 418,871
Level
Arg
Scientific publications
watwatwatwatwatwatwatwatwatwatwatwatwatwatwatwat
Message 31411 - Posted: 11 Jul 2013 | 16:36:05 UTC - in response to Message 31410.

These MG units are the first ones that run a bit differently from all the others before ^^ The single 560Ti (448 cores) in the Pentium 4 system runs MG units a bit faster than one of the two 570 cards in a Core2Duo system. I suspect it is the card in the x4 slot, but I had never before seen the 570 with a higher runtime than the 560. It seems to be the first time that a bit of PCIe bandwidth is needed.

It also looks like they run OK in 1279MB, so it seems they need more than 1024MB but not more than 1279MB to run at an acceptable speed. Unfortunately that's too bad for most of us.

werdwerdus
Send message
Joined: 15 Apr 10
Posts: 123
Credit: 1,004,473,861
RAC: 0
Level
Met
Scientific publications
watwatwatwatwatwatwatwatwatwatwatwatwatwatwat
Message 31412 - Posted: 11 Jul 2013 | 16:46:11 UTC

I have a 7MG work unit running on 650 Ti on windows 7, GPU usage is 98% and it is 50% after 20 hours. Similar 7MG units are running on my 470 with about 85% done after 14 hours, and 660 Ti about 55% done after 5.5 hours, so I guess it is the GPU memory that is the problem. 650 Ti has 1GB, 470 has 1280MB, and 660 Ti has 2GB.
____________
XtremeSystems.org - #1 Team in GPUGrid

flashawk
Send message
Joined: 18 Jun 12
Posts: 241
Credit: 1,694,533,797
RAC: 643,557
Level
His
Scientific publications
watwatwatwatwatwatwatwat
Message 31413 - Posted: 11 Jul 2013 | 17:05:42 UTC

It just took 8 hours 27 minutes for my 670 to finish a 7MG_RUN with a 151MB upload. I noticed that the NATHANs used over 95% of the CPU while the NOELIAs use less than 50% of the CPU. My 680s and 770 take about 7 hours 40 minutes. I have no choice but to use the 320.xx series drivers, otherwise the 770 won't work; 320.49 seems to be fine but the other 320s are buggy (it's all over the internet).

It looks like we're going to have to buck up, bite the bullet and get through these work units. It's been running in the mid 90s F here where I live in the Sierras, and I've had to shut down half my rigs for 6-8 hours every day. The San Joaquin Valley has been hitting 105-110°F, so I guess I'm lucky I live at 5000 feet; these are normal temperatures here in the summer and it's tolerable with the humidity at 25%. I just wish it would cool off so I can keep all my rigs running 24/7.

Kenneth Larsen
Send message
Joined: 11 Feb 09
Posts: 6
Credit: 162,114,203
RAC: 19
Level
Ile
Scientific publications
watwatwatwatwatwatwatwatwatwatwatwatwat
Message 31414 - Posted: 11 Jul 2013 | 18:00:49 UTC

Just for your information, my graphics card is a GTX 660 with 2GB of memory, and I'm still unable to run these WUs well. Maybe it's different on Linux than on Windows?

ExtraTerrestrial Apes
Volunteer moderator
Volunteer tester
Avatar
Send message
Joined: 17 Aug 08
Posts: 2671
Credit: 753,908,224
RAC: 504,143
Level
Glu
Scientific publications
watwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwat
Message 31415 - Posted: 11 Jul 2013 | 19:01:30 UTC - in response to Message 31414.

Kenneth.. sorry there was no clear response before: nVidia driver 319 has been shown by at least 2 others to cause the issue you're describing. Downgrading to 310 has fixed it in both cases, so give it a try.

MrS
____________
Scanning for our furry friends since Jan 2002

Profile Retvari Zoltan
Avatar
Send message
Joined: 20 Jan 09
Posts: 1844
Credit: 10,686,789,844
RAC: 10,151,546
Level
Trp
Scientific publications
watwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwat
Message 31418 - Posted: 11 Jul 2013 | 20:02:13 UTC
Last modified: 11 Jul 2013 | 20:03:01 UTC

A Noelia 7MG is using 1329MB RAM on my GTX 480 (Win7 x64); another one (on a GTX 670, WinXP x64) started at 1188MB memory usage, and it's slowly rising. So these workunits won't fit in 1GB RAM. I had a stuck workunit that made no progress after 6 hours, so I aborted it; however, its page shows 0 sec runtime. The subsequent workunit also got stuck at 0% progress, but a system restart fixed that.

flashawk
Send message
Joined: 18 Jun 12
Posts: 241
Credit: 1,694,533,797
RAC: 643,557
Level
His
Scientific publications
watwatwatwatwatwatwatwat
Message 31420 - Posted: 11 Jul 2013 | 20:50:59 UTC - in response to Message 31414.

Just for your information, my graphics card is a GTX 660 with 2GB of memory, and I'm still unable to run these WUs well. Maybe it's different on Linux than on Windows?


One of the problems with Linux is the lack of good monitoring and GPU clock-adjusting software. In Windows, when one WU finished and another started, especially when going from a NATHAN to a NOELIA, my GPU clock would change. Sometimes it would boost too high and cause errors; I am able to create profiles in PrecisionX and reset everything with one click.

I know there aren't very good apps for Linux (at least that I'm aware of) for doing this, and it would certainly help. I wish someone would write a good one soon, because I'll be switching to Linux when NVidia stops making drivers for XP x64 and Microsoft stops supporting it next year.

Profile Beyond
Avatar
Send message
Joined: 23 Nov 08
Posts: 1073
Credit: 4,498,672,554
RAC: 418,871
Level
Arg
Scientific publications
watwatwatwatwatwatwatwatwatwatwatwatwatwatwatwat
Message 31422 - Posted: 11 Jul 2013 | 21:29:10 UTC - in response to Message 31407.

I think maybe NOELIA has just knocked all 1GB GPUs off GPUGrid. Very sad indeed.

Since all 9 of my NVidia GPUs are 1GB or less, and I can't get anything but these #*&$% NOELIA_1MG WUs, I'm off the project till something changes here. Think I'll have a lot of company... Sad :-(

Profile skgiven
Volunteer moderator
Project tester
Volunteer tester
Avatar
Send message
Joined: 23 Apr 09
Posts: 3968
Credit: 1,837,071,099
RAC: 365,113
Level
His
Scientific publications
watwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwat
Message 31423 - Posted: 11 Jul 2013 | 23:07:31 UTC - in response to Message 31422.
Last modified: 11 Jul 2013 | 23:29:59 UTC

This batch could well have a GDDR capacity issue for anything other than cards with 2GB GDDR (which suggests CUDA routine selection isn't working/doesn't exist in this run), and possibly a separate issue with Linux... I will plug in a rig tomorrow with a GTX650Ti and 304.88 to confirm this, but it's obviously going to take a while!

I would be reluctant to entirely blame Linux drivers however - my GTX470 WU took 25h 20min (too long), and about 10min to upload, and I'm using the 304.88 driver (which was fine up to now, and on two different systems). The numbers don't look right for a ~2.4 times downclock (my GTX470 will only downclock to 405 or 50MHz), and NVidia X Server tells me that my 470 is at its FOC settings of 656MHz and 60°C (dual fan @72% and open case).

On Windows 7 and GPU's with 2GB GDDR I've had no issues (314.22 drivers).

The memory controller load is higher on these WU's (38% for a GTX660Ti and 31% on a GTX660, W7) and the app is 6.18 (an older one), so this looks like a basal (marker) run. I don't expect we will be seeing months of work on the 6.18 app. More likely a few days.
____________
FAQ's

HOW TO:
- Opt out of Beta Tests
- Ask for Help

Jim1348
Send message
Joined: 28 Jul 12
Posts: 463
Credit: 1,131,150,255
RAC: 42,983
Level
Met
Scientific publications
watwatwatwatwatwatwatwatwat
Message 31424 - Posted: 11 Jul 2013 | 23:45:10 UTC - in response to Message 31422.

Since all 9 of my NVidia GPUs are 1GB or less, and I can't get anything but these #*&$% NOELIA_1MG WUs, I'm off the project till something changes here. Think I'll have a lot of company,,, Sad :-(

A NOELIA_1MG just crashed on my GTX 660 with 2 GB memory after 7 hours run time, so there is no guarantee that even more memory will fix it (Win7 64-bit, 314.22 drivers, supported by a virtual core of an i7-3770).

The other NOELIAs that I have received have been fine, though that is not the entire set. There are some good ones and some not-so-good ones.

Profile dskagcommunity
Avatar
Send message
Joined: 28 Apr 11
Posts: 456
Credit: 810,073,458
RAC: 0
Level
Glu
Scientific publications
watwatwatwatwatwatwatwatwatwatwatwatwat
Message 31425 - Posted: 12 Jul 2013 | 6:33:21 UTC
Last modified: 12 Jul 2013 | 6:37:00 UTC

This 1MG crashed too on several machines, including my super-stable one: http://www.gpugrid.net/workunit.php?wuid=4583900
____________
DSKAG Austria Research Team: http://www.research.dskag.at



Crunching for my deceased Dog who had "good" Braincancer..

Vagelis Giannadakis
Send message
Joined: 5 May 13
Posts: 187
Credit: 349,254,454
RAC: 0
Level
Asp
Scientific publications
watwatwatwatwatwatwat
Message 31426 - Posted: 12 Jul 2013 | 8:45:32 UTC - in response to Message 31422.

I think maybe NOELIA has just knocked all 1GB GPUs off GPUGrid. Very sad indeed.

Since all 9 of my NVidia GPUs are 1GB or less, and I can't get anything but these #*&$% NOELIA_1MG WUs, I'm off the project till something changes here. Think I'll have a lot of company,,, Sad :-(

I agree 100% and have already moved my 650Ti to Einstein. Alas, Einstein's credit is SO lame!
____________

tbret
Send message
Joined: 2 Jan 13
Posts: 5
Credit: 233,329,525
RAC: 0
Level
Leu
Scientific publications
watwatwatwat
Message 31428 - Posted: 12 Jul 2013 | 11:41:28 UTC

This is just weird.

NVIDIA 320.18

WinXP computer with two reference 560tis is showing 14.5 hours elapsed, 3.5 hours to go, but 48% done. Ok..., that's not even close. GPU usage, 94-99%.

Another computer, another pair of 560Ti cards, another 320.18 driver -

Win7 Pro 28.25 hours elapsed, 2 hours to go, 92% done, GPU usage 97-99%.

Really? 30 hours on a 560Ti? 30 hours?

The 560s (no Ti) are warm and 96%, looks like they will take 18 hours. Those are 2GB cards.

Looks like 24 hours on a different SOC 560Ti but only 16 hours on a 560Ti-448. So, yeah, it looks like 1GB is just a little too little RAM, but 1.2GB is better. It's not great because a 560Ti-448 should fly compared to a 560.

The GTX470s (1.2GB) are doing them in about 15.3 hours. That makes them faster than the 560Ti-448. Yeah, that was a thing that made me go "Hmmmm."

The 660Tis and 670s are doing much, much better, of course; about 11.5 hours.

I've set NNT on my seven 560Tis. I've had multiple driver crashes and compute errors after 7 or 8 hours of crunching. I'll let what I've got either crunch or crash, but I don't want any more of these work units on a 560Ti and I don't want to have to change my drivers every time the work changes, so while I believe a downgrade might work, I'm unwilling. (call me lazy)

Ordinarily I'd say, "I don't care about the credits" but the fact is I'm in a little race with a friend so this time I do care. I don't care enough to get mad or upset, but I care a little.

Oh, all the CPU cores are idle other than feeding the GPUs in every case.

I'm just reporting-in.

IFRS
Send message
Joined: 12 Dec 11
Posts: 89
Credit: 2,656,811,083
RAC: 0
Level
Phe
Scientific publications
watwatwatwatwatwatwatwatwatwatwat
Message 31430 - Posted: 12 Jul 2013 | 13:30:57 UTC
Last modified: 12 Jul 2013 | 13:31:36 UTC

I didn't want to complain earlier, but with more than ten WUs failed since yesterday (no different from the other days), I feel compelled to do it. The Noelias, new and not so new ones, are failing on all my rigs and cards (690s and 770s).
Can we please change them?

TJ
Send message
Joined: 26 Jun 09
Posts: 815
Credit: 1,469,215,105
RAC: 0
Level
Met
Scientific publications
watwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwat
Message 31432 - Posted: 12 Jul 2013 | 14:33:00 UTC

On my GTX 550Ti with 1GB RAM, a Noelia (klebe run) took 46h22m to finish without error, on a WinVista x86 rig with 2 CPU cores doing Einstein@home and nVidia driver 320.49.

On the 660s the Noelias have "normal" run-times: one with Vista x64 and driver 320.49 as well, the other on Win7 x64 and driver 320.18.

With 1 error in 12 WU's I don't see the need to update the drivers just yet.

Happy crunching

____________
Greetings from TJ

tbret
Send message
Joined: 2 Jan 13
Posts: 5
Credit: 233,329,525
RAC: 0
Level
Leu
Scientific publications
watwatwatwat
Message 31434 - Posted: 12 Jul 2013 | 14:55:53 UTC

This isn't funny --- ok, so it is funny, but only because nothing burned-up:

I've now caught two computers, both Windows, both running Precision X, one running 3x 560 and one 3x 560Ti, resetting my manually set 100% fan speeds back to "auto" but only on ONE card. Really weird. Really strange. In both cases it was the middle (read: hottest) card (Device 1).

That's never happened before, but I'm guessing it is a driver-related failure caused by these new work units.

Oh, and the "time remaining" is increasing, so the 60% completed I reported earlier is probably better than the estimated time remaining.

AND as if that weren't enough, it's taking close to two hours to upload the 147MB results at around 27.5 KB/s.

I'd say someone needs to take this work either:
A) back to the drawing board
B) out to the woodshed for a serious talking-to

IFRS
Send message
Joined: 12 Dec 11
Posts: 89
Credit: 2,656,811,083
RAC: 0
Level
Phe
Scientific publications
watwatwatwatwatwatwatwatwatwatwat
Message 31435 - Posted: 12 Jul 2013 | 14:59:41 UTC - in response to Message 31434.
Last modified: 12 Jul 2013 | 15:01:01 UTC

This isn't funny --- ok, so it is funny, but only because nothing burned-up:

I've now caught two computers, both Windows, both running Precision X, one running 3x 560 and one 3x 560Ti, resetting my manually set 100% fan speeds back to "auto" but only on ONE card. Really weird. Really strange. In both cases it was the middle (read: hottest) card (Device 1).


Yeah. That happens when a WU fails on my machines too. But since it didn't automatically start a new WU, nothing burns. It will, however, lose all your Precision X presets and waste the processing done so far. Upsetting mode on.

Edit: I will say it again: can we (OK, you, the project guys) change the WUs? They aren't good and are upsetting users.

klepel
Send message
Joined: 23 Dec 09
Posts: 138
Credit: 1,837,008,477
RAC: 1,405,486
Level
His
Scientific publications
watwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwat
Message 31436 - Posted: 12 Jul 2013 | 15:02:44 UTC

I can confirm the following:
GTX 570, Nvidia 311.06: 56x2-NOELIA_1MG_RUN1-0-2-RND1781_0 success with 71,288.79 s runtime.
GTX 670, Nvidia 311.06: 97x3-NOELIA_1MG_RUN-0-2-RND9119_0 success with 38,157.74 s runtime.
NOELIA_klebe tasks run on all three computers (GTX 650 Ti (2GB), GTX 570 and GTX 670) without major hiccups, except that two failed early on the GTX 570; since then, no problem.

However, I noticed that these NOELIA 1MG and klebe tasks need around 900 to 1350 MB of GPU memory.

Profile Beyond
Avatar
Send message
Joined: 23 Nov 08
Posts: 1073
Credit: 4,498,672,554
RAC: 418,871
Level
Arg
Scientific publications
watwatwatwatwatwatwatwatwatwatwatwatwatwatwatwat
Message 31446 - Posted: 12 Jul 2013 | 22:12:55 UTC

HELP! Nathan, where are you?

nanoprobe
Send message
Joined: 26 Feb 12
Posts: 181
Credit: 221,824,715
RAC: 0
Level
Leu
Scientific publications
watwatwatwatwatwatwatwatwat
Message 31447 - Posted: 12 Jul 2013 | 22:42:04 UTC

These newer NOELIA klebe tasks seem to be taking longer and longer to finish. The old NOELIAs were 9-10 hours. Then it went to 12-13 hours. This latest one is going to be in the 15-16 hour range, using 750MB of memory on an MSI 660Ti PE.

Profile skgiven
Volunteer moderator
Project tester
Volunteer tester
Avatar
Send message
Joined: 23 Apr 09
Posts: 3968
Credit: 1,837,071,099
RAC: 365,113
Level
His
Scientific publications
watwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwat
Message 31451 - Posted: 13 Jul 2013 | 0:20:50 UTC - in response to Message 31446.

HELP! Nathan, where are you?

He's hiding in the Short queue :)

These newer NOELIA klebe tasks seem to be taking longer and longer to finish. The old NOELIAs were 9-10 hours. Then it went to 12-13 hours. This latest one is going to be in the 15-16 hour range, using 750MB of memory on an MSI 660Ti PE.

Don't take it personally, the present Looooong NOELIA WU's don't like anyone.
____________
FAQ's

HOW TO:
- Opt out of Beta Tests
- Ask for Help

flashawk
Send message
Joined: 18 Jun 12
Posts: 241
Credit: 1,694,533,797
RAC: 643,557
Level
His
Scientific publications
watwatwatwatwatwatwatwat
Message 31452 - Posted: 13 Jul 2013 | 0:29:24 UTC - in response to Message 31446.

HELP! Nathan, where are you?


He's on vacation. I see that downclocking my cards a little has helped reduce my error rate. There's only a finite amount of WUs here, so we've got to bite the bullet and chug through the weekend. I think Nathan will be back on Monday; he should be able to sort things out.

TJ
Send message
Joined: 26 Jun 09
Posts: 815
Credit: 1,469,215,105
RAC: 0
Level
Met
Scientific publications
watwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwat
Message 31455 - Posted: 13 Jul 2013 | 7:54:04 UTC - in response to Message 31452.
Last modified: 13 Jul 2013 | 7:55:34 UTC

HELP! Nathan, where are you?


He's on vacation. I see that downclocking my cards a little has helped reduce my error rate. There's only a finite amount of WUs here, so we've got to bite the bullet and chug through the weekend. I think Nathan will be back on Monday; he should be able to sort things out.

If I have read previous posts from Nathan correctly, every scientist runs her or his own WUs with different functionality, so Nathan would not interfere (at least not much).
I have my clocks still high and the Noelia's that do not e