Message boards : Number crunching : BOINC Scheduler broken badly

Roland Hughes
Joined: 21 Sep 14
Posts: 3
Credit: 40,323,086
RAC: 0
Message 38874 - Posted: 8 Nov 2014 | 15:13:00 UTC

All, I have 64-bit Mint 17 KDE with all updates applied, running on a 6-core AMD with 4 GB of RAM and an SSD. I installed a video card with 384 CUDA cores and the Nvidia driver for it. I subscribe to a few projects and get lots of work for them. The BOINC scheduler seems to be busted, though. Each night after work I reboot this machine to the 64-bit partition running BOINC. It runs either all night or all weekend, churning through work units and sending responses in the wee hours of the morning.

For roughly a week there has been a GPUGRID work unit "Ready to start" with an estimated 41+ hour run time. Today is the 8th of November. The due date is the 9th of November. BOINC is busily processing stuff for SETI which has due dates in the third week of December. The GPUGRID work unit has not run or even attempted to run. It really doesn't matter now, since there are not 41 clock hours left. I doubt it could use all of the CUDA cores to trim a day off its run time.
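
As a rough check of that arithmetic, here is a minimal Python sketch. It assumes the task would start at the time of this post and run without interruption, and it takes the most generous reading of "due on the 9th" (the very end of that day); the real deadline time is not given, and the function names are only illustrative.

```python
from datetime import datetime, timedelta


def time_left(now: datetime, deadline: datetime) -> timedelta:
    """Wall-clock time remaining before the deadline (negative if already past)."""
    return deadline - now


def can_finish(now: datetime, deadline: datetime, estimate: timedelta) -> bool:
    """True if a task started now and run non-stop would end by the deadline."""
    return now + estimate <= deadline


# Post time and 41-hour estimate come from the post itself.
# ASSUMPTION: the deadline is the end of 9 Nov 2014 (midnight starting 10 Nov).
posted = datetime(2014, 11, 8, 15, 13, 0)
deadline = datetime(2014, 11, 10, 0, 0, 0)
estimate = timedelta(hours=41)

remaining = time_left(posted, deadline)
print("hours left:", round(remaining.total_seconds() / 3600, 1))
print("can finish:", can_finish(posted, deadline, estimate))
```

Even with that generous deadline, fewer than 41 hours remain, so the work unit could not complete in time on the estimate shown.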

Just wanted to pass this along.

Richard Haselgrove
Joined: 11 Jul 09
Posts: 1618
Credit: 8,606,094,351
RAC: 16,317,055
Message 38875 - Posted: 8 Nov 2014 | 16:12:55 UTC - in response to Message 38874.

I saw your similar post on the BOINC message board, and pointed out there that you have much bigger problems than an errant scheduler: every single one of your recent attempted tasks, on both computers, has failed with an error.

Somebody with experience of Linux Mint NVidia drivers may be able to take a look and help you.

Roland Hughes
Message 38879 - Posted: 9 Nov 2014 | 12:13:18 UTC - in response to Message 38875.

Thanks.

MJH
Project administrator
Project developer
Project scientist
Joined: 12 Nov 07
Posts: 696
Credit: 27,266,655
RAC: 0
Message 38880 - Posted: 9 Nov 2014 | 12:52:40 UTC - in response to Message 38874.

Roland,

If your client has work but it's not running, that must be a client problem, not a server one.
From looking at your computer's records, it's clear to me that GPUGRID WUs aren't running correctly because you need to update your NVIDIA drivers.

Matt

Roland Hughes
Message 38881 - Posted: 9 Nov 2014 | 13:37:06 UTC - in response to Message 38880.

Matt,

I installed the recommended 331 version on both machines via the Drivers option in settings. SETI, Rosetta and Lattice are all requesting Nvidia GPU work units and executing flawlessly.

Just before writing this I reverted my 6-core to the default non-proprietary driver and rebooted. Verified no GPU found by BOINC. Then I selected the 331 "updates" driver, installed and rebooted. GPUGrid pulled down 2 work units. One started and failed almost instantly.

For now I have simply removed GPUGrid from my 6-core, since this is obviously an issue with the project. Either the project is expecting features/functions well beyond the 331 drivers available for Mint 17, or there is something inherently wrong with the project. Three other projects seem to be happily using the configuration.

One oddity of note. The quad-core appears to have a GPUGrid WU in progress with just over an hour left to go. I will wait and see how that completes. If it succeeds I will leave GPUGrid on that machine. That machine has the same video card and "recommended" 331 driver that the 6-core had and given the results so far I'm not expecting a happy ending. It is a shame to waste both my satellite bandwidth and processor time on work units which will fail when so many other work units could use it to succeed.

Richard Haselgrove
Message 38882 - Posted: 9 Nov 2014 | 15:03:19 UTC - in response to Message 38881.

The tasks in progress on your 4-core computer are all multi-threaded CPU tasks, and nothing to do with the CUDA questions at all.

MJH
Message 38885 - Posted: 9 Nov 2014 | 19:45:22 UTC - in response to Message 38881.
Last modified: 9 Nov 2014 | 20:03:43 UTC

Roland,


For Linux machines you need either:

* driver 343 or later

or

* a development BOINC client that correctly reports the driver version to our server. You can find a copy of a patched client at:

http://www.gpugrid.net/forum_thread.php?id=3736&nowrap=true#36577
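
One way to check the first condition is sketched below in Python. The 343 threshold is the one given above; everything else is an assumption for illustration: the helper names are hypothetical, "331.38" is just an example of the usual `major.minor` version format, and the sketch assumes `nvidia-smi` is on the PATH and that the installed version supports the `--query-gpu=driver_version` query.

```python
import shutil
import subprocess
from typing import Optional

MIN_DRIVER_MAJOR = 343  # threshold quoted in the list above


def driver_major(version: str) -> int:
    """Return the major part of a driver version string such as '331.38'."""
    return int(version.strip().split(".")[0])


def meets_minimum(version: str, minimum: int = MIN_DRIVER_MAJOR) -> bool:
    """True if the driver's major version is at least the required minimum."""
    return driver_major(version) >= minimum


def installed_driver_version() -> Optional[str]:
    """Ask nvidia-smi for the driver version; None if it cannot be determined."""
    if shutil.which("nvidia-smi") is None:
        return None
    result = subprocess.run(
        ["nvidia-smi", "--query-gpu=driver_version", "--format=csv,noheader"],
        capture_output=True,
        text=True,
        check=False,
    )
    output = result.stdout.strip()
    if result.returncode != 0 or not output:
        return None
    # With several GPUs there is one line per card; the driver is the same for all.
    return output.splitlines()[0].strip()
```

Calling `installed_driver_version()` and passing a non-`None` result to `meets_minimum()` tells you whether the driver alone is new enough; a 331-series driver would fail the check, which leaves the patched client as the other option.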


Matt

mikey
Joined: 2 Jan 09
Posts: 297
Credit: 5,835,411,115
RAC: 31,130,071
Message 38893 - Posted: 11 Nov 2014 | 13:01:30 UTC - in response to Message 38881.
Last modified: 11 Nov 2014 | 13:15:10 UTC

Matt,

I installed the recommended 331 version on both machines via the Drivers option in settings. SETI, Rosetta and Lattice are all requesting Nvidia GPU work units and executing flawlessly.


Most new GPU driver updates bring better gaming functions, not crunching improvements. In fact, a lot of driver updates over the years have actually been bad for crunching, i.e. several in a row dropped our output by 10% or more. So unless you are a gamer too, it is often best to stick with a driver that works rather than always upgrading to the latest and greatest. Also, a project doesn't always have time to 'get right on it' when new drivers come out, so the work units may not take full advantage of the newest features in the latest drivers.
