Message boards : Graphics cards (GPUs) : Work units would not start without re-loading BOINC

Phoneman1
Joined: 25 Nov 08
Posts: 51
Credit: 980,186
RAC: 0
Message 4144 - Posted: 4 Dec 2008 | 19:59:45 UTC

Hi,

I joined about a week ago and have managed to complete 4 work units so far. I have an Intel Q6700 (Core 2 Quad), 2GB of RAM and 4.23GB of virtual memory, and an XFX NVIDIA GeForce 8600 GTS, running under Vista Home Premium 32 bit with Boinc 6.3.21, CUDA 2.0 and NVIDIA drivers 178.08.

This evening I saw a GPU task had just started but was in danger of missing its deadline. I aborted that task and the task next in the queue as that was on a similar deadline. Boinc immediately got 2 replacements. However none of the four wu's in the queue would start processing. I reported the aborted tasks and still no GPU task would start. I even reset the project. In the end I shut down the Boinc client and waited for the auto-restart. That did the trick and a GPU task is now running. But what a hassle!

GPU shares the computer with 4 Cosmology tasks usually and these were running this evening. The resource share is 100 for both projects (50:50).
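
For readers unfamiliar with resource shares: they are relative weights, so a project's fraction is its share divided by the sum of all shares on the host. A minimal sketch of that arithmetic in Python follows. The function name and the project labels are illustrative only, and the result says nothing about how the 6.3 client actually schedules GPU work.

```python
def share_fractions(shares):
    """Convert BOINC-style resource share weights into fractions of the total."""
    total = sum(shares.values())
    if total <= 0:
        raise ValueError("at least one project needs a positive resource share")
    return {name: weight / total for name, weight in shares.items()}


# The setup described above: both projects left at a share of 100.
fractions = share_fractions({"GPUGrid": 100, "Cosmology": 100})
for name, fraction in fractions.items():
    print(f"{name}: {fraction:.0%}")
```

With equal weights each project gets half, which is the 50:50 split mentioned in the post.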

The GPU tasks take just over a day to complete and I'm wondering if I've got the preferred software mix right. Would I be better advised to try CUDA 2.1 Beta and associated drivers or a different version of BOINC?


Thanks in advance,

Phoneman1

ExtraTerrestrial Apes
Volunteer moderator
Volunteer tester
Joined: 17 Aug 08
Posts: 2705
Credit: 1,311,122,549
RAC: 0
Message 4146 - Posted: 4 Dec 2008 | 20:59:38 UTC

In your case I'd probably upgrade to the 178.24 drivers because they just work. The 180 series occasionally had some strange installation issues, so I haven't tried them yet.

MrS
____________
Scanning for our furry friends since Jan 2002

Temujin
Joined: 12 Jul 07
Posts: 100
Credit: 21,848,502
RAC: 0
Message 4148 - Posted: 4 Dec 2008 | 21:21:37 UTC - in response to Message 4144.

This evening I saw a GPU task had just started but was in danger of missing its deadline. I aborted that task and the task next in the queue as that was on a similar deadline. Boinc immediately got 2 replacements. However none of the four wu's in the queue would start processing. I reported the aborted tasks and still no GPU task would start. I even reset the project. In the end I shut down the Boinc client and waited for the auto-restart. That did the trick and a GPU task is now running. But what a hassle!

I used to get that sort of behaviour but only on machines that were new to GPUGrid. My machines run GPUGrid and Seti.
My solution was to suspend Seti which would force a GPUGrid WU to start, then resume Seti.
I had to do this until GPUGrid had "bedded in"; it now takes care of itself and always runs a GPUGrid WU.




Stefan Ledwina
Joined: 16 Jul 07
Posts: 464
Credit: 135,911,881
RAC: 68
Message 4151 - Posted: 4 Dec 2008 | 22:10:03 UTC - in response to Message 4144.
Last modified: 4 Dec 2008 | 22:13:33 UTC

Wasn't that a problem with the 6.3 clients, that they didn't start GPU tasks immediately? I still had a 6.3.x client on a Linux machine that did that.

After an upgrade to 6.4.2 it works fine...
____________

pixelicious.at - my little photoblog

pelpolaris
Joined: 10 Nov 08
Posts: 8
Credit: 876,616,559
RAC: 0
Message 4155 - Posted: 5 Dec 2008 | 10:53:54 UTC - in response to Message 4144.

Almost the same here too, with a Q6700 + 4GB RAM, NVIDIA 8800GS and Vista HP-64, Boinc 6.3.21, CUDA 2.0, Nvidia driver 178.08.

I noticed that "High Priority" running disappeared a few days ago, and one consequence is that new WUs are no longer loaded from the server, due to the change of reference for the client. Previously, as explained elsewhere on this forum, the resource allocation had no impact on GPU activity. That no longer seems to be the case. For now, once the four or two WUs (see below) have been computed, I simply detach and reattach to get new ones.

The same issues also occur on a less powerful system based on a PD 915 with XP, an NV8800, and so on.

Krunchin-Keith [USA]
Joined: 17 May 07
Posts: 512
Credit: 111,288,061
RAC: 0
Message 4167 - Posted: 5 Dec 2008 | 19:41:37 UTC - in response to Message 4144.

I would try upgrading to the 6.4.X client. Many small tweaks have been made. Occasionally there would be a case like yours, and most of these have been fixed (in theory).

Another option you can try is to suspend the Cosmology project. That sometimes kicks the GPU project so that it starts; then resume Cosmology. What has happened is that, due to the cancelling of work, the internal numbers are now off and BOINC determines you do not need to run a GPU task, even though it is supposed to always run one. This is a minor flaw in some of the internal logic which is supposed to have been corrected in 6.4.2 and above. Sometimes waiting until the next task switch (within the default hour) also kicks things back to normal.
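
The suspend-then-resume workaround can also be scripted rather than clicked through in BOINC Manager. The following is only a sketch in Python, assuming the `boinccmd` command-line tool that ships with the BOINC client is on the PATH and is able to authenticate with the running client (for example by being run from the BOINC data directory). The function names, the 60-second wait and the project URL are placeholders chosen for illustration, not values taken from this thread.

```python
import subprocess
import time


def build_kick_commands(other_project_url):
    """Return the two boinccmd invocations: suspend the project, then resume it."""
    return [
        ["boinccmd", "--project", other_project_url, "suspend"],
        ["boinccmd", "--project", other_project_url, "resume"],
    ]


def kick_gpu_project(other_project_url, wait_seconds=60, run=subprocess.run):
    """Suspend the CPU project, wait a while, then resume it.

    The wait gives the client a chance to start a queued GPU task.
    The resume step runs even if the wait is interrupted.
    """
    suspend_cmd, resume_cmd = build_kick_commands(other_project_url)
    run(suspend_cmd, check=True)
    try:
        time.sleep(wait_seconds)
    finally:
        run(resume_cmd, check=True)


if __name__ == "__main__":
    # Placeholder URL: substitute the master URL of the CPU project on your host.
    kick_gpu_project("http://example.org/cpu_project/")
```

Whether this actually unsticks the queue depends on the client version; as noted above, the underlying logic is supposed to be fixed in 6.4.2 and later.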

ExtraTerrestrial Apes
Message 4169 - Posted: 5 Dec 2008 | 20:17:14 UTC - in response to Message 4155.

Almost the same here too...


Odd that you're also running 178.08. I can't remember, was it 178.06 or 178.08 which had issues anyway?

MrS

Phoneman1
Message 4179 - Posted: 6 Dec 2008 | 9:26:38 UTC

I'm going to upgrade the Nvidia drivers to 178.24 and Boinc to 6.4.2 today.

I'll let the dust settle before deciding on 6.4.3 - I see there are more issues with that elsewhere on this forum.

I forgot to mention in my earlier post that I did suspend the other project in an attempt to get a GPU task started. Either I didn't wait long enough (quite possible) or it didn't work for me!!!

Thanks to all who took the time and trouble to reply.

Incidentally, my other machine has such a feeble graphics card that it could not complete a GPU task in 4 days - a solution to that problem is on order!

Phoneman1

Neil A
Joined: 9 Oct 08
Posts: 50
Credit: 12,676,739
RAC: 0
Message 4182 - Posted: 6 Dec 2008 | 20:01:21 UTC - in response to Message 4179.

I access one of my home crunching computers a lot via MS Terminal Services. I have a feeling that if I access the computer remotely from my work laptop, then when it finishes the current work unit on my GPU (8800GT), the next one doesn't start correctly and errors out. If I connect to the computer directly at its console again before the work unit completes, then the next work unit starts. This is only a theory at this point.

I can only guess that since I have connected from a laptop that doesn't have a valid GPU to process this type of work, and if I remain connected to it, that when the current WU completes and the next WU starts, it sees the GPU card on my laptop instead and errs out?

This is only a guess on my part, and I'm still trying to prove this theory. Does anybody else access their crunching home computer a lot via terminal services and see this behavior?

Does anybody know what should happen to a crunching computer if you access it remotely from a non-compatible laptop via terminal services?
____________
Crunching for the benefit of humanity and in memory of my dad and other family members.

JStateson
Joined: 31 Oct 08
Posts: 186
Credit: 3,331,546,800
RAC: 0
Message 4258 - Posted: 11 Dec 2008 | 21:27:41 UTC - in response to Message 4182.

I constantly use remote desktop to access a Vista Home Premium 64 machine and have not seen this problem. It did take me a while to get RD installed and working on Vista Home Premium, but that is another story. When your 8600 errors out, does it exhaust its GPUGrid WU queue? When my 9800GTX errors, all subsequent WUs error, my queue is emptied, and all the downloads get errors until it reaches the user daily quota limit. Periodically, Vista reboots after Bill Gates gets his updates onto my system. I am wondering if that causes a problem, since a reboot does not log in???

JStateson
Message 5272 - Posted: 4 Jan 2009 | 18:34:36 UTC - in response to Message 4258.


I access constantly using remote desktop to a home premium vista 64 and have not seen this problem.


Sorry, I was talking about remote desktop not working and totally mis-read the original thread.

Indeed, CUDA fails immediately as soon as remote desktop logs in. Ultra VNC works fine, but I have to set autologon and put VNC into the startup task as VNC does not work in session 0 (service session) on Vista ever since SP1.

I went and posted to this old thread just to clear up my error.

Paul D. Buck
Joined: 9 Jun 08
Posts: 1050
Credit: 37,321,185
RAC: 0
Message 5276 - Posted: 4 Jan 2009 | 20:20:52 UTC

All the way back to the original post .... :)

No version of BOINC Manager handles the queuing of work correctly ... the problem is that all versions assume that there is no such thing as a GPU ... and that all tasks run on the CPU ... as far as scheduling goes. So ...

THIS MEANS that if you have 50 / 50 share it is possible that the GPU will run out of work and BOINC Manager will be fine with that ... then it will later try to get work and sadly with the server issues that is still hit and miss ...

Though strangely I got two tasks into my queue almost as if it was working as it should ...

I guess what I am saying is that as far as keeping the GPU running what has been working for me on my XP Pro hosts is 180.84 video drivers, 6.5.0 BOINC Manager (yes it is beta and no it does not have any fixes for what ails us), resource share 500 (highest on the machine indicates 35% RS on the machine) ... the prime machine is running a mix of SIMAP, Cosmology, AI, LHC and Pirates (the last two only when I can get tasks obviously) with the remains of WCG tasks which should be done sometime tomorrow (if my calculations are right I should have enough work queued to get CEP Gold and as I will be still working WCG elsewhere I will still rack up earnings) ...

With all of that, I still babysit the machine and at times it will run dry, but a manual Update will pull work ... often, if you update before the queue is empty (say the last task is in progress with a few minutes to go ...), you may or may not be successful in getting new work ...

Early days ... were it now not a new goal and essentially "free" I would say a Pox on you (see my notes in Rosetta when they crashed half of my tasks after wasting upwards of 20 hours on a 6 hour task), but I digress ...

I am really of a mind that many of the manual interventions we have been doing have been nothing other than a pigeon test and none of them really affect the situation ... we only think they do in that we get work after performing the elaborate ritual ... Mine was up to 4 hours and involved, among other things ... wait a moment, why should I give away the secret of my magic?

:)

Phoneman1
Message 5280 - Posted: 4 Jan 2009 | 22:24:12 UTC - in response to Message 5276.

All the way back to the original post .... :)

I really should have updated this thread earlier. I haven't had a recurrence of a GPU task sitting in the input queue and not starting as soon as the current GPU task ends since I moved on to Boinc 6.4.2, then 6.4.5, and on again to 6.5.0.

The latest version seems to have no problem keeping a GPU task running all the time although I haven't tried running a second GPU project or two cards in one machine or both of these complications.

I agree work fetch for GPU tasks is a mess, but that wasn't the subject of my original post. I had work sitting in the queue but it wouldn't start to run when the previous task finished!

If you do get GPU work downloaded Boinc 6.5.0 seems to handle the rest reasonably well with just the estimated completion time being most noticeably wrong. Having said that it doesn't in itself seem to stop more GPU work being downloaded.

Phoneman1
