Message boards : Graphics cards (GPUs) : Massive WU

Al Dente
Joined: 27 Mar 09
Posts: 2
Credit: 2,505,666
RAC: 0
Message 9359 - Posted: 6 May 2009 | 7:36:40 UTC
Last modified: 6 May 2009 | 7:40:09 UTC

93-KASHIF_HIVPR_dim_ba3-3-100-RND0780, running on a Q6600 with GTS 250.

This WU has been running ~30 hours, showing 15% complete, with a deadline of 9th May. If the % complete is about right, it's going to take another 7+ days.

Is this right, or should I abort it?

I've suspended it for the moment, awaiting a reply.

Al Dente
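
The "another 7+ days" figure comes from extrapolating the progress bar linearly. A minimal sketch in Python, assuming progress really is linear in run time and that the reported elapsed time is accurate; the function name is made up for illustration and is not part of BOINC:

```python
def estimate_remaining_hours(elapsed_hours, fraction_done):
    """Linearly extrapolate the remaining run time from progress so far."""
    if not 0 < fraction_done <= 1:
        raise ValueError("fraction_done must be in (0, 1]")
    total_hours = elapsed_hours / fraction_done  # projected full run time
    return total_hours - elapsed_hours


# Figures from the post above: ~30 hours elapsed at 15% complete.
remaining = estimate_remaining_hours(30, 0.15)
print(f"{remaining:.0f} hours, about {remaining / 24:.1f} days")
```

With these numbers the projected total is about 200 hours, so roughly 170 hours (just over 7 days) would remain.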

MarkJ
Volunteer moderator
Volunteer tester
Joined: 24 Dec 08
Posts: 738
Credit: 200,909,904
RAC: 0
Message 9361 - Posted: 6 May 2009 | 8:05:30 UTC - in response to Message 9359.

Well, it could be the "never ending WU" bug in BOINC 6.6.20. You might want to try 6.6.23, which seems to have fixed it. Don't bother with the later versions, as they have other issues at the moment.

However, it may just be a big WU, although GTS 250s usually manage to get through them.
____________
BOINC blog

Al Dente
Message 9363 - Posted: 6 May 2009 | 8:44:05 UTC - in response to Message 9361.

Thanks.

I looked around for 6.6.23, but couldn't find it. Checked my current version and found it was 6.6.18!!! So upped it to 6.6.20 and the suspect WU changed to show 4 hours run @ 15%!!!

I've left it running to see what happens over the next couple of hours.

Paul D. Buck
Joined: 9 Jun 08
Posts: 1050
Credit: 37,321,185
RAC: 0
Message 9365 - Posted: 6 May 2009 | 9:06:55 UTC - in response to Message 9363.

6.6.? through 6.6.20 have a nasty bug where tasks, seemingly at random, will take up to 4 times longer. It does not hit all tasks, but I have seen the effect on my own systems with GPUGrid and on a friend's with Astropulse on the CPU ...

If you must run a 6.6.2x version, 6.6.23 is better ...

Also, somewhere in the teens (6.6.1x), another issue appeared in the 6.6.x versions, still not addressed, that can cause the debts to get out of whack, so that you suddenly stop getting work for various projects. On my i7 I have to reset debts about every 2 days; on my Q9300 it takes a week or more ... YMMV ...

Dieter Matuschek
Joined: 28 Dec 08
Posts: 58
Credit: 231,884,297
RAC: 0
Message 9434 - Posted: 7 May 2009 | 13:56:17 UTC - in response to Message 9363.

So upped it to 6.6.20 and the suspect WU changed to show 4 hours run @ 15%

It may be that a restart would have done the same. These WUs seem to hang sometimes.
(I guess that this is independent of BOINC and due to the algorithm of the application.)


Paul D. Buck
Message 9440 - Posted: 7 May 2009 | 15:50:36 UTC - in response to Message 9363.

I looked around for 6.6.23, but couldn't find it.


Just noticed that I did not provide a link to 6.6.23

You have to find the appropriate one in the list as this lists all the versions for all the platforms.

anthonmg
Joined: 11 Apr 09
Posts: 17
Credit: 11,086,149
RAC: 0
Message 9442 - Posted: 7 May 2009 | 16:57:32 UTC

I saw the same thing this morning. I'm wondering if it's related to the new set of work units (KASHIF_HIVPR). My card (Quadro FX 3700) started on its first HIVPR work unit 7.5 hours ago, shows 0.02% done, and the time to completion is continuing to rise. I'm thinking of aborting it and seeing if the next one does this. Docking@home had a similar issue with a recent batch of their work units. Can the project admins comment?

ExtraTerrestrial Apes
Volunteer moderator
Volunteer tester
Joined: 17 Aug 08
Posts: 2705
Credit: 1,311,122,549
RAC: 0
Message 9542 - Posted: 9 May 2009 | 13:58:11 UTC - in response to Message 9442.

Under no circumstances should your g92 based quadro need that long. You're likely another victim of 6.6.20. Try upgrading to 6.6.23 or downgrading to 6.5.0 (recommended by me) or 6.4.7.

MrS
____________
Scanning for our furry friends since Jan 2002

Paul D. Buck
Message 9547 - Posted: 9 May 2009 | 14:25:04 UTC - in response to Message 9542.

I think that I can safely say that 6.6.28 is slightly better than 6.6.23 (6.6.29 on the Macs) in that it has an improved Resource Scheduler that is less likely to continually change its mind on which tasks to run.

Almost all versions of BOINC back to the 4 series have had this tendency, which seems to have finally been adequately addressed. The symptom is that BOINC will ignore the TSI (task switch interval): right after tasks download, new tasks will be started, only to be left in a partly run state as more tasks are downloaded and started ...

Now the TSI is respected, so you no longer see this, and for those of us with long TSI periods (mine is 12 hours) tasks will run to completion, and as resources come free new tasks will be started in a more rational manner.
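
The behaviour described above can be illustrated with a toy model. This is a sketch only, not BOINC's actual scheduler: it assumes TSI refers to the "switch between applications every N minutes" preference, and the `Task` class and `choose_tasks` function are hypothetical names invented for the example.

```python
from dataclasses import dataclass


@dataclass(eq=False)  # compare tasks by identity, not by field values
class Task:
    name: str
    running: bool = False
    ran_seconds: int = 0  # time since the task was last started


def choose_tasks(tasks, slots, tsi_seconds):
    """Pick which tasks occupy the available slots (toy rule, not BOINC's).

    Running tasks that have not yet used a full switch interval keep
    their slot; any remaining slots go to the other tasks in queue order.
    """
    keep = [t for t in tasks if t.running and t.ran_seconds < tsi_seconds]
    keep = keep[:slots]
    others = [t for t in tasks if t not in keep]
    return keep + others[: slots - len(keep)]


# A freshly downloaded task does not displace one that started 10 minutes ago.
TSI = 12 * 3600  # the 12-hour interval mentioned above
queue = [Task("new1"), Task("old", running=True, ran_seconds=600), Task("new2")]
print([t.name for t in choose_tasks(queue, 1, TSI)])
```

In this model a newly arrived task is started only when a slot is free or when the running task has used up its interval, which is the "more rational" behaviour the post describes.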

The LTD (long-term debt) imbalance problem of the 6.6.x series is still with us, so you may have to reset debts on occasion to keep your queue filled.

MarkJ
Message 9669 - Posted: 12 May 2009 | 11:46:30 UTC

I just uploaded a 54 MB result file for a KASHIF_HIVPR, and it was done on a GTS 250 at that. Unfortunately it took a while, so I missed the bonus credits, but at least it didn't get aborted.
____________
BOINC blog

Barraud Denis
Joined: 2 Sep 08
Posts: 15
Credit: 36,207,656
RAC: 0
Message 9709 - Posted: 13 May 2009 | 12:45:57 UTC

For my Sparkle GTS 250 1 GB, 6.6.28 with the latest NVIDIA drivers seemed better: a WU is done in 10 to 11 hours, when it succeeds. For the moment there are too many errors on this GPU card. I suggest loading GPU WUs one at a time, or suspending the other GPUGrid WUs and resuming them when the GPU is idle, after reactivating the profile file at 80 to 85% fan speed.

I noticed that the GTS 250 needs more fan speed, so I rewrite the .nsu file with the fan speed I want and double-click on the file; nTune then launches it with the wanted speed. When no GPU unit is running, the GPU runs at 300 MHz, power-save mode I suppose.

I plan to change the fan cooler of the GTS later for more efficient cooling, and to add aluminium heatsinks on the RAM chips to cool them better.
