
Message boards : News : On new fatty WUs

Author Message
ignasi
Send message
Joined: 10 Apr 08
Posts: 254
Credit: 16,836,000
RAC: 0
Level
Pro
Message 18415 - Posted: 27 Aug 2010 | 15:23:17 UTC
Last modified: 27 Aug 2010 | 16:47:57 UTC

A batch of 500 WUs, *_TRYP_*, has been sent out, and they will take TWICE as long to compute as normal. We are talking about 10 nanoseconds of simulation per WU! (the kind of job most people still use big fat supercomputers for). We need them for a very interesting publication we are preparing, related to this experiment. Many thanks for the effort!

ignasi
Send message
Joined: 10 Apr 08
Posts: 254
Credit: 16,836,000
RAC: 0
Level
Pro
Message 18416 - Posted: 27 Aug 2010 | 15:23:58 UTC - in response to Message 18415.

So will the 10 *meta_pYEEI* WUs.

Old man
Send message
Joined: 24 Jan 09
Posts: 42
Credit: 16,676,387
RAC: 0
Level
Pro
Message 18417 - Posted: 27 Aug 2010 | 15:52:26 UTC - in response to Message 18415.

The link doesn't work. The link is: http://www."http.com//www.youtube.com/watch?v=XnjmHYUvFW4%22

Maybe you mean: http://www.youtube.com/watch?v=XnjmHYUvFW4

Profile Retvari Zoltan
Avatar
Send message
Joined: 20 Jan 09
Posts: 1844
Credit: 10,654,773,244
RAC: 9,959,303
Level
Trp
Message 18418 - Posted: 27 Aug 2010 | 16:13:11 UTC - in response to Message 18417.

The link doesn't work. The link is: http://www."http.com//www.youtube.com/watch?v=XnjmHYUvFW4%22

Maybe you mean: http://www.youtube.com/watch?v=XnjmHYUvFW4



You mean: http://www.youtube.com/watch?v=XnjmHYUvFW4

The sad thing about these work units is the 56% GPU usage (on a GTX 480)

Old man
Send message
Joined: 24 Jan 09
Posts: 42
Credit: 16,676,387
RAC: 0
Level
Pro
Message 18419 - Posted: 27 Aug 2010 | 16:51:45 UTC - in response to Message 18418.

The link doesn't work. The link is: http://www."http.com//www.youtube.com/watch?v=XnjmHYUvFW4%22

Maybe you mean: http://www.youtube.com/watch?v=XnjmHYUvFW4



You mean: http://www.youtube.com/watch?v=XnjmHYUvFW4

The sad thing about these work units is the 56% GPU usage (on a GTX 480)


To url: Yes.

My GTX 460's GPU usage is 80%.

Snow Crash
Send message
Joined: 4 Apr 09
Posts: 450
Credit: 539,316,349
RAC: 0
Level
Lys
Message 18420 - Posted: 27 Aug 2010 | 16:57:30 UTC
Last modified: 27 Aug 2010 | 17:00:52 UTC

I have one that was already 25% done when I saw this post; I turned HT off and the shaders up to 1756 (at 1138 mV)... bring on ALL the 10 ns WUs!!!
On WinXP Pro 32-bit I am now getting 68% usage.

@TAPIO - I don't see any of these new TRYP type of WUs on your machine - what kind of WU are you talking about on the 80% usage?
____________
Thanks - Steve

Profile skgiven
Volunteer moderator
Project tester
Volunteer tester
Avatar
Send message
Joined: 23 Apr 09
Posts: 3968
Credit: 1,835,229,774
RAC: 291,018
Level
His
Message 18422 - Posted: 27 Aug 2010 | 18:18:31 UTC - in response to Message 18420.

Different task types utilize the GPU to different extents, as the calculations differ. I have seen tasks use from around 53% to as much as 93%.

The *_TRYP_* task I ran used 66% GPU on a GTX260 when all 4 CPU cores were busy elsewhere. When I freed up one core to let the GPUGrid task run faster, GPU utilization only rose to 68%, but CPU utilization only fell to 84%, not 75%. So the system and WU together are using about 9% of the CPU, or 36% of one core. Typically XP uses about 3% of the CPU, so the task itself is using about 24% of one core.

When CPU competition (contention) is higher it slows the GPU tasks down; the GPU has to wait for the CPU to run some calculations, so it helps to have a core/thread free, even though this may not show up in the GPU utilization alone. With the 8 or 12 threads of an i7 this is not as much of a CPU loss as with a Q8400 or a dual core. Obviously the bigger Fermi cards benefit more from a free core, as do systems with multiple GPUs. The *_TRYP_* task should take about 9h, all being well.
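The CPU-share estimate above can be checked with a quick calculation. This is only a sketch of the arithmetic in the post; the function name is mine, and the 84%/75%/3% figures come from the Task Manager readings quoted above, not from any GPUGrid tool.

```python
# Sketch of the CPU-share estimate above (figures from the post, names mine).

def task_core_share(total_cpu_pct, other_load_pct, os_overhead_pct, n_cores):
    """GPU task's CPU cost, expressed as a percentage of a single core."""
    task_and_os = total_cpu_pct - other_load_pct   # e.g. 84 - 75 = 9% of the whole CPU
    task_only = task_and_os - os_overhead_pct      # minus the OS's typical background load
    return task_only * n_cores                     # scale whole-CPU % to one core

# Quad core: three cores crunching elsewhere (75%), measured total 84%, XP ~3%:
print(task_core_share(84, 75, 3, 4))  # 24 (% of one core), matching the post
```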

Profile Retvari Zoltan
Avatar
Send message
Joined: 20 Jan 09
Posts: 1844
Credit: 10,654,773,244
RAC: 9,959,303
Level
Trp
Message 18428 - Posted: 27 Aug 2010 | 22:08:09 UTC
Last modified: 27 Aug 2010 | 22:15:30 UTC

I've completed two new 'fatty' WUs on two different computers.
The first one finished in 9 hours 11 minutes (13.23 ms/step) (GTX 480 and C2Q 9550).
The second one finished in 9 hours 51 minutes (14.2 ms/step) (GTX 480 and C2Q 6600).
I realized how CPU-dependent these WUs are. (I use the SWAN_SYNC=0 setting on both computers.)
Both tasks were 2.5 million steps.
But I don't understand how these could get the same credit as Snow Crash's 1.82-million-step WU, which took only 6 hours 54 minutes (9.1 ms/step) (GTX 480 and i7 920).
An i7 920 is faster than a C2Q 9550, but I think not by that much (especially the FP units, which are almost equally fast).
I also don't understand the different GPU clock rates of Snow Crash's task (1.59GHz, 1.40GHz, 1.74GHz, 1.40GHz, and 1.40GHz). This computer has only one GPU; maybe the GPU clock changed while crunching this task, and it restarted 4 times?
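For reference, the ms/step figures quoted above are just wall-clock time divided by step count. A minimal sketch (run times and step counts taken from the post):

```python
# Wall-clock time per simulation step, in milliseconds.
def ms_per_step(hours, minutes, steps):
    seconds = hours * 3600 + minutes * 60
    return seconds * 1000.0 / steps

print(round(ms_per_step(9, 11, 2_500_000), 2))  # 13.22 (GTX 480 + C2Q 9550)
print(round(ms_per_step(9, 51, 2_500_000), 2))  # 14.18 (GTX 480 + C2Q 6600)
```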

Old man
Send message
Joined: 24 Jan 09
Posts: 42
Credit: 16,676,387
RAC: 0
Level
Pro
Message 18429 - Posted: 27 Aug 2010 | 22:48:56 UTC - in response to Message 18420.

I have one that was already 25% done when I saw this post; I turned HT off and the shaders up to 1756 (at 1138 mV)... bring on ALL the 10 ns WUs!!!
On WinXP Pro 32-bit I am now getting 68% usage.

@TAPIO - I don't see any of these new TRYP type of WUs on your machine - what kind of WU are you talking about on the 80% usage?


Oops. Sorry, I read the wrong thread. :-) My WUs are IBUCH, and they use 80% of the GPU.

Profile [AF>EDLS] Polynesia
Avatar
Send message
Joined: 3 Apr 09
Posts: 11
Credit: 5,336,576
RAC: 0
Level
Ser
Message 18430 - Posted: 27 Aug 2010 | 23:11:27 UTC - in response to Message 18429.
Last modified: 27 Aug 2010 | 23:12:03 UTC

I like this wu:

http://img.lbzh.fr/images/capturexqx.png

Is it not part of the 500 WUs?
____________
(boinc since 1/04/09), Alliance Francophone
GA-P55A-UD5, i7 860 2.8 Ghz, Win 7 64 bits, 8 g DDR3 de ram, 2 To DD + 750 ext, boitier Haf 922 + 2 ventilo 120 noctua, ventirad noctua NH-U12P SE2, GTX 470 zotac amp edition, écran 25"

Profile Retvari Zoltan
Avatar
Send message
Joined: 20 Jan 09
Posts: 1844
Credit: 10,654,773,244
RAC: 9,959,303
Level
Trp
Message 18432 - Posted: 28 Aug 2010 | 0:27:27 UTC - in response to Message 18430.

It's not.

Profile [AF>EDLS] Polynesia
Avatar
Send message
Joined: 3 Apr 09
Posts: 11
Credit: 5,336,576
RAC: 0
Level
Ser
Message 18439 - Posted: 28 Aug 2010 | 10:47:04 UTC - in response to Message 18432.
Last modified: 28 Aug 2010 | 10:52:31 UTC

ok
____________
(boinc since 1/04/09), Alliance Francophone
GA-P55A-UD5, i7 860 2.8 Ghz, Win 7 64 bits, 8 g DDR3 de ram, 2 To DD + 750 ext, boitier Haf 922 + 2 ventilo 120 noctua, ventirad noctua NH-U12P SE2, GTX 470 zotac amp edition, écran 25"

Profile skgiven
Volunteer moderator
Project tester
Volunteer tester
Avatar
Send message
Joined: 23 Apr 09
Posts: 3968
Credit: 1,835,229,774
RAC: 291,018
Level
His
Message 18443 - Posted: 28 Aug 2010 | 16:54:31 UTC - in response to Message 18428.

I've completed two new 'fatty' WUs on two different computers.
The first one finished in 9 hours 11 minutes (13.23 ms/step) (GTX 480 and C2Q 9550).
The second one finished in 9 hours 51 minutes (14.2 ms/step) (GTX 480 and C2Q 6600).
I realized how CPU-dependent these WUs are. (I use the SWAN_SYNC=0 setting on both computers.)
Both tasks were 2.5 million steps.
But I don't understand how these could get the same credit as Snow Crash's 1.82-million-step WU, which took only 6 hours 54 minutes (9.1 ms/step) (GTX 480 and i7 920).
An i7 920 is faster than a C2Q 9550, but I think not by that much (especially the FP units, which are almost equally fast).
I also don't understand the different GPU clock rates of Snow Crash's task (1.59GHz, 1.40GHz, 1.74GHz, 1.40GHz, and 1.40GHz). This computer has only one GPU; maybe the GPU clock changed while crunching this task, and it restarted 4 times?

If you look at the difference between your Q9550 and Q6600 you will get an idea of the importance of the CPU; 40min less on the faster CPU.

The different GPU clock rates are as a result of changing the clock speed.

Norman_RKN
Send message
Joined: 22 Dec 09
Posts: 16
Credit: 23,522,575
RAC: 0
Level
Pro
Message 18446 - Posted: 28 Aug 2010 | 18:57:01 UTC - in response to Message 18443.

Hello to all ;)

I have 3 questions atm:

1.) Why do I only get applications for CUDA (6.05) when my card is ready for CUDA 3.1?
Win7 x64
GTX 260

2.) Why are there so few WUs on the server, and only 2 WUs maximum???

3.) Can you optimize the apps for more GPU usage?

lg
Norman

Profile Retvari Zoltan
Avatar
Send message
Joined: 20 Jan 09
Posts: 1844
Credit: 10,654,773,244
RAC: 9,959,303
Level
Trp
Message 18448 - Posted: 28 Aug 2010 | 22:04:36 UTC - in response to Message 18443.

If you look at the difference between your Q9550 and Q6600 you will get an idea of the importance of the CPU; 40min less on the faster CPU.

I noticed this difference. I concluded this project should be renamed to GPUCPUgrid :)
I have an idea: GPUGrid could become a multi-threaded CPU project. That way a GPUGrid task would get sufficient CPU power when needed, because GPUGrid runs at higher priority than my other project (Rosetta). Maybe we wouldn't have to dedicate one CPU core to each GPU then. But if we do, and the credit gain is more than my other CPU project can earn, I will detach from the other project.
As far as I can recall, other kinds of WUs do not depend on CPU speed to this extent.

Profile MarkJ
Volunteer moderator
Project tester
Volunteer tester
Send message
Joined: 24 Dec 08
Posts: 730
Credit: 189,243,545
RAC: 0
Level
Ile
Message 18450 - Posted: 29 Aug 2010 | 5:26:51 UTC - in response to Message 18446.

Hello to all ;)

I have 3 questions atm:

1.) Why do I only get applications for CUDA (6.05) when my card is ready for CUDA 3.1?
Win7 x64
GTX 260

2.) Why are there so few WUs on the server, and only 2 WUs maximum???

3.) Can you optimize the apps for more GPU usage?

lg
Norman


The apps are mostly compiled using CUDA (2.3, from memory). The fact that you have a more up-to-date driver that supports up to CUDA 3.1 means it is also backwards compatible.

The main changes in CUDA 3.1 are targeted at the Fermi-based cards (GTX4xx) and don't offer any advantage to older cards such as your GTX260. There is no reason to recompile for CUDA 3.1 if the app is not going to use any of the new features and still has to work on older cards. There is a CUDA 3.1 app for the Fermi-based cards.

There are only a few WUs available at the moment as the project is switching to a new type of work unit. See the news item on the front page for more details.

The specific limit of 2 WUs is so the project can get results back quickly. They use the results from one set of WUs to generate the next set, so unless results come back quickly it holds things up. It also stops people hoarding the work units and allows more people to get some.

As for optimizing, the guys do that regularly. If you run a tool called GPU-Z it will show you how hard the graphics card is working; typically this is quite high. As they say on the home page, we are doing work that would normally run on a supercomputer.
____________
BOINC blog

Profile Mad Matt
Send message
Joined: 29 Aug 09
Posts: 28
Credit: 101,584,171
RAC: 0
Level
Cys
Message 18452 - Posted: 29 Aug 2010 | 7:53:57 UTC

You should adjust the bonus limits accordingly if you know something like this is coming. The 'sophisticated' BOINC management makes sure that at times my 9800GT gets those monster WUs while the GTX 295 crunches the usual ones.

Difference: 21,239.74 vs 17,699.78 (only because I managed to swap cards by snoozing the GPU). Otherwise it would probably have been just 14,159.83.

I do not like being punished for adding more computing power to the project.
____________

Norman_RKN
Send message
Joined: 22 Dec 09
Posts: 16
Credit: 23,522,575
RAC: 0
Level
Pro
Message 18455 - Posted: 29 Aug 2010 | 9:10:39 UTC - in response to Message 18452.
Last modified: 29 Aug 2010 | 9:21:03 UTC

OK, thank you Mark!

What about OpenCL support, to bring ATI power to the project?

Here is a post from Timo Strunk about OpenCL (developer of POEM@Home):


Hi everybody,

So, first of all: We will not use CUDA in our app. OpenCL can do everything CUDA can do and there's really no need to use it anymore (however that's my opinion). Apart from that we are working hard to get everything to work in Single Precision. So far our forcefields are exact in single precision with the exception of our Solvent Accessible Surface Area term, which will be deployed on the CPU for the moment. The new SASA forcefield is about 7 times faster than the old one on the CPU already though and during the next time somebody will be working to deploy it also on GPUs.

So the GPU part will be single precision and OpenCL; there's also no question whether we release first ATI or NVidia - these two releases won't be more than 1-2 days apart for sure, because the code is not different apart from optimization parameters. Personally I use the ATI Stream SDK on CPU for debugging bugs usually.

As to release dates: Well we were a bit off, when we estimated POEM++ to be done for the end of CASP, therefore at the moment we are hesitant to specify an immediate release date. The CPU part however is done and we are thinking to release it first, because it also gives already quite a speedup.

Best,
Timo

Profile skgiven
Volunteer moderator
Project tester
Volunteer tester
Avatar
Send message
Joined: 23 Apr 09
Posts: 3968
Credit: 1,835,229,774
RAC: 291,018
Level
His
Message 18457 - Posted: 29 Aug 2010 | 9:20:53 UTC - in response to Message 18452.

I completed one of the _revlo_TRYP_ Work Units on my GTX470:
2889900 1844414 28 Aug 2010 8:02:33 UTC 28 Aug 2010 21:44:16 UTC Completed and validated 38,875.28 38,863.08 14,159.83 21,239.74 ACEMD2: GPU molecular dynamics v6.11 (cuda31)

These were actually a bit shy when it came to points:
If my Fermi crunched only these tasks it would bring just 47K per day. Most other tasks get around 60K per day, and one or two task types better that. My Fermi's RAC is about 62K despite the shortage and the odd slower task.
It's the same story for the other cards; my GTX260 and GT240 both got fewer points crunching these _revlo_TRYP_ tasks than the other types.
Anyway, there were not too many of them, and it was for a good cause: getting a publication out.

Profile skgiven
Volunteer moderator
Project tester
Volunteer tester
Avatar
Send message
Joined: 23 Apr 09
Posts: 3968
Credit: 1,835,229,774
RAC: 291,018
Level
His
Message 18476 - Posted: 30 Aug 2010 | 17:55:22 UTC - in response to Message 18457.

ignasi,
ftpd had a failure running a _revlo_TRYP_ task on a reliable GTS250:
2891437 1845526 28 Aug 2010 20:18:12 UTC 30 Aug 2010 15:36:28 UTC Error while computing 128,347.85 15,342.13 14,159.83 --- ACEMD2: GPU molecular dynamics v6.05 (cuda)

The failure occurred close to the expected finish time:
The system cannot find the path specified. (0x3) - exit code 3 (0x3)
Run time: 128347.851675 s (35h+)
CPU time: 15342.13 s
Outcome: Client error
Client state: Compute error
Validate state: Invalid
This application has requested the Runtime to terminate it in an unusual way.
Please contact the application's support team for more information.

Profile GDF
Volunteer moderator
Project administrator
Project developer
Project tester
Volunteer developer
Volunteer tester
Project scientist
Send message
Joined: 14 Mar 07
Posts: 1895
Credit: 629,356
RAC: 0
Level
Gly
Message 18477 - Posted: 30 Aug 2010 | 21:17:46 UTC - in response to Message 18428.
Last modified: 30 Aug 2010 | 21:20:50 UTC

I've completed two new 'fatty' WUs on two different computers.
The first one finished in 9 hours 11 minutes (13.23 ms/step) (GTX 480 and C2Q 9550).
The second one finished in 9 hours 51 minutes (14.2 ms/step) (GTX 480 and C2Q 6600).
I realized how CPU-dependent these WUs are. (I use the SWAN_SYNC=0 setting on both computers.)
Both tasks were 2.5 million steps.
But I don't understand how these could get the same credit as Snow Crash's 1.82-million-step WU, which took only 6 hours 54 minutes (9.1 ms/step) (GTX 480 and i7 920).
An i7 920 is faster than a C2Q 9550, but I think not by that much (especially the FP units, which are almost equally fast).
I also don't understand the different GPU clock rates of Snow Crash's task (1.59GHz, 1.40GHz, 1.74GHz, 1.40GHz, and 1.40GHz). This computer has only one GPU; maybe the GPU clock changed while crunching this task, and it restarted 4 times?


It's not 1.8M steps. It's still 2.5M steps, but the last restart before it finished ran for 1.8M steps. (Maybe the message could be clearer.)

gdf

Profile MarkJ
Volunteer moderator
Project tester
Volunteer tester
Send message
Joined: 24 Dec 08
Posts: 730
Credit: 189,243,545
RAC: 0
Level
Ile
Message 18478 - Posted: 30 Aug 2010 | 21:35:01 UTC - in response to Message 18455.

OK, thank you Mark!

What about OpenCL support, to bring ATI power to the project?

Here is a post from Timo Strunk about OpenCL (developer of POEM@Home):


Hi everybody,

So, first of all: We will not use CUDA in our app. OpenCL can do everything CUDA can do and there's really no need to use it anymore (however that's my opinion). Apart from that we are working hard to get everything to work in Single Precision. So far our forcefields are exact in single precision with the exception of our Solvent Accessible Surface Area term, which will be deployed on the CPU for the moment. The new SASA forcefield is about 7 times faster than the old one on the CPU already though and during the next time somebody will be working to deploy it also on GPUs.

So the GPU part will be single precision and OpenCL; there's also no question whether we release first ATI or NVidia - these two releases won't be more than 1-2 days apart for sure, because the code is not different apart from optimization parameters. Personally I use the ATI Stream SDK on CPU for debugging bugs usually.

As to release dates: Well we were a bit off, when we estimated POEM++ to be done for the end of CASP, therefore at the moment we are hesitant to specify an immediate release date. The CPU part however is done and we are thinking to release it first, because it also gives already quite a speedup.

Best,
Timo


The main problem with OpenCL is that it doesn't have the library support and developer tools that CUDA has; CUDA has been around longer. Eventually that will change. There is an FFT library supplied with CUDA that a number of projects make use of. From what I gather there isn't an equivalent one for OpenCL.

From the project's point of view OpenCL would be the way to go, as you only need one app (though you have to compile with each manufacturer's compiler to work on that brand). The coding is different, so the apps that have been developed would need to be rewritten to work under OpenCL (which all takes time and effort).

It will get there eventually; it just takes a long time for OpenCL to catch up with CUDA.
____________
BOINC blog

Profile Retvari Zoltan
Avatar
Send message
Joined: 20 Jan 09
Posts: 1844
Credit: 10,654,773,244
RAC: 9,959,303
Level
Trp
Message 18479 - Posted: 30 Aug 2010 | 21:36:03 UTC - in response to Message 18477.

It's not 1.8M steps. It's still 2.5M steps, but the last restart before it finished ran for 1.8M steps. (Maybe the message could be clearer.)

gdf


Oh, now it's perfectly clear.
Thank you.
I have overclocked my CPU by 20% since then, and GPU usage has risen by 10%.
So I guess it would be better to upgrade my CPU (+MB and RAM, of course) to achieve higher GPU usage.

Speedy
Send message
Joined: 19 Aug 07
Posts: 28
Credit: 3,142,867
RAC: 88
Level
Ala
Message 18488 - Posted: 1 Sep 2010 | 5:07:53 UTC

Have all the fatty WUs been sent out, and how big are the files that get uploaded?

We are talking of 10 nanoseconds of simulation per WU

I don't understand. 10 nanoseconds of simulation per WU isn't very long at all. How come the fatty units take twice as long?
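For context on the scale here: simulated time is the step count multiplied by the integration timestep, and every one of those steps costs real compute time. A rough sketch; the 4 fs timestep is my assumption (a common ACEMD value), not something stated in this thread.

```python
TIMESTEP_FS = 4            # assumed integration timestep (femtoseconds per step)
steps = 2_500_000          # step count reported for these fatty WUs

# Simulated time: 2.5M steps x 4 fs = 10,000,000 fs = 10 ns.
simulated_ns = steps * TIMESTEP_FS / 1_000_000
print(simulated_ns)        # 10.0 ns of simulated time

# Real time, at roughly 13 ms of wall-clock per step (GTX 480 figures upthread):
wall_hours = steps * 13e-3 / 3600
print(round(wall_hours, 1))  # about 9 hours
```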

Profile skgiven
Volunteer moderator
Project tester
Volunteer tester
Avatar
Send message
Joined: 23 Apr 09
Posts: 3968
Credit: 1,835,229,774
RAC: 291,018
Level
His
Message 18489 - Posted: 1 Sep 2010 | 7:04:09 UTC - in response to Message 18488.
Last modified: 1 Sep 2010 | 7:57:17 UTC

I thought most of the 500 had been sent and returned but there are a few still available/running.

The tasks are about 2 to 4 times as long as normal tasks, though there is quite a variety of task lengths at the present time.

Profile GDF
Volunteer moderator
Project administrator
Project developer
Project tester
Volunteer developer
Volunteer tester
Project scientist
Send message
Joined: 14 Mar 07
Posts: 1895
Credit: 629,356
RAC: 0
Level
Gly
Message 18490 - Posted: 1 Sep 2010 | 9:14:47 UTC - in response to Message 18489.

Each of those 500 will be reissued 10 times to achieve 100 ns of total simulation time each.

gdf
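The reissue arithmetic, as a quick sketch (the aggregate figure at the end is my own multiplication, not a project number):

```python
trajectories = 500   # WUs in the batch
generations = 10     # times each trajectory is extended
ns_per_wu = 10       # simulated nanoseconds per WU

ns_per_trajectory = generations * ns_per_wu
print(ns_per_trajectory)                        # 100 ns per trajectory
print(trajectories * ns_per_trajectory / 1000)  # 50.0 microseconds in aggregate
```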

ftpd
Send message
Joined: 6 Jun 08
Posts: 152
Credit: 328,250,382
RAC: 0
Level
Asp
Message 18496 - Posted: 2 Sep 2010 | 16:41:02 UTC - in response to Message 18476.
Last modified: 2 Sep 2010 | 16:41:27 UTC

Ignasi,

It has been a few days now and there is still no answer to this question.

Can you please give an explanation for this error?

Thanks!
____________
Ton (ftpd) Netherlands

Profile GDF
Volunteer moderator
Project administrator
Project developer
Project tester
Volunteer developer
Volunteer tester
Project scientist
Send message
Joined: 14 Mar 07
Posts: 1895
Credit: 629,356
RAC: 0
Level
Gly
Message 18498 - Posted: 2 Sep 2010 | 17:39:10 UTC - in response to Message 18496.

Ignasi was abroad, but what error are you referring to?

gdf

ftpd
Send message
Joined: 6 Jun 08
Posts: 152
Credit: 328,250,382
RAC: 0
Level
Asp
Message 18501 - Posted: 3 Sep 2010 | 8:41:45 UTC - in response to Message 18498.

gdf,

Thanks for the reply. See message 18476 from skgiven!


____________
Ton (ftpd) Netherlands

jjwhalen
Send message
Joined: 23 Nov 09
Posts: 29
Credit: 17,591,899
RAC: 0
Level
Pro
Message 18523 - Posted: 3 Sep 2010 | 23:25:31 UTC

I'm curious. I'm currently crunching TRYP WU 1859786, which I got at the end of the server outage. Its info page says it's not a resend but a first-time download. Is this part of the original batch of 500 or a new batch?

Oh...belated congratulations on getting the servers back up again: you were missed, but I'm sure Collatz appreciated the overflow ;D
____________

Profile skgiven
Volunteer moderator
Project tester
Volunteer tester
Avatar
Send message
Joined: 23 Apr 09
Posts: 3968
Credit: 1,835,229,774
RAC: 291,018
Level
His
Message 18526 - Posted: 4 Sep 2010 | 10:23:15 UTC - in response to Message 18523.

To quote GDF, "Each of those 500 will be reissued 10 times to achieve 100 ns of total simulation time each".

Basically a batch of 500 was created, sent out, returned, and used to build another 500. When these are returned they will create another 500 and so on for 10 times.

This highlights the importance of fast turnaround.
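A sketch of why turnaround matters so much here: generations are strictly sequential, so the campaign's wall-clock time is bounded by the slowest return in each generation, not by the total compute. The numbers below are purely illustrative, not project figures.

```python
def campaign_days(generations, slowest_return_days):
    # Each generation must be fully returned before the next can be built,
    # so the slowest host in each round sets the pace for everyone.
    return generations * slowest_return_days

print(campaign_days(10, 2))  # 20 days if every round is returned within 2 days
print(campaign_days(10, 5))  # 50 days if stragglers take 5 days per round
```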

ftpd
Send message
Joined: 6 Jun 08
Posts: 152
Credit: 328,250,382
RAC: 0
Level
Asp
Message 18546 - Posted: 6 Sep 2010 | 10:53:12 UTC - in response to Message 18476.

Skgiven & Ignasi,

Again, two WUs were aborted after many hours of processing, on the same machine/card.

Can you please take a look and reply?

Thanks


____________
Ton (ftpd) Netherlands

ignasi
Send message
Joined: 10 Apr 08
Posts: 254
Credit: 16,836,000
RAC: 0
Level
Pro
Message 18547 - Posted: 6 Sep 2010 | 13:03:46 UTC - in response to Message 18546.

@ftpd

The errors you are referring to do not come exclusively from WUs in the 'fatty' batch. This and this are HIVPR; only this one is a fatty WU.
Similar errors are reported on a 9800 GT here.

App developers are already aware of it.

thanks

Profile skgiven
Volunteer moderator
Project tester
Volunteer tester
Avatar
Send message
Joined: 23 Apr 09
Posts: 3968
Credit: 1,835,229,774
RAC: 291,018
Level
His
Message 18550 - Posted: 6 Sep 2010 | 18:30:08 UTC - in response to Message 18547.

Motivated to make a systemic suggestion here. Thanks,

M J Harvey
Send message
Joined: 12 Feb 07
Posts: 9
Credit: 0
RAC: 0
Level

Message 18553 - Posted: 6 Sep 2010 | 21:00:01 UTC - in response to Message 18501.

That's a funny error - it looks like a driver bug affecting the 9800 GTX (which is essentially what a GTS250 is, too). What driver version do you have?

Matt

Profile skgiven
Volunteer moderator
Project tester
Volunteer tester
Avatar
Send message
Joined: 23 Apr 09
Posts: 3968
Credit: 1,835,229,774
RAC: 291,018
Level
His
Message 18554 - Posted: 7 Sep 2010 | 7:50:12 UTC - in response to Message 18553.
Last modified: 7 Sep 2010 | 7:58:58 UTC

Well, if it's a driver bug, then the same bug present in driver 197.13 is still there in driver 258.96, and it affects both the 9800 GT (511MB) and the 1GB version of the GTS250, suggesting it is not related to memory size.
It would be interesting to see if the bug is present for 6.11 tasks, i.e. with the CUDA 3.1-based app rather than the 2.3 files.
3.1 might be slower on the older cards, but if there were fewer errors it could be faster overall.

ftpd, at the time of the errors did you have other (non-GPUGrid) GPU tasks running?

ftpd
Send message
Joined: 6 Jun 08
Posts: 152
Credit: 328,250,382
RAC: 0
Level
Asp
Message 18556 - Posted: 7 Sep 2010 | 11:23:48 UTC - in response to Message 18554.

@skgiven,

At that time only gpugrid was running, gts250 can only do one job.
RNA World was running using CPU.
Later that week two other WU also were aborted. HIVPR?

Enough information?

Good luck
____________
Ton (ftpd) Netherlands

Profile skgiven
Volunteer moderator
Project tester
Volunteer tester
Avatar
Send message
Joined: 23 Apr 09
Posts: 3968
Credit: 1,835,229,774
RAC: 291,018
Level
His
Message 18570 - Posted: 8 Sep 2010 | 11:16:57 UTC - in response to Message 18556.
Last modified: 8 Sep 2010 | 11:32:55 UTC

Hi Ton,

At that time only gpugrid was running, gts250 can only do one job.
Thanks. I was just trying to determine whether Collatz or other GPU tasks were overwriting your GPUGrid WU's memory, which can occasionally happen if you run two GPU projects on the same computer. They don't run at the same time, but BOINC can stop one and suspend it in memory to let the other run. Sometimes when this happens it can corrupt the task, but I doubt that this is the case here (you only ran one Collatz task on that GPU, probably when there was a task shortage).

RNA World was running using CPU.
Unless it was running beta tasks it would not mess up the GPUGrid task. If you had lots of RNA failures on the same system, then it would be fair to say it could destabilize the system and cause problems for GPU tasks (since they have to use the CPU too).

Later that week two other WU also were aborted. HIVPR?
Enough information?
Well, I expect that in your case you were not video editing or playing a computer game, or you would have said so. So I think so.

I see these tasks are ending up user-aborted, and I can fully understand why; they all crashed on that card. It's a real pain for the cruncher to have to sit over a system that way. I'm not keen on this situation.

As Matt said, there is a driver "related" bug:
(SWAN : FATAL : Failure executing kernel sync [transpose_float2] [700]
Assertion failed: 0, file swanlib_nv.cpp, line 121)

I just wanted to make sure there was not something else. However, the same problem has been raised in two other threads:
http://www.gpugrid.net/forum_thread.php?id=2274
http://www.gpugrid.net/forum_thread.php?id=2278

We can't do any more than ignasi did - inform the developers - and throw in a few more suggestions, if only to treat the symptoms.
The researchers have been working on new applications for a while now; 5% faster, last I heard. Hopefully they can find a workaround for this, as well as make the app much faster for the GTX460 and so on.

ftpd
Send message
Joined: 6 Jun 08
Posts: 152
Credit: 328,250,382
RAC: 0
Level
Asp
Message 18571 - Posted: 8 Sep 2010 | 12:21:12 UTC - in response to Message 18570.

@skgiven,

Hi Kev,

I never aborted a WU myself. No games are ever played on this machine -
just Outlook and Internet Explorer.
RNA: no beta jobs. There were Linux beta jobs, with no failures.

Success!
____________
Ton (ftpd) Netherlands

Profile Retvari Zoltan
Avatar
Send message
Joined: 20 Jan 09
Posts: 1844
Credit: 10,654,773,244
RAC: 9,959,303
Level
Trp
Message 19719 - Posted: 30 Nov 2010 | 17:51:08 UTC

I've discovered a new type of fatty WU on one of my computers, called *_IBUCH_1_pYEEI_long_*. It has been running for 4 hours 15 minutes and is 55% complete (GTX 480 @ 800MHz, 63% GPU usage, SWAN_SYNC=0, C2Q 9550 @ 3.71GHz).
I'm a little surprised.

Profile Retvari Zoltan
Avatar
Send message
Joined: 20 Jan 09
Posts: 1844
Credit: 10,654,773,244
RAC: 9,959,303
Level
Trp
Message 19722 - Posted: 30 Nov 2010 | 21:23:03 UTC - in response to Message 19719.

I've discovered a new type of fatty WU on one of my computers, called *_IBUCH_1_pYEEI_long_*. It has been running for 4 hours 15 minutes and is 55% complete (GTX 480 @ 800MHz, 63% GPU usage, SWAN_SYNC=0, C2Q 9550 @ 3.71GHz).
I'm a little surprised.

It finished in 7 hours and 47 minutes.
Details here.

Richard Haselgrove
Send message
Joined: 11 Jul 09
Posts: 791
Credit: 1,425,102,570
RAC: 1,358,780
Level
Met
Scientific publications
watwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwat
Message 19724 - Posted: 30 Nov 2010 | 22:02:54 UTC

I had a _pYEEI_long_ back in June, and posted about it. Nobody took any notice then, either.

Profile Fred J. Verster
Send message
Joined: 1 Apr 09
Posts: 58
Credit: 35,833,978
RAC: 0
Level
Val
Scientific publications
watwatwatwatwatwatwatwatwatwatwatwatwatwat
Message 19725 - Posted: 30 Nov 2010 | 23:28:58 UTC - in response to Message 19724.
Last modified: 30 Nov 2010 | 23:40:17 UTC

I only have 145 KASHIF_HIVPR_no_boundxxxx types; they take about 3:30 hours, depending on the number of steps, though some take longer, up to 4 hours. CPU is a C2E X9650 @ 3.5GHz.
GPU is a GTX480 @ 1.4GHz.
I've seen many wingmen, where one is used, with exit status 98 (0x62) or 1, especially on 200-series cards, but also good ones. And Fermi errors too, sometimes.

If I raise the GPU clock to 1.6GHz or higher, it produces an error in a GPUGRID WU. Heat isn't an issue, since the motherboard is not in its case, which makes an awful difference: there is no heat build-up, and the CPU runs 30C cooler at 3.5GHz, while one of my C2Q6600s is constantly at 90C (according to a program that reads the motherboard's sensors and CPU diodes, which came with the mobo, an ASUS P5E, X38).
____________

Knight Who Says Ni N!

Profile skgiven
Volunteer moderator
Project tester
Volunteer tester
Avatar
Send message
Joined: 23 Apr 09
Posts: 3968
Credit: 1,835,229,774
RAC: 291,018
Level
His
Scientific publications
watwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwat
Message 19730 - Posted: 2 Dec 2010 | 7:14:58 UTC - in response to Message 19725.

These are different tasks from those released in June; even different apps.
They may include some of the new subroutines being worked on (as with the GIANNI_DHFR1000), intended to increase turnover, or they might just need to get through more work, for a paper. Perhaps these are over-taxing the Fermi controller when the cards are overclocked, but they seem to work on GT200 series cards that are OC'd.

The double length is clearly just to annoy me; I have 6 GT240s.
On a high-end Fermi, 8h should not be an issue; even if you get two back to back you will complete them in 16h (if running 24/7). If you don't crunch that often, reduce your cache.

Profile Retvari Zoltan
Avatar
Send message
Joined: 20 Jan 09
Posts: 1844
Credit: 10,654,773,244
RAC: 9,959,303
Level
Trp
Scientific publications
watwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwat
Message 19732 - Posted: 2 Dec 2010 | 11:40:35 UTC - in response to Message 19730.
Last modified: 2 Dec 2010 | 11:43:56 UTC

These are different tasks from those released in June; even different apps.

That's great, but they came out of the blue.

They may include some of the new subroutines being worked on (as with the GIANNI_DHFR1000), intended to increase turnover, or they might just need to get through more work, for a paper. Perhaps these are over-taxing the Fermi controller when the cards are overclocked, but they seem to work on GT200 series cards that are OC'd.

These *_IBUCH_1_pYEEI_long_* WUs are not as stressful for the card as the GIANNI_DHFRs; they run at 30% lower GPU usage and therefore generate much less heat, so overclocking is not an issue with them. BTW I've installed a larger cooler (Arctic Cooling Xtreme Plus + WR004 kit) on each of my GTX480s, so the greater heat output of the GIANNI_DHFRs is not an issue anymore either, even when overclocked to 800MHz.

The double length is clearly just to annoy me; I have 6 GT240s.
On a high-end Fermi, 8h should not be an issue; even if you get two back to back you will complete them in 16h (if running 24/7). If you don't crunch that often, reduce your cache.

Not just you... I was shocked when I saw that it had been running for more than 4 hours and had barely completed half of the WU. The first thing that came to my mind was that my brand-new coolers weren't doing their job so well and the card had locked up. But then I saw the progress indicator was moving, which eased my mind a little, so I could read the name of the WU :) So the "_long_" suffix was the explanation for the long running time. I posted about it to warn my fellow crunchers, since the crew didn't. BTW some *_IBUCH_GCY_* WUs are appearing as well; those are very long too (about 5h30m to complete on my GTX480s).

Profile Retvari Zoltan
Avatar
Send message
Joined: 20 Jan 09
Posts: 1844
Credit: 10,654,773,244
RAC: 9,959,303
Level
Trp
Scientific publications
watwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwat
Message 19743 - Posted: 3 Dec 2010 | 14:36:03 UTC

A new type of long WU has arrived: *-IBUCH_?_BCC_long_*
They haven't started processing yet, so I don't know how long they will take.

Siegfried Niklas
Avatar
Send message
Joined: 23 Feb 09
Posts: 39
Credit: 144,654,294
RAC: 0
Level
Cys
Scientific publications
watwatwatwatwatwatwatwatwatwatwatwatwat
Message 19744 - Posted: 3 Dec 2010 | 15:29:18 UTC
Last modified: 3 Dec 2010 | 15:30:52 UTC

p9-IBUCH_1_pYEEI_long_101130-2-10-RND3169_0

GeForce GTX 260/216 (Shader 1566 MHz): Runtime = 53,900.07 (~15 h), Granted credit = 23,863.26

Crystal Pellet
Send message
Joined: 25 Mar 10
Posts: 18
Credit: 2,568,073
RAC: 0
Level
Ala
Scientific publications
watwatwatwatwat
Message 19745 - Posted: 3 Dec 2010 | 15:34:43 UTC - in response to Message 19743.

A new type of long WU has arrived: *-IBUCH_?_BCC_long_*
They haven't started processing yet, so I don't know how long they will take.

No problem on your four-eighty.
I have half of that, a two-forty, with one BCC_long running.
Estimated runtime: 33 hours and 50 minutes, with 30% CPU use.
GPU load: 62%, fan speed: 41%, temperature: 63C.

Profile skgiven
Volunteer moderator
Project tester
Volunteer tester
Avatar
Send message
Joined: 23 Apr 09
Posts: 3968
Credit: 1,835,229,774
RAC: 291,018
Level
His
Scientific publications
watwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwat
Message 19746 - Posted: 3 Dec 2010 | 17:05:25 UTC - in response to Message 19745.

I had a _long_ IBUCH task on a GT240 (32h), and another aet1_IBUCH task that was equally lengthy (33h). aet :)
This long IBUCH task took over 10h on a GTX470.

ignasi
Send message
Joined: 10 Apr 08
Posts: 254
Credit: 16,836,000
RAC: 0
Level
Pro
Scientific publications
watwatwatwatwatwatwatwatwatwatwatwat
Message 19748 - Posted: 3 Dec 2010 | 18:22:30 UTC - in response to Message 19746.

Dear all,

Apologies for not warning you; we should have done it some days ago.
The truth is that, as always, when it comes to publishing everything was due yesterday. The only way we can speed up the time-to-answer now is by extending the workunit size. As we have explained many times, our WUs are little pieces of large calculations, each depending on the previously completed one, and so on. In this chain process, WUs are repeatedly re-queued. By extending the length of each WU we reduce the waiting time.

We really appreciate your special effort in computing these monsters. For this, we will grant 40% BONUS in credits with the *long* WUs. We'll do it through a command we have never used that will be tested over the weekend. If successful, we will use it for ALL NEW *long* SUBMISSIONS starting NEXT WEEK. There is a batch of 1000-1500 in preparation that will benefit from it...

Thanks a lot guys,
ignasi

PS: By the way, next week, on Dec 10th, we're presenting some unpublished (and, I have to say, unprecedented) results (*TRYP*, *reverse_TRYP*) in London at the Brunei Gallery for the 2010 UK Young Modellers Forum.

Profile robertmiles
Send message
Joined: 16 Apr 09
Posts: 402
Credit: 169,933,246
RAC: 309,519
Level
Ile
Scientific publications
watwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwat
Message 19757 - Posted: 3 Dec 2010 | 21:12:16 UTC
Last modified: 3 Dec 2010 | 21:18:24 UTC

Is there a way to tell the server not to send me any of the new fatty workunits? My 9800 GT, running nearly 24/7, can complete the normal workunits in time, but not workunits that take twice as long with the same allowed time to completion. Even now, about half of the workunits it attempts end in computation errors.

I'd expect no problem with them, though, if you also increased the allowed time to completion in proportion to the expected length.

Snow Crash
Send message
Joined: 4 Apr 09
Posts: 450
Credit: 539,316,349
RAC: 0
Level
Lys
Scientific publications
watwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwat
Message 19761 - Posted: 3 Dec 2010 | 22:47:53 UTC

Bring em on ... I love when you guys push the limits ... I'm working on a IBUCH_RED right now!!!

____________
Thanks - Steve

Profile skgiven
Volunteer moderator
Project tester
Volunteer tester
Avatar
Send message
Joined: 23 Apr 09
Posts: 3968
Credit: 1,835,229,774
RAC: 291,018
Level
His
Scientific publications
watwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwat
Message 19764 - Posted: 4 Dec 2010 | 12:49:45 UTC - in response to Message 19761.

Remember to keep your cache level low; no point in having a task sitting waiting for half a day before it starts.

Profile robertmiles
Send message
Joined: 16 Apr 09
Posts: 402
Credit: 169,933,246
RAC: 309,519
Level
Ile
Scientific publications
watwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwat
Message 19767 - Posted: 4 Dec 2010 | 17:04:27 UTC

My cache level was already low enough to delay getting a new workunit to perhaps 1 hour before the last one finishes.

ftpd
Send message
Joined: 6 Jun 08
Posts: 152
Credit: 328,250,382
RAC: 0
Level
Asp
Scientific publications
watwatwatwatwatwatwatwatwatwatwatwatwatwatwat
Message 19771 - Posted: 4 Dec 2010 | 20:40:48 UTC - in response to Message 19764.
Last modified: 4 Dec 2010 | 20:41:27 UTC

One of the big ones was cancelled after 18 hrs on a GTX295.
A little bit of a pity!

Send me some more!

None of the WUs I have downloaded since were big ones!
____________
Ton (ftpd) Netherlands

Richard Haselgrove
Send message
Joined: 11 Jul 09
Posts: 791
Credit: 1,425,102,570
RAC: 1,358,780
Level
Met
Scientific publications
watwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwat
Message 19773 - Posted: 5 Dec 2010 | 1:44:59 UTC - in response to Message 19748.

Dear all,
...
We really appreciate your special effort in computing these monsters. For this, we will grant 40% BONUS in credits with the *long* WUs. We'll do it through a command we have never used that will be tested over the weekend. If successful, we will use it for ALL NEW *long* SUBMISSIONS starting NEXT WEEK. There is a batch of 1000-1500 in preparation that will benefit from it...

Just completed one of your test WUs - hope this is what you expected to see.

Profile skgiven
Volunteer moderator
Project tester
Volunteer tester
Avatar
Send message
Joined: 23 Apr 09
Posts: 3968
Credit: 1,835,229,774
RAC: 291,018
Level
His
Scientific publications
watwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwat
Message 19776 - Posted: 5 Dec 2010 | 7:30:25 UTC - in response to Message 19773.

We'll do it through a command we have never used that will be tested over the weekend.
It looks like the normal 50% bonus was applied. I'm not sure whether the never-before-used command will be run after task completion; I guess this will be a manual test.

Richard's task:
claimed credit 127.270717592593,
granted credit 190.906076388889.

We will see if these numbers change.
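The ratio between the two figures can be checked directly; a quick sketch using only the numbers posted above (the 50% and 40% multipliers are the ones discussed in this thread):

```python
# Claimed vs. granted credit for Richard's task, as posted above.
claimed = 127.270717592593
granted = 190.906076388889

ratio = granted / claimed
print(ratio)                      # ~1.5, i.e. the normal 50% bonus
print(round(claimed * 1.5, 6))    # expected grant with the 50% bonus
print(round(claimed * 1.4, 6))    # what the promised 40% long-WU bonus would give
```

The ratio comes out at 1.5, which is why it looks like the ordinary 50% bonus rather than the new 40% one.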

Hypernova
Send message
Joined: 16 Nov 10
Posts: 22
Credit: 24,712,746
RAC: 0
Level
Pro
Scientific publications
watwatwatwatwatwat
Message 19780 - Posted: 6 Dec 2010 | 7:24:36 UTC - in response to Message 19764.

Remember to keep your cache level low; no point in having a task sitting waiting for half a day before it starts.


I do not understand the cache level issue. You get two WUs and that is all: one computes and one is in the cache. There seems to be no way to change that. On WCG I have a 3-day cache, but on GPUGrid I have not seen any cache-management function.

Profile skgiven
Volunteer moderator
Project tester
Volunteer tester
Avatar
Send message
Joined: 23 Apr 09
Posts: 3968
Credit: 1,835,229,774
RAC: 291,018
Level
His
Scientific publications
watwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwat
Message 19783 - Posted: 6 Dec 2010 | 10:09:02 UTC - in response to Message 19780.

Remember to keep your cache level low; no point in having a task sitting waiting for half a day before it starts.


I do not understand the cache level issue. You get two WU's and that is all. One computes and one is in cache. There is no way to change that it seems to me. On WCG I have a 3 day cache. But on GPUGrid I have not seen any cache management function.

While you could do it here, this is a BOINC-wide setting and will apply to all projects. As local settings override it, you might as well just set it in BOINC Manager:
Advanced Preferences, Network Usage, Additional Work Buffer; set it to 0.1 days, or 0.01 if you prefer.

As you can see, there is no GPU tab or separate GPU cache setting in BOINC, and the default cache setting of 0.5 days is bad for this project.

For a high-end Fermi you can still get away with a 3-day cache, if the GPU is optimized for performance (free CPU core/thread, SWAN_SYNC=0, always use GPU...): you can only have one running task (per GPU) and one queued task, no matter how many days of cache you set, so if your high-end Fermi finishes a task in under 10h, even the queued task will finish within 20h. So 0.5 days or 9 days makes no odds. You just have to consider the consequences of a 4h local Internet outage, system restarts, and the effect of your own use of the system.
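For the SWAN_SYNC=0 tweak mentioned above, the usual approach on Linux is an environment variable set before the client starts; a minimal sketch (assumption: a manually started client that inherits the shell's environment, and an example data-directory path):

```shell
# Sketch: export SWAN_SYNC so the GPUGRID app sees it when BOINC starts.
export SWAN_SYNC=0
echo "SWAN_SYNC=${SWAN_SYNC}"

# Then start the client from this same shell, e.g.:
# boinc --dir /var/lib/boinc-client   # path is an example only
```

If your client runs as a service, the variable has to be set in the service's environment instead; the shell export above will not reach it.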

On the other hand, for an entry-level GPU such as a GT240, these long tasks take around 30 to 32h, so with a high cache the queued task will have sat waiting for 32h before starting, and therefore will not finish until 64h after it was sent. So you are better off with a 0.01-day cache. Ideally tasks are returned within 24h, and failing that within 48h. Beyond that you are late and your efforts are of less value to the project. After about 4 days your efforts become worthless to the project and might even slow it down. After 5 days you don't even get any credit. Personally I would prefer the credit cut-off to be 4 days, to better reflect the reality of the situation.
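The work-buffer setting described above can also be applied outside the Manager dialog via BOINC's global_prefs_override.xml; a sketch (the 0.01-day value is the one suggested above; the file normally lives in the BOINC data directory, written here to the current directory for illustration):

```shell
# Sketch: write a local preferences override with a 0.01-day work buffer.
cat > global_prefs_override.xml <<'EOF'
<global_preferences>
   <work_buf_min_days>0.01</work_buf_min_days>
   <work_buf_additional_days>0.0</work_buf_additional_days>
</global_preferences>
EOF

# Tell a running client to re-read it without restarting:
# boinccmd --read_global_prefs_override
```

Because it is an override, it takes precedence over the web preferences, which matches the "local settings override this" behaviour noted above.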

Werkstatt
Send message
Joined: 23 May 09
Posts: 116
Credit: 122,267,748
RAC: 29,512
Level
Cys
Scientific publications
watwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwat
Message 19784 - Posted: 6 Dec 2010 | 10:33:41 UTC - in response to Message 19783.

Remember to keep your cache level low; no point in having a task sitting waiting for half a day before it starts.


I do not understand the cache level issue. You get two WU's and that is all. One computes and one is in cache. There is no way to change that it seems to me. On WCG I have a 3 day cache. But on GPUGrid I have not seen any cache management function.

While you could do it here, this is a BOINC-wide setting and will apply to all projects. As local settings override it, you might as well just set it in BOINC Manager:
Advanced Preferences, Network Usage, Additional Work Buffer; set it to 0.1 days, or 0.01 if you prefer.

As you can see, there is no GPU tab or separate GPU cache setting in BOINC, and the default cache setting of 0.5 days is bad for this project.

For a high-end Fermi you can still get away with a 3-day cache, if the GPU is optimized for performance (free CPU core/thread, SWAN_SYNC=0, always use GPU...): you can only have one running task (per GPU) and one queued task, no matter how many days of cache you set, so if your high-end Fermi finishes a task in under 10h, even the queued task will finish within 20h. So 0.5 days or 9 days makes no odds. You just have to consider the consequences of a 4h local Internet outage, system restarts, and the effect of your own use of the system.

On the other hand, for an entry-level GPU such as a GT240, these long tasks take around 30 to 32h, so with a high cache the queued task will have sat waiting for 32h before starting, and therefore will not finish until 64h after it was sent. So you are better off with a 0.01-day cache. Ideally tasks are returned within 24h, and failing that within 48h. Beyond that you are late and your efforts are of less value to the project. After about 4 days your efforts become worthless to the project and might even slow it down. After 5 days you don't even get any credit. Personally I would prefer the credit cut-off to be 4 days, to better reflect the reality of the situation.

@skgiven: pls check your PM.

JohnMD
Avatar
Send message
Joined: 4 Dec 10
Posts: 1
Credit: 8,012
RAC: 0
Level

Scientific publications
wat
Message 19805 - Posted: 8 Dec 2010 | 11:00:05 UTC - in response to Message 19783.

After about 4 days your efforts become worthless to the project and might even slow it down. After 5days you don't even get any credit. Personally I would prefer this credit cut-off was 4days, to better reflect the reality of the situation.

My 7-year-old desktop finally bit the dust, and now I have a replacement with a CUDA card! So without reading the small print I attached to GPUGRID. 4½ days later my WU completed, and now I learn that I was probably wasting the project's time.
So maybe it's a good idea to shorten the deadline, to prevent GT218 hopefuls like me getting in the way :-)

Profile skgiven
Volunteer moderator
Project tester
Volunteer tester
Avatar
Send message
Joined: 23 Apr 09
Posts: 3968
Credit: 1,835,229,774
RAC: 291,018
Level
His
Scientific publications
watwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwat
Message 19807 - Posted: 8 Dec 2010 | 12:18:02 UTC - in response to Message 19805.

Your NVIDIA ION only has 16 CUDA cores, well short of the recommended minimum of 96.
Although your ION can finish some tasks within 5 days, the task was reissued after 2 days to a GTX260, which returned it 2 days before your copy, making your effort redundant.

http://www.gpugrid.net/workunit.php?wuid=2135446

As you can see, the GTX260 is 10 times as fast as an ION.

Consider using your Ion on Einstein, and your CPU cores on WCG's HCMD2 or HCC projects.
I see you run FreeHal. You could also run WorkUnitProp.

Good luck,

Profile Retvari Zoltan
Avatar
Send message
Joined: 20 Jan 09
Posts: 1844
Credit: 10,654,773,244
RAC: 9,959,303
Level
Trp
Scientific publications
watwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwat
Message 19881 - Posted: 13 Dec 2010 | 0:21:01 UTC - in response to Message 19748.

For this, we will grant 40% BONUS in credits with the *long* WUs. We'll do it through a command we have never used that will be tested over the weekend. If successful, we will use it for ALL NEW *long* SUBMISSIONS starting NEXT WEEK.

Looks like those never-before-used commands are still not working.
I've just completed another *_long_* monster WU, and the normal 50% bonus was applied to it.

ignasi
Send message
Joined: 10 Apr 08
Posts: 254
Credit: 16,836,000
RAC: 0
Level
Pro
Scientific publications
watwatwatwatwatwatwatwatwatwatwatwat
Message 19887 - Posted: 13 Dec 2010 | 16:17:37 UTC - in response to Message 19881.
Last modified: 13 Dec 2010 | 16:19:22 UTC

Looks like those never-before-used commands are still not working.
I've just completed another *_long_* monster WU, and the normal 50% bonus was applied to it.


Hi Retvari Zoltan,
The WU you are linking to is actually from a batch that is two weeks old and about to finish: 9 of 10 steps completed. By new ones submitted this week we meant brand-new long batch submissions,
which I am about to make in the next couple of days.

Thanks,
ignasi

Profile Retvari Zoltan
Avatar
Send message
Joined: 20 Jan 09
Posts: 1844
Credit: 10,654,773,244
RAC: 9,959,303
Level
Trp
Scientific publications
watwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwat
Message 19890 - Posted: 13 Dec 2010 | 21:21:26 UTC - in response to Message 19887.

By new ones submitted this week we meant brand new long batch submissions.
Which I am about to do in the next couple of days.

Ok. I can't wait to meet those :)

ignasi
Send message
Joined: 10 Apr 08
Posts: 254
Credit: 16,836,000
RAC: 0
Level
Pro
Scientific publications
watwatwatwatwatwatwatwatwatwatwatwat
Message 19900 - Posted: 14 Dec 2010 | 12:55:42 UTC - in response to Message 19890.

First batch of 500 is out.
*IBUCH_AcPYEEIP_long*

With a 40% extra bonus.

happy crunching
i

ignasi
Send message
Joined: 10 Apr 08
Posts: 254
Credit: 16,836,000
RAC: 0
Level
Pro
Scientific publications
watwatwatwatwatwatwatwatwatwatwatwat
Message 19901 - Posted: 14 Dec 2010 | 13:58:39 UTC - in response to Message 19900.
Last modified: 14 Dec 2010 | 14:10:20 UTC

First batch of 500 is out.
*IBUCH_AcPYEEIP_long*

With a 40% extra bonus.

happy crunching
i


Negative.
We encourage you to discontinue any IBUCH_AcPYEEIP_long.
They will compute fine, but the experiment is not correct.

My fault. I apologise.

ignasi
Send message
Joined: 10 Apr 08
Posts: 254
Credit: 16,836,000
RAC: 0
Level
Pro
Scientific publications
watwatwatwatwatwatwatwatwatwatwatwat
Message 19902 - Posted: 14 Dec 2010 | 14:16:27 UTC - in response to Message 19901.
Last modified: 14 Dec 2010 | 14:26:39 UTC

We have cancelled all UNSENT WUs from the wrong batch.
We won't force-cancel any running WU (only 230 are running now); they should complete fine as normal. However, we discourage continuing them.

Correct ones will be issued in a few hours, and the bonus for all of them will now be 50%.

Profile Retvari Zoltan
Avatar
Send message
Joined: 20 Jan 09
Posts: 1844
Credit: 10,654,773,244
RAC: 9,959,303
Level
Trp
Scientific publications
watwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwat
Message 19905 - Posted: 14 Dec 2010 | 14:50:58 UTC - in response to Message 19901.

First batch of 500 is out.
*IBUCH_AcPYEEIP_long*

With a 40% extra bonus.

happy crunching
i


Negative.
Please discontinue any IBUCH_AcPYEEIP_long.

They will compute fine, but the experiment is not correct. It is wasted computing, and we all hate that.

My fault. I apologise.

I didn't mean to rush you in any way.
I've aborted 2 of those.

Post to thread

Message boards : News : On new fatty WUs