Message boards : Number crunching : Granted credit for Long NATHAN_FA4 workunit is way too low

Retvari Zoltan*
Joined: 20 Jan 09
Posts: 1000
Credit: 3,760,201,187
RAC: 4,815,659
Message 22455 - Posted: 2 Nov 2011 | 20:20:23 UTC

NATHAN_FA4........... 1.29 credits/sec (99% GPU usage)
TONI_SH2MS3_....... 2.13 credits/sec
GIANNI_KKFREE5... 2.37 credits/sec
IBUCH_2_nmEGFR.. 2.43 credits/sec
IBUCH_PYRT........... 1.35 credits/sec (also very low, but this task has very low GPU usage and high CPU usage caused by an iteration algorithm)
IBUCH_freePYRT..... 1.73 credits/sec
I think this situation is similar to what we had with the crediting of the previous GIANNI_KKFREE workunits.
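
To put the figures listed above side by side, here is a minimal sketch that expresses each rate relative to the NATHAN_FA4 rate. The numbers are the ones quoted in this post; the dictionary and variable names are only illustrative.

```python
# Credits per second as quoted in the post (task names as written there).
credits_per_sec = {
    "NATHAN_FA4": 1.29,
    "TONI_SH2MS3_": 2.13,
    "GIANNI_KKFREE5": 2.37,
    "IBUCH_2_nmEGFR": 2.43,
    "IBUCH_PYRT": 1.35,
    "IBUCH_freePYRT": 1.73,
}

baseline = credits_per_sec["NATHAN_FA4"]

# Ratio of each task type's rate to the NATHAN_FA4 rate.
relative = {name: rate / baseline for name, rate in credits_per_sec.items()}

# Print from lowest to highest ratio.
for name, ratio in sorted(relative.items(), key=lambda kv: kv[1]):
    rate = credits_per_sec[name]
    print(f"{name:<16} {rate:.2f} credits/sec  ({ratio:.2f}x NATHAN_FA4)")
```

On these numbers, the IBUCH_2_nmEGFR rate comes out at roughly 1.9 times the NATHAN_FA4 rate.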

skgiven
Volunteer moderator
Project tester
Volunteer tester
Joined: 23 Apr 09
Posts: 3506
Credit: 914,290,807
RAC: 1,147,265
Message 22457 - Posted: 2 Nov 2011 | 22:22:27 UTC - in response to Message 22455.

Running a long IBUCH_PYRT task now on a GTX470.
Without SWAN_SYNC it was only utilizing about 59% of the GPU.
With SWAN_SYNC in place (and 2 threads freed up) the task used around 63 or 64% of the GPU, about 4 or 5 percentage points more.
This is quite low compared to the average 85% on this system and the 99% for Nate's tasks, but the tasks are very different.
____________
FAQ's

HOW TO:
- Opt out of Beta Tests
- Ask for Help

Retvari Zoltan*
Message 22458 - Posted: 3 Nov 2011 | 1:11:46 UTC - in response to Message 22457.

Based on the very high GPU usage of these NATHAN_FA4 workunits, I expected their credits/sec ratio to exceed that of the GIANNI_KKFREE5 and IBUCH_2_nmEGFR tasks. But, to my disappointment, it is about half of what I would consider proportional.
I'm currently running without SWAN_SYNC, because I'm crunching for SIMAP with all of my CPU cores.

skgiven
Message 22462 - Posted: 3 Nov 2011 | 11:52:35 UTC - in response to Message 22458.

The present mix of tasks makes it difficult to decide what to do about SWAN_SYNC on Windows (it's still worth using on Linux with a Fermi). I think the tasks that use more CPU are awarded more credit because they use more CPU (and you can't use that CPU for crunching on other CPU projects). That makes sense, but the difference in credit awarded is too large in my opinion, and it ignores the fact that a GPU running at 99% is doing a lot more than a GPU at 59% or even 85%. I'm happy with the present diversity of task types (the more the merrier), but it's a challenge when some tasks utilize 40% more of the GPU.

Retvari Zoltan*
Message 22465 - Posted: 3 Nov 2011 | 15:48:25 UTC - in response to Message 22462.

... I think the tasks that use more CPU are awarded more credit because they use more CPU (and you can't use that CPU for crunching on other CPU projects).

According to this post, IBUCH_PYRT tasks do not have the credit compensation you've mentioned. Even so, the NATHAN_FA4 tasks have a lower credit/time ratio than the IBUCH_PYRT tasks, although they have 40% more GPU usage.
Back in the days of CPU-only crunching, I could tell when my tasks were failing during processing by monitoring my hosts' RAC, but nowadays the wide variation in the credit/time ratio between different GPU tasks makes this difficult and time consuming.

skgiven
Message 22468 - Posted: 4 Nov 2011 | 0:16:50 UTC - in response to Message 22465.

Perhaps credit compensation is not being applied to some of the tasks in the way we are used to?

nate
Volunteer moderator
Project scientist
Joined: 6 Jun 11
Posts: 119
Credit: 2,928,865
RAC: 0
Message 22470 - Posted: 4 Nov 2011 | 6:25:45 UTC

My apologies, everyone. The lower credit on my tasks is in part due to a short command I was not aware I needed to include for long tasks. New guy mistake. I'll try to get it fixed as soon as I can. Thanks for pointing this out.

Betting Slip
Joined: 5 Jan 09
Posts: 281
Credit: 428,000,826
RAC: 821,397
Message 22471 - Posted: 4 Nov 2011 | 13:36:35 UTC - in response to Message 22470.

Sorry Nate,

I'm aborting your tasks because they use 98% GPU and make it difficult to do anything else when they are running.

SMTB1963
Joined: 27 Jun 10
Posts: 35
Credit: 142,831,243
RAC: 0
Message 22472 - Posted: 4 Nov 2011 | 15:58:14 UTC

I was wondering about the points awarded for these new tasks as well. I wouldn't normally worry about it too much, except that our team (XtremeSystems) is having a GPUGRID event this week.

Nate, would you mind posting here when you get it fixed? Thanks!

Best Regards

Alejandro
Joined: 30 Apr 10
Posts: 12
Credit: 62,624,416
RAC: 0
Message 22483 - Posted: 7 Nov 2011 | 20:37:31 UTC

Hi Betting Slip,

I see that you cancelled a task (I1R64-NATHAN_FA4-4-100-RND6005_0).

I was also surprised by the resource usage of the NATHAN work units. My GTX 580 runs at 92% load, but I also see high memory use, so I think this is what is causing your problems.

This is a log (from GPUZ) for a NATHAN WU:

Timestamps are GMT+2.

Date , GPU Core Clock [MHz] , GPU Memory Clock [MHz] , GPU Shader Clock [MHz] , GPU Temperature [°C] , Fan Speed (%) [%] , Fan Speed (RPM) [RPM] , Memory Used [MB] , GPU Load [%] , Memory Controller Load [%] , Video Engine Load [%] , VDDC [V] ,

2011-11-06 02:29:01 , 781.1 , 1002.0 , 1564.0 , 75.0 , 30 , 1650 , 1310 , 91 , 23 , 0 , 1.0630 ,

It is using 1310 MB of memory, so I have 226 MB (1536 - 1310) to show some pixels on the screen :)

By comparison, a GIANNI_KKFREE5 WU uses only 532 MB.

Your GeForce GTX 460 has only 1024 MB so I expect that you will have some problems with this WU.
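
The memory arithmetic above can be reproduced from the log line with a short sketch. It assumes the comma-separated layout of the GPU-Z header and row shown in this post and a card total of 1536 MB; the function name `parse_log_line` is hypothetical and not part of GPU-Z.

```python
# Header and row copied from the GPU-Z log in this post (note the trailing commas).
HEADER = ("Date , GPU Core Clock [MHz] , GPU Memory Clock [MHz] , "
          "GPU Shader Clock [MHz] , GPU Temperature [°C] , Fan Speed (%) [%] , "
          "Fan Speed (RPM) [RPM] , Memory Used [MB] , GPU Load [%] , "
          "Memory Controller Load [%] , Video Engine Load [%] , VDDC [V] ,")
ROW = ("2011-11-06 02:29:01 , 781.1 , 1002.0 , 1564.0 , 75.0 , 30 , 1650 , "
       "1310 , 91 , 23 , 0 , 1.0630 ,")


def parse_log_line(header: str, row: str) -> dict:
    """Map column names to values, dropping the empty field after the trailing comma."""
    names = [field.strip() for field in header.split(",") if field.strip()]
    values = [field.strip() for field in row.split(",") if field.strip()]
    return dict(zip(names, values))


record = parse_log_line(HEADER, ROW)

card_memory_mb = 1536  # total used in the post for the GTX 580 (assumption)
used_mb = int(record["Memory Used [MB]"])  # memory in use on the card as logged
free_mb = card_memory_mb - used_mb

print(f"used {used_mb} MB, free {free_mb} MB, GPU load {record['GPU Load [%]']}%")
```

Swapping in a 1024 MB total gives a negative remainder for the same reading, which is the concern raised for the GTX 460.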

Nate:
Is my thinking correct?

Best regards,
Alejandro


skgiven
Message 22485 - Posted: 7 Nov 2011 | 22:47:38 UTC - in response to Message 22483.
Last modified: 7 Nov 2011 | 23:18:56 UTC

Alejandro, 1310MB sounds a bit excessive, but GPUZ does not show how much the task is using; it shows how much everything is using. If you close Boinc, you can work out what the system and everything else is using. Are you using Aero? How much do your other tasks use? I typically see about 492MB of GDDR5 in use when running long tasks on a GTX470 on XP/2003. As the amount of GDDR5 used depends on the number of active shaders, less is required for lesser cards.

PS. Your GPU is at 75 °C but your fan is only at 30%. I suggest you use one of the recommended programs to increase your fan speed and reduce your GPU temperature. You should be able to run a GTX580 at 55 to 60 °C with the fan below 60%.

nate
Message 22487 - Posted: 8 Nov 2011 | 21:26:32 UTC - in response to Message 22485.

Hey all,

I decided to just kill that entire set of jobs last Friday, since I had started it only a few days before. Any tasks that were in progress or sent out before I cancelled the series can still be completed (the data is still useful). I resubmitted the continued jobs today under NATHAN_FA5. I believe I have solved the problems, but let me know ASAP if any arise.

As to your specific questions/comments:

BettingSlip: No worries. You guys should never feel compelled to complete a task that is causing you problems. Do what you must and let us know about it so we can try to fix it.

SMTB1963: The points were unnecessarily low and the usage too high on these tasks due to a command that is not automatically active in the long tasks. I was unaware of this, and sent them without this command. :-/

Alejandro, BettingSlip, et al: It is possible you are correct, Alejandro, and I am looking into this now. The system I am running is quite large, in fact the largest we have ever sent to GPUGRID, and the memory usage is substantial. I had no issues running locally with our various 400 and 500 series cards when I set up the system, but obviously there is a large variety of cards I can't test. It is possible that cards with only 1024MB will have little or no free memory with my tasks. I will look at that tomorrow when I get in and try to post guidance as soon as possible.

Once again, I'd like to apologize for these issues. We strive to make crunching for GPUGRID as seamless and painless as possible (we all benefit). I will update our SOP here so we can avoid the low credit/high GPU usage issue in the future, and I will also include a section about memory considerations for large systems.

Nate

Betting Slip
Message 22489 - Posted: 9 Nov 2011 | 8:01:20 UTC - in response to Message 22487.

Appreciate that, Nate. I don't think it was a memory issue, as when I checked, your running units were using about 500 MB. It's more of a GPU usage or priority issue. I have seen this before, when someone here altered the priority to get more efficient use of badly configured systems and ended up with a similar problem. Or it could just be that with 98% GPU usage there is nothing left for the system to do other work.

Anyway, whatever it is I'm sure you'll sort it out.

nate
Message 22492 - Posted: 9 Nov 2011 | 13:28:06 UTC
Last modified: 9 Nov 2011 | 13:31:13 UTC

Hi all,

I wanted to give an update on the possible memory issues that were being discussed. I have run some tests here and discussed it with the more hardware/software-focused members of our team, and memory should not be a problem. In short, the software is capable of adjusting its function to accommodate a wide range of memory capacities. For example, while running my large system on a GTX 580 here, the memory usage is 1138MB out of 1.5GB. But the exact same system run on a GTX 275 uses only 580MB of 1024MB. The code adjusts to the memory available (with a small performance sacrifice for lower memory capacities). Perhaps we'll post more on this later, but based on current empirical results and the (expected) function of the code, it should not cause any unexpected or uncontrolled performance issues on your cards. But as with everything, bugs, errors, and mistakes are possible, so keep posting here when you have problems.
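
As a rough illustration of the two figures quoted above, the sketch below works out the fraction of memory in use and the amount left free on each card. It assumes both totals are in MB (1.5GB taken as 1536MB, and the GTX 275 total taken as 1024MB); it only does the arithmetic and says nothing about how the application itself adapts to available memory.

```python
# Memory figures quoted in the post; totals in MB are assumptions as stated above.
cards = {
    "GTX 580": {"used_mb": 1138, "total_mb": 1536},
    "GTX 275": {"used_mb": 580, "total_mb": 1024},
}

# For each card: (fraction of memory in use, MB left free).
usage = {
    name: (mem["used_mb"] / mem["total_mb"], mem["total_mb"] - mem["used_mb"])
    for name, mem in cards.items()
}

for name, (fraction, free_mb) in usage.items():
    print(f"{name}: {fraction:.0%} used, {free_mb} MB free")
```

Under these assumptions, both cards keep a few hundred MB free while running the same system.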

Happy crunching
Nate

ignasi
Joined: 10 Apr 08
Posts: 254
Credit: 16,836,000
RAC: 0
Message 22510 - Posted: 11 Nov 2011 | 17:24:15 UTC - in response to Message 22492.
Last modified: 11 Nov 2011 | 17:32:54 UTC

Unlike then, the more recent *PYRT_111019* tasks DO contain credit compensation. However, compared to the *nmEGFR* tasks that Zoltan pointed out, they don't contain the simulation-related performance factor that gives a ~30% increase. This is because the PYRT WUs are reproductions of old TRYP experiments, and those didn't use the performance factor... These are the drawbacks of trying to do rigorous science. :)

cheers,
ignasi

Alejandro
Message 22511 - Posted: 11 Nov 2011 | 19:13:17 UTC

Hi Skgiven, Nate,

I disabled Aero and increased the fan speed to 40%.

NATHAN_FA2 with aero, fan 40%
2011-11-11 19:28:48 , 781.1 , 1002.0 , 1564.0 , 68.0 , 40 , 2040 , 1322 , 92 , 22 , 0 , 1.0630 ,


NATHAN without aero , fan 40%

2011-11-11 19:28:58 , 781.1 , 1002.0 , 1564.0 , 68.0 , 40 , 2040 , 1275 , 91 , 23 , 0 , 1.0630 ,

fan 50%
2011-11-11 20:03:50 , 781.1 , 1002.0 , 1564.0 , 65.0 , 50 , 2430 , 1313 , 93 , 23 , 0 , 1.0630 ,

fan 61%
2011-11-11 20:08:25 , 781.1 , 1002.0 , 1564.0 , 62.0 , 61 , 2790 , 1272 , 93 , 23 , 0 , 1.0630 ,


The temperature goes from 75C (fan on auto mode) to 68C (fan set to 40%). I tried setting the fan to 100%, but it sounds like a vacuum cleaner :)


GIANNI_KKFREE5 with aero and fan on auto mode.
2011-11-11 13:19:30 , 781.1 , 1002.0 , 1564.0 , 65.0 , 26 , 1500 , 746 , 64 , 16 , 0 , 1.0630 ,

Alejandro

SMTB1963
Message 22541 - Posted: 18 Nov 2011 | 19:03:59 UTC - in response to Message 22487.

...

SMTB1963: The points were unnecessarily low and the usage too high on these tasks due to a command that is not automatically active in the long tasks. I was unaware of this, and sent them without this command. :-/

...


Well Nate, you definitely fixed the issue! My 570s are cranking the points on your tasks now.

Best
