Message boards : Graphics cards (GPUs) : Beware: BOINC 7.0.45 and later

Mumak
Message 28533 - Posted: 15 Feb 2013 | 15:50:03 UTC

I was wondering why my GPU WUs were taking longer than before, and then I found it.
I installed the latest beta client, v7.0.52, and according to the change log BOINC 7.0.45 and later now applies the CPU throttling percentage to GPUs too! Since I had it set to 80%, the GPU was not fully utilized.
It would be nice if CPU and GPU utilization were separated in the client, though.
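(For reference, the preference in question is "Use at most X% of CPU time". A minimal sketch of how it appears in a local global_prefs_override.xml, using the 80% value from this post; from 7.0.45 onwards the same percentage also gates GPU tasks.)

<global_preferences>
<cpu_usage_limit>80</cpu_usage_limit>
</global_preferences>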

microchip
Message 28534 - Posted: 15 Feb 2013 | 16:49:15 UTC - in response to Message 28533.
Last modified: 15 Feb 2013 | 16:49:27 UTC


It would be nice if CPU and GPU utilization were separated in the client, though.


Maybe open a ticket on the BOINC bug page?

MarkJ (volunteer moderator, volunteer tester)
Message 28539 - Posted: 16 Feb 2013 | 5:57:31 UTC

Oh, believe me, we requested it. Unfortunately Dr A didn't want to do it properly and add a separate preference for the GPU. The more people that ask for it, the more likely it is to get added.

By the way, I don't throttle my machines. I figure if they are too hot then I need better cooling, or to wait until weather conditions improve for crunching (it's the middle of summer in Sydney, so it gets hot).

Dagorath
Message 28540 - Posted: 16 Feb 2013 | 9:14:17 UTC - in response to Message 28539.
Last modified: 16 Feb 2013 | 9:18:06 UTC

I think I read those requests for a GPU throttle, and I believe the response was that it's not possible through BOINC, which to me translates to "it's possible but it's so messy we don't want to do it". We need to keep in mind that BOINC needs to run on three different OSes, so what seems trivial at first isn't always so.

BOINC's CPU throttle is a very poor throttle to begin with. How bad is it? Well, even the BOINC devs recommend that you don't use it. If one needs a CPU throttle, one should use TThrottle; the link is available on the BOINC dev forums. It's Windows only. For a Linux CPU throttle I don't know of anything ready-made; there's probably a way to do it with a simple bash script and psensor, but why not just improve the cooling instead?

For a GPU throttle I ran across a simple bash script that reads the GPU temperature from the vendor's GPU manager app and adjusts the clock down if the GPU is too hot, or up if it is too cool. It's actually for AMD, but I'm sure it would be easy enough to adapt to nVIDIA, and to Windows as well. Actually, with nVIDIA the better route would be to adjust the fan speed up, an option that doesn't seem to exist with AMD.
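A minimal sketch of the same idea for nVIDIA on Linux, though it pauses BOINC's GPU work rather than reclocking the card as the script above does. It assumes a driver whose nvidia-smi supports --query-gpu, a single GPU, and an illustrative 80 C trip point:

#!/bin/bash
# Suspend BOINC GPU tasks above MAX_TEMP; resume once below it.
MAX_TEMP=80
while true; do
    TEMP=$(nvidia-smi --query-gpu=temperature.gpu --format=csv,noheader)
    if [ "$TEMP" -ge "$MAX_TEMP" ]; then
        boinccmd --set_gpu_mode never    # stop GPU crunching
    else
        boinccmd --set_gpu_mode auto     # resume per-preference scheduling
    fi
    sleep 30
done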

skgiven (volunteer moderator, volunteer tester)
Message 28542 - Posted: 16 Feb 2013 | 12:50:42 UTC - in response to Message 28540.

The simple and blatantly obvious answer to 'can't separate CPU and GPU crunching in Boinc' is two different programs: one for the CPU and one for the GPU.

In Linux you can run two instances of Boinc and set up a very reliable system that way.
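A minimal sketch of starting a second client (the data directory and RPC port are arbitrary examples; --allow_multiple_clients is what stops the second client from refusing to start):

# first instance: default data directory and RPC port 31416
boinc --daemon
# second instance: its own data directory and RPC port
mkdir -p ~/boinc2
boinc --allow_multiple_clients --dir ~/boinc2 --gui_rpc_port 31417 --daemon
# address the second instance explicitly
boinccmd --host localhost:31417 --get_state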

Another solution for GPU crunchers who do a bit of CPU crunching is to run only the GPU in Windows and use a Linux VM for some CPU tasks. What's good about this is that CPU apps which crash in Windows tend to run better in Linux, and even if they mess up your VM, the Windows system running GPU tasks won't be affected.


Dagorath
Message 28549 - Posted: 17 Feb 2013 | 0:30:43 UTC - in response to Message 28542.

Two or even more instances of BOINC are possible on Linux, and in the T4T forums Crystal Pellet claims to be able to do the same on Windows, though I've never been able to replicate his results - probably because I didn't try very hard.

I'm not sure I would agree that multiple instances of BOINC are the simplest solution, or even the most effective. In fact I doubt whether CPU and GPU throttling can be separated entirely, since the GPU depends entirely on the CPU to feed it data and to dispose of the results of operations on that data. Therefore, for tasks which require a lot of interaction between CPU and GPU, throttling the CPU indirectly throttles the GPU as well.


skgiven (volunteer moderator, volunteer tester)
Message 28555 - Posted: 17 Feb 2013 | 11:37:19 UTC - in response to Message 28549.
Last modified: 17 Feb 2013 | 11:39:27 UTC

Many GPU projects are now telling Boinc that they need a full CPU. This avoids a lot of unwanted issues that had previously arisen, but it requires cross-project participation and adherence to 'the rules' (there aren't any). The situation is improving but some projects still operate in a way that stresses other projects.

CPU project adherence to good principles of crunching is an issue, but the demands from different GPU projects are a different type of problem. There are very different GPU project system requirements (high vs low CPU requirements, high or low PCIE requirements, system memory bandwidth, GPU GDDR usage...). The impact of these on each other and on CPU projects isn't something that Boinc can readily ascertain, never mind accommodate. You would need Boinc to be kitted out with a tool that can read these things (something I asked for years ago).
So if you run several CPU and GPU projects from one instance of Boinc, you're going to land in all sorts of trouble. For example, a GPU project such as POEM starts running, is set up to run several tasks, suddenly starts to use all the CPU cores, does some disk reading and writing (just when the CPU projects want to do this) and pop, the system or Boinc crashes. Run one GPU project and you can watch video, play games... Run another GPU project and you struggle to even browse the web.

An easy way to have two instances configured in Linux is to tell one to run only GPU tasks and set it to use a specific number of CPU cores, then tell the other client to use the remaining CPU cores. This way the scheduler doesn't do stupid things like stop running GPU tasks so that badly packaged CPU tasks can run. If your GPU is only attached to one project then you're not going to experience 'not highest priority' issues. If you attach to more projects (say with an ATI card) then increase the cache a bit, to allow Boinc to sort its feet out. With a very low cache Boinc has 'bar-stool moments' (when someone who appears OK tries to walk after sitting on a bar-stool too long). With too high a cache it's all over the place: disk I/O soars, RAM usage rises and some projects don't get a look in.
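A rough sketch of that split using each instance's cc_config.xml (an 8-core box and a 2/6 split are assumed purely for illustration; <ncpus> caps the cores each client thinks it has, and <no_gpus> hides the GPUs from the CPU-only instance):

<!-- instance A: GPU work plus two cores to feed it -->
<cc_config>
<options>
<ncpus>2</ncpus>
</options>
</cc_config>

<!-- instance B: CPU-only, the remaining six cores -->
<cc_config>
<options>
<ncpus>6</ncpus>
<no_gpus>1</no_gpus>
</options>
</cc_config>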

I guess you could even set a GPU app to be an exclusive application on one instance, to stop the GPU being used in another; use different Boinc instances for different GPU projects, and thus exclude one GPU app from crunching when you start a video, but allow the other GPU app to run.
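The relevant cc_config.xml option is exclusive_gpu_app, which suspends GPU computing whenever the named program is running (vlc.exe is just an illustrative executable name):

<cc_config>
<options>
<exclusive_gpu_app>vlc.exe</exclusive_gpu_app>
</options>
</cc_config>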

Anyway, you have to know your projects and the demands they make on your system. I struggle to work these out, most people don't know much, and Boinc hasn't a notion.

We could probably do with a chart that discloses the system requirements of different GPU projects. For example:

Project    CPU      Sys RAM freq   PCIE     Power/Heat
GPUGrid    High     High           Med      High (one task)
POEM       V.High   V.High         V.High   Med (multiple tasks)
MW         Low      Low            Low      High (one task)
Albert     Low      Low            Low      Med (multiple tasks)
Einstein   Low      Low            Low      Med (multiple tasks)
...

Of course this varies considerably depending on the number of tasks you have running and the GPU type, so it would need to be more detailed and include GDDR usage (for multiple-task/project crunching).

ExtraTerrestrial Apes (volunteer moderator, volunteer tester)
Message 28556 - Posted: 17 Feb 2013 | 11:39:11 UTC

This time-based throttling is very ineffective anyway. It's like driving your car at 6,000 rpm, then seeing that the load on the engine is too high.. and instead of running it at 5,500 or 5,000 rpm, you still run at 6,000 rpm most of the time and throttle back every few seconds.

To come back to the PC world: if your GPU is too hot or loud, on modern nVidia GPUs (600 series) you can set it to consume less power. This automatically throttles not only clocks but also voltage, which is far more efficient than either pure clock throttling or on/off throttling.
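On boards and drivers that support it, this can be inspected and adjusted from the command line with nvidia-smi (the 150 W figure is just an example; whether -pl is honoured depends on the card and driver):

# show the current, default and allowed power limits
nvidia-smi -q -d POWER
# request a lower board power limit, in watts (needs admin rights)
nvidia-smi -pl 150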

Intel could implement something similar with their user-configurable TDP (cTDP) and Turbo.. they'd just have to expose the functionality to the actual user, not only to OEMs.

MrS

Mumak
Message 28568 - Posted: 17 Feb 2013 | 18:22:38 UTC - in response to Message 28556.

While the time-based throttling might not be the best method in terms of scheduling and task processing, it's universal for any kind of CPU. Moreover, even if the CPU is idle 1/10 of the time, it's able to cool down during this short period. cTDP is only available on a few CPU models, and there are other factors limiting CPU power (PL1, PL2).

Jim1348
Message 28571 - Posted: 17 Feb 2013 | 19:54:42 UTC - in response to Message 28555.
Last modified: 17 Feb 2013 | 19:56:19 UTC

Many GPU projects are now telling Boinc that they need a full CPU. This avoids a lot of unwanted issues that had previously arisen, but it requires cross-project participation and adherence to 'the rules' (there aren't any). The situation is improving but some projects still operate in a way that stresses other projects.


I have a somewhat related question. I am running both a GTX 560 and a GTX 650 Ti on the same motherboard, with only long-queue jobs selected. The GTX 560 gets along with only 5% CPU utilization (E8400 dual core at 3.0 GHz) and allows WCG jobs to run on that core at the same time, but the GTX 650 Ti reserves a whole core (50% CPU) and does not allow other projects to run on that core.

Does anyone know the reason why?

(I am running BOINC 7.0.52 x64 on Win7 64-bit, Nvidia 310.90 drivers.)

Dylan
Message 28573 - Posted: 17 Feb 2013 | 19:58:33 UTC - in response to Message 28571.

Are you allowing BOINC to use all CPU time?

skgiven (volunteer moderator, volunteer tester)
Message 28574 - Posted: 17 Feb 2013 | 20:03:48 UTC - in response to Message 28571.
Last modified: 17 Feb 2013 | 20:07:44 UTC

Different GPU architecture: the GTX 600 cards require more CPU than the previous generation of GPUs. You would need to ask the researchers exactly what they are running on the CPU.
It's also worth noting that leaving a CPU thread free for the GPU tends to result in faster GPU runs, faster CPU runs on the other CPU threads and fewer errors all round. For stability reasons, GPU projects are encouraged to set one CPU core aside for the GPU app.

Mumak
Message 28575 - Posted: 17 Feb 2013 | 20:04:22 UTC - in response to Message 28571.


I have a somewhat related question. I am running both a GTX 560 and a GTX 650 Ti on the same motherboard, with only long-queue jobs selected. The GTX 560 gets along with only 5% CPU utilization (E8400 dual core at 3.0 GHz) and allows WCG jobs to run on that core at the same time, but the GTX 650 Ti reserves a whole core (50% CPU) and does not allow other projects to run on that core.

Does anyone know the reason why?

(I am running BOINC 7.0.52 x64 on Win7 64-bit, Nvidia 310.90 drivers.)


I have a GTX 650 Ti too and the same BOINC version. Long tasks (NOELIA) reserve only 0.594 CPUs, so I'm currently running WCG CPU tasks on all CPU cores plus NOELIA on the GPU.
Maybe you have an app_config.xml file that isn't set up properly?

Dagorath
Message 28576 - Posted: 17 Feb 2013 | 20:04:33 UTC - in response to Message 28568.

While the time-based throttling might not be the best method in terms of scheduling and task processing, it's universal for any kind of CPU.


True, but it still sucks so badly that even the BOINC devs recommend using TThrottle instead. TThrottle works at the OS level rather than the app level. It's the kind of throttling the BOINC devs would like to do, but it requires different code for each OS, so they decided not to do it and gave us the crappy app-level throttling instead. If the president of Ford Motor Company said "our cars suck, don't buy them, buy a Chevy instead", would you then buy a Ford?

Moreover, even if the CPU is idle 1/10 of the time, it's able to cool down during this short period.


Yeah, but then it heats up again when it's not idle, and you get continuous cycling between hot and cold, which induces cyclic expansion and contraction, a known cause of hardware failure.

TThrottle gives much finer-grained on/off (idle/run) cycles, which yields a far more even temperature and virtually eliminates expansion/contraction. And it allows you to run any version of BOINC and still have CPU throttling independent of GPU throttling. Unfortunately it only runs on Windows, but that seems to be your OS anyway.


Jim1348
Message 28577 - Posted: 17 Feb 2013 | 20:11:48 UTC - in response to Message 28573.

Are you allowing BOINC to use all CPU time?

Yes, 100% on both cores.

Jim1348
Message 28578 - Posted: 17 Feb 2013 | 20:15:27 UTC - in response to Message 28575.

I have a GTX 650 Ti too and the same BOINC version. Long tasks (NOELIA) reserve only 0.594 CPUs, so I'm currently running WCG CPU tasks on all CPU cores plus NOELIA on the GPU.
Maybe you have an app_config.xml file that isn't set up properly?

I have no app_config.xml, but I am using a cc_config.xml to get both cards to run:

<cc_config>
<options>
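<!-- schedule work on all usable GPUs, not just the most capable one -->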
<use_all_gpus>1</use_all_gpus>
</options>
</cc_config>

That might have something to do with it. Whether it is a bug or a feature, I have no idea.



Jim1348
Message 28579 - Posted: 17 Feb 2013 | 20:17:31 UTC - in response to Message 28574.

Different GPU architecture: the GTX 600 cards require more CPU than the previous generation of GPUs. You would need to ask the researchers exactly what they are running on the CPU.
It's also worth noting that leaving a CPU thread free for the GPU tends to result in faster GPU runs, faster CPU runs on the other CPU threads and fewer errors all round. For stability reasons, GPU projects are encouraged to set one CPU core aside for the GPU app.


Yes, I have found that on my CPU, leaving a core free is a good idea for various reasons. Even my VideoLAN player does not like it when I use both cores. The deck may be stacked differently when Haswell comes along.

ExtraTerrestrial Apes (volunteer moderator, volunteer tester)
Message 28580 - Posted: 17 Feb 2013 | 21:32:52 UTC - in response to Message 28568.
Last modified: 17 Feb 2013 | 21:37:21 UTC

Sure, cTDP is not yet the solution. However, it would only be a matter of marketing (i.e. the will to implement this); the technology is already there and is in the chips anyway.

What do you mean by PL1 and PL2? The predefined power states?

Edit@Jim: if I remember correctly GPU-Grid decided that the 600 series GPUs were becoming so fast, that not reserving an entire core would slow the GPU down too much.

MrS

Jim1348
Message 28581 - Posted: 17 Feb 2013 | 22:02:26 UTC - in response to Message 28580.

Edit@Jim: if I remember correctly GPU-Grid decided that the 600 series GPUs were becoming so fast, that not reserving an entire core would slow the GPU down too much.

MrS


Thanks. I vaguely remember seeing something along those lines too, but couldn't find it in a search. It will all be irrelevant when Haswell comes along and provides all the cores I need anyway.

Mumak
Message 28582 - Posted: 17 Feb 2013 | 22:23:37 UTC - in response to Message 28580.
Last modified: 17 Feb 2013 | 22:40:34 UTC


What do you mean by PL1 and PL2? The predefined power states?


PL1/PL2 are the long- and short-duration power limits used to cap turbo boost in Intel CPUs. They work dynamically, limiting clocks based on an exponentially weighted moving average (EWMA) of the measured actual CPU power (IMON). Additionally, most CPUs have an "on-demand clock modulation" feature, which provides a kind of static throttling (unlike the dynamic power throttling of the PLs).


Edit@Jim: if I remember correctly GPU-Grid decided that the 600 series GPUs were becoming so fast, that not reserving an entire core would slow the GPU down too much.


Then why does it reserve "0.594 CPUs + 1 NVIDIA GPU" for me on a 650 Ti?

Jim1348
Message 28584 - Posted: 17 Feb 2013 | 22:36:05 UTC - in response to Message 28582.
Last modified: 17 Feb 2013 | 22:36:37 UTC

Mumak,

Others can answer that better than I can, but I do know that my E8400 Core2 Duo handles GPUs somewhat differently than my i5-3550 Ivy Bridge does. For example, I can run multiple POEM tasks on a single Ivy Bridge core for my HD 7770 by using an app_info (or app_config) file, but when I try that trick on my E8400 it insists on using two separate cores. So it seems that those CPUs handle multiple threads (or whatever they are) differently. I wouldn't generalize from my experience to all CPUs.

Mumak
Message 28585 - Posted: 17 Feb 2013 | 22:39:41 UTC - in response to Message 28584.
Last modified: 17 Feb 2013 | 22:43:16 UTC

Mumak,

Others can answer that better than I can, but I do know that my E8400 Core2 Duo handles GPUs somewhat differently than my i5-3550 Ivy Bridge does. For example, I can run multiple POEM tasks on a single Ivy Bridge core for my HD 7770 by using an app_info (or app_config) file, but when I try that trick on my E8400 it insists on using two separate cores. So it seems that those CPUs handle multiple threads (or whatever they are) differently. I wouldn't generalize from my experience to all CPUs.


That could be the reason... Maybe those tasks depend on the CPU type (or certain features) and the system then decides how much CPU resource is required. My CPU is an i5-750 (Lynnfield, no HT).
But I don't have much experience with GPUGrid yet - I only joined 5 days ago...

Dagorath
Message 28587 - Posted: 18 Feb 2013 | 4:58:37 UTC - in response to Message 28555.
Last modified: 18 Feb 2013 | 5:03:39 UTC

Many GPU projects are now telling Boinc that they need a full CPU. This avoids a lot of unwanted issues that had previously arisen, but it requires cross-project participation and adherence to 'the rules' (there aren't any). The situation is improving but some projects still operate in a way that stresses other projects.


I've been saying it for a long time... we crunchers hold more power than all of the projects and David Anderson combined, but we don't exercise it. All we have to do is open discussions, hear the concerns, establish some rules and then ostracize any project that doesn't wanna play nice. It's our hardware, we pay the power bills, and we spend time installing, configuring and fixing stuff when the scheduler throws a wobbly. We should have a major say in how things work and what projects are allowed to do, and if they don't like it they can go get their CPU cycles from someone else. I think David would support that, along with 95% of the projects. The 5% that won't will come around when they realize the consequences. I think a lot of people are starting to realize the train is off the rails and something needs to be done. The difficult part will be getting crunchers to change their attitude from the prevailing "oh, I am just so privileged to have you scientist gods use and abuse my hardware, power and time any way you want" to something more realistic that recognizes the needs of all the projects and all the crunchers. And in the end maybe it doesn't matter if they change. The power belongs to those who take it, and rightfully so. We attempt to establish a participatory democracy, we nurture that and grow it always, but if that doesn't happen then a benevolent cadre takes power and wields it as it sees fit, in consultation with the projects. Get the right players in the cadre and there will be no arm that cannot be twisted.

CPU project adherence to good principles of crunching is an issue, but the demands from different GPU projects are a different type of problem. There are very different GPU project system requirements (high vs low CPU requirements, high or low PCIE requirements, system memory bandwidth, GPU GDDR usage...). The impact of these on each other and on CPU projects isn't something that Boinc can readily ascertain, never mind accommodate. You would need Boinc to be kitted out with a tool that can read these things (something I asked for years ago).


Let's get the discussions going and let all the players explain what they need and what their concerns are. There can be a consensus and agreement on what needs to be done, rules established, and procedures for punting rogues off the playing field. I think if we give David Anderson that, then he will respond with appropriate code and kit the client out as you describe. The situation now is mayhem... how can he code for that? It needs rules as well as code. And if he doesn't rise to the task then we fork a branch and code it ourselves. Easy? Hell no, it'll take a lot of time, effort and dialogue. But it needs doing, and soon. We can have anything we want, including top-notch built-in throttling for CPU and GPU independently. The only limits are our imagination, our time and the willingness to maintain it after it's coded. Not easy, but we are a big community chock full of talent.

So if you run several CPU and GPU projects from one instance of Boinc, you're going to land in all sorts of trouble. For example, a GPU project such as POEM starts running, is set up to run several tasks, suddenly starts to use all the CPU cores, does some disk reading and writing (just when the CPU projects want to do this) and pop, the system or Boinc crashes. Run one GPU project and you can watch video, play games... Run another GPU project and you struggle to even browse the web.


I don't have the technical expertise to supply the answers/fixes, but I suspect many others here do. And if there is nobody, then we recruit the talent we need, or we form a study group to research and find the answers ourselves. We call in AMD, nVIDIA and Intel if we have to. The other option is to do nothing and continue to let the train plough up the dirt beside the rails it's supposed to be on.

An easy way to have two instances configured in Linux is to tell one to run only GPU tasks and set it to use a specific number of CPU cores, then tell the other client to use the remaining CPU cores. This way the scheduler doesn't do stupid things like stop running GPU tasks so that badly packaged CPU tasks can run. If your GPU is only attached to one project then you're not going to experience 'not highest priority' issues. If you attach to more projects (say with an ATI card) then increase the cache a bit, to allow Boinc to sort its feet out. With a very low cache Boinc has 'bar-stool moments' (when someone who appears OK tries to walk after sitting on a bar-stool too long). With too high a cache it's all over the place: disk I/O soars, RAM usage rises and some projects don't get a look in.


That would be an option too, except that 90% or more of crunchers are lemmings firmly in the control of the Gates Crime Family and won't have anything to do with Linux; they won't even discuss it. As you are well aware, they won't even run Linux in a VM unless it's at T4T, where they don't have to stick their little paws in it.

I guess you could even set a GPU app to be an exclusive application on one instance, to stop the GPU being used in another; use different Boinc instances for different GPU projects, and thus exclude one GPU app from crunching when you start a video, but allow the other GPU app to run.


Very clever. I see you've been giving this all a lot of thought. I might tinker with some of those ideas myself soon; I've got my GTX 570 and now my AMD 7970 to experiment with, and I'm putting together an order for more.

Anyway, you have to know your projects and the demands they make on your system. I struggle to work these out, most people don't know much, and Boinc hasn't a notion.

We could probably do with a chart that discloses the system requirements of different GPU projects. For example:

Project    CPU      Sys RAM freq   PCIE     Power/Heat
GPUGrid    High     High           Med      High (one task)
POEM       V.High   V.High         V.High   Med (multiple tasks)
MW         Low      Low            Low      High (one task)
Albert     Low      Low            Low      Med (multiple tasks)
Einstein   Low      Low            Low      Med (multiple tasks)
...

Of course this varies considerably depending on the number of tasks you have running and the GPU type, so it would need to be more detailed and include GDDR usage (for multiple-task/project crunching).


That's good stuff, I like it! That chart should be developed further and published in a prominent place like the official BOINC wiki. I can do that. Actually, anybody can do it, but I happen to have the account and password already. Want it in the wiki? I'll write it, you proofread it, and let me know when it needs updating. Tell me what to test and how, and I'll help with that too.

ExtraTerrestrial Apes (volunteer moderator, volunteer tester)
Message 28599 - Posted: 18 Feb 2013 | 21:37:42 UTC - in response to Message 28582.

Then why does it reserve "0.594 CPUs + 1 NVIDIA GPU" for me on a 650 Ti?

How much CPU is it actually using? I bet one full logical core, as is the case on my i7 with a GTX660Ti. BOINC says "0.69 CPU + 1 GPU" but those are meaningless numbers. It's enough to make the BOINC scheduler not assign another task to this core.
Not sure if it would be better if GPU-Grid actually set this to 1.

MrS

Mumak
Message 28603 - Posted: 18 Feb 2013 | 21:47:06 UTC - in response to Message 28599.

Hard to say how much CPU is really utilized, because I'm running WCG CPU tasks there too. But the CPU has 4 cores (no HT), and WCG runs on all CPU cores plus NOELIA on the GPU.
If I ran a WCG GPU task there (which requires 1 CPU + 1 GPU), then 3 cores would be utilized by WCG CPU tasks and one CPU core would be reserved for the GPU.

Then why does it reserve "0.594 CPUs + 1 NVIDIA GPU" for me on a 650 Ti?

How much CPU is it actually using? I bet one full logical core, as is the case on my i7 with a GTX660Ti. BOINC says "0.69 CPU + 1 GPU" but those are meaningless numbers. It's enough to make the BOINC scheduler not assign another task to this core.
Not sure if it would be better if GPU-Grid actually set this to 1.

MrS

skgiven (volunteer moderator, volunteer tester)
Message 28604 - Posted: 18 Feb 2013 | 22:42:43 UTC - in response to Message 28603.

Dagorath, I think we have hijacked this thread for long enough. It certainly warrants another thread, if not a site of its own.

MrS, I think we need a readily configurable option to assign a CPU core count to GPU apps; sometimes 0.25 CPU cores is sufficient to accommodate 2 or more GPU tasks (MW), while other GPU tasks (including some that might run here) could do with more than one core. Am I thread-jacking again?

Mumak, suspend your CPU projects and look at Task Manager. That way you will know exactly how much CPU your GPUGrid WU requires.

werdwerdus
Message 28609 - Posted: 19 Feb 2013 | 6:32:12 UTC - in response to Message 28604.
Last modified: 19 Feb 2013 | 6:32:37 UTC

With app_info.xml or app_config.xml you can manually specify the CPU/GPU count per task. app_info is pretty confusing, but app_config seems to be much simpler. I don't use it myself; I've just seen it discussed over in the XtremeSystems WCG section.
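For reference, a minimal app_config.xml sketch along those lines, reserving a full core per GPU task. The short name acemdlong is an assumption for the GPUGrid long-run app; check client_state.xml for the exact name. The file goes in the project's directory (e.g. projects/www.gpugrid.net/), after which you use the manager's "Read config files" option:

<app_config>
<app>
<name>acemdlong</name>
<gpu_versions>
<gpu_usage>1.0</gpu_usage>
<cpu_usage>1.0</cpu_usage>
</gpu_versions>
</app>
</app_config>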

Mumak
Message 28610 - Posted: 19 Feb 2013 | 8:00:30 UTC

So I suspended the WCG CPU tasks and let only ACEMD_LONG run. Now this is interesting - the task utilizes 2 cores: one at ~50% and another at ~30% (fluctuating).
So I think the default resource allocation requested by these tasks is wrong.
I used app_config to assign 1 CPU to these tasks and will see how it performs on the next task. I'll let you know...

Dagorath
Message 28611 - Posted: 19 Feb 2013 | 12:08:25 UTC - in response to Message 28610.

So I suspended the WCG CPU tasks and let only ACEMD_LONG run. Now this is interesting - the task utilizes 2 cores: one at ~50% and another at ~30% (fluctuating).


Everybody talks as if apps start running on a certain core and never move to another, but that isn't how it actually works. The task scheduler in the OS shifts apps/tasks from one core to another according to a very complicated scheduling algorithm designed to use resources optimally and keep the work flowing as fast and efficiently as possible. I'm not talking about the BOINC scheduler; I'm talking about the OS's task scheduler. BOINC can specify to the OS that an app should run on 2 cores, 1 core, 5 cores or whatever, but it cannot say which cores. In a pre-emptive multitasking OS it is even possible that, for brief periods, an app (any app, not just BOINC apps) isn't executing on any core at all.

Mumak
Message 28612 - Posted: 19 Feb 2013 | 12:47:41 UTC - in response to Message 28611.

There is a way to assign a software thread to a particular CPU thread using the CPU affinity mask (SetThreadAffinityMask, SetThreadIdealProcessor, SetThreadIdealProcessorEx, SetThreadGroupAffinity), so I disagree that BOINC is unable to do this. The question is whether it actually does, but that's easy to determine using Task Manager...
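(The Linux counterpart of that experiment, for anyone following along there, is the taskset utility; PID 1234 is just a placeholder for a running BOINC app:)

# show the current affinity mask of process 1234
taskset -p 1234
# pin process 1234 to core 0 only
taskset -pc 0 1234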


Everybody talks as if apps start running on a certain core and never move to another, but that isn't how it actually works. The task scheduler in the OS shifts apps/tasks from one core to another according to a very complicated scheduling algorithm designed to use resources optimally and keep the work flowing as fast and efficiently as possible. I'm not talking about the BOINC scheduler; I'm talking about the OS's task scheduler. BOINC can specify to the OS that an app should run on 2 cores, 1 core, 5 cores or whatever, but it cannot say which cores. In a pre-emptive multitasking OS it is even possible that, for brief periods, an app (any app, not just BOINC apps) isn't executing on any core at all.

ExtraTerrestrial Apes (volunteer moderator, volunteer tester)
Message 28618 - Posted: 19 Feb 2013 | 19:08:25 UTC

Fixing affinity usually doesn't yield any benefit for BOINC crunching. The reason is that the OS scheduler works on a time scale of milliseconds, which is an eternity from the viewpoint of a CPU (a million times slower at 1 GHz). There are corner cases where manually fixed core affinity did help considerably.. but this mainly applied to ill-balanced older multi-CPU hardware.

Mumak, take a look at the "Processes" tab of Task Manager. The CPU column shows the CPU utilization as a percentage of the entire CPU (100% = all cores at once), regardless of which core the app is running on. I bet this will show 25% for your i5 if you suspend WCG.

MrS

Dagorath
Message 28620 - Posted: 19 Feb 2013 | 20:54:21 UTC - in response to Message 28612.

There is a way to assign a software thread to a particular CPU thread using the CPU affinity mask (SetThreadAffinityMask, SetThreadIdealProcessor, SetThreadIdealProcessorEx, SetThreadGroupAffinity), so I disagree that BOINC is unable to do this. The question is whether it actually does, but that's easy to determine using Task Manager...


Everybody talks as if apps start running on a certain core and never move to another, but that isn't how it actually works. The task scheduler in the OS shifts apps/tasks from one core to another according to a very complicated scheduling algorithm designed to use resources optimally and keep the work flowing as fast and efficiently as possible. I'm not talking about the BOINC scheduler; I'm talking about the OS's task scheduler. BOINC can specify to the OS that an app should run on 2 cores, 1 core, 5 cores or whatever, but it cannot say which cores. In a pre-emptive multitasking OS it is even possible that, for brief periods, an app (any app, not just BOINC apps) isn't executing on any core at all.



Thank you. I stand corrected.



Mumak
Message 28621 - Posted: 19 Feb 2013 | 22:22:57 UTC

I still don't understand where the "0.594 CPUs" number comes from.. Any idea?
Also, what exactly does the "Maximum CPU % for graphics" setting in the GPUGRID preferences mean?

Dagorath
Message 28624 - Posted: 20 Feb 2013 | 1:32:40 UTC - in response to Message 28621.

I still don't understand where the "0.594 CPUs" number comes from.. Any idea?


I've always thought the number referred to how much of one core's time is needed to drive the GPU.

Also, what exactly does the "Maximum CPU % for graphics" setting in the GPUGRID preferences mean?


Many projects have that setting and I think it refers to how much CPU time should be allocated to their screensaver.


ExtraTerrestrial Apes (volunteer moderator, volunteer tester)
Message 28641 - Posted: 20 Feb 2013 | 19:52:33 UTC - in response to Message 28621.

I still don't understand where the "0.594 CPUs" number comes from.. Any idea?

It's automatically generated, I think by the GPU-Grid app by some algorithm, and passed to BOINC to display it.

MrS

S@NL - JBG
Message 28701 - Posted: 23 Feb 2013 | 13:21:35 UTC - in response to Message 28621.
Last modified: 23 Feb 2013 | 13:28:36 UTC

I still don't understand where the "0.594 CPUs" number comes from.. Any idea?


I think the number "0.594" means that 1 GPU also uses a bit more than 50% of a CPU to crunch correctly.

For example, my own rig is using about the same amount of CPU power.
I am only crunching the 6.17 long-run (8-12 hour) WU versions.
My GTX 690 is crunching 2 WUs at once and is using 1 CPU in total for that.

Here is a copy of the BoincTasks output:

GPUGRID 6.17 Long runs (8-12 hours on fastest card) (cuda42) Ann093_r1-TONI_AGGd8-7-100-RND8194_0 02:41:07 (02:32:40) 94.76 42.137 05:36:39 04d,14:59:09 0.818C + 1NV (d1)

GPUGRID 6.17 Long runs (8-12 hours on fastest card) (cuda42) nn195_r1-TONI_AGGd8-23-100-RND7363_0 01:57:19 (01:50:41) 94.34 27.512 08:39:14 04d,14:59:09 0.818C + 1NV (d0)


I am using the program BoincTasks to oversee all my crunching, and to keep the CPU/GPU temperature in hand I am also using the add-on program TThrottle:
BoincTask - http://www.efmer.eu/boinc/boinc_tasks/index.html
TThrottle - http://www.efmer.eu/boinc/index.html

Beyond
Message 28704 - Posted: 23 Feb 2013 | 15:28:26 UTC - in response to Message 28701.
Last modified: 23 Feb 2013 | 15:35:09 UTC

I still don't understand where the "0.594 CPUs" number comes from.. Any idea?

I think the number "0.594" means that 1 GPU also uses a bit more than 50% of a CPU to crunch correctly.

It means that that amount of CPU resource is allocated (on a per-core basis), or reserved if you like; it's probably actually using much less. So on a 6-core machine that particular BOINC process is reserving 0.594 of one CPU core, and 5.406 cores are still available for other processes. In practical terms you could still have 6 BOINC CPU processes running in addition to the GPU process. If you have two GPUs with processes each claiming 0.594 CPUs, then only 5 BOINC CPU processes will be allowed to run, as 2 x 0.594 > 1.

Mumak
Message 28715 - Posted: 23 Feb 2013 | 20:25:16 UTC
Last modified: 23 Feb 2013 | 20:25:30 UTC

Unfortunately the reality looks different. When I turned off all CPU tasks, the GPUGrid (NOELIA) task utilized almost a full core (the total CPU load across 4 cores was 25%, which means an entire core is under full load). So I believe that allocation number (0.594 CPUs) is not correct, and these tasks should reserve 1 CPU (thread/core).

Beyond
Message 28721 - Posted: 24 Feb 2013 | 0:29:43 UTC - in response to Message 28715.
Last modified: 24 Feb 2013 | 0:35:42 UTC

This is a strange phenomenon with this project. Older GPUs use a very low amount of CPU time, while newer GPUs such as your 650 Ti use a lot. I can see it across my various machines. Skgiven also has this going on with his GPUs: the 470 uses very little CPU while the 660 Ti uses a great deal. In fact, looking through the database it looks like only the 6xx-series GPUs exhibit this high CPU usage. Maybe he can give us an idea why this is happening.

Edit: BTW, I tried reserving an extra CPU core on my machine with the 650 Ti; it made no difference in completion time.

ExtraTerrestrial Apes (volunteer moderator, volunteer tester)
Message 28735 - Posted: 24 Feb 2013 | 10:11:15 UTC

As I said above: "GPU-Grid decided that the 600 series GPUs were becoming so fast, that not reserving an entire core would slow the GPU down too much."

And "I don't think it's a good choice for GK107 based cards" in that thread called 100% CPU use.

@Mumak: you're totally correct now.. and that's what I tried to tell you in my previous posts :)

MrS

skgiven (volunteer moderator, volunteer tester)
Message 28744 - Posted: 24 Feb 2013 | 12:57:17 UTC - in response to Message 28735.

It's also about stability; too many people were trying to run high-end cards while using every last percentage of their CPUs. The result was system and Boinc instability, task failures and more problems for the projects to deal with. Failures aside, this CPU over-commitment also resulted in GPU performance reductions and, in some cases, a decline in CPU project performance.

Beyond
Message 28745 - Posted: 24 Feb 2013 | 13:12:10 UTC - in response to Message 28735.

As I said above: "GPU-Grid decided that the 600 series GPUs were becoming so fast, that not reserving an entire core would slow the GPU down too much."

I missed that. MrS, thanks for the explanation. Question: why does it say 0.481C yet really reserve 1 CPU core? How does that work?

And "I don't think it's a good choice for GK107 based cards"

I would agree. I would much rather manage and reserve the core myself if it helps the speed.

ExtraTerrestrial Apes (volunteer moderator, volunteer tester)
Message 28747 - Posted: 24 Feb 2013 | 16:10:08 UTC - in response to Message 28745.
Last modified: 24 Feb 2013 | 16:10:46 UTC

A few posts above:

Mumak wrote:
I still don't understand where the "0.594 CPUs" number comes from.. Any idea?

It's automatically generated, I think by the GPU-Grid app by some algorithm, and passed to BOINC to display it.

MrS

The number shown is probably based on some estimate used for the older cards with SWAN_SYNC and not yet updated for Keplers.

Reserving a CPU core yourself is fine, but not everyone running a fast GPU (and in need of this) will know that he/she should do it. I guess that's where the idea of doing it automatically came from.

MrS

Mumak
Message 28750 - Posted: 24 Feb 2013 | 21:49:36 UTC

Thanks guys, now it makes sense to me :-)
That number should be updated to match the real scenario, though I'm not sure who or what decides that estimate.. Maybe it's not easy to implement such a change in the system (to have different estimates for pre-Kepler and later GPUs).
