
Message boards : Number crunching : Linux client cpu utilization

fractal
Joined: 16 Aug 08
Posts: 60
Credit: 113,558,313
RAC: 194,237
Level
Cys
Message 16383 - Posted: 17 Apr 2010 | 23:47:53 UTC

I reported strange behavior of the linux client last month. It is about time to bump that topic and see if there has been any progress.

Initially I had a gt240 in a dual core ubuntu machine. GPUGRID took one full core and the two cpu tasks from a different project evenly split the other. I resolved that issue by pulling the GPU out of the linux machine, putting it in a windows machine and putting that machine on another project :(

Last week I picked up a shell shocker deal gt240 and decided to see how much had changed. This time I selected a quad core q6600. The distro on it was too old to be supported by nvidia so I installed ubuntu 9.04, boinc 6.10.32 and NVIDIA UNIX x86_64 Kernel Module 190.42.

This time things are even weirder. The GPUGRID task takes one full core just like before, but the other four tasks split TWO of the remaining three cores.

A top snapshot looks like:

top - 22:54:25 up 20 min, 1 user, load average: 4.90, 4.35, 2.93
Tasks: 98 total, 6 running, 92 sleeping, 0 stopped, 0 zombie
Cpu0 : 0.0%us, 0.0%sy, 75.1%ni, 24.9%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st
Cpu1 : 0.0%us, 0.0%sy, 75.1%ni, 24.9%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st
Cpu2 : 0.0%us, 0.0%sy, 75.0%ni, 25.0%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st
Cpu3 : 0.0%us, 0.0%sy, 0.0%ni,100.0%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st
Mem: 3972060k total, 309792k used, 3662268k free, 9152k buffers
Swap: 1253028k total, 0k used, 1253028k free, 127932k cached

PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
3619 boinc 30 10 151m 101m 21m R 100 2.6 1:04.80 acemd2_6.04_x86
3359 boinc 39 19 13892 4164 3092 R 50 0.1 13:00.85 casinoAlpha_5.0
3360 boinc 39 19 13880 4156 3092 R 50 0.1 12:56.17 casinoAlpha_5.0
3362 boinc 39 19 14448 4664 3088 R 50 0.1 12:48.23 casinoAlpha_5.0
3361 boinc 39 19 14468 4672 3088 R 50 0.1 13:03.51 casinoAlpha_5.0
1 root 20 0 4104 924 632 S 0 0.0 0:01.23 init

When I suspend GPUGRID, everything goes back to normal (each work unit gets 100% of a core, and each core shows 100% nice).

One needs to be somewhat careful with top. There are some slight inaccuracies due to sampling intervals and it is not always clear what is a process and what is a thread, but it gives an indication.
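top's per-core figures depend on its sampling interval, so you can cross-check them by reading the kernel counters that top itself reads. Below is a minimal Python sketch. It assumes the Linux /proc/stat layout described in proc(5), and the function names are made up for illustration. Its `nice` figure corresponds to top's `%ni` column, which is where niced BOINC work like the casinoAlpha and acemd tasks above is counted.

```python
import time

# Per-core fields in /proc/stat "cpuN" lines, in order (see proc(5)).
FIELDS = ("user", "nice", "system", "idle", "iowait", "irq", "softirq", "steal")


def read_cpu_times(path="/proc/stat"):
    """Return {"cpu0": {"user": ticks, ...}, ...} for each individual core."""
    times = {}
    with open(path) as f:
        for line in f:
            parts = line.split()
            # Keep cpu0, cpu1, ...; skip the aggregate "cpu" line and other stats.
            if parts and parts[0].startswith("cpu") and parts[0] != "cpu":
                values = [int(v) for v in parts[1:1 + len(FIELDS)]]
                times[parts[0]] = dict(zip(FIELDS, values))
    return times


def per_core_usage(interval=2.0):
    """Sample the counters twice and return per-core percentages per field."""
    before = read_cpu_times()
    time.sleep(interval)
    after = read_cpu_times()
    usage = {}
    for cpu, t0 in before.items():
        if cpu not in after:
            continue  # core went offline between samples
        # Clamp at zero: proc(5) notes iowait is not reliable and may go backwards.
        delta = {k: max(0, after[cpu][k] - t0[k]) for k in t0}
        total = sum(delta.values()) or 1  # avoid dividing by zero
        usage[cpu] = {k: 100.0 * v / total for k, v in delta.items()}
    return usage


if __name__ == "__main__":
    for cpu, pct in sorted(per_core_usage().items()):
        print(f"{cpu}: nice {pct['nice']:5.1f}%  idle {pct['idle']:5.1f}%")
```

A longer interval averages out the sampling jitter mentioned above, at the cost of slower feedback.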

So, on to my questions.

a. Is everyone running linux seeing the same odd behavior with acemd2_6.04_x86?

b. Is there any plan to update the Linux client so it stops using a full core, or in my case almost two cores?

fractal
Joined: 16 Aug 08
Posts: 60
Credit: 113,558,313
RAC: 194,237
Level
Cys
Message 16417 - Posted: 19 Apr 2010 | 3:46:32 UTC

Additional update on the above--

I installed a 9600gso in a different quad, this time a q8200. The behavior is the same. I even verified that the CPU tasks took twice as long to complete just in case top was lying to me.

I have verified the behavior with

NVIDIA UNIX x86_64 Kernel Module 195.36.15 Fri Mar 12 00:29:13 PST 2010

as well as the original

NVIDIA UNIX x86_64 Kernel Module 190.42 Tue Oct 20 20:25:42 PDT 2009

and with a 9600gso as well as a gt240.

I will try a different kernel next, as both tests have been run with Linux 2.6.28-11-server x86_64.

Profile [AF>Libristes] Dudumomo
Joined: 30 Jan 09
Posts: 45
Credit: 160,540,037
RAC: 0
Level
Ile
Message 16419 - Posted: 19 Apr 2010 | 6:45:48 UTC - in response to Message 16417.
Last modified: 19 Apr 2010 | 6:48:02 UTC

Hi Fractal.
As mentioned several times in other threads, the new GPUGrid release now uses one full core instead of about 30% before (something like that).
But with this new release, your GPU can crunch up to 60% faster (in my case it was close to 30% with a GTX275).
I was quite disappointed by it, as I like running my CPU on several other projects. But with a gain of 30%, I think it's worth it.
____________
Self hosting GNU/Linux distribution: Beedbox
This project needs your help to succeed !

fractal
Joined: 16 Aug 08
Posts: 60
Credit: 113,558,313
RAC: 194,237
Level
Cys
Message 16436 - Posted: 19 Apr 2010 | 17:35:22 UTC - in response to Message 16419.

Is what you just said what you are seeing on your q8200 with the gtx275? Does top show the four CPU tasks on your system each taking 75% and acemd taking 100% as one would expect or are you seeing the same 50% per CPU task, 100% for the acemd task and 100% of one core idle that I am?

While I do not know if the new client is crunching GPU work any faster (the database does not contain pre-6.04 results), it "appears" to be taking two full cores away from other processing on a quad-core machine and one full core on a dual-core machine. Reserving a core for the GPU (preferences->processor usage->use 75% of the processors) does not get three CPU tasks running at 100% either (the workaround from a while back that worked well for such a long time). Suspending GPUGRID does get all cores utilized again, but regrettably, none of them are then helping GPUGRID.

fractal
Joined: 16 Aug 08
Posts: 60
Credit: 113,558,313
RAC: 194,237
Level
Cys
Message 16645 - Posted: 29 Apr 2010 | 21:42:27 UTC

Any ETA on a new client? It has been a month and a half now.

I want to get back to this project but not if I have to give up two cores on a quad just to feed one GPU.

Heck, even a "buzz off" is better than the cold shoulder treatment.

bigtuna
Volunteer moderator
Joined: 6 May 10
Posts: 80
Credit: 98,784,188
RAC: 0
Level
Thr
Message 16928 - Posted: 9 May 2010 | 16:23:46 UTC - in response to Message 16645.

I've got 2 different "green" 9800GT cards in 2 different computers.

On the Vista machine the CPU is hardly used at all while running GPUGRID. This shows up on the CPU monitor "Widget" and also in the GPUGRID results under CPU time. CPU time is far less than the total time to complete the work unit.

On the Ubuntu Linux machine one core is maxed out and the second runs at about half load. CPU time is the same as the time it takes to do a work unit.

Thing is that the Linux box is faster and gets more credit so it seems the CPU cycles are not being wasted...

The difference in behavior is curious.

It would be interesting to hear more about the differences in clients between Windows and Linux if anyone knows.
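You can also compare CPU time against elapsed time directly for a running task. Below is a minimal Python sketch. It assumes Linux /proc/<pid>/stat as documented in proc(5), and the helper names are hypothetical. The sketch divides the CPU ticks a process consumed during an interval by the wall-clock length of that interval, so 1.0 means one full core and 1.5 means one and a half. The utime/stime totals in /proc/<pid>/stat cover all threads of the process, so a multithreaded app can report more than 1.0.

```python
import os
import time

CLK_TCK = os.sysconf("SC_CLK_TCK")  # units of the utime/stime fields


def cpu_ticks(pid):
    """Return user + system CPU time of a process, in clock ticks."""
    with open(f"/proc/{pid}/stat") as f:
        data = f.read()
    # Field 2 (the command name) is in parentheses and may contain spaces,
    # so split only what follows the last ')'. rest[0] is field 3 (state).
    rest = data[data.rindex(")") + 2:].split()
    utime, stime = int(rest[11]), int(rest[12])  # fields 14 and 15
    return utime + stime


def cores_used(pid, interval=5.0):
    """Average number of cores `pid` used over `interval` seconds."""
    ticks0, t0 = cpu_ticks(pid), time.monotonic()
    time.sleep(interval)
    ticks1, t1 = cpu_ticks(pid), time.monotonic()
    return (ticks1 - ticks0) / CLK_TCK / (t1 - t0)
```

For example, `cores_used(pid)` with the PID that top reports for the acemd process gives the same ratio as CPU time divided by run time, but over a short window instead of a whole work unit.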





Profile skgiven
Volunteer moderator
Project tester
Volunteer tester
Joined: 23 Apr 09
Posts: 3494
Credit: 834,217,132
RAC: 1,222,872
Level
Glu
Message 16937 - Posted: 10 May 2010 | 11:11:05 UTC - in response to Message 16928.

On identical XP and Linux systems, Linux will use a full CPU core due to the code, making Linux significantly faster.

The more recent 6.73 tasks for Fermi can also be configured to use a full CPU core by setting the swan_sync=0 environment variable.
That said, Linux is still slightly faster.

The same was true of the elusive 6.72 tasks for CC1.1 - CC1.3 cards:
The swan_sync=0 variable expedited the GPUGrid task, but even on XP, it was slightly slower than Linux. The variable works on XP, Vista and W7.
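One practical detail: an environment variable only reaches the science app if it is set in the environment of the process that launches the app. That process is the BOINC client, not the user's login shell. The Python sketch below illustrates only this inheritance mechanism. The child process is a stand-in for the GPUGRID app, and the sketch assumes the client passes its environment on to the apps it starts, as an ordinary fork/exec does. The variable name is copied from the post above; this sketch does not confirm that the app expects exactly this name and case.

```python
import os
import subprocess
import sys

# Name as written in the post; the exact name/case the app checks is an assumption.
VAR = "swan_sync"


def launch_with_env(cmd, extra_env):
    """Start `cmd` with the current environment plus `extra_env`, the way a
    client that has the variable set would pass it to the apps it launches."""
    env = dict(os.environ, **extra_env)
    return subprocess.run(cmd, env=env, capture_output=True, text=True, check=True)


# Stand-in for the science app: a child that just reports what it inherited.
child = [sys.executable, "-c", f"import os; print(os.environ.get({VAR!r}))"]
print(launch_with_env(child, {VAR: "0"}).stdout.strip())  # 0
# None unless the variable is already set in this environment:
print(subprocess.run(child, capture_output=True, text=True).stdout.strip())
```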

Vista and W7 are slower (11 or 12% reportedly) than XP for CC1.1, CC1.2 and CC1.3 tasks.
Fermi tasks on W7/Vista are very slow compared to XP or Linux - About 45% slower than XP and 50% slower than Linux.
The general consensus is that this is due to poor drivers for W7/Vista. Driver revisions are common, so there will be new releases that might improve the situation for W7/Vista Fermi users.

Profile Saenger
Joined: 20 Jul 08
Posts: 134
Credit: 3,965,226
RAC: 0
Level
Ala
Message 17532 - Posted: 6 Jun 2010 | 13:37:43 UTC
Last modified: 6 Jun 2010 | 13:39:28 UTC

On my Ubuntu machine the GPU WUs are doing 2 not-nice things that must not happen:
a) they use a full core while pretending to be non-CPU intensive (0,05 CPUs + 1 NVIDIA GPUs)
b) they have a nice value of 10 for the CPU part instead of the required 19

If they played fair and admitted to using a full CPU, BOINC would not start another 4 CPU WUs on my machine. It's OK to use a full CPU if that's needed, but don't pretend not to use one.

If the nice value were the required 19, all 5 CPU-hungry WUs would get the same share, and GPUgrid would not claim a full core while the other 4 WUs have to make do with the remaining 3 cores.
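You can check this arithmetic with an idealised model of the Linux CFS scheduler. CFS divides CPU time in proportion to per-nice load weights; the kernel's weight table gives 1024 at nice 0, 110 at nice 10 and 15 at nice 19. The Python sketch below is an illustration under stated assumptions, not a description of what the kernel does in detail. It treats every task as single-threaded and capped at one core, and it ignores per-core load balancing, Xorg and other processes. In particular, it does not reproduce the idle fourth core in fractal's top snapshot.

```python
# Load weights from the kernel's nice-to-weight table (only the values used here).
NICE_WEIGHT = {0: 1024, 10: 110, 19: 15}


def fair_shares(nices, cores):
    """Fraction of a core each task gets under weighted fair sharing,
    assuming single-threaded tasks that can never use more than one core."""
    shares = [0.0] * len(nices)
    active = set(range(len(nices)))
    capacity = float(cores)
    while active and capacity > 0:
        total_w = sum(NICE_WEIGHT[nices[i]] for i in active)
        # Tasks whose proportional share would exceed one core are capped at 1.0
        capped = {i for i in active if capacity * NICE_WEIGHT[nices[i]] / total_w >= 1.0}
        if not capped:
            for i in active:
                shares[i] = capacity * NICE_WEIGHT[nices[i]] / total_w
            break
        for i in capped:
            shares[i] = 1.0
        # Redistribute the remaining capacity among the uncapped tasks
        capacity -= len(capped)
        active -= capped
    return shares


# GPUGRID app at nice 10 plus four CPU tasks at nice 19 on a quad core
print(fair_shares([10, 19, 19, 19, 19], 4))  # [1.0, 0.75, 0.75, 0.75, 0.75]
# The same tasks after renicing the GPUGRID app to 19
print(fair_shares([19, 19, 19, 19, 19], 4))  # [0.8, 0.8, 0.8, 0.8, 0.8]
```

Under these assumptions, the nice-10 task keeps a full core and the four nice-19 tasks share the other three cores at 0.75 each. After renicing, each task gets 0.8 of a core, which is close to the ~80% Saenger reports after renicing.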

I could change the nice value manually, and of course I've done so. Now all 5 CPU hogs get about the same amount of CPU time. This brings GPUgrid down to ~75% on average (alternating between 30% and 100%) and the others up from ~70% to ~80%; some CPU is of course still needed for non-BOINC work.
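Renicing can be scripted so it does not have to be repeated by hand. Below is a minimal Python sketch. It assumes Linux /proc and Python 3.3+ (for `os.setpriority`); `find_pids` and `renice_all` are hypothetical helper names. The "acemd" prefix comes from the top snapshot earlier in the thread. The kernel truncates command names to 15 characters, which is why top shows acemd2_6.04_x86. On Linux, the nice value is a per-thread attribute, so the sketch walks /proc/<pid>/task and sets each thread. Raising the niceness of processes owned by the boinc user needs root. A newly started work unit would presumably come up at its default niceness again, so the script would have to be rerun.

```python
import os


def find_pids(prefix):
    """Return the PIDs whose command name (/proc/<pid>/comm) starts with prefix."""
    pids = []
    for entry in os.listdir("/proc"):
        if not entry.isdigit():
            continue  # not a process directory
        try:
            with open(f"/proc/{entry}/comm") as f:
                comm = f.read().strip()
        except OSError:
            continue  # the process exited while we were scanning
        if comm.startswith(prefix):
            pids.append(int(entry))
    return pids


def renice_all(prefix, niceness=19):
    """Set `niceness` on every thread of every matching process.
    Returns the thread IDs that were changed."""
    changed = []
    for pid in find_pids(prefix):
        try:
            tids = [int(t) for t in os.listdir(f"/proc/{pid}/task")]
        except OSError:
            continue  # process exited
        for tid in tids:
            try:
                # On Linux the nice value is per thread, so set each TID separately.
                os.setpriority(os.PRIO_PROCESS, tid, niceness)
                changed.append(tid)
            except OSError:
                pass  # e.g. permission denied for another user's process
    return changed


if __name__ == "__main__":
    # Run as root: the GPUGRID app belongs to the boinc user.
    print(renice_all("acemd", 19))
```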

How I could tell the GPUgrid WU not to pretend to be non-CPU-intensive, I don't know, but it's something the project team should fix quickly. (0,05 CPUs + 1 NVIDIA GPUs) is a plain lie; (1 CPUs + 1 NVIDIA GPUs) is the truth. Telling the BOINC manager the truth about the WUs' behaviour is an absolute must for a good project.
____________
Gruesse vom Saenger

For questions about Boinc look in the BOINC-Wiki

Profile skgiven
Volunteer moderator
Project tester
Volunteer tester
Joined: 23 Apr 09
Posts: 3494
Credit: 834,217,132
RAC: 1,222,872
Level
Glu
Message 17536 - Posted: 6 Jun 2010 | 20:33:21 UTC - in response to Message 17532.
Last modified: 6 Jun 2010 | 20:39:04 UTC

Some of us are quite happy to optimise our systems to crunch GPUGrid tasks. I usually leave a core free (Boinc Manager) to do just that. So I am happy to lose 100 credit points elsewhere to speed up this project, and overall by doing so I get more points anyway!
The stated amount of CPU use is an estimate calculated by a formula. With application development as active as it is here, it is likely to be less accurate following recent improvements. This project tries to cater for both Linux and Windows and it is a small team, so it can’t be all things to all people at all times.

PS. As you have a NVIDIA GeForce 8600 GT, I would suggest you set your cache (via Boinc Manager) to be about 0.01 days, or manually control tasks for this project.

Good luck,

Profile Saenger
Joined: 20 Jul 08
Posts: 134
Credit: 3,965,226
RAC: 0
Level
Ala
Message 17537 - Posted: 6 Jun 2010 | 21:09:45 UTC

As I said, I have no problem with GPUgrid using a full CPU besides my GPU, it only should say so openly, not pretend to use only 0.05.
A project, that uses one full CPU, should not let 4 other WUs run in parallel on a quad (or 2 on a duo, or any other on a single core).
____________
Gruesse vom Saenger

For questions about Boinc look in the BOINC-Wiki

Profile skgiven
Volunteer moderator
Project tester
Volunteer tester
Joined: 23 Apr 09
Posts: 3494
Credit: 834,217,132
RAC: 1,222,872
Level
Glu
Message 17538 - Posted: 6 Jun 2010 | 21:39:59 UTC - in response to Message 17537.
Last modified: 6 Jun 2010 | 22:00:29 UTC

Glad you are not bothered about GPUGrid using a full core.
For an NVIDIA GeForce 8600 GT on Windows it would use about 0.1 cores of an average CPU, however on Linux it cannot presently be reduced to this number.
Previous versions of applications used less CPU, but following significant application optimisations (and it is worth noting that Linux is faster than Windows here) a problem arose for the Linux app that caused it to use a full core. GPUGrid is not deliberately trying to mislead you, and I am not sure how much this is down to GPUGrid and how much it is down to Boinc and their calculation systems.
I think a recent attempt to fix this failed, but I expect they are still working on a fix. Perhaps if there were 50 people on the research project, things like this would be resolved faster. But as I said, it is a small team, so it is inevitable that they struggle with such problems and that prioritisation sometimes prevents things from being resolved quickly. At the present time they may be struggling with some task type errors; they have just developed a Fermi app, are probably working on an upgraded CUDA 3010 app, and are slowly working towards an ATI app. This is on top of trying to keep the project up and running (regular maintenance and support).

Hope I shed some light on the situation and that you are not too annoyed about it. This is the best GPU project on the net, by a long way. I hope you continue to support it for a long time.

Again,
Good luck

fractal
Joined: 16 Aug 08
Posts: 60
Credit: 113,558,313
RAC: 194,237
Level
Cys
Message 17540 - Posted: 7 Jun 2010 | 17:25:56 UTC - in response to Message 17537.

You share the essence of my issue with GPUGrid. GPUGrid, like many projects, is time sensitive. It wants work returned as soon as possible as each result contributes to the next work. Hence the bonus on fast returns.

But, for linux, it currently fails to respect that other projects may also be time sensitive.

It is one thing to take one full core like skgiven suggests it should, or one and a half cores from a quad core as several of us have documented for them. It is yet another to take one and a half cores on a quad core and not bother to tell BOINC that it is doing so. BOINC tries to run four CPU tasks on the two remaining cores, and they take almost twice as long.

I am sad that GPUGrid, which was once the project for Linux users to come home to, has abandoned Linux as it has. All my Linux GPUs have been on other projects for the past few months, since I discovered the bug that has been so thoroughly ignored. I tried reporting it in March, then in April. It is now June, and the only responses we have seen are from other contributors who deny the existence of the problem. I understand that Fermi cards do more work than my old 9 series cards, so it is justified that they get attention first, but having legitimate bug reports consistently ignored is tiring.

Profile Saenger
Joined: 20 Jul 08
Posts: 134
Credit: 3,965,226
RAC: 0
Level
Ala
Message 17545 - Posted: 8 Jun 2010 | 15:52:30 UTC

I just saw that it's using even more. I wondered why the total in my system monitor never reached 400%. I changed the view to show all processes rather than only my own, and suddenly the missing percentage was there: Xorg is using between 30 and 60%. Added to the 90-110% of the regular app, that's ~130-150%. That is far too much, especially as the WU still pretends to use only 0.05 CPUs.

It would be OK to use as much CPU as is needed, if the app said so. I can even live with 120%, if only 1 core is "officially" claimed. But to grab 150% while telling the system it needs only 0.05 CPUs is far from good.

As long as you need CPU, say so openly and take it openly. As long as you officially claim only 0.05 CPUs, don't dare to take much more than that. Your current behaviour under Linux is plain stealing from other projects.
____________
Gruesse vom Saenger

For questions about Boinc look in the BOINC-Wiki

fractal
Joined: 16 Aug 08
Posts: 60
Credit: 113,558,313
RAC: 194,237
Level
Cys
Message 18729 - Posted: 21 Sep 2010 | 2:31:28 UTC

Well, it has been half a year since the bug was first reported, and it still exists.

Any status update?
