Message boards : Number crunching : low GPU utilization with recent Gerard CXCL12?

ExtraTerrestrial Apes
Volunteer moderator
Volunteer tester
Joined: 17 Aug 08
Posts: 2705
Credit: 1,311,122,549
RAC: 0
Level
Met
Message 42593 - Posted: 12 Jan 2016 | 20:02:39 UTC
Last modified: 12 Jan 2016 | 20:27:31 UTC

Dear fellow crunchers,

Recently my system with a GTX970 became rather unproductive, taking almost as long as a GTX670 for a similar WU! GPU utilization and power consumption are down to 50 - 60%, despite using SWAN_SYNC to maximize performance. Utilization used to be >90%! I've been seeing this behaviour since about Christmas. Unfortunately, quite a few things changed back then:

- I had to switch from an i7 6700 to an i3 6100 (same thing with 2 instead of 4 cores and 3 MB L3 instead of 8 MB)

- Installed driver 361.43. I switched back to the previous one upon seeing the bad performance, but that didn't help.

- The WUs may have changed around that time. I didn't observe things closely, but there were WU shortages and some new names, I think.

Is anyone else seeing such behaviour, or is it a problem on my side? Since there are no other reports yet, I suspect the latter. Strangely, running 2 concurrent WUs or an Einstein task along with GPU-Grid doesn't help either: GPU utilization seems to improve by just ~1% on average, whereas the actual crunching seems to slow down by more than a factor of 2.

Edit: I did a clean uninstall of the NVIDIA driver and reinstalled 361.43. This brought the GPU utilization back up to 67%. I re-enabled a 2nd concurrent WU -> 73%. Power draw is up by 40 W, which is a good sign in this case. Removing the last WCG thread (2 initially) didn't change things, as expected. So it seems I have repaired something, but the performance is still pretty lacking compared to what it used to be.

MrS
____________
Scanning for our furry friends since Jan 2002

Retvari Zoltan
Joined: 20 Jan 09
Posts: 2343
Credit: 16,201,255,749
RAC: 6,169
Level
Trp
Message 42594 - Posted: 12 Jan 2016 | 23:10:31 UTC - in response to Message 42593.
Last modified: 12 Jan 2016 | 23:11:22 UTC

I'm experiencing lower GPU usage (~87% on a Core i3-4130, GTX 980Ti, WinXPx64) with the GERARD_A2AR batch.
This batch is less tolerant of GPU overclocking as well.

John C MacAlister
Joined: 17 Feb 13
Posts: 181
Credit: 144,871,276
RAC: 0
Level
Cys
Message 42596 - Posted: 13 Jan 2016 | 1:25:33 UTC

Around 82 - 84% GTX 660 Ti GPU usage with AMD FX-8350 (Win 7 64).

WUs error out almost immediately on the same PC running Linux Mint 17 64-bit.

nanoprobe
Joined: 26 Feb 12
Posts: 184
Credit: 222,376,233
RAC: 0
Level
Leu
Message 42599 - Posted: 13 Jan 2016 | 19:43:27 UTC
Last modified: 13 Jan 2016 | 19:45:37 UTC

Seeing the same thing here ETA. Only 60% usage on 2 GTX 970s running on Xeon 2683 V3 and Win7 64 bit. Is running 2 at a time a possibility to boost the GPU load?

ExtraTerrestrial Apes
Volunteer moderator
Volunteer tester
Joined: 17 Aug 08
Posts: 2705
Credit: 1,311,122,549
RAC: 0
Level
Met
Message 42601 - Posted: 13 Jan 2016 | 22:18:03 UTC

Thanks, guys!

Running 2 WUs concurrently boosts the GPU utilization. I would say a bit less than 10% (depending on several factors, of course). The catch is that even after fixing my initial driver problem, I still can't hit the full bonus time running 2 of them on my overclocked GTX970. So overall it's counter-productive to do so.

MrS
____________
Scanning for our furry friends since Jan 2002

Bedrich Hajek
Joined: 28 Mar 09
Posts: 467
Credit: 8,194,346,966
RAC: 10,534,102
Level
Tyr
Message 42604 - Posted: 14 Jan 2016 | 2:41:01 UTC

The GERARD_A2AR WUs are definitely slower than the other GERARD units. On my new Windows 10 machine, they finish in about 7 to 8 hours versus about 6 hours or less for the other GERARD units. On my old Windows XP machine, they finish in about 12 hours versus the same 6 hours or less average for the other GERARD units.


See link below:


https://www.gpugrid.net/hosts_user.php?userid=19626



Retvari Zoltan
Joined: 20 Jan 09
Posts: 2343
Credit: 16,201,255,749
RAC: 6,169
Level
Trp
Message 42605 - Posted: 14 Jan 2016 | 9:52:04 UTC - in response to Message 42604.
Last modified: 14 Jan 2016 | 10:00:53 UTC

The GERARD_A2AR WUs are definitely slower than the other GERARD units.
I would say these need more DP operations (done by the CPU) than others.

On my old windows xp machine, they finish in about 12 hours versus the same 6 hours or less average for the other GERARD units.
The performance of the GTX980Ti in your host is significantly degraded by the lack of CPU power. While your host (Athlon64 X2 Dual Core 5000+ + GTX980Ti + WinXPx86) processes a GERARD_A2AR_luf6806_b in 42,449 sec, my Core2duo E8500 + GTX980 + WinXPx64 processes a GERARD_A2AR_luf6632_b in just 33,928 sec, while my i3-4160 + GTX980Ti + WinXPx64 processes a GERARD_A2AR_luf6632_b in 22,385 sec. This degradation affects other workunits as well, but less significantly. I would suggest that you stop any CPU crunching on this host to make the GPU crunch faster. Since it has 4GB of DDR2 memory, it's probably in dual channel mode already, but it's worth checking with the CPU-Z utility.
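
For scale, the three run times quoted above can be converted to hours and compared with each other. This is only a rough sketch: it reads the figures as whole seconds (so 42.449 is taken to mean 42,449 s, which matches the "about 12 hours" reported earlier), and the host labels are shorthand made up for this example.

```python
# Run times from the post, read as whole seconds (assumption: the dot
# in the original figures is a thousands separator).
runtimes_s = {
    "Athlon64 X2 5000+ / GTX980Ti": 42449,
    "Core2Duo E8500 / GTX980": 33928,
    "i3-4160 / GTX980Ti": 22385,
}


def hours(seconds):
    """Convert seconds to hours, rounded to one decimal place."""
    return round(seconds / 3600, 1)


def slowdown(seconds, reference):
    """Ratio of a run time to a reference run time, rounded to 2 places."""
    return round(seconds / reference, 2)


fastest = min(runtimes_s.values())
for host, secs in runtimes_s.items():
    # Print each host's run time in hours and relative to the fastest host.
    print(f"{host}: {hours(secs)} h, {slowdown(secs, fastest)}x the fastest")
```

Read this way, the Athlon host needs a little under twice as long as the i3-4160 host for the same kind of workunit, even though both drive a GTX980Ti.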

ExtraTerrestrial Apes
Volunteer moderator
Volunteer tester
Joined: 17 Aug 08
Posts: 2705
Credit: 1,311,122,549
RAC: 0
Level
Met
Message 42609 - Posted: 14 Jan 2016 | 19:54:51 UTC - in response to Message 42605.

Oh damn it, and this happens right after I switched to a non-overclockable Skylake! It would be cool if they'd use SSE2 - AVX2 for those CPU calculations.. but considering not even Einstein@Home, which is known for good optimizations, has switched to AVX1 yet, I don't expect such a move from GPU-Grid. Even recompilations targeting different CPUs require significantly more work & validation on the project side, whereas they're not exactly starved for more crunching power right now.

MrS
____________
Scanning for our furry friends since Jan 2002

Bedrich Hajek
Joined: 28 Mar 09
Posts: 467
Credit: 8,194,346,966
RAC: 10,534,102
Level
Tyr
Message 42611 - Posted: 14 Jan 2016 | 23:10:13 UTC - in response to Message 42605.

The GERARD_A2AR WUs are definitely slower than the other GERARD units.
I would say these need more DP operations (done by the CPU) than others.

On my old windows xp machine, they finish in about 12 hours versus the same 6 hours or less average for the other GERARD units.
The performance of the GTX980Ti in your host is significantly degraded by the lack of CPU power. While your host (Athlon64 X2 Dual Core 5000+ + GTX980Ti + WinXPx86) processes a GERARD_A2AR_luf6806_b in 42,449 sec, my Core2duo E8500 + GTX980 + WinXPx64 processes a GERARD_A2AR_luf6632_b in just 33,928 sec, while my i3-4160 + GTX980Ti + WinXPx64 processes a GERARD_A2AR_luf6632_b in 22,385 sec. This degradation affects other workunits as well, but less significantly. I would suggest that you stop any CPU crunching on this host to make the GPU crunch faster. Since it has 4GB of DDR2 memory, it's probably in dual channel mode already, but it's worth checking with the CPU-Z utility.



I am not doing any CPU crunching on this host and the memory is in dual channel mode. I checked it with the CPU-Z utility.

This is just an old machine that is still crunching.





Retvari Zoltan
Joined: 20 Jan 09
Posts: 2343
Credit: 16,201,255,749
RAC: 6,169
Level
Trp
Message 42612 - Posted: 15 Jan 2016 | 0:00:58 UTC - in response to Message 42609.

Oh damn it, and this happens right after I switched to a non-overclockable Skylake!
Actually your CPU has enough power to drive a GTX 970, but in your host the WDDM overhead has the most impact on GPU performance (since you are using Windows 10).
AMD Athlon 64 X2 5000+ (89W, rev. F3) vs Intel Core i3-6100
https://cpubenchmark.net/compare.php?cmp[]=83&cmp[]=2617

Retvari Zoltan
Joined: 20 Jan 09
Posts: 2343
Credit: 16,201,255,749
RAC: 6,169
Level
Trp
Message 42613 - Posted: 15 Jan 2016 | 0:24:20 UTC - in response to Message 42611.

I am not doing any CPU crunching on this host and the memory is in dual channel mode. I checked it with the CPU-Z utility.

This is just an old machine. That is still crunching.
I didn't think there was that much difference between the AMD Athlon 64 X2 5000+ and the Intel Core 2 Duo E8500, judging by this page. However, the PassMark score seems to be more accurate:
https://cpubenchmark.net/compare.php?cmp[]=83&cmp[]=5
The other difference between our hosts is that your motherboard has only PCIe 1.x (I think), while my DQ45CB has PCIe 2.0.
To achieve optimal performance from the GTX980Ti, the CPU should have an integrated PCIe controller (it doesn't make much difference whether it's 2.0 or 3.0), and the OS should not use WDDM.

nanoprobe
Joined: 26 Feb 12
Posts: 184
Credit: 222,376,233
RAC: 0
Level
Leu
Message 42614 - Posted: 15 Jan 2016 | 0:49:19 UTC - in response to Message 42601.

Thanks, guys!

Running 2 WUs concurrently boosts the GPU utilization. I would say a bit less than 10% (depending on several factors, of course). The catch is that even after fixing my initial driver problem, I still can't hit the full bonus time running 2 of them on my overclocked GTX970. So overall it's counter-productive to do so.

MrS

I did a complete removal (including running Driver Sweeper) and reinstall of the drivers. That raised the GPU usage to 68%. I then started running 2 tasks per card and the usage went to 92-94%. It's been 24 hours and I haven't had any tasks (6) error out so far. The only issue is that every task gets stuck downloading with the same HTTP error. Sometimes it takes an hour to get all the files for a task to run. Very annoying.

Erich56
Joined: 1 Jan 15
Posts: 1090
Credit: 6,603,906,926
RAC: 18,783,925
Level
Tyr
Message 42634 - Posted: 16 Jan 2016 | 12:08:43 UTC

I have had a similar experience with GERARD_A2AR batches in the last few days.

GPU-Z shows somewhat lower GPU usage (between 84 and 87% on my GTX750Ti) than with earlier WUs. Evidence of more CPU usage seems to be that acemd.848-65.exe uses some 365MB of RAM, which is significantly more (at least on my host) than with earlier WUs.

Jim1348
Joined: 28 Jul 12
Posts: 819
Credit: 1,591,285,971
RAC: 0
Level
His
Message 42635 - Posted: 16 Jan 2016 | 16:08:42 UTC
Last modified: 16 Jan 2016 | 16:11:39 UTC

I fired up two GTX 750 Tis just to see what was going on. I am currently running:

GERARD_CXCL12_CCdimer3-0-1-RND1870_0
GERARD_CXCL12_DIM_APO_UNPROTO_adapt2-0-1-RND3277_1

One is using 75% GPU with 325 MB memory, and the other 80% GPU with 417 MB memory. This is on an i7-4770 board (Win7 64-bit) with a CPU core free for each GPU. So while the GPU usage is a little low, I think the variation in the CXCL12s is across the board, though the A2ARs do take more time.

Erich56
Joined: 1 Jan 15
Posts: 1090
Credit: 6,603,906,926
RAC: 18,783,925
Level
Tyr
Message 42637 - Posted: 16 Jan 2016 | 17:41:36 UTC - in response to Message 42635.

... though the A2ARs do take more time.

That's what I forgot to mention in my post above.

fractal
Joined: 16 Aug 08
Posts: 87
Credit: 1,248,879,715
RAC: 0
Level
Met
Message 42639 - Posted: 16 Jan 2016 | 19:02:15 UTC

I am seeing 89% GPU utilization on a GTX 970 running a GERARD_CXCL12_TRIM_HEP_DIM2-0.

boinc@joe:~$ nvidia-smi
Sat Jan 16 10:49:11 2016
+------------------------------------------------------+
| NVIDIA-SMI 355.11     Driver Version: 355.11         |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  GeForce GTX 970     Off  | 0000:01:00.0     Off |                  N/A |
| 27%   71C    P2   145W / 201W |    455MiB /  4094MiB |     89%      Default |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID  Type  Process name                               Usage      |
|=============================================================================|
|    0     31973    C   ...projects/www.gpugrid.net/acemd.848-65.bin   439MiB |
+-----------------------------------------------------------------------------+

I can't easily compare to earlier work since it has all scrolled from the database during the downtime.

This is in a box ( https://www.gpugrid.net/show_host_detail.php?hostid=257647 ) with the slowest, low power processor I can find and it is never leaving its lowest speed, lowest power mode ( Average Processor Power_0(Watt)=7.4779 ). Processor utilization in the underclocked mode is around 20%. So I call the "you need a better CPU talk" FUD.

top - 10:55:47 up 54 days, 19:32,  1 user,  load average: 0.18, 0.24, 0.26
Tasks: 104 total,   1 running, 103 sleeping,   0 stopped,   0 zombie
Cpu(s):  0.0%us,  0.8%sy,  9.1%ni, 89.9%id,  0.0%wa,  0.0%hi,  0.2%si,  0.0%st
Mem:   8152388k total,  1697288k used,  6455100k free,   166632k buffers
Swap: 16678908k total,        0k used, 16678908k free,  1127352k cached

  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
31973 boinc     30  10 28.4g 227m 105m S   21  2.9   2:17.59 acemd.848-65.bi
29645 boinc     20   0 81960 1732  892 S    0  0.0   0:00.17 sshd
29646 boinc     20   0 26312 7628 1772 S    0  0.1   0:00.54 bash
31943 boinc     20   0  124m 7216 3564 S    0  0.1   0:00.77 boinc
32041 boinc     20   0 17336 1436 1096 R    0  0.0   0:00.03 top
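
For anyone who wants to log GPU utilization over time rather than read it off a one-off nvidia-smi table, a minimal Python sketch follows. It assumes an nvidia-smi build that supports the --query-gpu and --format=csv options; the function names and dictionary keys are made up for this example, and fields the driver reports as unsupported are returned as None.

```python
import subprocess

# Fields requested from nvidia-smi, in this order.
QUERY = "utilization.gpu,power.draw,memory.used"


def to_float(field):
    """Return the field as a float, or None for values like '[Not Supported]'."""
    try:
        return float(field)
    except ValueError:
        return None


def parse_gpu_csv(text):
    """Parse 'csv,noheader,nounits' output into one dict per GPU."""
    rows = []
    for line in text.strip().splitlines():
        if not line.strip():
            continue
        util, power, mem = (part.strip() for part in line.split(","))
        rows.append({
            "util_pct": to_float(util),
            "power_w": to_float(power),
            "mem_mib": to_float(mem),
        })
    return rows


def sample_gpus():
    """Run nvidia-smi once and return the parsed readings for all GPUs."""
    result = subprocess.run(
        ["nvidia-smi", f"--query-gpu={QUERY}", "--format=csv,noheader,nounits"],
        capture_output=True, text=True, check=True,
    )
    return parse_gpu_csv(result.stdout)
```

Calling sample_gpus() in a loop with a sleep between samples gives an average utilization per workunit, which is easier to compare between batches than a single reading.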

Retvari Zoltan
Joined: 20 Jan 09
Posts: 2343
Credit: 16,201,255,749
RAC: 6,169
Level
Trp
Message 42641 - Posted: 16 Jan 2016 | 21:25:23 UTC - in response to Message 42639.

I am seeing 89% GPU utilization on a GTX 970 running a GERARD_CXCL12_TRIM_HEP_DIM2-0.
I see 95%~97% GPU usage on Gerard tasks except for the A2AR.

This is in a box ( https://www.gpugrid.net/show_host_detail.php?hostid=257647 ) with the slowest, low power processor I can find and it is never leaving its lowest speed, lowest power mode ( Average Processor Power_0(Watt)=7.4779 ). Processor utilization in the underclocked mode is around 20%.
That's because you don't use the swan_sync environment variable to reserve a full CPU core and maximize the GPU usage (which is recommended for a GTX 980 Ti, especially for workunits like the GERARD_A2AR).

So I call the "you need a better CPU talk" FUD.
Your system is optimized for low power, not for performance. Your CPU has an integrated PCIe controller (just as I suggested), and is 6 years younger than the one the "better CPU" talk refers to.

Logan Carr
Joined: 12 Aug 15
Posts: 240
Credit: 64,069,811
RAC: 0
Level
Thr
Message 42643 - Posted: 16 Jan 2016 | 22:01:09 UTC - in response to Message 42641.

Hello everyone,

I'm a bit confused here as I read this thread.

So, in order to get my GPU usage up to 100%, do I just have this project use my CPU as well (I don't now), or do I just need a better PC?

My PC is a business machine which I upgraded for gaming, but I no longer game, so now I just use it to donate.

If any more info is needed, please tell me. Thanks.

fractal
Joined: 16 Aug 08
Posts: 87
Credit: 1,248,879,715
RAC: 0
Level
Met
Message 42644 - Posted: 16 Jan 2016 | 22:53:31 UTC - in response to Message 42641.

That's because you don't use the swan_sync environmental variable to reserve a full CPU core, to maximize the GPU usage. (which is recommended for a GTX 980 Ti, especially for workunits like the GERARD_A2AR)

I tried swan_sync a few years ago and it made no difference. I just tried it again, setting both swan_sync and SWAN_SYNC since there is some confusion about which to use, and it made no difference in either CPU usage or GPU usage.

Your system is optimized for low power, not for performance. Your CPU has integrated PCIe controller (just as I suggested), and 6 years younger than the one which the "better CPU talk" regards.

I seem to have skipped over your discussion of older processors. The OP's processor came out Q3'15. The Celeron I used came out Q4'11. His is a bit newer than mine.

I built this series of systems with the goal of optimizing GPU performance. Reduced power consumption is a freebie. Every test build that ran even a single CPU task, no matter how many cores were available, resulted in reduced GPU performance. The Celeron with 0 CPU projects outperformed an i7 with one core reserved in every test. Reserving a core improved GPU performance. Reserving the entire processor improved it a little more.

I could just as easily have used an i7 as the Celeron. The result would likely be the same. The processor is sleeping most of the time. A more modern processor has even more sleep states, so it would likely use even less power idling. But why idle a 300 dollar processor when I can do the same job idling a 30 dollar processor? There may be some value in debating whether limiting BOINC to 75% to reserve 2 threads is the same as reserving a core on an i7, or whether it is necessary to disable hyperthreading to guarantee an idle core. Likewise, there may be value in debating whether a more modern processor with a faster clock speed will do a better job when it actually runs than the old Celeron. But I believe that gets pretty far down into the weeds.

I would get out and push if I thought it would make it go any faster, so no, I did not optimize for power. I optimized for performance. And this GTX970 did better all alone on the Celeron than it did fed by an i7 that had 75% of the CPU assigned to CPU tasks.

The point I tried to make, and apparently failed, is that the processor does not appear to be offloading any floating point work from the GPU as was suggested, and thus a faster processor is unlikely to boost GPU utilization. At least, even if it is, any offload is easily performed by a 1.6 GHz Celeron in its sleep.

Retvari Zoltan
Joined: 20 Jan 09
Posts: 2343
Credit: 16,201,255,749
RAC: 6,169
Level
Trp
Message 42649 - Posted: 17 Jan 2016 | 10:24:17 UTC - in response to Message 42644.

I tried swan_sync a few years ago and it made no difference. I just tried it again, setting both swan_sync and SWAN_SYNC since there is some confusion about which to use, and it made no difference in either CPU usage or GPU usage.
When SWAN_SYNC is in effect, the CPU time is close to the run time. If the CPU usage of the ACEMD app is lower than a full core (or thread), then SWAN_SYNC is being ignored for some environment-related reason.
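
That rule of thumb can be written down as a quick check on a finished task's reported CPU time and run time. The sketch below is only an illustration: the function name is made up, and the 0.9 threshold is an assumed cut-off for "close to the run time", not a value the project publishes.

```python
def swan_sync_looks_active(cpu_time_s, run_time_s, threshold=0.9):
    """Return True if CPU time is close to run time (ratio >= threshold).

    threshold is an assumed cut-off chosen for this sketch.
    """
    if run_time_s <= 0:
        raise ValueError("run_time_s must be positive")
    return cpu_time_s / run_time_s >= threshold


# Illustrative numbers only: a task that kept a core busy for nearly the
# whole run, and one that used roughly a fifth of a core.
print(swan_sync_looks_active(cpu_time_s=22000, run_time_s=22385))
print(swan_sync_looks_active(cpu_time_s=4700, run_time_s=22385))
```

A task whose CPU time is only a fraction of its run time, like the roughly 20% CPU load reported earlier in the thread, would fail this check.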

I built this series of systems with the goal of optimizing GPU performance. Reduced power consumption is a freebe. Every test build that ran even a single CPU work no matter how many cores were available resulted in reduced GPU performance. The celeron with 0 CPU projects outperformed an i7 with one core reserved in every test. Reserving a core improved GPU performance. Reserving the entire processor improved it a little more.
That aligns with my experience. Jeremy Zimmerman did extensive tests on the number of threads vs performance almost 2 years ago, and published his results. (You have to open the link two times to get directly to his post.)

I could just as easily have used an i7 as the celeron. The result would likely be the same. The processor is sleeping most of the time. A more modern processor has even more sleep states so would likely use even less power idling. But, why idle a 300 dollar processor when I can do the same job idling a 30 dollar processor. There may be some value in debating whether limiting boinc to 75% to reserve 2 threads is the same as reserving a core on an i7, or whether it is necessary to disable hyperthreading to guarantee an idle core.
I agree.

Likewise there may be value in debating whether a more modern processor with a faster clock speed will do a better job when it actually runs than the old celeron. But I believe that gets pretty far down into the weeds.
If both have integrated PCIe controllers, the difference will be minimal (at least it won't be worth the higher price of the better CPU). But this difference varies between workunit batches (so it would be higher for 'GERARD_A2AR's).

I would get out and push if I thought it would make it go any faster, so no, I did not optimize for power. I optimized for performance.
Now I understand your performance optimization, but there's one more thing you can do to push your GPU harder: a working SWAN_SYNC environment variable. There's some advice on the forum on how to do it on Linux. Jeremy Zimmerman tested the effect of SWAN_SYNC, and published his results.

And this GTX970 did better all alone on the celeron than it did fed by an i7 that had 75% of the cpu assigned to CPU tasks.
The faster the GPU, the more advisable it is not to run any CPU tasks on that host, and to choose a cheaper CPU with higher clocks (and fewer cores/threads). I've built my latest PC on these principles, and it can actually process any workunit faster than any other host on this project. There's only one way to build a faster host: to use a GTX TITAN X, but I consider that GPU not worth buying because of its price/performance ratio.

The point I tried to make, and apparently failed, is that the processor does not appear to be offloading any floating point work from the GPU as was suggested and thus a faster processor is unlikely to boost GPU utilization. At least, even if it is, any offload is easily performed by a 1.6 GHz celeron in its sleep.
If the CPU is fairly state-of-the-art (i.e. it has integrated PCIe), there won't be much difference. If it's not, the difference could be as much as a factor of 2.

nanoprobe
Joined: 26 Feb 12
Posts: 184
Credit: 222,376,233
RAC: 0
Level
Leu
Message 42651 - Posted: 17 Jan 2016 | 14:30:00 UTC

Has anyone tested the SWAN_SYNC variable versus using an app_config file in the project folder to assign 1 CPU core per task?

fractal
Joined: 16 Aug 08
Posts: 87
Credit: 1,248,879,715
RAC: 0
Level
Met
Message 42653 - Posted: 17 Jan 2016 | 18:08:30 UTC - in response to Message 42649.

I tried swan_sync a few years ago and it made no difference. I just tried it again, setting both swan_sync and SWAN_SYNC since there is some confusion about which to use, and it made no difference in either CPU usage or GPU usage.
When the SWAN_SYNC is in effect, the CPU time is near to the run time. If the CPU usage of the ACEMD app is lower than a full core (or thread), then the SWAN_SYNC is ignored for some environmental reasons.

I am pretty sure SWAN_SYNC was getting to ACEMD. It was in the environment of the acemd process when I looked in /proc/1879/environ, where 1879 was the PID of acemd.
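
The same check can be scripted on Linux, where /proc/<pid>/environ holds the process environment as NUL-separated KEY=value entries. The following is a minimal sketch; the function names are made up for this example, and reading another user's process usually requires running as that user or as root.

```python
def parse_environ(raw):
    """Split the NUL-separated bytes of /proc/<pid>/environ into a dict."""
    env = {}
    for entry in raw.split(b"\0"):
        if not entry:
            continue
        key, _, value = entry.partition(b"=")
        env[key.decode(errors="replace")] = value.decode(errors="replace")
    return env


def read_process_env(pid):
    """Read and parse the environment of a running process (Linux only)."""
    with open(f"/proc/{pid}/environ", "rb") as handle:
        return parse_environ(handle.read())


def find_swan_sync(env):
    """Return any SWAN_SYNC entries, matching the name case-insensitively."""
    return {k: v for k, v in env.items() if k.upper() == "SWAN_SYNC"}
```

For example, find_swan_sync(read_process_env(1879)) would list both spellings if both were set. It only shows that the variable reached the process, not whether the application acts on it.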

My Google-fu is failing me, but my memory, which is probably worse than my Google-fu, recalls that Linux ignores SWAN_SYNC, and that Linux with or without the ignored SWAN_SYNC was as fast as Windows XP with SWAN_SYNC, which was faster than Windows XP without SWAN_SYNC, which was faster than Windows <anything newer>.

But, don't trust my memory ... I don't.

Re: nanoprobe - see the links in Retvari's post.

Richard Haselgrove
Joined: 11 Jul 09
Posts: 1576
Credit: 5,605,311,851
RAC: 8,715,209
Level
Tyr
Message 42675 - Posted: 22 Jan 2016 | 13:24:05 UTC - in response to Message 42651.

Has anyone tested the SWAN_SYNC variable versus using an app_config file in the project folder to assign 1 CPU core per task?

Two completely different purposes.

SWAN_SYNC would make the ACEMD application use extra CPU cycles, whether or not a core was free - it would simply overcommit the CPU and cause a lot of thrashing if the CPU was filled with other tasks.

app_config (or simply reducing the number of cores BOINC is allowed to schedule) would make space available on the CPU, but do nothing at all to encourage ACEMD to use it.

If you normally run your CPUs full to the brim, you should probably use both.
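
To make the app_config side concrete, here is a small Python sketch that writes out an app_config.xml in the layout BOINC documents (app, name, gpu_versions, gpu_usage, cpu_usage). The application names "acemdlong" and "acemdshort" are assumptions and should be checked against the names your own client reports. The file only tells BOINC how to budget CPU and GPU; SWAN_SYNC is a separate environment variable that has to be set on its own.

```python
def build_app_config(app_names, gpu_usage=1.0, cpu_usage=1.0):
    """Return the text of an app_config.xml for the given application names.

    gpu_usage=0.5 lets BOINC schedule two tasks per GPU; cpu_usage=1.0
    budgets one full CPU core for each GPU task.
    """
    lines = ["<app_config>"]
    for name in app_names:
        lines += [
            "  <app>",
            f"    <name>{name}</name>",
            "    <gpu_versions>",
            f"      <gpu_usage>{gpu_usage}</gpu_usage>",
            f"      <cpu_usage>{cpu_usage}</cpu_usage>",
            "    </gpu_versions>",
            "  </app>",
        ]
    lines.append("</app_config>")
    return "\n".join(lines) + "\n"


# Assumed application names; replace them with the ones your client shows.
print(build_app_config(["acemdlong", "acemdshort"], gpu_usage=1.0, cpu_usage=1.0))
```

The output would be saved as app_config.xml in the project's folder under the BOINC data directory and picked up after re-reading the config files or restarting the client.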
