Pascal Settings and Performance


Author	Message
Retvari Zoltan Joined: 20 Jan 09 Posts: 2343 Credit: 16,201,255,749 RAC: 386	Message 45114 - Posted: 2 Nov 2016 | 13:44:45 UTC
	JoergF wrote: "Strange. My 1070 does not get better than 90% and the 1080 is even worse, maybe 75%. No matter how many tasks and the CPU/GPU ratio. As if the algorithm is not yet Pascal optimized. But this gets off topic a little... maybe I should bring that question forward somewhere else."
	nanoprobe wrote: "FWIW my 1060 runs @ 95% with 2 tasks at a time at stock settings. What's truly amazing is that according to my UPS it's only pulling 45 watts."
	Nanoprobe, you are using your card under Linux, which doesn't have WDDM, while JoergF is using his card under Windows 7, which does.

eXaPower Joined: 25 Sep 13 Posts: 293 Credit: 1,897,601,978 RAC: 0	Message 45115 - Posted: 2 Nov 2016 | 14:34:35 UTC
	Running one task at a time (Win8.1), ADRIA_1JWP_dist has higher GPU usage on the GTX 1070 and GTX 1060 (3GB) than SDOERR CASP (a3d) or GERALD CXCL12.
	ADRIA_1JWP_dist:
	-- GTX 1070 (PCIe 3.0 x8) @ 2.1GHz: 78% GPU / 32% MCU / 52% BUS / 109W power
	-- GTX 1060 (PCIe 3.0 x4) @ 2.1GHz: 82% GPU / 32% MCU / 63% BUS / 96W power
	The GTX 1070's completed ADRIA_1JWP_dist runtime will be ~30% faster (8hr vs. 11.6hr). On non-ADRIA WUs the GTX 1060 is closer to 20~25% slower than the GTX 1070.
	Pascal's power consumption really shines, even when the card is overclocked. My 970s are starting to feel old, as their performance per watt struggles to keep up with Pascal. Pascal already has a very high out-of-the-box boost, which largely negates overclocking. Even at 2.1GHz the Pascal core still scales for a slight performance gain with PCIe 3.0 x4. For those who fine tune their cards to locate the highest stable operating point, Pascal is the easiest and least fun to manage.

skgiven Volunteer moderator Volunteer tester Joined: 23 Apr 09 Posts: 3968 Credit: 1,995,359,260 RAC: 0	Message 45116 - Posted: 2 Nov 2016 | 15:16:13 UTC - in response to Message 45115. Last modified: 2 Nov 2016 | 18:28:06 UTC
	Thanks,
	The proof-of-concept CASP runs have fewer atoms, so GPU utilization is generally lower. It's probably the case that the bigger the GPU, the lower the utilization for such work units. This might even be exacerbated with the Pascals compared to Maxwell GPUs (or not)...
	Even within the CASP runs there are different types of sub-run:
	e13s6_e9s4p0f5-SDOERR_CASP22SnoS_crystal_ss_5ns_ntl9_0-0-1-RND7413_0 : 1,337.92 - runtime
	e12s1_e11s5p0f27-SDOERR_CASP22SnoS_crystal_ss_contacts_5ns_a3D_1-0-1-RND6418_0 : 2,152.56 - runtime
	The runs that include atomic contacts take longer because they involve calculating contact interactions too. In theory these should utilize the GPU more (but I haven't looked to see if that's the case or not). The CASP runs have rather low PCIe bandwidth utilization (for all GPUs).
	For reference I'm seeing ~81% GPU utilization on a GTX970 on Linux (Ubuntu x64 16.04 LTS) crunching a CASP22SnoS_crystal_ss_contacts task. PCIE bandwidth utilization is ~17% (PCIE2 x16). It's a low-spec system but I'm not using the CPU for anything else. When I do, the GPU's performance drops off. When I run a certain mt CPU app on another system (W10) the GPU utilization drops to ~15%!
	- edit - I see I'm getting 88% GPU utilization on a non-contacts CASP22SnoS_crystal_ss task. PCIE bandwidth ~25%. Temps are higher too.
	____________
	FAQ's HOW TO: - Opt out of Beta Tests - Ask for Help

eXaPower Joined: 25 Sep 13 Posts: 293 Credit: 1,897,601,978 RAC: 0	Message 45118 - Posted: 2 Nov 2016 | 17:24:00 UTC - in response to Message 45116.
	skgiven wrote: "Thanks,"
	You're welcome!
	skgiven wrote: "The proof of concept CASP runs have fewer atoms, so GPU utilization is generally lower. It's probably the case that the bigger the GPU the lower the utilization for such work units. This might even be exacerbated with the Pascals compared to Maxwell GPU's (or not)... [..] The runs that include atomic contacts take longer because they involve calculating contact interactions too. In theory these should utilize the GPU more (but I haven't looked to see if that's the case or not)."
	What I reported in the CASP thread: CASP runtimes (atom and step amount) vary, so this is just a general reference. The figure before "/sec" is the runtime in seconds.
	GTX 1070 (PCIe 3.0 x8) CASP runtimes:
	-- 1ns (ntl9) = 600 credits = 240/sec @ 2.1GHz / 59% GPU / 15% MCU / 37% BUS / 78W power
	-- 1ns (a3d) = 1,350 credits = 330/sec @ 2.1GHz / 70% GPU / 24% MCU / 39% BUS / 96W power
	-- 5ns (ntl9) = 3,150 credits = 1,200/sec @ same usage and power numbers as 1ns
	-- 5ns (a3d) = 6,900 credits = 1,600/sec @ same usage and power numbers as 1ns
	GTX 1060 (3GB) PCIe 3.0 x4 CASP runtimes:
	-- 1ns (ntl9) = 600 credits = 300/sec @ 2.1GHz / 63% GPU / 17% MCU / 51% BUS / 74W power
	-- 1ns (a3d) = 1,350 credits = 450/sec @ 2.1GHz / 74% GPU / 24% MCU / 59% BUS / 88W power
	-- 5ns (ntl9) = 3,150 credits = 1,500/sec @ same GPU usage and power as 1ns
	-- 5ns (a3d) = 6,900 credits = 2,275/sec @ same GPU usage and power as 1ns
	IMO: a (1152CUDA GTX 1060) is on par with a (2048CUDA GTX 980) and ~20% faster than a (1664CUDA GTX 970). The (1920CUDA GTX 1070) is as fast as, if not ~5% faster than, a (2816CUDA GTX 980ti).
	CASP WUs on (2) GTX 970 at 1.5GHz are 2.5~2.7x slower with PCIe 2.0 x1 compared to PCIe 3.0 x4. WUs require PCIe 3.0 x8 for proper OC scaling.
	-- 1ns: 900/sec vs. 350/sec
	-- 5ns: 4770/sec vs. 1761/sec
	-- PCIe 2.0 x1: 46% GPU / 7% MCU / 80% BUS usage / 75W GPU power
	-- PCIe 3.0 x4: 57% GPU / 12% MCU / 40% BUS / 107W
	I haven't noticed any difference in GPU usage with or without contacts, though CPU usage is 5% higher, at ~15% per WU, when the contacts are included.
	skgiven wrote: "For reference I'm seeing ~81% GPU utilization on a GTX970 on Linux (Ubuntu x64 16.04 LTS) crunching a CASP22SnoS_crystal_ss_contacts task. PCIE Bandwidth Utilization is ~17% (PCIE2 x16). It's a low-spec system but I'm not using the CPU for anything else. When I do the GPU's performance drops off. When I run a certain mt CPU app on another system (W10) the GPU utilization drops to ~15%"
	I have >10% CPU usage on each Pascal WU, mostly around 25% on average (4C/4T Haswell S series) crunching (2) WUs. When shooting for the most efficient runtimes it helps to have a CPU clock speed above 3GHz (preferably >3.5GHz). Every GTX 1070 host faster than mine has an overclocked 'K' series CPU, even though my Pascals are at 2.1GHz (1.5GHz on Maxwell).
	WDDM performance degradation versus Linux or XP is similar to PCIe width affecting runtimes. PCIe3.0 x4 runtimes can be ~10% slower if PCIe 3.0 x8 is not an option. (Maybe the AMD Zen platform will have more than the 16/28/40 CPU PCIe3.0 lanes Intel currently offers.)
	PCIe 2.0 has an overhead of 20% (8b/10b line-code encoding) while PCIe3.0 uses 128b/130b. In reality PCIe2.0 has a maximum available bandwidth of 80%. PCIe3.0 provides 98.4% available bandwidth.
	Intel's DMI link on Haswell and Ivy Bridge, with (4) bi-directional lanes at 1GB/s per lane, is supposed to be faster than AMD's 500MB/s? per lane. Skylake doubled the bandwidth with (4) lanes at 2GB/s per lane. Maybe during MT CPU compute the DMI link became a bottleneck, causing the dramatic GPU utilization loss? Or PCIe was flooded out completely.
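The line-code figures above can be turned into per-link bandwidth with a few lines of Python. This is only a rough sketch: the 5 GT/s and 8 GT/s per-lane signalling rates are the standard PCIe 2.0 and 3.0 values rather than numbers from this thread, the function names are made up for illustration, and packet/protocol overhead is ignored, so real-world throughput will be lower.

```python
# Per-lane signalling rate (GT/s) and line-code payload/total bits.
# The GT/s values are standard PCIe figures, not measurements from this thread.
PCIE_GENERATIONS = {
    "2.0": {"gt_per_s": 5.0, "payload_bits": 8, "total_bits": 10},
    "3.0": {"gt_per_s": 8.0, "payload_bits": 128, "total_bits": 130},
}


def encoding_efficiency(gen: str) -> float:
    """Fraction of transmitted bits that carry payload (0.8 for 8b/10b)."""
    g = PCIE_GENERATIONS[gen]
    return g["payload_bits"] / g["total_bits"]


def link_bandwidth_gb_per_s(gen: str, lanes: int) -> float:
    """One-direction bandwidth in GB/s after line-code overhead only."""
    g = PCIE_GENERATIONS[gen]
    bits_per_s = g["gt_per_s"] * 1e9 * encoding_efficiency(gen) * lanes
    return bits_per_s / 8 / 1e9


if __name__ == "__main__":
    for gen, lanes in [("2.0", 1), ("2.0", 16), ("3.0", 4), ("3.0", 8), ("3.0", 16)]:
        print(f"PCIe {gen} x{lanes}: "
              f"{link_bandwidth_gb_per_s(gen, lanes):.2f} GB/s "
              f"({encoding_efficiency(gen):.1%} of raw rate)")
```

Under these assumptions PCIe 2.0 x16 and PCIe 3.0 x8 come out within about 2% of each other, while PCIe 2.0 x1 is roughly an eighth of PCIe 3.0 x4.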

Retvari Zoltan Joined: 20 Jan 09 Posts: 2343 Credit: 16,201,255,749 RAC: 386	Message 45121 - Posted: 2 Nov 2016 | 19:37:40 UTC
	Since the 9.14 app is CUDA 8.0, and there are a couple of CUDA 8.0 drivers for Windows XP, I installed my GTX 1080 under Windows XP x64 with the latest XP driver available for the GTX 960 (368.81), but the 9.14 app did not work with this setup. It said:
	Task blablabla exited with zero status but no 'finished' file
	If this happens repeatedly you may need to reset the project.
	But the task did not run into an error, so these two lines repeated indefinitely until I suspended the task. When I booted this host into Windows 10 the task resumed normally. So now I really have to learn to install a Linux based BOINC host.

Retvari Zoltan Joined: 20 Jan 09 Posts: 2343 Credit: 16,201,255,749 RAC: 386	Message 45128 - Posted: 2 Nov 2016 | 23:44:48 UTC Last modified: 2 Nov 2016 | 23:45:47 UTC
	Is there anyone using a GTX 1080 or TITAN X (Pascal) under Linux with swan_sync on? Does the new (9.14) Linux app support swan_sync?

3de64piB5uZAS6SUNt1GFDU9d... Joined: 20 Apr 15 Posts: 285 Credit: 1,102,216,607 RAC: 0	Message 45138 - Posted: 3 Nov 2016 | 15:08:22 UTC - in response to Message 45118. Last modified: 3 Nov 2016 | 15:09:06 UTC
	eXaPower wrote: "WDDM performance degradation versus Linux or XP is similar to PCIe width affecting runtimes. PCIe3.0 x4 runtimes can be ~10% slower if PCIe 3.0 x8 not an option. (Maybe the AMD Zen platform will have more than the 16/28/40 CPU PCIe3.0 lanes Intel currently offers.)"
	I admit that I have not looked into the matter yet. Is there any way to bypass the WDDM degradation? It is somewhat frustrating to see a 1080 performing worse than a 1070 or 980ti just because of low utilization. Actually I don't get more than 75% load on my 1080.
	____________
	I would love to see HCF1 protein folding and interaction simulations to help my little boy... someday.

eXaPower Joined: 25 Sep 13 Posts: 293 Credit: 1,897,601,978 RAC: 0	Message 45139 - Posted: 3 Nov 2016 | 16:01:12 UTC - in response to Message 45138.
	3de64piB5uZAS6SUNt1GFDU9d... wrote: "I admit that I did not work into the matter yet. Is there any way to bypass the WDDM degradation? It is somewhat frustrating to see a 1080 performing worse than a 1070 or 980ti just because of low utilization. Actually I don't get more than 75% load on my 1080."
	Yes, by moving to Linux. There are a few remedies on WDDM OSes that help gain GPU utilization:
	-- Enable SWAN_SYNC (I don't use this).
	-- Have a CPU above 3.5GHz with a GPU + CPU PCIe3.0 x16 connection. (Single-GPU set-ups seem to be faster than systems running 2 or 3 GPUs with the same CPU and GPU model.)
	-- Compute 2 tasks at a time, with each task taking 30 to 50% longer than when run one at a time.
	PCIe3.0 x8 is the bare minimum for overclock scaling. A GTX 1060 or above on PCIe3.0 x4 will encounter a 4~12% performance drop-off from x8, depending on the type of WU.
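Why running two tasks at once can pay off even though each task takes longer is easy to check with a small calculation. The sketch below assumes only the 30 to 50% slowdown figures quoted above; the function name is made up for illustration.

```python
def throughput_gain(concurrent_tasks: int, runtime_factor: float) -> float:
    """Tasks completed per unit time, relative to running one task at a time.

    runtime_factor is how much longer each task takes when sharing the GPU
    (1.3 means 30% longer).
    """
    if concurrent_tasks < 1 or runtime_factor <= 0:
        raise ValueError("need at least one task and a positive runtime factor")
    return concurrent_tasks / runtime_factor


if __name__ == "__main__":
    for factor in (1.3, 1.5):
        gain = throughput_gain(2, factor)
        print(f"2 tasks, each {factor:.1f}x as long: {gain:.2f}x the throughput")
```

So under those figures two concurrent tasks would give roughly a third to a half more completed work per day, provided each task still meets its deadline.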

3de64piB5uZAS6SUNt1GFDU9d... Joined: 20 Apr 15 Posts: 285 Credit: 1,102,216,607 RAC: 0	Message 45154 - Posted: 4 Nov 2016 | 8:08:37 UTC Last modified: 4 Nov 2016 | 8:30:22 UTC
	Thanks... to my mind the config cannot be the bottleneck. I run 2 tasks with 1 virtual CPU core per task [1CPU/0.5GPU] on an i7-3770S, which should be fast enough. But now that you mention it, I still have an old 1155 board in my primary PC that is PCIe2.0 only, and therefore the GTX 1080 is linked by PCIe2.0 x16, which is equal to PCIe3.0 x8 in terms of throughput. Can this be the reason? My other system is less affected, maybe because the board is a new 1151 and the GTX 1070 therefore uses PCIe3.0 x16. The CPU is also a little faster (6700K) but I guess that has nothing to do with it.
	Besides, it would be interesting to know how the task algorithms work. If large data blocks are transferred, the bandwidth doesn't really matter (provided that the work time of the GPU per subtask is long enough). You would just see occasional dips in the GPU utilization. But if the CPU/memory and GPU are constantly exchanging small slices of data, that could explain why limited bus bandwidth slows down overall performance. If so, maybe something can be done on the code side to keep Pascal (and the upcoming Volta) GPUs busy.
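The reasoning above can be put into a toy model. This is purely a sketch and says nothing about how the actual application works: it assumes the GPU sits idle during every transfer (no overlap of copy and compute), and all step times and transfer sizes in the example are hypothetical numbers chosen for illustration.

```python
def gpu_busy_fraction(compute_s: float,
                      bytes_per_step: float,
                      bandwidth_bytes_per_s: float,
                      latency_s: float = 0.0) -> float:
    """Share of wall time the GPU spends computing in one step.

    Assumes each step is: transfer data (GPU idle), then compute.
    """
    transfer_s = latency_s + bytes_per_step / bandwidth_bytes_per_s
    return compute_s / (compute_s + transfer_s)


if __name__ == "__main__":
    # Hypothetical workloads: (label, compute seconds per step, bytes per step)
    workloads = [
        ("long steps, big blocks", 0.1, 8e6),
        ("short steps, small slices", 1e-3, 1e6),
    ]
    for label, compute_s, nbytes in workloads:
        for bandwidth in (8e9, 4e9):  # roughly a full-width and a half-width link
            busy = gpu_busy_fraction(compute_s, nbytes, bandwidth)
            print(f"{label}: {bandwidth / 1e9:.0f} GB/s -> GPU busy {busy:.1%}")
```

In this model halving the bandwidth barely moves utilization when each step computes for a long time, but costs several percentage points when steps are short, which is the pattern described above.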

skgiven Volunteer moderator Volunteer tester Joined: 23 Apr 09 Posts: 3968 Credit: 1,995,359,260 RAC: 0	Message 45156 - Posted: 4 Nov 2016 | 9:44:18 UTC - in response to Message 45154.
	3de64piB5uZAS6SUNt1GFDU9d... wrote: "Thanks... to my mind the config cannot be the bottleneck. I run 2 Tasks with 1 virtual CPU core per task [1CPU/0.5GPU] and utilize a i7-3770S which should be fast enough. But now that you mention it, I still have an old 1155 board in my primary PC that is PCIe2.0 only and therefore the GTX 1080 is linked by PCIe2.0x16 which is equal to PCIe3.0x8 in terms of throughput. Can this be the reason?"
	The i7-3770S is PCIE3 x16 capable, but I guess there could be some LGA 1155 motherboards that are PCIE2 only. Which motherboard model do you have? If it is PCIE2, that could be the issue or one of the main issues. You could probably get a replacement PCIE3-capable motherboard if that's the case.
	If you crunch using your integrated HD Graphics 4000 GPU, that would impact the GTX1080's performance, as would crunching lots of CPU projects. Basically, for optimal performance for GPUGrid (especially for such a high-end GPU) you want to be crunching for as few CPU projects as possible. MT apps are a no-no, and running apps in a VM can bog the system down.
	CPU speed and RAM speed also have an impact, but while there are faster processors, that CPU isn't bad and it does have a PCIE3 controller on board (it probably just isn't using it). HT off and SWAN_SYNC might help a little too.

3de64piB5uZAS6SUNt1GFDU9d... Joined: 20 Apr 15 Posts: 285 Credit: 1,102,216,607 RAC: 0	Message 45158 - Posted: 4 Nov 2016 | 10:13:53 UTC
	Thanks... the mainboard is an ASUS P8P67-M (socket 1155) and definitely PCIe2.0 only. Yes, the CPU does support 3.0 but the board doesn't, so it could be part of the issue, as you wrote. No, I don't use the iGPU or a VM in the background. So I guess I should simply upgrade my system components. Frankly, I am waiting for AMD Zen or Intel Cannonlake in order to get a noticeable speed increase. Upgrading from Ivy Bridge to Kaby Lake now does not make much sense to me, maybe aside from DDR4 memory, which could be some advantage.

skgiven Volunteer moderator Volunteer tester Joined: 23 Apr 09 Posts: 3968 Credit: 1,995,359,260 RAC: 0	Message 45164 - Posted: 4 Nov 2016 | 11:47:53 UTC - in response to Message 45158.
	If I were you I would swap the 1070 with the 1080, or try to pick up a second-hand PCIE3 LGA 1155 motherboard for now. Zen will facilitate more PCIE3 lanes but won't be out for a while yet. A good time to upgrade will be when the GTX1080Ti and Zen arrive and are available in sufficient quantities, with enough competition for prices to be reasonable. That might be in 6 to 9 months' time, but possibly more depending on the competition.
	I noticed all the 1080s and 1070s are reporting 4GB of graphics memory. IIRC it's not an issue.

3de64piB5uZAS6SUNt1GFDU9d... Joined: 20 Apr 15 Posts: 285 Credit: 1,102,216,607 RAC: 0	Message 45165 - Posted: 4 Nov 2016 | 12:06:32 UTC - in response to Message 45164.
	skgiven wrote: "If I were you I would swap the 1070 with the 1080 or try to pick up a second hand PCIE3 1155 motherboard for now."
	I have already considered that, but sometimes changing the motherboard also means different SATA controllers and drivers, and then Windows will no longer boot. So it is surely possible, but not that easy.
	skgiven wrote: "Noticed all the 1080's and 1070's are reporting 4GB graphics memory. IIRC it's not an issue."
	Yes, I have noticed that as well. Does this affect the performance in any way?

skgiven Volunteer moderator Volunteer tester Joined: 23 Apr 09 Posts: 3968 Credit: 1,995,359,260 RAC: 0	Message 45167 - Posted: 4 Nov 2016 | 13:04:14 UTC - in response to Message 45165.
	skgiven wrote: "Noticed all the 1080's and 1070's are reporting 4GB graphics memory. IIRC it's not an issue."
	3de64piB5uZAS6SUNt1GFDU9d... wrote: "yes, I have noticed that as well. Does this affect the performance in any way?"
	I think it's Boinc that reports this, whereas the app reads the details directly from the hardware/system itself. So it wouldn't impact performance. Most tasks tend to use less than 1GB of GDDR and the most I can recall is around 1.7GB.

3de64piB5uZAS6SUNt1GFDU9d... Joined: 20 Apr 15 Posts: 285 Credit: 1,102,216,607 RAC: 0	Message 45168 - Posted: 4 Nov 2016 | 13:29:28 UTC - in response to Message 45164. Last modified: 4 Nov 2016 | 13:30:57 UTC
	skgiven wrote: "A good upgrade time will be when the GTX1080Ti and Zen arrives and are available in sufficient quantities and with competition for prices to be reasonable. That might be in 6 to 9 months time, but possibly more depending on the competition."
	Yes... and not to forget the AMD RX490. If this one performs well, which I hope, it will have a positive influence on Nvidia pricing in general. Which means: prices DOWN ;-)

skgiven Volunteer moderator Volunteer tester Joined: 23 Apr 09 Posts: 3968 Credit: 1,995,359,260 RAC: 0	Message 45169 - Posted: 4 Nov 2016 | 13:29:57 UTC - in response to Message 45167.
	Pulled my GTX970 from my Ubuntu x64 16.04 LTS system and replaced it with a GTX1060-3GB. Comparing two tasks, one which ran on the 970 and one which ran on the 1060-3GB, I can say that on my setup the 1060-3GB is ~3% faster than the 970 and uses ~9% more CPU. Obviously this is a comparison of only two tasks, but they are similar task types and give the same amount of credit (columns: task : run time in seconds, CPU time in seconds, credit, app version):
	e47s4_e35s2p0f0-SDOERR_CASP10_crystal_ss_5ns_ntl9_1-0-1-RND0325_1 : 1,454.48 804.24 3,150.00 v9.14 (cuda80)
	e25s5_e20s7p0f45-SDOERR_CASP22S_crystal_ss_5ns_ntl9_2-0-1-RND0908_0 : 1,498.21 736.19 3,150.00 v8.48 (cuda65)
	When watching the runs, CPU usage, GPU usage and PCIE usage all looked about the same for each GPU. I expect the Pascal uses less power to do the same work, but I haven't measured the power draw just yet...
	I think a few other people have reported increased CPU usage on the Pascals. If CPU usage is increased with the CUDA 8.0 app/Pascal cards, it's probably going to be more noticeable with larger GPUs. Certainly something to take account of when building a system for GPU crunching.
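The ~3% and ~9% figures can be reproduced from the two task lines. The sketch below assumes the v9.14 (cuda80) task is the one that ran on the GTX1060-3GB and the v8.48 (cuda65) task ran on the GTX970, and that the first two numbers on each line are run time and CPU time in seconds; the helper names are made up for illustration.

```python
def percent_faster(new_runtime_s: float, old_runtime_s: float) -> float:
    """Throughput gain of the new card: how much more work per unit time."""
    return (old_runtime_s / new_runtime_s - 1.0) * 100.0


def percent_change(new_value: float, old_value: float) -> float:
    """Relative change of new_value against old_value, in percent."""
    return (new_value - old_value) / old_value * 100.0


# Figures copied from the two 5ns task lines above.
gtx1060 = {"runtime_s": 1454.48, "cpu_s": 804.24}  # assumed: v9.14 (cuda80)
gtx970 = {"runtime_s": 1498.21, "cpu_s": 736.19}   # assumed: v8.48 (cuda65)

if __name__ == "__main__":
    speed = percent_faster(gtx1060["runtime_s"], gtx970["runtime_s"])
    cpu = percent_change(gtx1060["cpu_s"], gtx970["cpu_s"])
    print(f"1060-3GB vs 970: {speed:.1f}% faster, {cpu:.1f}% more CPU time")
```

With only one task per card this is a spot check rather than a benchmark, as the post itself points out.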

3de64piB5uZAS6SUNt1GFDU9d... Joined: 20 Apr 15 Posts: 285 Credit: 1,102,216,607 RAC: 0	Message 45170 - Posted: 4 Nov 2016 | 13:37:57 UTC - in response to Message 45169.
	skgiven wrote: "Comparing two tasks; one which ran on the 970 and the second which ran on the 1060-3GB I can say on my setup that the 1060-3GB is ~ 3% faster than the 970 and uses ~9% more CPU."
	Great... which means the comparison by SP GFLOPS from the specifications works for this kind of job, more or less. What is the average GPU usage of the 1060?

skgiven Volunteer moderator Volunteer tester Joined: 23 Apr 09 Posts: 3968 Credit: 1,995,359,260 RAC: 0	Message 45175 - Posted: 4 Nov 2016 | 15:12:50 UTC - in response to Message 45170. Last modified: 4 Nov 2016 | 15:39:55 UTC
	The GPU usage is ~78%, which is similar, but the utilization is spiky and might vary during the run. When I had the 970 in, I used NVidia X Server Settings to observe the GPU utilization. However, I've just observed that keeping the NV X Server Settings window open increases the apparent GPU utilization: when I watch the graphics % using Psensor, there is an approximately 10% difference in GPU utilization between having the X Server Settings window open and having it minimized. CPU usage is also ~65% when the NV X Server Settings window is open and ~22% when it's minimized. So I would conclude that threads to both the CPU and GPU are kept live when X Server Settings isn't minimized. Power usage at the wall doesn't change. That sort of throws a spanner in the works for some of my observations!
	- update - leaving the Boinc Manager window open has the same effect (increased GPU utilization).
	I suspect using SWAN_SYNC or a nice value might improve performance a bit, as would running two tasks.
	My GPU clock is supposed to be 1911MHz but it's 1904MHz (a bit shy of 2000 but not far off). The memory is at 7604MHz.
	My GTX1060-3GB power usage is about 75W, so it's about 45% more efficient than a GTX970 for here. The system idles at 38W at the wall. With Boinc maximized but only running some nci apps, the system's power usage is 50W. When I start to crunch on the GPU the system uses ~125W, so the GPU is using ~75W.
	Of note is that my 1060 only has one 6-pin power connector (which only delivers up to ~75W)! So perhaps I'm being power capped? It's a small PCIe2 x16 motherboard, and although there's a 12-pin ATX power connector I wouldn't be surprised if the GPU isn't drawing power from the PCIE slot.
	I had to re-enter cool-bits=4 & reboot to allow me to alter the fan speed and manually control the GPU's temp, but before I did this the GPU temps rose to 63C and the fan didn't go over 50%. Now I'm running the fans at 60% and the GPU is ~58C. The fans aren't audible over the system's case fan.
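The at-the-wall estimate above is a simple subtraction, sketched here in Python. The `psu_efficiency` parameter is an added assumption for illustration (the post does not mention it): a wall meter reads AC input, so the DC power actually reaching the card is somewhat lower, and the delta also includes whatever extra the CPU draws while feeding the GPU.

```python
def gpu_power_estimate_w(crunching_wall_w: float,
                         baseline_wall_w: float,
                         psu_efficiency: float = 1.0) -> float:
    """Estimate GPU power from two wall readings.

    psu_efficiency=1.0 reproduces the plain subtraction used in the post;
    a lower value (hypothetical, PSU-dependent) scales the delta down to
    approximate DC power delivered to the components.
    """
    if not 0.0 < psu_efficiency <= 1.0:
        raise ValueError("psu_efficiency must be in (0, 1]")
    return (crunching_wall_w - baseline_wall_w) * psu_efficiency


if __name__ == "__main__":
    # Wall readings from the post: 50W with only nci apps, ~125W when crunching.
    print(f"Plain delta: {gpu_power_estimate_w(125, 50):.0f} W")
    # 0.85 is an illustrative efficiency, not a measured value.
    print(f"With 85% PSU efficiency: {gpu_power_estimate_w(125, 50, 0.85):.1f} W")
```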

skgiven Volunteer moderator Volunteer tester Joined: 23 Apr 09 Posts: 3968 Credit: 1,995,359,260 RAC: 0	Message 45177 - Posted: 4 Nov 2016 | 15:53:25 UTC - in response to Message 45169. Last modified: 4 Nov 2016 | 16:01:59 UTC
	skgiven wrote: "Pulled my GTX970 from my Ubuntu x64 16.04 LTS system and replaced it with a GTX1060-3GB. Comparing two tasks; one which ran on the 970 and the second which ran on the 1060-3GB I can say on my setup that the 1060-3GB is ~ 3% faster than the 970 and uses ~9% more CPU. Obviously this is a comparison of only two tasks, but they are similar task types and give the same amount of credit:
	e47s4_e35s2p0f0-SDOERR_CASP10_crystal_ss_5ns_ntl9_1-0-1-RND0325_1 : 1,454.48 804.24 3,150.00 v9.14 (cuda80)
	e25s5_e20s7p0f45-SDOERR_CASP22S_crystal_ss_5ns_ntl9_2-0-1-RND0908_0 : 1,498.21 736.19 3,150.00 v8.48 (cuda65)
	When watching the runs, CPU usage, GPU usage and PCIE usage all looked about the same for each GPU. I expect the Pascal uses less power to do the same work, but I haven't measured the power draw just yet... I think a few other people have reported increased CPU usage on the Pascal's. If CPU is increased with the CUDA8.0 app/Pascal cards it's probably going to be more noticeable with larger GPU's. Certainly something to take account of when building a system for GPU crunching."
	The difference is more noticeable with the 20ns tasks:
	e35s7_e32s8p0f82-SDOERR_CASP10_crystal_ss_20ns_ntl9_2-0-1-RND9465_0 : 5,365.27 3,215.99 12,750.00 v9.14 (cuda80)
	e14s4_e9s6p0f34-SDOERR_CASP22S_crystal_ss_20ns_ntl9_0-0-1-RND4064_0 : 5,946.69 2,911.63 12,750.00 (cuda65)
	e15s3_e14s5p0f90-SDOERR_CASP22S_crystal_contacts_20ns_ntl9_2-0-1-RND8066_0 : 5,966.27 2,947.58 12,750.00 v8.48 (cuda65)
	In this case the 1060-3GB is ~10% faster than the GTX970 and the CPU usage is also around 10% greater. As most tasks at GPUGrid tend to be longer, 10% might reflect the difference between the cards more accurately than the short tasks do, since those spend as much time loading but less time running.
	So ~10% faster for ~45% less energy, or ~60% better in terms of performance/Watt.

Jim1348 Joined: 28 Jul 12 Posts: 819 Credit: 1,591,285,971 RAC: 0	Message 45178 - Posted: 4 Nov 2016 | 16:48:57 UTC - in response to Message 45177.
	Thanks. You have saved me the trouble. That is a nice improvement for the GTX 1060, but not enough to buy a new card. I will leave my GTX 970 on GPUGrid, and my 1060 on Folding, where it gets as much improvement, if not a little more, due to the Quick Return Bonus.

eXaPower Joined: 25 Sep 13 Posts: 293 Credit: 1,897,601,978 RAC: 0	Message 45182 - Posted: 4 Nov 2016 | 20:03:17 UTC - in response to Message 45177.
	skgiven wrote: "In this case the 1060-3GB is ~10% faster than the GTX970 and the CPU usage is also around 10% greater. As most tasks at GPUGrid tend to be longer, 10% might more accurately reflect the differences between the cards than the short tasks; which spend as much time loading but less time running. So ~10% faster for ~45% less energy ~60% better in terms of performance/Watt"
	I think on WDDM systems the 3GB GTX 1060 is ~20% faster than the GTX 970; at least this is what I've observed with a PCIe 3.0 x4 GTX 970 at 1.5GHz and a GTX 1060 at 2.1GHz. The 1152CUDA card is a great cruncher for here, and the GTX 1070 even more so from a purely watt/performance point of view. As you mentioned in another thread, the GTX 1060 (3GB) is hands down the cost/performance king. IMO both the GTX 1060 and GTX 1070 are going to be the most efficient GPUGRID GPUs until the Pascal refresh or Volta, with ACEMD scaling a major factor (maybe someday the app will make the GTX 1080 work at 95% on WDDM). My GTX 1070 hasn't risen past 110W (80% GPU usage) while staying mostly under 100W. My (2) GTX 970s would hit 170W on some GERALDs with 86% GPU usage.
	skgiven wrote: "When I start to crunch on the GPU the system uses ~125W, so the GPU is using ~75W. Of note is that my 1060 only has one 6-pin power connector (which only delivers up to ~75W)! So perhaps I'm being power capped? It's a small PCIe-2x16 motherboard, and although there's a 12-pin ATX power connector I wouldn't be surprised if the GPU isn't drawing power from the PCIE slot."
	A true Mini-Fit Jr. 6-pin connector (not the kind that is missing a 12V pin, like a 4-pin Molex to 6-pin adapter) can provide more than 75W. Check the PSU wire gauge to determine its amperage limit and you'll find out what the (3) 12V PCIe 6-pin wires are capable of. The Tomshardware website has detailed power consumption tests showing how each card draws its power.
	Some vBIOS software from AIBs (Zotac / MSI / Gigabyte / some EVGA) draws almost all of the card's power from the PSU, with <25W coming from the PCIe slot, which feeds up to 3 phases (though mostly 1 or 2) on the GPU board. If you have a laser thermometer, or use the simple old-fashioned skin method, check the PCIe capacitors. If the PCIe slot is providing most of the power (66W) they'll be hot; if they're barely warm then the 6-pin is the main provider. My 4+1 phase Gigabyte Windforce OC GTX 1060 (3GB) gets the majority of its power from the PSU through the 6-pin at a 116% power limit (140W); the Primegrid Genefer program and the SiSoftware scientific benchmark max out the power.
	Quoted from the xdev.com Pascal OC guide (link in the x80 Pascal thread): "13A per contact (16AWG wire in small connector) to 8.5A/contact (18AWG wire). This means that using common 18AWG cable, 6-pin connector specified for 17A of current (3 contacts for +12V power, 2 contacts for GND return, one contact for detect). 8-pin have 25.5A current specification (3 contacts for +12V power, 3 contacts for GND return and 2 contacts for detection). 6-pin is 204W at +12.0V level or 306W for 8-pin accordingly."
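The 204W and 306W figures in the quoted guide follow from the per-contact current rating. The sketch below reproduces them; treating the smaller of the +12V and GND contact counts as the limiting factor is an interpretation that happens to match the quoted 17A and 25.5A values, not something the guide states in these words, and the function name is made up for illustration.

```python
def connector_limit(power_contacts: int,
                    ground_contacts: int,
                    amps_per_contact: float,
                    volts: float = 12.0) -> tuple[float, float]:
    """Return (max current in A, max power in W) for a PCIe power connector.

    Assumes current is limited by whichever side (+12V or GND return)
    has fewer contacts.
    """
    limiting_contacts = min(power_contacts, ground_contacts)
    amps = limiting_contacts * amps_per_contact
    return amps, amps * volts


if __name__ == "__main__":
    # 8.5A per contact is the 18AWG figure from the quoted guide.
    for name, power, ground in [("6-pin", 3, 2), ("8-pin", 3, 3)]:
        amps, watts = connector_limit(power, ground, 8.5)
        print(f"{name}: {amps:.1f} A, {watts:.0f} W at 12V")
```

These are electrical limits of the connector and wire, well above the 75W that the PCIe specification budgets for a 6-pin plug.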

skgiven Volunteer moderator Volunteer tester Joined: 23 Apr 09 Posts: 3968 Credit: 1,995,359,260 RAC: 0	Message 45184 - Posted: 4 Nov 2016 | 21:48:14 UTC - in response to Message 45178. Last modified: 4 Nov 2016 | 22:29:14 UTC
	After completing a 50ns Long SDOERR_CASP22SnoS task it's looking more like the GTX1060-3GB can do a Long task in 73% of the time a 970 can (though I'm not certain my settings were identical back on the 30th Oct when using the 970; I might have been running a CPU task then). If the setup was identical, that would make the 1060-3GB 36% faster at long runs, but others would have to demonstrate that too before I'd accept it. I've got a Long PABLO SH2 now and should be able to compare that tomorrow to 3 similar tasks I ran a few days ago, when I definitely had the same setup. Still the same +10% CPU usage.
	e9s8_e8s1p0f0-SDOERR_CASP22SnoS_crystal_contacts_50ns_ntl9_0-0-1-RND0969_0 : 14,398.54 7,695.54 63,750.00 v9.14 (cuda80)
	e16s9_e9s9p0f217-SDOERR_CASP10_crystal_ss_50ns_ntl9_0-0-1-RND0343_0 : 19,631.24 6,984.46 63,750.00 v8.48 (cuda65)
	The Long PABLO SH2 task is presently realizing around 93% GPU utilization with X Server Settings open and ~89% minimized, varying from 88% to 96% when X Server Settings is open. PCIE bandwidth is ~28% and CPU usage is ~16%. The GPU heated up to 66C, so I increased the fan speed to ~2270RPM (70%), which brought the temp back down to 62C. Noticed that the GPU clock is 1879MHz, slightly lower than with the previous tasks. System power usage is also up to ~160W, so the GPU is drawing ~110W, which is 35W more when running the PABLOs than the SDOERRs.
	So greater performance from the 1060 3GB while running longer tasks, and greater utilization with some Long task types.

skgiven Volunteer moderator Volunteer tester Joined: 23 Apr 09 Posts: 3968 Credit: 1,995,359,260 RAC: 0	Message 45189 - Posted: 5 Nov 2016 | 9:02:51 UTC - in response to Message 45184. Last modified: 5 Nov 2016 | 10:00:11 UTC
	The PABLO_SH2TRIPEP took 3% longer on the 1060-3GB than it did on a 970, so there is a lot of performance variation. CPU usage was also 11% less when using the 1060:
	e16s27_e15s14p0f22-PABLO_SH2TRIPEP_L_TRI_2-0-1-RND3725_0 : 22,123.02 7,779.86 145,800.00 v9.14 (cuda80)
	e14s15_e12s4p0f72-PABLO_SH2TRIPEP_Q_TRI_1-0-1-RND5699_1 : 21,321.75 8,659.11 145,800.00 v8.48 (cuda65)
	e21s26_e15s3p0f391-PABLO_SH2TRIPEP_F_TRI_2-0-1-RND2465_0 : 21,323.73 8,596.28 145,800.00 v8.48 (cuda65)
	Not complaining about these PABLO tasks though; if an 1152-core GPU can get 569K/day it's not bad :) By comparison the 'shorter' Long SDOERR_CASP tasks only collect about 382K/day :|
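The credit-per-day figures can be checked from the task lines. This sketch assumes the card runs identical tasks back to back with no idle gaps, and uses the run time and credit of the v9.14 PABLO task above and of the 50ns SDOERR_CASP22SnoS task (14,398.54 s, 63,750 credits) from the previous message; the function name is made up for illustration.

```python
SECONDS_PER_DAY = 86_400


def credits_per_day(credit_per_task: float, runtime_s: float) -> float:
    """Daily credit if tasks of this type run continuously, one after another."""
    return credit_per_task / runtime_s * SECONDS_PER_DAY


if __name__ == "__main__":
    tasks = {
        "PABLO_SH2TRIPEP": (145_800.00, 22_123.02),
        "SDOERR_CASP22SnoS 50ns": (63_750.00, 14_398.54),
    }
    for name, (credit, runtime_s) in tasks.items():
        print(f"{name}: ~{credits_per_day(credit, runtime_s) / 1000:.0f}K credits/day")
```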

eXaPower Joined: 25 Sep 13 Posts: 293 Credit: 1,897,601,978 RAC: 0	Message 45199 - Posted: 5 Nov 2016 | 12:47:10 UTC - in response to Message 45189. Last modified: 5 Nov 2016 | 12:48:41 UTC
	e9s8_e8s1p0f0-SDOERR_CASP22SnoS_crystal_contacts_50ns_ntl9_0-0-1-RND0969_0 : 14,398.54 7,695.54 63,750.00 v9.14 (cuda80)
	e16s9_e9s9p0f217-SDOERR_CASP10_crystal_ss_50ns_ntl9_0-0-1-RND0343_0 : 19,631.24 6,984.46 63,750.00 v8.48 (cuda65)
	-- GTX 1060 3GB @ 2.1GHz / 67% GPU usage / 51% BUS / 74W
	e10s5_e8s4p0f261-SDOERR_CASP22SnoS_crystal_ss_50ns_ntl9_1-0-1-RND6842_0 15,021.71 6,281.00 63,750.00 (cuda80)
	-- GTX 1070 @ 2.1GHz / 59% GPU usage / 37% BUS / 78W
	e5s9_e2s1p0f88-SDOERR_CASP22SnoS_crystal_ss_50ns_ntl9_1-0-1-RND2882_0 12,249.42 6,445.78 63,750.00 (cuda80)
	Your single GTX 1060 system is 4.21% faster than my GTX 1060 3GB. The higher PCIe bandwidth usage on my system is probably due to having 4 GPUs. The GTX 1070 (PCIe3 x8) is 19% faster than my GTX 1060 (PCIe3 x4).
	The PABLO_SH2TRIPEP took 3% longer on the 1060-3GB than it did on a 970, so there is a lot of performance variation. CPU usage was also 11% less when using the 1060:
	e16s27_e15s14p0f22-PABLO_SH2TRIPEP_L_TRI_2-0-1-RND3725_0 : 22,123.02 7,779.86 145,800.00 v9.14 (cuda80)
	e14s15_e12s4p0f72-PABLO_SH2TRIPEP_Q_TRI_1-0-1-RND5699_1 : 21,321.75 8,659.11 145,800.00 v8.48 (cuda65)
	e21s26_e15s3p0f391-PABLO_SH2TRIPEP_F_TRI_2-0-1-RND2465_0 : 21,323.73 8,596.28 145,800.00 v8.48 (cuda65)
	-- GTX 1070 @ 2.1GHz / 69% GPU / 51% BUS / 96W:
	e13s5_e5s7p0f442-PABLO_SH2TRIPEP_W_TRI_2-0-1-RND9211_0 11929451 16,843.90 6,162.14 145,800.00 (cuda80)
	-- GTX 1060 (3GB) @ 2.1GHz / 74% GPU / 60% BUS / 85W:
	e15s20_e14s21p0f117-PABLO_SH2TRIPEP_S_TRI_1-0-1-RND6936_0 23,441.73 6,269.66 145,800.00 Long runs (cuda80)
	The GTX 1070 is 28.1% faster than my GTX 1060 on PABLO_SH2TRIPEP. Surprisingly, I haven't received any unstable simulation messages on WUs completed while overclocked at 2.1GHz.
	ID: 45199 \| Rating: 0 \| rate: / Reply Quote

skgiven Volunteer moderator Volunteer tester Send message Joined: 23 Apr 09 Posts: 3968 Credit: 1,995,359,260 RAC: 0 Level Scientific publications	Message 45209 - Posted: 5 Nov 2016 \| 16:07:44 UTC - in response to Message 45199.
	Thanks for posting your performances. Quick look at the differences between our systems:
	- You've a slower operating system (WDDM overhead, 11%+); Windows vs Linux
	- You've a faster CPU, i5-4440S @2.8GHz vs AMD A6-3500 @2.1GHz
	- You've a faster on die PCIE controller
	- You've a PCIE3.0 bus vs my PCIE2.0 bus
	- Your heavier use of the PCIE3 bus likely restricts your performances more than my PCIE2.0 x16 is being restrictive
	- Your 2.1GHz GPU clock is ~10% higher than my 1.9GHz.
	____________ FAQ's HOW TO: - Opt out of Beta Tests - Ask for Help
	ID: 45209 \| Rating: 0 \| rate: / Reply Quote

fuzzydice555 Send message Joined: 3 Oct 16 Posts: 5 Credit: 125,975,707 RAC: 0 Level Scientific publications	Message 45215 - Posted: 5 Nov 2016 \| 18:31:06 UTC Last modified: 5 Nov 2016 \| 18:31:59 UTC
	GTX 1060 6GB: I had 88-92% utilization yesterday, now it's only 65%. I changed nothing in the system.
	- Windows 10
	- Gigabyte B150 mobo
	- i5-6600, one core dedicated to GPU running at 3.6 GHz
	- PCIE3 x16
	Power consumption is 72W average at 65%. I crunch WCG on this rig as well. When I enable WCG on the 3 other CPU cores, the GPU usage goes up to 75%. I have no idea why.
	ID: 45215 \| Rating: 0 \| rate: / Reply Quote

Retvari Zoltan Send message Joined: 20 Jan 09 Posts: 2343 Credit: 16,201,255,749 RAC: 386 Level Scientific publications	Message 45216 - Posted: 5 Nov 2016 \| 19:02:19 UTC - in response to Message 45215. Last modified: 5 Nov 2016 \| 19:04:26 UTC
	I had 88-92% utilization yesterday, now it's only 65%. I changed nothing in the system.
	It depends (besides the system) on the workunit. Yesterday you had an ADRIA_1JWP_dist, which uses the CPU less than your recent SDOERR_CASP22S20M_crystal_ss_contacts_50ns_ntl9 workunit.
	Power consumption is 72W average at 65%. I crunch WCG on this rig as well. When I enable WCG on the 3 other CPU cores, the GPU usage goes up to 75%. I have no idea why.
	It's because you didn't set the SWAN_SYNC environment variable, and without it the GPUGrid app doesn't load a CPU thread heavily enough to make your CPU boost.
	ID: 45216 \| Rating: 0 \| rate: / Reply Quote

fuzzydice555 Send message Joined: 3 Oct 16 Posts: 5 Credit: 125,975,707 RAC: 0 Level Scientific publications	Message 45222 - Posted: 5 Nov 2016 \| 20:30:15 UTC - in response to Message 45216. Last modified: 5 Nov 2016 \| 20:30:40 UTC
	Thanks, SWAN_SYNC seems to help; utilization is now at 72% even if only GPUGRID is running. Moving to Linux/Win XP isn't possible, since this is my daily use PC. Would getting a faster CPU help? (i5 6600k/i7 6700k)
	ID: 45222 \| Rating: 0 \| rate: / Reply Quote

Retvari Zoltan Send message Joined: 20 Jan 09 Posts: 2343 Credit: 16,201,255,749 RAC: 386 Level Scientific publications	Message 45223 - Posted: 5 Nov 2016 \| 20:45:52 UTC - in response to Message 45222.
	Would getting a faster CPU help? (i5 6600k/i7 6700k) No, it won't.
	ID: 45223 \| Rating: 0 \| rate: / Reply Quote

Retvari Zoltan Send message Joined: 20 Jan 09 Posts: 2343 Credit: 16,201,255,749 RAC: 386 Level Scientific publications	Message 45224 - Posted: 5 Nov 2016 \| 21:31:23 UTC Last modified: 5 Nov 2016 \| 21:35:54 UTC
	If you want to maximize GPU usage on an operating system which has WDDM (Windows 7, 8, 8.1, 10) you should:
	- crunch only 1 CPU task (or do not crunch CPU tasks at all)
	- not crunch on the iGPU
	- use the SWAN_SYNC environment variable to make the GPUGrid app use a full CPU thread
	- use the app_config.xml to run two WUs on a single GPU (it will double the runtimes, so do it only if your runtimes are well below 12h)
	- put your GPU in a PCIe3.0 x16 slot which really runs at x16 (you can check it with the GPU-Z tool)
	And now the "how-to" part: (TLDR)
	To crunch only 1 CPU task you should reduce the percentage of the CPUs available for BOINC, or the number of CPUs in the cc_config.xml.
	First you have to know how many CPU threads your PC has. To find out, use the CPU-Z tool, or start Task Manager (right click on an empty area of the taskbar and choose "Task Manager"). On the Performance tab you should see as many graphs on the "CPU usage history" panel as your PC has "logical processors" (Windows 10 also reports the number). If you see only 1 graph, switch the view.
	Then divide 100% by the number of "logical processors" (aka "threads") your PC has, multiply it by the number of GPU tasks you have plus 1, and round it up to the nearest integer. Type the result into BOINC manager -> Options -> Computing preferences -> Use at most [...] % of the CPUs. The other field has to stay at 100% (use at most 100% of the CPU time). For example:
	8 CPU threads + 2 GPU tasks: 100/8*(1+2) = 37.5 [38%]
	12 CPU threads + 3 GPU tasks: 100/12*(1+3) = 33.333 [34%]
	4 CPU threads + 2 GPU tasks: 100/4*(1+2) = 75 [75%]
	Theoretically this calculation can result in more than 100% (2 CPU threads + 2 GPU tasks: 100/2*(1+2) = 150); in this case you should type 100% and not crunch CPU projects at all.
	Another method is to set the number of CPUs in the cc_config.xml file. The number should be set to the number of GPU tasks + 1. Do not set this number higher than the number of your CPU's threads. For example, for 2 GPU tasks you should replace the 2 by 3 in the example below.
	Copy the following to the clipboard:
	    notepad c:\ProgramData\BOINC\cc_config.xml
	Press Windows key + R, then paste and press Enter. If you see an empty file, copy and paste the following:
	    <cc_config>
	      <options>
	        <ncpus>2</ncpus>
	      </options>
	    </cc_config>
	If your cc_config.xml already has an <options> section and there is no <ncpus> tag in it, insert the line <ncpus>2</ncpus> right after the <options> tag. Click File -> Save and click [Save]. If your BOINC manager is running, click Options -> Read config files.
	How not to crunch on the iGPU (the Intel GPU integrated into recent Intel CPUs):
	1. Do not attach to projects with Intel (OpenCL) clients, or disable this application in the project's computing preferences (it is practical to use a different venue for these hosts).
	2. Disable the iGPU in the cc_config.xml file. Copy the following to the clipboard:
	    notepad c:\ProgramData\BOINC\cc_config.xml
	Press Windows key + R, then paste and press Enter. If you see an empty file, copy and paste the following text:
	    <cc_config>
	      <options>
	        <ignore_intel_dev>0</ignore_intel_dev>
	      </options>
	    </cc_config>
	If your cc_config.xml already has an <options> section and there is no <ignore_intel_dev> tag in it, insert the line <ignore_intel_dev>0</ignore_intel_dev> right after the <options> tag. Click File -> Save and click [Save]. If your BOINC manager is running, you can click Options -> Read config files.
	To apply the SWAN_SYNC environment variable:
	- Click Start, copy & paste systempropertiesadvanced and press Enter.
	- Click on [Environment Variables].
	- Look for the lower section called "System variables" and click the [New] button below the list.
	- Type SWAN_SYNC in the name field.
	- Type 1 in the value field.
	- Click [OK] 3 times.
	- Exit BOINC manager, stopping the scientific applications.
	- Start BOINC manager.
	To run two GPUGrid tasks on a single GPU:
	The app_config.xml file should be placed in the project's home directory (by default it's at c:\ProgramData\BOINC\projects\www.gpugrid.net\). Copy the following to the clipboard:
	    notepad c:\ProgramData\BOINC\projects\www.gpugrid.net\app_config.xml
	Press Windows key + R, then paste and press Enter. Copy & paste the following text:
	    <app_config>
	      <app>
	        <name>acemdlong</name>
	        <gpu_versions>
	          <gpu_usage>0.5</gpu_usage>
	          <cpu_usage>1.0</cpu_usage>
	        </gpu_versions>
	      </app>
	      <app>
	        <name>acemdshort</name>
	        <gpu_versions>
	          <gpu_usage>0.5</gpu_usage>
	          <cpu_usage>1.0</cpu_usage>
	        </gpu_versions>
	      </app>
	    </app_config>
	Click File -> Save and click [Save]. Exit BOINC manager, stopping the scientific applications. Start BOINC manager. (If your BOINC manager is running, you can click Options -> Read config files.)
	ID: 45224 \| Rating: 0 \| rate: / Reply Quote
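
The percentage rule from the how-to above (100% divided by the number of CPU threads, times the number of GPU tasks plus 1, rounded up and capped at 100%) can be sketched as a small helper. This is only an illustration of the arithmetic; the function name is made up and nothing here talks to BOINC itself:

```python
def boinc_cpu_percent(cpu_threads: int, gpu_tasks: int) -> int:
    """Value for "Use at most [...] % of the CPUs" following the how-to's rule.

    One thread per GPU task plus one thread for a single CPU task,
    expressed as a percentage of all threads, rounded up, capped at 100.
    """
    if cpu_threads < 1 or gpu_tasks < 0:
        raise ValueError("need at least 1 CPU thread and a non-negative task count")
    wanted_threads = gpu_tasks + 1
    # Ceiling division on integers avoids floating point rounding surprises.
    percent = -(-100 * wanted_threads // cpu_threads)
    return min(percent, 100)


for threads, tasks in [(8, 2), (12, 3), (4, 2), (2, 2)]:
    print(f"{threads} threads + {tasks} GPU tasks -> {boinc_cpu_percent(threads, tasks)}%")
```

When the uncapped result would exceed 100% (as in the last case), the how-to's advice is to enter 100% and not run CPU projects at all.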

fuzzydice555 Send message Joined: 3 Oct 16 Posts: 5 Credit: 125,975,707 RAC: 0 Level Scientific publications	Message 45225 - Posted: 5 Nov 2016 \| 22:59:15 UTC
	Thanks, I'll try these solutions!
	ID: 45225 \| Rating: 0 \| rate: / Reply Quote

Seba Send message Joined: 30 Oct 16 Posts: 6 Credit: 27,935,274 RAC: 0 Level Scientific publications	Message 45228 - Posted: 7 Nov 2016 \| 10:36:44 UTC Last modified: 7 Nov 2016 \| 10:52:05 UTC
	Thanks Retvari Zoltan, now my card runs at 96-98% utilisation on Windows 10 (driver 375.70) with 2 tasks.
	ID: 45228 \| Rating: 0 \| rate: / Reply Quote

Retvari Zoltan Send message Joined: 20 Jan 09 Posts: 2343 Credit: 16,201,255,749 RAC: 386 Level Scientific publications	Message 45236 - Posted: 7 Nov 2016 \| 20:26:07 UTC
	I've successfully installed Ubuntu 16.04 LTS on one of my hosts. Could someone please enlighten me how to get the app to notice the SWAN_SYNC=1 setting? I'd appreciate it. I've put it in /etc/environment, and when I try printenv it shows SWAN_SYNC=1, but the app obviously does not take a full CPU thread. BOINC and the GPUGrid app run as user 'boinc', but I didn't find anything for this user in /home. Is this environment variable handled by the new (9.14) Linux app?
	ID: 45236 \| Rating: 0 \| rate: / Reply Quote

Retvari Zoltan Send message Joined: 20 Jan 09 Posts: 2343 Credit: 16,201,255,749 RAC: 386 Level Scientific publications	Message 45237 - Posted: 7 Nov 2016 \| 23:39:56 UTC - in response to Message 45236.
	Is this environment variable handled by the new (9.14) Linux app?
	To answer my own question: I think the new (9.14) Linux app doesn't support SWAN_SYNC=1, as I've started BOINC from the terminal by
	sudo /usr/bin/boinc --dir /var/lib/boinc-client
	and the CPU usage remained 7-8% (it should be 25%). I've checked previously that SWAN_SYNC=1 is listed by
	sudo printenv
	This feature should be added.
	ID: 45237 \| Rating: 0 \| rate: / Reply Quote
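
One way to separate "the variable never reached the process" from "the app ignores the variable" is to read the environment of the running process itself. On Linux, /proc/<pid>/environ holds the environment a process was started with as NUL-separated NAME=value entries. A minimal sketch, assuming you look up the PID of the running GPUGrid app yourself (for example with pgrep) and have permission to read its /proc entry, which usually means running as root or as the 'boinc' user; the function names are illustrative:

```python
from pathlib import Path


def parse_environ(raw: bytes) -> dict[str, str]:
    """Parse the NUL-separated NAME=value format used by /proc/<pid>/environ."""
    env: dict[str, str] = {}
    for entry in raw.split(b"\0"):
        if not entry:
            continue  # skip the empty piece after the trailing NUL
        name, sep, value = entry.partition(b"=")
        if sep:
            env[name.decode(errors="replace")] = value.decode(errors="replace")
    return env


def process_env(pid: int) -> dict[str, str]:
    """Environment a Linux process was started with (not later changes)."""
    return parse_environ(Path(f"/proc/{pid}/environ").read_bytes())


# Example with a hypothetical PID: process_env(12345).get("SWAN_SYNC")
sample = parse_environ(b"HOME=/var/lib/boinc-client\0SWAN_SYNC=1\0")
print(sample.get("SWAN_SYNC"))
```

If the lookup returns None for the app's process, the setting never reached it; if it returns the value and CPU usage stays low, that would support the guess that the app does not act on it.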

eXaPower Send message Joined: 25 Sep 13 Posts: 293 Credit: 1,897,601,978 RAC: 0 Level Scientific publications	Message 45269 - Posted: 15 Nov 2016 \| 18:14:03 UTC
	1fdq-SDOERR_OPMcharmm6-0-1-RND3215_1 is the longest WU I've encountered to date on the GTX 1070. Estimated completion 51,300 sec (14.28hr) at 54% GPU usage / 33% MCU / 24% BUS (PCIe 3.0 x8) / 45% GPU power (83W). I've noticed that if only my GTX 1070 is running, GPU usage is 3 to 6% higher on all WUs compared to 2/3/4 GPU Pascal or Maxwell compute.
	ID: 45269 \| Rating: 0 \| rate: / Reply Quote

skgiven Volunteer moderator Volunteer tester Send message Joined: 23 Apr 09 Posts: 3968 Credit: 1,995,359,260 RAC: 0 Level Scientific publications	Message 45273 - Posted: 15 Nov 2016 \| 22:33:29 UTC - in response to Message 45269.
	1fdq-SDOERR_OPMcharmm6-0-1-RND3215_1 is the longest WU I've encountered to date on the GTX 1070. Estimated completion 51,300 sec (14.28hr) at 54% GPU usage / 33% MCU / 24% BUS (PCIe 3.0 x8) / 45% GPU power (83W). I've noticed that if only my GTX 1070 is running, GPU usage is 3 to 6% higher on all WUs compared to 2/3/4 GPU Pascal or Maxwell compute.
	GPU Clocks?
	____________ FAQ's HOW TO: - Opt out of Beta Tests - Ask for Help
	ID: 45273 \| Rating: 0 \| rate: / Reply Quote

eXaPower Send message Joined: 25 Sep 13 Posts: 293 Credit: 1,897,601,978 RAC: 0 Level Scientific publications	Message 45275 - Posted: 15 Nov 2016 \| 23:11:46 UTC - in response to Message 45273. Last modified: 15 Nov 2016 \| 23:19:30 UTC
	1fdq-SDOERR_OPMcharmm6-0-1-RND3215_1 is the longest WU I've encountered to date on the GTX 1070. Estimated completion 51,300 sec (14.28hr) at 54% GPU usage / 33% MCU / 24% BUS (PCIe 3.0 x8) / 45% GPU power (83W). I've noticed that if only my GTX 1070 is running, GPU usage is 3 to 6% higher on all WUs compared to 2/3/4 GPU Pascal or Maxwell compute.
	GPU Clocks?
	2.1GHz core and 3.8GHz (7.6GHz) memory - 2012MHz out of the box boost. My Pascal throttles in 12.5MHz increments every 8C starting at 32C - I set a +110MHz offset to keep a constant 2.1GHz.
	ID: 45275 \| Rating: 0 \| rate: / Reply Quote

Seba Send message Joined: 30 Oct 16 Posts: 6 Credit: 27,935,274 RAC: 0 Level Scientific publications	Message 45322 - Posted: 18 Nov 2016 \| 19:57:46 UTC
	Does anyone know how to force GPUGRID to work with two different cards: a Pascal (cuda 80) and a GTX 670 (cuda 65)? When I put the new card into the computer, my old card stopped working with GPUGRID. Do you know how to solve this problem? Many thanks!
	ID: 45322 \| Rating: 0 \| rate: / Reply Quote

skgiven Volunteer moderator Volunteer tester Send message Joined: 23 Apr 09 Posts: 3968 Credit: 1,995,359,260 RAC: 0 Level Scientific publications	Message 45323 - Posted: 18 Nov 2016 \| 20:23:25 UTC - in response to Message 45322.
	Does anyone know how to force GPUGRID to work with two different cards: a Pascal (cuda 80) and a GTX 670 (cuda 65)? When I put the new card into the computer, my old card stopped working with GPUGRID. Do you know how to solve this problem? Many thanks!
	Basically No: Either the app sorts that out or there are two different queues and you can manipulate your Boinc config files to do what you want. At present the cuda80 app is exclusively for Pascals and the cuda65 app doesn't work for Pascals. The cuda80 app has also populated all queues - which is fine for most people's setups. If possible, move one of the GPUs to another system. In theory you could have two instances of Boinc with different drive locations and exclude one GPU for each instance, but in practice running two instances of Boinc just doesn't work.
	____________ FAQ's HOW TO: - Opt out of Beta Tests - Ask for Help
	ID: 45323 \| Rating: 0 \| rate: / Reply Quote

skgiven Volunteer moderator Volunteer tester Send message Joined: 23 Apr 09 Posts: 3968 Credit: 1,995,359,260 RAC: 0 Level Scientific publications	Message 45116 - Posted: 2 Nov 2016 \| 15:16:13 UTC - in response to Message 45115. Last modified: 2 Nov 2016 \| 18:28:06 UTC
	Thanks,
	The proof of concept CASP runs have fewer atoms, so GPU utilization is generally lower. It's probably the case that the bigger the GPU the lower the utilization for such work units. This might even be exacerbated with the Pascals compared to Maxwell GPUs (or not)...
	Even within the CASP runs there are different types of sub_run:
	e13s6_e9s4p0f5-SDOERR_CASP22SnoS_crystal_ss_5ns_ntl9_0-0-1-RND7413_0 1,337.92 - runtime
	e12s1_e11s5p0f27-SDOERR_CASP22SnoS_crystal_ss_contacts_5ns_a3D_1-0-1-RND6418_0 2,152.56 - runtime
	The runs that include atomic contacts take longer because they involve calculating contact interactions too. In theory these should utilize the GPU more (but I haven't looked to see if that's the case or not). The CASP runs have rather low PCIe Bandwidth Utilization (for all GPUs).
	For reference I'm seeing ~81% GPU utilization on a GTX970 on Linux (Ubuntu x64 16.04 LTS) crunching a CASP22SnoS_crystal_ss_contacts task. PCIE Bandwidth Utilization is ~17% (PCIE2 x16). It's a low-spec system but I'm not using the CPU for anything else. When I do, the GPU's performance drops off. When I run a certain mt CPU app on another system (W10) the GPU utilization drops to ~15%!
	- edit - see I'm getting 88% GPU utilization on a non_contacts CASP22SnoS_crystal_ss task. PCIE bandwidth ~ 25%. Temps are higher too.
	____________ FAQ's HOW TO: - Opt out of Beta Tests - Ask for Help
	ID: 45116 \| Rating: 0 \| rate: / Reply Quote

eXaPower Send message Joined: 25 Sep 13 Posts: 293 Credit: 1,897,601,978 RAC: 0 Level Scientific publications	Message 45118 - Posted: 2 Nov 2016 \| 17:24:00 UTC - in response to Message 45116.
	Thanks,
	You're welcome!
	The proof of concept CASP runs have fewer atoms, so GPU utilization is generally lower. It's probably the case that the bigger the GPU the lower the utilization for such work units. This might even be exacerbated with the Pascals compared to Maxwell GPUs (or not)... [..]The runs that include atomic contacts take longer because they involve calculating contact interactions too. In theory these should utilize the GPU more (but I haven't looked to see if that's the case or not).
	What I reported in the CASP thread: CASP runtimes (atom and step amount) vary, so this is just a general reference.
	GTX 1070 (PCIe 3.0 x8) CASP runtimes:
	-- 1ns (ntl9) = 600 credits = 240 sec @ 2.1GHz / 59% GPU / 15% MCU / 37% BUS / 78W power
	-- 1ns (a3d) = 1,350 credits = 330 sec @ 2.1GHz / 70% GPU / 24% MCU / 39% BUS / 96W power
	-- 5ns (ntl9) = 3,150 credits = 1,200 sec @ same usage and power numbers as 1ns
	-- 5ns (a3d) = 6,900 credits = 1,600 sec @ same usage and power numbers as 1ns
	GTX 1060 (3GB) PCIe 3.0 x4 CASP runtimes:
	-- 1ns (ntl9) = 600 credits = 300 sec @ 2.1GHz / 63% GPU / 17% MCU / 51% BUS / 74W power
	-- 1ns (a3d) = 1,350 credits = 450 sec @ 2.1GHz / 74% GPU / 24% MCU / 59% BUS / 88W power
	-- 5ns (ntl9) = 3,150 credits = 1,500 sec @ same GPU usage and power as 1ns
	-- 5ns (a3d) = 6,900 credits = 2,275 sec @ same GPU usage and power as 1ns
	IMO: a (1152CUDA GTX 1060) is on par with a (2048CUDA GTX 980) and ~20% faster than a (1664CUDA GTX 970). The (1920CUDA GTX 1070) is as fast as, if not ~5% faster than, a (2816CUDA GTX 980ti).
	CASP WUs on (2) GTX 970 at 1.5GHz are 2.5~2.7x slower with PCIe 2.0 x1 compared to PCIe 3.0 x4. WUs require PCIe 3.0 x8 for proper OC scaling.
	-- 1ns: 900 sec vs. 350 sec
	-- 5ns: 4770 sec vs. 1761 sec
	-- PCIe 2.0 x1: 46% GPU / 7% MCU / 80% BUS usage / 75W GPU power
	-- PCIe 3.0 x4: 57% GPU / 12% MCU / 40% BUS / 107W
	I haven't noticed any difference in GPU usage between runs with contacts and runs without, though CPU usage is 5% higher, at ~15% per WU, when the contacts are included.
	For reference I'm seeing ~81% GPU utilization on a GTX970 on Linux (Ubuntu x64 16.04 LTS) crunching a CASP22SnoS_crystal_ss_contacts task. PCIE Bandwidth Utilization is ~17% (PCIE2 x16). It's a low-spec system but I'm not using the CPU for anything else. When I do, the GPU's performance drops off. When I run a certain mt CPU app on another system (W10) the GPU utilization drops to ~15%
	I have >10% CPU usage on each Pascal WU, mostly around 25% average (4C/4T Haswell S series) crunching (2) WUs. When shooting for the most efficient runtimes it helps to have a CPU clock speed above 3GHz (preferably >3.5GHz). Every GTX 1070 host faster than mine has an overclocked 'K' series CPU, even though my Pascals are at 2.1GHz (1.5GHz on Maxwell).
	WDDM performance degradation versus Linux or XP is similar to PCIe width affecting runtimes. PCIe3.0 x4 runtimes can be ~10% slower if PCIe 3.0 x8 is not an option. (Maybe the AMD Zen platform will have more than the 16/28/40 CPU PCIe3.0 lanes Intel currently offers.)
	PCIe 2.0 has an overhead of 20% (8b/10b line-code encoding) while PCIe3.0 uses 128b/130b. In reality PCIe2.0 has a maximum available bandwidth of 80%. PCIe3.0 provides 98.4% available bandwidth.
	Intel's DMI link on Haswell and Ivy Bridge, (4) bi-directional lanes at 1GB/s per lane, is supposed to be faster than AMD's 500MB/s(?) per lane. Skylake doubled the bandwidth with (4) lanes at 2GB/s per lane. Maybe during MT CPU compute the DMI link became a bottleneck, causing dramatic GPU utilization loss? Or PCIe flooded out completely.
	ID: 45118 \| Rating: 0 \| rate: / Reply Quote
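
The encoding overhead mentioned in the post above translates into per-lane bandwidth as follows. A rough sketch using the published line rates (5 GT/s for PCIe 2.0, 8 GT/s for PCIe 3.0) and counting line-code overhead only; packet and protocol overhead are ignored, so real throughput is lower. The table and function names are illustrative:

```python
# generation -> (line rate in GT/s, payload bits, total bits per encoded block)
PCIE_GENERATIONS = {
    "2.0": (5.0, 8, 10),     # 8b/10b encoding
    "3.0": (8.0, 128, 130),  # 128b/130b encoding
}


def lane_bandwidth_mb_s(gen: str) -> float:
    """Usable bandwidth of one lane, one direction, in MB/s (line-code overhead only)."""
    rate_gt_s, payload_bits, total_bits = PCIE_GENERATIONS[gen]
    usable_mbit_s = rate_gt_s * 1000 * payload_bits / total_bits
    return usable_mbit_s / 8


def link_bandwidth_mb_s(gen: str, lanes: int) -> float:
    """Usable bandwidth of a whole link, one direction, in MB/s."""
    return lane_bandwidth_mb_s(gen) * lanes


for gen, lanes in [("2.0", 1), ("2.0", 16), ("3.0", 4), ("3.0", 8), ("3.0", 16)]:
    print(f"PCIe {gen} x{lanes}: {link_bandwidth_mb_s(gen, lanes):,.0f} MB/s")
```

By this estimate a PCIe 2.0 x16 link and a PCIe 3.0 x8 link land within about 2% of each other, while PCIe 2.0 x1 offers roughly an eighth of PCIe 3.0 x4.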

Retvari Zoltan Send message Joined: 20 Jan 09 Posts: 2343 Credit: 16,201,255,749 RAC: 386 Level Scientific publications	Message 45121 - Posted: 2 Nov 2016 \| 19:37:40 UTC
	Since the 9.14 app is CUDA 8.0, and there are a couple of CUDA 8.0 drivers for Windows XP, I've installed my GTX 1080 under Windows XP x64 with the latest XP driver available for the GTX 960 (368.81), but the 9.14 app did not work with this setup. It said:
	Task blablabla exited with zero status but no 'finished' file
	If this happens repeatedly you may need to reset the project.
	But the task did not run into an error, so these two lines repeated infinitely until I suspended the task. When I booted this host to Windows 10 the task resumed normally. So now I really have to learn to install a Linux based BOINC host.
	ID: 45121 \| Rating: 0 \| rate: / Reply Quote

Retvari Zoltan Send message Joined: 20 Jan 09 Posts: 2343 Credit: 16,201,255,749 RAC: 386 Level Scientific publications	Message 45128 - Posted: 2 Nov 2016 \| 23:44:48 UTC Last modified: 2 Nov 2016 \| 23:45:47 UTC
	Is there anyone who is using a GTX 1080 or TITAN X (Pascal) under Linux with swan_sync on? Does the new (9.14) Linux app support swan_sync?
	ID: 45128 \| Rating: 0 \| rate: / Reply Quote

3de64piB5uZAS6SUNt1GFDU9d... Send message Joined: 20 Apr 15 Posts: 285 Credit: 1,102,216,607 RAC: 0 Level Scientific publications	Message 45138 - Posted: 3 Nov 2016 \| 15:08:22 UTC - in response to Message 45118. Last modified: 3 Nov 2016 \| 15:09:06 UTC
	WDDM performance degradation versus Linux or XP is similar to PCIe width affecting runtimes. PCIe3.0 x4 runtimes can be ~10% slower if PCIe 3.0 x8 is not an option. (Maybe the AMD Zen platform will have more than the 16/28/40 CPU PCIe3.0 lanes Intel currently offers.)
	I admit that I haven't looked into the matter yet. Is there any way to bypass the WDDM degradation? It is somewhat frustrating to see a 1080 performing worse than a 1070 or 980ti just because of low utilization. Actually I don't get more than 75% load on my 1080.
	____________ I would love to see HCF1 protein folding and interaction simulations to help my little boy... someday.
	ID: 45138 \| Rating: 0 \| rate: / Reply Quote

eXaPower Send message Joined: 25 Sep 13 Posts: 293 Credit: 1,897,601,978 RAC: 0 Level Scientific publications	Message 45139 - Posted: 3 Nov 2016 \| 16:01:12 UTC - in response to Message 45138.
	WDDM performance degradation versus Linux or XP is similar to PCIe width affecting runtimes. PCIe3.0 x4 runtimes can be ~10% slower if PCIe 3.0 x8 is not an option. (Maybe the AMD Zen platform will have more than the 16/28/40 CPU PCIe3.0 lanes Intel currently offers.)
	I admit that I haven't looked into the matter yet. Is there any way to bypass the WDDM degradation? It is somewhat frustrating to see a 1080 performing worse than a 1070 or 980ti just because of low utilization. Actually I don't get more than 75% load on my 1080.
	Yes, by moving to Linux. There are a few remedies on WDDM OSes that help gain GPU utilization:
	- Enable SWAN_SYNC (I don't use this).
	- Have a CPU above 3.5GHz with a GPU + CPU PCIe3.0 x16 connection. (Single GPU set-ups seem to be faster than a system with 2 or 3 of the same CPU and GPUs.)
	- Or compute 2 tasks at a time, with 30 to 50% longer runtime than a single task at a time.
	PCIe3.0 x8 is the bare minimum for overclock scaling. A GTX 1060 and above with PCIe3.0 x4 will encounter a 4~12% performance drop off from x8 depending on the type of WU.
	ID: 45139 \| Rating: 0 \| rate: / Reply Quote
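
Whether running 2 tasks at a time pays off depends on how much each task slows down. A minimal sketch of that trade-off, assuming both tasks slow down by the same factor (the function name is only illustrative):

```python
def throughput_gain(tasks_at_once: int, runtime_factor: float) -> float:
    """Relative change in work completed per day versus one task at a time.

    runtime_factor is how much longer each task takes when sharing the GPU
    (1.3 means 30% longer). A result of 0.54 means 54% more work per day.
    """
    return tasks_at_once / runtime_factor - 1.0


for factor in (1.3, 1.5, 2.0):
    print(f"2 tasks, each {factor:.1f}x as long: {throughput_gain(2, factor):+.0%} throughput")
```

With the 30 to 50% longer runtimes quoted above, two tasks at a time would give roughly a third to a half more throughput; if the runtimes fully double, there is no gain.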

3de64piB5uZAS6SUNt1GFDU9d... Send message Joined: 20 Apr 15 Posts: 285 Credit: 1,102,216,607 RAC: 0 Level Scientific publications	Message 45154 - Posted: 4 Nov 2016 \| 8:08:37 UTC Last modified: 4 Nov 2016 \| 8:30:22 UTC
	Thanks... to my mind the config cannot be the bottleneck. I run 2 Tasks with 1 virtual CPU core per task [1CPU/0.5GPU] and utilize an i7-3770S, which should be fast enough. But now that you mention it, I still have an old 1155 board in my primary PC that is PCIe2.0 only, and therefore the GTX 1080 is linked by PCIe2.0x16, which is equal to PCIe3.0x8 in terms of throughput. Can this be the reason? My other system is less affected, maybe because the board is a new 1151 and the GTX 1070 GPU therefore utilizes PCIe3.0x16. The CPU is also a little faster (6700K) but I guess it has nothing to do with it.
	Besides, it would be interesting to know how the task algorithms work. If large data blocks are transferred, the bandwidth doesn't really matter (provided that the work time of the GPU per subtask is long enough). You would see some downward peaks in the GPU utilization occasionally. But if the CPU/memory and GPU are permanently exchanging little data slices, it could explain why limited bus bandwidth would slow down the entire performance. If so, maybe something can be done on the code side in order to keep the Pascal (and the upcoming Volta) type GPU busy.
	____________ I would love to see HCF1 protein folding and interaction simulations to help my little boy... someday.
	ID: 45154 \| Rating: 0 \| rate: / Reply Quote

skgiven Volunteer moderator Volunteer tester Send message Joined: 23 Apr 09 Posts: 3968 Credit: 1,995,359,260 RAC: 0 Level Scientific publications	Message 45156 - Posted: 4 Nov 2016 \| 9:44:18 UTC - in response to Message 45154.
	Thanks... to my mind the config cannot be the bottleneck. I run 2 Tasks with 1 virtual CPU core per task [1CPU/0.5GPU] and utilize an i7-3770S, which should be fast enough. But now that you mention it, I still have an old 1155 board in my primary PC that is PCIe2.0 only, and therefore the GTX 1080 is linked by PCIe2.0x16, which is equal to PCIe3.0x8 in terms of throughput. Can this be the reason?
	The i7-3770S is PCIE3 x16 capable, but I guess there could be some LGA 1155 motherboards that are PCIE2 only. Which motherboard model do you have? If it is PCIE2, that could be the issue or one of the main issues. You could probably get a replacement PCIE3 capable motherboard if that's the case.
	If you crunch using your integrated HD Graphics 4000 GPU, that would impact on the GTX1080's performance, as would crunching lots of CPU projects. Basically, for optimal performance for GPUGrid (especially for such a high end GPU) you want to be crunching for as few CPU projects as possible. MT apps are a no-no, and running apps in a VM can bog the system down.
	CPU speed and RAM speed also have an impact, but while there are faster processors, that CPU isn't bad and it does have a PCIE3 controller on board (but probably just isn't using it). HT off and SWAN_SYNC might help a little too.
	____________ FAQ's HOW TO: - Opt out of Beta Tests - Ask for Help
	ID: 45156 \| Rating: 0 \| rate: / Reply Quote

3de64piB5uZAS6SUNt1GFDU9d... Send message Joined: 20 Apr 15 Posts: 285 Credit: 1,102,216,607 RAC: 0 Level Scientific publications	Message 45158 - Posted: 4 Nov 2016 \| 10:13:53 UTC
	Thanks ... the mainboard is an ASUS P8P67-M (socket 1155) and definitely PCIe2.0 only. Yes, the CPU does support 3.0 but the board doesn't, and so it could be part of the issue, as you wrote. No, I don't use the iGPU or a VM in the background. So I guess I should simply upgrade my system components. Frankly, I am waiting for AMD Zen or Intel Cannonlake in order to get noticeable extra speed. Upgrading from Ivy Bridge to Kaby Lake now does not make much sense to me, maybe aside from DDR4 memory, which could be some advantage.
	____________ I would love to see HCF1 protein folding and interaction simulations to help my little boy... someday.
	ID: 45158 \| Rating: 0 \| rate: / Reply Quote

skgiven Volunteer moderator Volunteer tester Send message Joined: 23 Apr 09 Posts: 3968 Credit: 1,995,359,260 RAC: 0 Level Scientific publications	Message 45164 - Posted: 4 Nov 2016 \| 11:47:53 UTC - in response to Message 45158.
	If I were you I would swap the 1070 with the 1080 or try to pick up a second hand PCIE3 1155 motherboard for now. Zen will facilitate more PCIE3 lanes but won't be out for a while yet. A good upgrade time will be when the GTX1080Ti and Zen arrive and are available in sufficient quantities, and with competition for prices to be reasonable. That might be in 6 to 9 months time, but possibly more depending on the competition.
	Noticed all the 1080's and 1070's are reporting 4GB graphics memory. IIRC it's not an issue.
	____________ FAQ's HOW TO: - Opt out of Beta Tests - Ask for Help
	ID: 45164 \| Rating: 0 \| rate: / Reply Quote

3de64piB5uZAS6SUNt1GFDU9d... Send message Joined: 20 Apr 15 Posts: 285 Credit: 1,102,216,607 RAC: 0 Level Scientific publications	Message 45165 - Posted: 4 Nov 2016 \| 12:06:32 UTC - in response to Message 45164.
	If I were you I would swap the 1070 with the 1080 or try to pick up a second hand PCIE3 1155 motherboard for now.
	I have already considered that, but sometimes changing the motherboard will also lead to different SATA controllers and drivers, and therefore Windows will no longer boot. Which means, well, it is surely possible but not that easy.
	Noticed all the 1080's and 1070's are reporting 4GB graphics memory. IIRC it's not an issue.
	Yes, I have noticed that as well. Does this affect the performance in any way?
	____________ I would love to see HCF1 protein folding and interaction simulations to help my little boy... someday.
	ID: 45165 \| Rating: 0 \| rate: / Reply Quote

skgiven Volunteer moderator Volunteer tester Send message Joined: 23 Apr 09 Posts: 3968 Credit: 1,995,359,260 RAC: 0 Level Scientific publications	Message 45167 - Posted: 4 Nov 2016 \| 13:04:14 UTC - in response to Message 45165.
	Noticed all the 1080's and 1070's are reporting 4GB graphics memory. IIRC it's not an issue. yes, I have noticed that as well. Does this affect the performance in any way? I think it's Boinc that reports this and the app reads the details directly from the hardware/system itself. So it wouldn't impact upon performance. Most tasks tend to use less than 1GB of GDDR and the most I can recall is around 1.7GB. ____________ FAQ's HOW TO: - Opt out of Beta Tests - Ask for Help
	ID: 45167 \| Rating: 0 \| rate: / Reply Quote

3de64piB5uZAS6SUNt1GFDU9d... Send message Joined: 20 Apr 15 Posts: 285 Credit: 1,102,216,607 RAC: 0 Level Scientific publications	Message 45168 - Posted: 4 Nov 2016 \| 13:29:28 UTC - in response to Message 45164. Last modified: 4 Nov 2016 \| 13:30:57 UTC
	A good upgrade time will be when the GTX1080Ti and Zen arrives and are available in sufficient quantities and with competition for prices to be reasonable. That might be in 6 to 9 months time, but possibly more depending on the competition. yes... and not to forget the AMD RX490. If this one performs well, which I hope, it will have positive influence on Nvidia pricing in general. Which means: Prices DOWN ;-) ____________ I would love to see HCF1 protein folding and interaction simulations to help my little boy... someday.
	ID: 45168 \| Rating: 0 \| rate: / Reply Quote

skgiven Volunteer moderator Volunteer tester Send message Joined: 23 Apr 09 Posts: 3968 Credit: 1,995,359,260 RAC: 0 Level Scientific publications	Message 45169 - Posted: 4 Nov 2016 \| 13:29:57 UTC - in response to Message 45167.
	Pulled my GTX970 from my Ubuntu x64 16.04 LTS system and replaced it with a GTX1060-3GB. Comparing two tasks; one which ran on the 970 and the second which ran on the 1060-3GB I can say on my setup that the 1060-3GB is ~ 3% faster than the 970 and uses ~9% more CPU. Obviously this is a comparison of only two tasks, but they are similar task types and give the same amount of credit: e47s4_e35s2p0f0-SDOERR_CASP10_crystal_ss_5ns_ntl9_1-0-1-RND0325_1 : 1,454.48 804.24 3,150.00 v9.14 (cuda80) e25s5_e20s7p0f45-SDOERR_CASP22S_crystal_ss_5ns_ntl9_2-0-1-RND0908_0 : 1,498.21 736.19 3,150.00 v8.48 (cuda65) When watching the runs, CPU usage, GPU usage and PCIE usage all looked about the same for each GPU. I expect the Pascal uses less power to do the same work, but I haven't measured the power draw just yet... I think a few other people have reported increased CPU usage on the Pascal's. If CPU is increased with the CUDA8.0 app/Pascal cards it's probably going to be more noticeable with larger GPU's. Certainly something to take account of when building a system for GPU crunching. ____________ FAQ's HOW TO: - Opt out of Beta Tests - Ask for Help
	ID: 45169 \| Rating: 0 \| rate: / Reply Quote

3de64piB5uZAS6SUNt1GFDU9d... Send message Joined: 20 Apr 15 Posts: 285 Credit: 1,102,216,607 RAC: 0 Level Scientific publications	Message 45170 - Posted: 4 Nov 2016 \| 13:37:57 UTC - in response to Message 45169.
	Comparing two tasks; one which ran on the 970 and the second which ran on the 1060-3GB I can say on my setup that the 1060-3GB is ~ 3% faster than the 970 and uses ~9% more CPU. Great... which means that comparing cards by the SP GFLOPS figures from the specifications works for this kind of job, more or less. What is the average GPU usage of the 1060? ____________ I would love to see HCF1 protein folding and interaction simulations to help my little boy... someday.
	ID: 45170 \| Rating: 0 \| rate: / Reply Quote

skgiven Volunteer moderator Volunteer tester Send message Joined: 23 Apr 09 Posts: 3968 Credit: 1,995,359,260 RAC: 0 Level Scientific publications	Message 45175 - Posted: 4 Nov 2016 \| 15:12:50 UTC - in response to Message 45170. Last modified: 4 Nov 2016 \| 15:39:55 UTC
	The GPU usage is ~78% - similar, but the utilization is spiky and might vary during the run. When I had the 970 in I used NVidia X Server Settings to observe the GPU utilization. However, I've just observed that keeping the NV X Server Settings window open increases the apparent GPU utilization: when I watch the graphics % using Psensor, there is an approximate 10% difference in GPU utilization between having the X Server Settings window maximized and minimized. CPU usage is also ~65% when the NV X Server Settings window is open and ~22% when it's minimized. So I would conclude that threads to both the CPU and GPU are kept live when X Server Settings isn't minimized. Power usage at the wall doesn't change. That sort of throws a spanner in the works for some of my observations! - update - leaving the Boinc Manager window open has the same effect (increased GPU utilization). I suspect using SWAN_SYNC or a nice value might improve performance a bit, as would running two tasks. My GPU clock is supposed to be 1911MHz but it's 1904MHz (a bit shy of 2000 but not far off). The memory is at 7604MHz. My GTX1060-3GB power usage is about 75W, so it's about 45% more efficient than a GTX970 for here: the system idles at 38W at the wall. With Boinc maximized but only running some nci apps the system's power usage is 50W. When I start to crunch on the GPU the system uses ~125W, so the GPU is using ~75W. Of note is that my 1060 only has one 6-pin power connector (which only delivers up to ~75W)! So perhaps I'm being power capped? It's a small PCIe 2.0 x16 motherboard, and although there's a 12-pin ATX power connector I wouldn't be surprised if the GPU isn't drawing power from the PCIE slot. I had to re-enter cool-bits=4 & reboot to allow me to alter the fan speed and manually control the GPU's temp, but before I did this the GPU temps rose to 63C and the fan didn't go over 50%. Now I'm running the fans at 60% and the GPU is ~58C. The fans aren't audible over the system's case fan.
____________ FAQ's HOW TO: - Opt out of Beta Tests - Ask for Help
	ID: 45175 \| Rating: 0 \| rate: / Reply Quote

skgiven Volunteer moderator Volunteer tester Send message Joined: 23 Apr 09 Posts: 3968 Credit: 1,995,359,260 RAC: 0 Level Scientific publications	Message 45177 - Posted: 4 Nov 2016 \| 15:53:25 UTC - in response to Message 45169. Last modified: 4 Nov 2016 \| 16:01:59 UTC
	Pulled my GTX970 from my Ubuntu x64 16.04 LTS system and replaced it with a GTX1060-3GB. Comparing two tasks; one which ran on the 970 and the second which ran on the 1060-3GB I can say on my setup that the 1060-3GB is ~ 3% faster than the 970 and uses ~9% more CPU. Obviously this is a comparison of only two tasks, but they are similar task types and give the same amount of credit: e47s4_e35s2p0f0-SDOERR_CASP10_crystal_ss_5ns_ntl9_1-0-1-RND0325_1 : 1,454.48 804.24 3,150.00 v9.14 (cuda80) e25s5_e20s7p0f45-SDOERR_CASP22S_crystal_ss_5ns_ntl9_2-0-1-RND0908_0 : 1,498.21 736.19 3,150.00 v8.48 (cuda65) When watching the runs, CPU usage, GPU usage and PCIE usage all looked about the same for each GPU. I expect the Pascal uses less power to do the same work, but I haven't measured the power draw just yet... I think a few other people have reported increased CPU usage on the Pascal's. If CPU is increased with the CUDA8.0 app/Pascal cards it's probably going to be more noticeable with larger GPU's. Certainly something to take account of when building a system for GPU crunching. The difference is more noticeable with the 20ns tasks: e35s7_e32s8p0f82-SDOERR_CASP10_crystal_ss_20ns_ntl9_2-0-1-RND9465_0 : 5,365.27 3,215.99 12,750.00 v9.14 (cuda80) e14s4_e9s6p0f34-SDOERR_CASP22S_crystal_ss_20ns_ntl9_0-0-1-RND4064_0 : 5,946.69 2,911.63 12,750.00 (cuda65) e15s3_e14s5p0f90-SDOERR_CASP22S_crystal_contacts_20ns_ntl9_2-0-1-RND8066_0 : 5,966.27 2,947.58 12,750.00 v8.48 (cuda65) In this case the 1060-3GB is ~10% faster than the GTX970 and the CPU usage is also around 10% greater. As most tasks at GPUGrid tend to be longer, 10% might reflect the difference between the cards more accurately than the short tasks do; those spend as much time loading but less time running. So ~10% faster for ~45% less energy: ~60% better in terms of performance/Watt. ____________ FAQ's HOW TO: - Opt out of Beta Tests - Ask for Help
	ID: 45177 \| Rating: 0 \| rate: / Reply Quote

Jim1348 Send message Joined: 28 Jul 12 Posts: 819 Credit: 1,591,285,971 RAC: 0 Level Scientific publications	Message 45178 - Posted: 4 Nov 2016 \| 16:48:57 UTC - in response to Message 45177.
	Thanks. You have saved me the trouble. That is a nice improvement for the GTX 1060, but not enough to buy a new card. I will leave my GTX 970 on GPUGrid, and my 1060 on Folding, where it gets as much improvement, if not a little more, due to the Quick Return Bonus.
	ID: 45178 \| Rating: 0 \| rate: / Reply Quote

eXaPower Send message Joined: 25 Sep 13 Posts: 293 Credit: 1,897,601,978 RAC: 0 Level Scientific publications	Message 45182 - Posted: 4 Nov 2016 \| 20:03:17 UTC - in response to Message 45177.
	In this case the 1060-3GB is ~10% faster than the GTX970 and the CPU usage is also around 10% greater. As most tasks at GPUGrid tend to be longer, 10% might more accurately reflect the differences between the cards than the short tasks; which spend as much time loading but less time running. So ~10% faster for ~45% less energy ~60% better in terms of performance/Watt I think on WDDM systems the 3GB GTX 1060 is ~20% faster than a GTX 970 - at least this is what I've observed with a PCIe 3.0 x4 GTX 970 at 1.5GHz and a GTX 1060 at 2.1GHz. The 1152 CUDA core card is a great cruncher for here, and the GTX 1070 even more so from a purely watt/performance point of view. As you mentioned in another thread: the GTX 1060 (3GB) is hands down the cost/performance king. IMO both the GTX 1060 and GTX 1070 are going to be the most efficient GPUGRID GPUs until a Pascal refresh or Volta, with ACEMD scaling a major factor (maybe someday the app will make the GTX 1080 work at 95% on WDDM). My GTX 1070 hasn't risen past 110W (80% GPU usage) while staying mostly under 100W. My (2) GTX 970s would hit 170W on some GERALD's with 86% GPU usage. When I start to crunch on the GPU the system uses ~125W, so the GPU is using ~75W. Of note is that my 1060 only has one 6-pin power connector (which only delivers up to ~75W)! So perhaps I'm being power capped? It's a small PCIe 2.0 x16 motherboard, and although there's a 12-pin ATX power connector I wouldn't be surprised if the GPU isn't drawing power from the PCIE slot. A true MiniFit.JR 6-pin connector (not the type that is missing a 12V pin, like a 4-pin molex to 6-pin adapter) can provide more than 75W. Check the PSU wire gauge to determine its amperage limit and you'll find out what the (3) 12V PCIe 6-pin wires are capable of. The Tomshardware website has detailed power consumption tests showing how each card draws its power.
With some vBIOS software from AIBs (Zotac / MSI / Gigabyte / some EVGA) the card draws almost all of its power from the PSU, with <25W coming from the PCIe slot, which feeds up to 3 phases (though mostly 1 or 2) on the GPU board. If you have a laser thermometer, or use the simple old-fashioned skin method, check the PCIe capacitors. If the PCIe slot is providing most of the power (66W) they'll be hot; if they are barely warm then the 6-pin is the main provider. My 4+1 phase Gigabyte windforce OC GTX 1060 (3GB) gets the majority of its power from the PSU through the 6-pin at a 116% power limit (140W); the Primegrid Genefer program and the si software scientist benchmark max out its power. Quoted from the xdev.com Pascal OC guide (link in the x80 Pascal thread): 13A per contact (16AWG wire in small connector) to 8.5A/contact (18AWG wire). This means that using common 18AWG cable, 6-pin connector specified for 17A of current (3 contacts for +12V power, 2 contacts for GND return, one contact for detect). 8-pin have 25.5A current specification (3 contacts for +12V power, 3 contacts for GND return and 2 contacts for detection). 6-pin is 204W at +12.0V level or 306W for 8-pin accordingly.
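The wattage figures in the quoted guide can be reproduced with a few lines of Python. This is only a sketch: it assumes the usable current is limited by whichever side of the connector (+12V supply or GND return) has fewer contacts, which is one reading of the 17A and 25.5A figures rather than something stated in the quote. The function and parameter names are made up for illustration.

```python
def connector_power_watts(amps_per_contact, supply_contacts, return_contacts, volts=12.0):
    """Estimate connector capacity from the per-contact current rating.

    Assumption: current is capped by the side with fewer contacts.
    """
    limiting_contacts = min(supply_contacts, return_contacts)
    return amps_per_contact * limiting_contacts * volts


# 18AWG wire, 8.5A per contact, as in the quoted guide
six_pin = connector_power_watts(8.5, supply_contacts=3, return_contacts=2)
eight_pin = connector_power_watts(8.5, supply_contacts=3, return_contacts=3)
print(f"6-pin: {six_pin:.0f} W, 8-pin: {eight_pin:.0f} W")  # 6-pin: 204 W, 8-pin: 306 W
```

These are connector limits for a given wire gauge, well above the nominal 75W usually quoted for a 6-pin; what a particular card actually draws is a separate question.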
	ID: 45182 \| Rating: 0 \| rate: / Reply Quote

skgiven Volunteer moderator Volunteer tester Send message Joined: 23 Apr 09 Posts: 3968 Credit: 1,995,359,260 RAC: 0 Level Scientific publications	Message 45184 - Posted: 4 Nov 2016 \| 21:48:14 UTC - in response to Message 45178. Last modified: 4 Nov 2016 \| 22:29:14 UTC
	After completing a 50ns long SDOERR_CASP22SnoS task it's looking more like the GTX1060-3GB can do a Long task in 73% of the time a 970 can (though I'm not certain my settings were identical back on the 30th Oct when using the 970; I might have been running a CPU task then). If the setup was identical that would make the 1060-3GB 36% faster at long runs, but others would have to demonstrate that too before I'd accept it. I've got a Long PABLO SH2 now and should be able to compare that tomorrow to 3 similar tasks I ran a few days ago when I definitely had the same setup. Still the same +10% CPU usage. e9s8_e8s1p0f0-SDOERR_CASP22SnoS_crystal_contacts_50ns_ntl9_0-0-1-RND0969_0 : 14,398.54 7,695.54 63,750.00 v9.14 (cuda80) e16s9_e9s9p0f217-SDOERR_CASP10_crystal_ss_50ns_ntl9_0-0-1-RND0343_0 : 19,631.24 6,984.46 63,750.00 v8.48 (cuda65) The Long PABLO SH2 task is presently realizing around 93% GPU utilization with X Server Settings open and ~89% minimized, varying from 88% to 96% when the X Server Settings are open. PCIE bandwidth is ~28% and CPU usage is ~16%. The GPU heated up to 66C, so I increased the fan speed to ~2270RPM (70%), which brought the temp back down to 62C. Noticed that the GPU clock is 1879MHz, slightly lower than with the previous tasks. System power usage is also up to ~160W, so the GPU is drawing ~110W, i.e. ~35W more when running the PABLO's than the SDOERR's. So greater performance from the 1060 3GB while running longer tasks and greater utilization with some Long task types. ____________ FAQ's HOW TO: - Opt out of Beta Tests - Ask for Help
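For anyone checking the 73% and 36% figures, the two run times listed above are enough. A minimal sketch follows (the function name is made up for illustration). It assumes the two tasks are directly comparable, which, as noted in the post, is not certain: they are different work units and the host settings may have differed.

```python
def compare_runtimes(new_seconds, old_seconds):
    """Return (fraction of the old run time, percent faster) for the new card."""
    time_fraction = new_seconds / old_seconds
    percent_faster = (old_seconds / new_seconds - 1.0) * 100.0
    return time_fraction, percent_faster


# GTX1060-3GB (cuda80) vs GTX970 (cuda65) on the 50ns SDOERR tasks
fraction, faster = compare_runtimes(14398.54, 19631.24)
print(f"{fraction:.0%} of the time, {faster:.0f}% faster")  # 73% of the time, 36% faster
```

Note that "73% of the time" and "36% faster" are the same measurement expressed two ways, not two separate results.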
	ID: 45184 \| Rating: 0 \| rate: / Reply Quote

skgiven Volunteer moderator Volunteer tester Send message Joined: 23 Apr 09 Posts: 3968 Credit: 1,995,359,260 RAC: 0 Level Scientific publications	Message 45189 - Posted: 5 Nov 2016 \| 9:02:51 UTC - in response to Message 45184. Last modified: 5 Nov 2016 \| 10:00:11 UTC
	The PABLO_SH2TRIPEP took 3% longer on the 1060-3GB than it did on a 970, so there is a lot of performance variation. CPU usage was also 11% less when using the 1060: e16s27_e15s14p0f22-PABLO_SH2TRIPEP_L_TRI_2-0-1-RND3725_0 : 22,123.02 7,779.86 145,800.00 v9.14 (cuda80) e14s15_e12s4p0f72-PABLO_SH2TRIPEP_Q_TRI_1-0-1-RND5699_1 : 21,321.75 8,659.11 145,800.00 v8.48 (cuda65) e21s26_e15s3p0f391-PABLO_SH2TRIPEP_F_TRI_2-0-1-RND2465_0 : 21,323.73 8,596.28 145,800.00 v8.48 (cuda65) Not complaining about these PABLO tasks though; if an 1152 core GPU can get 569K/day it's not bad :) By comparison the 'shorter' Long SDOERR_CASP tasks only collect about 382K/day :\| ____________ FAQ's HOW TO: - Opt out of Beta Tests - Ask for Help
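The credit-per-day figures can be reproduced from the run time and credit columns. The sketch below assumes the GPU runs identical tasks back to back with no gaps, and that the 382K/day figure refers to the 50ns SDOERR_CASP22SnoS run on the 1060 listed earlier in the thread; the function name is made up for illustration.

```python
SECONDS_PER_DAY = 86_400


def credits_per_day(credit, runtime_seconds):
    """Credit earned per day if same-sized tasks ran back to back."""
    return credit / runtime_seconds * SECONDS_PER_DAY


pablo = credits_per_day(145_800.00, 22_123.02)   # PABLO_SH2TRIPEP on the 1060-3GB
sdoerr = credits_per_day(63_750.00, 14_398.54)   # 50ns SDOERR_CASP on the 1060-3GB
print(f"PABLO:  ~{int(pablo // 1000)}K/day")   # PABLO:  ~569K/day
print(f"SDOERR: ~{int(sdoerr // 1000)}K/day")  # SDOERR: ~382K/day
```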
	ID: 45189 \| Rating: 0 \| rate: / Reply Quote

eXaPower Send message Joined: 25 Sep 13 Posts: 293 Credit: 1,897,601,978 RAC: 0 Level Scientific publications	Message 45199 - Posted: 5 Nov 2016 \| 12:47:10 UTC - in response to Message 45189. Last modified: 5 Nov 2016 \| 12:48:41 UTC
	e9s8_e8s1p0f0-SDOERR_CASP22SnoS_crystal_contacts_50ns_ntl9_0-0-1-RND0969_0 : 14,398.54 7,695.54 63,750.00 v9.14 (cuda80) e16s9_e9s9p0f217-SDOERR_CASP10_crystal_ss_50ns_ntl9_0-0-1-RND0343_0 : 19,631.24 6,984.46 63,750.00 v8.48 (cuda65) -- GTX 1060 3GB @ 2.1GHz / 67% GPU usage / 51% BUS / 74W e10s5_e8s4p0f261-SDOERR_CASP22SnoS_crystal_ss_50ns_ntl9_1-0-1-RND6842_0 15,021.71 6,281.00 63,750.00 (cuda80) -- GTX 1070 @ 2.1GHz / 59% GPU usage / 37% BUS / 78W e5s9_e2s1p0f88-SDOERR_CASP22SnoS_crystal_ss_50ns_ntl9_1-0-1-RND2882_0 12,249.42 6,445.78 63,750.00 (cuda80) Your single GTX 1060 system is 4.21% faster than my GTX 1060 3GB. The higher PCIe bandwidth usage on my system is probably due to having 4 GPUs. My GTX 1070 (PCIe3 x8) is 19% faster than my GTX 1060 (PCIe3 x4). The PABLO_SH2TRIPEP took 3% longer on the 1060-3GB than it did on a 970, so there is a lot of performance variation. CPU usage was also 11% less when using the 1060: e16s27_e15s14p0f22-PABLO_SH2TRIPEP_L_TRI_2-0-1-RND3725_0 : 22,123.02 7,779.86 145,800.00 v9.14 (cuda80) e14s15_e12s4p0f72-PABLO_SH2TRIPEP_Q_TRI_1-0-1-RND5699_1 : 21,321.75 8,659.11 145,800.00 v8.48 (cuda65) e21s26_e15s3p0f391-PABLO_SH2TRIPEP_F_TRI_2-0-1-RND2465_0 : 21,323.73 8,596.28 145,800.00 v8.48 (cuda65) -- GTX 1070 @ 2.1GHz / 69% GPU / 51% BUS / 96W: e13s5_e5s7p0f442-PABLO_SH2TRIPEP_W_TRI_2-0-1-RND9211_0 11929451 16,843.90 6,162.14 145,800.00 (cuda80) -- GTX 1060 (3GB) @ 2.1GHz / 74% GPU / 60% BUS / 85W: e15s20_e14s21p0f117-PABLO_SH2TRIPEP_S_TRI_1-0-1-RND6936_0 23,441.73 6,269.66 145,800.00 Long runs (cuda80) On PABLO_SH2TRIPEP my GTX 1070 is 28.1% faster than my GTX 1060. Surprisingly, I haven't received any unstable simulation messages on WUs completed while overclocked at 2.1GHz.
	ID: 45199 \| Rating: 0 \| rate: / Reply Quote

skgiven Volunteer moderator Volunteer tester Send message Joined: 23 Apr 09 Posts: 3968 Credit: 1,995,359,260 RAC: 0 Level Scientific publications	Message 45209 - Posted: 5 Nov 2016 \| 16:07:44 UTC - in response to Message 45199.
	Thanks for posting your performances. Quick look at the differences between our systems: - You've a slower operating system (WDDM overhead, 11%+); Windows vs Linux - You've a faster CPU, i5-4440S @2.8GHz vs AMD A6-3500 @2.1GHz - You've a faster on-die PCIE controller - You've a PCIE3.0 bus vs my PCIE2.0 bus - Your heavier use of the PCIE3 bus likely restricts your performance more than my PCIE2.0 x16 restricts mine - Your 2.1GHz GPU clock is ~10% higher than my 1.9GHz. ____________ FAQ's HOW TO: - Opt out of Beta Tests - Ask for Help
	ID: 45209 \| Rating: 0 \| rate: / Reply Quote

fuzzydice555 Send message Joined: 3 Oct 16 Posts: 5 Credit: 125,975,707 RAC: 0 Level Scientific publications	Message 45215 - Posted: 5 Nov 2016 \| 18:31:06 UTC Last modified: 5 Nov 2016 \| 18:31:59 UTC
	GTX 1060 6GB: I had 88-92% utilization yesterday, now it's only 65%. I changed nothing in the system. - Windows 10 - Gigabyte B150 mobo - i5-6600, one core dedicated to GPU running at 3.6 GHz - PCIE3 x16 Power consumption is 72W average at 65%. I crunch WCG on this rig as well. When I enable WCG on the 3 other CPU cores, the GPU usage goes up to 75%. I have no idea why.
	ID: 45215 \| Rating: 0 \| rate: / Reply Quote

Retvari Zoltan Send message Joined: 20 Jan 09 Posts: 2343 Credit: 16,201,255,749 RAC: 386 Level Scientific publications	Message 45216 - Posted: 5 Nov 2016 \| 19:02:19 UTC - in response to Message 45215. Last modified: 5 Nov 2016 \| 19:04:26 UTC
	I had 88-92% utilization yesterday, now it's only 65%. I changed nothing in the system. It depends (besides the system) on the workunit. Yesterday you had an ADRIA_1JWP_dist, which uses the CPU less than your recent SDOERR_CASP22S20M_crystal_ss_contacts_50ns_ntl9 workunit. Power consumption is 72W average at 65%. I crunch WCG on this rig as well. When I enable WCG on the 3 other CPU cores, the GPU usage goes up to 75%. I have no idea why. It's because you didn't set the SWAN_SYNC environment variable; without it the GPUGrid app doesn't load a CPU thread heavily enough to make your CPU boost.
	ID: 45216 \| Rating: 0 \| rate: / Reply Quote

fuzzydice555 Send message Joined: 3 Oct 16 Posts: 5 Credit: 125,975,707 RAC: 0 Level Scientific publications	Message 45222 - Posted: 5 Nov 2016 \| 20:30:15 UTC - in response to Message 45216. Last modified: 5 Nov 2016 \| 20:30:40 UTC
	Thanks, SWAN_SYNC seems to help, now utilization is at 72% even if only GPUGRID is running. Moving to linux/win XP isn't possible, since this is my daily use PC. Would getting a faster CPU help? (i5 6600k/i7 6700k)
	ID: 45222 \| Rating: 0 \| rate: / Reply Quote

Retvari Zoltan Send message Joined: 20 Jan 09 Posts: 2343 Credit: 16,201,255,749 RAC: 386 Level Scientific publications	Message 45223 - Posted: 5 Nov 2016 \| 20:45:52 UTC - in response to Message 45222.
	Would getting a faster CPU help? (i5 6600k/i7 6700k) No, it won't.
	ID: 45223 \| Rating: 0 \| rate: / Reply Quote

Retvari Zoltan Send message Joined: 20 Jan 09 Posts: 2343 Credit: 16,201,255,749 RAC: 386 Level Scientific publications	Message 45224 - Posted: 5 Nov 2016 \| 21:31:23 UTC Last modified: 5 Nov 2016 \| 21:35:54 UTC
	If you want to maximize GPU usage on an operating system which has WDDM (Windows 7, 8, 8.1, 10) you should:
- crunch only 1 CPU task (or do not crunch CPU tasks at all)
- not crunch on the iGPU
- use the SWAN_SYNC environment variable to make the GPUGrid app use a full CPU thread
- use the app_config.xml to run two WUs on a single GPU (it will double the runtimes, so do it only if your runtimes are well below 12h)
- put your GPU in a PCIe3.0 x16 slot which really runs at x16 (you can check it with the GPU-Z tool)
And now the "how-to" part (TL;DR):
To crunch only 1 CPU task you should reduce the percentage of the CPUs available for BOINC, or the number of CPUs in the cc_config.xml. First you have to know how many CPU threads your PC has. To find out, use the CPU-Z tool, or start Task Manager (right click on an empty area of the taskbar and choose "Task Manager"). On the Performance tab you should see as many graphs on the "CPU usage history" panel as your PC has "logical processors" (Windows 10 also reports the number). If you see only 1 graph, you should switch the view. Then divide 100% by the number of "logical processors" (aka "threads") your PC has, multiply it by the number of GPU tasks you run plus 1, round it up to the nearest integer, and type the result into BOINC manager -> Options -> Computing preferences -> Use at most [...] % of the CPUs. The other field has to stay at 100% (use at most 100% of the CPU time). For example:
8 CPU threads + 2 GPU tasks: 100/8*(1+2) = 37.5 [38%]
12 CPU threads + 3 GPU tasks: 100/12*(1+3) = 33.333 [34%]
4 CPU threads + 2 GPU tasks: 100/4*(1+2) = 75 [75%]
Theoretically this calculation can result in more than 100% (2 CPU threads + 2 GPU tasks: 100/2*(1+2) = 150); in this case you should type 100% and not crunch CPU projects at all.
The other method is to set the number of CPUs in the cc_config.xml file. The number should be set to the number of GPU tasks + 1. Do not set this number higher than the number of your CPU's threads. For example, for 2 GPU tasks you should replace the 2 with 3 in the example below. Copy the following to the clipboard:
    notepad c:\ProgramData\BOINC\cc_config.xml
Press Windows key + R, then paste and press enter. If you see an empty file, copy and paste the following:
    <cc_config>
      <options>
        <ncpus>2</ncpus>
      </options>
    </cc_config>
If your cc_config.xml already has an <options> section and there is no <ncpus> tag in it, you should insert the line <ncpus>2</ncpus> right after the <options> tag. Click file -> save and click [save]. If your BOINC manager is running, click on Options -> read config files.
How not to crunch on the iGPU (the Intel GPU integrated into recent Intel CPUs):
1. Do not attach to projects with Intel (OpenCL) clients, or disable this application in the project's computing preferences (it is practical to use a different venue for these hosts).
2. Disable the iGPU in the cc_config.xml file. Copy the following to the clipboard:
    notepad c:\ProgramData\BOINC\cc_config.xml
Press Windows key + R, then paste and press enter. If you see an empty file, copy and paste the following text:
    <cc_config>
      <options>
        <ignore_intel_dev>0</ignore_intel_dev>
      </options>
    </cc_config>
If your cc_config.xml already has an <options> section and there is no <ignore_intel_dev> tag in it, you should insert the line <ignore_intel_dev>0</ignore_intel_dev> right after the <options> tag. Click file -> save and click [save]. If your BOINC manager is running, you can click Options -> read config files.
To apply the SWAN_SYNC environment variable:
Click Start, copy & paste systempropertiesadvanced and press enter. Click on [Environment Variables]. Look for the lower section called "System Variables" and click on the [New] button below the list of System Variables. Type SWAN_SYNC in the name field. Type 1 in the Value field. Click [OK] 3 times. Exit BOINC manager, stopping the scientific applications as well. Start BOINC manager.
To run two GPUGrid tasks on a single GPU:
The app_config.xml file should be placed in the project's home directory (by default it's at c:\ProgramData\BOINC\projects\www.gpugrid.net\). Copy the following to the clipboard:
    notepad c:\ProgramData\BOINC\projects\www.gpugrid.net\app_config.xml
Press Windows key + R, then paste and press enter. Copy & paste the following text:
    <app_config>
      <app>
        <name>acemdlong</name>
        <gpu_versions>
          <gpu_usage>0.5</gpu_usage>
          <cpu_usage>1.0</cpu_usage>
        </gpu_versions>
      </app>
      <app>
        <name>acemdshort</name>
        <gpu_versions>
          <gpu_usage>0.5</gpu_usage>
          <cpu_usage>1.0</cpu_usage>
        </gpu_versions>
      </app>
    </app_config>
Click file -> save and click [save]. Exit BOINC manager, stopping the scientific applications as well. Start BOINC manager. (If your BOINC manager is running, you can instead click Options -> read config files.)
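The "Use at most [...] % of the CPUs" calculation from the how-to can be written as a small Python helper. This is just a sketch of the rule described above (one CPU thread per GPU task plus one for a single CPU task, rounded up and capped at 100); the function name is made up for illustration.

```python
def boinc_cpu_percent(cpu_threads, gpu_tasks):
    """Percentage to enter in 'Use at most [...] % of the CPUs'."""
    needed_threads = gpu_tasks + 1  # one per GPU task, plus one CPU task
    percent = -(-100 * needed_threads // cpu_threads)  # integer ceiling division
    return min(percent, 100)  # the field cannot exceed 100%


for threads, tasks in [(8, 2), (12, 3), (4, 2), (2, 2)]:
    print(f"{threads} threads + {tasks} GPU tasks -> {boinc_cpu_percent(threads, tasks)}%")
```

Integer ceiling division is used instead of floating-point rounding so that exact results such as 75 are never nudged up to 76 by rounding error.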
	ID: 45224 \| Rating: 0 \| rate: / Reply Quote

fuzzydice555 Send message Joined: 3 Oct 16 Posts: 5 Credit: 125,975,707 RAC: 0 Level Scientific publications	Message 45225 - Posted: 5 Nov 2016 \| 22:59:15 UTC
	Thanks, I'll try these solutions!
	ID: 45225 \| Rating: 0 \| rate: / Reply Quote

Seba Send message Joined: 30 Oct 16 Posts: 6 Credit: 27,935,274 RAC: 0 Level Scientific publications	Message 45228 - Posted: 7 Nov 2016 \| 10:36:44 UTC Last modified: 7 Nov 2016 \| 10:52:05 UTC
	Thanks Retvari Zoltan, now my card works at 96-98% utilisation on Windows 10 (driver 375.70) with 2 tasks.
	ID: 45228 \| Rating: 0 \| rate: / Reply Quote

Retvari Zoltan Send message Joined: 20 Jan 09 Posts: 2343 Credit: 16,201,255,749 RAC: 386 Level Scientific publications	Message 45236 - Posted: 7 Nov 2016 \| 20:26:07 UTC
	I've successfully installed Ubuntu 16.04 LTS on one of my hosts. Could someone please enlighten me how to get the app to notice the SWAN_SYNC=1 setting? I'd appreciate it. I've put it in /etc/environment, and when I try printenv it shows SWAN_SYNC=1, but the app obviously does not take a full CPU thread. BOINC and the GPUGrid app run as user 'boinc' but I didn't find anything for this user in /home. Is this environment variable handled by the new (9.14) Linux app?
	ID: 45236 \| Rating: 0 \| rate: / Reply Quote

Retvari Zoltan Send message Joined: 20 Jan 09 Posts: 2343 Credit: 16,201,255,749 RAC: 386 Level Scientific publications	Message 45237 - Posted: 7 Nov 2016 \| 23:39:56 UTC - in response to Message 45236.
	Is this environment variable handled by the new (9.14) Linux app? To answer my own question: I think the new (9.14) Linux app doesn't support SWAN_SYNC=1, as I've started BOINC from the terminal with sudo /usr/bin/boinc --dir /var/lib/boinc-client and the CPU usage remained at 7-8% (it should be 25%). I had checked beforehand that SWAN_SYNC=1 is listed by sudo printenv. This feature should be added.
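One way to narrow this down on Linux is to look at the environment the running process actually received, rather than what printenv shows in a shell. The sketch below reads /proc/<pid>/environ, which holds the environment a process was started with. It only tells you whether SWAN_SYNC reached the process, not whether the 9.14 app acts on it; the function names are made up for illustration, and you would need to supply the PID of the running GPUGrid app yourself.

```python
from pathlib import Path


def parse_environ(raw):
    """Parse NUL-separated KEY=VALUE bytes, the format of /proc/<pid>/environ."""
    env = {}
    for entry in raw.split(b"\0"):
        if b"=" in entry:
            key, _, value = entry.partition(b"=")
            env[key.decode(errors="replace")] = value.decode(errors="replace")
    return env


def process_env(pid):
    """Read the start-up environment of a running process (Linux only)."""
    return parse_environ(Path(f"/proc/{pid}/environ").read_bytes())


# Demonstration on sample data; on a real host call process_env(<pid>) instead.
sample = b"PATH=/usr/bin\0SWAN_SYNC=1\0"
print(parse_environ(sample).get("SWAN_SYNC"))  # 1
```

Reading another user's /proc/<pid>/environ normally requires running the script as that user or as root.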
	ID: 45237 \| Rating: 0 \| rate: / Reply Quote

eXaPower Send message Joined: 25 Sep 13 Posts: 293 Credit: 1,897,601,978 RAC: 0 Level Scientific publications	Message 45269 - Posted: 15 Nov 2016 \| 18:14:03 UTC
	1fdq-SDOERR_OPMcharmm6-0-1-RND3215_1 is the longest WU I've encountered to date on my GTX 1070. Estimated completion 51,300 sec (14.28hr) at 54% GPU usage / 33% MCU / 24% BUS (PCIe 3.0 x8) / 45% GPU power (83W). I've noticed that if only my GTX 1070 is running, GPU usage is 3 to 6% higher on all WUs compared to 2/3/4 GPU Pascal or Maxwell compute.
	ID: 45269 \| Rating: 0 \| rate: / Reply Quote

skgiven Volunteer moderator Volunteer tester Send message Joined: 23 Apr 09 Posts: 3968 Credit: 1,995,359,260 RAC: 0 Level Scientific publications	Message 45273 - Posted: 15 Nov 2016 \| 22:33:29 UTC - in response to Message 45269.
	1fdq-SDOERR_OPMcharmm6-0-1-RND3215_1 is the longest WU I've encountered to date on my GTX 1070. Estimated completion 51,300 sec (14.28hr) at 54% GPU usage / 33% MCU / 24% BUS (PCIe 3.0 x8) / 45% GPU power (83W). I've noticed that if only my GTX 1070 is running, GPU usage is 3 to 6% higher on all WUs compared to 2/3/4 GPU Pascal or Maxwell compute. GPU clocks? ____________ FAQ's HOW TO: - Opt out of Beta Tests - Ask for Help
	ID: 45273 \| Rating: 0 \| rate: / Reply Quote

eXaPower Send message Joined: 25 Sep 13 Posts: 293 Credit: 1,897,601,978 RAC: 0 Level Scientific publications	Message 45275 - Posted: 15 Nov 2016 \| 23:11:46 UTC - in response to Message 45273. Last modified: 15 Nov 2016 \| 23:19:30 UTC
	1fdq-SDOERR_OPMcharmm6-0-1-RND3215_1 is the longest WU I've encountered to date on my GTX 1070. Estimated completion 51,300 sec (14.28hr) at 54% GPU usage / 33% MCU / 24% BUS (PCIe 3.0 x8) / 45% GPU power (83W). I've noticed that if only my GTX 1070 is running, GPU usage is 3 to 6% higher on all WUs compared to 2/3/4 GPU Pascal or Maxwell compute. GPU clocks? 2.1GHz core and 3.8GHz (7.6GHz) memory - 2012MHz out of the box boost. My Pascal throttles in 12.5MHz increments every 8C starting at 32C - I set a +110MHz offset to keep a constant 2.1GHz.
	ID: 45275 \| Rating: 0 \| rate: / Reply Quote

Seba Send message Joined: 30 Oct 16 Posts: 6 Credit: 27,935,274 RAC: 0 Level Scientific publications	Message 45322 - Posted: 18 Nov 2016 \| 19:57:46 UTC
	Does anyone know how to force GPUGRID to work with two different cards: a Pascal (cuda 80) and a GTX 670 (cuda 65)? When I put the new card into the computer my old card stopped working with GPUGRID. Do you know how to solve this problem? Many thanks!
	ID: 45322 \| Rating: 0 \| rate: / Reply Quote

skgiven Volunteer moderator Volunteer tester Send message Joined: 23 Apr 09 Posts: 3968 Credit: 1,995,359,260 RAC: 0 Level Scientific publications	Message 45323 - Posted: 18 Nov 2016 \| 20:23:25 UTC - in response to Message 45322.
	Does anyone know how to force GPUGRID to work with two different cards: a Pascal (cuda 80) and a GTX 670 (cuda 65)? When I put the new card into the computer my old card stopped working with GPUGRID. Do you know how to solve this problem? Many thanks! Basically no: either the app sorts that out, or there are two different queues and you can manipulate your Boinc config files to do what you want. At present the cuda80 app is exclusively for Pascal's and the cuda65 app doesn't work for Pascal's. The cuda80 app has also populated all queues - which is fine for most people's setups. If possible, move one of the GPUs to another system. In theory you could have two instances of Boinc with different drive locations and exclude one GPU for each instance, but in practice running two instances of Boinc just doesn't work. ____________ FAQ's HOW TO: - Opt out of Beta Tests - Ask for Help
	ID: 45323 \| Rating: 0 \| rate: / Reply Quote