
Message boards : Graphics cards (GPUs) : PCI-e 16x PCI-e 4x

3g555PSfYPxkNiQmmBd7MULS7...
Joined: 2 May 17
Posts: 2
Credit: 107,269,450
RAC: 0
Message 47472 - Posted: 18 Jun 2017 | 15:42:53 UTC

Hi,
I am in search of possible answers, as I have just started GPU crunching.
I have two GTX 1050 Ti 3 GB cards on a Gigabyte AB350M-HD3 motherboard with an AMD Ryzen 5 1400 CPU on Windows 10.
I have noticed that when processing a WU (one CPU core per WU), the core feeding the GPU ranges 70-80% usage.

The first GPU gets hot, and its GPU usage is at 90% most of the time.
This GPU is connected to the first PCIe slot, which runs at PCIe 3.0 x16,
with bus interface load averaging 30%.
The second card does not seem to get as hot as the first, though its GPU usage is also at 90% most of the time.
This GPU is connected to the second PCIe x16 slot, which runs at PCIe 2.0 x4,
with bus interface load averaging 30%.

Now, since both cards have GPU usage around 90% and the bus interface load averages 30%, shouldn't both cards heat up the same? For info, the second card takes a few hours longer than the first to complete a WU.

Also, since it is evident that there is no bottleneck on the bus interface, does this mean the cards perform the same for this type of WU even though the bandwidth between GPU and CPU is narrower?

Now, the big question: since both cards show the same GPU usage and there is no bottleneck, shouldn't the WUs finish at the same time as well?

I might not be looking at the right information in GPU-Z.

If anyone out there can guide me, I would much appreciate it.
Thanks.

Retvari Zoltan
Joined: 20 Jan 09
Posts: 2343
Credit: 16,201,255,749
RAC: 6,169
Message 47476 - Posted: 18 Jun 2017 | 21:56:15 UTC - in response to Message 47472.
Last modified: 18 Jun 2017 | 22:02:02 UTC

I have two GTX 1050 Ti 3 GB cards on a Gigabyte AB350M-HD3 motherboard with an AMD Ryzen 5 1400 CPU on Windows 10.
I think you are referring to host 430643. This host now has only one GTX 1050 Ti, but I've checked your previous tasks, and it had two GPUs before: one GTX 1050 Ti and one GTX 1050 (without the "Ti"; see tasks 16353457 and 16355260). These are two different GPUs; the main difference is the number of CUDA cores, 768 vs 640, which explains the different computing times you've experienced. But there's more.
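As a rough back-of-the-envelope (my arithmetic, assuming similar clocks on both cards, which is approximately true here), runtime scales inversely with CUDA core count:

```python
# Back-of-the-envelope: with similar clocks, runtime scales
# inversely with CUDA core count (768 on the 1050 Ti, 640 on the 1050).
cores_ti, cores_plain = 768, 640

slowdown = cores_ti / cores_plain
print(f"GTX 1050 expected to take ~{(slowdown - 1) * 100:.0f}% longer")  # ~20% longer
```

Twenty percent of a long GPUGrid task is indeed "a few additional hours".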

I have noticed that when processing a WU (one CPU core per WU), the core feeding the GPU ranges 70-80% usage.
GPU usage is misleading; if you want a better estimate, you should look at the GPU power measurement.
Different GPU models have different power consumption (and TDP).
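For example, nvidia-smi's CSV query interface can report power draw alongside utilization. A minimal sketch (not part of any BOINC tool; the sample figures below are made up for illustration):

```python
import subprocess

def read_gpu_power(sample=None):
    """Return (name, power_draw_W, utilization_%) per GPU.

    Parses CSV output from nvidia-smi; pass `sample` to parse
    captured text instead of querying live hardware.
    """
    if sample is None:
        sample = subprocess.check_output(
            ["nvidia-smi",
             "--query-gpu=name,power.draw,utilization.gpu",
             "--format=csv,noheader,nounits"],
            text=True)
    rows = []
    for line in sample.strip().splitlines():
        name, power, util = (f.strip() for f in line.split(","))
        rows.append((name, float(power), float(util)))
    return rows

# Hypothetical captured output: similar utilization, different power draw.
sample = "GeForce GTX 1050 Ti, 62.1, 91\nGeForce GTX 1050, 51.4, 90"
for name, watts, util in read_gpu_power(sample):
    print(f"{name}: {watts:.0f} W at {util:.0f}% GPU usage")
```

Two cards can show near-identical utilization while drawing quite different power, which is why the power reading is the better proxy for actual work done (and heat produced).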

The first GPU gets hot, and its GPU usage is at 90% most of the time.
If this PC has a tower case, then hot air from the lower card heats the upper card. Cards that blow hot air directly out through the rear grille are better in this regard (they don't dump the hot air into the case), though as far as I know no GTX 1050 / GTX 1050 Ti is built that way.

This GPU is connected to the first PCIe slot, which runs at PCIe 3.0 x16,
with bus interface load averaging 30%.
The second card does not seem to get as hot as the first, though its GPU usage is also at 90% most of the time.
This GPU is connected to the second PCIe x16 slot, which runs at PCIe 2.0 x4,
with bus interface load averaging 30%.

Now, since both cards have GPU usage around 90% and the bus interface load averages 30%, shouldn't both cards heat up the same?
They would not heat up the same even if they were the same GPU model.

For info, the second card takes a few hours longer than the first to complete a WU.
That could be caused by the narrower PCIe bandwidth, but in your case it's simply that the lesser GPU takes longer.

Also, since it is evident that there is no bottleneck on the bus interface, does this mean the cards perform the same for this type of WU even though the bandwidth between GPU and CPU is narrower?
Yes. There are two factors:
1. the present workunits don't need much interaction between the GPU and the CPU;
2. your GPUs are not fast enough to make the PCIe bandwidth bottleneck noticeable.

Now, the big question: since both cards show the same GPU usage and there is no bottleneck, shouldn't the WUs finish at the same time as well?
They should, but your past workunits show different GPUs, so you might have missed the "Ti" on one of them.

I might not be looking at the right information in GPU-Z.
GPU-Z should show the different models (one with Ti, and one without).

Engagex BOINC-SETI
Joined: 7 Oct 16
Posts: 5
Credit: 949,150
RAC: 0
Message 47483 - Posted: 20 Jun 2017 | 16:45:17 UTC

I run three dedicated machines. One of them has 5 GPUs. Only one of them is in a full length (16X) slot. One of them is in an 8X slot. The rest are in 1X slots with riser ribbons.

BOINC apps do not need to fully utilize all lanes. The card does all the work and only needs 1x for communicating. The other 15 lanes are for processing video during gameplay, which BOINC does not use.

I run a GeForce GT 710, two 620s, a 520 and a 420. The 710 has the fastest chip (954 MHz) but only 1 GB VRAM on a 64-bit memory interface. It also has no processor clock, just a graphics clock.

The 420 has 2 GB VRAM on a 128-bit memory interface. I use this card to run the display because it has the most memory, memory bandwidth and speed. It also has an 80 mm fan, and it uses the most power. The other cards fall somewhere in between, having 1-2 GB VRAM but only 64-bit memory interfaces.

That being said, the only card that will accept and run GPUGrid tasks is the GT 710.

3g555PSfYPxkNiQmmBd7MULS7...
Joined: 2 May 17
Posts: 2
Credit: 107,269,450
RAC: 0
Message 47486 - Posted: 20 Jun 2017 | 19:40:26 UTC

Thanks for all the info.

@Retvari, yes, I removed the card because I had to move the rigs to their final resting place. I have ordered PCIe extension cables and will reinstall them shortly. Sorry I missed the "Ti"; that is surely the difference. But when the 1050 is installed alone it produces more heat than on the second slot...

@Engagex, I am in the process of building another rig using 6 PCIe 1x riser cables on 1060 GPUs. I'll report back in another thread. I could not locate any "reference" info on people crunching over PCIe 1x riser cables.

Retvari Zoltan
Joined: 20 Jan 09
Posts: 2343
Credit: 16,201,255,749
RAC: 6,169
Message 47487 - Posted: 20 Jun 2017 | 22:39:04 UTC - in response to Message 47483.

I run three dedicated machines. One of them has 5 GPUs. Only one of them is in a full length (16X) slot. One of them is in an 8X slot. The rest are in 1X slots with riser ribbons.

BOINC apps do not need to fully utilize all lanes.
Every BOINC app is different, and the GPUGrid app is *very* different. Different workunit batches in GPUGrid can need *very* different PCIe bandwidth. Even WDDM alone can shave 10-15% off performance; that's why I used Windows XP x64 exclusively for crunching, until the Pascal cards came.

The card does all the work...
That's not true for the GPUGrid app: the double-precision arithmetic is done on the CPU, and moreover there are batches which apply extra forces to the atoms, which is also done on the CPU.

...and only needs 1x for communicating. The other 15 lanes are for processing video during gameplay, which BOINC does not use.
There is no such dedicated lane in PCIe. If more lanes are available, every app (whether it's a game, CAD, Folding@home, or a BOINC app) will use them; that's one of the key features of PCIe. The performance gain can range from negligible up to a direct ratio of the available lanes, but that depends on the GPU application, not on the PCIe architecture.

I run a GeForce GT 710, two 620s, a 520 and a 420. The 710 has the fastest chip (954 MHz) but only 1 GB VRAM on a 64-bit memory interface. It also has no processor clock, just a graphics clock.
Fast is not the same as clock frequency where GPUs are concerned. A GTX 1080 Ti can be some 30 times faster than a GT 710 while running at only about 1.5 times its clock frequency. The faster the GPU, the more easily the app running on it can hit the PCIe bandwidth limit.
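To illustrate with rough published specs (approximate core counts and boost clocks, not measurements from this thread), single-precision throughput scales with cores times clock, not clock alone:

```python
# Rough single-precision throughput estimate: CUDA cores x clock.
# Core counts and clocks below are approximate published specs,
# used only to show why clock frequency alone is a poor speed metric.
gpus = {
    "GT 710":      (192,  954),   # (CUDA cores, clock in MHz)
    "GTX 1080 Ti": (3584, 1582),
}

def relative_speed(a, b):
    """How many times faster GPU `a` is than `b` (cores x clock)."""
    cores_a, clock_a = gpus[a]
    cores_b, clock_b = gpus[b]
    return (cores_a * clock_a) / (cores_b * clock_b)

print(f"clock ratio:      {1582 / 954:.1f}x")   # ~1.7x
print(f"throughput ratio: {relative_speed('GTX 1080 Ti', 'GT 710'):.0f}x")  # ~31x
```

A modest clock advantage multiplied by nearly 19 times the cores is what produces the ~30x gap.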

The 420 has 2 GB VRAM on a 128-bit memory interface. I use this card to run the display because it has the most memory, memory bandwidth and speed. It also has an 80 mm fan, and it uses the most power. The other cards fall somewhere in between, having 1-2 GB VRAM but only 64-bit memory interfaces.

That being said, the only card that will accept and run GPUGrid tasks is the GT 710.
The GPUGrid app, and the project itself, is a power-hungry kind.

Retvari Zoltan
Joined: 20 Jan 09
Posts: 2343
Credit: 16,201,255,749
RAC: 6,169
Message 47489 - Posted: 21 Jun 2017 | 9:22:23 UTC - in response to Message 47486.

But when the 1050 is installed alone it produces more heat than on the second slot...
A PCIe 3.0 x16 connection is roughly 8 times faster than PCIe 2.0 x4, which could cause the difference in processing speed and heat dissipation. The other factor is the number of other (CPU) tasks running simultaneously. In my experience, running more than one CPU task isn't worth it for the host's overall RAC: it reduces the host's GPU throughput (RAC) more than it increases its CPU throughput (RAC), at least if the host has a high-end GPU. Of course there can be other reasons to crunch CPU and GPU tasks on the same host, but in performance terms it isn't worth it. If GPU performance is the main reason for building a rig, it should have only one high-end GPU, and it doesn't need a high-end CPU and/or motherboard (with more PCIe x16 connectors); it's better to build as many PCs as your space and budget allow.

PappaLitto
Joined: 21 Mar 16
Posts: 511
Credit: 4,617,042,755
RAC: 0
Message 47490 - Posted: 21 Jun 2017 | 11:07:44 UTC - in response to Message 47486.
Last modified: 21 Jun 2017 | 11:11:20 UTC

I am in the process of building another rig using 6 PCIe 1x riser cables on 1060 GPUs. I'll report back in another thread. I could not locate any "reference" info on people crunching over PCIe 1x riser cables.

This project uses extreme PCIe bus memory bandwidth and needs at least an 8x PCIe 2.0 connection for anywhere near full speed. If you have PCIe 3.0, I would still try to get an 8x or even a 16x riser. Unlike cryptocurrency mining, this project is scientific, which means it has aspects that need double-precision compute, and that is done on the CPU. I would run zero CPU tasks, dedicate at least one CPU thread per GPU, and lock the CPU at the highest frequency you can get away with.

For the six-1060 installation: if your motherboard doesn't support at least six 4x PCIe slots, I would stick to 4 GPUs and perhaps put the last two in another machine. 1x is fine for cryptocurrency mining, but here bandwidth matters far more, and you will probably see only about 30% GPU utilization on 1x.

Engagex BOINC-SETI
Joined: 7 Oct 16
Posts: 5
Credit: 949,150
RAC: 0
Message 47493 - Posted: 22 Jun 2017 | 16:06:01 UTC - in response to Message 47487.
Last modified: 22 Jun 2017 | 16:06:54 UTC

Here is the X server output of the GPU that's currently running GPUGrid:
https://drive.google.com/file/d/0By69FNpSrHHOTmpPUnZ6ZGV1UHc/view?usp=sharing

This is the first time I've seen any card other than the GT 710 running it.

Anyway, at 95% GPU use the PCIe bandwidth is only at 1%. The most I've EVER seen it at for ANY project was 15%, regardless of 16x or 1x.

Erich56
Joined: 1 Jan 15
Posts: 1090
Credit: 6,603,906,926
RAC: 18,783,925
Message 47494 - Posted: 22 Jun 2017 | 16:25:37 UTC - in response to Message 47493.

The most I've EVER seen it at for ANY project was 15% regardless of 16X or 1X.

With my hosts running a GTX 750 Ti and a GTX 970, both on motherboards with PCIe 2.0 @ 16x, GPU-Z shows bus interface load between 54% and 57%.
On the host with the two GTX 980 Tis (motherboard with PCIe 3.0 @ 16x), GPU-Z unfortunately shows "0" for bus interface load (meaning the tool does not recognize the bus speed).

Retvari Zoltan
Joined: 20 Jan 09
Posts: 2343
Credit: 16,201,255,749
RAC: 6,169
Message 47495 - Posted: 22 Jun 2017 | 21:53:21 UTC - in response to Message 47493.
Last modified: 22 Jun 2017 | 21:56:22 UTC

Anyway, at 95% GPU use the PCIe bandwidth is only at 1%. The most I've EVER seen it at for ANY project was 15% regardless of 16X or 1X.

GTX 1080 @ 2000 MHz / 4763 MHz, Windows 10 v1703, NVidia v382.05, PCIe3.0x16, Bus usage: 30-31%
GTX 1080 Ti @ 1974 MHz / 5454 MHz, Windows 10 v1703, NVidia v382.05, PCIe3.0x16, Bus usage: 33-34%

Engagex BOINC-SETI
Joined: 7 Oct 16
Posts: 5
Credit: 949,150
RAC: 0
Message 47496 - Posted: 23 Jun 2017 | 14:32:24 UTC - in response to Message 47495.
Last modified: 23 Jun 2017 | 14:33:32 UTC

Where's the proof?
Those numbers make sense, though: being about twice as fast, they use about twice the bandwidth...
