
Message boards : Graphics cards (GPUs) : Performance problem on an x1 PCI-e slot

Author Message
tomba
Joined: 21 Feb 09
Posts: 497
Credit: 700,690,702
RAC: 0
Message 41255 - Posted: 5 Jun 2015 | 14:16:20 UTC

Until yesterday I was running a 780Ti and an unpowered 750Ti on my main rig, and a 770 on another. I had a spare 660 sitting in its box for want of a PCIe slot.

This morning I remembered I had this x1 to x16 PCI Express Riser. I took out the 750Ti, replacing it with the spare 660, and installed the 750Ti on the riser, beside the 770 (this rig has a 620W PSU). Win 7 installed drivers and away she went! But...

In an x16 slot the 750Ti performs marginally better than a 660. In the x1 slot it's not much better than one third of the 770:



Is this to be expected from an x1 slot, or am I missing something?

eXaPower
Joined: 25 Sep 13
Posts: 293
Credit: 1,897,601,978
RAC: 0
Message 41256 - Posted: 5 Jun 2015 | 16:42:49 UTC - in response to Message 41255.
Last modified: 5 Jun 2015 | 16:47:11 UTC

Is this to be expected from an x1 slot, or am I missing something?

I can't speak for GPUs running the ACEMD app on a PCIe riser. In my experience, anything under an x8/x16 slot adds 10-22% to the runtime of GM204 GPUs on recent ACEMD tasks. GPU tasks at POEM or PrimeGrid (PPS Sieve), with ~2% bus usage, are unaffected: 970/750 runtimes are the same for an x4 slot as for an x8/x16 slot.

A PCIe3.0 x4 slot adds 10-20% to a 970's ACEMD runtime compared to PCIe3.0 x8/x16. A GTX970 computing NOELIA on PCIe3.0 x4 causes a bus load of ~70%. (Runtimes take a beating, and running two tasks at a time is a no-go.) The 970's NOELIA bus load on x8/x16 is 40%. GERARD's x4 bus load is 55%.

Maxwell's bus usage is higher than Kepler's on comparable GPUs. The WDDM tax plus x4 slot operation causes a 30% performance degradation compared to the fastest 970s.

A GTX970 (1.5GHz) on a PCIe3.0 x4 bus (48.3k runtime) raises GERARD runtimes by 17.2% compared to x8 PCIe3.0 (39-40k runtime). PCIe x4 NOELIA shorts are 12% slower vs. an x8/x16 slot (8500/9600K runtime).
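
A quick way to sanity-check percentages like these is to compute the slowdown both ways, since "x% slower" depends on which runtime is taken as the baseline. The sketch below is only illustrative: it assumes 40,000 s for the x8 run (the post gives 39-40k) and reads the NOELIA figures as 8,500 s and 9,600 s. The function name `slowdown` is made up for this example.

```python
def slowdown(slow, fast):
    """Return (extra time relative to the fast run,
    time saved relative to the slow run), both as percentages."""
    extra = (slow - fast) / fast * 100.0
    saved = (slow - fast) / slow * 100.0
    return extra, saved


# Runtimes in seconds. These are assumed readings of the figures quoted above.
cases = {
    "GERARD, x4 vs x8": (48300, 40000),
    "NOELIA short, x4 vs x8/x16": (9600, 8500),
}

for label, (slow, fast) in cases.items():
    extra, saved = slowdown(slow, fast)
    print(f"{label}: x4 takes {extra:.1f}% longer; x8 saves {saved:.1f}%")
```

With these assumed inputs, the quoted 17.2% seems to correspond to the time saved relative to the slower x4 run (about 20.8% extra when measured against the x8 run), and the NOELIA figure comes out at about 11.5% saved, close to the quoted 12%.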

I have a PCIe SSD that doesn't like the x4 slot, so I sacrificed my 970. (Refurb ASUS WS Z97 [4] x8 board being delivered in a few days.)

My Gigabyte GTX750 runs ACEMD stably overclocked to 1406MHz; anything above 1406MHz causes an unstable simulation. I've settled on a locked custom BIOS boost clock of 1341MHz (1.162V) because summer ambient temps are over 90F with no AC. The stock BIOS at an 80C temp target pulls 85-110% of TDP as the boost clocks fluctuate from 1291-1406MHz (1.075-1.200V), depending on ambient temps.

(PCIe3.0 x8 load is 33% while MCU usage = 30%.) My GTX750's NOELIA output is 45.7% of my GTX970's even though the 970 is clocked 11.6% higher (1506MHz) and has 3.25X more cores (512/1664). The 750 on x8 has 51.2% of the GERARD output of my 970 on PCIe x4 (94k/48k).

When the 970 is operating on PCIe3.0 x8, the GTX750's NOELIA output is 40.4% of the 970's. Scaling for GM107 is world class. My GTX750's core usage is 94% while the 970's is 79% when computing NOELIAs. As of now, with the CUDA6.5 ACEMD app, the GTX750 scales 16% higher than the 970 for NOELIAs. The 750 vs. the 970 computing GERARDs shows a 22% scaling difference: 90% core (25% MCU/30% bus, x8) for the 750; 70% core (29% MCU/55% bus, x4) for the 970.

Retvari Zoltan
Joined: 20 Jan 09
Posts: 2334
Credit: 16,178,080,749
RAC: 281,455
Message 41257 - Posted: 5 Jun 2015 | 17:30:20 UTC - in response to Message 41255.
Last modified: 5 Jun 2015 | 17:32:16 UTC

Is this to be expected from an x1 slot, or am I missing something?

Yes, it's the expected performance.

If the x16 slot is PCIe3.0 and the x1 slot is PCIe2.0, then their bandwidth ratio is about 32x.
This is a huge difference even when there's the WDDM overhead.
The PCIe x1 slot is meant to be populated with USB, SATA/IDE, LAN, parallel/serial, sound (etc.) controllers.

The recent GERARD_FXCXCL12_LIG workunits don't use the PCIe bandwidth that much.

tomba
Joined: 21 Feb 09
Posts: 497
Credit: 700,690,702
RAC: 0
Message 41264 - Posted: 6 Jun 2015 | 10:35:41 UTC - in response to Message 41257.

Is this to be expected from an x1 slot, or am I missing something?

Yes it's the expected performance.

If the x16 slot is PCIe3.0, and the x1 slot is PCIe2.0 then their performance ratio is 32x.
This is a huge difference even when there's the WDDM overhead.
The PCIe 1x slot is meant to be populated with USB, SATA/IDE, LAN, Parallel/serial, sound (etc.) controllers.

The recent GERARD_FXCXCL12_LIG workunits don't use the PCIe bandwidth that much.

Thanks for the heads-up, Retvari!

Looking like 48 hours for the 750Ti on PCIe x1...

skgiven
Volunteer moderator
Volunteer tester
Joined: 23 Apr 09
Posts: 3968
Credit: 1,995,359,260
RAC: 0
Message 41268 - Posted: 6 Jun 2015 | 18:02:19 UTC - in response to Message 41257.
Last modified: 7 Jun 2015 | 11:44:35 UTC

an unpowered 750Ti

Tomba, that PCIe riser doesn't look like it's powered. The 2 sets of jumpers only indicate that it wants to draw from both the 12V and 3.3V rails; that doesn't mean it's doing what you want. Maybe the jumpers are not set right, or the board is rejecting the settings or can't supply the full 60W to that slot, seeing as you also have a power-hungry GTX770 in the system.

You could check the Power Usage and GPU usage to find out (and clocks, just in case it's downclocking).
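
One possible way to collect those readings in one go is NVIDIA's `nvidia-smi` command-line tool, which ships with the driver. The sketch below is a hedged example, not something from this thread: the helper names (`build_command`, `parse_output`, `query_gpus`) are invented here, and it assumes `nvidia-smi` is on the PATH. Older GeForce cards may report "[Not Supported]" for some fields, so values are kept as plain strings.

```python
import csv
import io
import subprocess

# Query fields accepted by `nvidia-smi --query-gpu`. Some cards/drivers
# return "[Not Supported]" for individual fields (power.draw in particular).
FIELDS = [
    "name",
    "utilization.gpu",
    "power.draw",
    "clocks.sm",
    "pcie.link.gen.current",
    "pcie.link.width.current",
]


def build_command():
    """Build the nvidia-smi command line for the fields above."""
    return ["nvidia-smi", "--query-gpu=" + ",".join(FIELDS), "--format=csv,noheader"]


def parse_output(text):
    """Turn nvidia-smi CSV output into one dict per GPU (values stay strings)."""
    rows = []
    for record in csv.reader(io.StringIO(text), skipinitialspace=True):
        if record:
            rows.append(dict(zip(FIELDS, (value.strip() for value in record))))
    return rows


def query_gpus():
    """Run nvidia-smi and return the parsed readings (needs an NVIDIA driver)."""
    result = subprocess.run(build_command(), capture_output=True, text=True, check=True)
    return parse_output(result.stdout)
```

Calling `query_gpus()` while a task is running should list each card's usage, power draw, SM clock and the PCIe generation and width it has actually negotiated. It is worth reading these under load, since the link can drop to a lower speed when the GPU is idle.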

You appear to have it in an i7-920/LGA1366 system which means all the slots are PCIE 2.0. Normally only 75W in total can be supplied over the PCIE bus, without additional power feeders going into the board (in which case it can be up to 300W). This might be limiting your GPU.

2 months ago I used a riser with a molex power connector. I had a GTX970 in the same system and added my GTX670 so it was external to the case. My performance loss wasn't too much: noticeable, but not as much as you are seeing. It ran a long GERARD_FXCXCL12 in 64,881 sec (18h). That was about 18% slower than normal for a GTX670, but I was running Einstein tasks on the iGPU (3770K) and 5 CPU tasks, and the GPU was underclocked by 65MHz. So in theory a loss of <15% would be possible (on that system).

CPU usage definitely impacts highly when using a PCIE2 X1 or X4 slot (and varies a bit by task type).

The location of the PCIE X1 controllers is a factor too. In your case it's NOT the Northbridge (that's only for X16 slots), it's the ICH10/ICH10R (Southbridge) chip which connects to the Northbridge via DMI 1 @ 2Gb/s (but links to each PCIe slot @3Gb/s). I was using my Z77 board which uses DMI 2 (20Gb/s total, shared) and links to the PCIe slots @5Gb/s each.

Skylake (forecasts Sept 2015 to Jan 2016) will introduce DMI 3.0 and PCIE 3.0 X1 on the Southbridge (think the PCIe link will be 8Gb/s, but not sure). With PCIe 3 being twice as fast as PCIe 2 the loss from using PCIe X1 should be halved, making the use of PCIe X1 slots more feasible for new (Skylake) builds.
____________
FAQ's

HOW TO:
- Opt out of Beta Tests
- Ask for Help

eXaPower
Joined: 25 Sep 13
Posts: 293
Credit: 1,897,601,978
RAC: 0
Message 41270 - Posted: 7 Jun 2015 | 12:30:12 UTC - in response to Message 41268.

The location of the PCIE X1 controllers is a factor too. In your case it's NOT the Northbridge (that's only for X16 slots), it's the ICH10/ICH10R (Southbridge) chip which connects to the Northbridge via DMI 1 @ 2Gb/s (but links to each PCIe slot @3Gb/s). I was using my Z77 board which uses DMI 2 (20Gb/s total, shared) and links to the PCIe slots @5Gb/s each.

Skylake (forecasts Sept 2015 to Jan 2016) will introduce DMI 3.0 and PCIE 3.0 X1 on the Southbridge (think the PCIe link will be 8Gb/s, but not sure). With PCIe 3 being twice as fast as PCIe 2 the loss from using PCIe X1 should be halved, making the use of PCIe X1 slots more feasible for new (Skylake) builds.

Sandy/Ivy/Haswell/Broadwell (z77/87/97 chipset) CPUs are limited to PCIe3.0 x16 (8GT/s) or [2]x8 (5GT/s) and x8/x4/x4 connections to GPUs or a PCIe SSD. Bandwidth figures depend on the number of devices connected. With a PCIe SSD along with 3 or more GPUs on a z*7 chipset, bandwidth is nearly starved. The z*7 chipset alone can provide PCIe2.0 x16 (x4) to a GPU.

DMI2.0 x99 platform (LGA2011-v3) CPUs have 28 or 40 PCIe3.0 lanes: 16x/8x/4x or 16x/8x/8x/8x and 8x/8x/8x/8x/8x etc. An ideal setup for 5 or more GPUs and a PCIe SSD. Multi-GPU (3 or more) setups really shine on x99 boards compared to z*7.

Mainstream Skylake (LGA1151) CPU will add 4 more PCIe4.0 (16GT/s) lanes (20 Total) at 8x/8x/4x or 8x/4x/4x/4x CPU/GPU interconnect while the chipset will be capable of PCIe3.0 x8 (x4/x4) - PCIe2.0 x4 on z*7 chipsets.

Expansion slot specs for the Asus WS z97 board I've picked up (CPU + GPU PCIe3.0 interface limited to x8/x4/x4):
4 x PCIe 3.0/2.0 x16 (dual x16 or x16/x8/x8 or quad x8)
1 x PCIe 2.0 x4 (x4 mode)
2 x PCIe 2.0 x1

Asus WS-E x99 board PCIe expansion slot specs: 7 x PCIe 3.0/2.0 x16 (single x16 or dual x16/x16 or triple x16/x16/x16 or quad x16/x16/x16/x16 or seven x16/x8/x8/x8/x16/x8/x8)

(CPU+GPU interconnect for 7 GPUs on a 40-lane CPU: 8x/8x/8x/4x/4x/4x/4x, while a 28-lane CPU is 4x/4x/4x/4x/4x/4x/4x.)

Bandwidth for single-lane (×1) and 16-lane (×16) links, in each direction:
v1.x (2.5 GT/s):
250 MB/s (×1)
4 GB/s (×16)

v2.x (5 GT/s):
500 MB/s (×1)
8 GB/s (×16)

v3.0 (8 GT/s):
985 MB/s (×1)
15.75 GB/s (×16)

v4.0 (16 GT/s):
1969 MB/s (×1)
31.51 GB/s (×16)
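
These figures can be reproduced from the per-lane transfer rate and the line encoding: PCIe 1.x and 2.x use 8b/10b encoding, while 3.0 and 4.0 use 128b/130b. The following is a minimal sketch of that arithmetic; the table name `GENERATIONS` and the function `bandwidth_mb_s` are hypothetical names chosen for the example.

```python
# Per-lane signalling rate (GT/s) and encoding efficiency per generation.
# PCIe 1.x/2.x: 8b/10b encoding; PCIe 3.0/4.0: 128b/130b encoding.
GENERATIONS = {
    "1.x": (2.5, 8 / 10),
    "2.x": (5.0, 8 / 10),
    "3.0": (8.0, 128 / 130),
    "4.0": (16.0, 128 / 130),
}


def bandwidth_mb_s(gen, lanes=1):
    """Usable bandwidth in MB/s, per direction, for a link of `lanes` lanes."""
    rate_gt_s, efficiency = GENERATIONS[gen]
    usable_gbit_s = rate_gt_s * efficiency   # payload bits per lane, in Gb/s
    return usable_gbit_s * 1000 / 8 * lanes  # Gb/s -> MB/s, times lane count


for gen in GENERATIONS:
    x1 = bandwidth_mb_s(gen, 1)
    x16 = bandwidth_mb_s(gen, 16) / 1000     # MB/s -> GB/s
    print(f"v{gen}: {x1:7.1f} MB/s (x1)  {x16:6.2f} GB/s (x16)")

# Ratio behind the "32x" estimate earlier in the thread:
ratio = bandwidth_mb_s("3.0", 16) / bandwidth_mb_s("2.x", 1)
print(f"PCIe3.0 x16 vs PCIe2.0 x1: about {ratio:.1f}x")
```

Run as a script, this should print values that line up with the list above, and a ratio of roughly 31.5 for a PCIe3.0 x16 link against a PCIe2.0 x1 link.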


skgiven
Volunteer moderator
Volunteer tester
Joined: 23 Apr 09
Posts: 3968
Credit: 1,995,359,260
RAC: 0
Message 41273 - Posted: 7 Jun 2015 | 13:05:04 UTC - in response to Message 41270.
Last modified: 7 Jun 2015 | 15:06:17 UTC

Mainstream Skylake (LGA1151) CPU will add 4 more PCIe4.0 (16GT/s) lanes (20 Total) at 8x/8x/4x or 8x/4x/4x/4x CPU/GPU interconnect while the chipset will be capable of PCIe3.0 x8 (x4/x4) - PCIe2.0 x4 on z*7 chipsets.

The final PCIe4.0 specs aren't expected until late 2016. So, PCIE3.0 1x16, 2x8 or 1x8+2X4 directly from the CPU (Northbridge) and 'up-to' 20 PCIe3.0 (Southbridge) lanes on the Z170 and Q170 Intel 100 series chipsets. In theory this could facilitate 4 GPU's on PCIE3 X8 (2 Northbridge, 2 Southbridge). Obviously Skylake isn't out yet and neither are the motherboards so we don't know for sure. It's likely however that motherboard manufacturers will start implementing PCIe 3.0 x4 interfaces to facilitate M.2 & SATA Express.

tomba
Joined: 21 Feb 09
Posts: 497
Credit: 700,690,702
RAC: 0
Message 41297 - Posted: 11 Jun 2015 | 15:20:08 UTC - in response to Message 41255.

Until yesterday I was running a 780Ti and an unpowered 750Ti on my main rig, and a 770 on another. I had a spare 660 sitting in its box for want of a PCIe slot (etc).

Well - I'm back to the original configuration, having put the spare 660 back in its box. Why?

1. The 750Ti on the riser was taking 48 hours to complete a Gerard
2. She who must be obeyed observed that the fan noise from my main rig was unacceptable
3. Perhaps the main reason: ASUS Thermal Radar reported that, with the 780Ti and 660 together, the temperature on PCIe1 was 65C, and warnings start at 60C. With the 750Ti it's a modest 50C.

Hey-ho. I tried...
