Message boards : Number crunching : 8 GPUs on a motherboard with 7 PCIe slots: Bifurcation

Ian&Steve C.
Joined: 21 Feb 20
Posts: 544
Credit: 4,527,186,357
RAC: 6,601,601
Message 55775 - Posted: 18 Nov 2020 | 21:48:44 UTC

So a while back I upgraded my old 10-GPU system to newer hardware. It was running a Supermicro X9DRX+-F motherboard, which provided 10x PCIe 3.0 x8 slots. It was great for PCIe connectivity and a great project to get it all working, but i wanted to move it to a more modern platform that's faster and more power efficient.

So i upgraded it to an AMD EPYC platform, but without getting into an incredibly expensive and proprietary server ecosystem, I would be stuck with only 7x PCIe slots, as is the case with the standard ATX spec. So that's what I did.

Asrock Rack EPYCD8 motherboard
AMD EPYC (Rome) 7402P 24-core/48-thread CPU
64GB (4x16GB) DDR4 3200MHz Registered ECC
8x EVGA RTX 2070 GPUs

So how to put all 8 GPUs on only 7 slots? Bifurcation! This motherboard supports bifurcation natively on all four (4) x16 slots, but I would need a riser to actually split the slot into 2 slots to accept two GPUs. did some digging and came across a user C_Payne on [H]ardforum, and eventually found his webstore where he sells custom risers just for this purpose.

finally got the bifurcation card and got it all set up. working great and it all went as smoothly as I could have hoped for. just plug the riser into the cards, hook up the 8-pin power, and change the slot settings in the BIOS to x8x8. the biggest issue was waiting 6 weeks for delivery from Germany (I'm in the US), likely delayed due to COVID restrictions or policies.

this system: https://www.gpugrid.net/show_host_detail.php?hostid=543446

pics: https://imgur.com/a/LrreKks

this is the riser board that I bought: https://peine-braun.net/shop/index.php?route=product/product&path=59&product_id=81

obviously this requires a motherboard that supports PCIe bifurcation with support exposed in the BIOS. you can't do this on most consumer motherboards, but it seems to be well supported on Asrock motherboards (even consumer) and some higher end HEDT boards. and very well supported on Server and Workstation motherboards. you could in theory use a riser board with a PLX chip to do the same thing without needing bifurcation support on the mb (C_Payne even sells some), but they are MUCH more expensive.

the advantage here is that it's cheaper and uses less power than a PLX based solution, and i still get PCIe 3.0 x8 bandwidth to each GPU, so no slowdown with GPUGRID. i think i should be able to get all the way to 11 GPUs if i wanted to (all four x16 slots split x8x8 for 8 GPUs, plus one GPU in each of the three x8 slots). but it'll likely stay at 8 for the foreseeable future due to power constraints at this location.
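
The slot arithmetic behind the 8-GPU and 11-GPU figures can be sketched in a few lines of Python. The layout (four x16 slots plus three x8 slots) is inferred from the posts in this thread; the function name and data layout are made up for illustration, and the sketch assumes every x16 slot can be set to x8x8 with one GPU per x8 link.

```python
# Sketch: count GPUs on a 7-slot board when some x16 slots are split x8x8.
# Assumed layout: four x16 slots (all bifurcation-capable) and three x8 slots.
SLOT_WIDTHS = [16, 16, 16, 16, 8, 8, 8]


def gpu_links(slot_widths, bifurcated_x16_slots):
    """Return the per-GPU link widths for a given number of split x16 slots."""
    x16_total = sum(1 for width in slot_widths if width == 16)
    if not 0 <= bifurcated_x16_slots <= x16_total:
        raise ValueError("cannot bifurcate more x16 slots than the board has")
    links = []
    split_left = bifurcated_x16_slots
    for width in slot_widths:
        if width == 16 and split_left > 0:
            links += [8, 8]      # one x16 slot becomes two x8 links
            split_left -= 1
        else:
            links.append(width)  # slot used as-is, one GPU
    return links


current = gpu_links(SLOT_WIDTHS, 1)   # the build described here: one slot split
maximum = gpu_links(SLOT_WIDTHS, 4)   # every x16 slot split
print(len(current), "GPUs, narrowest link x%d" % min(current))
print(len(maximum), "GPUs, narrowest link x%d" % min(maximum))
```

With one slot split this gives three x16 links and five x8 links; splitting all four x16 slots is where the 11-GPU ceiling comes from.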
____________

rod4x4
Joined: 4 Aug 14
Posts: 266
Credit: 2,219,935,054
RAC: 0
Message 55781 - Posted: 19 Nov 2020 | 14:02:04 UTC

Impressive!

Gives new meaning to gathering around the fire (GPUs) at Christmas...

My PC room is 90 degrees (F) at Christmas (summer). What ambient room temps do you see?

Ian&Steve C.
Message 55782 - Posted: 19 Nov 2020 | 14:44:26 UTC - in response to Message 55781.
Last modified: 19 Nov 2020 | 14:51:43 UTC

it varies. i usually leave the window to the room open a bit to let some fresh air in, and it's below freezing at night now. i don't think the room ever gets over 75-80F with the window closed, and it's much, much cooler with it open, probably down to the 50s or 60s F.

the GPUs stay around 50-60C though while running GPUGRID

I have all GPUs power limited to 150W each, with an overclock to offset most of what i lost from power limiting. the whole system pulls 1500-1600W running full tilt.
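
As a rough sanity check on those numbers, here is a small Python sketch of the power budget. It assumes every GPU sits exactly at its 150 W limit, which is a simplification; the variable names are illustrative only.

```python
# Sketch: how much of the whole-system draw the GPU power limits account for.
GPU_COUNT = 8
GPU_POWER_LIMIT_W = 150          # per-GPU limit quoted above
SYSTEM_POWER_W = (1500, 1600)    # whole-system range quoted above

# Assumption: each GPU draws exactly its power limit under load.
gpu_budget_w = GPU_COUNT * GPU_POWER_LIMIT_W

# Whatever is left covers everything else in the system.
rest_w = tuple(total - gpu_budget_w for total in SYSTEM_POWER_W)

print(f"GPUs at their limit: {gpu_budget_w} W")
print(f"Everything else: {rest_w[0]}-{rest_w[1]} W")
```

So roughly 1200 W goes to the GPUs and the remaining 300-400 W to the rest of the machine.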
____________

Pop Piasa
Joined: 8 Aug 19
Posts: 212
Credit: 359,295,378
RAC: 0
Message 55784 - Posted: 19 Nov 2020 | 16:58:03 UTC - in response to Message 55775.

"Asrock Rack EPYCD8 motherboard"


I was ogling your EPYC machine the other day and was curious if you had the Supermicro rack setup. I see you found something better. Asrock makes heavy-duty stuff and I've come to respect them even more than Asus from what I've read and heard from the geeks in the local computer club.

Keith Myers
Joined: 13 Dec 17
Posts: 813
Credit: 1,099,829,831
RAC: 2,305,539
Message 55792 - Posted: 19 Nov 2020 | 19:42:20 UTC
Last modified: 19 Nov 2020 | 19:42:32 UTC

Supermicro has very few ATX-sized motherboards. Mostly custom-footprint solutions for 1U or 2U servers.

Asrock Rack OTOH seems to specialize in common motherboard form factors that fit in standard PC cases very easily.

Ian&Steve C.
Message 55793 - Posted: 19 Nov 2020 | 19:44:51 UTC - in response to Message 55784.

when I was running the supermicro board, the only thing I cared about at the time was PCIe connectivity.

I wasn't too interested in CPU processing as I wasn't using it. but now, since I've started doing Universe@home on the CPU, i wanted something more capable and power efficient.

the platform change moved to a much more power efficient CPU (and much better IPC, more cores), more power efficient and faster RAM, and the motherboard likely uses less power also.

the only thing that's not better is my wallet lol.
____________

Ian&Steve C.
Message 55794 - Posted: 19 Nov 2020 | 19:50:41 UTC - in response to Message 55792.

SM has a ton of ATX and EATX motherboards, but they do have a good amount of proprietary boards too to fit their custom server chassis. the old 11-slot SM board I had was a custom board, that I custom mounted to the mining frame, but I went with it for the fact that it had 11 slots. you can't get an 11-slot board in any normal form factor since it exceeds the normal ATX spec of only 7 slots.

I picked the Asrock Rack board more for the fact that I could get it with all 7 slots and an external PCIe power input. none of the SM boards had that. SM doesn't seem to have external PCIe power input on any of their boards, whereas that seems to be a common feature on the Asrock boards. Very important if you're going to have a lot of cards pulling power from the MB like this.
____________

Keith Myers
Message 55799 - Posted: 19 Nov 2020 | 20:32:10 UTC

When I've looked at the Supermicro mobo page and used the filters, it produced very few ATX or EATX form factor boards. I've spent quite a bit of time, in fact, googling and searching on the SM site. Not many results.

But then I know I'm terrible at picking search keywords, which typically turn up nothing for me.

Keith Myers
Message 55800 - Posted: 19 Nov 2020 | 20:36:08 UTC

"SM doesnt seem to have external PCIe power input on any of their boards, where that seems to be a common feature on the Asrock boards. Very important if you're going to have a lot of cards pulling power from the MB like this."


That's also why I like the regular Asrock brand, which almost always seems to have an external PCIe slot power input connector.

I think that is very uncommon on consumer boards from the other mobo brands.

rod4x4
Message 55822 - Posted: 26 Nov 2020 | 5:08:37 UTC - in response to Message 55775.
Last modified: 26 Nov 2020 | 5:18:36 UTC

I was interested to see how well the Bifurcation performed. So below is a capture of tasks from the system highlighted in this thread: https://www.gpugrid.net/show_host_detail.php?hostid=543446

Tasks are retained on the GPUGRID statistics page for 7 days, and all GPUs have 7 days of tasks.
All tasks are MDAD. (ADRIA and GERARD tasks have been exhausted for a while)


The results show the performance is excellent!

Boinc    No.     Ttl       Ttl       Average   Average
Device   Tasks   Runtime   Credit    Credit    Runtime
---------------------------------------------------------
0        318     601069    4867705   699,703   1,890
1        330     602667    4918527   705,134   1,826
2        307     605969    4780381   681,594   1,974
3        305     601662    4702621   675,307   1,973
4        312     606045    4847006   691,007   1,942
5        315     598183    4720060   681,753   1,899
6        292     599571    4559550   657,045   2,053
7        308     598492    4649381   671,198   1,943
---------------------------------------------------------


The anomaly with device 6 (which I assume is on the riser) is so small it could be attributed to normal Work Unit variation, the GPU silicon lottery, or the bifurcation.
The data ably demonstrates that employing Bifurcation works, and works well!
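
For anyone who wants to check the table, the two average columns can be recomputed from the totals. The sketch below is not part of the original capture; it assumes the runtimes are in seconds and that "Average Credit" means credit per day (86,400 s) of runtime, which is an interpretation that happens to reproduce the listed figures rather than something stated in the post.

```python
# Rows copied from the table in the post above:
# (BOINC device, tasks, total runtime, total credit, listed avg credit, listed avg runtime)
ROWS = [
    (0, 318, 601069, 4867705, 699703, 1890),
    (1, 330, 602667, 4918527, 705134, 1826),
    (2, 307, 605969, 4780381, 681594, 1974),
    (3, 305, 601662, 4702621, 675307, 1973),
    (4, 312, 606045, 4847006, 691007, 1942),
    (5, 315, 598183, 4720060, 681753, 1899),
    (6, 292, 599571, 4559550, 657045, 2053),
    (7, 308, 598492, 4649381, 671198, 1943),
]
SECONDS_PER_DAY = 86_400  # assumption: runtimes are in seconds


def recompute(row):
    """Return (average runtime per task, credit per day of runtime), both rounded."""
    _dev, tasks, runtime_s, credit, _listed_credit, _listed_runtime = row
    return round(runtime_s / tasks), round(credit * SECONDS_PER_DAY / runtime_s)


for row in ROWS:
    avg_runtime, credit_per_day = recompute(row)
    print(row[0], avg_runtime, credit_per_day)

# How far device 6 sits from the mean of the listed average runtimes.
mean_runtime = sum(row[5] for row in ROWS) / len(ROWS)
dev6_slowdown = ROWS[6][5] / mean_runtime - 1
print(f"device 6 is {dev6_slowdown:.1%} slower than the 8-GPU mean")
```

Under those assumptions device 6 comes out about 6% slower than the average across all eight cards.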

Ian&Steve C.
Message 55823 - Posted: 26 Nov 2020 | 5:32:23 UTC - in response to Message 55822.
Last modified: 26 Nov 2020 | 5:34:20 UTC

check the pics. every single GPU is on a riser. and no GPU has less than PCIe 3.0 x8 bandwidth.
three (3) GPUs on a 3.0x16 link
five (5) GPUs on a 3.0x8 link

only 2 GPUs are running on a bifurcated slot. one single PCIe 3.0 x16 slot split into two (2) PCIe 3.0 x8 slots. they are sharing the slot but no bandwidth goes to waste. every other GPU is connected on its own to a single slot on the motherboard via riser ribbon cable.
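
To put numbers on the x16 versus x8 links, here is a small Python sketch of the theoretical PCIe 3.0 bandwidth. It uses the standard figures for PCIe 3.0 (8 GT/s per lane with 128b/130b encoding) and ignores protocol overhead, so real-world throughput will be somewhat lower. The function name is illustrative.

```python
# Sketch: theoretical one-direction PCIe 3.0 bandwidth per link width.
GT_PER_S = 8.0          # PCIe 3.0 raw rate per lane, in gigatransfers per second
ENCODING = 128 / 130    # 128b/130b line encoding efficiency


def link_gb_per_s(lanes):
    """Theoretical bandwidth in GB/s for a PCIe 3.0 link with the given lane count."""
    return GT_PER_S * ENCODING / 8 * lanes  # divide by 8 to go from bits to bytes


# Link mix described in the post: three GPUs on x16, five GPUs on x8.
LINKS = [16, 16, 16, 8, 8, 8, 8, 8]

print(f"x16: {link_gb_per_s(16):.2f} GB/s, x8: {link_gb_per_s(8):.2f} GB/s")
print(f"lanes in use: {sum(LINKS)}")
```

A split x16 slot therefore hands each of its two GPUs the same bandwidth as a native x8 slot.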

I'll have to look closer at which cards are where, because the way the nvidia driver enumerates the cards isn't the same as the way BOINC enumerates them. especially hard to figure out which is which when they are all identical like this.

it could be down to WU variation, as well as clock speed variation. I have them all overclocked with the same offsets, but some run hotter than others or have poorer silicon quality and hence some run at lower clock speeds than others. it's just easier for me to apply the same exact settings to every single card rather than trying to tweak and fine tune each one.

also keep in mind, if you try to compare these cards to other RTX 2070, I have these cards all power limited to 150W, down from the 175W stock.
____________

rod4x4
Message 55824 - Posted: 26 Nov 2020 | 5:59:16 UTC - in response to Message 55823.
Last modified: 26 Nov 2020 | 6:38:42 UTC

Yes, I have studied the pics.
I should have been more specific, my mistake. I should have said the "Riser connected to the Bifurcated slot"

To match your GPUs to the Boinc Device, try this:
nvidia-smi --query-gpu=gpu_name,gpu_bus_id,vbios_version --format=csv

This will output your devices attached and their Bus ID.

slot designation can be gleaned from
dmidecode -t slot

The Bus ID and designation can then be matched to what is in your coproc_info.xml file, which in turn cross-references to the Boinc Device number.

Another tool to help with identifying the PCIe slots is:
lspci | grep VGA

rod4x4
Message 55826 - Posted: 26 Nov 2020 | 9:59:26 UTC - in response to Message 55824.

dmidecode may not work on the EPYC BIOS (it doesn't work too well on my X470 motherboard). lspci | grep VGA will probably work fine.
The first number in each line of the output is the PCI bus number in hex (starting at PCIe1 on the motherboard).
Match this to the bus ID in coproc_info.xml, where the bus ID is in decimal.
For example, slot 1 on my system is 27 (hex, commonly written as 0x27), which corresponds to bus ID 39 in coproc_info.xml.
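
The hex-to-decimal step is easy to get wrong by hand, so here is a minimal Python sketch of it. The sample lspci line is made up for illustration (it was not captured from the system in this thread), and the helper name is hypothetical.

```python
# Sketch: turn the bus field of a PCI address into the decimal bus ID.
def bus_number(pci_address: str) -> int:
    """Return the PCI bus number as a decimal int.

    Accepts 'bus:device.function' (e.g. '27:00.0') or
    'domain:bus:device.function' (e.g. '00000000:27:00.0').
    """
    fields = pci_address.strip().split(":")
    if len(fields) not in (2, 3):
        raise ValueError(f"unexpected PCI address: {pci_address!r}")
    bus_hex = fields[-2]      # the bus is the field just before device.function
    return int(bus_hex, 16)   # hex string -> decimal integer


# Illustrative line in the style of `lspci | grep VGA` output.
sample = "27:00.0 VGA compatible controller: NVIDIA Corporation TU106 [GeForce RTX 2070] (rev a1)"
address = sample.split()[0]   # first whitespace-separated token is the address
print(address, "->", bus_number(address))
```

For the example above, hex 27 comes out as 39, matching the bus ID quoted from coproc_info.xml.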

Ian&Steve C.
Message 55844 - Posted: 29 Nov 2020 | 5:14:12 UTC - in response to Message 55826.

According to the busID and then matching busID to the physical slots by manipulating fan speeds, the 2 GPUs on the bifurcated slot should be BOINC device 3 and 4.
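
With devices 3 and 4 identified, the average runtimes from the earlier table can be split into the bifurcated pair and the rest. This is only a sketch over one week of listed averages (assumed to be in seconds), so small differences are well within normal task variation.

```python
from statistics import mean

# Listed average runtime per BOINC device, from the table earlier in the thread.
AVG_RUNTIME_S = {0: 1890, 1: 1826, 2: 1974, 3: 1973, 4: 1942, 5: 1899, 6: 2053, 7: 1943}
BIFURCATED = {3, 4}  # devices on the split x16 slot, per the post above

split_mean = mean([t for dev, t in AVG_RUNTIME_S.items() if dev in BIFURCATED])
other_mean = mean([t for dev, t in AVG_RUNTIME_S.items() if dev not in BIFURCATED])
difference = split_mean / other_mean - 1

print(f"bifurcated pair: {split_mean:.1f} s, others: {other_mean:.1f} s")
print(f"difference: {difference:+.1%}")
```

The pair on the split slot lands a little over 1% behind the other six cards, which is smaller than the spread among the non-bifurcated cards themselves.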
____________

rod4x4
Message 55845 - Posted: 29 Nov 2020 | 7:39:40 UTC - in response to Message 55844.

At a guess, Asrock seem to have ordered their PCIe slots prioritising the bus order, with the x16 slots first and then the x8 slots:
PCIe1 - dev 0
PCIe3 - dev 1
PCIe5 - dev 2
PCIe7 - dev 3 & 4
PCIe2 - dev 5
PCIe4 - dev 6
PCIe6 - dev 7

Does this line up with what you have seen?

Ian&Steve C.
Message 55846 - Posted: 29 Nov 2020 | 16:44:41 UTC - in response to Message 55845.
Last modified: 29 Nov 2020 | 16:50:22 UTC

nope, not at all. they're all over the place.

first, according to the BIOS and silkscreen printed on the PCB, the slots are ordered 1-7 starting from the bottom of the board, so the slot furthest from the CPU is slot 1, and the one closest to the CPU is slot 7.

then when all slots are populated, the slot that drives the monitor (and hence busID 1) is slot 5, the second x16 slot from the CPU slot.

I haven't checked them out in full detail, I only really looked at which cards were on the bifurcated slot yesterday.

---------
CPU
---------
PCIe7 - dev 3&4
PCIe6 - dev 1
PCIe5 - dev 0
PCIe4
PCIe3
PCIe2
PCIe1

I can change the device order if I manually edit the xorg.conf file but there's really no point since all the cards are identical. i don't want to manually edit the card that drives the monitor to be different than what the BIOS uses for ease of use. if I change it, then the OS and the BIOS will be trying to use different monitors and that's just confusing.

my old supermicro boards were pretty similar to this in behavior with bottom-to-top PCB ordering, and random mid-slot being the prime display slot. but on both boards, if you only have 1 GPU on the board, the display will drive from whatever slot it's plugged in to.
____________

rod4x4
Message 55850 - Posted: 29 Nov 2020 | 23:52:59 UTC - in response to Message 55846.

"nope, not at all. they're all over the place."


Thanks for the update. Interesting that mid slots are picked as the active display on both your server motherboards. Apparently more factors to consider than just a numbering schema.

Skillz
Joined: 6 Jun 17
Posts: 2
Credit: 335,679,782
RAC: 0
Message 57253 - Posted: 22 Aug 2021 | 20:23:26 UTC

Really interested in this setup.

What ribbon risers are you using?

How can you check if your motherboard/BIOS supports Bifurcation? I could easily just get the same board you have, but I am trying to see how cheaply I can build one of these using other platforms such as the Intel X99 platform, since those boards can be had for much less and the CPUs are much cheaper than anything EPYC as well. I don't care about high core count on the CPU, just as long as it's got at least 8 cores so it can keep the GPUs busy.

Ian&Steve C.
Message 57255 - Posted: 26 Aug 2021 | 14:26:04 UTC - in response to Message 57253.

if X99, I would shoot for an 8-core/16-thread part. 8 threads would be a little tight trying for 8+ GPUs. I always try to leave 1-2 CPU threads doing nothing to ensure there are no issues with CPU resources (nvidia GPU apps use 1 CPU thread per GPU task).
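
The thread budget behind that advice can be sketched as follows. It assumes one GPU task per GPU and one CPU thread per GPU task, as described above; the function name is illustrative, and the 48-thread row is the EPYC 7402P from the build in the first post.

```python
# Sketch: CPU threads left over after feeding the GPU tasks.
def spare_threads(cpu_threads, gpu_tasks, threads_per_gpu_task=1):
    """Threads not reserved for GPU tasks (negative means oversubscribed)."""
    return cpu_threads - gpu_tasks * threads_per_gpu_task


GPU_TASKS = 8  # assumption: one task running on each of 8 GPUs

for label, threads in (("8 threads", 8), ("8-core/16-thread", 16), ("24-core/48-thread", 48)):
    print(f"{label}: {spare_threads(threads, GPU_TASKS)} threads spare")
```

An 8-thread CPU leaves nothing spare with 8 GPUs, which is why a 16-thread part is the comfortable minimum here.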

X99 does support bifurcation on certain boards, but support is inconsistent and spotty. I think Asrock has given the most support for it. You can check if the board supports Bifurcation by going into the BIOS and looking for the Bifurcation setting. if it's not there, then it's not supported. this isn't a feature that is popular enough to be advertised. if you don't have the board, you'll have to reach out to the manufacturer or get confirmation from someone else who has one.

I'm using risers from Amazon under the brand name "EZDIY-FAB", but they don't appear to be as widely available anymore. but any PCIe 3.0 x16 rated riser should work well. just search amazon or ebay for them, there are tons.

____________

Skillz
Message 57256 - Posted: 28 Aug 2021 | 16:31:56 UTC

I said 8 cores, not 8 threads. Thanks.

Really wish there was an easier way to determine if the board/BIOS supported bifurcation or not. Think I may just end up getting the same board you have, but going with the 7251 8-core EPYC CPU to help keep the costs low.

Thanks for replying.
