Message boards : Graphics cards (GPUs) : Linux - 2 GPUs - 2nd one slow

Author Message
Profile [AF>Libristes] Dudumomo
Joined: 30 Jan 09
Posts: 45
Credit: 425,620,748
RAC: 0
Message 24452 - Posted: 17 Apr 2012 | 11:41:19 UTC

Hi there,

I've added a second GTX 570 to my machine, but for some unknown reason this second GPU is very slow compared to the first one.

For example, an I13R7-NATHAN_CB1 task will take 8h35 on the first GPU but 12h31 on the [url=http://www.gpugrid.net/result.php?resultid=5251012]second GPU[/url]...

BOINC recognizes both GPUs correctly, and I have a cc_config.xml that includes:

<cc_config>
  <options>
    <report_results_immediately>1</report_results_immediately>
    <use_all_gpus>1</use_all_gpus>
  </options>
</cc_config>


along with the SWAN_SYNC=0 environment variable (working: I do have two full CPU cores occupied by the GPU tasks).
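For reference, a minimal sketch of one way SWAN_SYNC can be exported so the client actually sees it (this assumes the client is started by hand from a shell; if it runs as the Debian boinc-client service, the variable has to be set in that service's environment instead, e.g. via /etc/default/boinc-client):

export SWAN_SYNC=0        # make the variable part of this shell's environment
boinc --daemon            # start the client from the same shell so it inherits it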

I'm running Debian stable with the NVIDIA 285.05.09 driver and BOINC 6.12.34.


And I have no idea why the second GPU takes longer than the first one.
(Temperatures are also much lower on the second card.)

Any idea where to look?
____________
MyUneo, the Cupid of Services

Profile Zydor
Joined: 8 Feb 09
Posts: 252
Credit: 1,309,451
RAC: 0
Message 24456 - Posted: 17 Apr 2012 | 18:41:58 UTC
Last modified: 17 Apr 2012 | 18:58:16 UTC

(Temperatures are also much lower on the second card.)

Have you got SLI enabled? If so, disable it and see how it goes. SLI won't account for a 50% difference, but it won't help either.

What's the temperature difference between the two?

Another thing you could do to help pin down the actual issue, as long as it doesn't cause undue hassle in terms of hardware swapping, is to swap the cards between the slots and see whether the (now) "second" card (i.e. the original slot-zero card) behaves the same way, with reduced output.

EDIT: If it does behave the same, unload and reload the driver (keeping SLI disabled).
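A rough sketch of what that reload looks like on Linux (the service and display-manager names are assumptions for a Debian box - adjust for your install - and nothing may be using the GPUs while the module is out):

sudo /etc/init.d/boinc-client stop   # stop BOINC so no CUDA app holds the GPUs
sudo /etc/init.d/gdm stop            # stop X / the display manager
sudo rmmod nvidia                    # unload the NVIDIA kernel module
sudo modprobe nvidia                 # load a fresh copy
sudo /etc/init.d/gdm start
sudo /etc/init.d/boinc-client start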

Regards
Zy

Profile [AF>Libristes] Dudumomo
Joined: 30 Jan 09
Posts: 45
Credit: 425,620,748
RAC: 0
Message 24466 - Posted: 18 Apr 2012 | 0:50:44 UTC - in response to Message 24456.

Hi Zydor.

- No SLI enabled (and I didn't find any such option in the BIOS)
- A 25-degree difference between the two cards

and I've updated my NVIDIA drivers to version 295.

No difference.

I will try swapping the cards, but I guess it will be the same thing... the second one will be slower.

Any other idea?
____________
MyUneo, the Cupid of Services

5pot
Joined: 8 Mar 12
Posts: 411
Credit: 2,083,882,218
RAC: 0
Message 24469 - Posted: 18 Apr 2012 | 3:09:58 UTC

If you could post your FULL system specs, it would help out immensely: mobo, PSU, CPU, everything.

Profile dskagcommunity
Joined: 28 Apr 11
Posts: 456
Credit: 817,865,789
RAC: 0
Message 24472 - Posted: 18 Apr 2012 | 7:05:26 UTC

The second card is cooler because it has less to do, it seems. What does GPU-Z say?
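(GPU-Z is Windows-only, but on Linux the nvidia-settings command line can report similar per-GPU figures. A rough sketch - the attribute names are taken from current NVIDIA drivers and may not all exist on older ones, and a running X session is needed:)

nvidia-settings -q "[gpu:0]/GPUCurrentClockFreqs" -q "[gpu:1]/GPUCurrentClockFreqs"   # core,memory clocks
nvidia-settings -q "[gpu:0]/GPUCoreTemp"          -q "[gpu:1]/GPUCoreTemp"            # temperatures
nvidia-settings -q "[gpu:0]/GPUUtilization"       -q "[gpu:1]/GPUUtilization"         # load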
____________
DSKAG Austria Research Team: http://www.research.dskag.at



Profile Zydor
Joined: 8 Feb 09
Posts: 252
Credit: 1,309,451
RAC: 0
Message 24477 - Posted: 18 Apr 2012 | 10:39:18 UTC
Last modified: 18 Apr 2012 | 10:39:54 UTC

I will try swapping the cards, but I guess it will be the same thing... the second one will be slower.

Any other idea?

The point of the card swap is to verify the hardware element. If, as is likely, it's the same, the driver is next on the list of suspects. You could always go straight for the driver - Hobson's choice, but that order of activity kills many birds with one stone.

The driver is often the culprit in the end - a bad install, corruption, yada yada. But until the process has been gone through carefully, you can never be sure. Adding hardware often causes dramas, and it usually comes back to how the driver reacted or was installed after the new hardware went in.
Not a silver bullet, but it often happens that way.

Regards
Zy

Profile [AF>Libristes] Dudumomo
Joined: 30 Jan 09
Posts: 45
Credit: 425,620,748
RAC: 0
Message 24479 - Posted: 18 Apr 2012 | 14:09:40 UTC - in response to Message 24477.

No difference when swapping the cards.

And as I said, updating the driver didn't change anything.
Could it come from the Xorg settings or something like that?

Actually... it has always been an issue. When I had two GTS 250s in another computer, the second one was slower. Funny, after two years, still the same issue...

I could try a more recent GNU/Linux version (like Sabayon/Fedora/Ubuntu), but that doesn't seem like the right approach to me.
____________
MyUneo, the Cupid of Services

Profile skgiven
Volunteer moderator
Volunteer tester
Joined: 23 Apr 09
Posts: 3968
Credit: 1,995,359,260
RAC: 0
Message 24481 - Posted: 18 Apr 2012 | 21:47:02 UTC - in response to Message 24479.
Last modified: 18 Apr 2012 | 21:51:01 UTC

The second card is probably downclocking.
Have you tried configuring NVIDIA X Server Settings: PowerMizer, Prefer Maximum Performance?
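A sketch of forcing that from the command line, assuming the driver exposes the GPUPowerMizerMode attribute (1 = Prefer Maximum Performance) and an X server is running:

nvidia-settings -a "[gpu:0]/GPUPowerMizerMode=1"
nvidia-settings -a "[gpu:1]/GPUPowerMizerMode=1"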
____________
FAQ's

HOW TO:
- Opt out of Beta Tests
- Ask for Help

Profile [AF>Libristes] Dudumomo
Joined: 30 Jan 09
Posts: 45
Credit: 425,620,748
RAC: 0
Message 24486 - Posted: 19 Apr 2012 | 14:31:03 UTC - in response to Message 24481.

Both cards are on Maximum performance...
____________
MyUneo, the Cupid of Services

Profile skgiven
Volunteer moderator
Volunteer tester
Joined: 23 Apr 09
Posts: 3968
Credit: 1,995,359,260
RAC: 0
Message 24488 - Posted: 19 Apr 2012 | 21:19:37 UTC - in response to Message 24486.
Last modified: 19 Apr 2012 | 21:20:13 UTC

I can see you have an i7-2600K, but I don't know what motherboard it is.
This might be a PCIE issue; PCIEx16 for GPU0 + PCIEx4 for GPU1, for example.

When you swapped the cards, which one was slower afterwards - the same card, or the other one (i.e. the same slot)?
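If it helps, the negotiated PCIE link width of each card can be read directly under Linux - a sketch (10de is NVIDIA's PCI vendor ID; LnkCap shows what the slot/card is capable of, LnkSta what was actually negotiated):

sudo lspci -vv -d 10de: | grep -E "VGA|LnkCap:|LnkSta:"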
____________
FAQ's

HOW TO:
- Opt out of Beta Tests
- Ask for Help

5pot
Joined: 8 Mar 12
Posts: 411
Credit: 2,083,882,218
RAC: 0
Message 24489 - Posted: 19 Apr 2012 | 21:27:41 UTC
Last modified: 19 Apr 2012 | 22:02:33 UTC

Same reason why I asked for specs.

EDIT: Well, scratch that actually. Although I first thought PCIE (which it still could be), it couldn't be x16 + x4, since the i7-2600K only supports 16 lanes and that would need 20. For the numbers to work out that way it would have to be something like x12 + x4, which isn't possible if I'm not mistaken.

EDIT2: It SHOULD be x8 + x8. Do you have any other PCIE devices plugged in, such as network cards, sound cards, etc.?

Profile [AF>Libristes] Dudumomo
Joined: 30 Jan 09
Posts: 45
Credit: 425,620,748
RAC: 0
Message 24491 - Posted: 20 Apr 2012 | 11:20:01 UTC - in response to Message 24489.

My motherboard is a Gigabyte GA-PH67A-D3-B3 and I don't have any other PCI slots in use.

Thanks for your help!
____________
MyUneo, the Cupid of Services

5pot
Joined: 8 Mar 12
Posts: 411
Credit: 2,083,882,218
RAC: 0
Message 24492 - Posted: 20 Apr 2012 | 14:06:08 UTC

I believe it is a PCIE slot issue based on your mobo specs:

1. 1 x PCI Express x16 slot, running at x16 (PCIEX16)
   * For optimum performance, if only one PCI Express graphics card is to be installed, be sure to install it in the PCIEX16 slot.
2. 1 x PCI Express x16 slot, running at x4 (PCIEX4)
   * When the PCIEX1_2 or PCIEX1_3 slot is populated with an expansion card, the PCIEX4 slot will operate at up to x1 mode.


I am personally just learning about lanes and PCIE issues with CPUs and motherboards, but from what I can tell, your first slot has the capability to run at x16 and your second one at x4.

However, since the second slot cannot run at x8, your GPUs cannot be evenly matched at x8 and x8. Because an 1155 CPU can only drive 16 lanes at a time, you are most likely running at x8 and x4, since your second slot cannot reach the x8 that your CPU could otherwise supply.

This would be my best guess. As stated above, with two cards installed I WOULD ASSUME that your CPU tries to go x8 + x8, but your mobo does not allow it, so it falls back to its own setting of x8 + x4.

Profile skgiven
Volunteer moderator
Volunteer tester
Joined: 23 Apr 09
Posts: 3968
Credit: 1,995,359,260
RAC: 0
Message 24515 - Posted: 21 Apr 2012 | 18:22:52 UTC - in response to Message 24492.

Dudumomo, in your original post the CPU and run times for GPU1 were a bit longer than for the GPU0 task you linked to (but not by 50%):
http://www.gpugrid.net/result.php?resultid=5250806 Run time 15,131.64
http://www.gpugrid.net/result.php?resultid=5251012 Run time 13,684.09
That's an 11% difference, with runtimes of ~4h (not 8.5h vs 12.5h)!

These were NATHAN_CB1 tasks. While your second GPU is 6% slower than the first*, that would not quite account for such a performance disparity.
Recent IBUCH_xTRYP tasks seem to complete within 6% of each other on either GPU (~3%), but recent NATHAN_FAX4 tasks seem to vary by ~18% between your cards.
PAOLA_HGA tasks were only about 4% out between the cards.

Presuming you haven't changed anything(?), these differences (and the lack of them) may reflect the unequal demands the different task types place on your system's architecture. The task-type performance variation could be down to PCIE speed differences, with some tasks being more susceptible to PCIE shortcomings than others. It's likely that the first slot runs at X16 and the second at X4, but I guess it's possible that the PCIE bandwidth could also vary with the way the system (Linux) handles different tasks. What one GPU is doing (especially GPU0) could therefore impact the performance of the other.

*The second GPU is 6% slower:
# Device 0: "GeForce GTX 570"
# Clock rate: 1.57 GHz
# Total amount of global memory: 1341325312 bytes
# Number of multiprocessors: 15
# Number of cores: 120
# Device 1: "GeForce GTX 570"
# Clock rate: 1.48 GHz
____________
FAQ's

HOW TO:
- Opt out of Beta Tests
- Ask for Help

5pot
Joined: 8 Mar 12
Posts: 411
Credit: 2,083,882,218
RAC: 0
Message 24517 - Posted: 21 Apr 2012 | 18:45:58 UTC

Skgiven: what am I missing here in regard to the PCIE lanes? If 1155 chips can only operate 16 lanes, then how could his cards possibly run at x16 + x4? This does not make sense to me. It would be impossible(?) to run at x16 + x4 on an 1155. Right?

On the other note: yup, different clock speeds and tasks will definitely (probably) account for that discrepancy. lol, guess we should have looked at that first!! ;)

Again, what am I not understanding about PCIE lanes?

Profile skgiven
Volunteer moderator
Volunteer tester
Joined: 23 Apr 09
Posts: 3968
Credit: 1,995,359,260
RAC: 0
Message 24520 - Posted: 21 Apr 2012 | 23:30:11 UTC - in response to Message 24517.
Last modified: 21 Apr 2012 | 23:42:35 UTC

Off the top of my head it's 32 or 40 lanes for LGA1155, but I would need to check.
I might be confusing PCIE3 implementations with PCIE2.
____________
FAQ's

HOW TO:
- Opt out of Beta Tests
- Ask for Help

5pot
Joined: 8 Mar 12
Posts: 411
Credit: 2,083,882,218
RAC: 0
Message 24522 - Posted: 21 Apr 2012 | 23:51:27 UTC
Last modified: 21 Apr 2012 | 23:52:32 UTC

http://www.enthusiastpc.net/articles/00003/2.aspx - from the way I read this, it's 16 lanes unless the motherboard has a special chipset. Is this correct? I really think SB and IB only support 16 natively.

Profile [AF>Libristes] Dudumomo
Joined: 30 Jan 09
Posts: 45
Credit: 425,620,748
RAC: 0
Message 24549 - Posted: 23 Apr 2012 | 0:39:40 UTC - in response to Message 24522.

Ah...
In fact I think I've been confused by this:

# Approximate elapsed time for entire WU: 13691.716 s
08:35:49 (12888): called boinc_finish

08:35:49 is not the runtime - it's the wall-clock time when boinc_finish was called; the actual runtime is the 13,691.716 s above (about 3.8 h)...

I see now why, in terms of points, I don't see much difference.

So all good in fact; I might lose 5% somewhere, but no harm done.
____________
MyUneo, the Cupid of Services

Profile skgiven
Volunteer moderator
Volunteer tester
Joined: 23 Apr 09
Posts: 3968
Credit: 1,995,359,260
RAC: 0
Message 24550 - Posted: 23 Apr 2012 | 1:53:56 UTC - in response to Message 24522.
Last modified: 23 Apr 2012 | 13:39:18 UTC

I thought LGA1155 had 32 PCIE lanes altogether and LGA2011 40 in total, and that PCIE supports SATA, USB, Ethernet (host bridge)... as well as the PCIEx16/8 slots, but it looks like it's 16 lanes for SB. As SB took the Northbridge away and brought PCIE lane control onto the CPU, the control and availability of lanes for GPU slots is very limited.
Manufacturers can implement their own setups, and you will find many variations of similar motherboards, primarily based on what these channels support. However, Intel has taken it upon itself to limit and control this freedom.
AMD systems are all limited to PCIE2.x and all use a chipset to control PCIE.
Quite often a primary PCIE supports up to two slots and a secondary PCIE supports one slot. Typically this is 16X or 8X+8X for the primary and 8X or 4X for the secondary.
As you may know, there are a few 1155 motherboards that 'sort of' support two PCIE2 X16 slots (NF200), and my MSI board supports PCIE3 (sort of), so there is huge chipset variation. Just how effective these bespoke implementations are is very much open to debate. My take is that Intel has severely hampered progress in an attempt to assert further control over chipset manufacturers, if not destroy them. My MSI board can support one PCIE3 slot at X16 or two at X8 (two PCIE3 slots at X8 are the same as two PCIE2 slots at X16, in theory), but it appears the CPU cannot! I get the impression that motherboard manufacturers are saying: this is what you could have had, if you had just been reasonable. I would need to replace my i7-2600K with an IB processor to get PCIE3.
For upgrading it's important to note that if you get a PCIE3 motherboard and have two GPUs (one PCIE2.x-capable and the other PCIE3-capable), the system can only operate at PCIE2.x (not PCIE3) - it's all or nothing.
____________
FAQ's

HOW TO:
- Opt out of Beta Tests
- Ask for Help

Profile [AF>Libristes] Dudumomo
Joined: 30 Jan 09
Posts: 45
Credit: 425,620,748
RAC: 0
Message 24557 - Posted: 23 Apr 2012 | 14:36:24 UTC - in response to Message 24550.

Thanks skgiven for this extended explanation!
____________
MyUneo, the Cupid of Services

5pot
Joined: 8 Mar 12
Posts: 411
Credit: 2,083,882,218
RAC: 0
Message 24558 - Posted: 23 Apr 2012 | 15:18:28 UTC

Just to further illustrate Intel's dominance: PCIE 3.0 has recently been disabled in the 680 drivers for the SB-E X79 platform (apart from the driver that came with the device, like 300.86). This is because when X79 and the SB-E chips came out, there were no PCIE 3 devices to test them with. If you look at the 2011 CPUs, they say PCIE 2.0 with the ability to run at 8 GT/s, since Intel could not validate PCIE 3 compatibility. So NVIDIA disabled PCIE 3 in their latest drivers for the 680, since Intel says nothing is "validated".

Now NVIDIA is having to certify motherboards as working with PCIE 3.0, even though they worked when they first came out. People have done registry edits and then shown, using GPU-Z, that PCIE 3.0 is indeed working.

It's all because of Intel saying that everything needs to be certified and validated.

Of course, their motherboards (X79) advertise PCIE 3.0 capability, yet their chips say 8 GT/s. All "stupid" semantics as far as I'm concerned.

The AMD 7000 series GPUs are of course showing PCIE 3 on X79, but that's probably because AMD doesn't have to listen to, nor care about, Intel as much, IMHO.
