
Message boards : Graphics cards (GPUs) : error after switching from gtx570 to gtx280

Profile JStateson
Joined: 31 Oct 08
Posts: 186
Credit: 3,331,546,800
RAC: 0
Message 21139 - Posted: 2 May 2011 | 19:51:12 UTC
Last modified: 2 May 2011 | 20:01:22 UTC

According to this work unit status, the gtx570 was used for a while then the gtx280. Is that correct? I thought once a task started on a particular GPU it had to finish on that gpu. Device 0 is the 570 and device 1 the 280.

This is a new card that I recently upgraded to and the existing gtx280 has been running fine for a year.

Since the report states "using device 0" and later "using device 1" I assume it switched hardware after a checkpoint or whatever.

[EDIT]

Also, something does not seem correct. The gtx570 is listed as having only 1/2 the number of multiprocessors and cores as the gtx280. That is totally wrong.

Profile skgiven
Volunteer moderator
Volunteer tester
Joined: 23 Apr 09
Posts: 3968
Credit: 1,995,359,260
RAC: 0
Message 21142 - Posted: 2 May 2011 | 21:52:33 UTC - in response to Message 21139.

Yeah, it started on the Fermi, was restarted on the Fermi later on and then tried to run on the GTX280. Looks like it failed when running on the GTX280.

Sometimes tasks complete fine after moving to another GPU, say after a restart. If you suspend, start gaming and then restart, it's chancy.

The core count is just a reporting error: the CUDA core number is read from the driver but is misinterpreted because an old formula is used.

Profile Retvari Zoltan
Joined: 20 Jan 09
Posts: 2343
Credit: 16,201,255,749
RAC: 573
Message 21143 - Posted: 2 May 2011 | 22:08:19 UTC - in response to Message 21139.

According to this work unit status, the gtx570 was used for a while then the gtx280. Is that correct?

It's correct.

I thought once a task started on a particular GPU it had to finish on that gpu.

It's not necessary to finish a task on the same GPU. If a host has multiple GPUs, a task can switch between them at a checkpoint, but this can only be triggered by stopping and restarting the task (manually, by a system restart, or by a BOINC client restart).

Device 0 is the 570 and device 1 the 280.

This is a new card that I recently upgraded to and the existing gtx280 has been running fine for a year.

When you put two GPUs in a single PC instead of one, many things change, and there can be unexpected consequences. The two GPUs generate twice as much heat, so the whole PC runs at higher temperatures, which can make it less stable. Also, the power supply has to be powerful enough for two GPUs (in your case, 800W or more).

Since the report states "using device 0" and later "using device 1" I assume it switched hardware after a checkpoint or whatever.

There were 3 such events for this task: the first and second time it ran on device 0, and the third time it was switched over to device 1. After some kind of restart (as I mentioned above), device 0 was in use by another GPU task (possibly even from another project), which could have triggered this change.

Also, something does not seem correct. The gtx570 is listed as having only 1/2 the number of multiprocessors and cores as the gtx280. That is totally wrong.

The number of multiprocessors is correct, but the number of cores is incorrect for Fermi based cards. It's a known reporting bug of the BOINC client. It assumes that a Streaming Multiprocessor (SM) has 8 CUDA cores, which is true for G80 series and GT200 series GPU based cards, but the architecture changed in the Fermi GPUs to 32 CUDA cores per SM, and the number of SMs was reduced to 16 (GTX580), 15 (GTX570/480), or 14 (GTX470). The BOINC client is not aware of this change and still reports the number of CUDA cores as 8 times the number of multiprocessors, which is incorrect for the Fermi based cards. Regardless of the incorrectly reported number of CUDA cores, all of them are used by the GPUGRID client.
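To make the arithmetic concrete, here is a minimal sketch of the two formulas. It is only an illustration and is not taken from the BOINC source; the function names and the lookup table are hypothetical, and the compute capability values (1.3 for the GTX280, 2.0 for the GTX570) are an assumption not stated in the report.

```python
# Cores per streaming multiprocessor (SM), keyed by compute capability.
# Illustrative table only (hypothetical, not BOINC's actual code).
CORES_PER_SM = {
    (1, 0): 8, (1, 1): 8, (1, 2): 8, (1, 3): 8,  # G80 through GT200
    (2, 0): 32,                                  # Fermi (GTX470/480/570/580)
}


def old_core_count(sm_count):
    """Old formula: always assume 8 CUDA cores per SM."""
    return sm_count * 8


def core_count(sm_count, compute_capability):
    """Look up the cores per SM from the compute capability instead."""
    return sm_count * CORES_PER_SM[compute_capability]


# GTX280: 30 SMs, compute capability 1.3 (assumed)
print(old_core_count(30), core_count(30, (1, 3)))  # 240 240
# GTX570: 15 SMs, compute capability 2.0 (assumed)
print(old_core_count(15), core_count(15, (2, 0)))  # 120 480
```

With the old formula the GTX570 shows 120 cores against 240 for the GTX280, which is the "half as many" figure in the report.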

Richard Haselgrove
Joined: 11 Jul 09
Posts: 1576
Credit: 5,846,961,851
RAC: 10,098,213
Message 21144 - Posted: 2 May 2011 | 22:51:59 UTC - in response to Message 21143.

The number of multiprocessors is correct, but the number of cores is incorrect for Fermi based cards. It's a known reporting bug of the BOINC client. It assumes that a Streaming Multiprocessor (SM) has 8 CUDA cores, which is true for G80 series and GT200 series GPU based cards, but the architecture changed in the Fermi GPUs to 32 CUDA cores per SM, and the number of SMs was reduced to 16 (GTX580), 15 (GTX570/480), or 14 (GTX470). The BOINC client is not aware of this change and still reports the number of CUDA cores as 8 times the number of multiprocessors, which is incorrect for the Fermi based cards. Regardless of the incorrectly reported number of CUDA cores, all of them are used by the GPUGRID client.

We managed to get that one fixed with changeset [21034] a year ago, so the reporting should be correct with clients v6.10.58 and v6.10.60, which have been (BOINC's) recommended versions for a long time now.

The later compute capability 2.1 cards, with 48 cores per multiprocessor, are still misreported, but I think the major blame lies with nVidia for failing to include an API call that would let applications such as BOINC determine the appropriate value at runtime.

Profile skgiven
Volunteer moderator
Volunteer tester
Joined: 23 Apr 09
Posts: 3968
Credit: 1,995,359,260
RAC: 0
Message 21147 - Posted: 3 May 2011 | 13:23:07 UTC - in response to Message 21144.

BeemerBiker is using BOINC client version 6.12.22 on that system. The card is CC2.0, and yet the CUDA core count is still misreported at GPUGrid. Similarly, I have BOINC 6.10.58 and a CC2.0 card (GTX470), and here at GPUGrid the CUDA core count is reported as 112 rather than 448 (Number of cores: 112 - system). So GPUGrid is not reporting the number of CUDA cores correctly; the multiprocessor count is still being multiplied by 8.

On the other hand, BOINC reports the GFLOPS peak correctly:
29/04/2011 20:35:02 NVIDIA GPU 0: GeForce GTX 470 (driver version 27061, CUDA version 4000, compute capability 2.0, 1280MB, 1089 GFLOPS peak)
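As a rough cross-check, the 1089 GFLOPS figure in that log line is what you get from 448 cores, not 112. The sketch below assumes the usual peak estimate of cores × shader clock × 2 floating-point operations per clock, and assumes the GTX470 reference shader clock of 1215 MHz; neither the formula nor the clock is stated in the log, and the function name is hypothetical.

```python
def peak_gflops(cores, shader_clock_mhz, flops_per_core_per_clock=2):
    """Estimate peak single-precision GFLOPS.

    Assumes each core retires one fused multiply-add (2 FLOPs) per clock.
    """
    return cores * shader_clock_mhz * flops_per_core_per_clock / 1000.0


# GTX470 at an assumed reference shader clock of 1215 MHz
print(round(peak_gflops(448, 1215)))  # 1089, matches the BOINC log line
print(round(peak_gflops(112, 1215)))  # 272, what 112 cores would give
```

So the client itself appears to be working with the right core count when it computes the peak figure.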

In fact, BOINC also reports the number of usable shaders correctly at GPUGrid for CC2.1 cards, because it multiplies by 32 rather than 48; at present only 2/3 of the shaders on CC2.1 cards are usable at GPUGrid (32/48 = 2/3).
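A small worked example of that 2/3 ratio, as a sketch only. The card used (a GTX460 with 7 multiprocessors) is my own example and is not mentioned elsewhere in this thread, and the names are hypothetical.

```python
from fractions import Fraction

CORES_PER_SM_CC21 = 48  # physical cores per multiprocessor on CC2.1
CORES_ASSUMED = 32      # multiplier applied when the count is reported


def usable_fraction(assumed=CORES_ASSUMED, actual=CORES_PER_SM_CC21):
    """Fraction of the physical shaders covered by the reported count."""
    return Fraction(assumed, actual)


def shader_counts(sm_count):
    """Return (physical shaders, shaders counted with the x32 multiplier)."""
    return sm_count * CORES_PER_SM_CC21, sm_count * CORES_ASSUMED


print(usable_fraction())  # 2/3
print(shader_counts(7))   # (336, 224) for a 7-multiprocessor card
```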

The reporting of 112 shaders at GPUGrid (using ×8 rather than the correct ×32) is down to the GPUGrid apps. Other projects report the correct number of Fermi shaders.

Of course GPUGrid still uses all the shaders on CC2.0 cards (448 in my case, and not 112).
