Message boards : Graphics cards (GPUs) : error after switching from gtx570 to gtx280
Author | Message |
---|---|
According to this work unit status, the GTX570 was used for a while, then the GTX280. Is that correct? I thought once a task started on a particular GPU it had to finish on that GPU. Device 0 is the 570 and device 1 is the 280. | |
ID: 21139 | Rating: 0 | |
Yeah, it started on the Fermi, was restarted on the Fermi later on and then tried to run on the GTX280. Looks like it failed when running on the GTX280. | |
ID: 21142 | Rating: 0 | |
"According to this work unit status, the GTX570 was used for a while, then the GTX280. Is that correct?"
It's correct.
"I thought once a task started on a particular GPU it had to finish on that GPU."
A task does not have to finish on the same GPU. If a host has multiple GPUs, a task can switch between them at a checkpoint, but only when the task is stopped and restarted (manually, by a system restart, or by a BOINC client restart).
"Device 0 is the 570 and device 1 the 280."
When you put two GPUs in a single PC instead of one, many things change, and there can be unexpected consequences. Two GPUs generate twice as much heat, so the whole PC runs at higher temperatures, which can make it less stable. The power supply also has to be powerful enough for two GPUs (in your case: 800W or more).
"Since the report states 'using device 0' and later 'using device 1' I assume it switched hardware after a checkpoint or whatever."
There were 3 such events for this task: the first and second times it ran on device 0, and the third time it was switched over to device 1. After some kind of restart (as mentioned above), device 0 was in use by another GPU task (even one from another project), which could have triggered this change.
"Also, something does not seem correct. The GTX570 is listed as having only 1/2 the number of multiprocessors and cores as the GTX280. That is totally wrong."
The number of multiprocessors is correct, but the number of cores is incorrect for Fermi-based cards. It's a known reporting bug of the BOINC client. It assumes that a Streaming Multiprocessor (SM) has 8 CUDA cores, which is true for G80 and G200 series cards, but the Fermi architecture changed this to 32 CUDA cores per SM and reduced the number of SMs to 16 (GTX580), 15 (GTX570/480) or 14 (GTX470).
The BOINC client is not aware of this change and still reports the number of CUDA cores as 8 times the number of multiprocessors, which is incorrect for Fermi-based cards. Regardless of the incorrectly reported number of CUDA cores, the GPUGRID client uses all of them. | |
ID: 21143 | Rating: 0 | |
"The number of multiprocessors is correct, but the number of cores is incorrect for Fermi-based cards. It's a known reporting bug of the BOINC client. It assumes that a Streaming Multiprocessor (SM) has 8 CUDA cores, which is true for G80 and G200 series cards, but the Fermi architecture changed this to 32 CUDA cores per SM and reduced the number of SMs to 16 (GTX580), 15 (GTX570/480) or 14 (GTX470). The BOINC client is not aware of this change and still reports the number of CUDA cores as 8 times the number of multiprocessors, which is incorrect for Fermi-based cards. Regardless of the incorrectly reported number of CUDA cores, the GPUGRID client uses all of them."
We managed to get that one fixed with changeset [21034] a year ago, so the reporting should be correct with clients v6.10.58 and v6.10.60, which have been BOINC's recommended versions for a long time now. What is still misreported is the core count of the later compute capability 2.1 cards, which have 48 cores per multiprocessor, but I think the major blame lies with nVidia for failing to include an API call that would let applications such as BOINC determine the appropriate value at runtime. | |
ID: 21144 | Rating: 0 | |
BeemerBiker is using BOINC client version 6.12.22 on that system. The card is CC2.0, and yet the CUDA core count is still misreported at GPUGRID. Similarly, I have BOINC 6.10.58 and a CC2.0 card (GTX470), and here at GPUGRID the task output shows "Number of cores: 112" rather than 448. So GPUGRID is not reporting the number of CUDA cores correctly; the reported value is still the number of multiprocessors multiplied by 8. | |
ID: 21147 | Rating: 0 | |
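
The arithmetic behind the misreported numbers in this thread can be written out as a short Python sketch. It is only an illustration, not code from BOINC or GPUGRID: the function and table names are hypothetical, and the cores-per-SM values are the ones quoted in the posts above (8 for G80/G200, which are compute capability 1.x; 32 for CC 2.0; 48 for CC 2.1).

```python
# Illustrative sketch only; names are hypothetical, not BOINC or GPUGRID code.
# Cores per streaming multiprocessor (SM), keyed by compute capability (major, minor).
CORES_PER_SM = {
    (1, 0): 8, (1, 1): 8, (1, 2): 8, (1, 3): 8,  # G80 / G200 series
    (2, 0): 32,                                   # Fermi, e.g. GTX470/480/570/580
    (2, 1): 48,                                   # later Fermi cards
}

# The fixed assumption described in the thread: 8 cores per SM for every card.
LEGACY_CORES_PER_SM = 8


def legacy_cuda_cores(sm_count):
    """Core count as misreported: always 8 cores per SM."""
    return sm_count * LEGACY_CORES_PER_SM


def cuda_cores(sm_count, compute_capability):
    """Core count using a per-architecture cores-per-SM lookup."""
    if compute_capability not in CORES_PER_SM:
        raise ValueError(f"unknown compute capability: {compute_capability}")
    return sm_count * CORES_PER_SM[compute_capability]


if __name__ == "__main__":
    # SM counts as given in the thread.
    for name, sms in [("GTX470", 14), ("GTX570", 15), ("GTX580", 16)]:
        print(name, "reported:", legacy_cuda_cores(sms),
              "actual:", cuda_cores(sms, (2, 0)))
```

For the GTX470 this gives 14 × 8 = 112 under the fixed assumption and 14 × 32 = 448 with the lookup, matching the two figures in the last post. A lookup table like this has to be extended by hand for each new architecture, which is the limitation raised above about the lack of a runtime API call.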