Advanced search

Message boards : Wish list : Correct reporting of coprocessors (GPUs)

Author Message
Profile skgiven
Volunteer moderator
Project tester
Volunteer tester
Avatar
Send message
Joined: 23 Apr 09
Posts: 3968
Credit: 1,834,518,624
RAC: 322,566
Level
His
Scientific publications
watwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwat
Message 32988 - Posted: 16 Sep 2013 | 10:01:36 UTC
Last modified: 16 Sep 2013 | 10:03:25 UTC

When you have more than one GPU in a system but of mixed version (from same company ATI/NVidia) the computer information details are inaccurate. Only one GPU is listed and the second or third cards are only indicated by a number [2].

For example:


In the above case the system has a GTX670 and a GTX650TiBoost, but you cannot tell that there is a 670 in the system.

In another example a system displays,
Coprocessors [2] NVIDIA GeForce GTX 660 (2048MB) driver: 326.41, AMD ATI Radeon HD 5800 series (Cypress) (1024MB) driver: 1.4.1848

It actually has a GTX660Ti and a GTX660, not two GTX660's. Also, there is no mention of the iHD4000 (and its driver which would indicate if its usable).
____________
FAQ's

HOW TO:
- Opt out of Beta Tests
- Ask for Help

Profile MJH
Project administrator
Project developer
Project scientist
Send message
Joined: 12 Nov 07
Posts: 696
Credit: 27,266,655
RAC: 0
Level
Val
Scientific publications
watwat
Message 32990 - Posted: 16 Sep 2013 | 10:37:37 UTC - in response to Message 32988.


When you have more than one GPU in a system but of mixed version (from same company ATI/NVidia) the computer information details are inaccurate. Only one GPU is listed and the second or third cards are only indicated by a number [2].


Yeah, that's pretty annoying. I've not yet turned my attention to website side of things. It's on the list though..

Matt

TJ
Send message
Joined: 26 Jun 09
Posts: 815
Credit: 1,469,215,105
RAC: 0
Level
Met
Scientific publications
watwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwat
Message 32992 - Posted: 16 Sep 2013 | 11:01:16 UTC

This "issue" however is also present at other projects, so it seems to me more a BOINC problem than on project level.
____________
Greetings from TJ

Profile skgiven
Volunteer moderator
Project tester
Volunteer tester
Avatar
Send message
Joined: 23 Apr 09
Posts: 3968
Credit: 1,834,518,624
RAC: 322,566
Level
His
Scientific publications
watwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwat
Message 32996 - Posted: 16 Sep 2013 | 13:15:56 UTC - in response to Message 32992.
Last modified: 16 Sep 2013 | 13:22:04 UTC

Yeah, ideally this would be done by Berkeley, but Matt's GPUGrid app reads the GPU's, knows which GPU a WU is using and now reports this correctly in the stderr output file (on Windows), so in theory it could GPU info to the Computer Information Details under Coprocessor (rather than using what Boinc reports). As long as we have the information we might as well use it to correct what isn't read/reported accurately. Maybe some of the methods and code could be passed to Boinc central for Boinc incorporation, at some stage?
____________
FAQ's

HOW TO:
- Opt out of Beta Tests
- Ask for Help

Profile Beyond
Avatar
Send message
Joined: 23 Nov 08
Posts: 1073
Credit: 4,496,021,954
RAC: 394,105
Level
Arg
Scientific publications
watwatwatwatwatwatwatwatwatwatwatwatwatwatwatwat
Message 32997 - Posted: 16 Sep 2013 | 13:50:00 UTC - in response to Message 32996.

Matt's GPUGrid app reads the GPU's, knows which GPU a WU is using and now reports this correctly in the stderr output file

Then hopefully we can get a fix for sending WUs to GPUs with too little memory to handle them, currently Noelia WUs which need a GPU with 1GB of memory whhile Santi and Nathan WUs run great on 768mb GPUs. It gets old aborting stuck Noelias every day (which also slow other [non-nvidia] processes to a crawl).

Profile MJH
Project administrator
Project developer
Project scientist
Send message
Joined: 12 Nov 07
Posts: 696
Credit: 27,266,655
RAC: 0
Level
Val
Scientific publications
watwat
Message 33005 - Posted: 16 Sep 2013 | 16:45:19 UTC - in response to Message 32992.

Yes - if you can persuade DA to fix it, that'd be great.

MJH

Richard Haselgrove
Send message
Joined: 11 Jul 09
Posts: 788
Credit: 1,420,906,745
RAC: 1,418,697
Level
Met
Scientific publications
watwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwat
Message 33006 - Posted: 16 Sep 2013 | 17:59:25 UTC - in response to Message 33005.

Errors, omissions and bugs excepted, BOINC already reports considerable details of each GPU in a system - have a look at the internals of a sched_request.xml file sometime (the most recent one for each attached project is kept in the root of the BOINC Data directory until overwritten).

If the 6GB of the Titan is still not being detected properly, that might be a 32-bit glitch in the CUDA runtime support in the consumer-grade drivers. We had problems in some cases with Keplers too: BOINC can't afford to purchase samples of every high-end card, and Rom Walton solved that one by remoting in to my GTX 670 (a problem with the high word of a 64-bit variable passed 'by reference' not being initialised in the way Rom expected, IIRC). If you have test Titans in the lab that could be made available to Rom via remote access, that might help fix that bug.

The issue of multiple GPUs in a single host being shown as multiple instances of the best GPU is cosmetic: although full details are passed back, the BOINC database hasn't been adapted to store, and more importantly display, such fine-grained detail.

But I do think there's a fundamental design problem which is going to cause problems here. When a task is allocated to a volunteer's computer, the capability checking is done on the server: questions like "does the host have a >1 GB CUDA card", "does the host have a double-precision ATI card" can be addressed. But - under the present design - those task requirements are not passed back to the host and re-tested locally when the task is scheduled.

Thus, if a particular host has both a 1 GB card and a 768 MB card, there's no way of telling it (after allocation) not to run a 1 GB task on the 768 MB card. That whole problem of heterogeneous resources within a single host is going to require a major re-write, I think.

Profile Beyond
Avatar
Send message
Joined: 23 Nov 08
Posts: 1073
Credit: 4,496,021,954
RAC: 394,105
Level
Arg
Scientific publications
watwatwatwatwatwatwatwatwatwatwatwatwatwatwatwat
Message 33013 - Posted: 16 Sep 2013 | 20:37:14 UTC - in response to Message 33006.

When a task is allocated to a volunteer's computer, the capability checking is done on the server: questions like "does the host have a >1 GB CUDA card", "does the host have a double-precision ATI card" can be addressed. But - under the present design - those task requirements are not passed back to the host and re-tested locally when the task is scheduled.

Thus, if a particular host has both a 1 GB card and a 768 MB card, there's no way of telling it (after allocation) not to run a 1 GB task on the 768 MB card.

However for hosts with one NVidia GPU or multiple GPUs with similar amounts of memory it is currently possible to allocate appropriate sized tasks. Is that correct?

Richard Haselgrove
Send message
Joined: 11 Jul 09
Posts: 788
Credit: 1,420,906,745
RAC: 1,418,697
Level
Met
Scientific publications
watwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwat
Message 33015 - Posted: 16 Sep 2013 | 20:50:50 UTC - in response to Message 33013.

When a task is allocated to a volunteer's computer, the capability checking is done on the server: questions like "does the host have a >1 GB CUDA card", "does the host have a double-precision ATI card" can be addressed. But - under the present design - those task requirements are not passed back to the host and re-tested locally when the task is scheduled.

Thus, if a particular host has both a 1 GB card and a 768 MB card, there's no way of telling it (after allocation) not to run a 1 GB task on the 768 MB card.

However for hosts with one NVidia GPU or multiple GPUs with similar amounts of memory it is currently possible to allocate appropriate sized tasks. Is that correct?

I'm pretty confident it is. If every NVidia card in a system meets, for example, a cuda55_himem plan class requiring >= 1 GB, server and client should work in harmony.

But with mixed card memory sizes, the server would still allocate tasks, and there's nothing the project can do to make them pick the right card. Individual users can set up detailled cc_config options (they'd have to set <use _all_gpus> anyway, so that might be possible). But I fear that some users might not, or might get it wrong. It's not a 'set and forget' scenario, as far as the project is concerned.

Profile skgiven
Volunteer moderator
Project tester
Volunteer tester
Avatar
Send message
Joined: 23 Apr 09
Posts: 3968
Credit: 1,834,518,624
RAC: 322,566
Level
His
Scientific publications
watwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwat
Message 33063 - Posted: 18 Sep 2013 | 17:34:49 UTC - in response to Message 33006.
Last modified: 18 Sep 2013 | 17:59:41 UTC

The issue of multiple GPUs in a single host being shown as multiple instances of the best GPU is cosmetic

Actually, it doesn't report multiple instances of the best GPU

____________
FAQ's

HOW TO:
- Opt out of Beta Tests
- Ask for Help

Post to thread

Message boards : Wish list : Correct reporting of coprocessors (GPUs)