Message boards : Graphics cards (GPUs) : hi everyone

MatthiasLeimbach
Joined: 18 Mar 09
Posts: 7
Credit: 1,425,418,423
RAC: 919,345
Message 59603 - Posted: 29 Nov 2022 | 17:00:30 UTC

Why is using 2 NVIDIA GTX 1080 cards a problem? I can only compute 1 WU even when 2 WUs are sent; Python is using more than 30% of the 3900X processor's capacity.

If I leave 2 WUs running, one of them "hangs" at 4%.

cc_config is set to:

<cc_config>
<options>
<use_all_gpus>1</use_all_gpus>
</options>
</cc_config>



Keith Myers
Joined: 13 Dec 17
Posts: 1289
Credit: 5,177,581,959
RAC: 10,381,396
Message 59604 - Posted: 29 Nov 2022 | 17:54:12 UTC - in response to Message 59603.

It's because these tasks are primarily CPU tasks, with small, infrequent bursts of GPU activity.

Your tasks fail because you are using Windows, which has limitations in how it handles virtual memory.

They fail with this error message:

DefaultCPUAllocator: not enough memory: you tried to allocate 3612672 bytes.

Increase your paging file to around 60 GB, and you should be able to process two tasks concurrently.

They will use almost all of your CPU.

Please read through the main thread for these tasks to see why.

https://www.gpugrid.net/forum_thread.php?id=5233

MatthiasLeimbach
Joined: 18 Mar 09
Posts: 7
Credit: 1,425,418,423
RAC: 919,345
Message 59605 - Posted: 30 Nov 2022 | 12:04:48 UTC - in response to Message 59604.

Hi Keith,

Thanks for your reply. I changed the page file in Windows 11 from automatic to 60000 MB. At first, computing on both GPUs went fine, but then both tasks crashed after 4% progress.

32 GB of RAM is available.

I noticed that Windows does not allocate the full 60000 MB after being told to do so; the allocation seems to vary.

Now, as before, I run 1 task and stop the second one when it tries to start, before it reaches 4%.

All tasks complete successfully when running 1 task at a time.

Keith Myers
Joined: 13 Dec 17
Posts: 1289
Credit: 5,177,581,959
RAC: 10,381,396
Message 59606 - Posted: 30 Nov 2022 | 18:18:07 UTC
Last modified: 30 Nov 2022 | 19:12:31 UTC

The 3900X host is still erroring out with not enough memory in the stderr.txt outputs.

You need to bump the pagefile up some more. Try 100000 MB. I'm assuming your drive actually has enough free space for a file of that size.

I don't know much about Windows, but maybe you need to restart Windows for the paging file change to take effect.
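
The free-space assumption above can be checked before resizing the page file. This is only a minimal sketch using Python's standard `shutil.disk_usage`; the function name `pagefile_fits` and the `headroom_mb` parameter are made up for illustration and are not part of BOINC or Windows.

```python
import shutil


def pagefile_fits(drive: str, pagefile_mb: int, headroom_mb: int = 0) -> bool:
    """Return True if `drive` has at least pagefile_mb + headroom_mb free.

    `drive` is any path on the volume to check. `headroom_mb` is a
    hypothetical safety margin for everything else stored on the drive.
    """
    free_mb = shutil.disk_usage(drive).free // (1024 * 1024)
    return free_mb >= pagefile_mb + headroom_mb


# "." checks the volume holding the current directory; on Windows you
# would pass the drive that holds the page file, for example "C:\\".
print(pagefile_fits(".", 100_000))
```

If this prints `False`, a 100000 MB page file cannot be fully allocated on that volume, whatever size is entered in the Windows dialog.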

MatthiasLeimbach
Joined: 18 Mar 09
Posts: 7
Credit: 1,425,418,423
RAC: 919,345
Message 59608 - Posted: 2 Dec 2022 | 11:55:29 UTC - in response to Message 59606.


Keith,

I now have 2 page files of 100000 MB and watched 2 WUs start in Task Manager. Both run Python at about 20% processor capacity until one WU disappears, having stopped at 4% progress.

I cannot find the stderr.txt output in Windows 11.

Keith Myers
Joined: 13 Dec 17
Posts: 1289
Credit: 5,177,581,959
RAC: 10,381,396
Message 59609 - Posted: 2 Dec 2022 | 19:55:02 UTC - in response to Message 59608.


The stderr.txt output is the result file listed on every returned task on the website. You can examine every task in your browser here.

Just click on the task detail number in the left-most column.

For example your latest errored task:

https://www.gpugrid.net/result.php?resultid=33155788

This looks like a bad task, however: it failed first because it couldn't load all of its required files, and then failed later, as usual, because of insufficient virtual memory.

Error loading "C:\ProgramData\BOINC\slots\39\lib\site-packages\torch\lib\shm.dll" or one of its dependencies

DefaultCPUAllocator: not enough memory: you tried to allocate 3612672 bytes.

Maybe some Windows user can help further. I am out of suggestions. In the past, when I explained to other Windows users why these tasks are troublesome on Windows and suggested increasing the pagefile size, they succeeded.

I suggest returning to the main thread I linked and reading through it, especially the posts from other Windows users, to glean other pertinent information.

jjch
Joined: 10 Nov 13
Posts: 98
Credit: 15,309,525,388
RAC: 1,651,662
Message 59610 - Posted: 4 Dec 2022 | 2:08:46 UTC
Last modified: 4 Dec 2022 | 2:11:57 UTC

Matthias,

First, simplify your troubleshooting. Configure only one Python task to run and see how that works. Monitor that single task to see what the sizing looks like. Once it works reliably, try adding the second one. If you are running other projects alongside GPUGRID, stop them and clear them off first.

Second, you probably don't need 2 page files; that could actually be complicating things. Remove the second one and set up a single page file on your primary OS disk. Select Custom size and set the Initial and Maximum size. For example, with one Python task running, mine is set to 24576 and 51200 MB. You can also see how much is currently allocated, which helps to find out where your resources are limited. Mine currently says 48535 MB with one task running.

The stderr.txt files are located in the slots directory inside your BOINC program data folder. To find the slot folder for the GPUGRID task, view the task's Properties. Open that folder and you will find the file. Look at it while a job is running and see what it says. When a job is running correctly, it should say "Created Learner" and stay there for several hours until the job finishes or fails.
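
If opening each slot folder by hand gets tedious, the check above could be scripted. The following is a rough sketch, not a BOINC tool: it assumes the slots directory is `C:\ProgramData\BOINC\slots` (the location shown in the error earlier in this thread), and it only looks for the two strings quoted in this thread. The function names are made up for illustration.

```python
from pathlib import Path

# Marker strings quoted in this thread; anything else is reported as "unknown".
OOM_MARKER = "not enough memory"
RUNNING_MARKER = "Created Learner"


def classify_stderr(text: str) -> str:
    """Classify the contents of one stderr.txt file."""
    if OOM_MARKER in text:
        return "out of memory"
    if RUNNING_MARKER in text:
        return "running"
    return "unknown"


def scan_slots(slots_dir: Path) -> dict[str, str]:
    """Map each slot folder name to the status of its stderr.txt, if present."""
    results = {}
    for stderr_file in sorted(slots_dir.glob("*/stderr.txt")):
        text = stderr_file.read_text(encoding="utf-8", errors="replace")
        results[stderr_file.parent.name] = classify_stderr(text)
    return results


if __name__ == "__main__":
    # Assumed default Windows location; adjust if your data folder differs.
    for slot, status in scan_slots(Path(r"C:\ProgramData\BOINC\slots")).items():
        print(f"slot {slot}: {status}")
```

The memory error is checked first because a failed task may already have printed "Created Learner" before it ran out of memory.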

BOINC can also be a little touchy when it comes to how much disk space and memory it is allocating. This could actually be related to your problem. The default settings don't always work right for what you need. First look at the Disk tab and see what the Total disk usage looks like. Pay attention to the free, available to BOINC size. Then look at what GPUgrid is using. If you are running other projects you will need to compare the total size to what is available and make sure it is enough for everything.

You can change these settings under Options > Computing preferences > Disk and memory tab. In the Disk section, check whether BOINC is being given enough disk space. You might only need 3-4 GB more, so adjust as needed. You can also lock it to a fixed size if you prefer. In the Memory section, the "When computer is (not) in use ..." values might need to be increased a bit. Make sure the Page/swap file setting is 100.00%.
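
When these preferences are set locally, BOINC stores them in a `global_prefs_override.xml` file in its data folder. As a sketch only, the memory-related values could be read back like this. The element names below are my assumption of how the override file names the Page/swap and memory settings, so check them against your own file, and the sample values are invented for illustration.

```python
import xml.etree.ElementTree as ET

# Element names assumed to match BOINC's global_prefs_override.xml;
# verify against your own file before relying on them.
MEMORY_TAGS = ("vm_max_used_pct", "ram_max_used_busy_pct", "ram_max_used_idle_pct")


def read_memory_prefs(xml_text: str) -> dict[str, float]:
    """Return the memory-related percentages found in an override file."""
    root = ET.fromstring(xml_text)
    prefs = {}
    for tag in MEMORY_TAGS:
        element = root.find(tag)
        if element is not None and element.text:
            prefs[tag] = float(element.text)
    return prefs


# Hypothetical file contents, not taken from any real host.
SAMPLE = """<global_preferences>
  <vm_max_used_pct>100.000000</vm_max_used_pct>
  <ram_max_used_busy_pct>50.000000</ram_max_used_busy_pct>
  <ram_max_used_idle_pct>90.000000</ram_max_used_idle_pct>
</global_preferences>"""

print(read_memory_prefs(SAMPLE))
```

Settings that are missing from the file are simply left out of the result, which usually means the web or default preference applies instead.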

Final thoughts: I don't know how well Win11 works for GPUGRID yet; there could be other issues there. I recommend that you tune the system up as best you can: check for Windows updates, update GPU drivers, clean up disk space, etc. Also, don't run a lot of other programs while you are running GPUGRID, as they could compete for resources. GLHF

gemini8
Joined: 3 Jul 16
Posts: 31
Credit: 1,306,000,176
RAC: 4,295,117
Message 59611 - Posted: 5 Dec 2022 | 10:04:52 UTC - in response to Message 59610.

[...]
Select Custom size and set the Initial and Max size. For example with one Python task running mine is set 24576 and 51200.
[...]
Make sure the Page/swap file setting is 100.00%

I'd go for a fixed-size page file.
Just set both the initial AND max size to 51200 or whatever. That way the space is always reserved, and Windows never has to add more space fast enough to keep up.

My Page/swap file setting is 1% on all my machines, including my Debian, Mac OS, Ubuntu and Win7 crunchers. No problems with that so far.

I don't know much about fragmentation on recent Windows machines. My Macs defragment themselves quite nicely, so Windows might do the same nowadays, and the following might no longer be necessary: if the page file is on a rotating disk, try disabling it, restarting, defragmenting your drive, and then re-enabling the page file at the size you want.
____________
Greetings, Jens

MatthiasLeimbach
Joined: 18 Mar 09
Posts: 7
Credit: 1,425,418,423
RAC: 919,345
Message 59612 - Posted: 5 Dec 2022 | 18:41:06 UTC


Hi Keith, jjch and gemini8,

The funny, mysterious thing is that it works today for both WUs, with no crash.

The paging file is now 81845 MB, as allocated by Windows 11 (I fixed the size, but Windows ignores it).

jjch, GLHF is funny. I looked it up: good luck, have fun. Thanks for sharing.

I have had fun running BOINC for years now, and I need to keep buying new hardware (faster and with more cores).

We could/should ask some people to return to contributing to GPUGRID. I don't know why they stopped (Python?).

Thanks again, all ~ Matthias-Poortvliet-Netherlands



Keith Myers
Joined: 13 Dec 17
Posts: 1289
Credit: 5,177,581,959
RAC: 10,381,396
Message 59613 - Posted: 6 Dec 2022 | 3:33:49 UTC

I just became a pioneer with arrows in my back.

Just upgraded one host to a new AM5 platform with a 7950X CPU and DDR5-6000 memory.

Lots of stuff to figure out now. For example, absolutely no sensors are available in Linux except for the GPU and NVMe stick temps: no fan speeds, no temps, no voltages.

The platform is too new for Ubuntu 22.04.1 LTS.

MatthiasLeimbach
Joined: 18 Mar 09
Posts: 7
Credit: 1,425,418,423
RAC: 919,345
Message 59614 - Posted: 6 Dec 2022 | 12:08:00 UTC - in response to Message 59613.


My planned upgrade, within a few weeks, is to an AM4 5950X.

I'm collecting second-hand hardware when I can.

And Keith ~ arrows > you're not dead yet.

I think AM5 is a bit over the top in terms of energy usage / efficiency.

Still, I wish you GLHF.


(:-)

Keith Myers
Joined: 13 Dec 17
Posts: 1289
Credit: 5,177,581,959
RAC: 10,381,396
Message 59615 - Posted: 6 Dec 2022 | 18:28:19 UTC - in response to Message 59614.

So far, no difference in energy usage or temps.

Benefit of being able to run my PCIe Gen. 4 cards at Gen. 4 speeds now.

Benefit of having Gen. 4 M.2 speeds with a Gen. 4 storage device now.

Benefit of running CPU tasks 800-1000 MHz faster than previously on the 5950X.

Some projects' CPU tasks scale linearly with clock speed.

Haven't run any projects that can make use of AVX-512 SIMD instructions yet.
