Posts by Retvari Zoltan

1) Message boards : News : PYSCFbeta: Quantum chemistry calculations on GPU (Message 61154)
Posted 75 days ago by

I've disabled getting new GPUGrid tasks on my hosts with a "small" amount (below 24GB) of GPU memory.
This gigantic memory requirement is ridiculous in my opinion.
This is not a user error. If the workunits can't be changed, then the project should not send these tasks to hosts that have less than ~20GB of GPU memory.
There could be another solution, if the workunit allocated memory in a less careless way.
I've started a task on my RTX 4090 (it has 24 GiB of memory), and I've monitored the memory usage:

idle: 305 MiB
task starting: 895 MiB
GPU usage rises: 6115 MiB
GPU usage drops: 7105 MiB
GPU usage 100%: 7205 MiB
GPU usage drops: 8495 MiB
GPU usage rises: 9961 MiB
GPU usage drops: 14327 MiB (it would have failed on my GTX 1080 Ti at this point)
GPU usage rises: 6323 MiB
GPU usage drops: 15945 MiB
GPU usage 100%: 6205 MiB
...and so on

So the memory usage doubles for a short while at some points of processing, and this causes the workunits to fail on GPUs that have a "small" amount of memory. If this behaviour could be eliminated, many more hosts could process these workunits.
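
To make the point concrete, here is a minimal Python sketch that takes the readings above and checks which card sizes they would fit on. The function names and the list of cards are only for illustration. The capacities are the nominal memory sizes of the two cards (11 GiB for the GTX 1080 Ti, 24 GiB for the RTX 4090), and the readings are total usage, so they include the 305 MiB that was already in use when idle.

```python
# Memory readings (MiB) copied from the log above, in order.
READINGS_MIB = [305, 895, 6115, 7105, 7205, 8495, 9961, 14327, 6323, 15945, 6205]

# Nominal GPU memory sizes in MiB (illustrative list, not project data).
CARDS_MIB = {
    "GTX 1080 Ti (11 GiB)": 11 * 1024,
    "RTX 4090 (24 GiB)": 24 * 1024,
}


def first_overflow(readings, capacity_mib):
    """Return the index of the first reading above capacity, or None."""
    for i, used in enumerate(readings):
        if used > capacity_mib:
            return i
    return None


def report(readings=READINGS_MIB, cards=CARDS_MIB):
    """Build one line per card saying whether the whole run would fit."""
    lines = []
    peak = max(readings)
    for name, capacity in cards.items():
        idx = first_overflow(readings, capacity)
        if idx is None:
            lines.append(f"{name}: fits, peak {peak} MiB of {capacity} MiB")
        else:
            lines.append(
                f"{name}: would fail at reading {idx + 1} "
                f"({readings[idx]} MiB > {capacity} MiB)"
            )
    return lines


if __name__ == "__main__":
    for line in report():
        print(line)
```

The short spikes are what decide the outcome: the steady readings of 6000-7000 MiB would fit on an 11 GiB card, the 14327 MiB and 15945 MiB peaks would not.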

2) Message boards : News : PYSCFbeta: Quantum chemistry calculations on GPU (Message 61147)
Posted 76 days ago by

Retvari Zoltan

The present batch has a far worse failure ratio than the previous one.

3) Message boards : News : PYSCFbeta: Quantum chemistry calculations on GPU (Message 61140)
Posted 77 days ago by

Retvari Zoltan

150.000 credits for a few 100 seconds? I'm in! ;)
https://www.gpugrid.net/result.php?resultid=33771102
https://www.gpugrid.net/result.php?resultid=33771333
https://www.gpugrid.net/result.php?resultid=33771431
https://www.gpugrid.net/result.php?resultid=33771446
https://www.gpugrid.net/result.php?resultid=33771539

4) Message boards : Frequently Asked Questions (FAQ) : GPU recognition issues (Message 60556)
Posted 295 days ago by

Retvari Zoltan

Hi, why is your project saying I need to have Nvidia GPU when I have three Nvidia GT 1030 cards?

BOINC doesn't recognize your GPUs if you have installed BOINC as a service ("protected installation"), or if you don't have the appropriate NVidia drivers installed on your system.

5) Message boards : News : Experimental Python tasks (beta) - task description (Message 59925)
Posted 427 days ago by

Retvari Zoltan

Anybody else getting sent Python tasks for the old 1121 app?
...
The 1121 app tasks are instant erroring out.

I had four. All have failed on my host, but one of them finished on the 7th resend.
Edit: because that was the 1131 app.

6) Message boards : News : Experimental Python tasks (beta) - task description (Message 59912)
Posted 430 days ago by

Retvari Zoltan

If you need the heat output of the GPU, then you need to run a different project.

I came to that conclusion, again.

Or only run ACEMD3 tasks when they are available.

I caught 2 or 3, that's why I put 3 hosts back on GPUGrid.

You will not get it [the full GPU heat output] from the Python tasks in their current state.

That's regrettable, but it could be ok for me this spring.

My main issue with the python app is that I think there's no point running that many spawned (training) threads, as their total (combined) memory access operations cause a massive amount of CPU L3 cache misses, hindering each other's performance.
Before I put my i9-12900F host back on GPUGrid, I ran 7 TN-Grid tasks + 1 FAH GPU task simultaneously on that host, and the average processing time was 4080-4200 sec for the TN-Grid tasks.
Now I run 1 GPUGrid task + 1 TN-Grid task simultaneously, and the processing time of the TN-Grid task went up to 4660-4770 sec. Compared to the 6 other TN-Grid tasks plus a FAH task, the GPUGrid python task causes a 14% performance loss.
You can see the change in processing times for yourself here.
If I run only 1 TN-Grid task (no GPU tasks) on that host, the processing time is 3800 seconds. Compared to that, running a GPUGrid python task causes a 22% performance loss.
Perhaps this app should do a short benchmark of the given CPU it's actually running on to establish the ideal number of training threads, or give some control of that number to advanced users like me :) so they can benchmark their own systems.
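
For anyone who wants to check the percentages, here is a small Python sketch of the arithmetic, using only the run times quoted above. The helper name is made up for this example, and "performance loss" is taken to mean the relative increase in the TN-Grid task's run time.

```python
def slowdown_percent(baseline_s, loaded_s):
    """Relative increase in run time, in percent."""
    return (loaded_s - baseline_s) / baseline_s * 100.0


# TN-Grid run times in seconds, as quoted in the post (low end, high end).
WITH_6_TN_AND_FAH = (4080, 4200)   # 7 TN-Grid tasks + 1 FAH GPU task
WITH_PYTHON_TASK = (4660, 4770)    # 1 GPUGrid python task + 1 TN-Grid task
ALONE = 3800                       # a single TN-Grid task, no GPU task

# Compare low end with low end and high end with high end.
vs_full_load = [
    slowdown_percent(base, loaded)
    for base, loaded in zip(WITH_6_TN_AND_FAH, WITH_PYTHON_TASK)
]
# Compare both ends against the single-task run time.
vs_alone = [slowdown_percent(ALONE, loaded) for loaded in WITH_PYTHON_TASK]

if __name__ == "__main__":
    print("vs. 6 TN-Grid + FAH: " + ", ".join(f"{p:.1f}%" for p in vs_full_load))
    print("vs. running alone:   " + ", ".join(f"{p:.1f}%" for p in vs_alone))
```

This gives roughly 14% for the first comparison and between about 22% and 26% for the second, depending on which end of the 4660-4770 sec range is used.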

7) Message boards : News : Experimental Python tasks (beta) - task description (Message 59904)
Posted 431 days ago by

Retvari Zoltan

Is there a way to control the number of spawned threads?
there is no reason to do this anymore.

My reason to reduce their numbers is to run two tasks at the same time to increase GPU usage, because I need the full heat output of my GPUs to heat our apartment. As I saw it in "Task Manager" the CPU usage of the spawned tasks drops when I start the second task (my CPU doesn't have that many threads).
Could the GPU usage be increased somehow?

it's now capped at 4x CPU threads and hard coded in the run.py script. but that is in addition to the 32 threads for the agents.
there is no way to reduce that ...

I confirm that. I looked into that script, though I'm not very familiar with python. I've even tried to modify the num_env_processes in conf.yaml, but this file gets overwritten every time I restart the task, even though I removed the rights of the boinc user and the boinc group to write that file. :)

if you want to run python tasks, you need to account for this and just tell BOINC to reserve some extra CPU resources by setting a larger value for the cpu_usage in app_config. i use values between 8-10. but you can experiment with what you are happy with. on my python dedicated system, I stop all other CPU projects as that gives the best performance.

That's clear, I did that.
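
For readers who haven't set this up before, here is a minimal Python sketch that writes out an app_config.xml with a larger cpu_usage value. The element layout (app_config, app, name, gpu_versions, gpu_usage, cpu_usage) is the standard BOINC one. The app name "PythonGPU" and the value 8.0 are assumptions for illustration: the name has to match the application's short name in your client_state.xml, and the value is just one from the 8-10 range mentioned above.

```python
import xml.etree.ElementTree as ET


def build_app_config(app_name, cpu_usage, gpu_usage=1.0):
    """Return the text of an app_config.xml reserving cpu_usage CPUs per GPU task."""
    root = ET.Element("app_config")
    app = ET.SubElement(root, "app")
    ET.SubElement(app, "name").text = app_name
    gpu = ET.SubElement(app, "gpu_versions")
    ET.SubElement(gpu, "gpu_usage").text = str(gpu_usage)
    ET.SubElement(gpu, "cpu_usage").text = str(cpu_usage)
    # Pretty-print where available (ET.indent was added in Python 3.9).
    if hasattr(ET, "indent"):
        ET.indent(root)
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    # "PythonGPU" is an assumed app name; check client_state.xml for the real one.
    print(build_app_config("PythonGPU", cpu_usage=8.0))
```

The resulting file goes into the project's folder inside the BOINC data directory, and the client picks it up after "Read config files" or a restart. It only changes how many CPUs BOINC budgets for the task; it does not change how many threads the task itself spawns.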

8) Message boards : News : Experimental Python tasks (beta) - task description (Message 59900)
Posted 432 days ago by

Retvari Zoltan

I've been away from GPUGrid for a while...
Is there a way to control the number of spawned threads?
I've tried to modify the line:

<setenv>NTHREADS=$NTHREADS</setenv>

in the linux_job.###########.xml file to

<setenv>NTHREADS=8</setenv>

but it made no difference.
The task was started with the original NTHREADS setting.
Is that why the number of spawned threads didn't change, or should I modify something else?

9) Message boards : Graphics cards (GPUs) : How many Pythons does it need to run??? (Message 58895)
Posted 678 days ago by

Retvari Zoltan

The Python tasks spawn more than two processes. Usually 32 or more.
This is normal and typical of reinforcement learning.

Really!? 32 or more? Hum.
At least the problem is known.

This is not a bug. This is a feature.

10) Message boards : Number crunching : Is GPUGRID done? (Message 58817)
Posted 704 days ago by

Retvari Zoltan

Anyone willing and able to pick up the challenge of developing a 'work request' script which suspends the flow of update requests when the host has active tasks for the project?

I wrote a batch script for Windows XP / Windows 7 back in 2013. The main purpose was to check the progress of the tasks, and restart the host if there's a stuck task. It's here if you are interested, though some parts of it won't work on Windows 10. I also have a more sophisticated one now (which actually does check the number of GPUGrid workunits in the queue, and issues requests when there are fewer than 2 per GPU), but it needs to be updated for Windows 10 (WMIC is deprecated in recent Windows 10 releases, so it should be rewritten in PowerShell). It does much more than requesting work if needed. I haven't published it, because I find the mass use of such methods counterproductive for the community. It can't cure the work shortage anyway; a continuous supply of workunits would be the real solution. The primitive method is already out there, and the mass use of that (or a more sophisticated one) could bring down the GPUGrid server, regardless of the DDOS protection in place.
