Message boards : News : Experimental Python tasks (beta) - task description

abouh
Volunteer moderator
Project administrator
Project developer
Project tester
Volunteer developer
Volunteer tester
Project scientist
Joined: 31 May 21
Posts: 1
Credit: 0
RAC: 0
Message 56977 - Posted: 17 Jun 2021 | 10:40:32 UTC

Hello everyone, just wanted to give some updates about the machine learning Python jobs that Toni mentioned earlier in the "Experimental Python tasks (beta)" thread.

What are we trying to accomplish?
We are trying to train populations of intelligent agents in a distributed computational setting to solve reinforcement learning problems. This idea is inspired by the fact that human societies are knowledgeable as a whole, while individual agents have limited information. Also, every new generation of individuals attempts to expand and refine the knowledge inherited from previous ones, and the most interesting discoveries become part of a corpus of common knowledge. The idea is that small groups of agents will train on GPUGrid machines and report their discoveries and findings. Information from multiple agents can then be pooled and conveyed to new generations of machine learning agents. To the best of our knowledge, this is the first time something of this sort has been attempted on a GPUGrid-like platform, and it has the potential to scale to solve problems unattainable in smaller scale settings.
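
To make the generational idea above more concrete, here is a minimal toy sketch in plain Python. It is not the project's actual code: the reward function, the hill-climbing "training", and every name (`reward`, `train_agent`, `run_generations`) are hypothetical stand-ins chosen only to show how small groups could train independently, report their best finding, and seed the next generation from pooled knowledge.

```python
import random


def reward(x):
    """Toy reward with a single optimum at x = 3.0 (hypothetical stand-in for an RL task)."""
    return -(x - 3.0) ** 2


def train_agent(start, rng, steps=50, step_size=0.5):
    """Simple hill climbing from `start`; stands in for training one agent."""
    best = start
    for _ in range(steps):
        candidate = best + rng.uniform(-step_size, step_size)
        if reward(candidate) > reward(best):
            best = candidate
    return best


def run_generations(n_generations=5, n_groups=4, group_size=3, seed=0):
    """Each generation: groups train from the shared knowledge and report back."""
    rng = random.Random(seed)
    shared = [rng.uniform(-10.0, 10.0)]  # the "corpus of common knowledge"
    history = []
    for _ in range(n_generations):
        reports = []
        for _ in range(n_groups):  # one group per (imagined) volunteer machine
            start = max(shared, key=reward)  # inherit the best known solution
            group = [train_agent(start, rng) for _ in range(group_size)]
            reports.append(max(group, key=reward))  # group reports its best finding
        # Pool old and new findings, keep only the most promising ones.
        shared = sorted(shared + reports, key=reward, reverse=True)[:n_groups]
        history.append(reward(shared[0]))
    return shared[0], history


if __name__ == "__main__":
    best, history = run_generations()
    print("best solution found:", best)
    print("best reward per generation:", history)
```

Because each generation keeps the previous best in the shared pool, the best reward can never get worse from one generation to the next, which is the "expand and refine inherited knowledge" property described above.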

Why were most jobs failing a few weeks ago?
It took us some time and testing to make simple agents work, but we managed to solve the problems in the previous weeks. Now, almost all agents train successfully.

Why are GPUs being underutilized, and what are the CPUs used for?
In the previous weeks we were running small-scale tests with small neural network models that occupied little GPU memory. Also, some reinforcement learning environments, especially simple ones like those used in the tests, run on the CPU. Our plan is to scale to more complex models and environments to exploit the GPU capacity of the grid.
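
As a rough illustration of why small models combined with CPU-bound environments leave the GPU mostly idle, here is a back-of-the-envelope sketch. It assumes environment steps (CPU) and model calls (GPU) strictly alternate without overlap, and the timings are made-up example numbers, not measurements from the real tasks. The function name `gpu_busy_fraction` is hypothetical.

```python
def gpu_busy_fraction(env_step_cpu_s, model_gpu_s):
    """Fraction of wall time the GPU is busy if CPU and GPU work alternate without overlap."""
    total = env_step_cpu_s + model_gpu_s
    if total <= 0:
        raise ValueError("timings must sum to a positive value")
    return model_gpu_s / total


if __name__ == "__main__":
    # Hypothetical timings: 9 ms per environment step on the CPU,
    # 1 ms per forward/backward pass of a small network on the GPU.
    fraction = gpu_busy_fraction(env_step_cpu_s=0.009, model_gpu_s=0.001)
    print(f"GPU busy roughly {fraction:.0%} of the time")
```

Under these assumed numbers the GPU would sit idle about nine tenths of the time; larger models or GPU-side environments shift that ratio.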

More information:
We mainly use PyTorch to train our neural networks. We use Tensorboard only because it is convenient for logging, and we might remove that dependency in the future.
____________

bozz4science
Joined: 22 May 20
Posts: 83
Credit: 13,785,091
RAC: 12
Message 56978 - Posted: 17 Jun 2021 | 11:46:18 UTC
Last modified: 17 Jun 2021 | 12:08:24 UTC

Highly anticipated and overdue. Needless to say, kudos to you and your team for pushing the frontier on the computational abilities of the client software. Looking forward to contributing in the future, hopefully with more than I have at hand right now.

A couple of questions though:

1. As the main ML technique used for training the individual agents is neural networks, I wonder about the specifics of the whole setup. What does the learning data set look like? What activation function (AF) do you use? Is any optimisation or regularisation used?
2. Is it mainly about getting this kind of framework to work and then testing its accuracy? How did you determine the model's base parameters to get started? How can you be sure that the initial model setup is getting you anywhere or is optimal? Or do you ultimately want to tune the final model and compare the accuracy of various reinforcement learning approaches?
3. Is there a way to gauge the future complexity of those prospective WUs at this stage? Similar runtimes as the current Bandit tasks?
4. What do you want to use the trained networks for? What are you trying to predict? Or, rephrased, what main use cases/fields of research are currently imagined for the final model? What do you envision to be "problems [so far] unattainable in smaller scale settings"?
5. What is the ultimate goal of this ML project? To have only one latest-generation group of trained agents at the end that is the result of the continuous reinforcement learning iterations? Or to have several and test/benchmark them against each other?

Thx! Keep up the great work!

Ian&Steve C.
Joined: 21 Feb 20
Posts: 469
Credit: 4,437,974,830
RAC: 69,046
Message 56979 - Posted: 17 Jun 2021 | 13:26:58 UTC - in response to Message 56977.

will you be utilizing the tensor cores present in the nvidia RTX cards? the tensor cores are designed for this kind of workload.
____________

phi1258
Joined: 30 Jul 16
Posts: 3
Credit: 1,403,534,484
RAC: 102,590
Message 56989 - Posted: 18 Jun 2021 | 11:21:31 UTC - in response to Message 56977.

This is a welcome advance. Looking forward to contributing.



ServicEnginIC
Joined: 24 Sep 10
Posts: 412
Credit: 2,024,265,642
RAC: 471,587
Message 56990 - Posted: 18 Jun 2021 | 12:04:08 UTC - in response to Message 56977.

Thank you very much for this advance.
I understand that for this kind of "singular" research only limited general guidelines can be given, or there is a risk of it not being singular any more...
Best wishes.

_heinz
Joined: 20 Sep 13
Posts: 16
Credit: 3,433,447
RAC: 0
Message 56994 - Posted: 20 Jun 2021 | 5:39:42 UTC
Last modified: 20 Jun 2021 | 5:43:47 UTC

Wish you success.
regards _heinz
____________

Erich56
Joined: 1 Jan 15
Posts: 770
Credit: 3,402,889,227
RAC: 0
Message 56996 - Posted: 21 Jun 2021 | 11:28:16 UTC - in response to Message 56979.

Ian&Steve C. wrote on June 17th:

will you be utilizing the tensor cores present in the nvidia RTX cards? the tensor cores are designed for this kind of workload.

I am curious what the answer will be.

Ian&Steve C.
Joined: 21 Feb 20
Posts: 469
Credit: 4,437,974,830
RAC: 69,046
Message 57000 - Posted: 22 Jun 2021 | 12:17:47 UTC

also, can the team comment on more than just GPU "under"utilization? these tasks have NO GPU utilization.

when will you start releasing tasks that do more than just CPU calculation? are you aware that only CPU calculation is occurring and nothing happens on the GPU at all? I have never observed these new tasks to use the GPU, ever. even the tasks that take ~1hr to crunch. it all happens on the single CPU thread allocated for the WU. 0% GPU utilization and no gpugrid processes reported in nvidia-smi
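
For anyone who wants to repeat this kind of check, the following is a small sketch of how GPU utilization and running compute processes could be read from `nvidia-smi` in Python. It is only an illustration, not a tool used in this thread: the helper names (`parse_utilization`, `query_gpu_utilization`, `list_compute_processes`) are made up, and the script assumes `nvidia-smi` is on the PATH and supports the `--query-gpu` and `--query-compute-apps` options.

```python
import shutil
import subprocess


def parse_utilization(csv_text):
    """Parse one percentage per line; entries that are not plain integers (e.g. "[N/A]") become None."""
    values = []
    for line in csv_text.splitlines():
        line = line.strip()
        if not line:
            continue
        values.append(int(line) if line.isdigit() else None)
    return values


def _run_nvidia_smi(args):
    """Run nvidia-smi with `args`; return stdout, or None if the tool is missing or fails."""
    exe = shutil.which("nvidia-smi")
    if exe is None:
        return None
    result = subprocess.run([exe, *args], capture_output=True, text=True, check=False)
    return result.stdout if result.returncode == 0 else None


def query_gpu_utilization():
    """Return a list with the utilization (in percent) of each GPU, or None if unavailable."""
    out = _run_nvidia_smi(["--query-gpu=utilization.gpu", "--format=csv,noheader,nounits"])
    return None if out is None else parse_utilization(out)


def list_compute_processes():
    """Return one 'pid, process_name' string per compute process, or None if unavailable."""
    out = _run_nvidia_smi(["--query-compute-apps=pid,process_name", "--format=csv,noheader"])
    if out is None:
        return None
    return [line.strip() for line in out.splitlines() if line.strip()]


if __name__ == "__main__":
    print("GPU utilization (%):", query_gpu_utilization())
    print("Compute processes:", list_compute_processes())
```

If a task really runs only on the CPU, one would expect the utilization list to stay at or near 0 and the process list to contain no entry for the task while it is crunching.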
____________

ExtraTerrestrial Apes
Volunteer moderator
Volunteer tester
Joined: 17 Aug 08
Posts: 2705
Credit: 1,311,122,549
RAC: 1
Message 57009 - Posted: 23 Jun 2021 | 20:09:29 UTC

I understand this is basic research in ML. However, I wonder which problems it would be used for here. Personally I'm here for the bio-science. If the topic of the new ML research differs significantly and it seems to be successful based on first trials, I'd suggest setting it up as a separate project.

MrS
____________
Scanning for our furry friends since Jan 2002

bozz4science
Joined: 22 May 20
Posts: 83
Credit: 13,785,091
RAC: 12
Message 57014 - Posted: 24 Jun 2021 | 10:32:37 UTC

This is why I asked what "problems" are currently envisioned to be tackled by the resulting model. But in my opinion and understanding, this is an ML project specifically set up to be trained on biomedical data sets. Thus, I'd argue that the science being done is still bio-related nonetheless. I would highly appreciate feedback on the many great questions in this thread so far.

Retvari Zoltan
Joined: 20 Jan 09
Posts: 2274
Credit: 16,057,322,981
RAC: 94
Message 57020 - Posted: 26 Jun 2021 | 7:53:10 UTC

https://www.youtube.com/watch?v=yhJWAdZl-Ck
