
Message boards : News : What is happening and what will happen at GPUGRID, update for 2021

Profile GDF
Volunteer moderator
Project administrator
Project developer
Project tester
Volunteer developer
Volunteer tester
Project scientist
Message 57240 - Posted: 7 Aug 2021 | 6:44:07 UTC

As you know, GPUGRID was the first BOINC project to run GPU applications; in fact, we helped create the infrastructure for that. That was many years ago, and many things have changed since then. In particular, we have recently not had a constant stream of workunits. I would like to explain the present and the expected future of GPUGRID here.

In the last few years, we moved from doing science by running very many simulations to developing new methods at the boundary between physical simulations and machine learning/artificial intelligence. These new methods did not require a lot of simulations, and most of the PhD students in the research group did not use GPUGRID daily. We still have some long-term projects running on GPUGRID, for which you will shortly see results in the form of new scientific publications.

Among other things, ACEMD, the application behind GPUGRID, is now built partially on OpenMM, of which I am also a principal investigator. As you might know, OpenMM is also used by Folding@home. We very recently received a grant to develop OpenMM, with one or two people starting before the end of the year. This will be good for GPUGRID, because it means that we will be using GPUGRID a lot more.

Furthermore, we recently found a way to run AI simulations on GPUGRID. We have run only a few test cases so far, but there is a PhD student in the lab with a thesis on cooperative intelligence, where very many AI agents collaborate to solve tasks. The goal is to understand how cooperative intelligence works. We are also looking for a postdoc in cooperative intelligence, in case you know somebody:

https://www.compscience.org

I hope this clarifies the current situation. In practical terms, we expect to have the ACEMD application fixed for RTX 30xx cards within a few weeks, as the developer of ACEMD is now also doing the deployment on GPUGRID, making everything simpler.

GDF

bozz4science
Message 57241 - Posted: 7 Aug 2021 | 11:13:43 UTC

Thanks for the much-anticipated update! I appreciate that you have provided a roadmap for the future. Hopefully there aren't too many roadblocks ahead in the development of OpenMM. The future project direction sounds very exciting :) I'll take that as an opportunity to upgrade my host by the end of the year so I can contribute more intensively next year!

Keep up the good work

baffoni
Message 57242 - Posted: 7 Aug 2021 | 14:54:01 UTC - in response to Message 57240.

Are there any plans to add support for AMD GPUs now that ACEMD3 supports OpenCL? https://software.acellera.com/docs/latest/acemd3/capabilities.html This would increase participation.

[CSF] Aleksey Belkov
Message 57244 - Posted: 7 Aug 2021 | 20:11:45 UTC
Last modified: 7 Aug 2021 | 20:18:19 UTC

Good news, everyone!

mikey
Message 57245 - Posted: 8 Aug 2021 | 3:44:03 UTC - in response to Message 57240.

Great News, Thanks!!!!

Profile [AF] fansyl
Message 57246 - Posted: 8 Aug 2021 | 8:53:02 UTC

Thanks for the news; I hope the work goes well. I am looking forward to doing new calculations.

Go ahead!

erotemic
Message 57249 - Posted: 9 Aug 2021 | 23:15:01 UTC

I've got a 3090 and two 1080 Tis waiting for some work. Looking forward to the new updates.

ExtraTerrestrial Apes
Volunteer moderator
Volunteer tester
Message 57250 - Posted: 10 Aug 2021 | 13:24:16 UTC

Thanks for the update, GDF, it's very much appreciated!

MrS
____________
Scanning for our furry friends since Jan 2002

Erich56
Message 57251 - Posted: 14 Aug 2021 | 16:44:29 UTC - in response to Message 57240.
Last modified: 14 Aug 2021 | 16:44:54 UTC

In practical terms, we expect to have the ACEMD application fixed for RTX 30xx cards within a few weeks, as the developer of ACEMD is now also doing the deployment on GPUGRID, making everything simpler.

one of my hosts, with two RTX 3070s inside, will be pleased :-)

Profile luckdragon82
Message 57252 - Posted: 20 Aug 2021 | 6:46:21 UTC - in response to Message 57242.

Are there any plans to add support for AMD GPUs now that ACEMD3 supports OpenCL? https://software.acellera.com/docs/latest/acemd3/capabilities.html This would increase participation.


I would also like to know if AMD will finally be supported. I have a water-cooled Radeon RX 6800 XT and am ready to utilize its full capacity for cancer and COVID research, as well as other projects as they may come.

AMD Ryzen 9 5950X
AMD Radeon RX 6800 XT
32GB 3200MHz CAS-14 RAM
NVMe 4th Gen storage
Custom water cooling

Profile GDF
Volunteer moderator
Project administrator
Project developer
Project tester
Volunteer developer
Volunteer tester
Project scientist
Message 57257 - Posted: 2 Sep 2021 | 12:47:05 UTC - in response to Message 57252.

An initial new version of ACEMD has been deployed on Linux and it is working, but we are still testing.

gdf

Ian&Steve C.
Message 57259 - Posted: 2 Sep 2021 | 18:06:25 UTC - in response to Message 57257.
Last modified: 2 Sep 2021 | 18:44:40 UTC

An initial new version of ACEMD has been deployed on Linux and it is working, but we are still testing.

gdf


What are the criteria for sending the cuda101 app vs the cuda1121 app?

I see both apps exist, and new drivers on even older cards will support both. For example, with CUDA 11.2 drivers on a Turing card, you can run either the 11.2 app or the 10.1 app. So what criteria does the server use to decide which app to send to my Turing cards?

Of course, Ampere cards should only get the 11.2 app.

Also, it looks like the Windows apps are missing for the new ACEMD. Are you dropping Windows support?

WR-HW95
Message 57260 - Posted: 2 Sep 2021 | 19:04:21 UTC

Support for AMD cards would be good for the project.
At the moment I'm running mostly Milkyway with my 6900 XT, and it does 3 units in 1:50; a 1080 Ti takes about 6:00 for the same. I haven't checked times on WCG because GPU units come up so rarely, but I can imagine it would be fine there too.

Ian&Steve C.
Message 57261 - Posted: 2 Sep 2021 | 19:38:16 UTC - in response to Message 57257.
Last modified: 2 Sep 2021 | 19:48:46 UTC

An initial new version of ACEMD has been deployed on Linux and it is working, but we are still testing.

gdf


There seems to be a problem with the new 2.17 app: it always tries to run on GPU 0, even when BOINC assigns it to another GPU.

I have had this happen on two separate hosts now. The host picks up a new task, and BOINC assigns it to some other GPU (like device 6 or device 3), but the acemd process spins up on GPU 0 anyway, even though that card is already occupied by another BOINC process from another project. I think something is off in how the BOINC device assignment is communicated to the app. The result is multiple processes running on a single GPU, and no process running on the device that BOINC assigned the GPUGRID task to.

Restarting the BOINC client brings things back to "OK", since on startup it prioritizes the GPUGRID task onto GPU 0 (probably due to resource share), but I expect this will keep happening.

This needs an update ASAP.

Richard Haselgrove
Message 57262 - Posted: 2 Sep 2021 | 21:20:18 UTC

Haven't snagged one of the new ones as yet (I was out all day), but I'll watch out for them, and try to find out where the device allocation is failing.

Profile ServicEnginIC
Message 57264 - Posted: 2 Sep 2021 | 22:54:46 UTC - in response to Message 57261.

There seems to be a problem with the new 2.17 app: it always tries to run on GPU 0, even when BOINC assigns it to another GPU.

I have had this happen on two separate hosts now. The host picks up a new task, and BOINC assigns it to some other GPU (like device 6 or device 3), but the acemd process spins up on GPU 0 anyway, even though that card is already occupied by another BOINC process from another project. I think something is off in how the BOINC device assignment is communicated to the app. The result is multiple processes running on a single GPU, and no process running on the device that BOINC assigned the GPUGRID task to.

First of all: congratulations, both of your multi-GPU systems that weren't getting work from previous app versions seem to have the problem solved with this new one. Welcome back to the field!

I'm experiencing the same behavior, and I can go even further:
I caught six WUs of the new app version 2.17 on my triple GTX 1650 system.
Then I aborted three of those WUs, and two of them were picked up again by my twin GTX 1650 system.
At the triple-GPU system: while all three WUs seem to be progressing normally from the BOINC Manager point of view, Psensor shows that only GPU #0 (first PCIe slot) is working, while GPUs #1 and #2 are inactive. It's as if GPU #0 were carrying the workload for all three WUs: the same 63% fraction done after 8.25 hours for all three. However, CPU usage is consistent with three WUs running concurrently on this system.
At the twin-GPU system: while both WUs seem to be progressing normally from the BOINC Manager point of view, Psensor shows that only GPU #0 (first PCIe slot) is working, while GPU #1 is inactive. It's as if GPU #0 were carrying the workload for both WUs: the same 89% fraction done after 8 hours for both. Again, CPU usage is consistent with two WUs running concurrently on this system.

Profile ServicEnginIC
Message 57265 - Posted: 3 Sep 2021 | 6:20:59 UTC - in response to Message 57261.

There seems to be a problem with the new 2.17 app: it always tries to run on GPU 0, even when BOINC assigns it to another GPU.

Confirmed:
While BOINC Manager was saying that Task #32640074, Task #32640075 and Task #32640080 were running on devices #0, #1 and #2 of this triple GTX 1650 system, they were actually all being processed concurrently on the same device #0.

Ian&Steve C.
Message 57266 - Posted: 3 Sep 2021 | 15:05:45 UTC - in response to Message 57264.


First of all: congratulations, both of your multi-GPU systems that weren't getting work from previous app versions seem to have the problem solved with this new one. Welcome back to the field!


Yeah, it was actually partially caused by some settings on my end, combined with the fact that when the cuda1121 app was released on July 1st they deleted/retired/removed the cuda100 app. Had they left the cuda100 app in place, I would at least still have received that one. I'll post more details in the original thread about that issue.

Profile GDF
Volunteer moderator
Project administrator
Project developer
Project tester
Volunteer developer
Volunteer tester
Project scientist
Message 57298 - Posted: 14 Sep 2021 | 13:15:02 UTC

The device problem should be fixed now.
Windows versions are on their way.

Profile GDF
Volunteer moderator
Project administrator
Project developer
Project tester
Volunteer developer
Volunteer tester
Project scientist
Message 57299 - Posted: 14 Sep 2021 | 16:54:23 UTC - in response to Message 57298.

Windows version deployed

Keith Myers
Message 57300 - Posted: 14 Sep 2021 | 18:22:03 UTC

Great news! Now to just have some tasks ready to send.

Philip C Swift [Gridcoin]
Message 57304 - Posted: 17 Sep 2021 | 9:57:51 UTC - in response to Message 57240.

Good news! :-)

stiwi
Message 57305 - Posted: 17 Sep 2021 | 15:57:22 UTC
Last modified: 17 Sep 2021 | 16:00:31 UTC

Got 1 WU.

BOINC shows a runtime of 6 days on my 2080 Ti at 60% power target. Hopefully that's not the real time.

Edit: 1% after 10 minutes, so probably about 16 hours.

Keith Myers
Message 57308 - Posted: 17 Sep 2021 | 18:17:05 UTC

Picked up a task apiece on two hosts. Some 800 tasks in progress now.

Hope this means the project is getting back to releasing steady work.

Ian&Steve C.
Message 57360 - Posted: 23 Sep 2021 | 15:20:57 UTC
Last modified: 23 Sep 2021 | 15:23:55 UTC

It seems that even though the cuda1121 app is available and works fine on Ampere cards, there's nothing preventing the cuda101 app from being sent to an Ampere host. Those tasks will always fail.

Example: https://gpugrid.net/result.php?resultid=32643471

The project-side scheduler needs to be adjusted so that the cuda101 app is not sent to Ampere hosts. This can be achieved by checking the compute capability reported by the host: in addition to the CUDA version checks, the cuda101 app should be limited to hosts with compute capability below 8.0, and hosts with 8.0 or greater should only get the cuda1121 app.

Or simply remove the cuda101 app and require all users to update their video drivers so they can use the 1121 app.

Keith Myers
Message 57363 - Posted: 24 Sep 2021 | 0:00:36 UTC - in response to Message 57360.
Last modified: 24 Sep 2021 | 0:00:55 UTC

This is correct. I had the exact same failure with the CUDA101 app sent to my Ampere RTX 3080.

The failure was an inability to compile the CUDA kernel, because it was expecting a different architecture.

https://www.gpugrid.net/result.php?resultid=32642922

Toni
Volunteer moderator
Project administrator
Project developer
Project tester
Project scientist
Message 57365 - Posted: 24 Sep 2021 | 13:57:01 UTC - in response to Message 57363.

Technical question: does anybody know whether the compute capability (CC) is available in the scheduler?

Richard Haselgrove
Message 57366 - Posted: 24 Sep 2021 | 14:19:16 UTC - in response to Message 57365.
Last modified: 24 Sep 2021 | 14:30:08 UTC

Technical question: does anybody know whether the compute capability (CC) is available in the scheduler?

I'm sure it is. I'll start digging out some references, if you want.

Sorry, you caught me in the middle of a driver update.

Try https://boinc.berkeley.edu/trac/wiki/AppPlanSpec#NVIDIAGPUapps:

<min_nvidia_compcap>MMmm</min_nvidia_compcap>
minimum compute capability
<max_nvidia_compcap>MMmm</max_nvidia_compcap>
maximum compute capability
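
Putting those elements together, a minimal plan_class_spec.xml sketch for the two apps in question might look like the following. To be clear, this is only a guess at how the project could wire it up using the standard plan-class elements; the concrete numbers (the minimum CUDA driver versions, and 799 meaning "anything below compute capability 8.0" in the MMmm encoding) are my assumptions, not values taken from the project:

<plan_classes>
    <plan_class>
        <name>cuda101</name>
        <gpu_type>nvidia</gpu_type>
        <cuda/>
        <min_cuda_version>10010</min_cuda_version>       <!-- CUDA 10.1 driver or newer -->
        <max_nvidia_compcap>799</max_nvidia_compcap>     <!-- refuse CC 8.0+ (Ampere) -->
    </plan_class>
    <plan_class>
        <name>cuda1121</name>
        <gpu_type>nvidia</gpu_type>
        <cuda/>
        <min_cuda_version>11020</min_cuda_version>       <!-- CUDA 11.2 driver or newer -->
    </plan_class>
</plan_classes>

With something along these lines in place, a host reporting an Ampere card (CC 8.6) would simply never match the cuda101 plan class.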

Ian&Steve C.
Message 57367 - Posted: 24 Sep 2021 | 14:28:27 UTC - in response to Message 57366.

To expand on what Richard wrote, I’m sure it’s available. Einstein@home uses this metric in their scheduler to gatekeep some of their apps to certain generations of Nvidia GPUs. So it’s definitely information that’s provided from the host to the project via BOINC.

Richard Haselgrove
Message 57368 - Posted: 24 Sep 2021 | 14:33:31 UTC - in response to Message 57367.
Last modified: 24 Sep 2021 | 14:34:00 UTC

So it’s definitely information that’s provided from the host to the project via BOINC.

From the most recent sched_request file sent from this computer to your server:

<coproc_cuda>
...
<major>7</major>
<minor>5</minor>

Toni
Volunteer moderator
Project administrator
Project developer
Project tester
Project scientist
Message 57369 - Posted: 24 Sep 2021 | 14:44:30 UTC - in response to Message 57368.

Uhm... yes, but I was wondering how to retrieve it in the C++ code.

Richard Haselgrove
Message 57370 - Posted: 24 Sep 2021 | 14:56:53 UTC - in response to Message 57369.

Uhm... yes, but I was wondering how to retrieve it in the C++ code.

Same principle. Start with Specifying plan classes in C++, third example.

...
if (!strcmp(plan_class, "cuda23")) {
    if (!cuda_check(c, hu,
        100,        // minimum compute capability (1.0)
        200,        // max compute capability (2.0)
        2030,       // min CUDA version (2.3)
        19500,      // min display driver version (195.00)
        384*MEGA,   // min video RAM
        1.,         // # of GPUs used (may be fractional, or an integer > 1)
        .01,        // fraction of FLOPS done by the CPU
        .21         // estimated GPU efficiency (actual/peak FLOPS)
    )) {
        return false;
    }
}
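
Adapting that example to the cuda101 case discussed above might look like the sketch below. This is purely illustrative: the 800 cap (compute capability 8.0) follows Ian's suggestion, assuming cuda_check treats the maximum as an exclusive bound, and the remaining numbers are placeholders rather than GPUGRID's real limits.

if (!strcmp(plan_class, "cuda101")) {
    if (!cuda_check(c, hu,
        350,        // min compute capability (3.5) - assumed
        800,        // max compute capability: refuse CC 8.0+ (Ampere)
        10010,      // min CUDA version (10.1)
        41848,      // min display driver version (418.48) - assumed
        2048*MEGA,  // min video RAM - assumed
        1.,         // # of GPUs used
        .01,        // fraction of FLOPS done by the CPU
        .21         // estimated GPU efficiency (actual/peak FLOPS)
    )) {
        return false;
    }
}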

Profile ServicEnginIC
Message 57377 - Posted: 25 Sep 2021 | 21:52:50 UTC
Last modified: 25 Sep 2021 | 21:53:52 UTC

Uhm... yes, but I was wondering how to retrieve it in the C++ code.

I guess there might be an easier workaround, with no need to touch the current code.
It would consist of splitting the ACEMD3 app on the Project Preferences page into ACEMD3 (cuda 101) and ACEMD3 (cuda 1121).
This way, Ampere users would be able to untick the ACEMD3 (cuda 101) app, manually preventing themselves from receiving tasks that are certain to fail.

Keith Myers
Message 57378 - Posted: 26 Sep 2021 | 2:49:42 UTC - in response to Message 57377.

That won't work for multi-GPU users who have both Turing and Ampere cards installed in one host.

I have a 2080 and a 3080 together in one host.

Profile ServicEnginIC
Message 57379 - Posted: 26 Sep 2021 | 7:58:44 UTC - in response to Message 57378.

That won't work for multi-GPU users who have both Turing and Ampere cards installed in one host.

I have a 2080 and a 3080 together in one host.

It should work, as a manual selection in Project Preferences to receive ACEMD3 (cuda 1121) tasks only.
Your RTX 3080 (Ampere, device 0) can't process ACEMD3 (cuda 101), as seen in your failed task e1s627_I757-ADRIA_AdB_KIXCMYB_HIP-1-2-RND0972_0, but it can process ACEMD3 (cuda 1121), as seen in your succeeded task e1s385_I477-ADRIA_AdB_KIXCMYB_HIP-0-2-RND6281_2.
And your RTX 2080 (Turing, device 1) in the same host can also process ACEMD3 (cuda 1121) tasks, as seen in your succeeded task e1s667_I831-ADRIA_AdB_KIXCMYB_HIP-1-2-RND8282_1.
Therefore, restricting preferences in a particular venue for your host #462662 to receive only ACEMD3 (cuda 1121) tasks would work for both cards.

The exception is the general limitation of the ACEMD3 app on all kinds of mixed multi-GPU systems when tasks restart on a different device.
It was described by Toni in his Message #52865, dated Oct 17 2019, under the paragraph "Can I use it on multi-GPU systems?":

Can I use it on multi-GPU systems?

In general yes, with one caveat: if you have DIFFERENT types of NVIDIA GPUs in the same PC, suspending a job in one and restarting it in the other will NOT be possible (errors on restart). Consider restricting the client to one GPU type only ("exclude_gpu", see here).
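
For reference, the exclude_gpu option Toni mentions goes in cc_config.xml in the BOINC data directory. Here is a minimal sketch; the device number is only an example to adapt to your own host, and as far as I know the client needs a full restart (not just a config re-read) for GPU exclusions to take effect:

<cc_config>
    <options>
        <exclude_gpu>
            <url>https://www.gpugrid.net/</url>
            <device_num>1</device_num>
        </exclude_gpu>
    </options>
</cc_config>

This would keep GPUGRID tasks off BOINC device 1 while leaving that card available to other projects.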

Richard Haselgrove
Message 57380 - Posted: 26 Sep 2021 | 9:59:18 UTC

There are two different aspects to this debate:

1) What will a project server send to a mixed-GPU client?
2) Which card will the client choose to allocate a task to?

The project server will allocate work solely on the basis of Keith's 3080. BOINC has been developed, effectively, to hide his 2080 from the server.

Keith has some degree of control over the behaviour of his client. He can exclude certain applications from particular cards (using cc_config.xml), but he can't exclude particular versions of the same application - the control structure is too coarse.

He can also control certain behaviours of applications at the plan_class level (using app_config.xml), but that control structure is too fine - it doesn't contain any device-level controls.

Other projects have been able to develop general-purpose GPU applications which are at least compatible with mixed-device hosts - tasks assigned to the 'wrong' or 'changed' device at least run, even if efficiency is downgraded. If this project is unable to follow that design criterion (and I don't know why it is unable at this moment), then I think the only available solution at this time is to divide the versions into separate applications - analogous to the old short/long tasks - so that the limited range of available client options can be leveraged.

Ian&Steve C.
Message 57381 - Posted: 26 Sep 2021 | 12:15:28 UTC - in response to Message 57380.

I haven’t seen enough compelling evidence to justify keeping the cuda101 app. The cuda1121 app works on all hosts and is basically the same speed.

Removing the cuda101 app would solve all problems.

Keith Myers
Message 57383 - Posted: 26 Sep 2021 | 16:51:41 UTC - in response to Message 57381.

I agree. Simply remove the CUDA101 app and restrict sending tasks to any host that hasn't updated its drivers to the CUDA 11.2 level.

Richard Haselgrove
Message 57391 - Posted: 29 Sep 2021 | 13:22:37 UTC

What's with the new ACEMD beta version 9.17, introduced today? What are we testing?

I got a couple of really short tasks on Linux host 508381. That risks really messing up DCF.

SolidAir79
Message 57392 - Posted: 29 Sep 2021 | 14:50:26 UTC

I received 15 of the test WUs; no problems on Ampere, all crunched without issue. Just want more :)

Keith Myers
Message 57393 - Posted: 29 Sep 2021 | 15:58:17 UTC - in response to Message 57392.

Yes, you crunched those tasks with the CUDA1121 app on your Ampere, which was always working.

A beta test should have been with the CUDA101 app on Ampere, to see whether they have fixed the issue of all tasks failing.

Killersocke
Message 57394 - Posted: 29 Sep 2021 | 18:15:34 UTC

NVIDIA GeForce RTX 3080 Ti
Driver version 472.12 (Game Ready Driver)

e2s94_e1s550p0f1071-ADRIA_AdB_KIXCMYB_HIP-1-2-RND9846_0

195 (0xc3) EXIT_CHILD_FAILED
New version of ACEMD v2.18 (cuda101)

<core_client_version>7.16.11</core_client_version>
<![CDATA[
<message>
(unknown error) - exit code 195 (0xc3)</message>
<stderr_txt>
20:09:10 (720): wrapper (7.9.26016): starting
20:09:10 (720): wrapper: running bin/acemd3.exe (--boinc --device 0)
ACEMD failed:
Error compiling program: nvrtc: error: invalid value for --gpu-architecture (-arch)

Profile PDW
Message 57395 - Posted: 29 Sep 2021 | 18:32:06 UTC - in response to Message 57394.

Mine also failed on a single 3080, driver 470.63:

Application version ACEMD beta version v9.17 (cuda101)
Stderr output

<core_client_version>7.16.6</core_client_version>
<![CDATA[
<message>
process exited with code 195 (0xc3, -61)</message>
<stderr_txt>
19:20:04 (445272): wrapper (7.7.26016): starting
19:20:12 (445272): wrapper (7.7.26016): starting
19:20:12 (445272): wrapper: running bin/acemd3 (--boinc --device 0)
ACEMD failed:
Error compiling program: nvrtc: error: invalid value for --gpu-architecture (-arch)

19:20:13 (445272): bin/acemd3 exited; CPU time 0.782133
19:20:13 (445272): app exit status: 0x1
19:20:13 (445272): called boinc_finish(195)

</stderr_txt>
]]>

Ian&Steve C.
Message 57396 - Posted: 29 Sep 2021 | 22:01:26 UTC - in response to Message 57393.

Yes, you crunched those tasks with the CUDA1121 app on your Ampere, which was always working.

A beta test should have been with the CUDA101 app on Ampere, to see whether they have fixed the issue of all tasks failing.


They’ll never get cuda101 working on Ampere as long as they have the architecture check.

mikey
Message 57408 - Posted: 1 Oct 2021 | 0:47:15 UTC - in response to Message 57392.

I received 15 of the test WUs; no problems on Ampere, all crunched without issue. Just want more :)


I got a few on my Nvidia 1660 and would also like to see more come my way.

Keith Myers
Message 57466 - Posted: 4 Oct 2021 | 21:31:15 UTC - in response to Message 57396.

Yes, you crunched those tasks with the CUDA1121 app on your Ampere, which was always working.

A beta test should have been with the CUDA101 app on Ampere, to see whether they have fixed the issue of all tasks failing.


They’ll never get cuda101 working on Ampere as long as they have the architecture check.

Just threw away a couple more tasks because the server scheduler sent the CUDA101 app to my Ampere card.

James C. Owens
Message 57859 - Posted: 20 Nov 2021 | 18:05:59 UTC

Are we out of work again? I am going to have to greylist GPUGrid again unless I see better WU flow soon.

Keith Myers
Message 57860 - Posted: 20 Nov 2021 | 18:23:57 UTC - in response to Message 57859.
Last modified: 20 Nov 2021 | 18:26:07 UTC

Looks like it. I was hoping the new work from the new researcher doing machine learning with PyTorch was going to provide consistent work again.

But this post from Toni suggests otherwise: https://www.gpugrid.net/forum_thread.php?id=5283&nowrap=true#57849

Until we fill these positions, we have little capacity to send jobs.

It also didn't help the WAS and ZCD scores that the server script for generating the credits export wasn't running for 5 days.

Keith Myers
Message 57861 - Posted: 22 Nov 2021 | 17:52:52 UTC

Project has been greylisted again for Gridcoin. 😱

Erich56
Message 57862 - Posted: 23 Nov 2021 | 7:08:03 UTC - in response to Message 57860.

Looks like it. I was hoping the new work from the new researcher doing machine learning with PyTorch was going to provide consistent work again.

But this post from Toni suggests otherwise: https://www.gpugrid.net/forum_thread.php?id=5283&nowrap=true#57849
Until we fill these positions, we have little capacity to send jobs.

It also didn't help the WAS and ZCD scores that the server script for generating the credits export wasn't running for 5 days.

To me, all this suggests that no one can really tell when GPUGRID will be back to "normal", which is very sad in a way :-(

Profile ServicEnginIC
Message 57863 - Posted: 23 Nov 2021 | 8:12:14 UTC

If the project's ostracism lasts much longer, there is a real risk of it gradually losing the prestige it has earned over the years.
Usually, crises lead to changes.
I hope this time changes for the better are being prepared.

micropro
Message 57864 - Posted: 23 Nov 2021 | 10:59:03 UTC - in response to Message 57862.

Actually, we could have an approximate date for the "restart" of the project.

I read the PhD proposal, and it's kind of an "ASAP" to start working.

But I'm quite sure there's paperwork, review of candidates, etc.

So... maybe there's a deadline to all that.


@ServicEnginIC
"Usually, crises lead to changes." True. Let's hope it's for the better, whatever the various crises are for now.
(Makes me think of "The Times They Are a-Changin'" by Bob Dylan.)
Never got the chance to tell you, but nice picture as an avatar.

Profile GDF
Volunteer moderator
Project administrator
Project developer
Project tester
Volunteer developer
Volunteer tester
Project scientist
Message 57963 - Posted: 29 Nov 2021 | 16:29:41 UTC - in response to Message 57861.

Incidentally, how does Gridcoin work, and where is the list of greylisted projects?

Keith Myers
Message 57965 - Posted: 29 Nov 2021 | 17:40:16 UTC
Last modified: 29 Nov 2021 | 17:42:53 UTC

The project won't be whitelisted again until it can consistently create work and have its credits exported for enough days to get its work-available and zero-credit-days scores below the greylist criteria.

You can verify here: https://gridcoin.ddns.net/pages/project-list.php

The fact that your stats export script fails frequently causes the project to be greylisted often, as does its inability to create consistent work.

Profile GDF
Volunteer moderator
Project administrator
Project developer
Project tester
Volunteer developer
Volunteer tester
Project scientist
Message 57991 - Posted: 1 Dec 2021 | 15:23:00 UTC - in response to Message 57965.

And what is the point of Gridcoin?

Profile Retvari Zoltan
Message 57992 - Posted: 1 Dec 2021 | 15:33:30 UTC - in response to Message 57991.

And what is the point of Gridcoin?
The cruncher receives Gridcoin for crunching. Gridcoin is worth much less than Bitcoin as of yet (1 GRC = 0.01 USD).

Keith Myers
Message 58001 - Posted: 1 Dec 2021 | 19:19:19 UTC - in response to Message 57991.

Gridcoin rewards citizen scientists for their distributed computing contribution.

The Gridcoin network can reward your project's userbase to encourage participation and improve user retention.

Gridcoiners can also send GRC directly to researchers for their project through the project's wallet address or can side-stake a portion of their research awards to project scientists.

The Gridcoin protocol distributes GRC directly to scientists and those who contribute to scientific endeavors. Every time you use GRC, you voice your support for an economy based on science.

enels
Message 58034 - Posted: 7 Dec 2021 | 20:37:29 UTC

The human foot did not evolve with shoes.

bozz4science
Message 58040 - Posted: 8 Dec 2021 | 9:02:30 UTC

Can't someone please just block this guy/bot? He's posting random stuff all over and cluttering message boards across various projects. Thx

marsinph
Message 58043 - Posted: 10 Dec 2021 | 8:54:57 UTC

Admin,
please block this "administrator" (userid 573008).
He (or it) publishes nonsense, and not only here!!!

Greger
Message 58059 - Posted: 11 Dec 2021 | 0:36:50 UTC - in response to Message 58043.

marsinph, that is a hassle for the admin to do, but you can block him on your end.

Erich56
Message 58064 - Posted: 11 Dec 2021 | 7:32:09 UTC - in response to Message 58059.

marsinph, that is a hassle for the admin to do

Why so? It should be easy.

Aurum
Message 58066 - Posted: 11 Dec 2021 | 10:08:20 UTC - in response to Message 57963.
Last modified: 11 Dec 2021 | 10:08:36 UTC

Incidentally, how does Gridcoin work, and where is the list of greylisted projects?

Get a Gridcoin wallet and post your address. I'd be glad to sidestake you my GRC earnings from all projects, not just GPUGRID.

I swear there used to be a way to see the whitelist the scraper was using in real time. Maybe they dropped it. They still have obsolete stuff like Team_Whitelist displayed on the Help/Debug window's Scraper tab.

Keith Myers
Message 58072 - Posted: 11 Dec 2021 | 16:47:47 UTC - in response to Message 58066.

The website for the status of project whitelisting, https://gridcoin.ddns.net/pages/project-list.php, isn't being maintained anymore. G-UK told me this recently when I opened an issue on the code repo for the page. So you can't depend on it for true status.

But the wallet client's Help/Debug window Scraper tab does show the status of currently whitelisted projects. Unlisted or greylisted projects fall off the scraper convergence tally.

Also, the main page of the wallet has a gear icon which takes you to the Researcher configuration page, with Summary and Projects tabs. The Projects tab lists all the projects and whether each is listed or unlisted; if magnitude is shown, you know a project is still listed. This updates at every superblock.

Aurum
Message 58078 - Posted: 12 Dec 2021 | 14:37:34 UTC

I had a pair of WUs complete successfully on a computer with a pair of 2080 Tis. My other GG computer, with a 3080 + 3080 Ti, has failed 6 times. E.g.:

<core_client_version>7.16.6</core_client_version>
<![CDATA[
<message>
process exited with code 195 (0xc3, -61)</message>
<stderr_txt>
14:02:25 (654733): wrapper (7.7.26016): starting
14:02:44 (654733): wrapper (7.7.26016): starting
14:02:44 (654733): wrapper: running bin/acemd3 (--boinc --device 1)
16:50:46 (662618): wrapper (7.7.26016): starting
16:51:07 (662618): wrapper (7.7.26016): starting
16:51:07 (662618): wrapper: running bin/acemd3 (--boinc --device 0)
ERROR: /home/user/conda/conda-bld/acemd3_1632842613607/work/src/mdsim/context.cpp line 318: Cannot use a restart file on a different device!
16:51:11 (662618): bin/acemd3 exited; CPU time 4.131044
16:51:11 (662618): app exit status: 0x9e
16:51:11 (662618): called boinc_finish(195)

</stderr_txt>
]]>

The first two failures were probably caused by me trying to use the BOINC option <ignore_nvidia_dev>1</ignore_nvidia_dev> in my cc_config and restarting BOINC after suspending both tasks. Common sense says the proximal GPU would be GPU 0, but apparently the distal GPU is device 0. So I deleted the option to allow acemd3 WUs to run on both GPUs.
The next pair of WUs also failed with "Cannot use a restart file on a different device!" While they were running, a batch of OPNG WUs arrived, and BOINC suspended the acemd3 WUs and let the OPNG WUs run. Maybe when they restarted they tried to swap GPUs?
So, can acemd3 play nice with others, or must I stop accepting OPNG if I want to run acemd3 WUs?

Keith Myers
Message 58084 - Posted: 12 Dec 2021 | 16:55:41 UTC - in response to Message 58078.

If you have different devices, you can't interrupt the acemd3 calculation, or it will error out on restart. The only solution is to raise the "switch between applications every xx minutes" setting in the Manager and pass that to the client.

Set it to a value longer than the longest task runtime on the host, plus 10%. I have mine set at 2880 minutes, or two days. Tasks then run to completion on a card, and the card can move on to other GPU work.
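
If you prefer editing files over the Manager dialog, the same setting can go in global_prefs_override.xml in the BOINC data directory. A minimal sketch with my two-day value (as I understand it, the "cpu" in the option name is historical; the period governs GPU task switching as well):

<global_preferences>
    <cpu_scheduling_period_minutes>2880</cpu_scheduling_period_minutes>
</global_preferences>

Then apply it with Options > Read local prefs file in the Manager, or restart the client.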

Ian&Steve C.
Message 58085 - Posted: 12 Dec 2021 | 17:32:13 UTC - in response to Message 58084.

To add on to what Keith said, it has become apparent that the acemd3 app generates a memory load which is specific to the hardware it runs on. Look at different GPUs and you'll see a different amount of memory used by each; it seems to scale with total memory size (i.e., a 12GB GPU will show more memory used than a 4GB GPU). Or maybe it scales with memory configuration (bandwidth, speed) rather than just total size, or some combination.

This is why you can't restart a task on a different device: a calculation set up for specific hardware cannot continue if you change the device midway through. They seem to have some logic built in to catch when the hardware has changed. If you have identical GPUs, a task will usually restart on a different device OK, but I've seen times when even restarting on an identical GPU triggers this and the task fails right away.

The best option is to never interrupt GPUGRID tasks.

Aurum
Message 58086 - Posted: 12 Dec 2021 | 18:44:10 UTC

At the root of this problem: why does GG switch to a different device when its name did not change???

If another GPU WU switches to "running high priority", it will override that switch-every-2880-minutes approach.

Once we get a continuous supply of acemd3 WUs, my urge to timeslice them will vanish :-)

Keith Myers
Message 58087 - Posted: 12 Dec 2021 | 18:56:25 UTC - in response to Message 58086.

This has nothing to do with GPUGrid or the acemd3 application. The issue is with BOINC.

BOINC does not care that you were running on any particular device. It just knows that a GPU resource became available when a task finished on a GPU, and it assigns the interrupted or checkpointed acemd3 task to the card that just became available.

If that is a different device than the one the task started on, the task errors out in the manner Ian mentioned.

Ian&Steve C.
Message 58088 - Posted: 12 Dec 2021 | 19:24:06 UTC - in response to Message 58086.

Make sure your task-switching option is set to longer than the estimated task run time.

As I recall, you also run WCG OPNG tasks. If you run that project (or any other) at a higher priority via resource share, and your task has been running longer than the task-switch time, BOINC will switch over to the new high-priority task, stopping GPUGRID; then, if the task restarts on a different device, you get the error. But if your task-switch time is longer than the run time, it should never switch away to high-priority work until the task is complete.

Aurum
Message 58095 - Posted: 13 Dec 2021 | 14:31:19 UTC - in response to Message 58087.
Last modified: 13 Dec 2021 | 14:32:56 UTC

This has nothing to do with GPUGrid or the acemd3 application. The issue is with BOINC.

BOINC does not care that you were running on any particular device. It just knows that a GPU resource became available when a task finished on a GPU, and it assigns the interrupted or checkpointed acemd3 task to the card that just became available.

If that is a different device than the one the task started on, the task errors out in the manner Ian mentioned.

Sounds like a defect that must be remedied. If it's not already a GitHub issue, someone should file it. I've worn out my welcome with 4 issues that will never be fixed.

I wonder if it isn't possible for GG to assign a GPU device number to the WU and stick with the same one after time-slicing?

I set switching to 1440 minutes and got 4 acemd3 WUs overnight that are running nicely. The OPNG WUs are sitting there patiently, snug as a bug in a rug :-)

Keith Myers
Message 58097 - Posted: 13 Dec 2021 | 19:04:39 UTC - in response to Message 58095.

Again, the acemd3 application would have to be completely rewritten to hook into BOINC in a manner that BOINC does not currently support.

So first rewrite BOINC to add the ability to lock individual tasks to specific hardware, and then rewrite GG to use that BOINC feature.

Or, if a task is detected NOT to be running on its original hardware, start it from zero again so that it does not error out. Which is not conducive to returning work within the original 5-day deadline for slower cards.

Ian&Steve C.
Message 58100 - Posted: 13 Dec 2021 | 19:41:41 UTC

Keep task switching set to the proper value and operate with an understanding of the idiosyncrasies of different projects, and this won't be an issue. For GPUGRID that means not turning off your PC, or doing anything else that would cause a task to restart while GPUGRID work is processing.

I take it a step further and have designed my GPUGRID systems with identical GPUs: not just the same model, but where possible the exact same SKU from the same manufacturer (the 7x 2080 Ti system has all EVGA 2080 Ti XC Ultra cards; the 7x 2080 system has all ASUS 2080Ti Turbo cards, custom watercooled). Making a system homogeneous in this way greatly reduces the chance that a restart detects a "different" device when a restart is unavoidable (like a power outage or a hardware issue), since the cards are all identical. You can take this mindset even further with battery backup to cover short power outages. My smaller 1-GPU system is on a 1500W battery backup that can keep the system up for a few minutes during the short power blips that are all I usually experience in my area: just enough that the mains voltage drop doesn't induce a reboot; the battery kicks in for a few seconds and the system stays up.

Gogian
Message 58235 - Posted: 3 Jan 2022 | 22:03:28 UTC

I haven't received any new tasks since roughly November, and I just recently replaced the certificate. My other 3 projects are working just fine; this is the only one with a problem. I have tried resetting, removing and re-adding GPUGRID, but no work gets downloaded. Any ideas?

Ian&Steve C.
Message 58236 - Posted: 3 Jan 2022 | 22:21:28 UTC - in response to Message 58235.

Any ideas?


No work is available to send you. GPUGRID has always operated with intermittent work availability, especially recently, when they really only release a small batch at a time. You're trying to get work at a time when none is available.



Gogian
Message 58252 - Posted: 6 Jan 2022 | 15:24:05 UTC - in response to Message 58236.

OK, thanks!
