Message boards : Number crunching : Unsent tasks decreasing much more slowly

WPrion
Joined: 30 Apr 13
Posts: 96
Credit: 1,891,034,111
RAC: 18,076,302
Message 54312 - Posted: 12 Apr 2020 | 13:04:35 UTC

I've noticed that the number of Unsent Tasks is decreasing at a much slower rate even though the number of tasks in progress is growing and the Current GigaFLOPS is approaching record levels.

Unsent tasks had decreased from 300,000 to 250,000 in a few weeks, but now it is taking several days for them to decrease by only 1,000.

What changed? Are additional new tasks being added or are the tasks being crunched now more difficult?

Retvari Zoltan
Joined: 20 Jan 09
Posts: 2343
Credit: 16,201,255,749
RAC: 0
Message 54314 - Posted: 12 Apr 2020 | 14:14:37 UTC - in response to Message 54312.

Toni had prioritized some batches earlier, and those have now run out. That prioritization made the number of unsent tasks decrease more rapidly.
Now it's back to the "normal" (almost 0) rate. This means that when these run out, the decrease will be 100 times faster than the previous, faster rate.

WPrion
Joined: 30 Apr 13
Posts: 96
Credit: 1,891,034,111
RAC: 18,076,302
Message 54317 - Posted: 13 Apr 2020 | 11:39:18 UTC - in response to Message 54314.

Thanks!

ServicEnginIC
Joined: 24 Sep 10
Posts: 581
Credit: 9,003,487,024
RAC: 16,568,484
Message 54579 - Posted: 4 May 2020 | 19:27:18 UTC

On March 10th 2020 | 17:39:16 UTC Retvari Zoltan wrote at message #53884:

I'm receiving many tasks which are the last one of their batch:

1nkvA00_450_0-TONI_MDADpr4sn-9-10-RND4090_0

Or near the end of their batch:
1gaxA04_348_0-TONI_MDADpr4sg-8-10-RND1850_0

In a name like the first one, the last number (10) is the total number of tasks in the batch, and the number before it (9) is the sequential number of the given task within the batch (starting number is 0).

I expect the number of unsent tasks in the queue will drop significantly over the next days.
There are 305,826 unsent tasks as I write this.

At this time, the number of unsent tasks is 243,556, as can be seen on the Server status page.
The last tasks I'm currently receiving are similar to: 3tekA00_320_3-TONI_MDADpr4st-8-10-RND9554_0
As soon as the series arrive at their 9-10 tasks, unsent tasks will predictably decrease at a higher rate again... (?)
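The naming convention described in the quote above can be turned into a small parser. This is a minimal Python sketch, assuming the layout `<base>-<batch>-<sequence>-<total>-RND<n>_<replication>` inferred only from the task names quoted in this thread; it is not an official GPUGRID format specification. The field names and the `variant` property (the trailing number of the base name, which Toni later refers to as _0 to _4) are hypothetical labels, and reading the final `_N` as a replication number (0 = original issue, 1+ = resend) follows Richard Haselgrove's later explanation.

```python
import re
from dataclasses import dataclass

# Assumed layout, inferred from names quoted in this thread (not an official spec):
#   <base>-<batch>-<sequence>-<total>-RND<number>_<replication>
TASK_NAME = re.compile(
    r"^(?P<base>[^-]+)-(?P<batch>[^-]+)-(?P<seq>\d+)-(?P<total>\d+)"
    r"-RND(?P<rnd>\d+)_(?P<replication>\d+)$"
)


@dataclass
class TaskName:
    base: str         # e.g. "1gaxA04_348_0"
    batch: str        # e.g. "TONI_MDADpr4sg"
    seq: int          # 0-based position of this task in its chain
    total: int        # number of tasks in the chain
    replication: int  # assumed: 0 = original issue, 1+ = resend

    @property
    def is_last(self) -> bool:
        # With 0-based numbering, "9-10" is the final task of a 10-task chain.
        return self.seq == self.total - 1

    @property
    def variant(self) -> int:
        # Hypothetical name for the trailing number of the base ("_0" in "1gaxA04_348_0").
        return int(self.base.rsplit("_", 1)[1])


def parse_task_name(name: str) -> TaskName:
    m = TASK_NAME.match(name)
    if m is None:
        raise ValueError(f"unrecognised task name: {name!r}")
    return TaskName(
        base=m["base"],
        batch=m["batch"],
        seq=int(m["seq"]),
        total=int(m["total"]),
        replication=int(m["replication"]),
    )


if __name__ == "__main__":
    for name in (
        "1nkvA00_450_0-TONI_MDADpr4sn-9-10-RND4090_0",
        "1gaxA04_348_0-TONI_MDADpr4sg-8-10-RND1850_0",
    ):
        t = parse_task_name(name)
        print(f"{name}: task {t.seq} of 0..{t.total - 1}, last={t.is_last}")
```

Counting how many received tasks have `is_last == True` is one way to spot, as in this thread, that a batch has reached its final step.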

ServicEnginIC
Joined: 24 Sep 10
Posts: 581
Credit: 9,003,487,024
RAC: 16,568,484
Message 54604 - Posted: 7 May 2020 | 6:01:35 UTC

As soon as the series arrive at their 9-10 tasks, unsent tasks will predictably decrease at a higher rate again... (?)

All the WUs I received today are of this kind.
The current reading is 242,563 unsent tasks. We will soon confirm or discard this assumption.

Retvari Zoltan
Joined: 20 Jan 09
Posts: 2343
Credit: 16,201,255,749
RAC: 0
Message 54607 - Posted: 7 May 2020 | 9:45:18 UTC - in response to Message 54604.
Last modified: 7 May 2020 | 9:56:02 UTC

As soon as the series arrive at their 9-10 tasks, unsent tasks will predictably decrease at a higher rate again... (?)
All the WUs I received today are of this kind.
The current reading is 242,563 unsent tasks. We will soon confirm or discard this assumption.
I'm sure that the number of unsent tasks will drop drastically in the next few days.
The only question is the bottom of that drop. It depends on the priority of the tasks in the queue. If it's uniform, the number of unsent tasks will drop near 0, only the tasks stuck in slow or inactive hosts will remain in the queue (~1000 in this case). If there are lower priority tasks than the ones we receive now, then we will receive those soon. We will know if that's the case as they will have low sequence number (for example 3-10). In this case the number of unsent tasks will remain high. I guess there are no lower priority tasks, so the number of unsent tasks will drop near 0.
The number of unsent tasks is 237,790 at the moment (-4,773, a ~2% drop in 3h 45m).

Toni
Volunteer moderator
Project administrator
Project developer
Project tester
Project scientist
Joined: 9 Dec 08
Posts: 1006
Credit: 5,068,599
RAC: 0
Message 54608 - Posted: 7 May 2020 | 11:39:02 UTC - in response to Message 54607.

I prioritised tasks ending with _0: 1gaxA04_348_0 over the others (_1 to _4)

T

Retvari Zoltan
Joined: 20 Jan 09
Posts: 2343
Credit: 16,201,255,749
RAC: 0
Message 54612 - Posted: 7 May 2020 | 18:21:56 UTC - in response to Message 54604.
Last modified: 7 May 2020 | 18:36:17 UTC

The current reading is 242,563 unsent tasks. We will soon confirm or discard this assumption.
The current reading is 222,460, which is a 20,103 (8.28%) drop in 12h 20m = 27.17/minute.
If this rate is constant, the present supply will last for 5 days 16 hours 28 minutes and 50.8 seconds. :)

Retvari Zoltan
Joined: 20 Jan 09
Posts: 2343
Credit: 16,201,255,749
RAC: 0
Message 54613 - Posted: 8 May 2020 | 6:19:07 UTC - in response to Message 54612.
Last modified: 8 May 2020 | 6:24:10 UTC

The current reading is 242,563 unsent tasks. We will soon confirm or discard this assumption.
The current reading is 222,460, which is a 20,103 (8.28%) drop in 12h 20m = 27.17/minute.
If this rate is constant, the present supply will last for 5 days 16 hours 28 minutes and 50.8 seconds. :)
The current reading is 200,361, which is a 42,202 (17.4%) decrease in 24h 10m = 29.10/minute.
The rate has increased slightly. According to this new rate, the present supply will last 4 days 18 hours 44 minutes 6.94 seconds from now. :)
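The projections above are simple arithmetic: average drain rate = (earlier count - current count) / elapsed minutes, and time left = current count / rate. A minimal Python sketch that reproduces both readings, assuming (as the posts do) that the rate stays constant; the function names are illustrative.

```python
from __future__ import annotations


def drain_rate(before: int, after: int, minutes: float) -> float:
    """Average decrease of the unsent queue, in tasks per minute."""
    return (before - after) / minutes


def time_to_empty(unsent: int, rate_per_min: float) -> tuple[int, int, int, float]:
    """Time until the queue is empty at a constant rate, as (days, hours, minutes, seconds)."""
    total_min = unsent / rate_per_min
    days, rem = divmod(total_min, 24 * 60)
    hours, rem = divmod(rem, 60)
    minutes, frac = divmod(rem, 1)
    return int(days), int(hours), int(minutes), frac * 60


# First reading: 242,563 -> 222,460 unsent tasks in 12h 20m (740 minutes).
rate = drain_rate(242_563, 222_460, 12 * 60 + 20)
print(f"{rate:.2f} tasks/min", time_to_empty(222_460, rate))

# Second reading: 242,563 -> 200,361 in 24h 10m (1450 minutes).
rate = drain_rate(242_563, 200_361, 24 * 60 + 10)
print(f"{rate:.2f} tasks/min", time_to_empty(200_361, rate))
```

This is a straight-line extrapolation; as the later readings in this thread show, the rate itself drifted (about 27, then 29, then 30 per minute), so each new reading shifts the estimate.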

ServicEnginIC
Joined: 24 Sep 10
Posts: 581
Credit: 9,003,487,024
RAC: 16,568,484
Message 54618 - Posted: 8 May 2020 | 20:08:45 UTC - in response to Message 54613.

The current reading is 200,361, which is a 42,202 (17.4%) decrease in 24h 10m = 29.10/minute.
The rate has increased slightly. According to this new rate, the present supply will last 4 days 18 hours 44 minutes 6.94 seconds from now. :)

-1) Mr. Zoltan: Thank you very much for making this fun.
I took screenshots that confirm your data.

Reduction in unsent tasks: 41,926 in this roughly 24-hour span.

-2) Mr. Toni/GPUGrid's Team: Thank you very much for your continuous support.
This high rate of decrease has been greatly helped by exceptionally good communications since yesterday morning.
Whatever you did in the transition from May 6th to 7th, it brought a drastic change from extremely sluggish to very agile communications.
Please take note of the recipe.

At the moment of writing this, the scheduler is stopped.
I guess that this high rate of returned results has caused another momentary disk buffer overflow...

Retvari Zoltan
Joined: 20 Jan 09
Posts: 2343
Credit: 16,201,255,749
RAC: 0
Message 54621 - Posted: 8 May 2020 | 20:26:58 UTC - in response to Message 54618.
Last modified: 8 May 2020 | 20:30:03 UTC

The current reading is 200,361, which is a 42,202 (17.4%) decrease in 24h 10m = 29.10/minute.
The rate has increased slightly. According to this new rate, the present supply will last 4 days 18 hours 44 minutes 6.94 seconds from now. :)

At the moment of writing this, the scheduler is stopped.
I guess that this high rate of returned results has caused another momentary disk buffer overflow...

Note that the return rate was this high all along, hence the frequent disk buffer overflows. As new tasks are created from the returned tasks, the number of unsent workunits remains constant, so the return rate remains hidden from us until the batches reach their final sequence number.
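The reasoning above (each returned task spawns its successor, so the unsent count stays flat until the chains reach their final step) can be illustrated with a toy model. This Python sketch is a deliberate simplification and an assumption, not GPUGRID's actual scheduler: a single FIFO queue, hosts that return tasks instantly, and every returned task except the last of its chain immediately queuing the next one.

```python
from collections import deque


def simulate(chains: int, total: int, per_tick: int, ticks: int) -> list[int]:
    """Toy model: each chain is `total` sequential tasks; finishing task k
    queues task k+1 of the same chain. Returns the unsent count after each tick."""
    unsent = deque((c, 0) for c in range(chains))  # first task of every chain is queued
    history = []
    for _ in range(ticks):
        # Hosts pick up and (instantly, in this toy) return up to `per_tick` tasks.
        returned = [unsent.popleft() for _ in range(min(per_tick, len(unsent)))]
        for chain, seq in returned:
            if seq + 1 < total:                  # not the last task of its chain
                unsent.append((chain, seq + 1))  # its successor joins the queue
        history.append(len(unsent))
    return history


print(simulate(chains=100, total=10, per_tick=30, ticks=36))
```

With 100 chains of 10 tasks, the count sits at 100 for the first 30 ticks even though 30 tasks are returned every tick, then falls quickly once the final-step tasks are being handed out, which is the pattern described in this thread.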

ServicEnginIC
Joined: 24 Sep 10
Posts: 581
Credit: 9,003,487,024
RAC: 16,568,484
Message 54622 - Posted: 8 May 2020 | 20:54:34 UTC - in response to Message 54621.

Note that the return rate was this high all along, hence the frequent disk buffer overflows. As new tasks are created from the returned tasks, the number of unsent workunits remains constant, so the return rate remains hidden from us until the batches reach their final sequence number.

Yes, you're right, and I'm aware of it.
The recent frequent scheduler stops are most probably related to this "Optimized bandwidth" announcement and the significantly increased number of crunchers...
This combination has likely caused some bottleneck in the project's resources.

robertmiles
Joined: 16 Apr 09
Posts: 503
Credit: 748,770,933
RAC: 127,823
Message 54624 - Posted: 9 May 2020 | 3:59:19 UTC

It looks like the server status page needs something added: free disk space, at least for the disk areas that receive uploads.

That seems to be the current bottleneck in the project's resources.

ServicEnginIC
Joined: 24 Sep 10
Posts: 581
Credit: 9,003,487,024
RAC: 16,568,484
Message 54631 - Posted: 9 May 2020 | 10:53:25 UTC - in response to Message 54624.

One more conclusion that could be drawn:

- Taking Retvari Zoltan's current calculation: 29.1 returned WUs per minute on average
- Taking some calculations from this previous outage: 6.367 MB on average per returned WU
This results in 185.28 MB of finished-WU data returned to the server per minute.
That is 260.55 GB of data to manage per day, counting only returned WU data (about 1 TB every 4 days).
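The estimate above can be checked in a few lines. A minimal Python sketch using the two inputs from the post; note that dividing the daily megabytes by 1024 (binary units) is what reproduces the 260.55 GB figure, while dividing by 1000 would give about 266.8 GB.

```python
from __future__ import annotations


def daily_upload_volume(results_per_min: float, mb_per_result: float) -> tuple[float, float]:
    """Return (MB per minute, GB per day) of returned result data."""
    mb_per_min = results_per_min * mb_per_result
    gb_per_day = mb_per_min * 60 * 24 / 1024  # binary units: 1 GB taken as 1024 MB here
    return mb_per_min, gb_per_day


mb_min, gb_day = daily_upload_volume(29.1, 6.367)
print(f"{mb_min:.2f} MB/min, {gb_day:.2f} GB/day, {4 * gb_day / 1024:.2f} TB per 4 days")
```

Either unit convention lands at roughly 1 TB every 4 days, so the conclusion does not depend on that choice.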

Richard Haselgrove
Joined: 11 Jul 09
Posts: 1617
Credit: 8,176,644,351
RAC: 16,733,896
Message 54632 - Posted: 9 May 2020 | 12:31:04 UTC - in response to Message 54631.

What we don't know - at least, I certainly don't know, and I've not seen it described here, ever - is what exactly the processing path of that data is after our raw results are returned to the server.

We do know that each of our tasks is one of a sequence of (currently) 10 tasks making up the entire job, and that at least some of our returned data is used to assemble the starting data for the next task in the sequence.

Is it all used in that way? Once it's been used, does it need to be kept? If so, how long? Can it (any of it) be discarded once the next task in sequence has been created? Has been completed? Once the whole 10-task job has been completed?

People in other threads have mentioned SETI as a comparison. There, the scientific data returned by each task is assimilated into a gigantic, 20-year scientific database, and once assimilation has taken place, our raw returned data is erased (usually within 24 hours).

If we knew for certain that our returned data needed to be retained in quick-access online storage, say until the final paper had been accepted for publication following peer review, then I'd be prepared to contribute to a fundraising drive for additional disk spindles and a chassis to mount them in. But if the daily data is simply transferred over a slow link to an offsite backing store, then spindles aren't the answer: more drives would simply delay the need for an outage from a 5 day to a 10 day interval, and then extend that outage when it eventually arrived.

ServicEnginIC
Joined: 24 Sep 10
Posts: 581
Credit: 9,003,487,024
RAC: 16,568,484
Message 54640 - Posted: 10 May 2020 | 9:54:11 UTC
Last modified: 10 May 2020 | 9:54:52 UTC

The project's scheduler is up again, with 174,874 tasks left ready to send!

ServicEnginIC
Joined: 24 Sep 10
Posts: 581
Credit: 9,003,487,024
RAC: 16,568,484
Message 54641 - Posted: 10 May 2020 | 10:24:22 UTC

All my stacked WUs have been reported as finished, and all the new WUs I've received (except one 8-10) are of the 9-10 kind.
So this topic is still on fire 🔥🔥🔥

Retvari Zoltan
Joined: 20 Jan 09
Posts: 2343
Credit: 16,201,255,749
RAC: 0
Message 54642 - Posted: 10 May 2020 | 11:44:52 UTC

I have a couple of ghost tasks, so I suppose many other ghost tasks are waiting for their deadline to pass, after which some 8-10 tasks will be resent to other hosts.
However, the present supply (171,016) will last for about 4 days from now.

Keith Myers
Joined: 13 Dec 17
Posts: 1332
Credit: 7,157,167,459
RAC: 14,602,712
Message 54650 - Posted: 11 May 2020 | 6:53:32 UTC

What is the ghost recovery procedure on this project?

ServicEnginIC
Joined: 24 Sep 10
Posts: 581
Credit: 9,003,487,024
RAC: 16,568,484
Message 54651 - Posted: 11 May 2020 | 8:36:40 UTC - in response to Message 54650.

Ghost tasks exist only on GPUGRID's server side.
After the 5-day deadline has passed, the server will automatically clear the ghost tasks from the original host and resend them to another one.

Retvari Zoltan
Joined: 20 Jan 09
Posts: 2343
Credit: 16,201,255,749
RAC: 0
Message 54652 - Posted: 11 May 2020 | 9:41:47 UTC - in response to Message 54650.

What is the ghost recovery procedure on this project?
I've tried the way it works for SETI, but it didn't work here.
Luckily GPUGrid has a much shorter deadline than SETI, so it's not a big problem.

Retvari Zoltan
Joined: 20 Jan 09
Posts: 2343
Credit: 16,201,255,749
RAC: 0
Message 54653 - Posted: 11 May 2020 | 9:46:56 UTC - in response to Message 54642.

the present supply (171,016) will last for about 4 days from now.
The rate of decline seems to stabilize around 30/minute, so the supply will last for about 3 days from now.
(exactly 2 days 22 hours 39 minutes and 42.9 seconds)

ServicEnginIC
Joined: 24 Sep 10
Posts: 581
Credit: 9,003,487,024
RAC: 16,568,484
Message 54654 - Posted: 11 May 2020 | 9:52:57 UTC - in response to Message 54651.
Last modified: 11 May 2020 | 9:54:26 UTC

Ghost tasks exist only on GPUGRID's server side.
After the 5-day deadline has passed, the server will automatically clear the ghost tasks from the original host and resend them to another one.

[Clarification]

A "ghost task" is one that the server counts as sent to a host but that, for whatever reason, was never actually received.
It doesn't interfere on the host side: BOINC Manager does not see these ghost tasks, and it will keep asking for new tasks until its task buffer is full or the maximum of 2 tasks per GPU is reached.
On the server side, ghost tasks are wrongly counted as "In process" tasks, while in reality they are not.

ServicEnginIC
Joined: 24 Sep 10
Posts: 581
Credit: 9,003,487,024
RAC: 16,568,484
Message 54656 - Posted: 11 May 2020 | 9:59:54 UTC - in response to Message 54653.

The rate of decline seems to stabilize around 30/minute, so the supply will last for about 3 days from now.
(exactly 2 days 22 hours 39 minutes and 42.9 seconds)

What comes next is a mystery...

Keith Myers
Joined: 13 Dec 17
Posts: 1332
Credit: 7,157,167,459
RAC: 14,602,712
Message 54660 - Posted: 11 May 2020 | 16:13:33 UTC - in response to Message 54652.

What is the ghost recovery procedure on this project?
I've tried the way it works for SETI, but it didn't work here.
Luckily GPUGrid has a much shorter deadline than SETI, so it's not a big problem.

Thanks Zoltan, I tried my Seti ghost recovery protocol and it didn't work either.
I managed to pick up 10 ghosts and wanted to clear them.
Good thing the deadline here is so short compared to Seti.

Pop Piasa
Joined: 8 Aug 19
Posts: 252
Credit: 458,054,251
RAC: 0
Message 54662 - Posted: 11 May 2020 | 17:20:09 UTC

These ghost tasks seem to occur after the server runs out of disk space. Are they somehow related to that? 🤔
_____________________________

An unrelated item: Anybody else getting this error?

(unknown error) - exit code 195 (0xc3)</message>
<stderr_txt>
01:29:38 (6776): wrapper (7.9.26016): starting
01:29:38 (6776): wrapper: running acemd3.exe (--boinc input --device 0)
EXCEPTIONAL CONDITION: src\mdio\bincoord.c, line 193: "nelems != 1"
01:29:40 (6776): acemd3.exe exited; CPU time 0.015625
01:29:40 (6776): app exit status:


It apparently signals that the WU is bad, when you track them down. After getting 6 of them, I'm curious what the bug might be. Bad code?

Ian&Steve C.
Joined: 21 Feb 20
Posts: 1065
Credit: 40,231,533,983
RAC: 22,690
Message 54663 - Posted: 11 May 2020 | 17:21:14 UTC - in response to Message 54662.

Yes, I saw a bunch of bad WUs. Checking the resends, they are all erroring out on different hosts too.
____________

Keith Myers
Joined: 13 Dec 17
Posts: 1332
Credit: 7,157,167,459
RAC: 14,602,712
Message 54664 - Posted: 11 May 2020 | 18:50:58 UTC

Looks like a lot of tasks lost their file references on the storage. Can't pull the correct data for the tasks.

<core_client_version>7.17.0</core_client_version>
<![CDATA[
<message>
ERROR: /home/user/conda/conda-bld/acemd3_1570536635323/work/src/mdsim/trajectory.cpp line 135: Simulation box has to be rectangular!
07:01:16 (1119448): acemd3 exited; CPU time 0.557061
07:01:16 (1119448): app exit status: 0x9e
07:01:16 (1119448): called boinc_finish(195)

Richard Haselgrove
Joined: 11 Jul 09
Posts: 1617
Credit: 8,176,644,351
RAC: 16,733,896
Message 54665 - Posted: 11 May 2020 | 19:09:47 UTC - in response to Message 54664.

I'm interpreting that message as "file is present, but contains bad contents".

On another aspect of the 'error task' problem: I'm using a very ancient predecessor of BoincTasks. It (and, I think, BoincTasks itself) retains the concept of "CPU efficiency", which was withdrawn from BOINC Manager several years ago.

What I'm seeing for Windows tasks is that the ACEMD worker app crashes seconds after launch, but the Wrapper app doesn't notice for some time - the task as a whole is seen by BOINC as continuing to run. This shows up as a CPU efficiency of 0.0000 (helpfully colour coded) - no CPU time is being measured for the task as a whole, instead of the usual 96% - 97%.

That low efficiency warning prompts me to look at the workunit on the website, and see if there are any previous failures (the replication number is a good hint, as well). If it's a bad workunit, I can abort and move on with less wasted time overall.

It's a technique which some users might find helpful.
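The check described above boils down to CPU efficiency = CPU time / elapsed time, flagged when it falls far below the usual 96% - 97%. A minimal Python sketch of that rule; the `RunningTask` record, the 0.5 threshold and the 5-minute grace period are hypothetical choices, and how you obtain the two times (BoincTasks or otherwise) is left open.

```python
from dataclasses import dataclass


@dataclass
class RunningTask:
    name: str
    cpu_seconds: float      # CPU time charged to the task so far
    elapsed_seconds: float  # wall-clock time since the task started


def cpu_efficiency(task: RunningTask) -> float:
    """Fraction of elapsed time during which the task used CPU."""
    if task.elapsed_seconds <= 0:
        return 0.0
    return task.cpu_seconds / task.elapsed_seconds


def suspicious_tasks(tasks, threshold=0.5, min_elapsed=300.0):
    """Names of tasks that have run for a while but show (near) zero CPU use.

    Threshold and grace period are illustrative assumptions; healthy ACEMD
    tasks were reported at about 0.96-0.97 in this thread.
    """
    return [
        t.name
        for t in tasks
        if t.elapsed_seconds >= min_elapsed and cpu_efficiency(t) < threshold
    ]


tasks = [
    RunningTask("healthy", cpu_seconds=970.0, elapsed_seconds=1000.0),
    RunningTask("worker_crashed", cpu_seconds=0.0, elapsed_seconds=1200.0),
    RunningTask("just_started", cpu_seconds=0.0, elapsed_seconds=10.0),
]
print(suspicious_tasks(tasks))
```

A flagged task is only a prompt to look up the workunit on the website for earlier failures before aborting it; the grace period avoids flagging tasks that have barely started.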

Aurum
Joined: 12 Jul 17
Posts: 401
Credit: 16,754,860,632
RAC: 9,459,765
Message 54666 - Posted: 11 May 2020 | 19:19:39 UTC - in response to Message 54656.

The rate of decline seems to stabilize around 30/minute, so the supply will last for about 3 days from now.
(exactly 2 days 22 hours 39 minutes and 42.9 seconds)

What comes next is a mystery...

And what we're finishing now is a complete and utter mystery as well.

Pop Piasa
Joined: 8 Aug 19
Posts: 252
Credit: 458,054,251
RAC: 0
Message 54674 - Posted: 12 May 2020 | 18:32:49 UTC - in response to Message 54666.
Last modified: 12 May 2020 | 18:34:20 UTC

And what we're finishing now is a complete and utter mystery as well


I've only been able to glean that it is a vigorous attempt at mapping the simulation environment which is meant to improve (or simplify?) future modeling methods.

If one of the admins would want to comment, we're all ears...
👂👂👂👂👂🦻👂😉

ServicEnginIC
Joined: 24 Sep 10
Posts: 581
Credit: 9,003,487,024
RAC: 16,568,484
Message 54675 - Posted: 12 May 2020 | 19:05:46 UTC

New version of ACEMD: 73,631 Unsent tasks left

⏳️

Retvari Zoltan
Joined: 20 Jan 09
Posts: 2343
Credit: 16,201,255,749
RAC: 0
Message 54686 - Posted: 14 May 2020 | 9:49:10 UTC - in response to Message 54653.

the present supply (171,016) will last for about 4 days from now.
The rate of decline seems to stabilize around 30/minute, so the supply will last for about 3 days from now.
(exactly 2 days 22 hours 39 minutes and 42.9 seconds)
3 days have passed and there are 11,806 workunits left; this supply will last for another 6-7 hours.

Aurum
Joined: 12 Jul 17
Posts: 401
Credit: 16,754,860,632
RAC: 9,459,765
Message 54690 - Posted: 14 May 2020 | 18:28:49 UTC

They're all gone, so what now?

Ben
Joined: 28 Dec 14
Posts: 9
Credit: 149,574,556
RAC: 0
Message 54691 - Posted: 14 May 2020 | 18:47:42 UTC - in response to Message 54690.
Last modified: 14 May 2020 | 18:50:50 UTC

Our poor GPUs start getting hangry!! :)

And I was pushing so hard for the magic 100m milestone. :(

ServicEnginIC
Joined: 24 Sep 10
Posts: 581
Credit: 9,003,487,024
RAC: 16,568,484
Message 54693 - Posted: 14 May 2020 | 20:07:15 UTC - in response to Message 54691.

They're all gone, so what now?

I liked this expression:

...is a complete and utter mystery...

Familiar?
(I noted it down for a moment like this one.)

Now that unsent tasks have reached zero and stuck there, the title of this thread makes full sense again: Unsent tasks decreasing much more slowly
(Unless negative values are permitted, who knows?)

Retvari Zoltan
Joined: 20 Jan 09
Posts: 2343
Credit: 16,201,255,749
RAC: 0
Message 54696 - Posted: 14 May 2020 | 21:19:32 UTC - in response to Message 54690.
Last modified: 14 May 2020 | 21:23:00 UTC

They're all gone, so what now?
It will take at least 5-10 days (or more) until all the workunits out in the field are finished (or timed out, and finished on another host).
I don't expect that another batch will be queued until then.
Exam period is coming, then the summer break is coming, so perhaps there won't be much work queued soon.
Unless Toni prepared some COVID-19 related work. Or perhaps we could help out the Acellera drug design people doing their job.

Erich56
Joined: 1 Jan 15
Posts: 1124
Credit: 9,058,545,176
RAC: 27,459,788
Message 54698 - Posted: 15 May 2020 | 5:38:07 UTC

The difference between the tasks of the current series and all the earlier ones is this:
before, tasks could still be downloaded once in a while as long as there were enough tasks "in process"; here this seems not to be the case.
Once the "unsent" queue is dry, no more tasks can be downloaded.

Keith Myers
Joined: 13 Dec 17
Posts: 1332
Credit: 7,157,167,459
RAC: 14,602,712
Message 54699 - Posted: 15 May 2020 | 6:56:25 UTC

I picked up 4 resends after the RTS buffer had hit zero today.

Richard Haselgrove
Joined: 11 Jul 09
Posts: 1617
Credit: 8,176,644,351
RAC: 16,733,896
Message 54700 - Posted: 15 May 2020 | 7:16:57 UTC - in response to Message 54699.

I picked up 4 resends after the RTS buffer had hit zero today.

Were they from the 'instant crashing' batch? I've had a few of those recently, though I haven't checked to see if I got any while I was asleep.

Toni
Volunteer moderator
Project administrator
Project developer
Project tester
Project scientist
Joined: 9 Dec 08
Posts: 1006
Credit: 5,068,599
RAC: 0
Message 54702 - Posted: 15 May 2020 | 7:49:18 UTC - in response to Message 54700.

The large batch has essentially finished. If there are MDAD left, they are probably failing leftovers.

Keith Myers
Joined: 13 Dec 17
Posts: 1332
Credit: 7,157,167,459
RAC: 14,602,712
Message 54707 - Posted: 15 May 2020 | 15:42:33 UTC - in response to Message 54700.

No, 3 in fact were original issue -1's from yesterday. They must have been the very last ones issued. One was a -2 resend from an aborted user. None were from the badly formatted task run. I got lucky.

I was just surprised to see the cache increase after I had seen the RTS count on the SSP go to zero.

Richard Haselgrove
Joined: 11 Jul 09
Posts: 1617
Credit: 8,176,644,351
RAC: 16,733,896
Message 54709 - Posted: 15 May 2020 | 18:12:52 UTC - in response to Message 54707.

At this project, only the _0 are original issue. _1 is already a replacement, unlike projects which use comparison validation.

But I'm glad you got some meat off the bones.

Keith Myers
Joined: 13 Dec 17
Posts: 1332
Credit: 7,157,167,459
RAC: 14,602,712
Message 54710 - Posted: 15 May 2020 | 19:32:20 UTC

Thanks for correcting me Richard. I forgot about the workunits on this project with quorum of 1.

Pop Piasa
Joined: 8 Aug 19
Posts: 252
Credit: 458,054,251
RAC: 0
Message 54713 - Posted: 16 May 2020 | 2:57:26 UTC

I figure what we see trickle out for a while will be timed-out tasks that are recycled by Grosso. I'm curious as to how many tasks expire on how many hosts by the end of a run the size of this one.

Friends, don't dismiss the amount of raw computing power represented by the project we have all just completed! Even if Grosso isn't always as awesome as its nickname suggests, the DC network it supports is truly awesome in its potential power. That power has its source in every one of us and our individual support.

And beside that, where else can we accrue BOINC cobblestones this fast? 😉

Hey, look at this little hiatus as "recess", where you get to go find out what the kids in the other classes are crunching. You might get a little case of hardware lust.

Just remember to practice virtual social distancing! 💻🌎🖥🌏🖥🌍💻😎

Richard Haselgrove
Joined: 11 Jul 09
Posts: 1617
Credit: 8,176,644,351
RAC: 16,733,896
Message 54714 - Posted: 16 May 2020 | 8:21:36 UTC

Looks like I'm participating in that trickle-down, too. Somebody let WU 19993861 slip past its deadline, so they tossed it back to me.

oemuser
Joined: 18 Sep 16
Posts: 10
Credit: 1,291,979
RAC: 0
Message 54716 - Posted: 16 May 2020 | 10:17:46 UTC

Folding@home has many GPU work units against the coronavirus now, so that would be a good option.

robertmiles
Joined: 16 Apr 09
Posts: 503
Credit: 748,770,933
RAC: 127,823
Message 54718 - Posted: 16 May 2020 | 14:19:30 UTC - in response to Message 54716.

Folding@home has many GPU work units against the coronavirus now, so that would be a good option.

I signed up for folding@home at least a week ago, and then enabled GPU work for them. No GPU work downloaded so far, only CPU work.

Also, so far, I've been unable to log into their forums.

Jim1348
Joined: 28 Jul 12
Posts: 819
Credit: 1,591,285,971
RAC: 0
Message 54719 - Posted: 16 May 2020 | 15:23:38 UTC - in response to Message 54718.
Last modified: 16 May 2020 | 15:29:40 UTC

I signed up for folding@home at least a week ago, and then enabled GPU work for them. No GPU work downloaded so far, only CPU work.

Also, so far, I've been unable to log into their forums.

Things seem to be a bit strange on Folding at the moment. I can't get to the forums either, but I have been getting work regularly (both CPU and GPU) for a couple of weeks. But on some cards I don't get any work. It is not a difference in the cards, but some of their servers have more problems than others, due to the recent growing pains. If you try later, you will probably get some.

And make sure you are using their latest release. They have fixed a few bugs recently that could hang up getting work.
https://foldingathome.org/start-folding/

EDIT: I think this explains it.
https://foldingathome.org/2020/05/16/foldingforum-org-is-currently-out-of-service/

robertmiles
Joined: 16 Apr 09
Posts: 503
Credit: 748,770,933
RAC: 127,823
Message 54720 - Posted: 16 May 2020 | 19:09:27 UTC - in response to Message 54719.

OK, that site offers a more recent version.

I hope that updating it will not disturb work in progress - there doesn't seem to be a way to tell the previous version to finish any work in progress, but not start more.

Pop Piasa
Joined: 8 Aug 19
Posts: 252
Credit: 458,054,251
RAC: 0
Message 54721 - Posted: 17 May 2020 | 1:29:02 UTC - in response to Message 54720.

Robert, I had to adjust the slider to full power to get my GPUs to engage. As I recall, it takes a while to pick up available work the first time. Once you have GPU tasks, you can run at any speed.
In my recent experience the app works well alongside BOINC. I can multi-task my GPUs that way. Just remember that twice the tasks means half the speed for each task.

Profile robertmiles
Joined: 16 Apr 09
Posts: 503
Credit: 748,770,933
RAC: 127,823
Level
Lys
Message 54724 - Posted: 17 May 2020 | 2:39:14 UTC

Pop Piasa,

Thanks. That started my first Folding@Home GPU task.

Profile ServicEnginIC
Joined: 24 Sep 10
Posts: 581
Credit: 9,003,487,024
RAC: 16,568,484
Level
Tyr
Message 55133 - Posted: 31 Jul 2020 | 10:06:15 UTC

Here we are again in the same situation:
My last received task is named 1nh8A02_348_1-TONI_MDADex2sn-49-50-RND4451_0
That is, we have reached the last tasks of the current batch.
If past behavior repeats, unsent tasks will start decreasing more quickly until they are exhausted.
With our effort, a new milestone at GPUGrid is likely to be reached...
It's rather exciting: what will come next?
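The batch position can be read straight from the task name. Below is a minimal Python sketch that assumes, based only on the names quoted in this thread and not on any GPUGrid documentation, that names end in `-<index>-<count>-RND<seed>_<n>`. Under that assumption the index is 0-based, so `-49-50-` marks the last task of a 50-task batch, and the trailing `_<n>` is taken to be the BOINC replica (issue) number. The pattern, the `TaskPosition` type and both function names are hypothetical.

```python
import re
from typing import NamedTuple

# Hypothetical pattern inferred only from task names quoted in this thread:
#   <prefix>-<index>-<count>-RND<seed>_<replica>
TASK_RE = re.compile(
    r"^(?P<prefix>.+)-(?P<index>\d+)-(?P<count>\d+)-RND(?P<seed>\d+)_(?P<replica>\d+)$"
)

class TaskPosition(NamedTuple):
    prefix: str   # e.g. "1nh8A02_348_1-TONI_MDADex2sn"
    index: int    # assumed 0-based position within the batch
    count: int    # assumed number of tasks in the batch
    replica: int  # assumed BOINC replica/issue number (trailing _N)

def parse_task_name(name: str) -> TaskPosition:
    """Split a task name into its (assumed) batch fields."""
    m = TASK_RE.match(name)
    if m is None:
        raise ValueError(f"unrecognised task name: {name!r}")
    return TaskPosition(m["prefix"], int(m["index"]), int(m["count"]), int(m["replica"]))

def is_last_in_batch(name: str) -> bool:
    # With a 0-based index, the last task of the batch has index == count - 1.
    pos = parse_task_name(name)
    return pos.index == pos.count - 1

if __name__ == "__main__":
    for name in (
        "1nh8A02_348_1-TONI_MDADex2sn-49-50-RND4451_0",
        "1ac5A00_320_0-TONI_MDADex7sa-0-50-RND9726_2",
    ):
        print(name, is_last_in_batch(name))
```

The greedy `prefix` group backtracks until only the final `-<index>-<count>-RND…` fields remain, so prefixes that contain hyphens of their own (such as `e1s241_17gen-PABLO_UCB_NMR_KIX_CMYB_8`) are still split correctly.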

Erich56
Joined: 1 Jan 15
Posts: 1124
Credit: 9,058,545,176
RAC: 27,459,788
Level
Tyr
Message 55134 - Posted: 31 Jul 2020 | 15:35:51 UTC - in response to Message 55133.

What will come next?

maybe tasks related to Corona?

Jim1348
Joined: 28 Jul 12
Posts: 819
Credit: 1,591,285,971
RAC: 0
Level
His
Message 55135 - Posted: 31 Jul 2020 | 15:39:47 UTC

August is next.

Profile ServicEnginIC
Joined: 24 Sep 10
Posts: 581
Credit: 9,003,487,024
RAC: 16,568,484
Level
Tyr
Message 55136 - Posted: 31 Jul 2020 | 15:43:41 UTC - in response to Message 55134.

maybe tasks related to Corona?

🤞️🤞️

Jim1348
Joined: 28 Jul 12
Posts: 819
Credit: 1,591,285,971
RAC: 0
Level
His
Message 55137 - Posted: 31 Jul 2020 | 15:55:38 UTC - in response to Message 55136.

There are so many anti-virals and vaccines under testing now, and probably available in a few months, that any additional computer project would get lost in the noise. I think they are better off focusing on more general subject matter, where they can make a difference in the longer term. The databases or whatever it is that they are establishing look worthwhile to me.

Profile ServicEnginIC
Joined: 24 Sep 10
Posts: 581
Credit: 9,003,487,024
RAC: 16,568,484
Level
Tyr
Message 55138 - Posted: 31 Jul 2020 | 16:10:55 UTC - in response to Message 55137.

About two days of tasks left to think about it...

⏳️

Profile robertmiles
Joined: 16 Apr 09
Posts: 503
Credit: 748,770,933
RAC: 127,823
Level
Lys
Message 55139 - Posted: 31 Jul 2020 | 19:25:48 UTC - in response to Message 55137.
Last modified: 31 Jul 2020 | 19:32:18 UTC

There are so many anti-virals and vaccines under testing now, and probably available in a few months, that any additional computer project would get lost in the noise. I think they are better off focusing on more general subject matter, where they can make a difference in the longer term. The databases or whatever it is that they are establishing look worthwhile to me.

True for CPU workunits, but GPU workunits related to medical research are hard to find elsewhere. There appear to be none using BOINC. Folding@home and Quarantine@home have some non-BOINC GPU workunits, though. The ones at Quarantine@home are for Linux only.

The OpenPandemics subproject at World Community Grid is looking into whether the right software is available to allow GPU workunits for them. If so, they may start while they are still working on COVID-19.

Jim1348
Joined: 28 Jul 12
Posts: 819
Credit: 1,591,285,971
RAC: 0
Level
His
Message 55140 - Posted: 31 Jul 2020 | 19:46:51 UTC - in response to Message 55139.
Last modified: 31 Jul 2020 | 19:47:40 UTC

All true, but that is not the point. It is not what GPU work the projects can provide, but what we can do for the projects, and hence the science. Even if you handed them a new recipe to test now (there seem to be hundreds waiting in the wings to be tested), they could not do it for several more months. By then at least a first round of anti-virals will be ready, with the vaccines not far behind.

As you know, there has been a huge effort by labs all over the world. Just because it is a GPU project does not guarantee any special status for the results. This video, posted by Jim Slade on WCG, shows the effort underway in the U.S. to test them all, and figure out how to produce them in time.
I don't think we can do more here.
https://www.youtube.com/watch?v=uEgfwVy9w8A&feature=youtu.be

Profile ServicEnginIC
Joined: 24 Sep 10
Posts: 581
Credit: 9,003,487,024
RAC: 16,568,484
Level
Tyr
Message 55143 - Posted: 1 Aug 2020 | 20:35:59 UTC

If past behavior repeats, unsent tasks will start decreasing more quickly until they are exhausted.

I love GPUGrid. It is a box of surprises...
Suddenly, the behavior has taken a quantum leap, and it differs somewhat from previous occasions:
Unsent tasks started to decrease more slowly today, coinciding with the arrival of new tasks such as:
e1s241_17gen-PABLO_UCB_NMR_KIX_CMYB_8-0-5-RND1858_1
e1s321_villin_100ns_6-ADRIA_VillinAdaptive100ns-0-1-RND7450_0
1b5fC00_320_4-TONI_MDADpr4sb-0-10-RND9302_2
1ac5A00_320_0-TONI_MDADex7sa-0-50-RND9726_2
...
To be continued... (?)

Profile robertmiles
Joined: 16 Apr 09
Posts: 503
Credit: 748,770,933
RAC: 127,823
Level
Lys
Message 55145 - Posted: 2 Aug 2020 | 1:08:25 UTC - in response to Message 55143.

If past behavior repeats, unsent tasks will start decreasing more quickly until they are exhausted.

I love GPUGrid. It is a box of surprises...
Suddenly, the behavior has taken a quantum leap, and it differs somewhat from previous occasions:
Unsent tasks started to decrease more slowly today, coinciding with the arrival of new tasks such as:
e1s241_17gen-PABLO_UCB_NMR_KIX_CMYB_8-0-5-RND1858_1
e1s321_villin_100ns_6-ADRIA_VillinAdaptive100ns-0-1-RND7450_0
1b5fC00_320_4-TONI_MDADpr4sb-0-10-RND9302_2
1ac5A00_320_0-TONI_MDADex7sa-0-50-RND9726_2
...
To be continued... (?)

This suggests that the "new" tasks are really old tasks that required manual fixes to the input files before they could be sent again, and those manual fixes are getting harder and harder.

Profile ServicEnginIC
Joined: 24 Sep 10
Posts: 581
Credit: 9,003,487,024
RAC: 16,568,484
Level
Tyr
Message 55147 - Posted: 2 Aug 2020 | 10:01:48 UTC
Last modified: 2 Aug 2020 | 10:05:39 UTC

If past behavior repeats, unsent tasks will start decreasing more quickly until they are exhausted.
About two days of tasks left to think about it...
Suddenly, the behavior has taken a quantum leap, and it differs somewhat from previous occasions:

Due to this, my estimate of the time left until the unsent tasks run out has changed from a couple of days to more than a dozen.
Thank you very much to all those goblins in the shadows...

Profile ServicEnginIC
Joined: 24 Sep 10
Posts: 581
Credit: 9,003,487,024
RAC: 16,568,484
Level
Tyr
Message 56155 - Posted: 27 Dec 2020 | 12:51:22 UTC

At the present rate, I roughly estimate that the current batch of available TONI_MDAD WUs will start to decrease at a significantly higher rate around next Sunday (January 3rd 2021).
After that, they will run out in about one more week.
If nothing changes in between, we faithful crunchers and the project team will be achieving another milestone at GPUGrid...
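An estimate like this amounts to a linear extrapolation from two readings of the unsent-task count (for example, taken from the project's server status page on different days). Below is a minimal Python sketch. The function name and the readings in the example are made up for illustration and are not actual GPUGrid figures. The sketch assumes the daily decrease stays constant, and as this thread shows, that assumption breaks whenever a batch runs out or new work is added.

```python
import math
from datetime import date, timedelta
from typing import Optional

def estimate_exhaustion(d0: date, unsent0: int, d1: date, unsent1: int) -> Optional[date]:
    """Extrapolate the day the unsent count reaches zero from two readings.

    Assumes a constant daily decrease between and after the two readings.
    Returns None if the count is not decreasing.
    """
    days = (d1 - d0).days
    if days <= 0:
        raise ValueError("second reading must be on a later day than the first")
    rate = (unsent0 - unsent1) / days  # tasks drained per day
    if rate <= 0:
        return None  # flat or growing queue: no exhaustion date
    # Round up: the queue is only empty at the end of the last partial day.
    return d1 + timedelta(days=math.ceil(unsent1 / rate))

if __name__ == "__main__":
    # Hypothetical readings, one week apart
    print(estimate_exhaustion(date(2020, 12, 20), 12_000, date(2020, 12, 27), 5_000))
    # prints 2021-01-01
```

With only two readings the estimate is very sensitive to when they are taken. Re-running it with fresh readings, as was done in this thread, is what catches changes in the drain rate.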

rod4x4
Joined: 4 Aug 14
Posts: 266
Credit: 2,219,935,054
RAC: 0
Level
Phe
Message 56220 - Posted: 1 Jan 2021 | 10:03:27 UTC - in response to Message 56155.

At the present rate, I roughly estimate that the current batch of available TONI_MDAD WUs will start to decrease at a significantly higher rate around next Sunday (January 3rd 2021).
After that, they will run out in about one more week.
If nothing changes in between, we faithful crunchers and the project team will be achieving another milestone at GPUGrid...

At current rate, your prediction from last year (27th dec) is looking pretty good!

Richard Haselgrove
Joined: 11 Jul 09
Posts: 1617
Credit: 8,176,644,351
RAC: 16,733,896
Level
Tyr
Message 56229 - Posted: 2 Jan 2021 | 16:30:00 UTC

All gone. We're down to the resends.

Profile ServicEnginIC
Joined: 24 Sep 10
Posts: 581
Credit: 9,003,487,024
RAC: 16,568,484
Level
Tyr
Message 56232 - Posted: 2 Jan 2021 | 18:46:30 UTC

All gone. We're down to the resends.

My backup GPU projects are waking up from their lethargy.
In the meantime, I have a remote chance of discovering a new prime number at PrimeGrid, or a new pulsar at Einstein@Home...
⏳️

RJ The Bike Guy
Joined: 2 Apr 20
Posts: 20
Credit: 35,363,533
RAC: 0
Level
Val
Message 56233 - Posted: 3 Jan 2021 | 4:14:45 UTC

No more GPUGRID work? For how long?

Ian&Steve C.
Joined: 21 Feb 20
Posts: 1065
Credit: 40,231,533,983
RAC: 22,690
Level
Trp
Message 56234 - Posted: 3 Jan 2021 | 5:14:52 UTC - in response to Message 56233.

No telling. Could be a couple days. Could be a couple weeks. Could be months.

Erich56
Joined: 1 Jan 15
Posts: 1124
Credit: 9,058,545,176
RAC: 27,459,788
Level
Tyr
Message 56235 - Posted: 3 Jan 2021 | 6:59:45 UTC - in response to Message 56234.

No telling. Could be a couple days. Could be a couple weeks. Could be months.

I don't understand why we are not given at least a rough idea of how long it could take :-(

zombie67 [MM]
Joined: 16 Jul 07
Posts: 209
Credit: 3,058,436,456
RAC: 2,221,612
Level
Arg
Message 56236 - Posted: 3 Jan 2021 | 7:10:29 UTC
Last modified: 3 Jan 2021 | 7:10:59 UTC

https://www.dictionary.com/e/slang/take-for-granted/

WHAT DOES TAKE FOR GRANTED MEAN?

The expression to take for granted means "to accept without question or objection," and often implies a lack of appreciation or gratitude. (E.g., "Many of us may take for granted the fact that we have access to clean drinking water.")

When it comes to people, to take someone for granted means to take advantage of, show no appreciation for, or undervalue them.

____________
Reno, NV
Team: SETI.USA
