11) Message boards : News : ACEMD 4 (Message 58758)
Posted 700 days ago by Profile Retvari Zoltan
The ACEMD4 app puts less stress on the GPU than the ACEMD3 app.
ACEMD3 on RTX 3080Ti: 1845 MHz, 342 W
ACEMD4 on RTX 3080Ti: 1920 MHz, 306 W
I made similar observations on an RTX 2080Ti, though I haven't recorded the exact numbers yet.
12) Message boards : News : ACEMD 4 (Message 58756)
Posted 700 days ago by Profile Retvari Zoltan
The ACEMD3 could restart from the checkpoint, so it will finish eventually.
It has failed actually. :(
It was suspended a couple of times, so I set "no new task" to make it finish within 24 hours, but it didn't help.
13) Message boards : News : ACEMD 4 (Message 58748)
Posted 701 days ago by Profile Retvari Zoltan
I don't think much chance at all. We've blown through all those 1000 tasks I think.
Not yet.
The task which preempted the ACEMD3 task is:
P0_NNPMM_frag_85-RAIMIS_NNPMM-5-10-RND6112_0
The blue number (10) is the total number of tasks in the given sequence.
The red number (5) is the number of the task in the given sequence (starting from 0, so the last one will be 9-10).
The green number (the trailing 0) is the number of resends.
So those 1000 tasks are actually 100 task sequences, each broken into 10 pieces.
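To illustrate the naming scheme, here is a minimal Python sketch that splits such a task name into its parts. The regular expression and the `parse_task_name` helper are hypothetical: they are inferred from the single example name above, not from any GPUGrid documentation, so other batches may not match.

```python
import re

# Hypothetical pattern, inferred only from the one example name in this post:
#   <batch>-<index>-<total>-RND<digits>_<resends>
TASK_NAME = re.compile(
    r"^(?P<batch>.+)-(?P<index>\d+)-(?P<total>\d+)-(?P<rnd>RND\d+)_(?P<resends>\d+)$"
)

def parse_task_name(name):
    """Split a task name into batch, position in sequence, and resend count."""
    m = TASK_NAME.match(name)
    if m is None:
        raise ValueError(f"unrecognised task name: {name!r}")
    index = int(m.group("index"))   # position in the sequence, counted from 0
    total = int(m.group("total"))   # number of tasks in the sequence
    return {
        "batch": m.group("batch"),
        "index": index,
        "total": total,
        "rnd": m.group("rnd"),
        "resends": int(m.group("resends")),
        "is_last": index == total - 1,  # the last piece is (total - 1)-total
    }

info = parse_task_name("P0_NNPMM_frag_85-RAIMIS_NNPMM-5-10-RND6112_0")
print(info["index"], info["total"], info["resends"], info["is_last"])
```

For the task above this reports piece 5 of 10 with no resends, so four more pieces of that sequence (6 through 9) are still to come.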
14) Message boards : News : ACEMD 4 (Message 58746)
Posted 701 days ago by Profile Retvari Zoltan
The host from my previous post has received an ACEMD3 task. It had reached 16.3% when the host received an ACEMD4 task, which took over, as the latter has a much shorter deadline. The ACEMD3 could restart from the checkpoint, so it will finish eventually. I wonder how many times the ACEMD3 task will be suspended, and how many days will pass until it's completed.
15) Message boards : News : ACEMD 4 (Message 58745)
Posted 701 days ago by Profile Retvari Zoltan
Take a look at the tasks on my host.
It's very easy to spot the one which was restarted without the checkpoint.
16) Message boards : News : ACEMD 4 (Message 58743)
Posted 701 days ago by Profile Retvari Zoltan
The BOINC manager UI shows 42 minutes left, while the work fetch debug shows 2811 minutes:
[work_fetch] --- state for NVIDIA GPU ---
[work_fetch] shortfall 0.00 nidle 0.00 saturated 167688.92 busy 167688.92
That's odd, because fraction_done_exact isn't set in app_config.xml.
17) Message boards : News : ACEMD 4 (Message 58736)
Posted 702 days ago by Profile Retvari Zoltan
Checkpointing is still not active.
I confirm that.
18) Message boards : Server and website : Failed to add project (Message 58732)
Posted 702 days ago by Profile Retvari Zoltan
Update your BOINC manager to the latest version (it's 7.16.20 at the moment).
https://boinc.berkeley.edu/download.php
19) Message boards : Server and website : Undying task (Message 58526)
Posted 741 days ago by Profile Retvari Zoltan
How about this one?
20) Message boards : News : Experimental Python tasks (beta) - task description (Message 58513)
Posted 745 days ago by Profile Retvari Zoltan
Would it be better to create a new app for real jobs once the testing is finished?
Based on the last few days' discussion here, I've understood the purpose of the former short and long queue from GPUGrid's perspective:
By separating the tasks into two queues based on their length, the project's staff didn't have to bother setting the rsc_fpops_est value for each and every batch (note that the same app was assigned to each queue). The two queues used different (but constant across batches) rsc_fpops_est values, so BOINC's runtime estimation could not drift so far off in either queue that it would trigger the "won't finish on time" or the "run time exceeded" situation.
Perhaps this practise should be put in operation again, even on a finer level of granularity (S, M, L tasks, or even XS and XL tasks).
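To show why the queue split helps, here is a rough Python sketch. It assumes a simplified model in which the initial runtime estimate is just rsc_fpops_est divided by the device's speed; the real BOINC client applies further corrections. All numbers and function names are made up for illustration and are not GPUGrid's actual values.

```python
# Simplified model (an assumption): estimate = rsc_fpops_est / device speed.

def estimated_runtime_s(rsc_fpops_est, device_flops):
    """Estimated runtime in seconds under the simplified model."""
    return rsc_fpops_est / device_flops

def estimate_ratio(rsc_fpops_est, true_fpops):
    """Estimated work divided by real work; 1.0 means a perfect estimate."""
    return rsc_fpops_est / true_fpops

# Hypothetical workloads: a short batch and a long batch.
SHORT_FPOPS = 1_000_000_000_000_000    # 1e15
LONG_FPOPS = 10_000_000_000_000_000    # 1e16

# One queue, one constant estimate for everything (hypothetical 5e15).
single = 5_000_000_000_000_000
print("single queue:",
      estimate_ratio(single, SHORT_FPOPS),   # short tasks overestimated
      estimate_ratio(single, LONG_FPOPS))    # long tasks underestimated

# Two queues, each with its own constant estimate close to its workload.
print("two queues:",
      estimate_ratio(SHORT_FPOPS, SHORT_FPOPS),
      estimate_ratio(LONG_FPOPS, LONG_FPOPS))
```

In this toy case the single queue is off by a factor of 5 for short tasks and a factor of 2 for long ones, while each of the two queues stays close to 1. A finer split (S, M, L and so on) would narrow the spread within each queue further.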

