1) Message boards : GPUGRID CAFE : The milestone thread (Message 46785)
Posted 47 minutes ago by Profile Retvari Zoltan
Zoltan, what was your RAC back in the day, say, 2012?
In 2012 my RAC started at 1.6M, and its maximum was 3.5M.
From May 2010 (GTX 480) to December 2010 I earned ~40M credits; now that takes 4 days :).

How many GPUs did you have at the time and would you say they were high end?
I had two GTX 590s, one GTX 580 and 3 (maybe 4) GTX 480s; then I sold them and bought two GTX 670s and two GTX 680s (later a 3rd one and a GTX 690). I've always had high-end GPUs.

From your post it sounds like they are not scaling the RAC; rather, it is purely FLOPS-based.
The RAC depends on many factors (a rough numeric sketch follows the list):
1. the CUDA version used by the client (there was a big jump (+40%) when the CUDA 4.2 client was released)
2. the credit/sec ratio is different for different batches
3. the number of GPUs
4. the computing speed of the GPUs
5. the availability / reliability of workunits
6. the availability / reliability of the hosts
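For illustration only, here is a minimal back-of-the-envelope model (my own sketch, not the project's credit formula) that combines these factors into a rough daily credit estimate; the figures and the helper name estimate_rac are invented for the example.

# Rough model of RAC combining the factors above; all numbers are illustrative.
def estimate_rac(num_gpus, credits_per_gpu_second, wu_availability, host_uptime):
    """Very rough steady-state estimate of credits earned per day."""
    seconds_per_day = 86400
    return num_gpus * credits_per_gpu_second * seconds_per_day * wu_availability * host_uptime

# Example: 4 GPUs at ~40 credits/s each, 95% workunit availability, 98% host uptime
print(f"~{estimate_rac(4, 40, 0.95, 0.98):,.0f} credits/day")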
2) Message boards : GPUGRID CAFE : The milestone thread (Message 46770)
Posted 1 day ago by Profile Retvari Zoltan
Just wondering, since computational power increases exponentially, I'm guessing the RAC for this project scales with the FLOPS at the time. I assume this because crunching back in 2010 would yield nearly no credit compared to today, if that were the case.

See my retrospective post about it.
My last billion took 115 days, while my first billion took 28 months.
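A quick bit of arithmetic (a sketch, assuming ~30.4 days per month) shows the speed-up those two figures imply:

# Average daily credit rate for the first and the last billion credits.
first_billion_days = 28 * 30.4          # 28 months, assuming ~30.4 days/month
last_billion_days = 115

first_rate = 1e9 / first_billion_days   # ~1.2M credits/day
last_rate = 1e9 / last_billion_days     # ~8.7M credits/day
print(f"speed-up: ~{last_rate / first_rate:.1f}x")   # ~7.4x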
3) Message boards : Number crunching : What are these "SWAN" errors on my 8-12hr tasks? (Message 46769)
Posted 1 day ago by Profile Retvari Zoltan
I would try:
1. open an elevated command prompt (right click on start)
2. type chkdsk /f /x <enter>
3. don't touch the keyboard until your PC finishes checking C:
If it does not help:
1. download Display Driver Uninstaller (DDU)
2. download the latest nVidia driver
3. disconnect (disable) the network
4. exit BOINC, choosing to stop the scientific applications as well
5. remove display drivers with DDU
6. restart PC
7. install the latest nVidia driver
8. restart PC
9. try GPUGrid again
If it still does not help, you should try resetting the GPUGrid project within BOINC manager (a command-line sketch follows).
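If you prefer the command line over the BOINC Manager GUI, a minimal Python sketch using the standard boinccmd tool could look like the following; the install path is an assumption for a default Windows setup, and note that a project reset re-downloads all of its files.

# Sketch: pause GPU work before the driver swap, then reset GPUGRID via boinccmd.
# The boinccmd path below is an assumed default; adjust it for your installation.
import subprocess

BOINCCMD = r"C:\Program Files\BOINC\boinccmd.exe"   # assumed install location
PROJECT_URL = "http://www.gpugrid.net/"

# Stop GPU computing (e.g. before removing the display driver with DDU); 0 = no time limit.
subprocess.run([BOINCCMD, "--set_gpu_mode", "never", "0"], check=True)

# After the driver reinstall, reset the project (this re-downloads its files).
subprocess.run([BOINCCMD, "--project", PROJECT_URL, "reset"], check=True)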
4) Message boards : Number crunching : BAD PABLO_p53 WUs (Message 46761)
Posted 2 days ago by Profile Retvari Zoltan
Has the problem been fixed?
Yes.
There could still be some faulty workunits in the long queue, but those do not threaten the daily quota.
5) Message boards : Number crunching : BAD PABLO_p53 WUs (Message 46757)
Posted 2 days ago by Profile Retvari Zoltan
I've just picked up a 4th replication from workunit e34s5_e17s62p0f449-PABLO_p53_mut_7_DIS-0-1-RND8386. From the PABLO_p53 and the _4 at the end of the task name, I assumed the worst - but it's running just fine. Don't assume that every failure - even multiple failures - comes from a faulty workunit.
If there's the
ERROR: file mdioload.cpp line 81: Unable to read bincoordfile
message in many of the previous tasks' stderr.txt output files, then it's a faulty workunit. A quick way to check is sketched below.
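A minimal sketch for that check, assuming you have copied the stderr outputs of the previous tasks (from the workunit's web page) into .txt files in a local folder; the folder name stderr_dumps is hypothetical.

# Count how many saved stderr outputs contain the "bincoordfile" error,
# to tell a faulty workunit apart from host-specific failures.
from pathlib import Path

MARKER = "Unable to read bincoordfile"
dump_dir = Path("stderr_dumps")   # hypothetical folder with the saved outputs

files = list(dump_dir.glob("*.txt"))
hits = sum(1 for path in files if MARKER in path.read_text(errors="ignore"))
print(f"{hits} of {len(files)} outputs contain the bincoordfile error")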
The one you've received has failed 4 times, for 3 different reasons (but none of them is the one above):

1st & 3rd:
<message>process exited with code 201 (0xc9, -55)</message>
<stderr_txt># Unable to initialise. Check permissions on /dev/nvidia* (err=100)</stderr_txt>

2nd (that's the most mysterious one):
<message>process exited with code 212 (0xd4, -44)</message>
<stderr_txt></stderr_txt>

4th:
<message>(unknown error) - exit code -80 (0xffffffb0)</message>
<stderr_txt>... # Access violation : progress made, try to restart called boinc_finish</stderr_txt>

BTW, things are now almost back to normal; some faulty workunits are still floating around.
6) Message boards : Number crunching : BAD PABLO_p53 WUs (Message 46750)
Posted 3 days ago by Profile Retvari Zoltan
When this is all over there should be a publication badge for participation in faulty Pablo WUs ;-)
Indeed. This should be a special one, with a special design. I'm thinking of a crashed bug. :)
7) Message boards : Number crunching : BAD PABLO_p53 WUs (Message 46748)
Posted 3 days ago by Profile Retvari Zoltan
... so the daily quota will be recovered in a couple of days.

still it's a shame that there is no other mechanism in place for cases like the present one :-(
You can't prepare a system for every abnormal situation. BTW, you'll receive workunits even while your daily quota is below its maximum. The only important factor is that a host should not receive many faulty workunits in a row, because that will "blacklist" the host for a day. This is a pretty good automatism for minimizing the effects of a faulty host, as such a host would exhaust the queues in a very short time if nothing limited the work assigned to it. Too bad that this generic error, combined with this self-defense, got all of our hosts blacklisted, and there's no defense against this self-defense. I've realized that we, the volunteers, are the "device" that could keep this project running in such regrettable situations.
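To make the quota "blacklisting" effect concrete, here is a toy simulation of a host's daily quota. The update rule (halve on an errored task, add one on a valid task, capped at a maximum) is my own simplification of the BOINC scheduler's behaviour, not its exact code, and the maximum of 50 is an assumed value.

# Toy model of a host's daily task quota under a burst of faulty workunits.
# Update rule is a simplification: halve on error, +1 on success, capped at a max.
MAX_QUOTA = 50   # assumed per-host maximum

def update_quota(quota, task_ok):
    if task_ok:
        return min(MAX_QUOTA, quota + 1)
    return max(1, quota // 2)

quota = MAX_QUOTA
for _ in range(10):                      # a run of faulty workunits...
    quota = update_quota(quota, task_ok=False)
print("after 10 errors:", quota)         # ...drives the quota down to 1 task/day

for day in range(1, 6):                  # and it only creeps back up slowly
    quota = update_quota(quota, task_ok=True)
    print(f"after {day} good task(s): {quota}")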
8) Message boards : Number crunching : BAD PABLO_p53 WUs (Message 46747)
Posted 3 days ago by Profile Retvari Zoltan
You are likely to be suffering from a quota of one long task per day: if you allow short tasks in your preferences, it is possible (but rare) to get short tasks allocated

that's what BOINC is showing me:

22/03/2017 13:12:42 | GPUGRID | No tasks are available for Short runs (2-3 hours on fastest card)
22/03/2017 13:12:42 | GPUGRID | No tasks are available for Long runs (8-12 hours on fastest card)
22/03/2017 13:12:42 | GPUGRID | This computer has finished a daily quota of 1 tasks

So I doubt that I could get short runs.
(Your assumption is correct: I should be suffering from a long-runs quota only, since no short runs were selected when the "accident" happened.)
The short queue is empty, and the scheduler won't send you work from the long queue because of the host's decreased daily quota. You should wait a couple of hours.
9) Message boards : Number crunching : BAD PABLO_p53 WUs (Message 46745)
Posted 3 days ago by Profile Retvari Zoltan
To my surprise, the faulty / working ratio is much better than I expected.
I did a test with my dummy host again, and only 18 of 48 workunits were faulty.
I've received some of the new (working) workunits on my live hosts too, so the daily quota will recover in a couple of days.
10) Message boards : Number crunching : BAD PABLO_p53 WUs (Message 46734)
Posted 3 days ago by Profile Retvari Zoltan
Well these broken tasks will have to run their course.
That will be a long and frustrating process, as every host can get only one workunit per day, but right now 9 out of 10 workunits are broken (so the hosts' daily quotas won't rise for a while), and every workunit has to fail 7 times before it's cleared from the queue.
To speed this up, I've created dummy hosts with my inactive host, and I've "killed" about 100 of these broken workunits. I had to abort some working workunits, but those are the minority right now.
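For a sense of scale, here is the rough arithmetic behind "long and frustrating". The 1,000 broken-workunit figure is an assumption for illustration; the other numbers (7 errors to clear a workunit, 1 task per host per day) come from the posts above.

# Rough estimate of the host-days needed to flush a batch of broken workunits.
broken_workunits = 1000        # assumed batch size, for illustration only
errors_to_clear = 7            # a workunit is cancelled after 7 failed replications
tasks_per_host_per_day = 1     # reduced daily quota of a "blacklisted" host

failed_tasks_needed = broken_workunits * errors_to_clear
host_days = failed_tasks_needed / tasks_per_host_per_day
print(f"~{failed_tasks_needed} failed tasks, i.e. ~{host_days:.0f} host-days of quota")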

