Message boards : Graphics cards (GPUs) : Credit where Credit is due ...

Paul D. Buck
Joined: 9 Jun 08
Posts: 1050
Credit: 37,321,185
RAC: 0
Message 16619 - Posted: 29 Apr 2010 | 5:07:28 UTC

Here goes Paul, tilting at windmills again ...

This Task Set (or whatever terminology UCB is using this week) had a 100% failure rate for all who attempted it ... now, one of my favorite projects pays even for failures because the failures are equally interesting to the project ...

I don't know how often this happens, but to me, this is a clear case where the project should also be paying (in my opinion) because we made the good faith effort to produce a result for a flawed task ... we can debate the pay rate ... but, my point is, it was not the fault of the people who did the work that you asked them to do something that was not possible ...

I know, I know, cobblestones are worthless ... well if they are... what is the beef with paying them?

Toni
Volunteer moderator
Project administrator
Project developer
Project tester
Project scientist
Joined: 9 Dec 08
Posts: 1006
Credit: 5,068,599
RAC: 0
Message 16620 - Posted: 29 Apr 2010 | 7:53:05 UTC - in response to Message 16619.
Last modified: 29 Apr 2010 | 7:53:27 UTC

Sorry Paul - is the "other" project which is paying for failures BOINC-based?

MarkJ
Volunteer moderator
Volunteer tester
Joined: 24 Dec 08
Posts: 738
Credit: 200,909,904
RAC: 0
Message 16624 - Posted: 29 Apr 2010 | 12:23:37 UTC - in response to Message 16620.

Sorry Paul - is the "other" project which is paying for failures BOINC-based?


Well, CPDN does and it's BOINC-based, but it depends on trickles to get there. Given GPUgrid doesn't use trickles, that might present an issue (or an opportunity - you could have larger WUs using trickles).
____________
BOINC blog

Paul D. Buck
Message 16626 - Posted: 29 Apr 2010 | 14:59:09 UTC - in response to Message 16624.

Sorry Paul - is the "other" project which is paying for failures BOINC-based?


Well, CPDN does and it's BOINC-based, but it depends on trickles to get there. Given GPUgrid doesn't use trickles, that might present an issue (or an opportunity - you could have larger WUs using trickles).

As MarkJ noted one of the projects is CPDN...

I was also thinking of WCG, where the new sub-project DDDT2 has some molecules that "blow up" when modeled, and the application reports a failure. They just went from a 5- to a 3-failure test (3 replications vice 5), and they pay because the failure of the application to solve essentially demonstrates that that line of research is a dead end ... in this case they do a post-award after all the failed results are in ...

When they have a suite of "successful" tasks they create the next generation of tasks which is larger and builds on the ones that "worked"

Back to CPDN, their tasks run for hundreds of hours of course which is far more than GPU Grid and WCG, but the principle is the same ... if the task has a likelihood of failure, do not punish the willing for a failure of the project ...

ExtraTerrestrial Apes
Volunteer moderator
Volunteer tester
Joined: 17 Aug 08
Posts: 2705
Credit: 1,311,122,549
RAC: 0
Message 16635 - Posted: 29 Apr 2010 | 18:53:20 UTC

That would mean something along the lines of
"If enough people can't run a WU, there must be something wrong project-wise, so pay them at least something"?

Sounds fair enough, if it doesn't somehow lead to cheating. I.e. program your own app which
- checks other results of a WU
- if it finds an error, returns the same error and a certain runtime
- if it doesn't find a previous error generates a probable one and a certain runtime

MrS
____________
Scanning for our furry friends since Jan 2002

Paul D. Buck
Message 16638 - Posted: 29 Apr 2010 | 19:34:27 UTC - in response to Message 16635.
Last modified: 29 Apr 2010 | 19:38:30 UTC

That would mean something along the lines of
"If enough people can't run a WU, there must be something wrong project-wise, so pay them at least something"?

Sounds fair enough, if it doesn't somehow lead to cheating. I.e. program your own app which
- checks other results of a WU
- if it finds an error, returns the same error and a certain runtime
- if it doesn't find a previous error generates a probable one and a certain runtime

MrS

I will grant that there is that small class of users that are crass enough that they would invest the effort... the problem is that instead of looking for those people we choose instead to punish (in effect) those that are sincerely trying to help, and through no fault of their own, can't ...

Now, on a more practical matter, few tasks fail like this on any of the projects. On CPDN you have to have part of the work done and return the trickles; on WCG I am not sure, but I think they have other sanity checks as well ... I am not sure whether a failed task here returns partly completed files or not ... but the point is that the person would have to get one of the rare ones, check to find it is failing, and then forge a satisfactory result ... quite a feat ... easier just to run the damn thing I would think ... Besides, he/she would have to be the last in the loop to know that it was worthwhile to attempt to forge the failure ...

Again, quite a feat of arms as it were ...

{edit-add}
BTW, I will point out that the 5-12 hour run times on a GPU are equivalent to 300-1,200 hours on the CPU, well above CPDN's run time equivalency at the current stage of their models ... so, we are in the same general vicinity of total calculations done in a model ... the GPUs just do it so much faster that we lose sight of the massive amount of work that is actually being accomplished ... as I have noted elsewhere and elsewhen, an MW task takes hours to run on a CPU core, but I am running them off in less than 2 minutes on my GPUs ...

Toni
Message 16664 - Posted: 30 Apr 2010 | 7:55:26 UTC - in response to Message 16638.
Last modified: 30 Apr 2010 | 10:45:28 UTC

Cheating is a concern, in fact, but so, IMHO, are random errors due to overclocking/driver issues. Up to now, the vast majority of the errors that we see are due to misconfigured hosts rather than WU mistakes. Incidentally, that's the reason why we can't reliably (i.e. automatically) figure out "erroneous" WUs right away.

Snow Crash
Joined: 4 Apr 09
Posts: 450
Credit: 539,316,349
RAC: 0
Message 16674 - Posted: 30 Apr 2010 | 13:14:16 UTC

How about awarding credits for failures only after a WU reaches the "too many failures" status?
____________
Thanks - Steve

Paul D. Buck
Message 16675 - Posted: 30 Apr 2010 | 14:27:48 UTC - in response to Message 16664.

Cheating is a concern, in fact, but so, IMHO, are random errors due to overclocking/driver issues. Up to now, the vast majority of the errors that we see are due to misconfigured hosts rather than WU mistakes. Incidentally, that's the reason why we can't reliably (i.e. automatically) figure out "erroneous" WUs right away.

I would think you would be able to correlate the reliability index of the participating computers as part of the process ... at any rate,

- this is a suggestion to consider...
- it does not have to be automatic ...
- it should, as I originally suggested (I thought), and as Snow Crash suggested, be after "too many failures" is reached ...
- Cheating and mis-configuration are legitimate concerns of the project ...
- tasks that are impossible to process are equally legitimate concerns of the participant ...
- we do best when cooperation is the word of the day, and everyone's concerns are considered ...
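
A rough sketch of the "pay only after too many failures" idea, purely for illustration. Nothing here is BOINC or GPUGRID code: the `Result` record, the `post_award` function, the failure threshold, the credit rate, and the runtime cap are all hypothetical names and numbers. The assumptions are that a WU qualifies only if no host returned a success and enough distinct hosts failed, and that claimed runtime is capped to limit the payoff from forged results.

```python
from dataclasses import dataclass


@dataclass
class Result:
    host_id: int          # hypothetical host identifier
    failed: bool          # True if the task errored out
    runtime_hours: float  # runtime reported by the host


def post_award(results, max_failures=5, credit_per_hour=10.0, max_hours=12.0):
    """Return {host_id: credit} for one work unit, or {} if it does not qualify.

    All thresholds and rates are illustrative assumptions, not project policy.
    """
    failed_hosts = {r.host_id for r in results if r.failed}
    any_success = any(not r.failed for r in results)

    # A single success suggests the WU itself was fine, so nothing is awarded.
    # Likewise, nothing is awarded until enough distinct hosts have failed.
    if any_success or len(failed_hosts) < max_failures:
        return {}

    awards = {}
    for r in results:
        if r.host_id in awards:
            continue  # count each host once, even if it failed the WU twice
        capped = min(r.runtime_hours, max_hours)  # cap the claimed runtime
        awards[r.host_id] = capped * credit_per_hour
    return awards
```

With five distinct hosts each failing after 2 hours, every host would get 20 credits under these made-up rates; with only four failures, or with any success on the WU, nobody gets anything. A reliability check on the hosts could be added as a further filter.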

Thanks for taking the time to think about this ...

:)

skgiven
Volunteer moderator
Volunteer tester
Joined: 23 Apr 09
Posts: 3968
Credit: 1,995,359,260
RAC: 0
Message 16678 - Posted: 30 Apr 2010 | 14:39:52 UTC - in response to Message 16674.

Perhaps it could be done for Betas; the limited Beta numbers would deter sneaky programmers from building apps to point steal.
Betas are usually the tasks that fail the most because the task/app has a problem.

The normal tasks, as said, usually fail as a result of other problems (system stability, other programs [especially games], bad configurations, OC, Boinc problems, hard restarts, summer heat, failing cards, hardware limitations [RAM or drive space], network issues...). So if you reward people for messing up their system configuration it is counterproductive - Better to encourage stability and give help in the forum.

If a normal batch of tasks starts failing on several users' systems, perhaps points could be awarded for time spent, on the basis that they should have been put through Betas!

My main concern is that this would take up too much time for the scientists. If they had to spend 2h a week awarding points, then over a year that's 100 hours they could have been spending developing faster apps, which in turn would result in more points anyway! Oh, and more science ;)

What do you want, a few more points now, or your ATI cards to work in 2 months rather than 6 months?

Paul D. Buck
Message 16687 - Posted: 30 Apr 2010 | 19:00:08 UTC - in response to Message 16678.

You posit this as an either or to the exclusion of both ...

This one instance is an example ... I could also have, and probably should have, pointed back to the many runs where the use of incorrect parameters or other problems caused the users significant issues and many failed tasks ... less common for the moment ... but who knows when that might return?

The other point is that the whole reason for issuing a task multiple times is to detect these issues ... but a task that fails on 5 different systems is not likely a task that is failing for the reasons you posit ... I mean, I cannot choose the tasks to run; the only way I can do that is to abort lots of tasks and check each one on the off-chance I can find myself as the 5th of the group that has already failed 4 times before ... but in doing that I run my total tasks per day into the toilet as well ...

Sorry, but everyone seems to be looking for even the slimmest excuse to not think about doing something like this on the off-chance that one person somewhere some time might get something they don't deserve instead of thinking of all of those that are putting in hours of compute time for no pay, and that is a far larger group ...

Or are you saying that the vast majority of participants are cheaters? Heck, even the OC crowd is fairly small ... most of the people I know don't OC, it is not all that common because it is hard to do right, easy to do wrong (which means lots of bad results) ...

skgiven
Message 16692 - Posted: 30 Apr 2010 | 20:19:50 UTC - in response to Message 16687.

I think that is a slightly limited take on what I said.

I don't like the idea of the scientists doing unnecessary work, especially if it takes away from the science, and as I explained, we could all end up getting fewer points as a result of limiting development of apps for both NVidia and ATI cards!
