
Message boards : Number crunching : Processing path of GPUGrid data

Retvari Zoltan
Joined: 20 Jan 09
Posts: 2343
Credit: 16,201,255,749
RAC: 0
Message 54657 - Posted: 11 May 2020 | 10:46:52 UTC

Richard Haselgrove wrote:
What we don't know - at least, I certainly don't know, and I've not seen it described here, ever - is what exactly the processing path of that data is after our raw results are returned to the server.

We do know that each of our tasks forms part of a sequential sequence of (currently) 10 tasks making up the entire job, and that at least some of our returned data is used to assemble the starting data for the next task in the sequence.
The returned data is the starting data of the next task.
The next task's starting data may be smaller than the returned result, but its content is the same.
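
To illustrate the chaining described above, here is a minimal sketch in Python. It is not GPUGrid code: `run_task`, `run_job`, and the `state` dictionary are hypothetical names, and the step count per task is an arbitrary assumption. It only shows the idea that each returned result is used as the starting data of the next task in a 10-task sequence.

```python
# Hypothetical sketch of a job split into a chain of sequential tasks.
# run_task, run_job and the "state" dict are illustrative assumptions,
# not the actual GPUGrid application or server logic.

def run_task(start_state, steps_per_task=1000):
    """Pretend to advance a simulation and return the final state as the result."""
    return {"step": start_state["step"] + steps_per_task}

def run_job(n_tasks=10):
    state = {"step": 0}           # initial input prepared by the project
    history = []
    for _ in range(n_tasks):
        result = run_task(state)  # result returned by a volunteer host
        history.append(result)
        state = result            # the result becomes the next task's starting data
    return history

history = run_job()
print(len(history), history[-1]["step"])
```

Whether the server keeps every entry of something like `history`, or only the latest state, is exactly the open question in this thread.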

Richard Haselgrove wrote:
Is it all used in that way? Once it's been used, does it need to be kept? If so, how long? Can it (any of it) be discarded once the next task in sequence has been created? Has been completed? Once the whole 10-task job has been completed?
This should be answered by the staff.
My guess is that the useful part of the data should be kept forever.
The question is what percentage of the returned data is useful.

Richard Haselgrove wrote:
People in other threads have mentioned SETI as a comparison. There, the process is that the scientific data returned by each task is assimilated into a gigantic, 20-year, scientific database. And that once assimilation has taken place, our raw, returned, data is erased (usually within 24 hours).

The difference in data path between SETI and GPUGrid is that there's no need for validating (at least by comparing) results from different hosts in GPUGrid, so there's no need for keeping the results until they get validated by another host.
The SETI app transforms the raw data into an analyzable form, and that transformation should produce the same output regardless of the device it runs on, so it can be (and therefore should be) validated by sending the same workunit to a different device. The GPUGrid app, by contrast, creates the data to be analyzed, and there's a lot of randomization going on in that process, so two outputs of the same workunit will be different (the result can't be validated by a simple comparison with the output of the same workunit on a different device).
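
A toy Python sketch of that difference, under stated assumptions: `deterministic_transform` stands in for a SETI-like analysis step and `stochastic_simulation` for a GPUGrid-like run. Both functions are made up for illustration, and modelling the randomization as a per-host seed is my assumption, not a description of the real apps.

```python
import random

def deterministic_transform(raw):
    # SETI-like (assumed): the same input gives the same output on any host.
    return [x * 2 for x in raw]

def stochastic_simulation(start, seed, steps=5):
    # GPUGrid-like (assumed): each run draws its own random numbers,
    # so the trajectory depends on the host's random stream.
    rng = random.Random(seed)
    x = start
    for _ in range(steps):
        x += rng.gauss(0.0, 1.0)
    return x

raw = [1, 2, 3]
host_a = deterministic_transform(raw)
host_b = deterministic_transform(raw)
same_det = host_a == host_b   # comparison works as a validity check

sim_a = stochastic_simulation(0.0, seed=1)
sim_b = stochastic_simulation(0.0, seed=2)
same_sim = sim_a == sim_b     # both runs can be valid and still differ

print(same_det, same_sim)
```

In the first case a mismatch between hosts signals an error; in the second a mismatch is expected, so a second copy of the result gives nothing to compare against.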

Richard Haselgrove wrote:
If we knew for certain that our returned data needed to be retained in quick-access online storage, say until the final paper had been accepted for publication following peer review, then I'd be prepared to contribute to a fundraising drive for additional disk spindles and a chassis to mount them in. But if the daily data is simply transferred over a slow link to an offsite backing store, then spindles aren't the answer: more drives would simply delay the need for an outage from a 5 day to a 10 day interval, and then extend that outage when it eventually arrived.
