Posts by Saenger

1) Message boards : News : New VIL workunits (Message 29906)
Posted 3994 days ago by

Saenger

Same here:
http://www.gpugrid.net/workunit.php?wuid=4446165

2) Message boards : News : New workunits AGGMI1 on protein aggregation (Message 22356)
Posted 4561 days ago by

Saenger

I'm sending a number of these to further a study (almost complete) on the formation of pathological protein aggregates. I placed the WUs into the "short" queue - a GTX 275 (on Linux) should take around 8.5 hours for one.

On a GT240 they take over 50h, nothing even remotely short.
OK, my card is not the fastest; I usually don't even make the 24h deadline with the normal, i.e. really short, WUs I get, but with those long-running buggers there's no chance.
Why didn't you put them in the imho proper "long run" slot?
How can I avoid those WUs?

3) Message boards : Number crunching : My first NATHAN failed (Message 22078)
Posted 4603 days ago by

Saenger

Someone else had no problems processing that WU.
http://www.gpugrid.net/workunit.php?wuid=2694264

This indicates that it is your machine that caused the issue, not the WU.

As for using more CPU than GPU ... in the first couple of seconds this will always be true as the WU needs *stuff* from the main computer which it has to talk through the CPU to get.

As for the CPU:
It's not using more CPU-time than GPU-time, but more CPU-time than real-time.

As for NATHAN's:
The second one failed: I9R20-NATHAN_FA2-4-100-RND8746_0

<core_client_version>6.10.58</core_client_version>
<![CDATA[
<message>
process exited with code 193 (0xc1, -63)
</message>
<stderr_txt>
# Using device 0
# There is 1 device supporting CUDA
# Device 0: "GeForce GT 240"
# Clock rate: 1.34 GHz
# Total amount of global memory: 536150016 bytes
# Number of multiprocessors: 12
# Number of cores: 96
SIGABRT: abort called
Stack trace (13 frames):
../../projects/www.gpugrid.net/acemd2_6.14_x86_64-pc-linux-gnu__cuda31(boinc_catch_signal+0x4d)[0x4819cd]
/lib/libc.so.6(+0x33af0)[0x7fb6dcd96af0]
/lib/libc.so.6(gsignal+0x35)[0x7fb6dcd96a75]
/lib/libc.so.6(abort+0x180)[0x7fb6dcd9a5c0]
../../projects/www.gpugrid.net/acemd2_6.14_x86_64-pc-linux-gnu__cuda31[0x48f4ab]
../../projects/www.gpugrid.net/acemd2_6.14_x86_64-pc-linux-gnu__cuda31[0x4341dc]
../../projects/www.gpugrid.net/acemd2_6.14_x86_64-pc-linux-gnu__cuda31[0x430cd6]
../../projects/www.gpugrid.net/acemd2_6.14_x86_64-pc-linux-gnu__cuda31[0x4303e7]
../../projects/www.gpugrid.net/acemd2_6.14_x86_64-pc-linux-gnu__cuda31[0x414d99]
../../projects/www.gpugrid.net/acemd2_6.14_x86_64-pc-linux-gnu__cuda31[0x407b1a]
../../projects/www.gpugrid.net/acemd2_6.14_x86_64-pc-linux-gnu__cuda31[0x4083fe]
/lib/libc.so.6(__libc_start_main+0xfd)[0x7fb6dcd81c4d]
../../projects/www.gpugrid.net/acemd2_6.14_x86_64-pc-linux-gnu__cuda31[0x407899]

Exiting...

</stderr_txt>
]]>

This time more time on the clock than used CPU ;)

If it's my computer, which crunches most other GPUgrid stuff flawlessly, what is it?
It's a Linux (ubuntu 10.04) Intel (C2D9450@3.2GHz) nVidia (GT240) with new drivers (280.13) and 8GB RAM running BOINC 6.10.58.

The other WU type that always fails is GPCR:
74-KASHIF_GPCR_14_ba1-8-100-RND2374_2
Do they have anything in common the others don't have?

4) Message boards : Number crunching : My first NATHAN failed (Message 22047)
Posted 4605 days ago by

Saenger

I1R7-NATHAN_FA2-1-100-RND1497_0
No wingman yet, hasn't been resent.

Run time 1.011565
CPU time 2.68
stderr out

<core_client_version>6.10.58</core_client_version>
<![CDATA[
<message>
process exited with code 193 (0xc1, -63)
</message>
<stderr_txt>
# Using device 0
# There is 1 device supporting CUDA
# Device 0: "GeForce GT 240"
# Clock rate: 1.34 GHz
# Total amount of global memory: 536150016 bytes
# Number of multiprocessors: 12
# Number of cores: 96
SIGABRT: abort called
Stack trace (13 frames):
../../projects/www.gpugrid.net/acemd2_6.14_x86_64-pc-linux-gnu__cuda31(boinc_catch_signal+0x4d)[0x4819cd]
/lib/libc.so.6(+0x33af0)[0x7f40a2b22af0]
/lib/libc.so.6(gsignal+0x35)[0x7f40a2b22a75]
/lib/libc.so.6(abort+0x180)[0x7f40a2b265c0]
../../projects/www.gpugrid.net/acemd2_6.14_x86_64-pc-linux-gnu__cuda31[0x48f4ab]
../../projects/www.gpugrid.net/acemd2_6.14_x86_64-pc-linux-gnu__cuda31[0x4341dc]
../../projects/www.gpugrid.net/acemd2_6.14_x86_64-pc-linux-gnu__cuda31[0x430cd6]
../../projects/www.gpugrid.net/acemd2_6.14_x86_64-pc-linux-gnu__cuda31[0x4303e7]
../../projects/www.gpugrid.net/acemd2_6.14_x86_64-pc-linux-gnu__cuda31[0x414d99]
../../projects/www.gpugrid.net/acemd2_6.14_x86_64-pc-linux-gnu__cuda31[0x407b1a]
../../projects/www.gpugrid.net/acemd2_6.14_x86_64-pc-linux-gnu__cuda31[0x4083fe]
/lib/libc.so.6(__libc_start_main+0xfd)[0x7f40a2b0dc4d]
../../projects/www.gpugrid.net/acemd2_6.14_x86_64-pc-linux-gnu__cuda31[0x407899]

Exiting...

</stderr_txt>
]]>

Is it something with my computer or a fault of the WU?

Edith asks:
How could the run time be less than the CPU-time? Multi-CPU-WU?

5) Message boards : Number crunching : too short deadline (Message 21981)
Posted 4606 days ago by

Saenger

http://www.gpugrid.net/workunit.php?wuid=2678561

Is there any chance to give me the points?

I don't think so, you missed the deadline by 20h.
Perhaps you should consider restricting that card to the short WUs.

6) Message boards : Number crunching : Duplicated work (Message 21975)
Posted 4608 days ago by

Saenger

I have just removed the duplication mechanism as announced in another thread.
It will take a few days before all duplicated workunits disappear.
Deadline for now stays at 5 days.

gdf

Thank you very much!

So now it's like this:
less than 24h (except PYRT): bonus of 100%
less than 48h: bonus of 50%
before deadline: normal credits, no redundant results
after deadline: new result sent.

(BTW: What's the 100%-time for PYRT?)
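
A minimal sketch of the rules listed above, written as a lookup from turnaround time to credit multiplier. The function name, the `full_bonus_hours` parameter (a placeholder for the PYRT exception, whose actual limit is not stated here) and the treatment of exact boundary values are assumptions for illustration, not the project's server code.

```python
from typing import Optional

# "Deadline for now stays at 5 days."
DEADLINE_HOURS = 5 * 24


def credit_multiplier(turnaround_hours: float,
                      full_bonus_hours: float = 24.0) -> Optional[float]:
    """Return the credit multiplier for a result, or None if it is late.

    full_bonus_hours is a hypothetical knob for WU types such as PYRT,
    whose 100% bonus limit is not given in this thread.
    """
    if turnaround_hours < 0:
        raise ValueError("turnaround time cannot be negative")
    if turnaround_hours < full_bonus_hours:
        return 2.0   # bonus of 100%
    if turnaround_hours < 48.0:
        return 1.5   # bonus of 50%
    if turnaround_hours < DEADLINE_HOURS:
        return 1.0   # normal credits, no redundant result
    # After the deadline a new result is sent out; what the late
    # result earns is not stated, so no multiplier is returned.
    return None


for hours in (20, 30, 100, 130):
    print(hours, credit_multiplier(hours))
```

For a GT240 that needs over 50h per WU, this puts every such result in the "normal credits" band.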

7) Message boards : Number crunching : Relation run time -> Credit (Message 21970)
Posted 4609 days ago by

Saenger

I was struck by the big differences in the ratio of credits to run time between individual WUs.

Moin Werner,

I've just asked a similar question here, and got kind of an answer.

I can't say anything about the browser problem though.

8) Message boards : Number crunching : How are credits and work done related? (Message 21964)
Posted 4610 days ago by

Saenger

Let me add also that *EGF*, *EGFR*, *PYRT*, *Fab* have a x2 of credits due to their unavoidable higher load of the CPU. In these experiments, we have to use a little iterative script that doesn't run on the GPU...
That was an old decision we took. Give extra credits for the burden of these WUs.

Hope this helps to explain the credit differences.

Not really.
To quote my starting post here:

IBUCH_1_mutEGF (10,591.10 claim): 750 c/h clock, 20,000 c/h CPU
IBUCH_PYRT (2,771.44 claim): 297 c/h clock, 5,200 c/h CPU

Not 3x, but 2.5x the credits for EGF compared to PYRT.

9) Message boards : Number crunching : How are credits and work done related? (Message 21955)
Posted 4611 days ago by

Saenger

We can try to look at those two WU locally. 3x it should be impossible.

Here we go with some examples of both extremes:
cut (with fixed credits of 5929.17476851852)
216-KASHIF_HIVPR_cut_ba1-71-100-RND5282_0: Run time: 89793.187448, CPU time: 1398.37
170-KASHIF_HIVPR_cut_ba1-70-100-RND8805_1: Run time: 89846.026353, CPU time: 1467.29

mutEGF (with fixed credits of 10591.0960648148)
p46-IBUCH_1_mutEGF_110726-15-20-RND1842_5: Run time: 50729.373009, CPU time: 1917.21

and some other ones:
GS_so (with fixed credits of 12822.1759259259)
357-KASHIF_HIVPR_GS_so_ba1-16-100-RND7458_1: Run time: 96936.183427, CPU time: 1260.3
122-KASHIF_HIVPR_GS_so_ba1-15-100-RND1587_2: Run time: 96907.045078, CPU time: 1270

PYRT (with fixed credits of 2771.44097222222)
184-IBUCH_PYRT_110728-27-50-RND8461_0: Run time: 33667.697068, CPU time: 1932.94
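
The credits-per-hour figures discussed in this thread can be reproduced from the numbers above. This is only a sketch: it takes one result per WU type from the list, assumes run time and CPU time are given in seconds, and the helper name `credits_per_hour` is made up for illustration.

```python
def credits_per_hour(credits: float, seconds: float) -> float:
    """Convert fixed credits and a duration in seconds to credits per hour."""
    return credits / seconds * 3600.0


# (WU type, fixed credits, run time [s], CPU time [s]), copied from the list above
results = [
    ("cut",    5929.17476851852,  89793.187448, 1398.37),
    ("mutEGF", 10591.0960648148,  50729.373009, 1917.21),
    ("GS_so",  12822.1759259259,  96936.183427, 1260.3),
    ("PYRT",   2771.44097222222,  33667.697068, 1932.94),
]

for name, credits, run_s, cpu_s in results:
    clock = credits_per_hour(credits, run_s)
    cpu = credits_per_hour(credits, cpu_s)
    print(f"{name:7s} {clock:7.0f} c/h clock {cpu:8.0f} c/h CPU")
```

On these inputs the clock-time values come out close to the 238 (cut) and 750 (mutEGF) c/h quoted elsewhere in this thread.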

10) Message boards : Number crunching : How are credits and work done related? (Message 21939)
Posted 4613 days ago by

Saenger

Hi,
there is a further case which adds variability to a workunit. This depends on the GPU RAM available. If you have less, the code is slower because it needs to accommodate that with a different algorithm. We can correct for that in the next application release.

It should not be 300% slower unless we have made a mistake in the input or there is some other sort of problem.

It's on the same machine, all WUs have exactly the same resources, be it RAM (8GB), Nvidia-card (GT240), nice factor, whatever. All run in exactly the same environment but generate extremely different amounts of credit per hour, the extremes are 238 (cut) vs. 750 (mutEGF).

Regarding the 3 to 5 days deadline: I agree. We are probably going to remove it. We have already moved from 2 to 3. It was created when we did not have the short and long applications, and it is very complicated to do it once we will do a server software update.

Does that mean 3 days or 5 days deadline?
