
Message boards : Number crunching : WU error

Alejandro
Joined: 30 Apr 10
Posts: 12
Credit: 62,624,416
RAC: 0
Message 21855 - Posted: 20 Aug 2011 | 1:10:11 UTC

Hi,

I am new to GPUGrid, and I started to crunch some WUs a few days ago.

1. This WU ended in error.

http://www.gpugrid.net/result.php?resultid=4259296

Do you know why?
Do I have to change anything in my configuration?

2. What is the normal load for the GPU? I get only 48% for short WUs and 65% for long ones.

Best regards,
Alejandro

Alejandro
Joined: 30 Apr 10
Posts: 12
Credit: 62,624,416
RAC: 0
Message 21858 - Posted: 20 Aug 2011 | 14:55:25 UTC - in response to Message 21855.

Another one:

http://www.gpugrid.net/result.php?resultid=4261811

<core_client_version>6.12.33</core_client_version>
<![CDATA[
<message>
- exit code 98 (0x62)
</message>
<stderr_txt>
# Using device 0
# There is 1 device supporting CUDA
# Device 0: "GeForce GTX 580"
# Clock rate: 1.56 GHz
# Total amount of global memory: 1576468480 bytes
# Number of multiprocessors: 16
# Number of cores: 128
SWAN: Using synchronization method 0
MDIO: cannot open file "restart.coor"
ERROR: get_Dvec() element 0 (b)
called boinc_finish

</stderr_txt>
]]>

I received the error at the end, so after 11 hours of running...

The long runs that I completed were without swan_sync = 0
SWAN: Using synchronization method 0


Best regards,
Alejandro

skgiven
Volunteer moderator
Project tester
Volunteer tester
Joined: 23 Apr 09
Posts: 3519
Credit: 935,094,407
RAC: 1,072,981
Message 21859 - Posted: 20 Aug 2011 | 19:06:04 UTC - in response to Message 21855.

I don't know the cause of this error: ERROR: get_Dvec() element 0 (b)
It has been seen before, this month and in July, on other high-end Fermi cards (480 and 590).
Perhaps the scientists know, or perhaps it really is just a client error (system or card related problem).

I would suggest you do a system restart, and check your system and GPU temperatures. Also check that your system is not down-clocking your GPU (NVIDIA Control Panel: prefer Maximum Performance rather than Adaptive).

Are you using any GPU intensive applications other than GPUGrid?
I guess it's possible that this is related to you running other projects, but I know nothing about what you ran before these two tasks.

In both your linked tasks you did use SWAN_SYNC:
"SWAN: Using synchronization method 0"

Are you also freeing up a CPU core? If not, there is no point using SWAN_SYNC whatsoever.

48% GPU utilization seems too low, but then I am not presently running a W7 system. You should be able to get within 15% of XP, on which I see up to 98% for Fermi tasks. That said, the GIANNI tasks only reach around 81% with SWAN_SYNC and free CPU cores. But you should still be able to get around 65 to 70% utilization.

One last thing, you could try to update your driver, but I would check everything else first.


Good luck,
Kev

Alejandro
Joined: 30 Apr 10
Posts: 12
Credit: 62,624,416
RAC: 0
Message 21872 - Posted: 22 Aug 2011 | 14:18:59 UTC - in response to Message 21859.

Hi Kev,


I would suggest you do a system restart, and check your system and GPU temperatures.

Done. GPU temperature: 65 °C

Also check that your system is not down-clocking your GPU (NVIDIA Control Panel: prefer Maximum Performance rather than Adaptive).

I changed NVIDIA Control Panel -> Manage 3D settings -> Power Management Mode from Adaptive to Prefer Maximum Performance.

Are you using any GPU intensive applications other than GPUGrid?
I guess it's possible that this is related to you running other projects, but I know nothing about what you ran before these two tasks.

No. I am only using the GPU for GPUGrid, and the CPU for GPUGrid, Einstein and Test4Theory.


In both your linked tasks you did use SWAN_SYNC:
"SWAN: Using synchronization method 0"

Are you also freeing up a CPU core? If not, there is no point using SWAN_SYNC whatsoever.


How do I free up a CPU core? With swan_sync I can see in the Task Manager that the process acemdlong_6.15_windows_intelx86__cuda31 takes 25% (a whole core); without it, the process only takes 5 to 9%.



48% GPU utilization seems too low, but then I am not presently running a W7 system. You should be able to get within 15% of XP, on which I see up to 98% for Fermi tasks. That said, the GIANNI tasks only reach around 81% with SWAN_SYNC and free CPU cores. But you should still be able to get around 65 to 70% utilization.


GIANNI tasks?



One last thing, you could try to update your driver, but I would check everything else first.



I have already updated it. This happened with driver 280.26.

On the weekend I disabled swan_sync in the middle of this task:
http://www.gpugrid.net/result.php?resultid=4263691

and I restarted the system.
The task finished without troubles.

But yesterday I got the following error for task http://www.gpugrid.net/result.php?resultid=4266224:

<core_client_version>6.12.33</core_client_version>
<![CDATA[
<message>
The system cannot find the path specified. (0x3) - exit code 3 (0x3)
</message>
<stderr_txt>
# Using device 0
# There is 1 device supporting CUDA
# Device 0: "GeForce GTX 580"
# Clock rate: 1.56 GHz
# Total amount of global memory: 1576468480 bytes
# Number of multiprocessors: 16
# Number of cores: 128
MDIO: cannot open file "restart.coor"
SWAN: FATAL : swanMemcpyDtoH failed

Assertion failed: 0, file swanlib_nv.c, line 390

This application has requested the Runtime to terminate it in an unusual way.
Please contact the application's support team for more information.

</stderr_txt>
]]>


So I am getting errors in all kinds of flavors :)

Best regards,
Alejandro


Dagorath
Joined: 16 Mar 11
Posts: 509
Credit: 179,005,236
RAC: 0
Message 21874 - Posted: 22 Aug 2011 | 22:28:07 UTC - in response to Message 21872.

I changed NVIDIA Control Panel -> Manage 3D settings -> Power Management Mode from Adaptive to Prefer Maximum Performance.


If you select Prefer Maximum performance in Linux it will revert back to Adaptive after rebooting the OS. Maybe it does that in Windows too?

How do I free up a CPU core?


Go into your BOINC preferences and adjust the setting "On multiprocessor systems, use at most __% of the processors". If you have 8 virtual cores (4 cores with HT turned on), set it to 87.5% to leave 1 virtual core free. If you have 4 cores (no HT), set it to 75% to use 3 cores for BOINC and leave 1 free.
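
For any other core count, the percentage follows from (cores - cores to free) / cores. A minimal sketch in Python; the function name `boinc_cpu_percent` is made up for illustration and is not part of BOINC:

```python
def boinc_cpu_percent(logical_cores: int, cores_to_free: int = 1) -> float:
    """Percentage to enter in the "use at most __% of the processors" field.

    logical_cores counts virtual cores, so 4 physical cores with HT count as 8.
    This helper is hypothetical; BOINC itself only takes the resulting number.
    """
    if logical_cores < 1:
        raise ValueError("logical_cores must be at least 1")
    if not 0 <= cores_to_free < logical_cores:
        raise ValueError("cores_to_free must leave at least one core for BOINC")
    return 100.0 * (logical_cores - cores_to_free) / logical_cores


if __name__ == "__main__":
    # The two cases from the post: 8 virtual cores and 4 cores without HT.
    for cores in (8, 4):
        print(cores, "cores ->", boinc_cpu_percent(cores), "%")
```

This reproduces the two figures above (87.5% for 8 virtual cores, 75% for 4 cores) and works the same way if you want to free more than one core.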

Alejandro
Joined: 30 Apr 10
Posts: 12
Credit: 62,624,416
RAC: 0
Message 21883 - Posted: 23 Aug 2011 | 20:48:14 UTC - in response to Message 21874.

Hi Dagorath, Kev,

Now it is working fine. Thanks!!!!

No more errors. I activated swan_sync again, freed up a CPU core and restarted Windows.

I still have the issue of 48% GPU load for ACEMD2: GPU molecular dynamics v6.15 tasks.

I found out that for very long tasks (4000000 steps) the GPU usage is 70%, and the usage drops as the number of steps goes down (45-48% for 500000 steps). So the time per step increases for smaller WUs.

I checked other users' tasks and saw the same behaviour (the time per step increases for smaller WUs).

So that brings me to the CPU.

It seems that the Intel(R) Core(TM)2 Quad CPU Q9550 is not fast enough for the GTX 580. I can see in the Task Manager that the task acemd2_6.15_windows_intelx86__cuda31 always uses 25% (4 cores without HT).

Which is better for GPUGrid: that I only accept long tasks (application: ACEMD for long runs) so the GPU is used at 70%, or that I accept all applications?

Best regards,
Alejandro


bigtuna
Volunteer moderator
Joined: 6 May 10
Posts: 80
Credit: 98,784,188
RAC: 0
Message 22449 - Posted: 2 Nov 2011 | 0:05:54 UTC - in response to Message 21883.

If you want more points, you should accept only long-running work units, as they give more points for the time invested. If you just want to help GPUGrid and don't care about points, I should think that accepting all tasks would be more helpful.

Also you might want to look into running Linux for better GPUGrid performance. W7 does not produce the best results. Windows XP and Linux both currently outperform W7 on GPUGrid tasks AFAIK.

You can try Linux for free without even installing it to your hard drive. All it would cost is a single blank CD or DVD and some time.

FatDog-64 is a super small, super cool Linux distro that can be downloaded in just a few minutes. It will run straight from CD/DVD or from a USB stick, or you can install it to your hard drive but this is not required.

Instructions are here:

http://www.gpugrid.net/forum_thread.php?id=2203#17646

Alejandro
Joined: 30 Apr 10
Posts: 12
Credit: 62,624,416
RAC: 0
Message 22456 - Posted: 2 Nov 2011 | 22:21:18 UTC

Hi Bigtuna,

I am not interested in collecting points. I have activated all the applications.

My plan is to move to Linux next year. Right now I am short of time, and moving to Linux will take me 2 days. I have 2 PCs: this one with Windows 7 and a Linux box, but I cannot put a better power supply in the Linux box and install the graphics card there. I made the mistake of buying it from Dell.

So I have to install Linux here and Windows 7 on the Dell.

Best regards,
Alejandro

bigtuna
Volunteer moderator
Joined: 6 May 10
Posts: 80
Credit: 98,784,188
RAC: 0
Message 22466 - Posted: 3 Nov 2011 | 21:41:05 UTC - in response to Message 22456.

I understand that you are busy and don't have the time to spare right now. When you are ready let me know and I'll be happy to help with Linux as much as I can.

If you get time check out FatDog-64. FatDog-64 is so small it will run great from a USB stick without the need to install or write anything to your hard drive at all. You can keep Windows 7 on your hard drive and still run Linux by choosing to boot from USB when you want Linux and by booting from your hard drive when you want Windows.

nenym
Joined: 31 Mar 09
Posts: 125
Credit: 214,728,517
RAC: 470,166
Message 22943 - Posted: 11 Jan 2012 | 11:00:07 UTC

What was wrong with that task?
10-KASHIF_HIVPR_cl_ba1-0-100-RND4144_0
Workunit 3028713
Created 10 Jan 2012 | 14:11:21 UTC
Sent 10 Jan 2012 | 16:32:37 UTC
Received 10 Jan 2012 | 20:48:01 UTC
Server state Over
Outcome Success
Client state Done
Exit status 0 (0x0)
Computer ID 100808
Report deadline 15 Jan 2012 | 16:32:37 UTC
Run time 11,822.64
CPU time 11,822.64
Validate state Valid
Credit 0.00
Application version ACEMD2: GPU molecular dynamics v6.14 (cuda31)
Stderr output

<core_client_version>6.12.34</core_client_version>
<![CDATA[
<stderr_txt>
# Using device 2
# There are 7 devices supporting CUDA
.
# Device 2: "GeForce GTX 580"
# Clock rate: 1.56 GHz
# Total amount of global memory: 1610285056 bytes
# Number of multiprocessors: 16
# Number of cores: 128
.

SWAN: Using synchronization method 0
MDIO: cannot open file "restart.coor"
# Time per step (avg over 1250000 steps): 9.484 ms
# Approximate elapsed time for entire WU: 11855.196 s
07:47:38 (24769): called boinc_finish
</stderr_txt>
]]>
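
As a rough cross-check of the log above, the average time per step multiplied by the step count should land close to the reported elapsed time and to the Run time in the task summary. A minimal sketch in Python, with all numbers copied from this post; the variable names are only illustrative:

```python
# Values copied from the stderr output and the task summary in this post.
steps = 1_250_000
avg_ms_per_step = 9.484
reported_elapsed_s = 11855.196
run_time_s = 11822.64

# Elapsed time implied by the (rounded) average time per step.
estimated_s = steps * avg_ms_per_step / 1000.0

print(f"estimated from average time per step: {estimated_s:.1f} s")
print(f"gap to reported elapsed time: {abs(estimated_s - reported_elapsed_s):.3f} s")
print(f"gap to Run time in the summary: {abs(estimated_s - run_time_s):.2f} s")
```

The small gap to the reported elapsed time is what you would expect from the average being rounded to three decimals, so the timing figures in the log look consistent with each other.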

Toni
Volunteer moderator
Project administrator
Project developer
Project scientist
Joined: 9 Dec 08
Posts: 523
Credit: 4,075,784
RAC: 0
Message 22944 - Posted: 11 Jan 2012 | 11:40:31 UTC - in response to Message 22943.
Last modified: 11 Jan 2012 | 12:25:12 UTC

Hi Nenym, thanks. Bug fixed.
T

nenym
Joined: 31 Mar 09
Posts: 125
Credit: 214,728,517
RAC: 470,166
Message 23205 - Posted: 1 Feb 2012 | 10:44:59 UTC
Last modified: 1 Feb 2012 | 10:47:22 UTC

Windows Beta runDIG10-TONI_BADO9GG-4-5-RND0865_5 task errored out on startup. Debug info can be seen via the link.
Win XP, GTX560Ti, factory OC lowered to the level at which NATHAN_CB tasks run successfully (900 -> 890).
