
Message boards : News : New workunits

Author Message
Toni
Volunteer moderator
Project administrator
Project developer
Project tester
Project scientist
Send message
Joined: 9 Dec 08
Posts: 1006
Credit: 5,068,599
RAC: 0
Level
Ser
Scientific publications
watwatwatwat
Message 53010 - Posted: 21 Nov 2019 | 17:10:20 UTC
Last modified: 21 Nov 2019 | 17:17:54 UTC

I'm loading a first batch of 1000 workunits for a new project (GSN*) on the acemd3 app. This batch is both for a basic science investigation, and for load-testing the app. Thanks!

If you disabled "acemd3" from the preferences for some reason, please re-enable it.

Profile [PUGLIA] kidkidkid3
Avatar
Send message
Joined: 23 Feb 11
Posts: 79
Credit: 953,228,044
RAC: 1,959,858
Level
Glu
Scientific publications
watwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwat
Message 53011 - Posted: 21 Nov 2019 | 18:46:17 UTC - in response to Message 53010.

Hi,
my first new WU stopped at 1% after 20 minutes of running.
I suspended and restarted it, and the elapsed time restarted from 0.
After another 20 minutes of running without any increase in progress, I killed it.
For investigation see http://www.gpugrid.net/result.php?resultid=21501927
Thanks in advance.
K.
____________
Dreams do not always come true. But not because they are too big or impossible. Why did we stop believing.
(Martin Luther King)

jjch
Send message
Joined: 10 Nov 13
Posts: 98
Credit: 15,288,150,388
RAC: 1,732,962
Level
Trp
Scientific publications
watwatwatwatwatwatwatwatwatwatwatwatwatwatwat
Message 53012 - Posted: 21 Nov 2019 | 19:03:27 UTC

Thank you Toni!

I already have my GTX 1070/1080 GPUs pegged at nearly 100% even with one WU running on each.

The GPU load does drop lower intermittently, and the PerfCap reason also drops to Idle.

The new thing I am noticing is that I am now hitting the Power PerfCap, which throttles the GPUs.





jjch
Send message
Joined: 10 Nov 13
Posts: 98
Credit: 15,288,150,388
RAC: 1,732,962
Level
Trp
Scientific publications
watwatwatwatwatwatwatwatwatwatwatwatwatwatwat
Message 53013 - Posted: 21 Nov 2019 | 19:22:08 UTC - in response to Message 53011.
Last modified: 21 Nov 2019 | 19:55:09 UTC

Hi,
my first new WU stopped at 1% after 20 minutes of running.
I suspended and restarted it, and the elapsed time restarted from 0.
After another 20 minutes of running without any increase in progress, I killed it.
For investigation see http://www.gpugrid.net/result.php?resultid=21501927
Thanks in advance.
K.


[PUGLIA] kidkidkid3

It would help a lot to know what your setup looks like. Your computers are hidden, so we can't see them. Also, the configuration may make a difference.

Please provide some details.

Profile [PUGLIA] kidkidkid3
Avatar
Send message
Joined: 23 Feb 11
Posts: 79
Credit: 953,228,044
RAC: 1,959,858
Level
Glu
Scientific publications
watwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwat
Message 53014 - Posted: 21 Nov 2019 | 20:26:58 UTC - in response to Message 53013.
Last modified: 21 Nov 2019 | 20:37:24 UTC

Sorry, my mistake about the configuration.

Intel quad-core Q9450 with 4GB (2*2 DDR3 at 1333) and a GTX 750 Ti

... here is the log.

Stderr output
<core_client_version>7.14.2</core_client_version>
<![CDATA[
<message>
aborted by user</message>
<stderr_txt>
19:10:11 (2408): wrapper (7.9.26016): starting
19:10:11 (2408): wrapper: running acemd3.exe (--boinc input --device 0)
Detected memory leaks!
Dumping objects ->
..\api\boinc_api.cpp(309) : {1583} normal block at 0x0000020099013380, 8 bytes long.
Data: < > 00 00 F9 98 00 02 00 00
..\lib\diagnostics_win.cpp(417) : {198} normal block at 0x0000020099011BA0, 1080 bytes long.
Data: <( ` > 28 11 00 00 CD CD CD CD 60 01 00 00 00 00 00 00
Object dump complete.

</stderr_txt>
____________
Dreams do not always come true. But not because they are too big or impossible. Why did we stop believing.
(Martin Luther King)

jjch
Send message
Joined: 10 Nov 13
Posts: 98
Credit: 15,288,150,388
RAC: 1,732,962
Level
Trp
Scientific publications
watwatwatwatwatwatwatwatwatwatwatwatwatwatwat
Message 53015 - Posted: 21 Nov 2019 | 20:34:40 UTC - in response to Message 53014.

Sorry, my mistake about the configuration ... here is the log.

Stderr output
<core_client_version>7.14.2</core_client_version>
<![CDATA[
<message>
aborted by user</message>
<stderr_txt>
19:10:11 (2408): wrapper (7.9.26016): starting
19:10:11 (2408): wrapper: running acemd3.exe (--boinc input --device 0)
Detected memory leaks!
Dumping objects ->
..\api\boinc_api.cpp(309) : {1583} normal block at 0x0000020099013380, 8 bytes long.
Data: < > 00 00 F9 98 00 02 00 00
..\lib\diagnostics_win.cpp(417) : {198} normal block at 0x0000020099011BA0, 1080 bytes long.
Data: <( ` > 28 11 00 00 CD CD CD CD 60 01 00 00 00 00 00 00
Object dump complete.

</stderr_txt>
]]>



OK, so I checked the link to the computer, and I see you have 2x GTX 750 Ti's:
http://www.gpugrid.net/show_host_detail.php?hostid=208691

I'm not sure a GTX 750 series can run the new app. Let's see if one of the resident experts will know the answer.

Erich56
Send message
Joined: 1 Jan 15
Posts: 1090
Credit: 6,603,906,926
RAC: 18,783,925
Level
Tyr
Scientific publications
watwatwatwatwatwatwatwatwat
Message 53016 - Posted: 21 Nov 2019 | 20:48:13 UTC - in response to Message 53015.

I'm not sure a GTX 750 series can run the new app. Let's see if one of the resident experts will know the answer.

The strange thing with my hosts here is that the host with the GTX980ti and the host with the GTX970 received the new ACEMD v2.10 tasks this evening, but the two hosts with a GTX750ti did NOT. Was this coincidence, or is the new version not being sent to GTX750ti cards?

Profile ServicEnginIC
Avatar
Send message
Joined: 24 Sep 10
Posts: 566
Credit: 5,944,002,024
RAC: 10,731,454
Level
Tyr
Scientific publications
watwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwat
Message 53017 - Posted: 21 Nov 2019 | 21:03:13 UTC - in response to Message 53015.

I'm not sure a GTX 750 series can run the new app

I can confirm that I've successfully finished ACEMD3 test tasks on GTX750 and GTX750Ti graphics cards running under Linux.
I can also note that I had some trouble under Windows 10 with antivirus interference.
This was discussed in the following thread:
http://www.gpugrid.net/forum_thread.php?id=4999

Profile ServicEnginIC
Avatar
Send message
Joined: 24 Sep 10
Posts: 566
Credit: 5,944,002,024
RAC: 10,731,454
Level
Tyr
Scientific publications
watwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwat
Message 53018 - Posted: 21 Nov 2019 | 21:08:18 UTC - in response to Message 53016.

Was this coincidence, or is the new version not being sent to GTX750ti cards?

Please, try updating drivers

Greg _BE
Send message
Joined: 30 Jun 14
Posts: 126
Credit: 107,156,939
RAC: 166,633
Level
Cys
Scientific publications
watwatwatwatwatwat
Message 53020 - Posted: 21 Nov 2019 | 23:23:13 UTC
Last modified: 21 Nov 2019 | 23:25:15 UTC

Just started a task on my GTX 1050 Ti (fully updated drivers, no overdrive, default settings).
It has been running for 30 minutes and finally reached 2%.
You should change something in the code so it reports the % done with decimals; I use that to check whether the task is moving in BoincTasks.
Your config only updates in whole 1% steps, with no decimals.
Memory usage is minimal compared to LHC ATLAS: only 331 MB real memory and 648 MB virtual, which is more in the range of what Rosetta uses.
So it looks like I should have this task done in about 26 hours from now.
For a GPU task it is taking a lot of CPU; it needs 100%+ CPU usage all the time.

Pop Piasa
Avatar
Send message
Joined: 8 Aug 19
Posts: 252
Credit: 458,054,251
RAC: 0
Level
Gln
Scientific publications
watwat
Message 53021 - Posted: 22 Nov 2019 | 1:12:12 UTC - in response to Message 53015.
Last modified: 22 Nov 2019 | 1:37:58 UTC

Hi, I'm running test 309 on an i7-860 with one GTX 750Ti and ACEMD 3 test is reporting 4.680%/Hr.
Better than my GTX 1060 and i7-7700K running test 725 @ 4.320%/Hr.
Why does the GTX 1060 run slower, Toni, anybody?
(running latest drivers)
____________
"Together we crunch
To check out a hunch
And wish all our credit
Could just buy us lunch"


Piasa Tribe - Illini Nation

STARBASEn
Avatar
Send message
Joined: 17 Feb 09
Posts: 91
Credit: 1,603,303,394
RAC: 0
Level
His
Scientific publications
watwatwatwatwatwatwatwatwatwat
Message 53022 - Posted: 22 Nov 2019 | 1:53:59 UTC

I got this one today http://www.gpugrid.net/workunit.php?wuid=16850979 and it ran fine. As I've said before, Linux machines are quite ready.

Keith Myers
Send message
Joined: 13 Dec 17
Posts: 1284
Credit: 4,927,231,959
RAC: 6,464,152
Level
Arg
Scientific publications
watwatwatwatwat
Message 53023 - Posted: 22 Nov 2019 | 2:09:05 UTC

Three finished so far, working on a fourth. Keep 'em coming.

KAMasud
Send message
Joined: 27 Jul 11
Posts: 137
Credit: 523,901,354
RAC: 16
Level
Lys
Scientific publications
watwat
Message 53025 - Posted: 22 Nov 2019 | 4:24:07 UTC

Got one task. GTX1060 with Max-Q. Windows 10. Task errored out. Following is the complete story.
<core_client_version>7.14.2</core_client_version>
<![CDATA[
<message>
(unknown error) - exit code 195 (0xc3)</message>
<stderr_txt>
23:16:17 (1648): wrapper (7.9.26016): starting
23:16:17 (1648): wrapper: running acemd3.exe (--boinc input --device 0)
# Engine failed: Error invoking kernel: CUDA_ERROR_ILLEGAL_ADDRESS (700)
02:43:35 (1648): acemd3.exe exited; CPU time 12377.906250
02:43:35 (1648): app exit status: 0x1
02:43:35 (1648): called boinc_finish(195)
0 bytes in 0 Free Blocks.
506 bytes in 8 Normal Blocks.
1144 bytes in 1 CRT Blocks.
0 bytes in 0 Ignore Blocks.
0 bytes in 0 Client Blocks.
Largest number used: 0 bytes.
Total allocations: 184328910 bytes.
Dumping objects ->
{1806} normal block at 0x000001C10FAA6190, 48 bytes long.
Data: <ACEMD_PLUGIN_DIR> 41 43 45 4D 44 5F 50 4C 55 47 49 4E 5F 44 49 52
{1795} normal block at 0x000001C10FAA6350, 48 bytes long.
Data: <HOME=D:\ProgramD> 48 4F 4D 45 3D 44 3A 5C 50 72 6F 67 72 61 6D 44
{1784} normal block at 0x000001C10FAA6580, 48 bytes long.
Data: <TMP=D:\ProgramDa> 54 4D 50 3D 44 3A 5C 50 72 6F 67 72 61 6D 44 61
{1773} normal block at 0x000001C10FAA6120, 48 bytes long.
Data: <TEMP=D:\ProgramD> 54 45 4D 50 3D 44 3A 5C 50 72 6F 67 72 61 6D 44
{1762} normal block at 0x000001C10FAA5CC0, 48 bytes long.
Data: <TMPDIR=D:\Progra> 54 4D 50 44 49 52 3D 44 3A 5C 50 72 6F 67 72 61
{1751} normal block at 0x000001C10FA8A0B0, 141 bytes long.
Data: <<project_prefere> 3C 70 72 6F 6A 65 63 74 5F 70 72 65 66 65 72 65
..\api\boinc_api.cpp(309) : {1748} normal block at 0x000001C10FAA86C0, 8 bytes long.
Data: < > 00 00 A4 0F C1 01 00 00
{977} normal block at 0x000001C10FA8D840, 141 bytes long.
Data: <<project_prefere> 3C 70 72 6F 6A 65 63 74 5F 70 72 65 66 65 72 65
{203} normal block at 0x000001C10FAA8CB0, 8 bytes long.
Data: < > 10 BB AA 0F C1 01 00 00
{197} normal block at 0x000001C10FAA5B70, 48 bytes long.
Data: <--boinc input --> 2D 2D 62 6F 69 6E 63 20 69 6E 70 75 74 20 2D 2D
{196} normal block at 0x000001C10FAA8030, 16 bytes long.
Data: < > 18 BA AA 0F C1 01 00 00 00 00 00 00 00 00 00 00
{195} normal block at 0x000001C10FAA83F0, 16 bytes long.
Data: < > F0 B9 AA 0F C1 01 00 00 00 00 00 00 00 00 00 00
{194} normal block at 0x000001C10FAA89E0, 16 bytes long.
Data: < > C8 B9 AA 0F C1 01 00 00 00 00 00 00 00 00 00 00
{193} normal block at 0x000001C10FAA7FE0, 16 bytes long.
Data: < > A0 B9 AA 0F C1 01 00 00 00 00 00 00 00 00 00 00
{192} normal block at 0x000001C10FAA8DA0, 16 bytes long.
Data: <x > 78 B9 AA 0F C1 01 00 00 00 00 00 00 00 00 00 00
{191} normal block at 0x000001C10FAA8B20, 16 bytes long.
Data: <P > 50 B9 AA 0F C1 01 00 00 00 00 00 00 00 00 00 00
{190} normal block at 0x000001C10FAA5A90, 48 bytes long.
Data: <ComSpec=C:\Windo> 43 6F 6D 53 70 65 63 3D 43 3A 5C 57 69 6E 64 6F
{189} normal block at 0x000001C10FAA7F90, 16 bytes long.
Data: < > D0 FE A8 0F C1 01 00 00 00 00 00 00 00 00 00 00
{188} normal block at 0x000001C10FA9D540, 32 bytes long.
Data: <SystemRoot=C:\Wi> 53 79 73 74 65 6D 52 6F 6F 74 3D 43 3A 5C 57 69
{187} normal block at 0x000001C10FAA88F0, 16 bytes long.
Data: < > A8 FE A8 0F C1 01 00 00 00 00 00 00 00 00 00 00
{185} normal block at 0x000001C10FAA8C10, 16 bytes long.
Data: < > 80 FE A8 0F C1 01 00 00 00 00 00 00 00 00 00 00
{184} normal block at 0x000001C10FAA81C0, 16 bytes long.
Data: <X > 58 FE A8 0F C1 01 00 00 00 00 00 00 00 00 00 00
{183} normal block at 0x000001C10FAA8210, 16 bytes long.
Data: <0 > 30 FE A8 0F C1 01 00 00 00 00 00 00 00 00 00 00
{182} normal block at 0x000001C10FAA85D0, 16 bytes long.
Data: < > 08 FE A8 0F C1 01 00 00 00 00 00 00 00 00 00 00
{181} normal block at 0x000001C10FAA88A0, 16 bytes long.
Data: < > E0 FD A8 0F C1 01 00 00 00 00 00 00 00 00 00 00
{180} normal block at 0x000001C10FA8FDE0, 280 bytes long.
Data: < \ > A0 88 AA 0F C1 01 00 00 C0 5C AA 0F C1 01 00 00
{179} normal block at 0x000001C10FAA8800, 16 bytes long.
Data: <0 > 30 B9 AA 0F C1 01 00 00 00 00 00 00 00 00 00 00
{178} normal block at 0x000001C10FAA8A80, 16 bytes long.
Data: < > 08 B9 AA 0F C1 01 00 00 00 00 00 00 00 00 00 00
{177} normal block at 0x000001C10FAA8850, 16 bytes long.
Data: < > E0 B8 AA 0F C1 01 00 00 00 00 00 00 00 00 00 00
{176} normal block at 0x000001C10FAAB8E0, 496 bytes long.
Data: <P acemd3.e> 50 88 AA 0F C1 01 00 00 61 63 65 6D 64 33 2E 65
{65} normal block at 0x000001C10FAA8C60, 16 bytes long.
Data: < > 80 EA D6 DA F7 7F 00 00 00 00 00 00 00 00 00 00
{64} normal block at 0x000001C10FA9BA00, 16 bytes long.
Data: <@ > 40 E9 D6 DA F7 7F 00 00 00 00 00 00 00 00 00 00
{63} normal block at 0x000001C10FA9B9B0, 16 bytes long.
Data: < W > F8 57 D3 DA F7 7F 00 00 00 00 00 00 00 00 00 00
{62} normal block at 0x000001C10FA9B960, 16 bytes long.
Data: < W > D8 57 D3 DA F7 7F 00 00 00 00 00 00 00 00 00 00
{61} normal block at 0x000001C10FA9B910, 16 bytes long.
Data: <P > 50 04 D3 DA F7 7F 00 00 00 00 00 00 00 00 00 00
{60} normal block at 0x000001C10FA9B870, 16 bytes long.
Data: <0 > 30 04 D3 DA F7 7F 00 00 00 00 00 00 00 00 00 00
{59} normal block at 0x000001C10FA9B780, 16 bytes long.
Data: < > E0 02 D3 DA F7 7F 00 00 00 00 00 00 00 00 00 00
{58} normal block at 0x000001C10FA9B730, 16 bytes long.
Data: < > 10 04 D3 DA F7 7F 00 00 00 00 00 00 00 00 00 00
{57} normal block at 0x000001C10FA9B690, 16 bytes long.
Data: <p > 70 04 D3 DA F7 7F 00 00 00 00 00 00 00 00 00 00
{56} normal block at 0x000001C10FA9B640, 16 bytes long.
Data: < > 18 C0 D1 DA F7 7F 00 00 00 00 00 00 00 00 00 00
Object dump complete.

</stderr_txt>

Enjoy reading it.

rod4x4
Send message
Joined: 4 Aug 14
Posts: 266
Credit: 2,219,935,054
RAC: 0
Level
Phe
Scientific publications
watwatwatwatwatwatwatwatwatwat
Message 53026 - Posted: 22 Nov 2019 | 5:43:18 UTC
Last modified: 22 Nov 2019 | 6:01:38 UTC

I'm not sure a GTX 750 series can run the new app.

I have a GTX 750 on a Linux host that is processing an ACEMD3 task; it is about halfway through and should complete the task in about 1 day.

A Win7 host with GTX 750 ti is also processing an ACEMD3 task. This should take 20 hours.

On a Win7 host with GTX 960, two ACEMD3 tasks have failed. Both with this error: # Engine failed: Particle coordinate is nan
Host can be found here: http://gpugrid.net/results.php?hostid=274119

What I have noticed on my Linux hosts is that nvidia-smi reports the ACEMD3 tasks are using 10% more power than the ACEMD2 tasks. This would indicate that the ACEMD3 tasks are more efficient at pushing the GPU to its full potential.

Because of this, I have reduced the overclocking on some hosts (particularly the GTX 960 above)
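For anyone who wants to watch this on their own host, a standard nvidia-smi query along these lines logs power draw, utilisation and temperature every 5 seconds (the fields are stock nvidia-smi options; adjust the interval to taste):

nvidia-smi --query-gpu=index,name,power.draw,utilization.gpu,temperature.gpu --format=csv -l 5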

Erich56
Send message
Joined: 1 Jan 15
Posts: 1090
Credit: 6,603,906,926
RAC: 18,783,925
Level
Tyr
Scientific publications
watwatwatwatwatwatwatwatwat
Message 53027 - Posted: 22 Nov 2019 | 5:59:40 UTC - in response to Message 53018.

Was this coincidence, or is the new version not being sent to GTX750ti cards?

Please, try updating drivers

It would be useful if we were told the minimum required driver version number.

rod4x4
Send message
Joined: 4 Aug 14
Posts: 266
Credit: 2,219,935,054
RAC: 0
Level
Phe
Scientific publications
watwatwatwatwatwatwatwatwatwat
Message 53028 - Posted: 22 Nov 2019 | 6:03:14 UTC - in response to Message 53027.

It would be useful if we were told the minimum required driver version number.


This info can be found here:
http://gpugrid.net/forum_thread.php?id=5002

Erich56
Send message
Joined: 1 Jan 15
Posts: 1090
Credit: 6,603,906,926
RAC: 18,783,925
Level
Tyr
Scientific publications
watwatwatwatwatwatwatwatwat
Message 53029 - Posted: 22 Nov 2019 | 6:11:17 UTC - in response to Message 53028.

It would be useful if we were told the minimum required driver version number.


This info can be found here:
http://gpugrid.net/forum_thread.php?id=5002

oh, thanks very much; so all is clear now - I need to update my drivers on the two GTX750ti hosts.

rod4x4
Send message
Joined: 4 Aug 14
Posts: 266
Credit: 2,219,935,054
RAC: 0
Level
Phe
Scientific publications
watwatwatwatwatwatwatwatwatwat
Message 53031 - Posted: 22 Nov 2019 | 7:05:16 UTC - in response to Message 53021.

Hi, I'm running test 309 on an i7-860 with one GTX 750Ti and ACEMD 3 test is reporting 4.680%/Hr.
Better than my GTX 1060 and i7-7700K running test 725 @ 4.320%/Hr.
Why does the GTX 1060 run slower, Toni, anybody?
(running latest drivers)


The GTX 1060 performance seems fine for the ACEMD2 task in your task list.
You may find some clues to the slow ACEMD3 performance in the stderr output when the task completes.
The ACEMD3 progress reporting is not as accurate as the ACEMD2 tasks' (a side effect of using a wrapper), so the performance should only be judged once the task has completed.

Erich56
Send message
Joined: 1 Jan 15
Posts: 1090
Credit: 6,603,906,926
RAC: 18,783,925
Level
Tyr
Scientific publications
watwatwatwatwatwatwatwatwat
Message 53032 - Posted: 22 Nov 2019 | 7:31:07 UTC - in response to Message 53029.
Last modified: 22 Nov 2019 | 8:02:16 UTC

It would be useful if we were told the minimum required driver version number.


This info can be found here:
http://gpugrid.net/forum_thread.php?id=5002

oh, thanks very much; so all is clear now - I need to update my drivers on the two GTX750ti hosts.


Driver updates are complete, and 1 of my 2 GTX750ti hosts has already received a task; it's running well.

What I noticed, also on the other hosts (GTX980ti and GTX970), is that the GPU usage (as shown in NVIDIA Inspector and GPU-Z) is now up to 99% most of the time; this was not the case before, most probably due to the WDDM "brake" in Win7 and Win10 (it was at 99% in WinXP, which had no WDDM).
This is noteworthy, as the new software seems to have overcome this problem.

rod4x4
Send message
Joined: 4 Aug 14
Posts: 266
Credit: 2,219,935,054
RAC: 0
Level
Phe
Scientific publications
watwatwatwatwatwatwatwatwatwat
Message 53034 - Posted: 22 Nov 2019 | 8:48:13 UTC - in response to Message 53032.
Last modified: 22 Nov 2019 | 8:53:13 UTC

Driver updates are complete, and 1 of my 2 GTX750ti hosts has already received a task; it's running well.

Good News!

What I noticed, also on the other hosts (GTX980ti and GTX970), is that the GPU usage (as shown in NVIDIA Inspector and GPU-Z) is now up to 99% most of the time; this was not the case before, most probably due to the WDDM "brake" in Win7 and Win10 (it was at 99% in WinXP, which had no WDDM).
This is noteworthy, as the new software seems to have overcome this problem.

The ACEMD3 performance is impressive. Toni did indicate that the performance using the Wrapper will be better (here:
http://gpugrid.net/forum_thread.php?id=4935&nowrap=true#51939)...and he is right!
Toni (and GPUgrid team) set out with a vision to make the app more portable and faster. They have delivered. Thank you Toni (and GPUgrid team).

Greg _BE
Send message
Joined: 30 Jun 14
Posts: 126
Credit: 107,156,939
RAC: 166,633
Level
Cys
Scientific publications
watwatwatwatwatwat
Message 53036 - Posted: 22 Nov 2019 | 8:52:30 UTC

http://www.gpugrid.net/result.php?resultid=21502590

Crashed and burned after going 2% or more.
Memory leaks

Updated my drivers and have another task in queue.

Erich56
Send message
Joined: 1 Jan 15
Posts: 1090
Credit: 6,603,906,926
RAC: 18,783,925
Level
Tyr
Scientific publications
watwatwatwatwatwatwatwatwat
Message 53037 - Posted: 22 Nov 2019 | 9:00:41 UTC - in response to Message 53034.

Toni (and GPUgrid team) set out with a vision to make the app more portable and faster. They have delivered. Thank you Toni (and GPUgrid team).

+ 1

rod4x4
Send message
Joined: 4 Aug 14
Posts: 266
Credit: 2,219,935,054
RAC: 0
Level
Phe
Scientific publications
watwatwatwatwatwatwatwatwatwat
Message 53038 - Posted: 22 Nov 2019 | 9:03:05 UTC - in response to Message 53036.
Last modified: 22 Nov 2019 | 9:06:17 UTC

http://www.gpugrid.net/result.php?resultid=21502590

Crashed and burned after going 2% or more.
Memory leaks

Updated my drivers and have another task in queue.


The memory leaks do appear on startup, probably not critical errors.

The issue in your case is ACEMD3 tasks cannot start on one GPU and be resumed on another.

From your STDerr Output:
.....
04:26:56 (8564): wrapper: running acemd3.exe (--boinc input --device 0)
.....
06:08:12 (16628): wrapper: running acemd3.exe (--boinc input --device 1)
ERROR: src\mdsim\context.cpp line 322: Cannot use a restart file on a different device!


It was started on Device 0
but failed when it was resumed on Device 1

Refer this FAQ post by Toni for further clarification:
http://www.gpugrid.net/forum_thread.php?id=5002

Toni
Volunteer moderator
Project administrator
Project developer
Project tester
Project scientist
Send message
Joined: 9 Dec 08
Posts: 1006
Credit: 5,068,599
RAC: 0
Level
Ser
Scientific publications
watwatwatwat
Message 53039 - Posted: 22 Nov 2019 | 9:20:03 UTC - in response to Message 53038.
Last modified: 22 Nov 2019 | 9:21:18 UTC

Thanks to all! To summarize some responses of the feedback above:

* GPU occupation is high (100% on my Linux machine)
* %/day is not an indication of performance because WU size differs between WU types
* Minimum required drivers, failures on notebook cards: see FAQ - thanks for those posting the links
* Tasks apparently stuck: may be an impression due to the % being rounded (e.g. 8h task divided in 100% fractions = no apparent progress for minutes)
* "Memory leaks": ignore the message, it's always there. The actual error, if present, is at the top.

Erich56
Send message
Joined: 1 Jan 15
Posts: 1090
Credit: 6,603,906,926
RAC: 18,783,925
Level
Tyr
Scientific publications
watwatwatwatwatwatwatwatwat
Message 53040 - Posted: 22 Nov 2019 | 9:30:14 UTC

Toni, since the new app is an obvious success - now the inevitable question: when will you send out the next batch of tasks?

rod4x4
Send message
Joined: 4 Aug 14
Posts: 266
Credit: 2,219,935,054
RAC: 0
Level
Phe
Scientific publications
watwatwatwatwatwatwatwatwatwat
Message 53041 - Posted: 22 Nov 2019 | 9:44:45 UTC - in response to Message 53039.

Hi Toni

"Memory leaks": ignore the message, it's always there. The actual error, if present, is at the top.

I am not seeing the error at the top; am I missing it? All I find is the generic wrapper error message stating there is an error in the client task.
The task error is buried in the stderr output.
Can the task error be passed to the wrapper's error code?

Toni
Volunteer moderator
Project administrator
Project developer
Project tester
Project scientist
Send message
Joined: 9 Dec 08
Posts: 1006
Credit: 5,068,599
RAC: 0
Level
Ser
Scientific publications
watwatwatwat
Message 53042 - Posted: 22 Nov 2019 | 11:04:23 UTC - in response to Message 53041.

@rod4x4 Which error? No resume on different cards is a known limitation; please see the FAQ.

San-Fernando-Valley
Send message
Joined: 16 Jan 17
Posts: 8
Credit: 27,984,427
RAC: 0
Level
Val
Scientific publications
watwat
Message 53043 - Posted: 22 Nov 2019 | 11:42:23 UTC

WAITING FOR WU's

Greg _BE
Send message
Joined: 30 Jun 14
Posts: 126
Credit: 107,156,939
RAC: 166,633
Level
Cys
Scientific publications
watwatwatwatwatwat
Message 53044 - Posted: 22 Nov 2019 | 13:26:53 UTC - in response to Message 53038.

oh interesting.
then I guess I have to write a script to keep all your tasks on the 1050.
That's my better GPU anyway.

Greg _BE
Send message
Joined: 30 Jun 14
Posts: 126
Credit: 107,156,939
RAC: 166,633
Level
Cys
Scientific publications
watwatwatwatwatwat
Message 53045 - Posted: 22 Nov 2019 | 13:28:46 UTC - in response to Message 53039.

Why is CPU usage so high?
I expect GPU to be high, but CPU?
One thread running between 85-100+% on CPU

jp de malo
Send message
Joined: 3 Jun 10
Posts: 4
Credit: 1,670,915,743
RAC: 251,232
Level
His
Scientific publications
watwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwat
Message 53046 - Posted: 22 Nov 2019 | 14:05:37 UTC - in response to Message 53043.
Last modified: 22 Nov 2019 | 14:06:39 UTC

The test is already finished; no errors at all on my 1050ti's or on my 1080ti.

Toni
Volunteer moderator
Project administrator
Project developer
Project tester
Project scientist
Send message
Joined: 9 Dec 08
Posts: 1006
Credit: 5,068,599
RAC: 0
Level
Ser
Scientific publications
watwatwatwat
Message 53047 - Posted: 22 Nov 2019 | 14:09:04 UTC - in response to Message 53044.

oh interesting.
then I guess I have to write a script to keep all your tasks on the 1050.
That's my better GPU anyway.


See the FAQ; you can restrict the usable GPUs.

Keith Myers
Send message
Joined: 13 Dec 17
Posts: 1284
Credit: 4,927,231,959
RAC: 6,464,152
Level
Arg
Scientific publications
watwatwatwatwat
Message 53048 - Posted: 22 Nov 2019 | 14:32:35 UTC - in response to Message 53038.

http://www.gpugrid.net/result.php?resultid=21502590

Crashed and burned after going 2% or more.
Memory leaks

Updated my drivers and have another task in queue.


The memory leaks do appear on startup, probably not critical errors.

The issue in your case is ACEMD3 tasks cannot start on one GPU and be resumed on another.

From your STDerr Output:
.....
04:26:56 (8564): wrapper: running acemd3.exe (--boinc input --device 0)
.....
06:08:12 (16628): wrapper: running acemd3.exe (--boinc input --device 1)
ERROR: src\mdsim\context.cpp line 322: Cannot use a restart file on a different device!


It was started on Device 0
but failed when it was resumed on Device 1

Refer this FAQ post by Toni for further clarification:
http://www.gpugrid.net/forum_thread.php?id=5002

Solve the issue of a task stopping on one type of card and attempting to finish on another by changing the compute preference "Switch between tasks every XX minutes" to a larger value than the default 60. Choose a value that will allow the task to finish on your slowest card; I suggest 360-640 minutes depending on your hardware.
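For those who prefer a file over the web preferences, the same setting can also be put into a global_prefs_override.xml in the BOINC data directory and loaded via Options > Read local prefs file in BOINC Manager. This is just a sketch, with 360 minutes as an example value:

<global_preferences>
   <cpu_scheduling_period_minutes>360</cpu_scheduling_period_minutes>
</global_preferences>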

Toni
Volunteer moderator
Project administrator
Project developer
Project tester
Project scientist
Send message
Joined: 9 Dec 08
Posts: 1006
Credit: 5,068,599
RAC: 0
Level
Ser
Scientific publications
watwatwatwat
Message 53049 - Posted: 22 Nov 2019 | 14:36:02 UTC

I'm looking for confirmation that the app works on Windows machines with > 1 device. I'm seeing some errors like:


7:33:28 (10748): wrapper: running acemd3.exe (--boinc input --device 2)
# Engine failed: Illegal value for DeviceIndex: 2

Keith Myers
Send message
Joined: 13 Dec 17
Posts: 1284
Credit: 4,927,231,959
RAC: 6,464,152
Level
Arg
Scientific publications
watwatwatwatwat
Message 53051 - Posted: 22 Nov 2019 | 14:48:22 UTC - in response to Message 53045.

Why is CPU usage so high?
I expect GPU to be high, but CPU?
One thread running between 85-100+% on CPU

Because that is what the GPU application and wrapper require. The science application is faster and needs a constant supply of data fed to it by the CPU thread because of the higher GPU utilization. The tasks finish in 1/3 to 1/2 of the time that the old acemd2 app needed.

Keith Myers
Send message
Joined: 13 Dec 17
Posts: 1284
Credit: 4,927,231,959
RAC: 6,464,152
Level
Arg
Scientific publications
watwatwatwatwat
Message 53052 - Posted: 22 Nov 2019 | 15:14:44 UTC - in response to Message 53039.

Thanks to all! To summarize some responses of the feedback above:

* GPU occupation is high (100% on my Linux machine)
* %/day is not an indication of performance because WU size differs between WU types
* Minimum required drivers, failures on notebook cards: see FAQ - thanks for those posting the links
* Tasks apparently stuck: may be an impression due to the % being rounded (e.g. 8h task divided in 100% fractions = no apparent progress for minutes)
* "Memory leaks": ignore the message, it's always there. The actual error, if present, is at the top.

Toni, new features are available for CUDA-MEMCHECK in CUDA10.2. The CUDA-MEMCHECK tool seems useful. It can be called against the application with:
cuda-memcheck [memcheck_options] app_name [app_options]

https://docs.nvidia.com/cuda/cuda-memcheck/index.html#memcheck-tool
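An illustrative invocation (the binary name and path depend on the platform and slot directory; the options after it are just the ones the wrapper already passes) would be something like:

cuda-memcheck --tool memcheck ./acemd3 --boinc input --device 0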

Erich56
Send message
Joined: 1 Jan 15
Posts: 1090
Credit: 6,603,906,926
RAC: 18,783,925
Level
Tyr
Scientific publications
watwatwatwatwatwatwatwatwat
Message 53053 - Posted: 22 Nov 2019 | 15:25:17 UTC - in response to Message 53049.

I'm looking for confirmation that the app works on Windows machines with > 1 device. I'm seeing some errors like:

7:33:28 (10748): wrapper: running acemd3.exe (--boinc input --device 2)
# Engine failed: Illegal value for DeviceIndex: 2


In one of my hosts I have 2 GTX980Ti cards. However, I have excluded one of them from GPUGRID via cc_config.xml, since one of its fans became defective. But with regard to your request, I guess this does not matter.
At any rate, the other GPU processes the new app perfectly.
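For anyone wanting to do the same, the exclusion goes into cc_config.xml in the BOINC data directory, roughly like this (device_num 1 is only an example; check your own device numbering in the event log):

<cc_config>
   <options>
      <exclude_gpu>
         <url>http://www.gpugrid.net/</url>
         <device_num>1</device_num>
      </exclude_gpu>
   </options>
</cc_config>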

Greg _BE
Send message
Joined: 30 Jun 14
Posts: 126
Credit: 107,156,939
RAC: 166,633
Level
Cys
Scientific publications
watwatwatwatwatwat
Message 53054 - Posted: 22 Nov 2019 | 15:55:46 UTC - in response to Message 53048.

http://www.gpugrid.net/result.php?resultid=21502590

Crashed and burned after going 2% or more.
Memory leaks

Updated my drivers and have another task in queue.


The memory leaks do appear on startup, probably not critical errors.

The issue in your case is ACEMD3 tasks cannot start on one GPU and be resumed on another.

From your STDerr Output:
.....
04:26:56 (8564): wrapper: running acemd3.exe (--boinc input --device 0)
.....
06:08:12 (16628): wrapper: running acemd3.exe (--boinc input --device 1)
ERROR: src\mdsim\context.cpp line 322: Cannot use a restart file on a different device!


It was started on Device 0
but failed when it was resumed on Device 1

Refer this FAQ post by Toni for further clarification:
http://www.gpugrid.net/forum_thread.php?id=5002

Solve the issue of a task stopping on one type of card and attempting to finish on another by changing the compute preference "Switch between tasks every XX minutes" to a larger value than the default 60. Choose a value that will allow the task to finish on your slowest card; I suggest 360-640 minutes depending on your hardware.


It is already at 360, since I also run LHC ATLAS, which does not like to be disturbed and usually finishes in 6 hrs.

I added a cc_config file to force your project to use just the 1050. I will double check my placement a bit later.

Aurum
Avatar
Send message
Joined: 12 Jul 17
Posts: 399
Credit: 13,024,100,382
RAC: 766,238
Level
Trp
Scientific publications
watwatwat
Message 53055 - Posted: 22 Nov 2019 | 18:57:10 UTC

The %Progress keeps resetting to zero on 2080 Ti's but seems normal on 1080 Ti's.
____________

Richard Haselgrove
Send message
Joined: 11 Jul 09
Posts: 1576
Credit: 5,606,061,851
RAC: 8,672,972
Level
Tyr
Scientific publications
watwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwat
Message 53056 - Posted: 22 Nov 2019 | 19:39:41 UTC - in response to Message 53049.

I'm looking for confirmation that the app works on Windows machines with > 1 device. I'm seeing some errors like:

7:33:28 (10748): wrapper: running acemd3.exe (--boinc input --device 2)
# Engine failed: Illegal value for DeviceIndex: 2

I'm currently running test340-TONI_GSNTEST3-3-100-RND9632_0 on a GTX 1660 SUPER under Windows 7, BOINC v7.16.3

The machine has a secondary GPU, but is running on the primary: command line looks correct, as

"acemd3.exe" --boinc input --device 0

Progress is displaying plausibly as 50.000% after 2 hours 22 minutes, updating in 1% increments only.

Richard Haselgrove
Send message
Joined: 11 Jul 09
Posts: 1576
Credit: 5,606,061,851
RAC: 8,672,972
Level
Tyr
Scientific publications
watwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwat
Message 53057 - Posted: 22 Nov 2019 | 22:03:04 UTC

Task completed and validated.

Aurum
Avatar
Send message
Joined: 12 Jul 17
Posts: 399
Credit: 13,024,100,382
RAC: 766,238
Level
Trp
Scientific publications
watwatwat
Message 53058 - Posted: 22 Nov 2019 | 22:18:54 UTC - in response to Message 53055.

The %Progress keeps resetting to zero on 2080 Ti's but seems normal on 1080 Ti's.
My impression so far is that Win7-64 can run four WUs on two 1080 Ti's fine on the same computer.
The problem seems to be with 2080 Ti's running on Win7-64. Running four WUs on one 2080 Ti, with four Einstein or four Milkyway on the second 2080 Ti, seems OK so far. Earlier, when I had two WUs on each 2080 Ti along with either two Einstein or two Milkyway, it kept resetting.
All Linux computers with 1080 Ti's seem normal.
Plan to move my two 2080 Ti's back to a Linux computer and try that.

____________

rod4x4
Send message
Joined: 4 Aug 14
Posts: 266
Credit: 2,219,935,054
RAC: 0
Level
Phe
Scientific publications
watwatwatwatwatwatwatwatwatwat
Message 53059 - Posted: 23 Nov 2019 | 2:32:37 UTC - in response to Message 53058.

The %Progress keeps resetting to zero on 2080 Ti's but seems normal on 1080 Ti's.
My impression so far is that Win7-64 can run four WUs on two 1080 Ti's fine on the same computer.
The problem seems to be with 2080 Ti's running on Win7-64. Running four WUs on one 2080 Ti, with four Einstein or four Milkyway on the second 2080 Ti, seems OK so far. Earlier, when I had two WUs on each 2080 Ti along with either two Einstein or two Milkyway, it kept resetting.
All Linux computers with 1080 Ti's seem normal.
Plan to move my two 2080 Ti's back to a Linux computer and try that.

As a single ACEMD3 task can push the GPU to 100%, it would be interesting to see if there is any clear advantage to running multiple ACEMD3 tasks on a GPU.

rod4x4
Send message
Joined: 4 Aug 14
Posts: 266
Credit: 2,219,935,054
RAC: 0
Level
Phe
Scientific publications
watwatwatwatwatwatwatwatwatwat
Message 53060 - Posted: 23 Nov 2019 | 2:42:50 UTC - in response to Message 53042.
Last modified: 23 Nov 2019 | 2:46:13 UTC

@rod4x4 which error? no resume on different cards is known, please see the faq.

Hi Toni
Not referring to any particular error.
When the ACEMD3 task (Child task) experiences an error, the Wrapper always reports a generic error (195) in the Exit Status:
Exit status 195 (0xc3) EXIT_CHILD_FAILED

Can the specific (Child) task error be passed to the Exit Status?

KAMasud
Send message
Joined: 27 Jul 11
Posts: 137
Credit: 523,901,354
RAC: 16
Level
Lys
Scientific publications
watwat
Message 53061 - Posted: 23 Nov 2019 | 4:05:58 UTC

Okay, my 1060 with Max-Q design completed one task and validated.

rod4x4
Send message
Joined: 4 Aug 14
Posts: 266
Credit: 2,219,935,054
RAC: 0
Level
Phe
Scientific publications
watwatwatwatwatwatwatwatwatwat
Message 53062 - Posted: 23 Nov 2019 | 5:16:43 UTC - in response to Message 53061.

Okay, my 1060 with Max-Q design completed one task and validated.

Good news.
Did you make any changes to the config after the first failure?

Bedrich Hajek
Send message
Joined: 28 Mar 09
Posts: 467
Credit: 8,194,946,966
RAC: 10,431,404
Level
Tyr
Scientific publications
watwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwat
Message 53065 - Posted: 23 Nov 2019 | 10:30:42 UTC

My Windows 10 computer with the RTX 2080 ti is finishing these WUs in about 6100 seconds, which is about the same time as computers running Linux with the same card.

Is the WDDM lag gone or is it my imagination?


The results in the 18000 to 19000 second range are these WUs running on the GTX 980 ti.



http://www.gpugrid.net/results.php?hostid=263612&offset=0&show_names=0&state=0&appid=32




KAMasud
Send message
Joined: 27 Jul 11
Posts: 137
Credit: 523,901,354
RAC: 16
Level
Lys
Scientific publications
watwat
Message 53066 - Posted: 23 Nov 2019 | 11:03:36 UTC

@Rod 4*4: I did make a change, but I do not know its relevance. I set SWAN_SYNC to 0; I did that for some other reason. Anyway, the second WU completed and validated.

Erich56
Send message
Joined: 1 Jan 15
Posts: 1090
Credit: 6,603,906,926
RAC: 18,783,925
Level
Tyr
Scientific publications
watwatwatwatwatwatwatwatwat
Message 53069 - Posted: 23 Nov 2019 | 13:12:50 UTC - in response to Message 53065.
Last modified: 23 Nov 2019 | 13:13:10 UTC

Is the WDDM lag gone or is it my imagination?

Given that the various tools now show a GPU utilization of mostly up to 99% or even 100% (as it was with WinXP before), it would seem to me that WDDM does not play a role any more.

eXaPower
Send message
Joined: 25 Sep 13
Posts: 293
Credit: 1,897,601,978
RAC: 0
Level
His
Scientific publications
watwatwatwatwatwatwatwatwatwatwatwat
Message 53070 - Posted: 23 Nov 2019 | 13:32:02 UTC

WUs now require 1 CPU core each - WUs run slower on 4/5 GPUs with only (4) CPU cores.

3 GPUs run at full speed, while 4 or 5 cause GPU usage to tank.

ELISA WU max GPU power draw (55C GPU temp):

330W on 2080ti, 95% GPU utilization (PCIe 3.0 x4) @ 1995MHz

115W on 1660, 99% GPU utilization (PCIe 3.0 x4) @ 1995MHz

215W on 2080, 89% GPU utilization (PCIe 2.0 x1) @ 1995MHz

Progress bar runtime: 2080ti 1hr 40min / 2080 2hr 40min / 1660 5hr

1660 runtime is equal to the 980ti.

Profile Retvari Zoltan
Avatar
Send message
Joined: 20 Jan 09
Posts: 2343
Credit: 16,201,255,749
RAC: 6,169
Level
Trp
Scientific publications
watwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwat
Message 53071 - Posted: 23 Nov 2019 | 13:48:57 UTC - in response to Message 53065.

My Windows 10 computer with the RTX 2080 ti is finishing these WUs in about 6100 seconds, which is about the same time as computers running Linux with the same card.

Is the WDDM lag gone or is it my imagination?
I came to this conclusion too. The runtimes on Windows 10 are about 10880 sec (3h 1m 20s) (11200 sec on my other host), while on Linux it's about 10280 sec (2h 51m 20s) on a GTX 1080 Ti (Linux is about 5.5% faster). These are different cards, and the fastest GPU appears to be the slowest in this list. It's possible that the CPU feeding the GPU(s) is more important for ACEMD3 than it was for ACEMD2, as my ACEMD3-wise slowest host has the oldest CPU (an i7-4930k, which is 3rd gen: Ivy Bridge E), while the other has an i3-4330 (which is 4th gen: Haswell). The other difference between the two Windows hosts is that the i7 had 2 rosetta@home tasks running, while the i3 had only the ACEMD3 running. Now I have reduced the number of rosetta@home tasks to 1. I will suspend rosetta@home if there is a steady flow of GPUGrid workunits.

Profile Retvari Zoltan
Avatar
Send message
Joined: 20 Jan 09
Posts: 2343
Credit: 16,201,255,749
RAC: 6,169
Level
Trp
Scientific publications
watwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwat
Message 53072 - Posted: 23 Nov 2019 | 14:22:14 UTC - in response to Message 53069.
Last modified: 23 Nov 2019 | 14:23:21 UTC

Is the WDDM lag gone or is it my imagination?
Given that the various tools now show a GPU utilization of mostly up to 99% or even 100% (as it was with WinXP before), it would seem to me that WDDM does not play a role any more.
While this high readout of GPU usage could be misleading, I think it's true this time. I expected this to happen on Windows 10 v1703, but apparently it didn't. So it seems that older CUDA versions (8.0) don't have their appropriate drivers to get around WDDM, but CUDA 10 has it.
I mentioned it at the end of a post almost 2 years ago.
There are new abbreviations from Microsoft to memorize (the links lead to TLDR pages, so click on them at your own risk):
DCH: Declarative Componentized Hardware supported apps
UWP: Universal Windows Platform
WDF: Windows Driver Frameworks
- KMDF: Kernel-Mode Driver Framework
- UMDF: User-Mode Driver Framework
This 'new' Windows Driver Framework is responsible for the 'lack of WDDM' and its overhead. Good work!

[AF>Libristes]on2vhf
Send message
Joined: 7 Oct 17
Posts: 2
Credit: 7,615,020
RAC: 0
Level
Ser
Scientific publications
wat
Message 53073 - Posted: 23 Nov 2019 | 15:09:36 UTC

Hi,
many thanks for adding new tasks, but please add more tasks again lol
best regards
Laurent from Belgium

Toni
Volunteer moderator
Project administrator
Project developer
Project tester
Project scientist
Send message
Joined: 9 Dec 08
Posts: 1006
Credit: 5,068,599
RAC: 0
Level
Ser
Scientific publications
watwatwatwat
Message 53075 - Posted: 23 Nov 2019 | 16:45:21 UTC - in response to Message 53073.
Last modified: 23 Nov 2019 | 16:48:13 UTC

100% GPU use and low WDDM overhead are nice news. However, they may be specific to this particular WU type - we'll see in the future. (The swan sync variable is ignored and plays no role.)

Profile [AF>Libristes] hermes
Send message
Joined: 11 Nov 16
Posts: 26
Credit: 710,087,297
RAC: 0
Level
Lys
Scientific publications
watwatwatwatwatwat
Message 53084 - Posted: 24 Nov 2019 | 12:18:53 UTC - in response to Message 53075.
Last modified: 24 Nov 2019 | 12:24:26 UTC

For me, 100% on the GPU is not the best ;-)
Because I have just one card in the PC, I can't watch videos when GPUgrid is running, even if I ask smplayer or vlc to use the CPU. So I have to pause this project when I use my PC.
Maybe one day we will be able to set some priority on the use of the GPU (on Linux).
I think I will buy a cheap card to handle the TV and play movies. But well, in general I am at work or somewhere else...

Nice to have some work. Folding@Home will wait. I was thinking of changing; the other BOINC projects running on GPUs don't interest me.

Erich56
Send message
Joined: 1 Jan 15
Posts: 1090
Credit: 6,603,906,926
RAC: 18,783,925
Level
Tyr
Scientific publications
watwatwatwatwatwatwatwatwat
Message 53085 - Posted: 24 Nov 2019 | 12:58:27 UTC

there was a task which ended after 41 seconds with:
195 (0xc3) EXIT_CHILD_FAILED

stderr here: http://www.gpugrid.net/result.php?resultid=21514460

mmonnin
Send message
Joined: 2 Jul 16
Posts: 332
Credit: 3,772,896,065
RAC: 4,765,302
Level
Arg
Scientific publications
watwatwatwatwat
Message 53087 - Posted: 24 Nov 2019 | 13:16:38 UTC

Are tasks being sent out for CUDA80 plan_class? I have only received new tasks on my 1080Ti with driver 418 and none on another system with 10/1070Ti with driver 396, which doesn't support CUDA100

rod4x4
Send message
Joined: 4 Aug 14
Posts: 266
Credit: 2,219,935,054
RAC: 0
Level
Phe
Scientific publications
watwatwatwatwatwatwatwatwatwat
Message 53088 - Posted: 24 Nov 2019 | 13:25:52 UTC - in response to Message 53085.

there was a task which ended after 41 seconds with:
195 (0xc3) EXIT_CHILD_FAILED

stderr here: http://www.gpugrid.net/result.php?resultid=21514460

unfortunately ACEMD3 no longer tells you the real error. The wrapper provides a meaningless generic message. (error 195)
The task error in your STDerr Output is
# Engine failed: Particle coordinate is nan

I had this twice on one host. Not sure if I am completely correct as ACEMD3 is a new beast we have to learn and tame, but in my case I reduced the Overclocking and it seemed to fix the issue, though that could just be a coincidence.

ALAIN_13013
Avatar
Send message
Joined: 11 Sep 08
Posts: 18
Credit: 1,547,879,462
RAC: 0
Level
His
Scientific publications
watwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwat
Message 53089 - Posted: 24 Nov 2019 | 13:29:53 UTC - in response to Message 53084.

For me, 100% on the GPU is not the best ;-)
Because I have just one card in the PC, I can't watch videos when GPUgrid is running, even if I ask smplayer or vlc to use the CPU. So I have to pause this project when I use my PC.
Maybe one day we will be able to set some priority on the use of the GPU (on Linux).
I think I will buy a cheap card to handle the TV and play movies. But well, in general I am at work or somewhere else...

Nice to have some work. Folding@Home will wait. I was thinking of changing; the other BOINC projects running on GPUs don't interest me.

That's exactly what I did: I installed a GT710 just for the video output, and it works great, so my 980 Ti at 100% load doesn't bother me at all!
____________

rod4x4
Send message
Joined: 4 Aug 14
Posts: 266
Credit: 2,219,935,054
RAC: 0
Level
Phe
Scientific publications
watwatwatwatwatwatwatwatwatwat
Message 53090 - Posted: 24 Nov 2019 | 13:32:19 UTC - in response to Message 53087.

Are tasks being sent out for CUDA80 plan_class? I have only received new tasks on my 1080Ti with driver 418 and none on another system with 10/1070Ti with driver 396, which doesn't support CUDA100

Yes CUDA80 is supported, see apps page here:https://www.gpugrid.net/apps.php
Also see FAQ for ACEMD3 here: https://www.gpugrid.net/forum_thread.php?id=5002

Bedrich Hajek
Send message
Joined: 28 Mar 09
Posts: 467
Credit: 8,194,946,966
RAC: 10,431,404
Level
Tyr
Scientific publications
watwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwat
Message 53092 - Posted: 24 Nov 2019 | 15:43:53 UTC - in response to Message 53088.

there was a task which ended after 41 seconds with:
195 (0xc3) EXIT_CHILD_FAILED

stderr here: http://www.gpugrid.net/result.php?resultid=21514460

unfortunately ACEMD3 no longer tells you the real error. The wrapper provides a meaningless generic message. (error 195)
The task error in your STDerr Output is
# Engine failed: Particle coordinate is nan

I had this twice on one host. Not sure if I am completely correct as ACEMD3 is a new beast we have to learn and tame, but in my case I reduced the Overclocking and it seemed to fix the issue, though that could just be a coincidence.



I had a couple of errors on my Windows 7 computer, and none on my Windows 10 computer so far. In my case it's not overclocking, since I don't overclock.

http://www.gpugrid.net/results.php?hostid=494023&offset=0&show_names=0&state=0&appid=32

Yes, I do believe we need some more testing.





mmonnin
Send message
Joined: 2 Jul 16
Posts: 332
Credit: 3,772,896,065
RAC: 4,765,302
Level
Arg
Scientific publications
watwatwatwatwat
Message 53093 - Posted: 24 Nov 2019 | 15:50:53 UTC - in response to Message 53090.

Are tasks being sent out for CUDA80 plan_class? I have only received new tasks on my 1080Ti with driver 418 and none on another system with 10/1070Ti with driver 396, which doesn't support CUDA100

Yes CUDA80 is supported, see apps page here:https://www.gpugrid.net/apps.php
Also see FAQ for ACEMD3 here: https://www.gpugrid.net/forum_thread.php?id=5002


Then we have an odd situation on Linux, where the app supposedly supports CUDA 80 but actually using it requires a newer driver than that.

What driver/card/OS combinations are supported?

Windows, CUDA80 Minimum Driver r367.48 or higher
Linux, CUDA92 Minimum Driver r396.26 or higher
Linux, CUDA100 Minimum Driver r410.48 or higher
Windows, CUDA101 Minimum Driver r418.39 or higher


There's not even a Linux CUDA92 plan_class, so I'm not sure what that's for in the FAQ.

klepel
Send message
Joined: 23 Dec 09
Posts: 189
Credit: 4,196,461,293
RAC: 1,617,116
Level
Arg
Scientific publications
watwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwat
Message 53096 - Posted: 24 Nov 2019 | 18:56:12 UTC

I just wanted to confirm: with a driver supporting CUDA100 or CUDA101, even a GTX670 can crunch the "acemd3" app.

See computer: http://www.gpugrid.net/show_host_detail.php?hostid=486229

Although it will not make the 24-hour deadline, and I can tell the GPU is extremely stressed. I will run some more WUs on it to confirm that it can handle the new app, and afterwards it will go into its summer pause or might be retired from BOINC altogether.

mmonnin
Send message
Joined: 2 Jul 16
Posts: 332
Credit: 3,772,896,065
RAC: 4,765,302
Level
Arg
Scientific publications
watwatwatwatwat
Message 53098 - Posted: 24 Nov 2019 | 20:01:08 UTC - in response to Message 53093.

Are tasks being sent out for CUDA80 plan_class? I have only received new tasks on my 1080Ti with driver 418 and none on another system with 10/1070Ti with driver 396, which doesn't support CUDA100

Yes CUDA80 is supported, see apps page here:https://www.gpugrid.net/apps.php
Also see FAQ for ACEMD3 here: https://www.gpugrid.net/forum_thread.php?id=5002


Then we have an odd situation on Linux, where the app supposedly supports CUDA 80 but actually using it requires a newer driver than that.

What driver/card/OS combinations are supported?

Windows, CUDA80 Minimum Driver r367.48 or higher
Linux, CUDA92 Minimum Driver r396.26 or higher
Linux, CUDA100 Minimum Driver r410.48 or higher
Windows, CUDA101 Minimum Driver r418.39 or higher


There's not even a Linux CUDA92 plan_class, so I'm not sure what that's for in the FAQ.


And now I got the 1st CUDA80 task on that system w/o any driver changes.

rod4x4
Send message
Joined: 4 Aug 14
Posts: 266
Credit: 2,219,935,054
RAC: 0
Level
Phe
Scientific publications
watwatwatwatwatwatwatwatwatwat
Message 53100 - Posted: 24 Nov 2019 | 21:24:31 UTC - in response to Message 53085.
Last modified: 24 Nov 2019 | 21:25:46 UTC

there was a task which ended after 41 seconds with:
195 (0xc3) EXIT_CHILD_FAILED

stderr here: http://www.gpugrid.net/result.php?resultid=21514460

Checking this task, it has failed on 8 computers, so it is just a faulty work unit.
Clocking would not be the cause here, as I previously suggested it might be.

rod4x4
Send message
Joined: 4 Aug 14
Posts: 266
Credit: 2,219,935,054
RAC: 0
Level
Phe
Scientific publications
watwatwatwatwatwatwatwatwatwat
Message 53101 - Posted: 24 Nov 2019 | 21:44:32 UTC - in response to Message 53092.

there was a task which ended after 41 seconds with:
195 (0xc3) EXIT_CHILD_FAILED

stderr here: http://www.gpugrid.net/result.php?resultid=21514460

unfortunately ACEMD3 no longer tells you the real error. The wrapper provides a meaningless generic message. (error 195)
The task error in your STDerr Output is
# Engine failed: Particle coordinate is nan

I had this twice on one host. Not sure if I am completely correct as ACEMD3 is a new beast we have to learn and tame, but in my case I reduced the Overclocking and it seemed to fix the issue, though that could just be a coincidence.



I had a couple errors on my windows 7 computer, and none on my windows 10 computer, so far. In my case, it's not overclocking, since I don't overclock.

http://www.gpugrid.net/results.php?hostid=494023&offset=0&show_names=0&state=0&appid=32

Yes, I do believe we need some more testing


Agreed, testing will be an ongoing process...some errors cannot be fixed.

this task had an error code 194...
finish file present too long</message>

This error has been seen in ACEMD2 and listed as "Unknown"

Matt Harvey did a FAQ on error codes for ACEMD2 here
http://gpugrid.net/forum_thread.php?id=3468

icg studio
Send message
Joined: 24 Nov 11
Posts: 3
Credit: 954,677
RAC: 0
Level
Gly
Scientific publications
wat
Message 53102 - Posted: 24 Nov 2019 | 23:47:10 UTC

Finally CUDA 10.1! In other words, support for Turing CUDA cores.
My RTX 2060 has started crunching.
Will post times later.

Keith Myers
Send message
Joined: 13 Dec 17
Posts: 1284
Credit: 4,927,231,959
RAC: 6,464,152
Level
Arg
Scientific publications
watwatwatwatwat
Message 53103 - Posted: 25 Nov 2019 | 0:20:18 UTC - in response to Message 53101.
Last modified: 25 Nov 2019 | 0:24:00 UTC

this task had an error code 194...
finish file present too long</message>

This is a bug in the BOINC 7.14.2 client and earlier versions. You need to update to the 7.16 branch to fix it.
Identified/quantified in https://github.com/BOINC/boinc/issues/3017
And resolved for the client in:
https://github.com/BOINC/boinc/pull/3019
And in the server code in:
https://github.com/BOINC/boinc/pull/3300

rod4x4
Send message
Joined: 4 Aug 14
Posts: 266
Credit: 2,219,935,054
RAC: 0
Level
Phe
Scientific publications
watwatwatwatwatwatwatwatwatwat
Message 53104 - Posted: 25 Nov 2019 | 0:42:18 UTC - in response to Message 53103.
Last modified: 25 Nov 2019 | 0:57:09 UTC

this task had an error code 194...
finish file present too long</message>

This is a bug in the BOINC 7.14.2 client and earlier versions. You need to update to the 7.16 branch to fix it.
Identified/quantified in https://github.com/BOINC/boinc/issues/3017
And resolved for the client in:
https://github.com/BOINC/boinc/pull/3019
And in the server code in:
https://github.com/BOINC/boinc/pull/3300

Thanks for the info and links. Sometimes we overlook the Boinc Client performance.

From the Berkeley download page(https://boinc.berkeley.edu/download_all.php):

7.16.3 Development version
(MAY BE UNSTABLE - USE ONLY FOR TESTING)

and
7.14.2 Recommended version

This needs to be considered by volunteers, install latest version if you are feeling adventurous. (any issues you may find will help the Berkeley team develop the new client)

Alternatively,
- reducing the CPU load on your PC and/or
- ensuring the PC is not rebooted as the finish file is written,
may avert this error.

Keith Myers
Send message
Joined: 13 Dec 17
Posts: 1284
Credit: 4,927,231,959
RAC: 6,464,152
Level
Arg
Scientific publications
watwatwatwatwat
Message 53105 - Posted: 25 Nov 2019 | 5:45:50 UTC

I haven't had a single instance of "finish file present" errors since moving to the 7.16 branch. I used to get a couple or more a day before on 7.14.2 or earlier.

It may be labelled an unstable development revision, but it is as close to general release stable as you can get. The only issue is that it is still in flux as more commits get added to it and the version number gets increased.

Greg _BE
Send message
Joined: 30 Jun 14
Posts: 126
Credit: 107,156,939
RAC: 166,633
Level
Cys
Scientific publications
watwatwatwatwatwat
Message 53109 - Posted: 25 Nov 2019 | 18:39:45 UTC - in response to Message 53084.

For me, 100% on the GPU is not the best ;-)
Because I have just one card in the PC, I can't watch videos when GPUgrid is running, even if I ask smplayer or vlc to use the CPU. So I have to pause this project when I use my PC.
Maybe one day we will be able to set some priority on the use of the GPU (on Linux).
I think I will buy a cheap card to handle the TV and play movies. But well, in general I am at work or somewhere else...

Nice to have some work. Folding@Home will wait. I was thinking of changing; the other BOINC projects running on GPUs don't interest me.



I see you have a RTX and a GTX.
You could save your GTX for video and general PC usage and put the RTX full time on GPU tasks.

I find it odd that you are having issues watching videos. I noticed that with my system as well, and it was not the GPU that was having trouble; it was the CPU that was overloaded. After I changed the CPU time setting to about 95%, I had no trouble watching videos.

After much tweaking of the way BOINC and all the projects I run use my system, I finally have it to where I can watch videos without any problems, and I use a GTX 1050 Ti as my primary card along with a Ryzen 2700 with no integrated video.

There must be something overloading your system if you can't watch videos on a RTX GPU while running GPU Grid.

wiyosaya
Send message
Joined: 22 Nov 09
Posts: 114
Credit: 589,114,683
RAC: 0
Level
Lys
Scientific publications
watwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwat
Message 53110 - Posted: 25 Nov 2019 | 20:19:01 UTC
Last modified: 25 Nov 2019 | 20:20:26 UTC

I am getting high CPU/South bridge temps on one of my PCs with these latest work units.

The PC is http://www.gpugrid.net/show_host_detail.php?hostid=160668
and the current work unit is http://www.gpugrid.net/workunit.php?wuid=16866756

Every WU since November 22, 2019 has been exhibiting high temperatures on this PC; the previous apps never did this. In addition, I found the PC unresponsive this afternoon. I was able to reboot; however, this does not give me a warm fuzzy feeling about continuing to run GPUGrid on this PC.

Anyone else seeing something similar or is there a solution for this?

Thanks.
____________

Profile Retvari Zoltan
Avatar
Send message
Joined: 20 Jan 09
Posts: 2343
Credit: 16,201,255,749
RAC: 6,169
Level
Trp
Scientific publications
watwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwat
Message 53111 - Posted: 25 Nov 2019 | 23:03:47 UTC - in response to Message 53110.

I am getting high CPU/South bridge temps on one of my PCs with these latest work units.
That's for two reasons:
1. The new app uses a whole CPU thread (or a core, if there's no HT or SMT) to feed the GPU.
2. The new app is not hindered by WDDM.

Every WU since November 22, 2019 has been exhibiting high temperatures on this PC. The previous apps never exhibited this.
That's for two reasons:
1. The old app didn't feed the GPU with a full CPU thread unless the user configured it with the SWAN_SYNC environment variable.
2. The performance of the old app was hindered by WDDM (under Windows Vista through 10).

In addition, I found the PC unresponsive this afternoon. I was able to reboot, however, this does not give me a warm fuzzy feeling about continuing to run GPUGrid on this PC.

Anyone else seeing something similar or is there a solution for this?
There are a few options:
1. Reduce the GPU's clock frequency (and the GPU voltage accordingly) or its power target.
2. Increase cooling (clean the fins, increase air ventilation/fan speed).
If the card is overclocked (by you or the factory), you should re-calibrate the overclock settings for the new app.
A small reduction in GPU voltage and frequency results in a perceptible decrease of the power consumption (= heat output), as the power consumption is roughly proportional to the clock frequency multiplied by the square of the GPU voltage.
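
As a rough illustration of that relationship (a minimal sketch, assuming dynamic power scales as frequency times voltage squared and ignoring static leakage):

# Rough estimate of dynamic GPU power scaling: P ~ f * V^2
def relative_power(freq_scale, voltage_scale):
    """Power draw relative to stock for the given frequency and voltage scaling factors."""
    return freq_scale * voltage_scale ** 2

# Example: dropping the core clock by 5% and the core voltage by 5%
print(relative_power(0.95, 0.95))  # ~0.86 -> roughly 14% less power and heat

So a modest underclock/undervolt can cut heat output noticeably while costing only the small frequency reduction in throughput.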

RFGuy_KCCO
Send message
Joined: 13 Feb 14
Posts: 6
Credit: 1,047,426,005
RAC: 0
Level
Met
Scientific publications
watwatwatwatwatwatwatwatwat
Message 53114 - Posted: 26 Nov 2019 | 4:32:11 UTC
Last modified: 26 Nov 2019 | 4:36:14 UTC

I have found that running GPUs at 60-70% of their stock power level is the sweet spot in the compromise between PPD and power consumption/temps. I usually run all of my GPUs at a 60% power level.

icg studio
Send message
Joined: 24 Nov 11
Posts: 3
Credit: 954,677
RAC: 0
Level
Gly
Scientific publications
wat
Message 53119 - Posted: 26 Nov 2019 | 10:27:17 UTC - in response to Message 53102.
Last modified: 26 Nov 2019 | 10:28:37 UTC

Finally CUDA 10.1! In other words, support for Turing CUDA cores.
My RTX 2060 has started crunching.
Will post the run-time later.


13134.75 seconds run-time @ RTX 2060, Ryzen 2600, Windows 10 1909.
Average GPU CUDA utilisation 99%.
No issues at all with these workunits.

KAMasud
Send message
Joined: 27 Jul 11
Posts: 137
Credit: 523,901,354
RAC: 16
Level
Lys
Scientific publications
watwat
Message 53126 - Posted: 26 Nov 2019 | 17:36:55 UTC - in response to Message 53111.

1. The old app didn't feed the GPU with a full CPU thread unless the user configured it with the SWAN_SYNC environment variable.




Something was making my Climate models unstable and crashing them. That was the reason I lassoed in the GPU through SWAN_SYNC. Now my Climate models are stable. Plus I am getting better clock speeds.

Profile ServicEnginIC
Avatar
Send message
Joined: 24 Sep 10
Posts: 566
Credit: 5,944,002,024
RAC: 10,731,454
Level
Tyr
Scientific publications
watwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwat
Message 53129 - Posted: 26 Nov 2019 | 19:16:01 UTC - in response to Message 53110.
Last modified: 26 Nov 2019 | 19:25:35 UTC

I am getting high CPU/South bridge temps on one of my PCs with these latest work units.

As commented in several threads on the GPUGrid forum, the new ACEMD3 tasks are challenging our computers to their maximum.
They can be taken as a true hardware quality control!
CPUs, GPUs, PSUs and MoBos all seem to be squeezed simultaneously while processing these tasks.
I'm thinking of printing stickers for my computers: "I processed ACEMD3 and survived" ;-)

Regarding your processor:
Intel(R) Xeon(R) CPU E5-1650 v2 @ 3.50GHz
It has a rated TDP of 130W. A lot of heat to dissipate...
It was launched on Q3/2013.
If it has been running for more than three years, I would recommend renewing the CPU cooler's thermal paste.
A clean CPU cooler and fresh thermal paste usually help reduce the CPU temperature by several degrees.

Regarding chipset temperature:
I can't remember any motherboard whose chipset heatsink I could touch with confidence.
On most standard motherboards, chipset heat evacuation relies on passive convection heatsinks.
If there is room at the upper back of your computer case, I would recommend installing an extra fan to extract the heated air and improve air circulation.

Jim1348
Send message
Joined: 28 Jul 12
Posts: 819
Credit: 1,591,285,971
RAC: 0
Level
His
Scientific publications
watwatwatwatwatwatwatwatwatwatwatwatwatwatwat
Message 53132 - Posted: 26 Nov 2019 | 22:56:01 UTC

Wow. My GTX 980 on Ubuntu 18.04.3 is running at 80C. It is a three-fan version, not overclocked, with a large heatsink. I don't recall seeing it above 65C before.

I can't tell about the CPU yet. It is a Ryzen 3700x, and apparently the Linux kernel does not support temperature measurements yet. But "Tdie" and "Tctl", whatever they are, report 76C on Psensor.

That is good. I want to use my hardware, and it is getting colder around here.

Keith Myers
Send message
Joined: 13 Dec 17
Posts: 1284
Credit: 4,927,231,959
RAC: 6,464,152
Level
Arg
Scientific publications
watwatwatwatwat
Message 53133 - Posted: 26 Nov 2019 | 23:56:56 UTC - in response to Message 53132.

Tdie is the CPU temp of the 3700X. Tctl is the package power limit offset temp. The offset is 0 on Ryzen 3000. The offset is 20 °C on Ryzen 1000 and 10 °C on Ryzen 2000. The offset is used for CPU fan control.

Tdie and Tctl are provided by the k10temp driver. You still have access to the sensors command if you install lm-sensors.
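
For illustration, a minimal sketch (assuming a Linux host with the k10temp driver loaded) that reads the same values straight from the hwmon sysfs interface that lm-sensors uses:

import glob, os

# Walk the hwmon sysfs tree and print the temperatures reported by k10temp (Tdie/Tctl).
for hwmon in glob.glob("/sys/class/hwmon/hwmon*"):
    with open(os.path.join(hwmon, "name")) as f:
        name = f.read().strip()
    if name != "k10temp":
        continue
    for temp_input in sorted(glob.glob(os.path.join(hwmon, "temp*_input"))):
        label_file = temp_input.replace("_input", "_label")
        label = open(label_file).read().strip() if os.path.exists(label_file) else temp_input
        millideg = int(open(temp_input).read().strip())  # sysfs reports millidegrees Celsius
        print(f"{name} {label}: {millideg / 1000:.1f} C")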

Ryzen only provides a single monolithic temperature for all cores. It doesn't report individual core temps like Intel CPUs do.

If you have an ASUS motherboard with a WMI BIOS, there is a driver that can report all the sensors on the motherboard, the same as you would get in Windows.

Jim1348
Send message
Joined: 28 Jul 12
Posts: 819
Credit: 1,591,285,971
RAC: 0
Level
His
Scientific publications
watwatwatwatwatwatwatwatwatwatwatwatwatwatwat
Message 53136 - Posted: 27 Nov 2019 | 2:54:05 UTC - in response to Message 53133.

Thanks. It is an ASRock board, and it probably has the same capability. I will look around.

Keith Myers
Send message
Joined: 13 Dec 17
Posts: 1284
Credit: 4,927,231,959
RAC: 6,464,152
Level
Arg
Scientific publications
watwatwatwatwat
Message 53139 - Posted: 27 Nov 2019 | 4:34:07 UTC - in response to Message 53136.
Last modified: 27 Nov 2019 | 4:34:37 UTC

AFAIK, only ASUS implemented a WMI BIOS to overcome the limitations and restrictions of the crappy SIO chip used on most of their boards.

The latest X570 boards went with a tried and true NCT6775 SIO chip that is well supported in both Windows and Linux.

To give you an idea of what the independent developer accomplished with his asus-wmi-sensors driver, this is the output of the sensors command on my Crosshair VII Hero motherboard.

keith@Serenity:~$ sensors
asus-isa-0000
Adapter: ISA adapter
cpu_fan: 0 RPM

asuswmisensors-isa-0000
Adapter: ISA adapter
CPU Core Voltage: +1.24 V
CPU SOC Voltage: +1.07 V
DRAM Voltage: +1.42 V
VDDP Voltage: +0.64 V
1.8V PLL Voltage: +2.14 V
+12V Voltage: +11.83 V
+5V Voltage: +4.80 V
3VSB Voltage: +3.36 V
VBAT Voltage: +3.27 V
AVCC3 Voltage: +3.36 V
SB 1.05V Voltage: +1.11 V
CPU Core Voltage: +1.26 V
CPU SOC Voltage: +1.09 V
DRAM Voltage: +1.46 V
CPU Fan: 1985 RPM
Chassis Fan 1: 0 RPM
Chassis Fan 2: 0 RPM
Chassis Fan 3: 0 RPM
HAMP Fan: 0 RPM
Water Pump: 0 RPM
CPU OPT: 0 RPM
Water Flow: 648 RPM
AIO Pump: 0 RPM
CPU Temperature: +72.0°C
CPU Socket Temperature: +45.0°C
Motherboard Temperature: +36.0°C
Chipset Temperature: +52.0°C
Tsensor 1 Temperature: +216.0°C
CPU VRM Temperature: +50.0°C
Water In: +216.0°C
Water Out: +35.0°C
CPU VRM Output Current: +71.00 A

k10temp-pci-00c3
Adapter: PCI adapter
Tdie: +72.2°C (high = +70.0°C)
Tctl: +72.2°C

keith@Serenity:~$

Keith Myers
Send message
Joined: 13 Dec 17
Posts: 1284
Credit: 4,927,231,959
RAC: 6,464,152
Level
Arg
Scientific publications
watwatwatwatwat
Message 53141 - Posted: 27 Nov 2019 | 4:37:55 UTC

So you can at least look at the driver project on GitHub; this is the link.
https://github.com/electrified/asus-wmi-sensors

Jim1348
Send message
Joined: 28 Jul 12
Posts: 819
Credit: 1,591,285,971
RAC: 0
Level
His
Scientific publications
watwatwatwatwatwatwatwatwatwatwatwatwatwatwat
Message 53146 - Posted: 27 Nov 2019 | 8:03:28 UTC - in response to Message 53141.
Last modified: 27 Nov 2019 | 8:18:14 UTC

OK, I will look at it occasionally. I think Psensor is probably good enough. Fortunately, the case has room for two (or even three) 120 mm fans side by side, so I can cool the length of the card better; I just don't normally have to.

It is now running Einstein FGRBP1, and is down to 69C. It will probably go down a little more.

EDIT:
I also have a Ryzen 3600 machine with the same motherboard (ASRock B450M PRO4) and BIOS. Tdie and Tctl are reporting 95C. I will shut it down and put on a better cooler; I just used the stock one that came with the CPU.

Greg _BE
Send message
Joined: 30 Jun 14
Posts: 126
Credit: 107,156,939
RAC: 166,633
Level
Cys
Scientific publications
watwatwatwatwatwat
Message 53147 - Posted: 27 Nov 2019 | 8:30:16 UTC

I am running a GTX 1050 at full load and full OC and it only goes to 56C. Fan speed is about 90% of capacity.

For heat, my system with a Ryzen7 2700 running at 40.75 GHZ in OC and running a wide range of projects ranging from LHC (including ATLAS) to easier stuff like Rosetta plus this new stuff, rarely gets above 81C.

I upgraded my case recently and have a Corsair case with 2x 240 fans on the front intake, 1x 120 exhaust fan on the rear and 1x 120 intake fan on the bottom. Cooling is with an Arctic Cooling Freezer using stock fans, which are as good as or better than Noctua fans. My top grill can take a 360 mm radiator but I opted for a 240 due to budget, so I have one extra slot for hot air to escape.

I burned out a Corsair single radiator with push/pull fans after 3 years. And a gamer I met at an electronics box store in the US while I was home visiting family told me she uses the Arctic radiator and has no problems. It is also refillable.

This kind of cooling is what I consider the best short of a gas-cooled system or one of those 1,000 dollar external systems.

computezrmle
Send message
Joined: 10 Jun 13
Posts: 9
Credit: 295,692,471
RAC: 0
Level
Asn
Scientific publications
watwatwatwatwat
Message 53148 - Posted: 27 Nov 2019 | 11:27:59 UTC

@ Keith Myers
This would be steam!

Water In: +216.0°C




@ Greg _BE
my system with a Ryzen7 2700 running at 40.75 GHZ ... rarely gets above 81C.

Wow!!



@ Jim1348
This is the output from standard sensors package.
>sensors nct6779-isa-0290
nct6779-isa-0290
Adapter: ISA adapter
Vcore: +0.57 V (min = +0.00 V, max = +1.74 V)
in1: +1.09 V (min = +0.00 V, max = +0.00 V) ALARM
AVCC: +3.23 V (min = +2.98 V, max = +3.63 V)
+3.3V: +3.23 V (min = +2.98 V, max = +3.63 V)
in4: +1.79 V (min = +0.00 V, max = +0.00 V) ALARM
in5: +0.92 V (min = +0.00 V, max = +0.00 V) ALARM
in6: +1.35 V (min = +0.00 V, max = +0.00 V) ALARM
3VSB: +3.46 V (min = +2.98 V, max = +3.63 V)
Vbat: +3.28 V (min = +2.70 V, max = +3.63 V)
in9: +0.00 V (min = +0.00 V, max = +0.00 V)
in10: +0.75 V (min = +0.00 V, max = +0.00 V) ALARM
in11: +0.78 V (min = +0.00 V, max = +0.00 V) ALARM
in12: +1.66 V (min = +0.00 V, max = +0.00 V) ALARM
in13: +0.91 V (min = +0.00 V, max = +0.00 V) ALARM
in14: +0.74 V (min = +0.00 V, max = +0.00 V) ALARM
fan1: 3479 RPM (min = 0 RPM)
fan2: 0 RPM (min = 0 RPM)
fan3: 0 RPM (min = 0 RPM)
fan4: 0 RPM (min = 0 RPM)
fan5: 0 RPM (min = 0 RPM)
SYSTIN: +40.0°C (high = +0.0°C, hyst = +0.0°C) sensor = thermistor
CPUTIN: +48.5°C (high = +80.0°C, hyst = +75.0°C) sensor = thermistor
AUXTIN0: +8.0°C sensor = thermistor
AUXTIN1: +40.0°C sensor = thermistor
AUXTIN2: +38.0°C sensor = thermistor
AUXTIN3: +40.0°C sensor = thermistor
SMBUSMASTER 0: +57.5°C
PCH_CHIP_CPU_MAX_TEMP: +0.0°C
PCH_CHIP_TEMP: +0.0°C
PCH_CPU_TEMP: +0.0°C
intrusion0: ALARM
intrusion1: ALARM
beep_enable: disabled


The real Tdie is shown as "SMBUSMASTER 0", already reduced by 27° (Threadripper offset) using the following formula in /etc/sensors.d/x399.conf:

chip "nct6779-isa-0290"
    compute temp7 @-27, @+27

Greg _BE
Send message
Joined: 30 Jun 14
Posts: 126
Credit: 107,156,939
RAC: 166,633
Level
Cys
Scientific publications
watwatwatwatwatwat
Message 53149 - Posted: 27 Nov 2019 | 12:42:59 UTC - in response to Message 53148.
Last modified: 27 Nov 2019 | 13:12:48 UTC

No... it's just 177 F. No idea where you got that value from.
Water boils at 212 F, and that would trigger a thermal shutdown on the CPU.

AMD specifies 85 °C as the maximum safe temperature for a Ryzen™ 7 2700X processor. That's the chip with the video co-processor; mine is just the pure CPU.
From what I can see, thermal shutdown is 100-115 C according to some posts.

If the chip is at 80 C, then I guess the outgoing water would be about that, but the radiator does not feel that hot. According to NZXT CAM monitoring I am only using 75% of the temperature range.

Checked the AMD website. The max temp for my CPU is 95 C, or 203 F. So I am well within the limits of the design specs of this CPU. Your temperature calculation was way off.




@ Keith Myers
This would be steam!
Water In: +216.0°C





@ Greg _BE
my system with a Ryzen7 2700 running at 40.75 GHZ ... rarely gets above 81C.

Wow!!




rod4x4
Send message
Joined: 4 Aug 14
Posts: 266
Credit: 2,219,935,054
RAC: 0
Level
Phe
Scientific publications
watwatwatwatwatwatwatwatwatwat
Message 53150 - Posted: 27 Nov 2019 | 13:04:30 UTC

@ Keith Myers
This would be steam!
Water In: +216.0°C


I saw the same thing. Funny Huh!

Jim1348
Send message
Joined: 28 Jul 12
Posts: 819
Credit: 1,591,285,971
RAC: 0
Level
His
Scientific publications
watwatwatwatwatwatwatwatwatwatwatwatwatwatwat
Message 53151 - Posted: 27 Nov 2019 | 13:15:32 UTC
Last modified: 27 Nov 2019 | 13:47:59 UTC

The heatsink on the Ryzen 3600 that reports Tdie and Tctl at 95C is only moderately warm to the touch. That was the case when I installed it.

So there are two possibilities: the heatsink is not making good contact to the chip, or else the reading is wrong. I will find out soon.

EDIT: The paste is spread out properly and sticking fine. It must be that the Ryzen 3600 reports the temps differently, or else is interpreted wrong. That is my latest build by the way, put together within the last month.

Greg _BE
Send message
Joined: 30 Jun 14
Posts: 126
Credit: 107,156,939
RAC: 166,633
Level
Cys
Scientific publications
watwatwatwatwatwat
Message 53152 - Posted: 27 Nov 2019 | 13:16:36 UTC
Last modified: 27 Nov 2019 | 13:17:46 UTC

https://i.pinimg.com/originals/94/63/2d/94632de14e0b1612e4c70111396dc03f.jpg

C to F degrees chart
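
For reference, the conversion the chart encodes (a minimal sketch):

def c_to_f(celsius):
    """Convert a temperature from degrees Celsius to degrees Fahrenheit."""
    return celsius * 9.0 / 5.0 + 32.0

# A few of the temperatures discussed in this thread:
for c in (60, 80, 95, 216):
    print(f"{c:>3} C = {c_to_f(c):5.0f} F")  # 140, 176, 203, 421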

Greg _BE
Send message
Joined: 30 Jun 14
Posts: 126
Credit: 107,156,939
RAC: 166,633
Level
Cys
Scientific publications
watwatwatwatwatwat
Message 53153 - Posted: 27 Nov 2019 | 13:20:31 UTC

I have checked my system with HW Monitor, CAM, MSI Command Center and Ryzen Master. All report the same thing: 80 C, and AMD says max 95 C before shutdown.

I'll leave it at that.

computezrmle
Send message
Joined: 10 Jun 13
Posts: 9
Credit: 295,692,471
RAC: 0
Level
Asn
Scientific publications
watwatwatwatwat
Message 53154 - Posted: 27 Nov 2019 | 13:30:37 UTC - in response to Message 53149.

No idea where you got that value from.

I got it from this message:
http://www.gpugrid.net/forum_thread.php?id=5015&nowrap=true#53139
If this is really °C, then 216 would be steam, or
if it is °F, then 35 would be close to ice.
Water In: +216.0°C
Water Out: +35.0°C



If the chip is 80C, then I guess the outgoing water would be that, but the radiator does not feel that hot.

Seriously (don't try this!) -> any temp >60 °C would burn your fingers.
Most components used in watercooling circuits are specified for a Tmax (water!) of 65 °C.
Any cooling medium must be (much) cooler than the device to establish a heat flow.


But are you sure you really run your Ryzen at 40.75 GHZ?
It's from this post:
http://www.gpugrid.net/forum_thread.php?id=5015&nowrap=true#53147
;-)

Aurum
Avatar
Send message
Joined: 12 Jul 17
Posts: 399
Credit: 13,024,100,382
RAC: 766,238
Level
Trp
Scientific publications
watwatwat
Message 53156 - Posted: 27 Nov 2019 | 15:06:26 UTC - in response to Message 53148.

This would be steam!
Water In: +216.0°C
Not at 312 PSIA.
____________

Keith Myers
Send message
Joined: 13 Dec 17
Posts: 1284
Credit: 4,927,231,959
RAC: 6,464,152
Level
Arg
Scientific publications
watwatwatwatwat
Message 53159 - Posted: 27 Nov 2019 | 15:58:46 UTC - in response to Message 53150.
Last modified: 27 Nov 2019 | 16:04:19 UTC

@ Keith Myers
This would be steam!
Water In: +216.0°C


I saw the same thing. Funny Huh!

No, it is just the value you get from an unterminated input on the ASUS boards. Put a standard 10K thermistor on it and it reads normally.

Just ignore any input with the +216.0 °C value. If it annoys you, you could fabricate two-pin headers with a resistor to pull the inputs down.

klepel
Send message
Joined: 23 Dec 09
Posts: 189
Credit: 4,196,461,293
RAC: 1,617,116
Level
Arg
Scientific publications
watwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwat
Message 53163 - Posted: 27 Nov 2019 | 19:43:32 UTC

I just made an interesting observation comparing my computers with GTX1650 and GTX1660ti with ServicEnginIC's computers:
http://www.gpugrid.net/show_host_detail.php?hostid=147723
http://www.gpugrid.net/show_host_detail.php?hostid=482132
mine:
http://www.gpugrid.net/show_host_detail.php?hostid=512242
http://www.gpugrid.net/show_host_detail.php?hostid=512293
The computers of ServicEnginIC are approx. 10% slower than mine. His CPUs are Intel(R) Core(TM)2 Quad CPU Q9550 @ 2.83GHz and Intel(R) Core(TM)2 Quad CPU Q9550 @ 2.83GHz, mine are two AMD Ryzen 5 2600 Six-Core Processors.
Might it be that the Wrapper is slower on slower CPUs and therefore slows down the GPUs? Is this the experience from other users as well?

biodoc
Send message
Joined: 26 Aug 08
Posts: 183
Credit: 6,493,864,375
RAC: 2,796,812
Level
Tyr
Scientific publications
watwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwat
Message 53167 - Posted: 27 Nov 2019 | 20:50:07 UTC - in response to Message 53151.
Last modified: 27 Nov 2019 | 20:58:05 UTC

The heatsink on the Ryzen 3600 that reports Tdie and Tctl at 95C is only moderately warm to the touch. That was the case when I installed it.

So there are two possibilities: the heatsink is not making good contact to the chip, or else the reading is wrong. I will find out soon.

EDIT: The paste is spread out properly and sticking fine. It must be that the Ryzen 3600 reports the temps differently, or else is interpreted wrong. That is my latest build by the way, put together within the last month.


One option to cool your processor down a bit is to run it at base frequency using the cTDP and PPL (package power limit) settings in the bios. Both are set at auto in the "optimized defaults" bios setting. AMD and the motherboard manufacturers assume we are gamers or enthusiasts that want to automatically overclock the processors to the thermal limit.

Buried somewhere in the bios AMD CBS folder there should be an option to set the cTDP and PPL to manual mode. When set to manual you can key in values for watts. I have my 3700X rigs set to 65 and 65 watts for cTDP and PPL. My 3900X is set to 105 and 105 watts respectively. The numbers come from the TDP of the processor. So for a 3600 it would be 65 and for a 3600X the number is 95 watts. Save the bios settings and the processor will now run at base clock speed at full load and will draw quite a bit less power at the wall.

Here's some data I collected on my 3900X.

3900X (105 TDP; AGESA 1.0.0.3 ABBA) data running WCG at full load:

bios optimized defaults (PPL at 142?): 4.0 GHz pulls 267 watts at the wall.
TDP/PPL (package power limit) set at 105/105: 3.8 GHz pulls 218 watts at the wall
TDP/PPL set at 65/88: 3.7 GHz pulls 199 watts at the wall
TDP/PPL set at 65/65: 3.0 GHz pulls 167 watts at the wall

3.8 to 4 GHz requires 52 watts
3.7 to 4 GHz requires 68 watts
3.7 -3.8 GHz requires 20 watts
3.0 -3.7 GHz requires 32 watts

Note: The latest bios with 1.0.0.4 B does not allow me to underclock using TDP/PPL bios settings.

Profile Retvari Zoltan
Avatar
Send message
Joined: 20 Jan 09
Posts: 2343
Credit: 16,201,255,749
RAC: 6,169
Level
Trp
Scientific publications
watwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwat
Message 53168 - Posted: 27 Nov 2019 | 21:43:14 UTC - in response to Message 53163.
Last modified: 27 Nov 2019 | 21:43:48 UTC

Might it be that the Wrapper is slower on slower CPUs and therefore slows down the GPUs?
Is this the experience from other users as well?
I have similar experiences with my hosts.

Pop Piasa
Avatar
Send message
Joined: 8 Aug 19
Posts: 252
Credit: 458,054,251
RAC: 0
Level
Gln
Scientific publications
watwat
Message 53170 - Posted: 27 Nov 2019 | 22:09:13 UTC - in response to Message 53031.

Thank you Rod4x4, I later saw the first WU speed up and subsequent units have been running over 12%/Hr without issues. Guess I jumped on that too fast. The 1% increments are OK with me. Thanks again.

Jim1348
Send message
Joined: 28 Jul 12
Posts: 819
Credit: 1,591,285,971
RAC: 0
Level
His
Scientific publications
watwatwatwatwatwatwatwatwatwatwatwatwatwatwat
Message 53171 - Posted: 27 Nov 2019 | 22:24:42 UTC - in response to Message 53167.

The heatsink on the Ryzen 3600 that reports Tdie and Tctl at 95C is only moderately warm to the touch. That was the case when I installed it.

So there are two possibilities: the heatsink is not making good contact to the chip, or else the reading is wrong. I will find out soon.

EDIT: The paste is spread out properly and sticking fine. It must be that the Ryzen 3600 reports the temps differently, or else is interpreted wrong. That is my latest build by the way, put together within the last month.


One option to cool your processor down a bit is to run it at base frequency using the cTDP and PPL (package power limit) settings in the bios. Both are set at auto in the "optimized defaults" bios setting. AMD and the motherboard manufacturers assume we are gamers or enthusiasts that want to automatically overclock the processors to the thermal limit.

Thanks, but I believe you misread me. The CPU is fine. The measurement is wrong.

Profile ServicEnginIC
Avatar
Send message
Joined: 24 Sep 10
Posts: 566
Credit: 5,944,002,024
RAC: 10,731,454
Level
Tyr
Scientific publications
watwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwat
Message 53172 - Posted: 27 Nov 2019 | 22:30:57 UTC

The computers of ServicEnginIC are approx. 10% slower than mine. His CPUs are Intel(R) Core(TM)2 Quad CPU Q9550 @ 2.83GHz and Intel(R) Core(TM)2 Quad CPU Q9550 @ 2.83GHz, mine are two AMD Ryzen 5 2600 Six-Core Processors.
Might it be that the Wrapper is slower on slower CPUs and therefore slows down the GPUs?

I have similar experiences with my hosts.

+1

And some other cons for my veteran rigs:
- DDR3 DRAM @ 1,333 MHz
- Both motherboards are PCIe 2.0, probably bottlenecking the PCIe 3.0 interface of the newest cards

A 10% performance loss seems consistent with all of this.

Keith Myers
Send message
Joined: 13 Dec 17
Posts: 1284
Credit: 4,927,231,959
RAC: 6,464,152
Level
Arg
Scientific publications
watwatwatwatwat
Message 53175 - Posted: 28 Nov 2019 | 0:49:23 UTC - in response to Message 53171.

Thanks, but I believe you misread me. The CPU is fine. The measurement is wrong.

No, I believe the measurement is incorrect, but the real value is still going to be rather high. The Ryzen 3600 ships with the Wraith Stealth cooler, which is just the usual Intel-style solution of a copper plug embedded in an aluminum casting. It just doesn't have the ability to quickly move heat away from the IHS.

You would see much better temps if you switched to the Wraith MAX or Wraith Prism cooler, which have real heat pipes and normal-sized fans.

The temps are correct for the Ryzen and Ryzen+ CPUs, but the k10temp driver that is stock in Ubuntu didn't get the change needed to accommodate the Ryzen 3000 (Zen 2) CPUs with the correct zero temp offset. That fix only ships in the 5.3.4 or 5.4 kernels.

https://www.phoronix.com/scan.php?page=news_item&px=AMD-Zen2-k10temp-Patches

There are other solutions you could use in the meantime, like the ASUS WMI sensors driver if you have a compatible motherboard, or the zenpower driver, which can report the proper temp as well as the CPU power.

https://github.com/ocerman/zenpower
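
As an illustration only (a minimal sketch, assuming a Linux host), a quick check of whether the running kernel is recent enough to carry that corrected k10temp offset:

import platform
import re

def kernel_version(release=None):
    """Parse the numeric part of a kernel release string, e.g. '5.3.0-40-generic' -> (5, 3, 0)."""
    release = release or platform.release()
    match = re.match(r"(\d+)\.(\d+)\.(\d+)", release)
    return tuple(int(part) for part in match.groups()) if match else (0, 0, 0)

# 5.3.4 / 5.4 are the kernel versions mentioned above as shipping the fix.
if kernel_version() >= (5, 3, 4):
    print("k10temp should report Zen 2 temperatures without the bogus offset")
else:
    print("older kernel: consider the zenpower driver or a newer kernel")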

Greg _BE
Send message
Joined: 30 Jun 14
Posts: 126
Credit: 107,156,939
RAC: 166,633
Level
Cys
Scientific publications
watwatwatwatwatwat
Message 53176 - Posted: 28 Nov 2019 | 0:56:03 UTC - in response to Message 53154.
Last modified: 28 Nov 2019 | 1:05:06 UTC

Damn! Wishful thinking!

How about 4.075? Too many numbers on my screen.
It's because it shows 4075, and I automatically dropped in the decimal point two places off without realizing my mistake!

As far as temperature goes, I am only reporting the CPU temp at the sensor point.
I have sent a web form to Arctic asking what the temperature would be at the radiator after passing the CPU heatsink. The exhaust air does not feel anywhere near 80 C; I would put it at around 40 C or less.

I will see what they say and let you know.

Greg _BE
Send message
Joined: 30 Jun 14
Posts: 126
Credit: 107,156,939
RAC: 166,633
Level
Cys
Scientific publications
watwatwatwatwatwat
Message 53177 - Posted: 28 Nov 2019 | 1:06:19 UTC
Last modified: 28 Nov 2019 | 1:07:59 UTC

Toni - I keep getting this on random tasks:
unknown error) - exit code 195 (0xc3)</message>
<stderr_txt>
13:11:40 (25792): wrapper (7.9.26016): starting
13:11:40 (25792): wrapper: running acemd3.exe (--boinc input --device 1)
# Engine failed: Particle coordinate is nan
13:37:25 (25792): acemd3.exe exited; CPU time 1524.765625
13:37:25 (25792): app exit status: 0x1
13:37:25 (25792): called boinc_finish(195)

It runs 1524 seconds and bombs.
What's up with that?

It also appears that BOINC or the task is ignoring the appconfig command to use only my 1050. I see another task that is starting on the 1050 and then jumping to the 650 since the 1050 is tied up with another project.

Jim1348
Send message
Joined: 28 Jul 12
Posts: 819
Credit: 1,591,285,971
RAC: 0
Level
His
Scientific publications
watwatwatwatwatwatwatwatwatwatwatwatwatwatwat
Message 53178 - Posted: 28 Nov 2019 | 3:06:52 UTC - in response to Message 53175.

The temps are correct for the Ryzen and Ryzen+ cpus, but the k10temp driver which is stock in Ubuntu didn't get the change needed to accommodate the Ryzen 2 cpus with the correct 0 temp offset. That only is shipping in the 5.3.4 or 5.4 kernels.

Then it is probably reading 20C too high, and the CPU is really at 75C.
Yes, I can improve on that. Thanks.

rod4x4
Send message
Joined: 4 Aug 14
Posts: 266
Credit: 2,219,935,054
RAC: 0
Level
Phe
Scientific publications
watwatwatwatwatwatwatwatwatwat
Message 53179 - Posted: 28 Nov 2019 | 3:47:00 UTC - in response to Message 53177.

Toni - I keep getting this on random tasks:
unknown error) - exit code 195 (0xc3)</message>
<stderr_txt>
13:11:40 (25792): wrapper (7.9.26016): starting
13:11:40 (25792): wrapper: running acemd3.exe (--boinc input --device 1)
# Engine failed: Particle coordinate is nan
13:37:25 (25792): acemd3.exe exited; CPU time 1524.765625
13:37:25 (25792): app exit status: 0x1
13:37:25 (25792): called boinc_finish(195)

It runs 1524 seconds and bombs.
What's up with that?

It also appears that BOINC or the task is ignoring the appconfig command to use only my 1050. I see another task that is starting on the 1050 and then jumping to the 650 since the 1050 is tied up with another project.


# Engine failed: Particle coordinate is nan

Two issues can cause this error:
1. An error in the task itself. This would mean all hosts fail the task. See this link for details: https://github.com/openmm/openmm/issues/2308
2. If other hosts do not fail the task, the error could be due to the GPU clock rate. I have tested this on one of my hosts and am able to produce this error when I clock the GPU too high.

It also appears that BOINC or the task is ignoring the appconfig command to use only my 1050.

One setting to try: in BOINC Manager, Computing preferences, set "Switch between tasks every X minutes" to between 800 and 9999. This should allow the task to finish on the same GPU it started on.
Can you post your app_config.xml file contents?

Keith Myers
Send message
Joined: 13 Dec 17
Posts: 1284
Credit: 4,927,231,959
RAC: 6,464,152
Level
Arg
Scientific publications
watwatwatwatwat
Message 53180 - Posted: 28 Nov 2019 | 4:34:21 UTC

I've had a couple of the NaN errors. One where everyone errors out the task and another recently where it errored out after running through to completion. I had already removed all overclocking on the card but it still must have been too hot for the stock clockrate. It is my hottest card being sandwiched in the middle of the gpu stack with very little airflow. I am going to have to start putting in negative clock offset on it to get the temps down I think to avoid any further NaN errors on that card.

rod4x4
Send message
Joined: 4 Aug 14
Posts: 266
Credit: 2,219,935,054
RAC: 0
Level
Phe
Scientific publications
watwatwatwatwatwatwatwatwatwat
Message 53181 - Posted: 28 Nov 2019 | 5:52:47 UTC - in response to Message 53180.
Last modified: 28 Nov 2019 | 5:55:02 UTC

I've had a couple of the NaN errors. One where everyone errors out the task and another recently where it errored out after running through to completion. I had already removed all overclocking on the card but it still must have been too hot for the stock clockrate. It is my hottest card being sandwiched in the middle of the gpu stack with very little airflow. I am going to have to start putting in negative clock offset on it to get the temps down I think to avoid any further NaN errors on that card.

Would be interested to hear if the Under Clocking / Heat reduction fixes the issue.
I am fairly confident this is the issue, but need validation / more data from fellow volunteers to be sure.

Erich56
Send message
Joined: 1 Jan 15
Posts: 1090
Credit: 6,603,906,926
RAC: 18,783,925
Level
Tyr
Scientific publications
watwatwatwatwatwatwatwatwat
Message 53183 - Posted: 28 Nov 2019 | 6:32:03 UTC - in response to Message 53163.

http://www.gpugrid.net/show_host_detail.php?hostid=147723
http://www.gpugrid.net/show_host_detail.php?hostid=482132
...
Might it be that the Wrapper is slower on slower CPUs and therefore slows down the GPUs? Is this the experience from other users as well?


That's really interesting: the comparison of the above two hosts shows that the host with the GTX1660ti yields lower GFLOP figures (single as well as double precision) than the host with the GTX1650.
In both hosts, the CPU is the same: Intel(R) Core(TM)2 Quad CPU Q9550 @ 2.83GHz.

And now the even more surprising fact: by coincidence, exactly the same CPU is running in one of my hosts (http://www.gpugrid.net/show_host_detail.php?hostid=205584) with a GTX750ti - and here the GFLOP figures are even markedly higher than in the above-cited hosts with more modern GPUs.

So, is the conclusion now: the weaker the GPU, the higher the number of GFLOPs generated by the system?

Profile Retvari Zoltan
Avatar
Send message
Joined: 20 Jan 09
Posts: 2343
Credit: 16,201,255,749
RAC: 6,169
Level
Trp
Scientific publications
watwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwat
Message 53184 - Posted: 28 Nov 2019 | 9:28:26 UTC - in response to Message 53183.
Last modified: 28 Nov 2019 | 9:34:47 UTC

http://www.gpugrid.net/show_host_detail.php?hostid=147723
http://www.gpugrid.net/show_host_detail.php?hostid=482132
...
Might it be that the Wrapper is slower on slower CPUs and therefore slows down the GPUs? Is this the experience from other users as well?
That's really interesting: the comparison of the above two hosts shows that the host with the GTX1660ti yields lower GFLOP figures (single as well as double precision) than the host with the GTX1650.
In both hosts, the CPU is the same: Intel(R) Core(TM)2 Quad CPU Q9550 @ 2.83GHz.

And now the even more surprising fact: by coincidence, exactly the same CPU is running in one of my hosts (http://www.gpugrid.net/show_host_detail.php?hostid=205584) with a GTX750ti - and here the GFLOP figures are even markedly higher than in the above-cited hosts with more modern GPUs.

So, is the conclusion now: the weaker the GPU, the higher the number of GFLOPs generated by the system?
The "Integer" (I hope it's called this way in English) speed measured is way much higher under Linux than under Windows.
(the 1st and 2nd host use Linux, the 3rd use Windows)
See the stats of my dual boot host:
Linux 139876.18 - Windows 12615.42
There's more than one order of magnitude difference between the two OS on the same hardware, one of them must be wrong.

Greg _BE
Send message
Joined: 30 Jun 14
Posts: 126
Credit: 107,156,939
RAC: 166,633
Level
Cys
Scientific publications
watwatwatwatwatwat
Message 53185 - Posted: 28 Nov 2019 | 13:27:26 UTC - in response to Message 53176.

Damn! Wishful thinking!

How about 4.75. To many numbers on my screen
It's because it shows 4075 and then I automatically drop in the . at 2 places not realizing my mistake!

As far as temperature goes, I am only reporting the CPU temp at the sensor point.
I have sent a webform to Arctic asking them what the temp would be at the radiator after passing by the CPU heatsink. The air temp of the exhaust air does not feel anywhere near 80. I would put it down around 40C or less.

I will see what they say and let you know.

------------------------

Hi Greg

I talked to my colleague who is in the Liquid Freezer II Dev. Team and he said that these temps are normal with this kind of load.
Installation sounds good to me.


With kind regards


Your ARCTIC Team,
Stephan
Arctic/Service Manager

Greg _BE
Send message
Joined: 30 Jun 14
Posts: 126
Credit: 107,156,939
RAC: 166,633
Level
Cys
Scientific publications
watwatwatwatwatwat
Message 53186 - Posted: 28 Nov 2019 | 13:30:27 UTC - in response to Message 53179.

Toni - I keep getting this on random tasks:
unknown error) - exit code 195 (0xc3)</message>
<stderr_txt>
13:11:40 (25792): wrapper (7.9.26016): starting
13:11:40 (25792): wrapper: running acemd3.exe (--boinc input --device 1)
# Engine failed: Particle coordinate is nan
13:37:25 (25792): acemd3.exe exited; CPU time 1524.765625
13:37:25 (25792): app exit status: 0x1
13:37:25 (25792): called boinc_finish(195)

It runs 1524 seconds and bombs.
What's up with that?

It also appears that BOINC or the task is ignoring the appconfig command to use only my 1050. I see another task that is starting on the 1050 and then jumping to the 650 since the 1050 is tied up with another project.


# Engine failed: Particle coordinate is nan

Two issues can cause this error:
1. An error in the task itself. This would mean all hosts fail the task. See this link for details: https://github.com/openmm/openmm/issues/2308
2. If other hosts do not fail the task, the error could be due to the GPU clock rate. I have tested this on one of my hosts and am able to produce this error when I clock the GPU too high.

It also appears that BOINC or the task is ignoring the appconfig command to use only my 1050.

One setting to try: in BOINC Manager, Computing preferences, set "Switch between tasks every X minutes" to between 800 and 9999. This should allow the task to finish on the same GPU it started on.
Can you post your app_config.xml file contents?


--------------------

<?xml version="1.0"?>
<app_config>
  <exclude_gpu>
    <url>www.gpugrid.net</url>
    <device_num>1</device_num>
    <type>NVIDIA</type>
  </exclude_gpu>
</app_config>

I was having some issues with LHC ATLAS and was in the process of pausing the tasks and then disconnecting the client. In the process I discovered that another instance popped up right after I closed the one I was looking at, and then yet another appeared with a message saying that two were running. I shut that down and it shut down the last instance. This is a first for me.

I have restarted my computer and now will wait and see what's going on.

computezrmle
Send message
Joined: 10 Jun 13
Posts: 9
Credit: 295,692,471
RAC: 0
Level
Asn
Scientific publications
watwatwatwatwat
Message 53187 - Posted: 28 Nov 2019 | 16:52:29 UTC - in response to Message 53186.

What you posted is a mix of app_config.xml and cc_config.xml.
Be so kind as to strictly follow the hints and examples on this page:
https://boinc.berkeley.edu/wiki/Client_configuration

Greg _BE
Send message
Joined: 30 Jun 14
Posts: 126
Credit: 107,156,939
RAC: 166,633
Level
Cys
Scientific publications
watwatwatwatwatwat
Message 53189 - Posted: 28 Nov 2019 | 19:24:12 UTC - in response to Message 53187.
Last modified: 28 Nov 2019 | 19:26:52 UTC

What you posted is a mix of app_config.xml and cc_config.xml.
Be so kind as to strictly follow the hints and examples on this page:
https://boinc.berkeley.edu/wiki/Client_configuration



You give me a page on CC config. I jumped down to what appears to be stuff related to app_config and copied this

<exclude_gpu>
<url>project_URL</url>
[<device_num>N</device_num>]
[<type>NVIDIA|ATI|intel_gpu</type>]
[<app>appname</app>]
</exclude_gpu>

project id is the gpugrid.net
device = 1
type is nvidia
removed app name since app name changes so much

*****GPUGRID: Notice from BOINC
Missing <app_config> in app_config.xml
11/28/2019 8:24:51 PM***

This is why I had it in the text.

Greg _BE
Send message
Joined: 30 Jun 14
Posts: 126
Credit: 107,156,939
RAC: 166,633
Level
Cys
Scientific publications
watwatwatwatwatwat
Message 53190 - Posted: 28 Nov 2019 | 19:57:02 UTC

What the heck now???!!!

A slew of Exit child errors! What is this? Is this speed problems with OC?
Also getting restart on different device errors!!!
Now this...is that because something is not right in the app_config?

Zalster
Avatar
Send message
Joined: 26 Feb 14
Posts: 211
Credit: 4,496,324,562
RAC: 0
Level
Arg
Scientific publications
watwatwatwatwatwatwatwat
Message 53191 - Posted: 28 Nov 2019 | 20:05:23 UTC - in response to Message 53189.



You give me a page on CC config. I jumped down to what appears to be stuff related to app_config and copied this

<exclude_gpu>
<url>project_URL</url>
[<device_num>N</device_num>]
[<type>NVIDIA|ATI|intel_gpu</type>]
[<app>appname</app>]
</exclude_gpu>

project id is the gpugrid.net
device = 1
type is nvidia
removed app name since app name changes so much

*****GPUGRID: Notice from BOINC
Missing <app_config> in app_config.xml
11/28/2019 8:24:51 PM***

This is why I had it in the text.



<cc_config>
  <options>
    <exclude_gpu>
      <url>project_URL</url>
      [<device_num>N</device_num>]
      [<type>NVIDIA|ATI|intel_gpu</type>]
      [<app>appname</app>]
    </exclude_gpu>
  </options>
</cc_config>

This needs to go into the BOINC data folder, not the GPUGrid project folder.

____________

Keith Myers
Send message
Joined: 13 Dec 17
Posts: 1284
Credit: 4,927,231,959
RAC: 6,464,152
Level
Arg
Scientific publications
watwatwatwatwat
Message 53192 - Posted: 28 Nov 2019 | 20:56:21 UTC - in response to Message 53190.

If you are going to use an exclude, then you need to exclude every device other than the one you want to use. That is how to get rid of the "restart on different device" errors. Or just set the switch between tasks to 360 minutes or greater and don't exit BOINC while the task is running.

The device number you use in the exclude statement is defined by how BOINC enumerates the cards in the Event Log at startup.

The exclude_gpu statement goes into cc_config.xml in the main BOINC data directory, not a project directory.

Toni
Volunteer moderator
Project administrator
Project developer
Project tester
Project scientist
Send message
Joined: 9 Dec 08
Posts: 1006
Credit: 5,068,599
RAC: 0
Level
Ser
Scientific publications
watwatwatwat
Message 53193 - Posted: 28 Nov 2019 | 21:18:18 UTC - in response to Message 53190.

What the heck now???!!!

A slew of Exit child errors! What is this? Is this speed problems with OC?
Also getting restart on different device errors!!!
Now this...is that because something is not right in the app_config?


I see two types of errors:

ERROR: src\mdsim\context.cpp line 322: Cannot use a restart file on a different device!


as the name says, exclusion not working. And

# Engine failed: Particle coordinate is nan


this usually indicates mathematical errors in the operations performed, memory corruption, or similar (or a faulty wu, unlikely in this case). Maybe a reboot will solve it.

computezrmle
Send message
Joined: 10 Jun 13
Posts: 9
Credit: 295,692,471
RAC: 0
Level
Asn
Scientific publications
watwatwatwatwat
Message 53194 - Posted: 28 Nov 2019 | 21:28:01 UTC - in response to Message 53189.

You give me a page on CC config.

I posted the official documentation for more than just cc_config.xml:
cc_config.xml
nvc_config.xml
app_config.xml

It's worth carefully reading this page a couple of times, as it provides all you need to know.

Long ago the page had a direct link to the app_config.xml section.
Unfortunately that link is not available any more, but you can use your browser's find function.

Greg _BE
Send message
Joined: 30 Jun 14
Posts: 126
Credit: 107,156,939
RAC: 166,633
Level
Cys
Scientific publications
watwatwatwatwatwat
Message 53195 - Posted: 28 Nov 2019 | 22:58:46 UTC - in response to Message 53192.
Last modified: 28 Nov 2019 | 23:08:50 UTC

If you are going to use an exclude, then you need to exclude every device other than the one you want to use. That is how to get rid of the "restart on different device" errors. Or just set the switch between tasks to 360 minutes or greater and don't exit BOINC while the task is running.

The device number you use in the exclude statement is defined by how BOINC enumerates the cards in the Event Log at startup.

The exclude_gpu statement goes into cc_config.xml in the main BOINC data directory, not a project directory.



Ok, on point 1, it was set for 360 already because that's a good time for LHC ATLAS to run complete. I moved it up to 480 now to try and deal with this stuff in GPUGRID.

Point 2 - Going to try a cc_config with three exclude_gpu blocks, for here and for two other projects. From what I read this should be possible.

Greg _BE
Send message
Joined: 30 Jun 14
Posts: 126
Credit: 107,156,939
RAC: 166,633
Level
Cys
Scientific publications
watwatwatwatwatwat
Message 53196 - Posted: 28 Nov 2019 | 23:00:43 UTC - in response to Message 53193.
Last modified: 28 Nov 2019 | 23:01:58 UTC

What the heck now???!!!

A slew of Exit child errors! What is this? Is this speed problems with OC?
Also getting restart on different device errors!!!
Now this...is that because something is not right in the app_config?


I see two types of errors:

ERROR: src\mdsim\context.cpp line 322: Cannot use a restart file on a different device!


as the name says, exclusion not working. And

# Engine failed: Particle coordinate is nan


this usually indicates mathematical errors in the operations performed, memory corruption, or similar (or a faulty wu, unlikely in this case). Maybe a reboot will solve it.



One of these days I will get this problem solved. Driving me nuts!

rod4x4
Send message
Joined: 4 Aug 14
Posts: 266
Credit: 2,219,935,054
RAC: 0
Level
Phe
Scientific publications
watwatwatwatwatwatwatwatwatwat
Message 53197 - Posted: 29 Nov 2019 | 0:15:26 UTC - in response to Message 53195.
Last modified: 29 Nov 2019 | 0:18:28 UTC

Ok, on point 1, it was set for 360 already because that's a good time for LHC ATLAS to run complete. I moved it up to 480 now to try and deal with this stuff in GPUGRID.

As your GPU is taking 728 minutes to complete the current batch of tasks, this setting needs to be MORE than 728 to have a positive effect. Times that suit other projects don't suit GPUGrid requirements, as tasks here can be longer.

Greg _BE
Send message
Joined: 30 Jun 14
Posts: 126
Credit: 107,156,939
RAC: 166,633
Level
Cys
Scientific publications
watwatwatwatwatwat
Message 53205 - Posted: 29 Nov 2019 | 14:31:21 UTC - in response to Message 53197.
Last modified: 29 Nov 2019 | 14:34:45 UTC

Ok, on point 1, it was set for 360 already because that's a good time for LHC ATLAS to run complete. I moved it up to 480 now to try and deal with this stuff in GPUGRID.

As your GPU is taking 728 minutes to complete the current batch of tasks, this setting needs to be MORE than 728 to have a positive effect. Times that suit other projects don't suit GPUGrid requirements, as tasks here can be longer.


Oh? That's interesting. Changed to 750 minutes.

Greg _BE
Send message
Joined: 30 Jun 14
Posts: 126
Credit: 107,156,939
RAC: 166,633
Level
Cys
Scientific publications
watwatwatwatwatwat
Message 53224 - Posted: 30 Nov 2019 | 17:11:11 UTC

Just suffered a DPC_WATCHDOG_VIOLATION on my system. Will be offline a few days.

Profile Retvari Zoltan
Avatar
Send message
Joined: 20 Jan 09
Posts: 2343
Credit: 16,201,255,749
RAC: 6,169
Level
Trp
Scientific publications
watwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwat
Message 53225 - Posted: 30 Nov 2019 | 20:04:53 UTC - in response to Message 53193.
Last modified: 30 Nov 2019 | 20:14:23 UTC

# Engine failed: Particle coordinate is nan

this usually indicates mathematical errors in the operations performed, memory corruption, or similar (or a faulty wu, unlikely in this case). Maybe a reboot will solve it.
These workunits have failed on all 8 hosts with this error condition:
initial_1923-ELISA_GSN4V1-12-100-RND5980
initial_1086-ELISA_GSN0V1-2-100-RND9613
Perhaps these workunits inherited a NaN (= Not a Number) from their previous stage.
I don't think this could be solved by a reboot.
I'm eagerly waiting to see how many batches will survive through all 100 stages.

Bedrich Hajek
Send message
Joined: 28 Mar 09
Posts: 467
Credit: 8,194,946,966
RAC: 10,431,404
Level
Tyr
Scientific publications
watwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwat
Message 53290 - Posted: 6 Dec 2019 | 23:38:21 UTC

I ran the following unit:

1_7-GERARD_pocket_discovery_d89241c4_7afa_4928_b469_bad3dc186521-0-2-RND1542_1, which ran well and would have finished as valid, had the following error not occurred:

</stderr_txt>
<message>
upload failure: <file_xfer_error>
<file_name>1_7-GERARD_pocket_discovery_d89241c4_7afa_4928_b469_bad3dc186521-0-2-RND1542_1_9</file_name>
<error_code>-131 (file size too big)</error_code>
</file_xfer_error>
</message>
]]>

Scroll to the bottom on this page:

http://www.gpugrid.net/result.php?resultid=21553962


It looks like you need to increase the size limits of the output files for them to upload. It should be done for all the subsequent WUs.





Keith Myers
Send message
Joined: 13 Dec 17
Posts: 1284
Credit: 4,927,231,959
RAC: 6,464,152
Level
Arg
Scientific publications
watwatwatwatwat
Message 53291 - Posted: 7 Dec 2019 | 6:12:34 UTC - in response to Message 53290.

I must have squeaked in under the wire by just this much with this GERARD_pocket_discovery task.
https://www.gpugrid.net/result.php?resultid=21551650

Bedrich Hajek
Send message
Joined: 28 Mar 09
Posts: 467
Credit: 8,194,946,966
RAC: 10,431,404
Level
Tyr
Scientific publications
watwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwat
Message 53293 - Posted: 7 Dec 2019 | 12:45:41 UTC - in response to Message 53291.

I must have squeaked in under the wire by just this much with this GERARD_pocket_discovery task.
https://www.gpugrid.net/result.php?resultid=21551650



Apparently, these units vary in length. Here is another one with the same problem:

http://www.gpugrid.net/workunit.php?wuid=16894092



Richard Haselgrove
Send message
Joined: 11 Jul 09
Posts: 1576
Credit: 5,606,061,851
RAC: 8,672,972
Level
Tyr
Scientific publications
watwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwat
Message 53294 - Posted: 7 Dec 2019 | 17:11:49 UTC
Last modified: 7 Dec 2019 | 17:20:36 UTC

I've got one running from 1_5-GERARD_pocket_discovery_d89241c4_7afa_4928_b469_bad3dc186521-0-2-RND2573 - I'll try to catch some figures to see how bad the problem is.

Edit - the _9 upload file (the one named in previous error messages) is set to allow

<max_nbytes>256000000.000000</max_nbytes>

or 256,000,000 bytes. You'd have thought that was enough.

Toni
Volunteer moderator
Project administrator
Project developer
Project tester
Project scientist
Send message
Joined: 9 Dec 08
Posts: 1006
Credit: 5,068,599
RAC: 0
Level
Ser
Scientific publications
watwatwatwat
Message 53295 - Posted: 7 Dec 2019 | 17:47:56 UTC - in response to Message 53294.

The 256 MB is the new limit - I raised it today. There are only a handful of WUs like that.

Richard Haselgrove
Send message
Joined: 11 Jul 09
Posts: 1576
Credit: 5,606,061,851
RAC: 8,672,972
Level
Tyr
Scientific publications
watwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwat
Message 53301 - Posted: 7 Dec 2019 | 23:16:16 UTC

I put precautions in place, but you beat me to it - final file size was 155,265,144 bytes. Plenty of room. Uploading now.

Erich56
Send message
Joined: 1 Jan 15
Posts: 1090
Credit: 6,603,906,926
RAC: 18,783,925
Level
Tyr
Scientific publications
watwatwatwatwatwatwatwatwat
Message 53303 - Posted: 8 Dec 2019 | 5:54:33 UTC

What I also noticed with the GERARD tasks (currently running 0_2-GERARD_pocket_discovery ...):

the GPU utilization oscillates between 76% and 95% (in contrast to the ELISA tasks, where it was consistently close to or even at 100%).

Profile God is Love, JC proves it...
Avatar
Send message
Joined: 24 Nov 11
Posts: 30
Credit: 199,308,758
RAC: 459
Level
Ile
Scientific publications
watwatwatwatwatwatwatwatwatwatwatwat
Message 53316 - Posted: 9 Dec 2019 | 20:12:48 UTC - in response to Message 53295.
Last modified: 9 Dec 2019 | 20:16:22 UTC

I am getting upload errors too, on most but not all (4 of 6) WUs...
but, only on my 950M, not on my 1660 Ti, ... or EVEN my GeForce 640 !!

need to increase the size limits of the output files

So, how is this done?
Via Options, Computing preferences, under Network, the default values are not shown (that I can see). I WOULD have assumed that BOINC Manager would have these limited only by the system constraints unless tighter limits are desired.
AND, only download rate, upload rate, and usage limits can be set.
Again, how should output file size limits be increased?

It would have been VERY polite of GpuGrid to post some notice about this with the new WU releases.

I am very miffed, and justifiably so, at having wasted so much of my GPU time and energy, and effort on my part to hunt down the problem. Indeed, there was NO feedback from GpuGrid on this at all; I only noticed that my RAC kept falling even though I was running WUs pretty much nonstop.

I realize that getting research done is the primary goal, but if GpuGrid is asking people to donate their PC time and GPU time, then please be more polite to your donors.

LLP, PhD

Keith Myers
Send message
Joined: 13 Dec 17
Posts: 1284
Credit: 4,927,231,959
RAC: 6,464,152
Level
Arg
Scientific publications
watwatwatwatwat
Message 53317 - Posted: 9 Dec 2019 | 21:38:27 UTC

You can't control the result output file. That is set by the science application under the control of the project administrators. The quote you referenced was from Toni acknowledging that he needed to increase the size of the upload server input buffer to handle the larger result files that a few tasks were producing. That is not the norm for the usual work we have processed so far. It should be rare for result files to exceed 250 MB.

Richard Haselgrove
Send message
Joined: 11 Jul 09
Posts: 1576
Credit: 5,606,061,851
RAC: 8,672,972
Level
Tyr
Scientific publications
watwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwat
Message 53320 - Posted: 9 Dec 2019 | 23:39:40 UTC

Neither of those two. The maximum file size is specified in the job specification associated with the task in question. You can (as I did) increase the maximum size by careful editing of the file 'client_state.xml', but it needs a steady hand, some knowledge, and is not for the faint of heart. It shouldn't be needed now, after Toni's correction at source.

Profile God is Love, JC proves it...
Avatar
Send message
Joined: 24 Nov 11
Posts: 30
Credit: 199,308,758
RAC: 459
Level
Ile
Scientific publications
watwatwatwatwatwatwatwatwatwatwatwat
Message 53321 - Posted: 9 Dec 2019 | 23:59:42 UTC - in response to Message 53317.
Last modified: 10 Dec 2019 | 0:03:52 UTC

Hm,
Toni's message (53295) was posted on the 7th. Toni used the past tense on the 7th ("I raised");
yet, https://gpugrid.net/result.php?resultid=21553648
ended on the 8th and still had the same frustrating error.
After running for hours, the results were nonetheless lost:
upload failure: <file_xfer_error>
<file_name>initial_1497-ELISA_GSN4V1-20-100-RND8978_0_0</file_name>
<error_code>-240 (stat() failed)</error_code>
</file_xfer_error>

Also, I must be just extremely unlucky. Toni says this came up on 'only a handful' of WUs, yet this happened to at least five of the WUs my GPUs ran.

I am holding off on running any GpuGrid WUs for a while, until this problem is more fully corrected.

Just for full disclosure... Industrial Engineers hate waste.

LLP
MS and PhD in Industrial & Systems Engineering.
Registered Prof. Engr. (Industrial Engineering)

Profile God is Love, JC proves it...
Avatar
Send message
Joined: 24 Nov 11
Posts: 30
Credit: 199,308,758
RAC: 459
Level
Ile
Scientific publications
watwatwatwatwatwatwatwatwatwatwatwat
Message 53322 - Posted: 10 Dec 2019 | 0:15:47 UTC
Last modified: 10 Dec 2019 | 0:18:53 UTC

Besides the upload errors,
a couple, resultid=21544426 and resultid=21532174, had said:
"Detected memory leaks!"
So I ran extensive memory diagnostics, but no errors were reported by windoze (extensive as in some eight hours of diagnostics).
Boinc did not indicate if this was RAM or GPU 'memory leaks'

In fact, now I am wondering whether these 'memory leaks' were on my end at all, or on the GpuGrid servers...

LLP
____________
I think ∴ I THINK I am
My thinking neither is the source of my being
NOR proves it to you
God Is Love, Jesus proves it! ∴ we are

Richard Haselgrove
Send message
Joined: 11 Jul 09
Posts: 1576
Credit: 5,606,061,851
RAC: 8,672,972
Level
Tyr
Scientific publications
watwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwat
Message 53325 - Posted: 10 Dec 2019 | 8:26:48 UTC - in response to Message 53321.

Hm,
Toni's message (53295) was posted on the 7th. Toni used the past tense on the 7th ("I raised");
yet, https://gpugrid.net/result.php?resultid=21553648
ended on the 8th and still had the same frustrating error.
After running for hours, the results were nonetheless lost:
upload failure: <file_xfer_error>
<file_name>initial_1497-ELISA_GSN4V1-20-100-RND8978_0_0</file_name>
<error_code>-240 (stat() failed)</error_code>
</file_xfer_error>

That's a different error. Toni's post was about a file size error.

Toni
Volunteer moderator
Project administrator
Project developer
Project tester
Project scientist
Send message
Joined: 9 Dec 08
Posts: 1006
Credit: 5,068,599
RAC: 0
Level
Ser
Scientific publications
watwatwatwat
Message 53326 - Posted: 10 Dec 2019 | 9:24:58 UTC - in response to Message 53322.

Besides the upload errors,
a couple, resultid=21544426 and resultid=21532174, had said:
"Detected memory leaks!"
So I ran extensive memory diagnostics, but no errors were reported by windoze (extensive as in some eight hours of diagnostics).
Boinc did not indicate if this was RAM or GPU 'memory leaks'

In fact, now I am wondering whether these 'memory leaks' were on my end at all, or on the GpuGrid servers...

LLP



Such messages are always present in Windows. They are not related to whether the task terminated successfully or not. If an error message is present, it's elsewhere in the output.

Toni
Volunteer moderator
Project administrator
Project developer
Project tester
Project scientist
Send message
Joined: 9 Dec 08
Posts: 1006
Credit: 5,068,599
RAC: 0
Level
Ser
Scientific publications
watwatwatwat
Message 53327 - Posted: 10 Dec 2019 | 9:26:15 UTC - in response to Message 53326.
Last modified: 10 Dec 2019 | 9:27:26 UTC

Also, slow and mobile cards should not be used for crunching for the reasons you mention.

Gustav
Send message
Joined: 24 Jul 19
Posts: 1
Credit: 112,819,584
RAC: 0
Level
Cys
Scientific publications
wat
Message 53328 - Posted: 10 Dec 2019 | 9:46:06 UTC

Hi,

I have not received any new WU in like 30-40 days.

Why? Are there no available WUs for anyone, or could it be bad settings on my side?

My PCs are starving...


Br Thomas

Carlos Augusto Engel
Send message
Joined: 5 Jun 09
Posts: 38
Credit: 2,880,758,878
RAC: 0
Level
Phe
Scientific publications
watwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwat
Message 53329 - Posted: 10 Dec 2019 | 13:50:59 UTC - in response to Message 53328.

Hello,
I think you have to install the latest Nvidia drivers.
____________

Aurum
Avatar
Send message
Joined: 12 Jul 17
Posts: 399
Credit: 13,024,100,382
RAC: 766,238
Level
Trp
Scientific publications
watwatwat
Message 53330 - Posted: 10 Dec 2019 | 17:32:54 UTC - in response to Message 53328.

I have not received any new WU in like 30-40 days. Why?
Did you check ACEMD3 in Prefs?

Bedrich Hajek
Send message
Joined: 28 Mar 09
Posts: 467
Credit: 8,194,946,966
RAC: 10,431,404
Level
Tyr
Scientific publications
watwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwat
Message 53338 - Posted: 12 Dec 2019 | 2:20:02 UTC

I have another observation to add. One of my computers had an abrupt shutdown (in other words, the power was shut off, accidentally of course) while crunching this unit: initial_1609-ELISA_GSN4V1-19-100-RND7717_0. Upon restart, the unit finished as valid, which would not have happened with the previous ACEMD app. See link:

http://www.gpugrid.net/result.php?resultid=21561458

Of course the run time is wrong; it should be about 2000 seconds more.


Erich56
Send message
Joined: 1 Jan 15
Posts: 1090
Credit: 6,603,906,926
RAC: 18,783,925
Level
Tyr
Scientific publications
watwatwatwatwatwatwatwatwat
Message 53339 - Posted: 12 Dec 2019 | 6:14:45 UTC - in response to Message 53338.

I have another observation to add. One of my computers had an abrupt shutdown (in other words, the power was shut off, accidentally of course)

Now that you are saying this: I had a similar situation with one of my hosts 2 days ago. The PC shut down and restarted.
I had/have no idea whether this was caused by crunching a GPUGRID task or whether there was any other reason behind it.

Greg _BE
Send message
Joined: 30 Jun 14
Posts: 126
Credit: 107,156,939
RAC: 166,633
Level
Cys
Scientific publications
watwatwatwatwatwat
Message 53345 - Posted: 15 Dec 2019 | 8:01:46 UTC

After solving the windows problem and fighting with the MOBO and Windows some more, my system is stable.

BUT.....I still cannot complete a task without an error.

What is error -44 (0xffffffd4)?
BOINC calls it an unknown error.

And all it does is repeat this message: GPU [GeForce GTX 1050 Ti] Platform [Windows] Rev [3212] VERSION [80]
# SWAN Device 0 :
# Name : GeForce GTX 1050 Ti
# ECC : Disabled
# Global mem : 4096MB
# Capability : 6.1
# PCI ID : 0000:07:00.0
# Device clock : 1480MHz
# Memory clock : 3504MHz
# Memory width : 128bit
# Driver version : r390_99 : 39101
# GPU 0 : 53C
# GPU 0 : 54C
# GPU 0 : 55C
# GPU 0 : 56C
# GPU 0 : 57C

Is there too much OC on the card for it to complete the task in a stable manner, or what is going on now?

All other projects are doing ok. Just here I keep getting errors.

rod4x4
Send message
Joined: 4 Aug 14
Posts: 266
Credit: 2,219,935,054
RAC: 0
Level
Phe
Scientific publications
watwatwatwatwatwatwatwatwatwat
Message 53348 - Posted: 15 Dec 2019 | 12:11:24 UTC - in response to Message 53345.

What is error -44 (0xffffffd4)?
BOINC calls it an unknown error.

This is a date issue on your computer. Is your date correct?
It can also be associated with Nvidia license issues, but we haven't seen that recently.

And all it does is repeat this message: GPU [GeForce GTX 1050 Ti] Platform [Windows] Rev [3212] VERSION [80]
# SWAN Device 0 :
# Name : GeForce GTX 1050 Ti
# ECC : Disabled
# Global mem : 4096MB
# Capability : 6.1
# PCI ID : 0000:07:00.0
# Device clock : 1480MHz
# Memory clock : 3504MHz
# Memory width : 128bit
# Driver version : r390_99 : 39101
# GPU 0 : 53C
# GPU 0 : 54C
# GPU 0 : 55C
# GPU 0 : 56C
# GPU 0 : 57C

This is STDerr output from ACEMD2 tasks, not the current ACEMD3 tasks.
I can't see any failed tasks on your account. Do you have a link to the host or work unit generating this error?

Profile ServicEnginIC
Avatar
Send message
Joined: 24 Sep 10
Posts: 566
Credit: 5,944,002,024
RAC: 10,731,454
Level
Tyr
Scientific publications
watwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwat
Message 53351 - Posted: 15 Dec 2019 | 14:15:04 UTC
Last modified: 15 Dec 2019 | 14:16:52 UTC

Also, Toni has given some general guidelines at his FAQ - Acemd3 application thread.

Greg _BE
Send message
Joined: 30 Jun 14
Posts: 126
Credit: 107,156,939
RAC: 166,633
Level
Cys
Scientific publications
watwatwatwatwatwat
Message 53356 - Posted: 15 Dec 2019 | 23:34:02 UTC - in response to Message 53348.
Last modified: 15 Dec 2019 | 23:34:54 UTC

What is error -44 (0xffffffd4)?
BOINC calls it an unknown error.

This is a date issue on your computer. Is your date correct?
It can also be associated with Nvidia license issues, but we haven't seen that recently.

And all it does is repeat this message: GPU [GeForce GTX 1050 Ti] Platform [Windows] Rev [3212] VERSION [80]
# SWAN Device 0 :
# Name : GeForce GTX 1050 Ti
# ECC : Disabled
# Global mem : 4096MB
# Capability : 6.1
# PCI ID : 0000:07:00.0
# Device clock : 1480MHz
# Memory clock : 3504MHz
# Memory width : 128bit
# Driver version : r390_99 : 39101
# GPU 0 : 53C
# GPU 0 : 54C
# GPU 0 : 55C
# GPU 0 : 56C
# GPU 0 : 57C

This is STDerr output from ACEMD2 tasks, not the current ACEMD3 tasks.
I can't see any failed tasks on your account. Do you have a link to the host or work unit generating this error?



Clock date is correct.
Link http://www.gpugrid.net/result.php?resultid=18119786
and
http://www.gpugrid.net/result.php?resultid=18126912

Greg _BE
Send message
Joined: 30 Jun 14
Posts: 126
Credit: 107,156,939
RAC: 166,633
Level
Cys
Scientific publications
watwatwatwatwatwat
Message 53357 - Posted: 15 Dec 2019 | 23:38:19 UTC - in response to Message 53351.

Also, Toni has given some general guidelines at his FAQ - Acemd3 application thread.



Hmm..have to see what those do when I get them.
Right now I am OC'd to the max on my 1050TI.
If I see this stuff show up on my system then I better turn it back to default.
Still running version 2 stuff.

rod4x4
Send message
Joined: 4 Aug 14
Posts: 266
Credit: 2,219,935,054
RAC: 0
Level
Phe
Scientific publications
watwatwatwatwatwatwatwatwatwat
Message 53358 - Posted: 15 Dec 2019 | 23:44:25 UTC - in response to Message 53356.
Last modified: 15 Dec 2019 | 23:46:10 UTC

Clock date is correct.
Link http://www.gpugrid.net/result.php?resultid=18119786
and
http://www.gpugrid.net/result.php?resultid=18126912

First link is from 17th July 2018
Second Link is from 19th July 2018

Yes, there were issues for all volunteers in July 2018.

Do you have any recent errors?

William Lathan
Send message
Joined: 13 Jul 18
Posts: 2
Credit: 15,041,475
RAC: 0
Level
Pro
Scientific publications
wat
Message 53361 - Posted: 16 Dec 2019 | 13:53:35 UTC - in response to Message 53330.

Where in prefs do you find these options?

Thanks,

Bill

P.S. Sorry, I'm new to GPUGrid, and I'm wondering, too, why I haven't received any work to do.

Aurum
Avatar
Send message
Joined: 12 Jul 17
Posts: 399
Credit: 13,024,100,382
RAC: 766,238
Level
Trp
Scientific publications
watwatwat
Message 53362 - Posted: 16 Dec 2019 | 14:04:57 UTC - in response to Message 53361.

Where in prefs do you find these options?
Thanks, Bill
P.S. Sorry, I'm new to GPUGrid, and I'm wondering, too, why I haven't received any work to do.
Click your username link at the top of the page.
Then click GPUGrid Preferences.
Then click Edit GPUGrid Preferences.
Then check the box ACEMD3.
Then click Update Preferences.
Then you'll get WUs when they're available. Right now there's not much work so I get only one or two WUs a day.
____________

William Lathan
Send message
Joined: 13 Jul 18
Posts: 2
Credit: 15,041,475
RAC: 0
Level
Pro
Scientific publications
wat
Message 53363 - Posted: 16 Dec 2019 | 14:18:47 UTC - in response to Message 53362.

Many thanks. I modified the settings and now we'll see. Thanks again, Bill

Greg _BE
Send message
Joined: 30 Jun 14
Posts: 126
Credit: 107,156,939
RAC: 166,633
Level
Cys
Scientific publications
watwatwatwatwatwat
Message 53364 - Posted: 16 Dec 2019 | 14:32:01 UTC - in response to Message 53358.

Clock date is correct.
Link http://www.gpugrid.net/result.php?resultid=18119786
and
http://www.gpugrid.net/result.php?resultid=18126912

First link is from 17th July 2018
Second Link is from 19th July 2018

Yes, there were issues for all volunteers in July 2018.

Do you have any recent errors?


No.. sorry for the confusion.
Just a validate error.
But no running errors yet.
The most current task is in the queue to start again and sitting at 38%.
I have an 8 hr cycle currently.
I thought I had seen a task show up in BOINC as an error. Must have been a different project. Oh well.

I could do without all the errors. My system was driving me crazy earlier.

So I am ok for now. Thanks for the pointer on the date.

Pop Piasa
Avatar
Send message
Joined: 8 Aug 19
Posts: 252
Credit: 458,054,251
RAC: 0
Level
Gln
Scientific publications
watwat
Message 53366 - Posted: 16 Dec 2019 | 19:40:57 UTC - in response to Message 53328.
Last modified: 16 Dec 2019 | 20:13:22 UTC

"I have not received any new WU in like 30-40 days."

GUSTAV, if you are also running GPU tasks from other BOINC projects, have you tried giving GPUGRID twice the priority of the other projects you run? This seems to work for me.

Another thing I did was increase the amount of tasks in my queue to 1.5 additional days of work. I've theorized that if my task queue is already satisfied when a GPUGRID task comes along, the BOINC server will pass me by for that task update. I think the same might apply if your computer is shut down at the time WUs are being dispensed.

Someone more experienced than myself needs to confirm or disprove my guess. I hope I've been helpful.

Profile petnek
Send message
Joined: 30 May 09
Posts: 3
Credit: 32,491,012
RAC: 0
Level
Val
Scientific publications
watwatwatwatwatwatwatwat
Message 53368 - Posted: 17 Dec 2019 | 22:41:20 UTC

Hi, bar is empty and my gpu is thirsty. Some news about new batch to crunch? :-)

KAMasud
Send message
Joined: 27 Jul 11
Posts: 137
Credit: 523,901,354
RAC: 16
Level
Lys
Scientific publications
watwat
Message 53382 - Posted: 21 Dec 2019 | 14:41:05 UTC - in response to Message 53368.

Hi, bar is empty and my gpu is thirsty. Some news about new batch to crunch? :-)


Second that.

KAMasud
Send message
Joined: 27 Jul 11
Posts: 137
Credit: 523,901,354
RAC: 16
Level
Lys
Scientific publications
watwat
Message 53383 - Posted: 21 Dec 2019 | 14:49:48 UTC - in response to Message 53366.

"I have not received any new WU in like 30-40 days."

GUSTAV, if you are also running GPU tasks from other BOINC projects, have you tried giving GPUGRID twice the priority of the other projects you run? This seems to work for me.

Another thing I did was increase the amount of tasks in my queue to 1.5 additional days of work. I've theorized that if my task queue is already satisfied when a GPUGRID task comes along, the BOINC server will pass me by for that task update. I think the same might apply if your computer is shut down at the time WUs are being dispensed.

Someone more experienced than myself needs to confirm or disprove my guess. I hope I've been helpful.


If you are running other projects, especially Collatz, then you will have to manage them manually or BOINC will report the cache full, no tasks required. Collatz is prone to flooding the machine with WUs. I have given it one per cent of resources; even then it floods my machine.
You have to fish for GPUGRID WUs these days. Starve the queue and let the computer hammer at the server itself.

Greg _BE
Send message
Joined: 30 Jun 14
Posts: 126
Credit: 107,156,939
RAC: 166,633
Level
Cys
Scientific publications
watwatwatwatwatwat
Message 53384 - Posted: 21 Dec 2019 | 15:23:37 UTC

If you run empty, then go look at the server status. The current server status says there is no work. Also, if you check your notices in BOINC Manager you will see that it communicates with the project and the project reports back no work to send.

So if there is any work here, maybe you get it, maybe you don't. You can set your resource share to 150 or higher and then BOINC Manager will try to contact the project more often to get work. They are just not generating a lot of tasks to crunch right now. It's like winning the lottery if you do get some.

Pop Piasa
Avatar
Send message
Joined: 8 Aug 19
Posts: 252
Credit: 458,054,251
RAC: 0
Level
Gln
Scientific publications
watwat
Message 53385 - Posted: 21 Dec 2019 | 18:09:14 UTC - in response to Message 53383.

Thanks very much, KAMasud. That clarifies it completely. The "no new tasks" and "suspend" buttons have already proven useful to me during my brief time volunteering on BOINC.

Milkyway just flooded my queue with quick-running OpenCL Separation 1.46 WUs, so I'll be "GPU productive" while checking the server status here for the next batch.

Incidentally, the Asteroids project is also out of available jobs.

ExtraTerrestrial Apes
Volunteer moderator
Volunteer tester
Avatar
Send message
Joined: 17 Aug 08
Posts: 2705
Credit: 1,311,122,549
RAC: 0
Level
Met
Scientific publications
watwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwat
Message 53387 - Posted: 21 Dec 2019 | 20:25:05 UTC

I have set GPU-Grid as my main project and Einstein as a second project with 1% of the resource share of GPU-Grid. Works well for me: if there is GPU-Grid work, my machine keeps asking for it and runs it with priority over any Einstein task I have in my buffer. And there are always a few, but never too many, Einstein tasks in my buffer. And I'm using a rather short buffer (4h or so) to avoid flooding with backup tasks.

MrS
____________
Scanning for our furry friends since Jan 2002

Aurum
Avatar
Send message
Joined: 12 Jul 17
Posts: 399
Credit: 13,024,100,382
RAC: 766,238
Level
Trp
Scientific publications
watwatwat
Message 53388 - Posted: 21 Dec 2019 | 23:35:57 UTC - in response to Message 53385.

Asteroids project is also out of available jobs.
With Asteroids it's feast or famine. Any day they'll toss up a million WUs and then let it run dry again. It's a nice project since it only needs 0.01 CPU and it's CUDA55, so it works well on legacy GPUs.

Pop Piasa
Avatar
Send message
Joined: 8 Aug 19
Posts: 252
Credit: 458,054,251
RAC: 0
Level
Gln
Scientific publications
watwat
Message 53389 - Posted: 22 Dec 2019 | 21:10:48 UTC - in response to Message 53387.

Thanks for the tips ETApes & everyone.
I'll shrink my buffer to 4hrs and reduce the other GPU project priorities similarly.

Cheers and a Happy Holiday to all. 🍻

Greg _BE
Send message
Joined: 30 Jun 14
Posts: 126
Credit: 107,156,939
RAC: 166,633
Level
Cys
Scientific publications
watwatwatwatwatwat
Message 53390 - Posted: 23 Dec 2019 | 0:25:07 UTC

Since this project is so sporadic, I'll leave it at 150% resource share and if something new shows up I'll get 3-4 out of the whole batch.
I run tons of other projects that keep my GPU busy.

If you want a project that tests your system's capabilities in both CPU and GPU, go over to PrimeGrid. They have stuff that will use every watt of energy your system has, both in GPU and CPU. And they never run out of work.

Erich56
Send message
Joined: 1 Jan 15
Posts: 1090
Credit: 6,603,906,926
RAC: 18,783,925
Level
Tyr
Scientific publications
watwatwatwatwatwatwatwatwat
Message 53391 - Posted: 24 Dec 2019 | 19:07:01 UTC
Last modified: 24 Dec 2019 | 19:13:54 UTC

just had another task which errored out with

195 (0xc3) EXIT_CHILD_FAILED
...
# Engine failed: Particle coordinate is nan
...
http://www.gpugrid.net/result.php?resultid=21577998

Bedrich Hajek
Send message
Joined: 28 Mar 09
Posts: 467
Credit: 8,194,946,966
RAC: 10,431,404
Level
Tyr
Scientific publications
watwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwat
Message 53392 - Posted: 24 Dec 2019 | 19:33:02 UTC

I had this error, and so did everyone else:

(unknown error) - exit code 195 (0xc3)</message>

EXCEPTIONAL CONDITION: src\mdio\bincoord.c, line 193: "nelems != 1"


I wonder what this is?

http://www.gpugrid.net/result.php?resultid=21569667

http://www.gpugrid.net/workunit.php?wuid=16908898



Keith Myers
Send message
Joined: 13 Dec 17
Posts: 1284
Credit: 4,927,231,959
RAC: 6,464,152
Level
Arg
Scientific publications
watwatwatwatwat
Message 53393 - Posted: 24 Dec 2019 | 20:11:53 UTC

Likely an error in retrieving the task from the server. Bad index on the server for the file. Error is in the Management Data Input module which deals with serial communication for example in the ethernet protocol.

Get this type of error on Seti occasionally when the file index in the database the download is trying to access is missing or pointing to the wrong place.

Toni
Volunteer moderator
Project administrator
Project developer
Project tester
Project scientist
Send message
Joined: 9 Dec 08
Posts: 1006
Credit: 5,068,599
RAC: 0
Level
Ser
Scientific publications
watwatwatwat
Message 53394 - Posted: 26 Dec 2019 | 13:51:31 UTC - in response to Message 53392.

The previous-step WU created a corrupted output file. This is used as an input in the next workunit, which therefore fails on start.

Remanco
Send message
Joined: 4 Mar 13
Posts: 3
Credit: 30,169,077
RAC: 0
Level
Val
Scientific publications
watwatwat
Message 53423 - Posted: 1 Jan 2020 | 7:26:53 UTC

Can we have more details on this GSN Project?

Thanks!

Sylvain

Keith Myers
Send message
Joined: 13 Dec 17
Posts: 1284
Credit: 4,927,231,959
RAC: 6,464,152
Level
Arg
Scientific publications
watwatwatwatwat
Message 53445 - Posted: 14 Jan 2020 | 8:17:15 UTC

Think this is a case of a bad work unit again.

https://www.gpugrid.net/result.php?resultid=21605523

<core_client_version>7.16.3</core_client_version>
<![CDATA[
<message>
process exited with code 195 (0xc3, -61)</message>
<stderr_txt>
15:18:37 (20880): wrapper (7.7.26016): starting
15:18:37 (20880): wrapper (7.7.26016): starting
15:18:37 (20880): wrapper: running acemd3 (--boinc input --device 1)
ERROR: /home/user/conda/conda-bld/acemd3_1570536635323/work/src/mdsim/trajectory.cpp line 129: Incorrect XSC file
15:18:41 (20880): acemd3 exited; CPU time 3.067561
15:18:41 (20880): app exit status: 0x9e
15:18:41 (20880): called boinc_finish(195)

</stderr_txt>
]]>

Erich56
Send message
Joined: 1 Jan 15
Posts: 1090
Credit: 6,603,906,926
RAC: 18,783,925
Level
Tyr
Scientific publications
watwatwatwatwatwatwatwatwat
Message 53446 - Posted: 14 Jan 2020 | 11:57:35 UTC - in response to Message 53445.

...
ERROR: /home/user/conda/conda-bld/acemd3_1570536635323/work/src/mdsim/trajectory.cpp line 129: Incorrect XSC file
...

What exactly is an XSC file?

Toni
Volunteer moderator
Project administrator
Project developer
Project tester
Project scientist
Send message
Joined: 9 Dec 08
Posts: 1006
Credit: 5,068,599
RAC: 0
Level
Ser
Scientific publications
watwatwatwat
Message 53447 - Posted: 14 Jan 2020 | 13:22:27 UTC - in response to Message 53446.

It's part of the state which is carried between one simulation piece and the next.

Clive
Send message
Joined: 2 Jul 19
Posts: 21
Credit: 90,744,164
RAC: 0
Level
Thr
Scientific publications
wat
Message 53459 - Posted: 25 Jan 2020 | 4:11:02 UTC

Hi:

My hungry GPU wants some more workunits.

Any more coming down the pipe?

Clive



Keith Myers
Send message
Joined: 13 Dec 17
Posts: 1284
Credit: 4,927,231,959
RAC: 6,464,152
Level
Arg
Scientific publications
watwatwatwatwat
Message 53460 - Posted: 25 Jan 2020 | 8:59:44 UTC

Uhh, the floodgates have opened. I'm being inundated with work units.
46 WU's so far.

Clive
Send message
Joined: 2 Jul 19
Posts: 21
Credit: 90,744,164
RAC: 0
Level
Thr
Scientific publications
wat
Message 53487 - Posted: 26 Jan 2020 | 0:06:31 UTC

Hi:

Good to see that the GPU WU floodgates have opened up. My GPU has finished the WUs that arrived this morning.

Can I have more GPU WUs?

Clive

Werinbert
Send message
Joined: 12 May 13
Posts: 5
Credit: 100,032,540
RAC: 0
Level
Cys
Scientific publications
wat
Message 53510 - Posted: 27 Jan 2020 | 5:17:51 UTC

I really wish GPUGRID would spread out the work units among all the volunteers rather than give big bunches of WUs to a few volunteers.

Toni
Volunteer moderator
Project administrator
Project developer
Project tester
Project scientist
Send message
Joined: 9 Dec 08
Posts: 1006
Credit: 5,068,599
RAC: 0
Level
Ser
Scientific publications
watwatwatwat
Message 53512 - Posted: 27 Jan 2020 | 8:28:28 UTC - in response to Message 53510.

I really wish GPUGRID would spread out the work units among all the volunteers rather than give big bunches of WUs to a few volunteers.


We don't do a selection. When "bursts" of WUs are created, the already connected users tend to get them.

This said, if the host does not meet all criteria (e.g. driver version), it won't get WUs, but there is no explanation why. This is an unfortunate consequence of the BOINC machinery and out of our control.

Clive
Send message
Joined: 2 Jul 19
Posts: 21
Credit: 90,744,164
RAC: 0
Level
Thr
Scientific publications
wat
Message 53518 - Posted: 28 Jan 2020 | 4:50:42 UTC - in response to Message 53512.

So much for opening the floodgates of GPU WUs; to my disappointment I only received 4 workunits. I do not know whether GPUGRID.NET is a victim of its own success. I thought I could use my fast GPU to advance medical research while I am doing my emails and other tasks.

I am moving on to Folding@Home. I trust I will be able to help advance the search for a cure for Alzheimer's disease, a disease that killed my loving mom.

Clive

Profile JStateson
Avatar
Send message
Joined: 31 Oct 08
Posts: 186
Credit: 3,331,546,800
RAC: 0
Level
Arg
Scientific publications
watwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwat
Message 53533 - Posted: 28 Jan 2020 | 20:20:30 UTC

=====INCREDIBLE - GOT A BOATLOAD - all 6 GPUs are crunching==========


Toni
Volunteer moderator
Project administrator
Project developer
Project tester
Project scientist
Send message
Joined: 9 Dec 08
Posts: 1006
Credit: 5,068,599
RAC: 0
Level
Ser
Scientific publications
watwatwatwat
Message 53535 - Posted: 28 Jan 2020 | 20:40:16 UTC - in response to Message 53533.


👍

Most importantly, with success ;)

Profile BladeD
Send message
Joined: 1 May 11
Posts: 9
Credit: 144,358,529
RAC: 0
Level
Cys
Scientific publications
watwatwat
Message 53536 - Posted: 29 Jan 2020 | 0:21:22 UTC - in response to Message 53518.

So much for opening the floodgates of GPU WUs; to my disappointment I only received 4 workunits. I do not know whether GPUGRID.NET is a victim of its own success. I thought I could use my fast GPU to advance medical research while I am doing my emails and other tasks.

I am moving on to Folding@Home. I trust I will be able to help advance the search for a cure for Alzheimer's disease, a disease that killed my loving mom.

Clive

I only get 2 at a time, my GPU has been busy all day!
____________

Profile microchip
Avatar
Send message
Joined: 4 Sep 11
Posts: 110
Credit: 326,102,587
RAC: 0
Level
Asp
Scientific publications
watwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwat
Message 53581 - Posted: 31 Jan 2020 | 21:51:04 UTC

When can we expect a solid number of WUs again? I'm dry here, pour me a drink! ;)
____________

Team Belgium

Shayol Ghul
Send message
Joined: 11 Aug 17
Posts: 2
Credit: 1,024,938,819
RAC: 0
Level
Met
Scientific publications
watwatwat
Message 53583 - Posted: 1 Feb 2020 | 12:43:10 UTC

Where are the work units?

Werkstatt
Send message
Joined: 23 May 09
Posts: 121
Credit: 321,525,386
RAC: 177,358
Level
Asp
Scientific publications
watwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwat
Message 53594 - Posted: 2 Feb 2020 | 21:35:12 UTC

Where are the work units?

I hoped you could tell me ...

Clive
Send message
Joined: 2 Jul 19
Posts: 21
Credit: 90,744,164
RAC: 0
Level
Thr
Scientific publications
wat
Message 53595 - Posted: 3 Feb 2020 | 6:07:05 UTC

If you are looking for GPU WUs, head over to Folding@Home. I am there right now and there is plenty of GPU WUs to keep your GPU busy 24/7.

I also work on the occasional WUs that I get from GPUGRID.NET.

COME ON OVER!

Clive
British Columbia, Canada

Toni
Volunteer moderator
Project administrator
Project developer
Project tester
Project scientist
Send message
Joined: 9 Dec 08
Posts: 1006
Credit: 5,068,599
RAC: 0
Level
Ser
Scientific publications
watwatwatwat
Message 53596 - Posted: 3 Feb 2020 | 8:07:01 UTC - in response to Message 53595.
Last modified: 3 Feb 2020 | 8:08:00 UTC

Hang on

gravitonian
Send message
Joined: 24 May 13
Posts: 3
Credit: 20,738,042
RAC: 0
Level
Pro
Scientific publications
wat
Message 53597 - Posted: 3 Feb 2020 | 9:00:42 UTC - in response to Message 53512.

This said, if the host does not meet all criteria (e.g. driver version), it won't get WUs, but there is no explanation why. This is an unfortunate consequence of the BOINC machinery and out of our control.


How do I find out if the computer meets the requirements? I have a GTX 1660 Super and the latest drivers, but I haven't been able to get any WUs for a month.

Aurum
Avatar
Send message
Joined: 12 Jul 17
Posts: 399
Credit: 13,024,100,382
RAC: 766,238
Level
Trp
Scientific publications
watwatwat
Message 53599 - Posted: 3 Feb 2020 | 13:33:10 UTC - in response to Message 53595.

If you are looking for GPU WUs, head over to Folding@Home. I am there right now and there is plenty of GPU WUs to keep your GPU busy 24/7.

I also work on the occasional WUs that I get from GPUGRID.NET.

COME ON OVER!

Clive
British Columbia, Canada

Their software is too buggy to waste my time. F@H should come over to BOINC.

Aurum
Avatar
Send message
Joined: 12 Jul 17
Posts: 399
Credit: 13,024,100,382
RAC: 766,238
Level
Trp
Scientific publications
watwatwat
Message 53600 - Posted: 3 Feb 2020 | 13:34:55 UTC - in response to Message 53597.

This said, if the host does not meet all criteria (e.g. driver version), it won't get WUs, but there is no explanation why. This is an unfortunate consequence of the BOINC machinery and out of our control.


How do I find out if the computer meets the requirements? I have a GTX 1660 Super and the latest drivers, but I haven't been able to get any WUs for a month.

Do you have a recent driver with cuda10 and did you check the ACEMD3 box in Prefs?

gravitonian
Send message
Joined: 24 May 13
Posts: 3
Credit: 20,738,042
RAC: 0
Level
Pro
Scientific publications
wat
Message 53602 - Posted: 3 Feb 2020 | 17:10:48 UTC - in response to Message 53600.

Yes, CUDA version 10.2, and the ACEMD3 box is ticked.

Bedrich Hajek
Send message
Joined: 28 Mar 09
Posts: 467
Credit: 8,194,946,966
RAC: 10,431,404
Level
Tyr
Scientific publications
watwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwat
Message 53604 - Posted: 3 Feb 2020 | 23:32:37 UTC - in response to Message 53599.

If you are looking for GPU WUs, head over to Folding@Home. I am there right now and there is plenty of GPU WUs to keep your GPU busy 24/7.

I also work on the occasional WUs that I get from GPUGRID.NET.

COME ON OVER!
Clive
British Columbia, Canada

Their software is too buggy to waste my time.


This is true. I had to reinstall the software after it stopped working several times.


F@H should come over to BOINC.


That is highly unlikely, since the University of California at Berkeley and Stanford University are arch rivals.




Profile BladeD
Send message
Joined: 1 May 11
Posts: 9
Credit: 144,358,529
RAC: 0
Level
Cys
Scientific publications
watwatwat
Message 53610 - Posted: 4 Feb 2020 | 19:52:04 UTC - in response to Message 53604.

If you are looking for GPU WUs, head over to Folding@Home. I am there right now and there is plenty of GPU WUs to keep your GPU busy 24/7.

I also work on the occasional WUs that I get from GPUGRID.NET.

COME ON OVER!
Clive
British Columbia, Canada

Their software is too buggy to waste my time.


This is true. I had to reinstall the software after it stopped working several times.


F@H should come over to BOINC.


That is highly unlikely, since the University of California at Berkeley and Stanford University are arch rivals.



So, there's BOINC and....Stanford doesn't have a dog in this fight. If they ever had to do a major rework of their software, I'd bet they would look closely at BOINC!
____________

Clive
Send message
Joined: 2 Jul 19
Posts: 21
Credit: 90,744,164
RAC: 0
Level
Thr
Scientific publications
wat
Message 53616 - Posted: 6 Feb 2020 | 6:55:44 UTC

If you are looking for GPU WUs, head over to Folding@Home. I am there right now and there is plenty of GPU WUs to keep your GPU busy 24/7.

I also work on the occasional WUs that I get from GPUGRID.NET.

COME ON OVER!
Clive
British Columbia, Canada


Their software is too buggy to waste my time.



This is true. I had to reinstall the software after it stopped working several times.


F@H should come over to BOINC.

Hi

With reference to the gentleman claiming that F@H is too buggy, I have never had to reinstall the Folding@Home s/w. I have two Alienware PCs running fully patched Windows 10 running F@H s/w with no issues.

Clive


Aurum
Avatar
Send message
Joined: 12 Jul 17
Posts: 399
Credit: 13,024,100,382
RAC: 766,238
Level
Trp
Scientific publications
watwatwat
Message 53617 - Posted: 6 Feb 2020 | 10:15:15 UTC - in response to Message 53616.
Last modified: 6 Feb 2020 | 10:16:57 UTC

With reference to the gentleman claiming that F@H is too buggy, I have never had to reinstall the Folding@Home s/w.

Apparently I'm not the only one that thinks so and voted with their feet (click Monthly):
https://folding.extremeoverclocking.com/team_summary.php?s=&t=224497

Pop Piasa
Avatar
Send message
Joined: 8 Aug 19
Posts: 252
Credit: 458,054,251
RAC: 0
Level
Gln
Scientific publications
watwat
Message 53618 - Posted: 6 Feb 2020 | 17:52:45 UTC - in response to Message 53602.
Last modified: 6 Feb 2020 | 18:02:06 UTC

@ Gravitonian ===> Are you running more than one GPU project?

If so, and you haven't tried this, try giving more priority to GPUGRID by changing the resource share from 100 to 900 (the first line of the GPUGRID preferences). This will cause the other projects to let this project have control of the GPU. When the BOINC server sees that high a project priority it should download a WU, even if the others have filled the queue, and run it immediately.

I run Asteroids@Home (3.4% resource share) and Milkyway@Home (1.3% share) to keep my GPUs from thermal cycling, and keep 0.3 days of work in each queue. These programs run well and quickly on my overclocked GTX 750 Ti and GTX 1060.

Hope I helped out.

Profile ServicEnginIC
Avatar
Send message
Joined: 24 Sep 10
Posts: 566
Credit: 5,944,002,024
RAC: 10,731,454
Level
Tyr
Scientific publications
watwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwat
Message 53684 - Posted: 18 Feb 2020 | 20:05:39 UTC

On February 3rd 2020, Toni wrote in this same thread:

Hang on

Translation: "Hi guys, wait a little longer and you will find out what is good..."

Tasks ready to send (February 18th 2020 - 19:57 UTC): 73,105

pututu
Send message
Joined: 8 Oct 16
Posts: 14
Credit: 613,876,869
RAC: 203,233
Level
Lys
Scientific publications
watwatwatwat
Message 53688 - Posted: 19 Feb 2020 | 16:20:35 UTC

Looking at the server status page:
New version of ACEMD: 119,657 ready to send, 5,467 in progress, 0.97 h average runtime (0.27 - 3.78), 1,622 users in the last 24 hours.

The average number of tasks that can be completed per day is 24 hrs / 0.97 = 24.74 tasks per day per GPU (I assume some may have more GPUs and some stopped crunching in the last 24 hrs, but this is just to get a feel for task availability).

Since we have 1,622 users in the last 24 hrs, assuming each user has 1 GPU, we can crunch 24.74 x 1622 = 40,128 tasks per day.

Current task availability is around 3 days (120K), so it seems that we are good unless new tasks are no longer generated....
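For anyone who wants to redo that arithmetic, here is a minimal sketch in Python using the figures quoted from the server status line above:

# Back-of-the-envelope throughput estimate, using the server status figures quoted above.
ready_to_send = 119_657   # tasks ready to send
avg_runtime_h = 0.97      # average runtime per task, in hours
users_last_24h = 1_622    # users returning work in the last 24 hours (assume 1 GPU each)

tasks_per_gpu_per_day = 24 / avg_runtime_h                # ~24.7 tasks per GPU per day
tasks_per_day = tasks_per_gpu_per_day * users_last_24h    # ~40,000 tasks per day overall
days_queued = ready_to_send / tasks_per_day               # ~3 days of queued work

print(f"{tasks_per_gpu_per_day:.2f} tasks/GPU/day, "
      f"{tasks_per_day:,.0f} tasks/day, "
      f"{days_queued:.1f} days of work queued")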

Toni
Volunteer moderator
Project administrator
Project developer
Project tester
Project scientist
Send message
Joined: 9 Dec 08
Posts: 1006
Credit: 5,068,599
RAC: 0
Level
Ser
Scientific publications
watwatwatwat
Message 53689 - Posted: 19 Feb 2020 | 16:30:58 UTC - in response to Message 53688.


Current tasks availability is around 3 days (120K), so it seems that we are good unless the new tasks are no longer generated....


Thanks for the calculations.

New tasks are automatically generated 1:1 when existing ones finish, until approx. 10x the current load.

pututu
Send message
Joined: 8 Oct 16
Posts: 14
Credit: 613,876,869
RAC: 203,233
Level
Lys
Scientific publications
watwatwatwat
Message 53690 - Posted: 19 Feb 2020 | 17:16:01 UTC - in response to Message 53689.


Current tasks availability is around 3 days (120K), so it seems that we are good unless the new tasks are no longer generated....


Thanks for the calculations.

New tasks are automatically generated 1:1 when existing ones finish, until approx. 10x the current load.



Awesome. I think having at least 3 days of tasks or more will keep the GPU cards busy and the crunchers happy.

SolidAir79
Send message
Joined: 22 Aug 19
Posts: 7
Credit: 168,393,363
RAC: 0
Level
Ile
Scientific publications
wat
Message 53692 - Posted: 19 Feb 2020 | 17:29:39 UTC

First, hello. I'm crunching for TSBT, and I'm getting errors on probably 1 in 5 of the WUs here. Here is one of the messages:
<core_client_version>7.14.2</core_client_version>
<![CDATA[
<message>
(unknown error) - exit code 195 (0xc3)</message>
<stderr_txt>
16:47:24 (2960): wrapper (7.9.26016): starting
16:47:24 (2960): wrapper: running acemd3.exe (--boinc input --device 0)
# Engine failed: Particle coordinate is nan
16:50:06 (2960): acemd3.exe exited; CPU time 160.390625
16:50:06 (2960): app exit status: 0x1
16:50:06 (2960): called boinc_finish(195)
0 bytes in 0 Free Blocks.
434 bytes in 8 Normal Blocks.
1144 bytes in 1 CRT Blocks.
0 bytes in 0 Ignore Blocks.
0 bytes in 0 Client Blocks.
Largest number used: 0 bytes.
Total allocations: 1685764 bytes.

I could give you the full dump detail if needed.
My rig is i9 9900k & rtx 2080, 16gb 4000mhz mem, WIN 10 PRO

Keith Myers
Send message
Joined: 13 Dec 17
Posts: 1284
Credit: 4,927,231,959
RAC: 6,464,152
Level
Arg
Scientific publications
watwatwatwatwat
Message 53695 - Posted: 19 Feb 2020 | 18:31:03 UTC - in response to Message 53692.

# Engine failed: Particle coordinate is nan

Unless the task itself is misformulated, and you can check with others running the same series, the error says the card made a math error.

Too far overclocked or not enough cooling and the card is running hot.

SolidAir79
Send message
Joined: 22 Aug 19
Posts: 7
Credit: 168,393,363
RAC: 0
Level
Ile
Scientific publications
wat
Message 53696 - Posted: 19 Feb 2020 | 19:11:42 UTC - in response to Message 53695.

Temps are not an issue; I'll try lowering the clocks. Cheers.

Finrond
Send message
Joined: 26 Jun 12
Posts: 12
Credit: 868,186,385
RAC: 0
Level
Glu
Scientific publications
watwatwatwatwatwatwatwatwatwatwatwatwatwat
Message 53697 - Posted: 19 Feb 2020 | 19:23:36 UTC

Is it normal for the credits to be much lower than with the old version?

Keith Myers
Send message
Joined: 13 Dec 17
Posts: 1284
Credit: 4,927,231,959
RAC: 6,464,152
Level
Arg
Scientific publications
watwatwatwatwat
Message 53700 - Posted: 19 Feb 2020 | 20:11:33 UTC - in response to Message 53697.

Is it normal for the credits to be much lower than with the old version?

Yes, the credit awarded is scaled to the GFLOPS required to crunch the task or roughly equivalent to the time it takes to crunch.

The old tasks with the old app ran for several more hours apiece compared to the current work.
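As a rough sketch of that scaling idea, assuming BOINC's cobblestone definition (1 credit = 1/200 of a day on a 1 GFLOPS machine) and purely hypothetical flop counts; the project's actual per-WU award may be computed differently:

# Toy illustration of flops-proportional credit, using BOINC's cobblestone
# definition (1 credit = 1/200 of a day on a 1 GFLOPS machine). The flop
# counts below are assumptions, not real GPUGRID task sizes.
SECONDS_PER_DAY = 86_400

def credit_for(flops: float) -> float:
    gflops_days = flops / 1e9 / SECONDS_PER_DAY  # work expressed in GFLOPS-days
    return 200.0 * gflops_days                   # 200 credits per GFLOPS-day

short_new_task = 2.0e15  # hypothetical 2 PFLOP task
old_long_run = 8.0e15    # hypothetical 8 PFLOP task
print(credit_for(short_new_task), credit_for(old_long_run))  # ~4,630 vs ~18,519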

SolidAir79
Send message
Joined: 22 Aug 19
Posts: 7
Credit: 168,393,363
RAC: 0
Level
Ile
Scientific publications
wat
Message 53708 - Posted: 20 Feb 2020 | 9:41:18 UTC - in response to Message 53696.

Temps are not an issue; I'll try lowering the clocks. Cheers.

Seems to have worked, cheers. Funny though: I'd benchmarked the card, played games, and crunched on other projects with no issues. But no errors so far, so good!

Finrond
Send message
Joined: 26 Jun 12
Posts: 12
Credit: 868,186,385
RAC: 0
Level
Glu
Scientific publications
watwatwatwatwatwatwatwatwatwatwatwatwatwat
Message 53709 - Posted: 20 Feb 2020 | 14:25:20 UTC - in response to Message 53700.
Last modified: 20 Feb 2020 | 14:25:48 UTC

Is it normal for the credits to be much lower than with the old version?

Yes, the credit awarded is scaled to the GFLOPS required to crunch the task or roughly equivalent to the time it takes to crunch.

The old tasks with the old app ran for several more hours apiece compared to the current work.



Yes, I understand shorter work units will grant less credit; I meant over the course of a day. I am getting roughly half or less the PPD compared with the older, longer units.

Toni
Volunteer moderator
Project administrator
Project developer
Project tester
Project scientist
Send message
Joined: 9 Dec 08
Posts: 1006
Credit: 5,068,599
RAC: 0
Level
Ser
Scientific publications
watwatwatwat
Message 53710 - Posted: 20 Feb 2020 | 15:08:16 UTC - in response to Message 53709.

The old MDAD WUs miscalculated credits.

Finrond
Send message
Joined: 26 Jun 12
Posts: 12
Credit: 868,186,385
RAC: 0
Level
Glu
Scientific publications
watwatwatwatwatwatwatwatwatwatwatwatwatwat
Message 53717 - Posted: 21 Feb 2020 | 19:05:19 UTC - in response to Message 53710.

The old MDAD WUs miscalculated credits.



Are those the old ACEMD Long Runs WU's? I am getting about half as much PPD on the New Version of ACEMD vs the Long Runs wu's on the previous version.

Keith Myers
Send message
Joined: 13 Dec 17
Posts: 1284
Credit: 4,927,231,959
RAC: 6,464,152
Level
Arg
Scientific publications
watwatwatwatwat
Message 53718 - Posted: 21 Feb 2020 | 19:11:19 UTC - in response to Message 53717.

The old MDAD WUs miscalculated credits.



Are those the old ACEMD Long Runs WU's? I am getting about half as much PPD on the New Version of ACEMD vs the Long Runs wu's on the previous version.

No, completely different application and different tasks. No relationship to previous work.

Pop Piasa
Avatar
Send message
Joined: 8 Aug 19
Posts: 252
Credit: 458,054,251
RAC: 0
Level
Gln
Scientific publications
watwat
Message 53733 - Posted: 22 Feb 2020 | 3:08:11 UTC - in response to Message 53710.
Last modified: 22 Feb 2020 | 3:26:26 UTC

The old MDAD WUs miscalculated credits.


...But it was fun while it lasted! 💸📈

Finrond
Send message
Joined: 26 Jun 12
Posts: 12
Credit: 868,186,385
RAC: 0
Level
Glu
Scientific publications
watwatwatwatwatwatwatwatwatwatwatwatwatwat
Message 53761 - Posted: 25 Feb 2020 | 15:41:50 UTC - in response to Message 53718.

The old MDAD WUs miscalculated credits.



Are those the old ACEMD Long Runs WU's? I am getting about half as much PPD on the New Version of ACEMD vs the Long Runs wu's on the previous version.

No, completely different application and different tasks. No relationship to previous work.



So it is normal to get fewer credits per day than with the old Long run WUs? I am just wondering if I am the only one, that's all.

Keith Myers
Send message
Joined: 13 Dec 17
Posts: 1284
Credit: 4,927,231,959
RAC: 6,464,152
Level
Arg
Scientific publications
watwatwatwatwat
Message 53762 - Posted: 25 Feb 2020 | 15:49:10 UTC - in response to Message 53761.

The old MDAD WUs miscalculated credits.



Are those the old ACEMD Long Runs WU's? I am getting about half as much PPD on the New Version of ACEMD vs the Long Runs wu's on the previous version.

No, completely different application and different tasks. No relationship to previous work.



So it is normal to get fewer credits per day than with the old Long run WUs? I am just wondering if I am the only one, that's all.

Yes.

Wilgard
Send message
Joined: 4 Mar 20
Posts: 14
Credit: 3,127,716
RAC: 0
Level
Ala
Scientific publications
wat
Message 53849 - Posted: 4 Mar 2020 | 14:26:06 UTC

Hello,

I have just added GPUGRID as a new project in BOINC.
But the client cannot get new WUs.
The server status shows WUs available.


https://i.ibb.co/827wQZN/GPUGRID-server-status.jpg


As you can see in my log file:


04/03/2020 14:36:06 | | Fetching configuration file from http://www.gpugrid.net/get_project_config.php
04/03/2020 14:36:49 | GPUGRID | Master file download succeeded
04/03/2020 14:36:54 | GPUGRID | Sending scheduler request: Project initialization.
04/03/2020 14:36:54 | GPUGRID | Requesting new tasks for CPU and NVIDIA GPU and Intel GPU
04/03/2020 14:36:56 | GPUGRID | Scheduler request completed: got 0 new tasks
04/03/2020 14:36:56 | GPUGRID | No tasks sent
04/03/2020 14:36:58 | GPUGRID | Started download of logogpugrid.png
04/03/2020 14:36:58 | GPUGRID | Started download of project_1.png
04/03/2020 14:36:58 | GPUGRID | Started download of project_2.png
04/03/2020 14:36:58 | GPUGRID | Started download of project_3.png
04/03/2020 14:36:59 | GPUGRID | Finished download of logogpugrid.png
04/03/2020 14:36:59 | GPUGRID | Finished download of project_1.png
04/03/2020 14:36:59 | GPUGRID | Finished download of project_2.png
04/03/2020 14:36:59 | GPUGRID | Finished download of project_3.png
04/03/2020 14:37:31 | GPUGRID | Sending scheduler request: To fetch work.
04/03/2020 14:37:31 | GPUGRID | Requesting new tasks for CPU and NVIDIA GPU and Intel GPU
04/03/2020 14:37:32 | GPUGRID | Scheduler request completed: got 0 new tasks
04/03/2020 14:37:32 | GPUGRID | No tasks sent


Do you have any idea?


Best Regards,

Wilgard

Jim1348
Send message
Joined: 28 Jul 12
Posts: 819
Credit: 1,591,285,971
RAC: 0
Level
His
Scientific publications
watwatwatwatwatwatwatwatwatwatwatwatwatwatwat
Message 53850 - Posted: 4 Mar 2020 | 17:27:05 UTC - in response to Message 53849.

I have just added GPUGRID as a new project in BOINC.
But the client cannot get new WUs.
The server status shows WUs available.

Check your card and drivers. One or both may be too old.
http://www.gpugrid.net/forum_thread.php?id=5002#52865

Ian&Steve C.
Avatar
Send message
Joined: 21 Feb 20
Posts: 1031
Credit: 35,647,457,483
RAC: 75,402,602
Level
Trp
Scientific publications
wat
Message 53852 - Posted: 4 Mar 2020 | 18:40:51 UTC - in response to Message 53850.

Looks like Windows only has CUDA92 and CUDA101 apps. His driver version (382.xx) is only compatible with CUDA80.

For the CUDA92 app you need driver 396.26+.
For the CUDA101 app you need driver 418.39+.

CUDA80 apps exist for Linux, however.

Wilgard,

It's probably best to update the drivers to the latest if you want to stick to Windows; CUDA 10+ drivers are available for that Quadro M520 you have.
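A minimal sketch of that driver check, using only the version thresholds quoted in this post (the plan names are just labels, not necessarily the project's exact plan classes):

# Minimal sketch of the Windows driver check described above. The minimum
# driver versions are the ones quoted in this post; plan names are just labels.
MIN_DRIVER = {
    "cuda101": 418.39,
    "cuda92": 396.26,
}

def usable_plans(driver_version: float) -> list[str]:
    """Return which CUDA app plans the given NVIDIA driver version can run."""
    return [plan for plan, minimum in MIN_DRIVER.items() if driver_version >= minimum]

print(usable_plans(382.29))  # [] -> no Windows app matches, as Wilgard saw
print(usable_plans(422.50))  # ['cuda101', 'cuda92'] -> fine after updating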

Wilgard
Send message
Joined: 4 Mar 20
Posts: 14
Credit: 3,127,716
RAC: 0
Level
Ala
Scientific publications
wat
Message 53863 - Posted: 5 Mar 2020 | 13:23:50 UTC - in response to Message 53852.

I am really impressed. That was the issue I had.
My NVIDIA driver was 382.29, and after an update it is 422.50.
So now IT WORKS!!
Many thanks "Jim1348" and "Ian&Steve C."

Best Regards,

Wilgard
