Message boards : News : Windows GPU Applications broken

Profile GDF
Volunteer moderator
Project administrator
Project developer
Project tester
Volunteer developer
Volunteer tester
Project scientist
Joined: 14 Mar 07
Posts: 1914
Credit: 629,356
RAC: 0
Message 49925 - Posted: 17 Jul 2018 | 12:34:38 UTC

The Windows applications are currently broken. We are looking into it.

Linux and CPU jobs work fine.

Jim1348
Joined: 28 Jul 12
Posts: 632
Credit: 1,201,962,935
RAC: 144,313
Message 49926 - Posted: 17 Jul 2018 | 13:11:15 UTC - in response to Message 49925.

Thanks, but all the GPU work has been cancelled for Linux also. Maybe you could add some back?

Profile GDF
Volunteer moderator
Project administrator
Project developer
Project tester
Volunteer developer
Volunteer tester
Project scientist
Joined: 14 Mar 07
Posts: 1914
Credit: 629,356
RAC: 0
Message 49929 - Posted: 17 Jul 2018 | 15:24:14 UTC - in response to Message 49926.

I have now deprecated the Windows apps, so we can put out some more work for Linux.


Profile GDF
Volunteer moderator
Project administrator
Project developer
Project tester
Volunteer developer
Volunteer tester
Project scientist
Joined: 14 Mar 07
Posts: 1914
Credit: 629,356
RAC: 0
Message 49931 - Posted: 17 Jul 2018 | 15:28:30 UTC - in response to Message 49929.

We are trying to create a new app for Windows, but it might take a few days.

gdf

PS: Can you post here some of the WUs that failed on Windows, so that I can easily find the error message?

JoergF
Joined: 20 Apr 15
Posts: 271
Credit: 797,167,484
RAC: 1,319,186
Message 49932 - Posted: 17 Jul 2018 | 16:08:24 UTC - in response to Message 49931.

PS: can you post here some of the WU failed for Windows so that I can easily find the error message?


The error message is always the same:

<core_client_version>7.10.2</core_client_version>
<![CDATA[
<message>
(unknown error) - exit code -44 (0xffffffd4)</message>
]]>


Take some of my recent ones:
https://www.gpugrid.net/results.php?userid=146761
____________
I would love to see HCF1 protein folding and interaction simulations to help my little boy... someday.

Profile SMTB1963
Joined: 27 Jun 10
Posts: 38
Credit: 326,236,371
RAC: 650,517
Message 49933 - Posted: 17 Jul 2018 | 17:04:17 UTC

Thanks for the update, GDF. Noticed this yesterday on my 2 Win boxes. Here's a couple of the errors:

https://www.gpugrid.net/result.php?resultid=18149877

Stderr output
<core_client_version>7.10.2</core_client_version>
<![CDATA[
<message>
(unknown error) - exit code -44 (0xffffffd4)</message>
]]>


https://www.gpugrid.net/result.php?resultid=18146999

Stderr output
<core_client_version>7.10.2</core_client_version>
<![CDATA[
<message>
(unknown error) - exit code -44 (0xffffffd4)</message>
]]>

Good luck sorting it out...

Cheers!

Richard Haselgrove
Joined: 11 Jul 09
Posts: 883
Credit: 1,767,597,020
RAC: 1,274,807
Message 49934 - Posted: 17 Jul 2018 | 18:20:47 UTC

We had the same error number (I searched for 0xffffffd4) and the same symptoms - every task failing on Windows machines, for one of the Windows apps, at the same time - on 14/15 April 2017.

The conclusion reached in the thread "all WUs downloaded recently produce 'computation error' right away" (that's last year's thread, not the similarly named thread this week) was that the licence for one component included in the app had expired.
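
For anyone searching on the error number: the two spellings in the stderr output quoted in this thread, -44 and 0xffffffd4, are the same value, once read as a signed 32-bit integer and once as the raw 32-bit pattern. A minimal Python sketch of the conversion follows; the helper names are made up for illustration and are not part of BOINC.

```python
def to_unsigned32(code: int) -> int:
    """Reinterpret a signed 32-bit exit code as its unsigned bit pattern."""
    return code & 0xFFFFFFFF


def to_signed32(value: int) -> int:
    """Reinterpret an unsigned 32-bit pattern as a signed exit code."""
    value &= 0xFFFFFFFF
    # If the top bit is set, the value is negative in two's complement.
    return value - 0x100000000 if value & 0x80000000 else value


print(hex(to_unsigned32(-44)))   # 0xffffffd4
print(to_signed32(0xFFFFFFD4))   # -44
```

Searching for either form should therefore turn up the same failures.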

Profile [AF] fansyl
Joined: 26 Sep 13
Posts: 8
Credit: 759,769,022
RAC: 93,373
Message 49936 - Posted: 17 Jul 2018 | 19:31:02 UTC

I think that if all WUs generate the same error, the fix should be pretty simple.

Good luck with the fix!

mmonnin
Joined: 2 Jul 16
Posts: 177
Credit: 323,408,389
RAC: 1,558,026
Message 49937 - Posted: 17 Jul 2018 | 22:15:15 UTC - in response to Message 49926.

Thanks, but all the GPU work has been cancelled for Linux also. Maybe you could add some back?


This is still the case. Now everyone is doing nothing.

tullio
Joined: 8 May 18
Posts: 126
Credit: 11,866,185
RAC: 81,254
Message 49938 - Posted: 18 Jul 2018 | 2:42:09 UTC

I just got 243000 credits for a GPU task on my main Linux box. It and the Linux laptop are running QC tasks.
Tullio

Profile Retvari Zoltan
Joined: 20 Jan 09
Posts: 1947
Credit: 12,443,600,019
RAC: 5,419,094
Message 49973 - Posted: 20 Jul 2018 | 23:00:12 UTC

Could someone please explain this? (Windows hosts still receive and fail GPU workunits, while the Windows app is deprecated)

Richard Haselgrove
Joined: 11 Jul 09
Posts: 883
Credit: 1,767,597,020
RAC: 1,274,807
Message 49974 - Posted: 20 Jul 2018 | 23:23:30 UTC - in response to Message 49973.

Just caught myself a couple of those on my mine canary - the one which will tell me when good work starts to flow again.

I think it may result from a mis-understanding of what it means to 'deprecate' an application in the BOINC world. There are other oddities, like a string of

21/07/2018 00:16:01 | | [unparsed_xml] FILE_REF::parse(): unrecognized: 'rboinc/'

in the log - one for every downloaded task file. They look like

<file_ref>
<file_name>e46s10_e41s8p0f164-ADRIA_FOLDT1008_v2_predicted_pred_ss_contacts_50_T1008_2-0-LICENSE</file_name>
<open_name>LICENSE</open_name>
<rboinc/>
</file_ref>

<rboinc/> is meaningless in that context and shouldn't be there.
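
The kind of check behind that log line can be sketched in a few lines of Python. This is only an illustration: the real BOINC client is not written in Python, and the set of recognised tags below is an assumption limited to the two tags that appear in the snippet above, not the client's actual list.

```python
import xml.etree.ElementTree as ET
from typing import List

# Assumption: only the tags seen next to <rboinc/> in the quoted snippet
# are treated as known; the real client accepts more than these.
KNOWN_TAGS = {"file_name", "open_name"}


def unrecognized_tags(file_ref_xml: str) -> List[str]:
    """Return the child tags of a <file_ref> that are not in KNOWN_TAGS."""
    root = ET.fromstring(file_ref_xml)
    return [child.tag for child in root if child.tag not in KNOWN_TAGS]


sample = """<file_ref>
<file_name>e46s10_e41s8p0f164-ADRIA_FOLDT1008_v2_predicted_pred_ss_contacts_50_T1008_2-0-LICENSE</file_name>
<open_name>LICENSE</open_name>
<rboinc/>
</file_ref>"""

for tag in unrecognized_tags(sample):
    print(f"unrecognized: '{tag}/'")
```

Run against the snippet from the log, this flags only the stray rboinc element.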

Ben_New_PC
Joined: 20 Jun 15
Posts: 5
Credit: 13,375,425
RAC: 26,263
Message 49976 - Posted: 21 Jul 2018 | 3:17:32 UTC - in response to Message 49937.

A volunteer effort. Doubtless "measure twice, cut once" applies at this time.

Possibly related to the thing that derailed BOINC, some new Windows patch?

Erich56
Joined: 1 Jan 15
Posts: 476
Credit: 2,380,601,877
RAC: 1,967,968
Message 49978 - Posted: 21 Jul 2018 | 7:19:00 UTC - in response to Message 49973.

... (Windows hosts still receive and fail GPU workunits, while the Windows app is deprecated)

This seems to explain why the tasks on the Server Status page show error rates of 81% and higher.

Betting Slip
Joined: 5 Jan 09
Posts: 666
Credit: 2,498,095,550
RAC: 219
Message 49985 - Posted: 21 Jul 2018 | 19:34:54 UTC - in response to Message 49925.
Last modified: 21 Jul 2018 | 19:37:39 UTC

I had been running the Windows App on my fastest system successfully after the 14th July license problems (which occur annually but seem to take you by surprise every time) by turning my system time back. However, now you seem to have deprecated it and I get nothing.

I am continually disappointed by the way this project is run.
____________
Radio Caroline, the world's most famous offshore pirate radio station.
Great music since April 1964. Support Radio Caroline Team -
Radio Caroline

PappaLitto
Joined: 21 Mar 16
Posts: 402
Credit: 2,793,208,105
RAC: 2,578,005
Message 49986 - Posted: 21 Jul 2018 | 22:36:26 UTC - in response to Message 49985.

I had been running the Windows App on my fastest system successfully after the 14th July license problems (which occur annually but seem to take you by surprise every time) by turning my system time back. However, now you seem to have deprecated it and I get nothing.

How come you didn't tell us to set the time back before? We could've kept crunching

Betting Slip
Joined: 5 Jan 09
Posts: 666
Credit: 2,498,095,550
RAC: 219
Message 49987 - Posted: 21 Jul 2018 | 23:30:01 UTC - in response to Message 49986.

Yes, sorry that I didn't. However, the problem is a recurring one and I thought everyone knew from the last time it happened.
____________
Radio Caroline, the world's most famous offshore pirate radio station.
Great music since April 1964. Support Radio Caroline Team -
Radio Caroline

Jacob Klein
Joined: 11 Oct 08
Posts: 1089
Credit: 1,396,732,414
RAC: 998,406
Message 49988 - Posted: 22 Jul 2018 | 3:58:39 UTC

We just have to be patient for the scientists to fix the problems. In the meantime, some other GPU projects I'm attached to (with 0 resource share, as backup projects), are getting some extra work done for them :)

flashawk
Joined: 18 Jun 12
Posts: 284
Credit: 2,292,766,947
RAC: 1,334,028
Message 49989 - Posted: 22 Jul 2018 | 5:37:12 UTC

If you're running projects other than GPUGrid, I would strongly recommend that you don't set your clock back; you could get validation errors or WUs canceled because of deadline overrun.

Erich56
Joined: 1 Jan 15
Posts: 476
Credit: 2,380,601,877
RAC: 1,967,968
Message 49990 - Posted: 22 Jul 2018 | 7:41:37 UTC - in response to Message 49987.

... the problem is a recurring one ...

indeed it is.
What I wonder about is that apparently no one keeps track of the expiration dates of the various licenses :-(
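
Keeping track of such dates does not need much machinery. As a rough sketch only: the component names and dates below are hypothetical, since the thread does not say which component or licence is involved, and the 30-day warning window is an arbitrary choice.

```python
from datetime import date
from typing import Dict, List


def days_until_expiry(expiry: date, today: date) -> int:
    """Number of days from today until the expiry date (negative if past)."""
    return (expiry - today).days


def licenses_needing_renewal(
    licenses: Dict[str, date], today: date, warn_days: int = 30
) -> List[str]:
    """Names of licenses that expire within warn_days, or already have."""
    return sorted(
        name
        for name, expiry in licenses.items()
        if days_until_expiry(expiry, today) <= warn_days
    )


# Hypothetical inventory, for illustration only.
licenses = {
    "component_a": date(2018, 7, 14),
    "component_b": date(2019, 1, 1),
}

print(licenses_needing_renewal(licenses, today=date(2018, 6, 20)))
```

Run from a daily scheduled job, a check like this would flag an upcoming expiry weeks before tasks start failing.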

James C. Owens
Joined: 16 Apr 09
Posts: 2
Credit: 1,642,321,882
RAC: 1,299,835
Message 49994 - Posted: 22 Jul 2018 | 20:15:25 UTC

I am surprised it is taking this long to fix the issue...

Richard Haselgrove
Joined: 11 Jul 09
Posts: 883
Credit: 1,767,597,020
RAC: 1,274,807
Message 49995 - Posted: 22 Jul 2018 | 20:51:27 UTC - in response to Message 49994.

I am surprised it is taking this long to fix the issue...

Holidays, weekends, too few people.

Profile Retvari Zoltan
Joined: 20 Jan 09
Posts: 1947
Credit: 12,443,600,019
RAC: 5,419,094
Message 49996 - Posted: 22 Jul 2018 | 21:43:06 UTC - in response to Message 49995.

I am surprised it is taking this long to fix the issue...

Holidays, weekends, too few people.

Oh, c'mon. We know that there are too few people on workdays too.

Betting Slip
Joined: 5 Jan 09
Posts: 666
Credit: 2,498,095,550
RAC: 219
Message 49997 - Posted: 23 Jul 2018 | 8:34:48 UTC - in response to Message 49996.
Last modified: 23 Jul 2018 | 8:48:11 UTC

Maybe just too few people interested.

By their own admission, they have few people who understand BOINC, the very platform that gives them 2.5 PFLOPS of free computing power.
I think if someone donated that much computing power to me at no cost I would have made sure I had people that understood BOINC on hand and paid attention to very simple issues like licensing.
But apparently it's not as important as they keep saying or they would look after the resource much better than they do.
____________
Radio Caroline, the world's most famous offshore pirate radio station.
Great music since April 1964. Support Radio Caroline Team -
Radio Caroline

Toni
Volunteer moderator
Project administrator
Project developer
Project scientist
Joined: 9 Dec 08
Posts: 746
Credit: 4,285,282
RAC: 0
Message 49998 - Posted: 23 Jul 2018 | 8:40:32 UTC - in response to Message 49997.
Last modified: 23 Jul 2018 | 8:51:59 UTC

As far as I know a new app was uploaded but still not working. Licensing is not related to BOINC.

The problem with BOINC is that it runs the apps in an almost opaque environment, so when things go wrong there is no useful indication of which direction to take for a speedy fix.

Betting Slip
Joined: 5 Jan 09
Posts: 666
Credit: 2,498,095,550
RAC: 219
Message 49999 - Posted: 23 Jul 2018 | 8:55:00 UTC - in response to Message 49998.

Licensing is not related to BOINC.



I didn't say it was, but it is related to your project and app.
____________
Radio Caroline, the world's most famous offshore pirate radio station.
Great music since April 1964. Support Radio Caroline Team -
Radio Caroline

Toni
Volunteer moderator
Project administrator
Project developer
Project scientist
Joined: 9 Dec 08
Posts: 746
Credit: 4,285,282
RAC: 0
Message 50000 - Posted: 23 Jul 2018 | 10:02:08 UTC - in response to Message 49999.

@betting slip

For some reason, your host 171874 is the only one where the app is still working.

Betting Slip
Joined: 5 Jan 09
Posts: 666
Credit: 2,498,095,550
RAC: 219
Message 50001 - Posted: 23 Jul 2018 | 10:27:32 UTC - in response to Message 50000.

I turned the system clock back to before the time the license expired, which allowed the Windows app to work, but then someone deprecated the app, so I couldn't do that any longer.
____________
Radio Caroline, the world's most famous offshore pirate radio station.
Great music since April 1964. Support Radio Caroline Team -
Radio Caroline
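
Why the clock workaround described above can work at all is easy to see if one assumes the licensed component simply compares the local system time against a fixed expiry date. That is an assumption: the thread does not show how the check is actually done, and the expiry date and function name below are hypothetical.

```python
from datetime import datetime

# Hypothetical expiry instant, chosen only to match the dates in this thread.
LICENSE_EXPIRY = datetime(2018, 7, 14)


def license_ok(now: datetime, expiry: datetime = LICENSE_EXPIRY) -> bool:
    """A naive check that trusts whatever the local clock reports."""
    return now < expiry


real_now = datetime(2018, 7, 21, 19, 34)      # actual time, after expiry
rolled_back = datetime(2018, 7, 1, 19, 34)    # system clock set back

print(license_ok(real_now))      # False
print(license_ok(rolled_back))   # True
```

A check of this kind only sees the local clock, which is also why the side effects mentioned elsewhere in the thread (deadlines, validation, update services) come along with the workaround.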

Erich56
Joined: 1 Jan 15
Posts: 476
Credit: 2,380,601,877
RAC: 1,967,968
Message 50002 - Posted: 23 Jul 2018 | 10:39:31 UTC - in response to Message 50001.

I turned the system clock back to before the time the license expired ...

Hm, how did you know in advance?

Erich56
Joined: 1 Jan 15
Posts: 476
Credit: 2,380,601,877
RAC: 1,967,968
Message 50003 - Posted: 23 Jul 2018 | 10:42:31 UTC - in response to Message 49998.

The problem with BOINC is that it runs the apps in an almost opaque environment, so when things go wrong there is no useful indication on the direction to take to a speedy fix.

Did you ever consider coming up with your own application, independent of BOINC (similar to what FAH is doing)?

Stefan
Volunteer moderator
Project developer
Project scientist
Joined: 5 Mar 13
Posts: 329
Credit: 0
RAC: 0
Message 50004 - Posted: 23 Jul 2018 | 11:36:00 UTC - in response to Message 50003.

Yes, but it's a huge development effort. Admittedly, once done you have full control, but we don't have the resources to do this.

Betting Slip
Joined: 5 Jan 09
Posts: 666
Credit: 2,498,095,550
RAC: 219
Message 50005 - Posted: 23 Jul 2018 | 12:05:33 UTC - in response to Message 50004.

This may be incredibly ignorant of me, but why not release the old Windows app with a different version number and a renewed license?
Everything works again... no?
____________
Radio Caroline, the world's most famous offshore pirate radio station.
Great music since April 1964. Support Radio Caroline Team -
Radio Caroline

Profile Retvari Zoltan
Joined: 20 Jan 09
Posts: 1947
Credit: 12,443,600,019
RAC: 5,419,094
Message 50008 - Posted: 23 Jul 2018 | 12:56:12 UTC - in response to Message 49985.

I had been running the Windows App on my fastest system successfully after the 14th July license problems (which occur annually but seem to take you by surprise every time) by turning my system time back.

If your running other projects other than GPUGrid, I would strongly recommend that you don't set your clock back, you could get validation errors or WU's canceled because of deadline over-run.

I would like to add that setting the system clock back will break Windows Update and the automatic updates of many antivirus products, so it is strongly discouraged. It could serve only as a temporary measure; do it only at your own risk.

PappaLitto
Joined: 21 Mar 16
Posts: 402
Credit: 2,793,208,105
RAC: 2,578,005
Message 50009 - Posted: 23 Jul 2018 | 14:01:14 UTC

I will be switching all of my crunching systems to Linux, as I don't see this being fixed anytime soon.

Jacob Klein
Joined: 11 Oct 08
Posts: 1089
Credit: 1,396,732,414
RAC: 998,406
Message 50010 - Posted: 23 Jul 2018 | 14:07:21 UTC

I will be a patient donator, and my systems will gladly do work when the admins have had enough time to fix the problems. Some of you guys get upset too easily, and your knee-jerk reactions are a bit rude.

PappaLitto
Joined: 21 Mar 16
Posts: 402
Credit: 2,793,208,105
RAC: 2,578,005
Message 50011 - Posted: 23 Jul 2018 | 14:33:58 UTC - in response to Message 50010.

I will be a patient donator, and my systems will gladly do work when the admins have had enough time to fix the problems. Some of you guys get upset too easily, and your knee-jerk reactions are a bit rude.

Well their data still needs to be processed at the end of the day, and if we wait around and don't adapt to the situation the simulations will still be sitting there waiting to be processed and we will be that much further away from a cure.

JoergF
Joined: 20 Apr 15
Posts: 271
Credit: 797,167,484
RAC: 1,319,186
Message 50012 - Posted: 23 Jul 2018 | 16:07:24 UTC

I will be a patient donator, and my systems will gladly do work when the admins have had enough time to fix the problems


+1
____________
I would love to see HCF1 protein folding and interaction simulations to help my little boy... someday.

Profile Retvari Zoltan
Joined: 20 Jan 09
Posts: 1947
Credit: 12,443,600,019
RAC: 5,419,094
Message 50014 - Posted: 23 Jul 2018 | 16:59:42 UTC - in response to Message 50011.
Last modified: 23 Jul 2018 | 17:00:35 UTC

I will be a patient donator, and my systems will gladly do work when the admins have had enough time to fix the problems. Some of you guys get upset too easily, and your knee-jerk reactions are a bit rude.

Well their data still needs to be processed at the end of the day, and if we wait around and don't adapt to the situation the simulations will still be sitting there waiting to be processed and we will be that much further away from a cure.

I will be switching all of my crunching systems to Linux as I don't see this will be fixed anytime soon.
+1
I've swapped my GTX 1080 Ti from my main rig and installed Linux on my 3 online hosts last weekend.
I wanted to do it anyway to get rid of WDDM.
I'd like to have SWAN_SYNC in the Linux app too.

Erich56
Joined: 1 Jan 15
Posts: 476
Credit: 2,380,601,877
RAC: 1,967,968
Message 50020 - Posted: 23 Jul 2018 | 19:09:10 UTC - in response to Message 50014.

...I wanted to do it anyway to get rid of WDDM.
I'd like to have SWAN_SYNC in the Linux app too.

good point.
Just out of interest, my question is: what slows down GPUGRID crunching more, the Windows WDDM or the missing Swan_sync on Linux?

James C. Owens
Joined: 16 Apr 09
Posts: 2
Credit: 1,642,321,882
RAC: 1,299,835
Message 50021 - Posted: 23 Jul 2018 | 19:29:45 UTC - in response to Message 49998.
Last modified: 23 Jul 2018 | 19:38:25 UTC

BOINC provides the capabilities/procedures to run a BOINC application in a test environment outside of the normal download/upload process. The licensing problem has nothing to do with BOINC, otherwise other projects would be failing right and left...

This particular issue everyone is having is 100% reproducible on Windows. I am probably providing what you already have, but see...

https://boinc.berkeley.edu/trac/wiki/AppDebug

Jim1348
Joined: 28 Jul 12
Posts: 632
Credit: 1,201,962,935
RAC: 144,313
Message 50022 - Posted: 23 Jul 2018 | 19:32:24 UTC - in response to Message 50020.

Just out of interest, my question is: what slows up GPUGRID crunching more: the Windows WDDM or the missing Swan_sync with Linux?

I will offer my 2 cents. There is a much bigger gain getting rid of WDDM and going to Linux. I have used Swan_sync with Windows only, but I would be surprised if you see much gain using Swan_sync with Linux, even if you can figure out how to do it.

mmonnin
Joined: 2 Jul 16
Posts: 177
Credit: 323,408,389
RAC: 1,558,026
Message 50023 - Posted: 23 Jul 2018 | 19:33:24 UTC - in response to Message 50014.

I will be a patient donator, and my systems will gladly do work when the admins have had enough time to fix the problems. Some of you guys get upset too easily, and your knee-jerk reactions are a bit rude.

Well their data still needs to be processed at the end of the day, and if we wait around and don't adapt to the situation the simulations will still be sitting there waiting to be processed and we will be that much further away from a cure.

I will be switching all of my crunching systems to Linux as I don't see this will be fixed anytime soon.
+1
I've swapped my GTX 1080 Ti from my main rig and installed Linux to my 3 online hosts on the last weekend.
I wanted to do it anyway to get rid of WDDM.
I'd like to have SWAN_SYNC in the Linux app too.


SWAN_SYNC is not needed. I have never had to adjust the priority of CPU apps or reserve a CPU thread for GPU projects in Linux. GPU apps just take what is needed and the CPU apps get what is left.

GPU utilization is just higher in Linux w/o any settings and CPU util is around 15-20% for GPUGrid.

I am currently running FAH on a GPU and 4 BOINC CPU tasks on one PC. Another PC is running GPUGrid and 16 CPU tasks.

Profile Retvari Zoltan
Joined: 20 Jan 09
Posts: 1947
Credit: 12,443,600,019
RAC: 5,419,094
Message 50024 - Posted: 23 Jul 2018 | 20:11:12 UTC - in response to Message 50023.
Last modified: 23 Jul 2018 | 20:18:51 UTC

I've swapped my GTX 1080 Ti from my main rig and installed Linux to my 3 online hosts on the last weekend.
I wanted to do it anyway to get rid of WDDM.
I'd like to have SWAN_SYNC in the Linux app too.

SWAN_SYNC is not needed. I have never had to adjust priority of CPU apps or reserve a CPU thread for GPU projects in Linux. GPU apps just take what is needed and the CPUs apps get what is left.
Well, the stats pages do not support your argument.
1. Before the Windows app broke down, I was the #1 on the "Performance" tab in the "Top average performers (last week Long Runs)" with my three Windows 10 + SWAN_SYNC ON + GTX 1080 Ti hosts (my GPUs are factory overclocked, but I don't use fancy water cooling)
2. Check the following batches under "Top performers per batch" on the Performance page:
PABLO_IDP_P01106_2_ASNP21P_ID
PABLO_IDP_P01106_2_ASNP3P_ID
PABLO_IDP_P01106_4_LEUP14P_ID

You'll find that my GTX 980 Ti beats, or gets very close to GTX 1080 Tis, and GTX TITAN X (Pascal) GPUs running under Linux.
That's because it was running under Windows XP (without WDDM) and with SWAN_SYNC ON.

GPU utilization is just higher in Linux w/o any settings and CPU util is around 15-20% for GPUGrid.
True, but it could be even higher with SWAN_SYNC ON.

I am currently running FAH on a GPU and 4 BOINC CPU tasks on one PC. Another PC is running GPUGrid and 16 CPU tasks.
Well, that's irrelevant for me, as I don't run CPU tasks at all, as I want to optimize my PC for GPUGrid.

All in all: I'd like to have the option under Linux to assign a full CPU thread / core to my GPUGrid tasks with SWAN_SYNC on, as it will make tasks crunch faster on Linux too.

mmonnin
Joined: 2 Jul 16
Posts: 177
Credit: 323,408,389
RAC: 1,558,026
Message 50026 - Posted: 24 Jul 2018 | 2:47:09 UTC - in response to Message 50024.
Last modified: 24 Jul 2018 | 2:48:32 UTC

I've swapped my GTX 1080 Ti from my main rig and installed Linux to my 3 online hosts on the last weekend.
I wanted to do it anyway to get rid of WDDM.
I'd like to have SWAN_SYNC in the Linux app too.

SWAN_SYNC is not needed. I have never had to adjust priority of CPU apps or reserve a CPU thread for GPU projects in Linux. GPU apps just take what is needed and the CPUs apps get what is left.
Well, the stats pages do not support your argument.
1. Before the Windows app broke down, I was the #1 on the "Performance" tab in the "Top average performers (last week Long Runs)" with my three Windows 10 + SWAN_SYNC ON + GTX 1080 Ti hosts (my GPUs are factory overclocked, but I don't use fancy water cooling)
2. Check the following batches on the Performance page the "Top performers per batch":
PABLO_IDP_P01106_2_ASNP21P_ID
PABLO_IDP_P01106_2_ASNP3P_ID
PABLO_IDP_P01106_4_LEUP14P_ID

You'll find that my GTX 980 Ti beats, or gets very close to GTX 1080 Tis, and GTX TITAN X (Pascal) GPUs running under Linux.
That's because it was running under Windows XP (without WDDM) and with SWAN_SYNC ON.

GPU utilization is just higher in Linux w/o any settings and CPU util is around 15-20% for GPUGrid.
True, but it could be even higher with SWAN_SYNC ON.

I am currently running FAH on a GPU and 4 BOINC CPU tasks on one PC. Another PC is running GPUGrid and 16 CPU tasks.
Well, that's irrelevant for me, as I don't run CPU tasks at all, as I want to optimize my PC for GPUGrid.

All in all: I'd like to have the option under Linux to assign a full CPU thread / core to my GPUGrid tasks with SWAN_SYNC on, as it will make tasks crunch faster on Linux too.


I never once mentioned overall performance and was not referencing anything about performance but GPU utilization.

You failed to see that, even with many things running in different situations, the GPU is fully utilized in Linux w/o wasting a CPU thread.

Profile Retvari Zoltan
Joined: 20 Jan 09
Posts: 1947
Credit: 12,443,600,019
RAC: 5,419,094
Message 50028 - Posted: 24 Jul 2018 | 8:42:02 UTC - in response to Message 50026.
Last modified: 24 Jul 2018 | 8:43:09 UTC

You failed to see that even with many things running in different situations that the GPU is fully utilized in Linux w/o wasting a CPU thread.
You failed to see that a GTX 1080 Ti can't be fully utilized under Linux if a fully utilized GTX 980 Ti (previous generation) can achieve 98.66% of its performance.
I see that you and I use our computers in a different manner:
I do not consider feeding a GPU with a full CPU thread as waste, because I know that otherwise I'm wasting 5-15% performance of my GPUs. The lack of SWAN_SYNC in the Linux client forces me to waste that much GPU performance. I want to have this choice, while you don't. Therefore you don't need SWAN_SYNC, while I (and many others) do. So there's no point for us to go on with this argument.
Also, this argument is off topic here. This is my last post in this thread about this topic.

PappaLitto
Joined: 21 Mar 16
Posts: 402
Credit: 2,793,208,105
RAC: 2,578,005
Message 50029 - Posted: 24 Jul 2018 | 10:51:43 UTC

The GPU clocks up to its maximum frequency when computing, let's say 1999 MHz for Pascal. It takes upwards of 1.0620 volts to maintain this frequency. If you aren't feeding the GPU with data at an acceptable rate at this frequency, you are technically wasting power, because most of the cycles go to waste. The GPU draws only slightly more current when fully loaded at the same voltage, so loading it fully makes the whole process more efficient.

I, too, would like SWAN_SYNC on Linux as an OPTION, so we can choose whether or not to use it. But if this means development work for the researchers, then I definitely think fixing the Windows GPU app is the main priority, and making a Windows CPU app comes before this new option.

DRSMT
Joined: 23 Feb 17
Posts: 15
Credit: 287,647,318
RAC: 865,813
Message 50030 - Posted: 24 Jul 2018 | 13:26:42 UTC - in response to Message 50011.

Is our crunching work directly used for a cure / medicine, or is it just published as theoretically simulated / calculated results?

Jacob Klein
Joined: 11 Oct 08
Posts: 1089
Credit: 1,396,732,414
RAC: 998,406
Message 50031 - Posted: 24 Jul 2018 | 13:27:51 UTC - in response to Message 50021.
Last modified: 24 Jul 2018 | 13:31:35 UTC

Toni said (23 Jul 2018 | 8:40:32 UTC):

As far as I know a new app was uploaded but still not working. Licensing is not related to BOINC.

The problem with BOINC is that it runs the apps in an almost opaque environment, so when things go wrong there is no useful indication on the direction to take to a speedy fix.


James C. Owens said (23 Jul 2018 | 19:29:45 UTC):
BOINC provides the capabilities/procedures to run a BOINC application in a test environment outside of the normal download/upload process. The licensing problem has nothing to do with BOINC, otherwise other projects would be failing right and left...

This particular issue everyone is having is 100% reproducible on Windows. I am probably providing what you already have, but see...

https://boinc.berkeley.edu/trac/wiki/AppDebug


Exactly!

I've been told that BOINC provides tons of tools for figuring where and why failures happen. And that link seems very useful.
https://boinc.berkeley.edu/trac/wiki/AppDebug

Also, if the admins are looking for help in ways to solve problems or improve BOINC, they might post to the boinc_projects email list:
https://boinc.berkeley.edu/trac/wiki/EmailLists

Regards,
Jacob

PappaLitto
Joined: 21 Mar 16
Posts: 402
Credit: 2,793,208,105
RAC: 2,578,005
Message 50032 - Posted: 24 Jul 2018 | 14:00:57 UTC - in response to Message 50030.

Is our crunching work directly used for a cure / medicine, or is it just published as theoretically simulated / calculated results?

Keep in mind, both are useful. Other researchers can use simulated protein folding to calculate what to do for their drug.

Betting Slip
Send message
Joined: 5 Jan 09
Posts: 666
Credit: 2,498,095,550
RAC: 219
Level
Phe
Scientific publications
watwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwat
Message 50033 - Posted: 24 Jul 2018 | 14:30:31 UTC - in response to Message 50030.
Last modified: 24 Jul 2018 | 14:32:35 UTC

Is our crunching work directly used for a cure / medicine, or is it just published as theoretically simulated / calculated results?


The short answer is no to the first part and yes to the second.

One of its purposes is as a teaching tool for PhD students. If they discovered a method or an insight that was commercially valuable and helped the biomedical industry, they would patent it and sell or license it.

In the meantime they produce scientific papers with methods or insights that get the student their PhD, or not.

The best you can hope for as far as a cure is concerned is that the simulations may point the way for someone else to explore, or that one of their successful PhD students goes on in later years to make a difference, such as finding a real cure for cancer or another major disease.

But really, they are never going to run anything seriously groundbreaking on your computer
____________
Radio Caroline, the world's most famous offshore pirate radio station.
Great music since April 1964. Support Radio Caroline Team -
Radio Caroline

JoergF
Avatar
Send message
Joined: 20 Apr 15
Posts: 271
Credit: 797,167,484
RAC: 1,319,186
Level
Glu
Scientific publications
watwat
Message 50034 - Posted: 24 Jul 2018 | 15:14:34 UTC - in response to Message 50033.
Last modified: 24 Jul 2018 | 15:20:40 UTC

The short answer is no to the first part and yes to the second.


Well, my answer to the first part would be a little more optimistic, say "not...yet", as it is all about computing power. Imagine, modern high-end GPUs are now as powerful as supercomputers back in the year 2000. Still too slow to handle big proteins, but there is some progress. The upcoming Turing generation seems to be again 20-40% faster than its predecessor Pascal, and this will continue until tunneling effects obstruct further shrinks. Having said this, there are some new technologies in development to reduce those effects. And of course quantum computers will be in the ascendant in a couple of years, as big companies like IBM, Microsoft or Google put a lot of capital into them (but for reasons other than drug science).

Let's keep on crunching and see where this road goes. One thing is for sure: computer science and medicine will be entirely different 10 years from now.
____________
I would love to see HCF1 protein folding and interaction simulations to help my little boy... someday.

JoergF
Avatar
Send message
Joined: 20 Apr 15
Posts: 271
Credit: 797,167,484
RAC: 1,319,186
Level
Glu
Scientific publications
watwat
Message 50036 - Posted: 25 Jul 2018 | 8:09:08 UTC

As I wrote, there is some progress...
https://journals.aps.org/prx/abstract/10.1103/PhysRevX.8.031022
____________
I would love to see HCF1 protein folding and interaction simulations to help my little boy... someday.

tullio
Send message
Joined: 8 May 18
Posts: 126
Credit: 11,866,185
RAC: 81,254
Level
Pro
Scientific publications
wat
Message 50037 - Posted: 25 Jul 2018 | 8:28:00 UTC

It appears to use a 4-qubit quantum computer, probably provided by Google since one of the authors is a Google person.
Tullio

Stefan
Volunteer moderator
Project developer
Project scientist
Send message
Joined: 5 Mar 13
Posts: 329
Credit: 0
RAC: 0
Level

Scientific publications
wat
Message 50038 - Posted: 25 Jul 2018 | 8:30:40 UTC

Eh, I would not go as far as to say it's mostly a tool for PhD students as BettingSlip mentioned (although I'm sure he didn't mean it in a negative way). The theoretical research being published is used to progress science and the specific field, it's not like this work ends up as fluff for a PhD thesis.

Simulations we have done have provided interesting insights in disordered proteins, protein-protein associations and more which can be used by the industry as BettingSlip mentioned.

Currently we are also trying to get more into drug design in our lab, so you might see interesting, more direct applications in the next few years.

But yes, the connection between theoretical and direct application is often hard to see or appreciate, which is what leads to lack of funding for basic research which is arguably inefficient in the long term.

DRSMT
Send message
Joined: 23 Feb 17
Posts: 15
Credit: 287,647,318
RAC: 865,813
Level
Asn
Scientific publications
wat
Message 50040 - Posted: 25 Jul 2018 | 8:54:03 UTC - in response to Message 50038.

Eh, I would not go as far as to say it's mostly a tool for PhD students as BettingSlip mentioned (although I'm sure he didn't mean it in a negative way). The theoretical research being published is used to progress science and the specific field, it's not like this work ends up as fluff for a PhD thesis.

Simulations we have done have provided interesting insights in disordered proteins, protein-protein associations and more which can be used by the industry as BettingSlip mentioned.

Currently we are also trying to get more into drug design in our lab, so you might see interesting, more direct applications in the next few years.

But yes, the connection between theoretical and direct application is often hard to see or appreciate, which is what leads to lack of funding for basic research which is arguably inefficient in the long term.

Good clarification - then it's just the way I already assumed it to be.

PappaLitto
Send message
Joined: 21 Mar 16
Posts: 402
Credit: 2,793,208,105
RAC: 2,578,005
Level
Phe
Scientific publications
watwat
Message 50045 - Posted: 25 Jul 2018 | 11:00:53 UTC

In the realm of non-profit research, it is entirely collaborative. Even if what someone is researching doesn't seem like it would make a difference, what they discovered could be the holy grail for another research team. You see this time and time again throughout our scientific history.

Erich56
Send message
Joined: 1 Jan 15
Posts: 476
Credit: 2,380,601,877
RAC: 1,967,968
Level
Phe
Scientific publications
watwatwatwat
Message 50047 - Posted: 25 Jul 2018 | 12:01:11 UTC - in response to Message 50045.

Question related to the thread-title:

any idea when an app for Windows will be available?
Further, will there be a version for XP as well?

Betting Slip
Send message
Joined: 5 Jan 09
Posts: 666
Credit: 2,498,095,550
RAC: 219
Level
Phe
Scientific publications
watwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwat
Message 50048 - Posted: 25 Jul 2018 | 12:42:25 UTC - in response to Message 50047.

Question related to the thread-title:

any idea when an app for Windows will be available?
Further, will there be a version for XP as well?


I know you're not directing your question at me but as far as XP is concerned read this post https://gpugrid.net/forum_thread.php?id=4552&nowrap=true#46982
____________
Radio Caroline, the world's most famous offshore pirate radio station.
Great music since April 1964. Support Radio Caroline Team -
Radio Caroline

DRSMT
Send message
Joined: 23 Feb 17
Posts: 15
Credit: 287,647,318
RAC: 865,813
Level
Asn
Scientific publications
wat
Message 50049 - Posted: 25 Jul 2018 | 12:47:08 UTC - in response to Message 50045.

In the realm of non-profit research, it is entirely collaborative. Even if what someone is researching doesn't seem like it would make a difference, what they discovered could be the holy grail for another research team. You see this time and time again throughout our scientific history.

I am an electrical engineer and software developer myself, so I'm aware of how development processes take place in general. I also know that our work is (or can be) helpful, otherwise I would not be here, of course. But this did not answer my question and is nothing new to me. Then finally Stefan answered my question perfectly. I would like to see direct drug design in the future :)

Betting Slip
Send message
Joined: 5 Jan 09
Posts: 666
Credit: 2,498,095,550
RAC: 219
Level
Phe
Scientific publications
watwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwat
Message 50050 - Posted: 25 Jul 2018 | 13:23:06 UTC - in response to Message 50049.

I would like to see direct drug design in the future :)


As exciting as that prospect might sound to all contributors to public distributed computing projects such as this one it will never happen.
The reasons for this are many and you may like to read this
https://sciencenode.org/feature/isgtw-opinion-volunteer-computing-grid-or-not-grid.php

There is always the inconvenient fact that most (if not all) scientists have little confidence in public distributed computing.
____________
Radio Caroline, the world's most famous offshore pirate radio station.
Great music since April 1964. Support Radio Caroline Team -
Radio Caroline

Erich56
Send message
Joined: 1 Jan 15
Posts: 476
Credit: 2,380,601,877
RAC: 1,967,968
Level
Phe
Scientific publications
watwatwatwat
Message 50051 - Posted: 25 Jul 2018 | 13:25:55 UTC - in response to Message 50048.

Question related to the thread-title:

any idea when an app for Windows will be available?
Further, will there be a version for XP as well?


I know you're not directing your question at me but as far as XP is concerned read this post https://gpugrid.net/forum_thread.php?id=4552&nowrap=true#46982

I know this thread and its content. Still I don't stop hoping that they may have changed their mind and will once more provide an app for XP.
So, we'll see ...

Betting Slip
Send message
Joined: 5 Jan 09
Posts: 666
Credit: 2,498,095,550
RAC: 219
Level
Phe
Scientific publications
watwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwat
Message 50052 - Posted: 25 Jul 2018 | 13:34:29 UTC - in response to Message 50051.

Question related to the thread-title:

any idea when an app for Windows will be available?
Further, will there be a version for XP as well?


I know you're not directing your question at me but as far as XP is concerned read this post https://gpugrid.net/forum_thread.php?id=4552&nowrap=true#46982

I know this thread and its content. Still I don't stop hoping that they may have changed their mind and will once more provide an app for XP.
So, we'll see ...


The only reply I can give to someone who asks a question that they already have read the official answer to is "dream on"
____________
Radio Caroline, the world's most famous offshore pirate radio station.
Great music since April 1964. Support Radio Caroline Team -
Radio Caroline

Jim1348
Send message
Joined: 28 Jul 12
Posts: 632
Credit: 1,201,962,935
RAC: 144,313
Level
Met
Scientific publications
watwatwatwatwatwatwatwatwatwat
Message 50053 - Posted: 25 Jul 2018 | 14:32:16 UTC - in response to Message 50050.

There is always the inconvenient fact that most (if not all) scientists have little confidence in public distributed computing.

But confidence in what? It may not be used for developing commercial drugs, but for basic science the results could be quite useful.

That really depends on how relevant the questions are that the researchers are asking. The real limitation is that in the academic world, they may not know what issues to investigate that are most relevant. A tie-in between the university and industry (as in an advisory board) might help that.

kain
Send message
Joined: 3 Sep 14
Posts: 139
Credit: 229,750,765
RAC: 404,889
Level
Leu
Scientific publications
watwatwatwatwat
Message 50056 - Posted: 25 Jul 2018 | 18:02:39 UTC

Well guys... It doesn't work that way. Science as a "whole thing" is so complex and unpredictable that we can't even assume what is going to be important and what is not.
One hundred years ago Maria Sklodowska Curie found out that some strange piece of scrap was destroying photographic film. Who cares, right? There was war, people were hungry and had other "big and now" problems. But she studied it and now we have radiotherapy, MRI, gamma knife and many other things. Who could ever know?
Science is a team sport and we are part of the team :)

PappaLitto
Send message
Joined: 21 Mar 16
Posts: 402
Credit: 2,793,208,105
RAC: 2,578,005
Level
Phe
Scientific publications
watwat
Message 50057 - Posted: 25 Jul 2018 | 18:32:31 UTC - in response to Message 50056.

Well guys... It doesn't work that way. Science as a "whole thing" is so complex and unpredictable that we can't even assume what is going to be important and what is not.
One hundred years ago Maria Sklodowska Curie found out that some strange piece of scrap was destroying photographic film. Who cares, right? There was war, people were hungry and had other "big and now" problems. But she studied it and now we have radiotherapy, MRI, gamma knife and many other things. Who could ever know?
Science is a team sport and we are part of the team :)

+1

tullio
Send message
Joined: 8 May 18
Posts: 126
Credit: 11,866,185
RAC: 81,254
Level
Pro
Scientific publications
wat
Message 50058 - Posted: 25 Jul 2018 | 18:43:02 UTC

We also had nuclear bombs, nuclear fission reactors and, hopefully, fusion reactors. Science can make both good and bad fruits.
Tullio
____________

JoergF
Avatar
Send message
Joined: 20 Apr 15
Posts: 271
Credit: 797,167,484
RAC: 1,319,186
Level
Glu
Scientific publications
watwat
Message 50059 - Posted: 25 Jul 2018 | 18:59:20 UTC - in response to Message 50058.
Last modified: 25 Jul 2018 | 19:05:05 UTC

We also had nuclear bombs, nuclear fission reactors and, hopefully, fusion reactors. Science can make both good and bad fruits.
Tullio


Yea, but relax ... we are the good guys. Claims to the contrary are FAKE NEWS! ;)
____________
I would love to see HCF1 protein folding and interaction simulations to help my little boy... someday.

tullio
Send message
Joined: 8 May 18
Posts: 126
Credit: 11,866,185
RAC: 81,254
Level
Pro
Scientific publications
wat
Message 50060 - Posted: 25 Jul 2018 | 19:05:51 UTC - in response to Message 50059.
Last modified: 25 Jul 2018 | 19:06:52 UTC

Are we? I read the Bulletin of the Atomic Scientists. The USA has 4850 nuclear warheads, 160 of which are stored in Italy, 20 at an air base 200 km from my home. Russia has the same amount, and I don't mention UK, France, China, India, Pakistan and Israel.
Tullio

Jim1348
Send message
Joined: 28 Jul 12
Posts: 632
Credit: 1,201,962,935
RAC: 144,313
Level
Met
Scientific publications
watwatwatwatwatwatwatwatwatwat
Message 50061 - Posted: 25 Jul 2018 | 19:08:41 UTC - in response to Message 50060.

Are we? I read the Bulletin of the Atomic Scientists. The USA has 4850 nuclear warheads, 160 of which are stored in Italy, 20 at an air base 200 km from my home. Russia has the same amount, and I don't mention UK, France, China, India, Pakistan and Israel.
Tullio

We can withdraw the U.S. ones from Europe. Then the Russians will be free to move in theirs.

tullio
Send message
Joined: 8 May 18
Posts: 126
Credit: 11,866,185
RAC: 81,254
Level
Pro
Scientific publications
wat
Message 50062 - Posted: 25 Jul 2018 | 19:20:03 UTC - in response to Message 50061.

I don't think UK and France would allow it. They have nuclear warheads too.
Tullio

Jacob Klein
Send message
Joined: 11 Oct 08
Posts: 1089
Credit: 1,396,732,414
RAC: 998,406
Level
Met
Scientific publications
watwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwat
Message 50063 - Posted: 25 Jul 2018 | 19:33:50 UTC

Let's keep the explosions on-topic -- Let's talk about exploding GPUGrid apps and tasks :)

w1hue
Send message
Joined: 28 Sep 09
Posts: 7
Credit: 46,872,950
RAC: 27,957
Level
Val
Scientific publications
watwat
Message 50064 - Posted: 25 Jul 2018 | 19:53:39 UTC - in response to Message 50063.

Let's keep the explosions on-topic -- Let's talk about exploding GPUGrid apps and tasks :)

Why not keep it totally on topic -- shut up and wait for an announcement that the damn thing is fixed!!

MatthiasLeimbach
Send message
Joined: 18 Mar 09
Posts: 2
Credit: 400,578,404
RAC: 294,247
Level
Gln
Scientific publications
watwat
Message 50065 - Posted: 25 Jul 2018 | 19:57:59 UTC

Hi Gianni,

Do you know how it is possible to break an application? There are backups.

I think it may be convenient to push people towards the Linux application by way of a broken Windows GPU application? To me it doesn't feel like the full truth, and I would like more of an answer about what happened, please.

If you want me to switch to Linux, please say so.

Regards

Jacob Klein
Send message
Joined: 11 Oct 08
Posts: 1089
Credit: 1,396,732,414
RAC: 998,406
Level
Met
Scientific publications
watwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwat
Message 50066 - Posted: 25 Jul 2018 | 20:55:01 UTC - in response to Message 50064.

Let's keep the explosions on-topic -- Let's talk about exploding GPUGrid apps and tasks :)

Why not keep it totally on topic -- shut up and wait for an announcement that the damn thing is fixed!!


That was a bit unnecessarily rude. I also hope that they can fix it.

Erich56
Send message
Joined: 1 Jan 15
Posts: 476
Credit: 2,380,601,877
RAC: 1,967,968
Level
Phe
Scientific publications
watwatwatwat
Message 50068 - Posted: 26 Jul 2018 | 16:50:18 UTC

@ GDF, Toni, Stefan ...

any vague idea when the Windows app will be available?

Toni
Volunteer moderator
Project administrator
Project developer
Project scientist
Send message
Joined: 9 Dec 08
Posts: 746
Credit: 4,285,282
RAC: 0
Level
Ala
Scientific publications
watwatwatwat
Message 50069 - Posted: 26 Jul 2018 | 17:40:12 UTC - in response to Message 50068.
Last modified: 26 Jul 2018 | 17:40:45 UTC

We are working round the clock to restore it... sorry for the delay. It's not easy. By the way the cuda65 app should be ok, although there are no WUs.

Profile [AF] fansyl
Send message
Joined: 26 Sep 13
Posts: 8
Credit: 759,769,022
RAC: 93,373
Level
Glu
Scientific publications
watwatwatwatwatwatwatwatwat
Message 50070 - Posted: 26 Jul 2018 | 18:02:44 UTC

We know you're doing your best, be strong and good luck!

Don't listen to the haters.

Bedrich Hajek
Send message
Joined: 28 Mar 09
Posts: 345
Credit: 4,211,713,159
RAC: 1,887,545
Level
Arg
Scientific publications
watwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwat
Message 50072 - Posted: 26 Jul 2018 | 22:15:44 UTC - in response to Message 50069.

We are working round the clock to restore it... sorry for the delay. It's not easy. By the way the cuda65 app should be ok, although there are no WUs.



It works on windows xp.

http://www.gpugrid.net/result.php?resultid=18260692


Toni
Volunteer moderator
Project administrator
Project developer
Project scientist
Send message
Joined: 9 Dec 08
Posts: 746
Credit: 4,285,282
RAC: 0
Level
Ala
Scientific publications
watwatwatwat
Message 50073 - Posted: 27 Jul 2018 | 5:07:18 UTC - in response to Message 50072.

It should work as before. I sent some test WUs. Work to come.

Erich56
Send message
Joined: 1 Jan 15
Posts: 476
Credit: 2,380,601,877
RAC: 1,967,968
Level
Phe
Scientific publications
watwatwatwat
Message 50074 - Posted: 27 Jul 2018 | 8:02:50 UTC

Toni, it seems to work well - both for Windows 10 and Windows XP :-)))

Many thanks for the efforts put in by everybody at GPUGRID !

Toni
Volunteer moderator
Project administrator
Project developer
Project scientist
Send message
Joined: 9 Dec 08
Posts: 746
Credit: 4,285,282
RAC: 0
Level
Ala
Scientific publications
watwatwatwat
Message 50076 - Posted: 27 Jul 2018 | 8:06:57 UTC - in response to Message 50074.

...and this thread is back on topic ;)

valterc
Send message
Joined: 21 Jun 10
Posts: 20
Credit: 2,029,743,813
RAC: 2,943,419
Level
Phe
Scientific publications
watwatwatwatwatwatwatwatwatwatwatwatwatwat
Message 50078 - Posted: 27 Jul 2018 | 9:12:09 UTC - in response to Message 50076.

Got one on a Windows box:

7/27/2018 6:35:48 AM | GPUGRID | Aborting task e37s19_e36s5p0f20-ADRIA_FOLDT1019_v2_predicted_pred_ss_contacts_50_T1019s1_4-0-1-RND2194_2: exceeded elapsed time limit 7288.00 (250000000.00G/34302.98G)

PappaLitto
Send message
Joined: 21 Mar 16
Posts: 402
Credit: 2,793,208,105
RAC: 2,578,005
Level
Phe
Scientific publications
watwat
Message 50080 - Posted: 27 Jul 2018 | 10:42:10 UTC

What did you guys have to do to fix the application?

Erich56
Send message
Joined: 1 Jan 15
Posts: 476
Credit: 2,380,601,877
RAC: 1,967,968
Level
Phe
Scientific publications
watwatwatwat
Message 50081 - Posted: 27 Jul 2018 | 11:05:31 UTC - in response to Message 50074.

Toni, it seems to work well - both for Windows 10 and Windows XP :-)))
Many thanks for the efforts put in by everybody at GPUGRID !

I am afraid I was too early with my above statement :-(

The task on the Windows 10 machine broke off after 8,963 seconds with:

197 (0xc5) EXIT_TIME_LIMIT_EXCEEDED
exceeded elapsed time limit 8947.10 (250000000.00G/27942.01G)


for more details, see here: http://gpugrid.net/result.php?resultid=18262430

Hona
Send message
Joined: 21 Sep 10
Posts: 2
Credit: 463,987,489
RAC: 348,178
Level
Gln
Scientific publications
watwatwatwatwatwatwatwatwatwatwatwatwatwatwatwat
Message 50082 - Posted: 27 Jul 2018 | 12:35:19 UTC

The same here.

http://www.gpugrid.net/result.php?resultid=18261977


Profile nenym
Send message
Joined: 31 Mar 09
Posts: 137
Credit: 725,547,168
RAC: 1,278,495
Level
Lys
Scientific publications
watwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwat
Message 50084 - Posted: 27 Jul 2018 | 12:51:15 UTC
Last modified: 27 Jul 2018 | 12:54:54 UTC

The same here http://www.gpugrid.net/result.php?resultid=18262508

<core_client_version>7.6.33</core_client_version>
<![CDATA[
<message>
exceeded elapsed time limit 9498.88 (250000000.00G/26318.88G)
</message>
<stderr_txt>

other task http://www.gpugrid.net/result.php?resultid=18262784

Toni
Volunteer moderator
Project administrator
Project developer
Project scientist
Send message
Joined: 9 Dec 08
Posts: 746
Credit: 4,285,282
RAC: 0
Level
Ala
Scientific publications
watwatwatwat
Message 50086 - Posted: 27 Jul 2018 | 12:56:27 UTC - in response to Message 50085.
Last modified: 27 Jul 2018 | 13:09:49 UTC

I think something (either failures, or likely recent short tasks) made some machine over-optimistic about its own fp-ops. As a consequence, BOINC estimated that tasks could be run in a few hours, which is untrue.

Try re-running the benchmarks. https://lhcathome.cern.ch/lhcathome/forum_thread.php?id=4273

Betting Slip
Send message
Joined: 5 Jan 09
Posts: 666
Credit: 2,498,095,550
RAC: 219
Level
Phe
Scientific publications
watwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwat
Message 50087 - Posted: 27 Jul 2018 | 15:01:51 UTC - in response to Message 50086.

I think something (either failures, or likely recent short tasks) made some machine over-optimistic about its own fp-ops. As a consequence, BOINC estimated that tasks could be run in a few hours, which is untrue.

Try re-running the benchmarks. https://lhcathome.cern.ch/lhcathome/forum_thread.php?id=4273


Can't be true, my machines have only run GPUGrid long WUs and have the same problem.

Anyway GpuGrid was my last BOINC project and I have decided to hang up my BOINC boots.
The satisfaction of contributing has just left me.

Good Luck to all.
____________
Radio Caroline, the world's most famous offshore pirate radio station.
Great music since April 1964. Support Radio Caroline Team -
Radio Caroline

Erich56
Send message
Joined: 1 Jan 15
Posts: 476
Credit: 2,380,601,877
RAC: 1,967,968
Level
Phe
Scientific publications
watwatwatwat
Message 50088 - Posted: 27 Jul 2018 | 15:04:54 UTC - in response to Message 50086.
Last modified: 27 Jul 2018 | 15:25:28 UTC

Try re-running the benchmarks. https://lhcathome.cern.ch/lhcathome/forum_thread.php?id=4273

I guess I gathered what you mean, but this machine has not run any BOINC tasks in the meantime. So there should not be any (too short) runtime values somewhere deep in BOINC.

Anyway, I followed this advice (shown in your link) with a newly downloaded GPUGRID task:
You can help yourself out of this situation by increasing <rsc_fpops_bound> of Sixtrack tasks 1000 times larger or possible even more
However, I increased the value by a factor of 10; I guess this should be sufficient.

So, I'll see what happens
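
A minimal sketch of the client-side workaround described above, assuming the task's `<rsc_fpops_bound>` element is stored as plain XML text (commonly in the client's client_state.xml, which is stated here as an assumption) and that the BOINC client is stopped before the file is edited. The function name `scale_fpops_bound` and the sample string are hypothetical; the code only transforms a string and does not touch any file.

```python
import re

# Matches one <rsc_fpops_bound> element and captures its numeric value.
_BOUND = re.compile(r"(<rsc_fpops_bound>)([^<]+)(</rsc_fpops_bound>)")


def scale_fpops_bound(xml_text: str, factor: float) -> str:
    """Return xml_text with every <rsc_fpops_bound> value multiplied by factor."""

    def _scale(match: re.Match) -> str:
        new_value = float(match.group(2)) * factor
        # Keep the six-decimal style used in the workunit XML quoted in this thread.
        return f"{match.group(1)}{new_value:.6f}{match.group(3)}"

    return _BOUND.sub(_scale, xml_text)


# Hypothetical fragment using the bound implied by the error messages (250000000.00G).
sample = (
    "<workunit>"
    "<rsc_fpops_est>5000000000000000.000000</rsc_fpops_est>"
    "<rsc_fpops_bound>250000000000000000.000000</rsc_fpops_bound>"
    "</workunit>"
)
patched = scale_fpops_bound(sample, 10)
print(patched)
```

Only the bound is changed; `<rsc_fpops_est>` is left alone, so the runtime estimate shown in the manager should stay the same while the abort threshold moves out by the chosen factor.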

kksplace
Send message
Joined: 4 Mar 18
Posts: 12
Credit: 5,192,825
RAC: 0
Level
Ser
Scientific publications
wat
Message 50089 - Posted: 27 Jul 2018 | 15:05:33 UTC

Yeah! Two WUs downloaded. I love the sound of my GPU fans spinning up.

[VENETO] boboviz
Send message
Joined: 10 Sep 10
Posts: 105
Credit: 253,107
RAC: 44
Level

Scientific publications
wat
Message 50090 - Posted: 27 Jul 2018 | 15:20:38 UTC - in response to Message 50087.

Anyway GpuGrid was my last BOINC project and I have decided to hang up my BOINC boots. The satisfaction of contributing has just left me.


It's a pity...

Toni
Volunteer moderator
Project administrator
Project developer
Project scientist
Send message
Joined: 9 Dec 08
Posts: 746
Credit: 4,285,282
RAC: 0
Level
Ala
Scientific publications
watwatwatwat
Message 50091 - Posted: 27 Jul 2018 | 15:30:17 UTC - in response to Message 50088.

Have you also tried selecting the "re-run benchmarks" (or something) menu option?

Toni
Volunteer moderator
Project administrator
Project developer
Project scientist
Send message
Joined: 9 Dec 08
Posts: 746
Credit: 4,285,282
RAC: 0
Level
Ala
Scientific publications
watwatwatwat
Message 50092 - Posted: 27 Jul 2018 | 15:36:20 UTC - in response to Message 50088.
Last modified: 27 Jul 2018 | 15:39:04 UTC

I guess I gathered what you mean, but this machine has not run any BOINC tasks in the meantime. So there should not be any (too short) runtime values somewhere deep in BOINC.

You are right. I don't know what to say except that it's frustrating on this side too. There is an excess of hidden state and undocumented checks. My hope is that it will resolve by itself at some point (maybe by resetting the project).

Jacob Klein
Send message
Joined: 11 Oct 08
Posts: 1089
Credit: 1,396,732,414
RAC: 998,406
Level
Met
Scientific publications
watwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwat
Message 50094 - Posted: 27 Jul 2018 | 16:01:31 UTC - in response to Message 50092.

Do you require help from some of the BOINC devs? They're pretty responsive on the BOINC Projects email group, and if there is some sort of transparency problem, they'd want to hear about it.

Richard Haselgrove
Send message
Joined: 11 Jul 09
Posts: 883
Credit: 1,767,597,020
RAC: 1,274,807
Level
His
Scientific publications
watwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwat
Message 50095 - Posted: 27 Jul 2018 | 17:09:09 UTC - in response to Message 50094.

Do you require help from some of the BOINC devs? They're pretty responsive on the BOINC Projects email group, and if there is some sort of transparency problem, they'd want to hear about it.

I've been sitting in the same conference room as about 25 BOINC developers for the last three days. If someone had called, we could have answered...

But today was the group walk in the Oxfordshire countryside, and we meet in an hour for our final group meal before they get their 5 am flights home. I have a simple 200 mile drive home before I'm reunited with my GPUs - I'll look at it Sunday, report Monday.

Running benchmarks won't solve it, because they measure the CPU speed, and this is a GPU app. But you're on the right lines - the initial speed estimate will be low, and the quickest workaround will be to increase the

<rsc_fpops_bound>

for the new v9.22 app by a factor of at least 10 and perhaps 100. You may have to generate new workunits with the uprated bound. Runtime estimates will almost certainly appear to users as vastly inflated in the initial stages, but hang in there - they will become 'accurate' (-ish) after the first 11 completed tasks.

More when I can eyeball it.

Richard Haselgrove
Send message
Joined: 11 Jul 09
Posts: 883
Credit: 1,767,597,020
RAC: 1,274,807
Level
His
Scientific publications
watwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwat
Message 50098 - Posted: 27 Jul 2018 | 17:58:46 UTC

Thinking about it in the shower, that's the wrong way round - apps faster than expected shouldn't cause a problem.

Jacob, while I'm drinking/eating/drinking/sleeping/travelling/sleeping, can you pull the guts out of the <app_version> for 9.22 and a matching WU&task - just the BOINC metadata, not the file references - and post them for me to look at before I get home. Even better if you could subsequently run it and point me to the outcome online. I'm wondering if the project might have slipped half-a-dozen orders of magnitude in <rsc_fpops_est>.

Now I've got a bus to catch.

Jacob Klein
Send message
Joined: 11 Oct 08
Posts: 1089
Credit: 1,396,732,414
RAC: 998,406
Level
Met
Scientific publications
watwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwat
Message 50099 - Posted: 27 Jul 2018 | 18:13:21 UTC - in response to Message 50098.

No, I just started a 4 day vacation, sorry.

Erich56
Send message
Joined: 1 Jan 15
Posts: 476
Credit: 2,380,601,877
RAC: 1,967,968
Level
Phe
Scientific publications
watwatwatwat
Message 50100 - Posted: 27 Jul 2018 | 18:14:58 UTC

The interesting thing is that this problem does NOT come up in the cuda65 app (for Windows XP), but only in the cuda80 app (for Windows10).

tullio
Send message
Joined: 8 May 18
Posts: 126
Credit: 11,866,185
RAC: 81,254
Level
Pro
Scientific publications
wat
Message 50101 - Posted: 27 Jul 2018 | 19:05:01 UTC

I've downloaded 2 long-run tasks on my Windows 10 PC and one is running. It seems to run OK but, according to the Task Manager, it uses both the CPU and the GPU (GTX 1050 Ti) very lightly compared to SETI@home GPU tasks on the same host.
Tullio

Erich56
Send message
Joined: 1 Jan 15
Posts: 476
Credit: 2,380,601,877
RAC: 1,967,968
Level
Phe
Scientific publications
watwatwatwat
Message 50102 - Posted: 27 Jul 2018 | 19:11:54 UTC

What I noticed so far is that with the Cuda_80 app the GPU load now is about 75%, whereas for the same type of task, crunched on WinXP with the Cuda_65 app, the GPU load is between 96% and 98% (like it was with the former Cuda_80 app, too).

This somehow seems interesting, if not to say strange.

Any explanation for this?

Zalster
Avatar
Send message
Joined: 26 Feb 14
Posts: 79
Credit: 2,243,428,463
RAC: 7,102,174
Level
Phe
Scientific publications
watwatwat
Message 50103 - Posted: 27 Jul 2018 | 19:16:22 UTC

Just had 2 cuda 8 GPU work units error out due to the time limit being exceeded on my 1080 Tis on Win 7.

Reset the project and will see if that fixes the problem.
____________

JoergF
Avatar
Send message
Joined: 20 Apr 15
Posts: 271
Credit: 797,167,484
RAC: 1,319,186
Level
Glu
Scientific publications
watwat
Message 50104 - Posted: 27 Jul 2018 | 20:25:09 UTC
Last modified: 27 Jul 2018 | 20:27:04 UTC

happens to me also

https://www.gpugrid.net/result.php?resultid=18277793

I have already reset the project and re-run the benchmarks; it apparently didn't change anything. Shall we power down our machines again?
____________
I would love to see HCF1 protein folding and interaction simulations to help my little boy... someday.

Profile Retvari Zoltan
Avatar
Send message
Joined: 20 Jan 09
Posts: 1947
Credit: 12,443,600,019
RAC: 5,419,094
Level
Trp
Scientific publications
watwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwat
Message 50105 - Posted: 27 Jul 2018 | 22:27:22 UTC - in response to Message 50098.
Last modified: 27 Jul 2018 | 22:51:17 UTC

Thinking about it in the shower, that's the wrong way round - apps faster than expected shouldn't cause a problem.

Jacob, while I'm drinking/eating/drinking/sleeping/travelling/sleeping, can you pull the guts out of the <app_version> for 9.22 and a matching WU&task - just the BOINC metadata, not the file references - and post them for me to look at before I get home. Even better if you could subsequently run it and point me to the outcome online. I'm wondering if the project might have slipped half-a-dozen orders of magnitude in <rsc_fpops_est>.

Now I've got a bus to catch.

Do you need this?:
<app_version>
<app_name>acemdlong</app_name>
<version_num>922</version_num>
<platform>windows_intelx86</platform>
<avg_ncpus>1.000000</avg_ncpus>
<flops>43004890022276.586000</flops>
<plan_class>cuda80</plan_class>
<api_version>6.7.0</api_version>
...
<coproc>
<type>NVIDIA</type>
<count>1.000000</count>
</coproc>
<gpu_ram>512.000000</gpu_ram>
<dont_throttle/>
</app_version>
<workunit>
<name>e38s4_e29s9p0f212-ADRIA_FOLDT1015_v2_predicted_pred_ss_contacts_50_T1015s1_3-0-1-RND4166</name>
<app_name>acemdlong</app_name>
<version_num>922</version_num>
<rsc_fpops_est>5000000000000000.000000</rsc_fpops_est>
<rsc_fpops_bound>250000000000000000000.000000</rsc_fpops_bound>
<rsc_memory_bound>300000000.000000</rsc_memory_bound>
<rsc_disk_bound>4000000000.000000</rsc_disk_bound>
...
</workunit>

I've got lost in that many zeroes, so I've cut 12 of them:
App flops: 43e12
rsc_fpops_est: 5 000e12
rsc_fpops_bound: 250 000 000e12

The outcome will be here.
I think it will succeed.
Judging by the previous error message:

exceeded elapsed time limit 5659.12 (250000000.00G/43782.46G)

the rsc_fpops_bound was only 250 000e12 before.
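
A minimal sketch of how the numbers in that error message can be read, assuming the trailing "G" means 10^9 (an assumption, although it agrees with the 250 000e12 figure above). The function name parse_limit_message and the regular expression are made up for illustration and are not part of BOINC.

```python
import re

# Matches the client's message, e.g.
# "exceeded elapsed time limit 5659.12 (250000000.00G/43782.46G)"
LIMIT_RE = re.compile(
    r"exceeded elapsed time limit\s+([\d.]+)\s+\(([\d.]+)G/([\d.]+)G\)"
)

def parse_limit_message(message):
    """Extract the time limit, the fpops bound and the flops rating."""
    match = LIMIT_RE.search(message)
    if match is None:
        raise ValueError("not an elapsed-time-limit message")
    limit_s, bound_g, flops_g = (float(x) for x in match.groups())
    # Assumption: "G" is a plain 1e9 multiplier.
    return {
        "limit_seconds": limit_s,
        "fpops_bound": bound_g * 1e9,
        "flops": flops_g * 1e9,
    }

old = parse_limit_message(
    "exceeded elapsed time limit 5659.12 (250000000.00G/43782.46G)"
)
# rsc_fpops_bound of the workunit posted above
new_bound = 250000000000000000000.0

print("old bound:", old["fpops_bound"])
print("new bound / old bound:", new_bound / old["fpops_bound"])
```

The ratio printed at the end is the factor by which the bound in the new workunit exceeds the one in the earlier error message.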

Richard Haselgrove
Send message
Joined: 11 Jul 09
Posts: 883
Credit: 1,767,597,020
RAC: 1,274,807
Level
His
Scientific publications
watwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwat
Message 50106 - Posted: 27 Jul 2018 | 23:03:58 UTC - in response to Message 50105.

That's a good start - thanks.

<flops>43004890022276.586000</flops>

is 43,004,890,022,276 or 43 teraflops

Your host 113852 had an APR (processing speed) of 519 GigaFlops under v9.18

At the new speed, the tasks would run for (flops/(flops/sec)) seconds

rsc_fpops_est/flops

5,000,000,000,000,000 / 43,004,890,022,276

116 seconds (initial runtime estimate)

and be deemed to have 'run too long' after

rsc_fpops_bound/flops

250,000,000,000,000,000,000 / 43,004,890,022,276

5,813,292 seconds
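
The arithmetic above can be reproduced with a short Python sketch. The values are copied from the <app_version> and <workunit> data quoted earlier; the helper name seconds_at_speed is made up for illustration, and the "old" bound is only the guessed value three orders of magnitude lower, not something read from a real workunit.

```python
def seconds_at_speed(fpops, flops):
    """Time in seconds to do `fpops` operations at `flops` operations/second."""
    return fpops / flops

# Values from the posted <app_version> and <workunit> blocks.
flops = 43004890022276.586
rsc_fpops_est = 5000000000000000.0
rsc_fpops_bound = 250000000000000000000.0

estimate = seconds_at_speed(rsc_fpops_est, flops)   # initial runtime estimate
limit = seconds_at_speed(rsc_fpops_bound, flops)    # 'run too long' cut-off

# Hypothetical earlier bound, three orders of magnitude smaller (a guess).
old_limit = seconds_at_speed(rsc_fpops_bound / 1000, flops)

print(f"initial runtime estimate: {estimate:.0f} s")
print(f"time limit with current bound: {limit:,.0f} s")
print(f"time limit with a bound 1000x smaller: {old_limit:,.0f} s")
```

With a bound 1000 times smaller, the cut-off lands in the same range as the failures seen earlier in the day, which is what the guess is based on.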

The errors on that machine earlier today were after ~5,670 seconds - maybe they took my advice while I was out, and upped it by three orders of magnitude?

OK, that's as far as I can go here. I need to watch it on one of my own machines. But the problem seems to be that absurd 43 teraflop speed rating.

The only cause of that I can think of might be if they put through some shortened test units WITHOUT CHANGING <rsc_fpops_est>. Anybody see anything like that?

(which machine did that data come from, please?)

Profile Retvari Zoltan
Avatar
Send message
Joined: 20 Jan 09
Posts: 1947
Credit: 12,443,600,019
RAC: 5,419,094
Level
Trp
Scientific publications
watwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwat
Message 50107 - Posted: 27 Jul 2018 | 23:07:30 UTC - in response to Message 50106.

The errors on that machine earlier today were after ~5,670 seconds - maybe they took my advice while I was out, and upped it by three orders of magnitude?
That is my conclusion too (see the end of my previous post).

Profile robertmiles
Send message
Joined: 16 Apr 09
Posts: 409
Credit: 292,982,246
RAC: 516,856
Level
Asn
Scientific publications
watwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwat
Message 50108 - Posted: 27 Jul 2018 | 23:36:54 UTC - in response to Message 50107.
Last modified: 27 Jul 2018 | 23:40:31 UTC

The errors on that machine earlier today were after ~5,670 seconds - maybe they took my advice while I was out, and upped it by three orders of magnitude?
That is my conclusion too (see the end of my previous post).

How much longer will they need to let tasks run before they get enough information to fix the problem?

It looks like one more order of magnitude for run time should at least give them more information.

Also, users might help by mentioning whether their tasks were able to write a checkpoint, and then continue after this.

Profile bcavnaugh
Send message
Joined: 8 Nov 13
Posts: 48
Credit: 395,505,350
RAC: 190,342
Level
Asp
Scientific publications
watwatwatwatwatwatwatwat
Message 50109 - Posted: 28 Jul 2018 | 0:05:17 UTC

No help here;
http://www.gpugrid.net/result.php?resultid=18294177

Stderr output
<core_client_version>7.12.1</core_client_version>
<![CDATA[
<message>
exceeded elapsed time limit 3487.29 (250000000.00G/71688.95G)</message>
<stderr_txt>
# GPU [GeForce GTX 1080 Ti] Platform [Windows] Rev [3212] VERSION [80]
# SWAN Device 0 :
# Name : GeForce GTX 1080 Ti
# ECC : Disabled
# Global mem : 11264MB
# Capability : 6.1
# PCI ID : 0000:01:00.0
# Device clock : 1683MHz
# Memory clock : 5505MHz
# Memory width : 352bit
# Driver version : r391_33 : 39135
# GPU 0 : 28C
# GPU 0 : 29C
# GPU 0 : 30C
# GPU 0 : 31C
# GPU 0 : 32C
# GPU 0 : 33C
# GPU 0 : 34C
# GPU 0 : 35C
# GPU 0 : 36C
# Access violation : progress made, try to restart
called boinc_finish

</stderr_txt>
]]>
____________

Crunching@EVGA The Number One Team in the BOINC Community. Folding@EVGA The Number One Team in the Folding@Home Community.

Michael
Send message
Joined: 30 Nov 10
Posts: 4
Credit: 221,935,557
RAC: 890,332
Level
Leu
Scientific publications
watwatwatwatwatwatwat
Message 50110 - Posted: 28 Jul 2018 | 0:17:33 UTC - in response to Message 50106.

It's currently running on my Windows 10 w/ 1080ti. 86.4 % complete in 14:31 (m:s). It's an ADRIA job. So the jobs are running much faster than they did before. I leave that up to your interpretation. That job took 16:30 to complete about the same as the job that ran before it. Now starting on job 3. This one's a PABLO and took 2:01 to reach 1%. 2% done and estimate is 2:07:20 (and falling) to completion.

Specifically now running e22s56_e25s16p0f231-PABLO_2IDP_P01106_1_GLUP33P_IDP-0-1-RND9574_1

That's a good start - thanks.

<flops>43004890022276.586000</flops>

is 43,004,890,022,276 or 43 teraflops

Your host 113852 had an APR (processing speed) of 519 GigaFlops under v9.18

At the new speed, the tasks would run for (flops/(flops/sec)) seconds

rsc_fpops_est/flops

5,000,000,000,000,000 / 43,004,890,022,276

116 seconds (initial runtime estimate)

and be deemed to have 'run too long' after

rsc_fpops_bound/flops

250,000,000,000,000,000,000 / 43,004,890,022,276

5,813,292 seconds

The errors on that machine earlier today were after ~5,670 seconds - maybe they took my advice while I was out, and upped it by three orders of magnitude?

OK, that's as far as I can go here. I need to watch it on one of my own machines. But the problem seems to be that absurd 43 teraflop speed rating.

The only cause of that I can think of might be if they put through some shortened test units WITHOUT CHANGING <rsc_fpops_est>. Anybody see anything like that?

(which machine did that data come from, please?)

Profile Retvari Zoltan
Avatar
Send message
Joined: 20 Jan 09
Posts: 1947
Credit: 12,443,600,019
RAC: 5,419,094
Level
Trp
Scientific publications
watwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwat
Message 50111 - Posted: 28 Jul 2018 | 0:20:43 UTC - in response to Message 50109.
Last modified: 28 Jul 2018 | 0:25:45 UTC

No help here;
http://www.gpugrid.net/result.php?resultid=18294177
The workunits generated with the improper <rsc_fpops_bound> will be around until they error out.
I have two such workunits on my host, so I've manually edited the client_state.xml file to have the right <rsc_fpops_bound> value.

the method of this fix:
1. exit BOINC manager
2. windows key + r
3. type or copy and paste:
notepad c:\ProgramData\BOINC\client_state.xml
4. press <ENTER>
5. CTRL + H
6. search field:
<rsc_fpops_bound>250000000000000000.000000</rsc_fpops_bound>
7. replace field:
<rsc_fpops_bound>250000000000000000000.000000</rsc_fpops_bound>
8. it should replace as many times as the number of GPUGrid tasks on the given host
9. save and exit notepad
10. restart BOINC manager
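
For hosts where Notepad is not convenient, the same search-and-replace could be scripted. This is only a sketch, assuming Python 3 is installed and that BOINC has been shut down completely before the file is touched. The function names are hypothetical, and the path in the final comment is the one from step 3. It works on raw bytes so that line endings are left alone, and it writes a .bak copy before changing anything.

```python
from pathlib import Path

# The exact strings from steps 6 and 7 above.
OLD_TAG = b"<rsc_fpops_bound>250000000000000000.000000</rsc_fpops_bound>"
NEW_TAG = b"<rsc_fpops_bound>250000000000000000000.000000</rsc_fpops_bound>"

def patch_bound(data):
    """Return (patched_bytes, number_of_replacements)."""
    count = data.count(OLD_TAG)
    return data.replace(OLD_TAG, NEW_TAG), count

def patch_file(path):
    """Patch client_state.xml in place, keeping a .bak copy of the original."""
    path = Path(path)
    original = path.read_bytes()
    patched, count = patch_bound(original)
    if count:
        path.with_name(path.name + ".bak").write_bytes(original)
        path.write_bytes(patched)
    return count

# Demonstration on an in-memory sample instead of the real file.
sample = (b"<workunit>\n" + OLD_TAG + b"\n</workunit>\n") * 2
patched, count = patch_bound(sample)
print("replacements in sample:", count)

# On a real host, with BOINC stopped, one would call for example:
# patch_file(r"c:\ProgramData\BOINC\client_state.xml")
```

The returned count should match the number of GPUGrid tasks on the host, as in step 8.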

Profile bcavnaugh
Send message
Joined: 8 Nov 13
Posts: 48
Credit: 395,505,350
RAC: 190,342
Level
Asp
Scientific publications
watwatwatwatwatwatwatwat
Message 50112 - Posted: 28 Jul 2018 | 1:01:33 UTC
Last modified: 28 Jul 2018 | 1:24:00 UTC

Thanks
<workunit>
<name>e17s86_e4s46p0f53-PABLO_2IDP_P01106_4_LEUP23P_IDP-0-1-RND2735</name>
<app_name>acemdlong</app_name>
<version_num>922</version_num>
<rsc_fpops_est>5000000000000000.000000</rsc_fpops_est>
<rsc_fpops_bound>250000000000000000.000000</rsc_fpops_bound>
<rsc_memory_bound>300000000.000000</rsc_memory_bound>
<rsc_disk_bound>4000000000.000000</rsc_disk_bound>
<file_ref>

Testing now with
250000000000000000000.000000

Do we have to do this from now on, on each GPU Task?
____________

Crunching@EVGA The Number One Team in the BOINC Community. Folding@EVGA The Number One Team in the Folding@Home Community.

tullio
Send message
Joined: 8 May 18
Posts: 126
Credit: 11,866,185
RAC: 81,254
Level
Pro
Scientific publications
wat
Message 50113 - Posted: 28 Jul 2018 | 1:27:42 UTC
Last modified: 28 Jul 2018 | 1:28:56 UTC

No go
Stderr output

<core_client_version>7.12.1</core_client_version>
<![CDATA[
<message>
(unknown error) - exit code -80 (0xffffffb0)</message>
<stderr_txt>
# GPU [GeForce GTX 1050 Ti] Platform [Windows] Rev [3212] VERSION [80]
# SWAN Device 0 :
# Name : GeForce GTX 1050 Ti
# ECC : Disabled
# Global mem : 4096MB
# Capability : 6.1
# PCI ID : 0000:01:00.0
# Device clock : 1392MHz
# Memory clock : 3504MHz
# Memory width : 128bit
# Driver version : r397_05 : 39764
# GPU 0 : 64C
# GPU 0 : 67C
# GPU 0 : 68C
# GPU 0 : 70C
# GPU 0 : 72C
# GPU 0 : 73C
# GPU 0 : 74C
# GPU 0 : 75C
# GPU 0 : 76C
# GPU 0 : 77C
# GPU 0 : 78C
# The simulation has become unstable. Terminating to avoid lock-up (1)
# Attempting restart (step 1755000)
called boinc_finish

</stderr_txt>
]]>
____________

Profile bcavnaugh
Send message
Joined: 8 Nov 13
Posts: 48
Credit: 395,505,350
RAC: 190,342
Level
Asp
Scientific publications
watwatwatwatwatwatwatwat
Message 50114 - Posted: 28 Jul 2018 | 1:40:04 UTC - in response to Message 50113.
Last modified: 28 Jul 2018 | 1:43:42 UTC

No go
Stderr output

<core_client_version>7.12.1</core_client_version>
<![CDATA[
<message>
(unknown error) - exit code -80 (0xffffffb0)</message>
<stderr_txt>
# GPU [GeForce GTX 1050 Ti] Platform [Windows] Rev [3212] VERSION [80]
# SWAN Device 0 :
# Name : GeForce GTX 1050 Ti
# GPU 0 : 78C
# The simulation has become unstable. Terminating to avoid lock-up (1)
# Attempting restart (step 1755000)
called boinc_finish

</stderr_txt>
]]>


Exit code -80 can be a driver issue (OpenCL missing); it can also be a C++ runtimes issue, maybe even missing runtimes.
You need both the x86 (32-bit) and the x64 (64-bit) versions.
An unstable GPU and/or CPU can cause it as well.

This is not the same issue as "Exit status 197 (0xc5) EXIT_TIME_LIMIT_EXCEEDED"
____________

Crunching@EVGA The Number One Team in the BOINC Community. Folding@EVGA The Number One Team in the Folding@Home Community.

tullio
Send message
Joined: 8 May 18
Posts: 126
Credit: 11,866,185
RAC: 81,254
Level
Pro
Scientific publications
wat
Message 50115 - Posted: 28 Jul 2018 | 2:12:13 UTC - in response to Message 50114.

SETI@home tasks complete using opencl_nvidia_SoG. Temperature using Thundermaster is 66 C.
Tullio

Jacob Klein
Send message
Joined: 11 Oct 08
Posts: 1089
Credit: 1,396,732,414
RAC: 998,406
Level
Met
Scientific publications
watwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwat
Message 50116 - Posted: 28 Jul 2018 | 2:17:11 UTC

Please start a new thread for the Simulation Unstable issue, if you must. It typically means your GPU is overclocked too much, and this project pushes it harder than other projects. If you want help determining a max stable overclock, PM me and be patient.

tullio
Send message
Joined: 8 May 18
Posts: 126
Credit: 11,866,185
RAC: 81,254
Level
Pro
Scientific publications
wat
Message 50117 - Posted: 28 Jul 2018 | 2:28:40 UTC - in response to Message 50116.

My GPU is not overclocked. I never overclock.
Tullio

Richard Haselgrove
Send message
Joined: 11 Jul 09
Posts: 883
Credit: 1,767,597,020
RAC: 1,274,807
Level
His
Scientific publications
watwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwat
Message 50119 - Posted: 28 Jul 2018 | 6:43:28 UTC

How much longer will they need to let tasks run before they get enough information to fix the problem?

Do we have to do this from now on, on each GPU Task?

From what I saw yesterday, somehow the system got itself into a state where it thought our machines were much faster than they really are.

'machine speed' comes from one of two places: either the aggregate returns across the whole project, or the actual behaviour of each individual computer.

The speed of the individual computer takes over in the end - after 11 tasks have made it all the way through and been validated. So "11 times per computer" should be the maximum number of manual interventions required.

But since they seem to have put in a workround for the faulty kill-switch, you may not have to do it that many times, or even at all.

Because work is now being completed properly, the system-wide speed assessment will be correcting itself at the same time, so that machines which have been inactive while waiting for the new app may never even see the problem. But it's hard to predict when that will kick in: I may find out when I get home.

As Retvari has pointed out, there will be faulty workunits circulating around the system for a while yet, and they are a problem because they waste resources for a significant length of time. Those are the ones it is most helpful to patch via the file edit: once they have been completed and validated, they won't come back to haunt us again.

Toni
Volunteer moderator
Project administrator
Project developer
Project scientist
Send message
Joined: 9 Dec 08
Posts: 746
Credit: 4,285,282
RAC: 0
Level
Ala
Scientific publications
watwatwatwat
Message 50123 - Posted: 28 Jul 2018 | 7:57:39 UTC - in response to Message 50119.

To summarize: the problem AFAIK was the test WUs, sent without changing the ops estimate. I have now cancelled them all, and temporarily raised the OPS bound by 10^3.

Richard Haselgrove
Send message
Joined: 11 Jul 09
Posts: 883
Credit: 1,767,597,020
RAC: 1,274,807
Level
His
Scientific publications
watwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwat
Message 50124 - Posted: 28 Jul 2018 | 8:19:33 UTC - in response to Message 50123.

To summarize: the problem AFAIK was the test WUs, sent without changing the ops estimate. I have now cancelled them all, and temporarily raised the OPS bound by 10^3.

That sounds good. I agree with you about the cause, and the workround will let the system clean itself out with no further intervention.

Just one final task: buy a 2019 calendar, and put a big red circle round the next licence expiry date! (or perhaps a month before...)

I think you once said that the rsc_fpops_est was fixed by the workunit generation script: it might be a good idea to start thinking about making it easier to vary that. But not this weekend - take some time off!

Profile Retvari Zoltan
Avatar
Send message
Joined: 20 Jan 09
Posts: 1947
Credit: 12,443,600,019
RAC: 5,419,094
Level
Trp
Scientific publications
watwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwat
Message 50125 - Posted: 28 Jul 2018 | 12:04:16 UTC - in response to Message 50123.
Last modified: 28 Jul 2018 | 12:16:59 UTC

To summarize: the problem AFAIK was the test WUs, sent without changing the ops estimate. I have now cancelled them all, and temporarily raised the OPS bound by 10^3.
I still received a task which has the lower rsc_fpops_bound value. So we should watch these workunits carefully (and fix those which have the lower rsc_fpops_bound) until they've cleared out from the scheduler.

Toni
Volunteer moderator
Project administrator
Project developer
Project scientist
Send message
Joined: 9 Dec 08
Posts: 746
Credit: 4,285,282
RAC: 0
Level
Ala
Scientific publications
watwatwatwat
Message 50126 - Posted: 28 Jul 2018 | 13:15:15 UTC - in response to Message 50125.
Last modified: 28 Jul 2018 | 13:16:14 UTC

Unfortunately, rsc ops values can't be changed once the task is created. I'm waiting for the newly created tasks to bring the flops estimate back to normal, and then the old tasks should work as well.

Profile bcavnaugh
Send message
Joined: 8 Nov 13
Posts: 48
Credit: 395,505,350
RAC: 190,342
Level
Asp
Scientific publications
watwatwatwatwatwatwatwat
Message 50128 - Posted: 28 Jul 2018 | 15:46:44 UTC
Last modified: 28 Jul 2018 | 15:47:21 UTC

Thanks, Retvari Zoltan, for your fix; for the most part it worked for me.
Could the cause of this (197 (0xc5) EXIT_TIME_LIMIT_EXCEEDED) have been the new version of the Client Software 7.12.1?
I noted that after the update the Client Software ran benchmarks off the bat and I do not recall older versions doing this.
____________

Crunching@EVGA The Number One Team in the BOINC Community. Folding@EVGA The Number One Team in the Folding@Home Community.

jjch
Send message
Joined: 10 Nov 13
Posts: 20
Credit: 7,921,851,264
RAC: 10,456,941
Level
Tyr
Scientific publications
watwatwatwatwatwatwatwatwat
Message 50129 - Posted: 28 Jul 2018 | 16:48:26 UTC

I have noted that work units have been coming through beginning on the 27th; however, they all seem to be failing. Refer to a sample system here: https://www.gpugrid.net/results.php?hostid=176801

It was not clear to me if all the Windows GPU systems need intervention or if it will sort it out eventually. These are all at a remote location and I don't have remote access anymore. It will be a few days until I can get to them.

Richard Haselgrove
Send message
Joined: 11 Jul 09
Posts: 883
Credit: 1,767,597,020
RAC: 1,274,807
Level
His
Scientific publications
watwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwat
Message 50130 - Posted: 28 Jul 2018 | 17:20:58 UTC - in response to Message 50128.

Could the cause of this (197 (0xc5) EXIT_TIME_LIMIT_EXCEEDED) have been the new version of the Client Software 7.12.1?

That's a firm NO.

I am currently closely involved in the preparation, testing, and releasing of new client versions. The new client was released well before this problem arose, and (in this respect) the new client works exactly the same as previous ones, going back several releases.

We've now got a pretty clear handle on the release of GPUGrid application 9.22 as the culprit, though I will still test my own machines as I start each of them back up (which will happen after the next transfusion of coffee - only just got back home).

Richard Haselgrove
Send message
Joined: 11 Jul 09
Posts: 883
Credit: 1,767,597,020
RAC: 1,274,807
Level
His
Scientific publications
watwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwat
Message 50131 - Posted: 28 Jul 2018 | 18:35:40 UTC

OK, I've started the first.

Got (project) flops of 461,290,595,930 - 461 gigaflops. That's still too high (this machine had 243 GF for the previous version), but it's in the right ballpark and I'll let it run.

Richard Haselgrove
Send message
Joined: 11 Jul 09
Posts: 883
Credit: 1,767,597,020
RAC: 1,274,807
Level
His
Scientific publications
watwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwat
Message 50136 - Posted: 29 Jul 2018 | 9:56:37 UTC

All my machines have now completed tasks without error and without manual intervention. I think we're out of the woods.

Host 477287 is useful. I left that running throughout: you can see that the 'time before error' slowly increased from 17Ksec to 23Ksec as the speed estimate normalised. The task which completed successfully would have crashed after about 90Ksec, but that was more than enough.

I have a slight concern about the short queue task it's working on now, which is running at a very erratic speed. But that could be the app, the task, or the machine. I'll keep an eye on it.

Profile bcavnaugh
Send message
Joined: 8 Nov 13
Posts: 48
Credit: 395,505,350
RAC: 190,342
Level
Asp
Scientific publications
watwatwatwatwatwatwatwat
Message 50209 - Posted: 5 Aug 2018 | 3:10:21 UTC

Seems that we need to remove the settings now or Reset the Project.
Tasks are saying 9 Days to complete even though they take less than 3 hours.
This is no longer needed;
<workunit>
<name>e17s86_e4s46p0f53-PABLO_2IDP_P01106_4_LEUP23P_IDP-0-1-RND2735</name>
<app_name>acemdlong</app_name>
<version_num>922</version_num>
<rsc_fpops_est>5000000000000000.000000</rsc_fpops_est>
<rsc_fpops_bound>250000000000000000.000000</rsc_fpops_bound>
<rsc_memory_bound>300000000.000000</rsc_memory_bound>
<rsc_disk_bound>4000000000.000000</rsc_disk_bound>
<file_ref>
____________

Crunching@EVGA The Number One Team in the BOINC Community. Folding@EVGA The Number One Team in the Folding@Home Community.

[AF>FAH-Addict.net]toTOW
Send message
Joined: 28 Oct 10
Posts: 8
Credit: 23,948,749
RAC: 0
Level
Pro
Scientific publications
watwatwatwatwatwatwatwat
Message 50211 - Posted: 5 Aug 2018 | 8:42:45 UTC

All my GPU WUs are failing : -226 (0xffffffffffffff1e) ERR_TOO_MANY_EXITS

No more details in log :( :
Stderr output

<core_client_version>7.12.1</core_client_version>
<![CDATA[
<message>
too many exit(0)s</message>
]]>


Any ideas ?

[AF>FAH-Addict.net]toTOW
Send message
Joined: 28 Oct 10
Posts: 8
Credit: 23,948,749
RAC: 0
Level
Pro
Scientific publications
watwatwatwatwatwatwatwat
Message 50214 - Posted: 6 Aug 2018 | 12:36:23 UTC

If I try to run directly from a slot, I get this :

D:\BOINC\data\slots\9>acemd-922-80.exe
# ACEMD Molecular Dynamics Version [3212]
# CUDA Synchronisation mode: BLOCKING
# CUDA Synchronisation mode: BLOCKING
# SWAN: Cannot create context 0 on GPU 0 : [999]
# Could not create GPU contexts.
# SWAN swan_assert 0


Any ideas ?

tullio
Send message
Joined: 8 May 18
Posts: 126
Credit: 11,866,185
RAC: 81,254
Level
Pro
Scientific publications
wat
Message 50215 - Posted: 6 Aug 2018 | 12:59:19 UTC

I am glad not to be the only one. This is what I was getting.
Stderr output

<core_client_version>7.12.1</core_client_version>
<![CDATA[
<message>
(unknown error) - exit code -80 (0xffffffb0)</message>
<stderr_txt>
# GPU [GeForce GTX 1050 Ti] Platform [Windows] Rev [3212] VERSION [80]
# SWAN Device 0 :
# Name : GeForce GTX 1050 Ti
# ECC : Disabled
# Global mem : 4096MB
# Capability : 6.1
# PCI ID : 0000:01:00.0
# Device clock : 1392MHz
# Memory clock : 3504MHz
# Memory width : 128bit
# Driver version : r397_05 : 39764
# GPU 0 : 64C
# GPU 0 : 67C
# GPU 0 : 68C
# GPU 0 : 70C
# GPU 0 : 72C
# GPU 0 : 73C
# GPU 0 : 74C
# GPU 0 : 75C
# GPU 0 : 76C
# GPU 0 : 77C
# GPU 0 : 78C
# The simulation has become unstable. Terminating to avoid lock-up (1)
# Attempting restart (step 1755000)
called boinc_finish

</stderr_txt>
]]>

Profile Retvari Zoltan
Avatar
Send message
Joined: 20 Jan 09
Posts: 1947
Credit: 12,443,600,019
RAC: 5,419,094
Level
Trp
Scientific publications
watwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwat
Message 50216 - Posted: 6 Aug 2018 | 15:48:07 UTC - in response to Message 50215.

# GPU 0 : 78C
# The simulation has become unstable. Terminating to avoid lock-up (1)
# Attempting restart (step 1755000)

called boinc_finish
This message is usually the sign of too high GPU clocks and / or too high GPU temperature (Yes, 78°C could be high).
You should use some 3rd party GPU monitoring software (like MSI Afterburner) to:
1. increase the GPU fan speed,
2. reduce the power target of your GPU
3. reduce GPU clock frequency.
This error message has nothing to do with the new Windows app.

Profile Retvari Zoltan
Avatar
Send message
Joined: 20 Jan 09
Posts: 1947
Credit: 12,443,600,019
RAC: 5,419,094
Level
Trp
Scientific publications
watwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwat
Message 50217 - Posted: 6 Aug 2018 | 15:51:26 UTC - in response to Message 50211.
Last modified: 6 Aug 2018 | 15:51:45 UTC

All my GPU WUs are failing : -226 (0xffffffffffffff1e) ERR_TOO_MANY_EXITS

No more details in log :( :
Stderr output

<core_client_version>7.12.1</core_client_version>
<![CDATA[
<message>
too many exit(0)s</message>
]]>


Any ideas ?

Have you installed BOINC manager in "protected application execution" mode? (as a system service?)
If you did so, you should uninstall it, and reinstall without this setting.

tullio
Send message
Joined: 8 May 18
Posts: 126
Credit: 11,866,185
RAC: 81,254
Level
Pro
Scientific publications
wat
Message 50218 - Posted: 6 Aug 2018 | 16:09:24 UTC - in response to Message 50216.

.
You should use some 3rd party GPU monitoring software (like MSI Afterburner) to:
1. increase the GPU fan speed,
2. reduce the power target of your GPU
3. reduce GPU clock frequency.
This error message has nothing to do with the new Windows app.

The same app on my SUSE Linux box with a GTX 750 Ti board runs at 62 C.
Tullio

[AF>FAH-Addict.net]toTOW
Send message
Joined: 28 Oct 10
Posts: 8
Credit: 23,948,749
RAC: 0
Level
Pro
Scientific publications
watwatwatwatwatwatwatwat
Message 50219 - Posted: 7 Aug 2018 | 9:11:52 UTC - in response to Message 50217.

All my GPU WUs are failing : -226 (0xffffffffffffff1e) ERR_TOO_MANY_EXITS

No more details in log :( :
Stderr output

<core_client_version>7.12.1</core_client_version>
<![CDATA[
<message>
too many exit(0)s</message>
]]>


Any ideas ?

Have you installed BOINC manager in "protected application execution" mode? (as a system service?)
If you did so, you should uninstall it, and reinstall without this setting.

No ... other projects are working fine.

See my post after the one you quoted with the real error ...

tullio
Send message
Joined: 8 May 18
Posts: 126
Credit: 11,866,185
RAC: 81,254
Level
Pro
Scientific publications
wat
Message 50221 - Posted: 7 Aug 2018 | 13:58:12 UTC - in response to Message 50216.


1. increase the GPU fan speed,
2. reduce the power target of your GPU
3. reduce GPU clock frequency.
This error message has nothing to do with the new Windows app.

The same GPU board runs SETI@home GPU tasks at 71 C, fan speed 50%, clock 1695 MHz and no error.
Tullio

Profile Retvari Zoltan
Avatar
Send message
Joined: 20 Jan 09
Posts: 1947
Credit: 12,443,600,019
RAC: 5,419,094
Level
Trp
Scientific publications
watwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwat
Message 50222 - Posted: 7 Aug 2018 | 14:56:21 UTC - in response to Message 50221.
Last modified: 7 Aug 2018 | 14:56:50 UTC

The same GPU board runs SETI@home GPU tasks at 71 C, fan speed 50%, clock 1695 MHz and no error.
That's irrelevant. The GPUGrid app is much harder on GPUs than other apps, partly because it's based on CUDA 8.0, while the other apps are based on earlier CUDA versions.

Keith Myers
Send message
Joined: 13 Dec 17
Posts: 110
Credit: 86,293,163
RAC: 618,780
Level
Thr
Scientific publications
wat
Message 50223 - Posted: 7 Aug 2018 | 20:08:52 UTC

Not necessarily true. The Seti Linux CUDA9 app runs gpus a lot harder than the stock OpenCL application. I don't see more than 62° C. on my air cooled cards.

tullio
Send message
Joined: 8 May 18
Posts: 126
Credit: 11,866,185
RAC: 81,254
Level
Pro
Scientific publications
wat
Message 50224 - Posted: 8 Aug 2018 | 2:32:05 UTC - in response to Message 50222.

The SETI@home GPU tasks run on opencl_nvidia_SoG

Zalster
Avatar
Send message
Joined: 26 Feb 14
Posts: 79
Credit: 2,243,428,463
RAC: 7,102,174
Level
Phe
Scientific publications
watwatwat
Message 50225 - Posted: 8 Aug 2018 | 2:40:50 UTC - in response to Message 50224.
Last modified: 8 Aug 2018 | 2:41:09 UTC

Open_Cl

The SoG is just the name of the app.

Keith Myers
Send message
Joined: 13 Dec 17
Posts: 110
Credit: 86,293,163
RAC: 618,780
Level
Thr
Scientific publications
wat
Message 50226 - Posted: 8 Aug 2018 | 7:21:41 UTC

You don't have to run the stock SoG Linux apps at Seti. Most Linux users run the CUDA8 or CUDA9 gpu apps which are about 10 times faster.

tullio
Send message
Joined: 8 May 18
Posts: 126
Credit: 11,866,185
RAC: 81,254
Level
Pro
Scientific publications
wat
Message 50227 - Posted: 8 Aug 2018 | 7:41:07 UTC - in response to Message 50226.

You don't have to run the stock SoG Linux apps at Seti. Most Linux users run the CUDA8 or CUDA9 gpu apps which are about 10 times faster.

I run what does not fail. Times are not important. I am running SETI@home on a ulefone smart watch, on a Linux box and a Windows 10 PC.
Tullio

tullio
Send message
Joined: 8 May 18
Posts: 126
Credit: 11,866,185
RAC: 81,254
Level
Pro
Scientific publications
wat
Message 50229 - Posted: 8 Aug 2018 | 10:34:55 UTC - in response to Message 50227.

Of course ulefone is a smart phone, not a smart watch as I wrote. It runs Android 7.1.1 and also has a GPU which SETI sees but Einstein does not. Or maybe their BOINC servers. It has eight processors and 4 GB of RAM.
tullio
____________

Profile Retvari Zoltan
Avatar
Send message
Joined: 20 Jan 09
Posts: 1947
Credit: 12,443,600,019
RAC: 5,419,094
Level
Trp
Scientific publications
watwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwat
Message 50231 - Posted: 8 Aug 2018 | 21:05:33 UTC - in response to Message 50226.

You don't have to run the stock SoG Linux apps at Seti. Most Linux users run the CUDA8 or CUDA9 gpu apps which are about 10 times faster.
I don't see any CUDA8 or CUDA9 apps on the list of SETI@home applications. The highest CUDA version used for Linux is 6.0.

Richard Haselgrove
Send message
Joined: 11 Jul 09
Posts: 883
Credit: 1,767,597,020
RAC: 1,274,807
Level
His
Scientific publications
watwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwat
Message 50232 - Posted: 8 Aug 2018 | 21:23:52 UTC - in response to Message 50231.

You don't have to run the stock SoG Linux apps at Seti. Most Linux users run the CUDA8 or CUDA9 gpu apps which are about 10 times faster.

I don't see any CUDA8 or CUDA9 apps on the list of SETI@home applications. The highest CUDA version used for Linux is 6.0.

SETI has a long history of encouraging volunteer developers to improve their stock applications. The best of the resulting applications (with high reliability and high validation rates) are accepted as new stock applications - the opencl_nvidia_SoG application mentioned earlier is one such. The cuda8 and cuda9 apps are candidates, but haven't yet reached a sufficient level of acceptance to be deployed as stock.

Profile Retvari Zoltan
Avatar
Send message
Joined: 20 Jan 09
Posts: 1947
Credit: 12,443,600,019
RAC: 5,419,094
Level
Trp
Scientific publications
watwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwat
Message 50234 - Posted: 8 Aug 2018 | 22:50:40 UTC - in response to Message 50227.
Last modified: 8 Aug 2018 | 22:51:22 UTC

I run what does not fail. Times are not important.
Then it would fit the above ideas if you would lower the power target and/or the clock frequency of your GTX 1050 Ti to make it stable with the GPUGrid app, right?

tullio
Send message
Joined: 8 May 18
Posts: 126
Credit: 11,866,185
RAC: 81,254
Level
Pro
Scientific publications
wat
Message 50235 - Posted: 9 Aug 2018 | 8:51:04 UTC - in response to Message 50234.

I am not a GPU expert and use default values both on the 1050 Ti on the Windows 10 PC and the 750 Ti on the Linux box. The latter runs GPUGRID GPU tasks with no problem, so I leave the 1050 Ti to run SETI@home tasks.
Tullio

anton
Send message
Joined: 23 Nov 11
Posts: 1
Credit: 2,062,655
RAC: 11,021
Level
Ala
Scientific publications
wat
Message 50236 - Posted: 9 Aug 2018 | 18:48:54 UTC
Last modified: 9 Aug 2018 | 18:49:54 UTC

Hello to all. In a while the new NVIDIA GeForce GTX 1180 is coming: https://www.techpowerup.com/gpudb/3224/geforce-gtx-1180 I can't wait. What do you think of this new graphics card?

Jacob Klein
Send message
Joined: 11 Oct 08
Posts: 1089
Credit: 1,396,732,414
RAC: 998,406
Level
Met
Scientific publications
watwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwat
Message 50237 - Posted: 9 Aug 2018 | 21:44:35 UTC

This thread is supposed to be about a license expiring, and how that broke Windows GPU applications.

If your conversation isn't about "a license expiring, and how that broke Windows GPU applications", then please start a separate thread.

Thank you,
Jacob Klein

Keith Myers
Send message
Joined: 13 Dec 17
Posts: 110
Credit: 86,293,163
RAC: 618,780
Level
Thr
Scientific publications
wat
Message 50238 - Posted: 11 Aug 2018 | 18:35:36 UTC - in response to Message 50232.

You don't have to run the stock SoG Linux apps at SETI. Most Linux users run the CUDA8 or CUDA9 GPU apps, which are about 10 times faster.

I don't see any CUDA8 or CUDA9 apps on the list of SETI@home applications. The highest CUDA version used for Linux is 6.0.

SETI has a long history of encouraging volunteer developers to improve their stock applications. The best of the resulting applications (with high reliability and high validation rates) are accepted as new stock applications - the opencl_nvidia_SoG application mentioned earlier is one such. The cuda8 and cuda9 apps are candidates, but haven't yet reached a sufficient level of acceptance to be deployed as stock.

I'm curious as to where the threshold is "for a sufficient level of acceptance" for the CUDA special apps. What is the target? I have less than a 2.5% ratio of Inconclusives to Valid tasks. I think the stated goal for the science apps is less than a 5% Inconclusive ratio. On my systems, I believe I have reached a "sufficient level of acceptance". I see no reason not to have the zi3v special app qualify for stock.

Richard Haselgrove
Joined: 11 Jul 09
Posts: 883
Credit: 1,767,597,020
RAC: 1,274,807
Level
His
Message 50239 - Posted: 11 Aug 2018 | 22:32:55 UTC - in response to Message 50238.

You don't have to run the stock SoG Linux apps at SETI. Most Linux users run the CUDA8 or CUDA9 GPU apps, which are about 10 times faster.

I don't see any CUDA8 or CUDA9 apps on the list of SETI@home applications. The highest CUDA version used for Linux is 6.0.

SETI has a long history of encouraging volunteer developers to improve their stock applications. The best of the resulting applications (with high reliability and high validation rates) are accepted as new stock applications - the opencl_nvidia_SoG application mentioned earlier is one such. The cuda8 and cuda9 apps are candidates, but haven't yet reached a sufficient level of acceptance to be deployed as stock.

I'm curious as to where the threshold is "for a sufficient level of acceptance" for the CUDA special apps. What is the target? I have less than a 2.5% ratio of Inconclusives to Valid tasks. I think the stated goal for the science apps is less than a 5% Inconclusive ratio. On my systems, I believe I have reached a "sufficient level of acceptance". I see no reason not to have the zi3v special app qualify for stock.

It's not about the performance on any one machine - yours, or anybody else's. You would have to convince Eric Korpela (and nobody else) that the overall validation rate will be acceptable within the project's standards across all computers that might be eligible to download the app, under rules of eligibility that you would have to supply him with. I don't know what those standards are, but Eric does.

My personal validation rate at this moment is 17 inconclusive from 1038 valid, with the SoG app on NVidia under Windows. Previous experience tells me that the inconclusives are usually against wingmates running 'the usual suspects' - yup, there's a v8.00 (opencl_intel_gpu_sah) x86_64-apple-darwin in there. That's why offline bench testing against known good reference results is so important - it eliminates the variability of unverified wingmates.
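
As a rough illustration of the arithmetic in this exchange, the sketch below computes an inconclusive-to-valid ratio and compares it with a target. The function names are hypothetical, and the 5% default is only the goal suggested earlier in the thread, not a confirmed project standard.

```python
def inconclusive_ratio(inconclusive: int, valid: int) -> float:
    """Return inconclusive results as a fraction of valid results."""
    if valid <= 0:
        raise ValueError("valid must be positive")
    return inconclusive / valid


def meets_target(inconclusive: int, valid: int, target: float = 0.05) -> bool:
    """True if the ratio is below the target (5% is an assumed goal)."""
    return inconclusive_ratio(inconclusive, valid) < target


if __name__ == "__main__":
    # Counts quoted in the post above: 17 inconclusive, 1038 valid.
    ratio = inconclusive_ratio(17, 1038)
    print(f"{ratio:.2%}")
    print(meets_target(17, 1038))
```

With the counts quoted above, the ratio comes out at roughly 1.6%. As the post points out, though, a single host's figure says little about the validation rate across all eligible computers.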

Christophe Daulie
Joined: 30 Aug 15
Posts: 1
Credit: 19,504,225
RAC: 61,079
Level
Pro
Message 50320 - Posted: 28 Aug 2018 | 15:07:59 UTC

Hello,

The following application doesn't work and its tasks are rejected: "Long runs (8-12 hours on fastest card) v9.22 (cuda80)". What is going on?

greetz

mmonnin
Joined: 2 Jul 16
Posts: 177
Credit: 323,408,389
RAC: 1,558,026
Level
Asp
Message 50328 - Posted: 28 Aug 2018 | 22:05:14 UTC - in response to Message 50320.

Hello,

The following application doesn't work and its tasks are rejected: "Long runs (8-12 hours on fastest card) v9.22 (cuda80)". What is going on?

greetz


See this thread.
https://www.gpugrid.net/forum_thread.php?id=4822
