
Message boards : Number crunching : Major SNAFU in Effect

Nick Name
Joined: 3 Sep 13
Posts: 23
Credit: 965,342,244
RAC: 1,368,490
Message 51786 - Posted: 14 May 2019 | 1:14:37 UTC
Last modified: 14 May 2019 | 1:20:02 UTC

I noticed a ton of errors on a previously 100% reliable host tonight. Looks like a bad batch of WUs got pushed out; both IDP and KIX jobs are affected.
IDP
http://www.gpugrid.net/workunit.php?wuid=16483464
http://www.gpugrid.net/workunit.php?wuid=16480175
http://www.gpugrid.net/workunit.php?wuid=16480417
http://www.gpugrid.net/workunit.php?wuid=16453242
KIX
http://www.gpugrid.net/workunit.php?wuid=16483553
http://www.gpugrid.net/workunit.php?wuid=16474311
http://www.gpugrid.net/workunit.php?wuid=16483548


I have 25 bad jobs in total, all of which have also failed on numerous other hosts.

[edit]I should have said mine is a Linux host, and I just noticed most of the other hosts where work failed are also Linux machines.[/edit]
____________
Team USA forum | Team USA page
Always crunching / Always recruiting

PappaLitto
Joined: 21 Mar 16
Posts: 507
Credit: 4,333,180,951
RAC: 2,756,431
Message 51789 - Posted: 14 May 2019 | 1:24:29 UTC

http://www.gpugrid.net/results.php?hostid=490728

Above is my host with erroring WUs

Bedrich Hajek
Joined: 28 Mar 09
Posts: 381
Credit: 4,777,062,589
RAC: 1,087,301
Message 51790 - Posted: 14 May 2019 | 1:59:34 UTC - in response to Message 51789.

http://www.gpugrid.net/results.php?hostid=490728

Above is my host with erroring WUs



Did someone forget to renew a license?





Keith Myers
Joined: 13 Dec 17
Posts: 288
Credit: 237,915,213
RAC: 149,124
Message 51791 - Posted: 14 May 2019 | 3:29:32 UTC

I'm getting nothing but computation errors on these new tasks also.

Jim1348
Joined: 28 Jul 12
Posts: 695
Credit: 1,371,992,468
RAC: 4
Message 51792 - Posted: 14 May 2019 | 4:18:58 UTC

Same here, of course. But I haven't seen anyone from the project around here for a while. Is anyone at home?

STARBASEn
Joined: 17 Feb 09
Posts: 70
Credit: 1,003,056,251
RAC: 47,739
Message 51793 - Posted: 14 May 2019 | 4:42:08 UTC

Same here as well. Error 212 on WUs that were running fine until 4-5 hours ago. Sounds like a license thing to me as well. I've suspended the project until the issue is resolved.

DRSMT
Joined: 23 Feb 17
Posts: 20
Credit: 618,195,847
RAC: 61,838
Message 51795 - Posted: 14 May 2019 | 6:00:57 UTC

I have the same issue on two Linux machines, so I'm not sure this is a license thing.

rod4x4
Joined: 4 Aug 14
Posts: 94
Credit: 1,594,427,169
RAC: 1,617,573
Message 51796 - Posted: 14 May 2019 | 6:18:42 UTC

For the last 2 years, the license error has usually come after July 1st. I am assuming a 12-month license.

Keith Myers
Joined: 13 Dec 17
Posts: 288
Credit: 237,915,213
RAC: 149,124
Message 51798 - Posted: 14 May 2019 | 7:19:59 UTC

Every task I had in my cache on 4 hosts errored out today. Since I don't allot a very high resource share, some tasks had been running a couple of hours a day with no issues until today. The hosts are processing other projects without any errors during this time. I'd have to guess a license expired today.

Azmodes
Joined: 7 Jan 17
Posts: 10
Credit: 601,673,715
RAC: 1,005,711
Message 51799 - Posted: 14 May 2019 | 7:56:34 UTC
Last modified: 14 May 2019 | 8:02:17 UTC

Same. I have two Ubuntu machines that throw up nothing but immediate errors now. My two Windows crunchers are fine, though.

Retvari Zoltan
Joined: 20 Jan 09
Posts: 2047
Credit: 14,819,836,269
RAC: 2,276,979
Message 51801 - Posted: 14 May 2019 | 8:03:13 UTC

The Linux app is broken (most probably its license has expired).
All of my Linux hosts immediately run into this error with every single workunit:

<core_client_version>7.9.3</core_client_version> <![CDATA[ <message> process exited with code 212 (0xd4, -44)</message> <stderr_txt> </stderr_txt> ]]>

However, my Windows hosts are crunching happily, so I switched my Linux hosts back to Windows.

The GPUGrid staff need to act on this without delay.
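The three numbers in that stderr line are one exit status written three ways: decimal 212, hex 0xd4, and -44, which is the same byte read as a signed 8-bit value (212 - 256 = -44). A minimal Python sketch for pulling the status out of a pasted log, assuming the text always contains the phrase "process exited with code N"; decode_exit_status is a hypothetical helper, not part of BOINC, and it says nothing about why ACEMD returned 212 (the licence explanation comes from this thread, not from the code).

```python
import re

def decode_exit_status(stderr_text: str) -> dict:
    """Extract the exit status from BOINC task stderr and show its encodings."""
    match = re.search(r"process exited with code (\d+)", stderr_text)
    if match is None:
        raise ValueError("no exit status found in the log text")
    code = int(match.group(1))
    # Low 8 bits of the status, reinterpreted as a signed byte.
    low_byte = code & 0xFF
    signed = low_byte - 256 if low_byte > 127 else low_byte
    return {"decimal": code, "hex": hex(code), "signed_byte": signed}

log = ("<core_client_version>7.9.3</core_client_version> <![CDATA[ <message> "
       "process exited with code 212 (0xd4, -44)</message> <stderr_txt> "
       "</stderr_txt> ]]>")
print(decode_exit_status(log))  # {'decimal': 212, 'hex': '0xd4', 'signed_byte': -44}
```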

Michael H.W. Weber
Joined: 9 Feb 16
Posts: 58
Credit: 583,364,698
RAC: 35,760
Message 51802 - Posted: 14 May 2019 | 8:12:57 UTC
Last modified: 14 May 2019 | 8:13:21 UTC

Same over here:
http://www.gpugrid.net/forum_thread.php?id=4909&nowrap=true#51794

Michael.
____________
President of Rechenkraft.net - Germany's first and largest distributed computing organization.

Richard Haselgrove
Joined: 11 Jul 09
Posts: 912
Credit: 2,196,923,945
RAC: 962,434
Message 51804 - Posted: 14 May 2019 | 10:15:56 UTC

The Linux ACEMD v9.19 apps were deployed on 13/14 February 2018, so this looks like a possible 15-month licence expiry.

The Windows v9.22 apps were deployed on 26 July 2018, so with luck we have until late October for those...
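The estimate above is plain calendar arithmetic. A minimal Python sketch of it, assuming (as the post does, without confirmation from the project) that the licence runs a fixed number of whole months from the deploy date; 14 February is taken as the Linux deploy day and 14 May 2019 as the first failures reported in this thread. whole_months_between and add_months are hypothetical helpers.

```python
import calendar
from datetime import date

def whole_months_between(start: date, end: date) -> int:
    """Complete calendar months from start to end."""
    months = (end.year - start.year) * 12 + (end.month - start.month)
    if end.day < start.day:
        months -= 1  # the final month has not fully elapsed
    return months

def add_months(d: date, months: int) -> date:
    """Shift d by whole months, clamping the day to the target month's length."""
    index = d.month - 1 + months
    year, month = d.year + index // 12, index % 12 + 1
    day = min(d.day, calendar.monthrange(year, month)[1])
    return date(year, month, day)

linux_deployed = date(2018, 2, 14)    # Linux ACEMD v9.19
linux_failed = date(2019, 5, 14)      # first errors reported in this thread
term = whole_months_between(linux_deployed, linux_failed)
print(term)                           # 15

windows_deployed = date(2018, 7, 26)  # Windows v9.22
print(add_months(windows_deployed, term))  # 2019-10-26
```

If the term were 12 months instead, as assumed in an earlier post, add_months(windows_deployed, 12) gives 2019-07-26.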

Applications

rod4x4
Joined: 4 Aug 14
Posts: 94
Credit: 1,594,427,169
RAC: 1,617,573
Message 51808 - Posted: 14 May 2019 | 12:22:42 UTC
Last modified: 14 May 2019 | 12:51:24 UTC

A temporary fix for Linux users is to set your system date back 1 year.

EDIT: Setting time back 1 year caused certificate errors with other projects. So I have now set time back 1 month. This seems to work better.

This has allowed me to start GPUGrid jobs successfully.

You may need to stop time sync services so the system does not reset the time back to current time.

For systemd-based distros (e.g. Ubuntu), sudo timedatectl set-ntp 0 will turn time sync off.

EDIT: You will need to reissue this command and reset the time after each reboot. If this licensing issue persists, I will post a more permanent time sync fix.

This was the temporary fix last year when license issues occurred.
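For anyone who wants to script this workaround, a minimal Python sketch that computes "now minus one month" (matching the EDIT above, since a full year broke certificates for other projects) and builds the two timedatectl commands. It only prints them unless dry_run=False is passed. It assumes a systemd distro where timedatectl's set-ntp and set-time subcommands are available; one_month_earlier, setback_commands and run_commands are hypothetical names. Keep in mind this changes the clock for everything on the machine, not just BOINC.

```python
import calendar
import shlex
import subprocess
from datetime import datetime
from typing import List

def one_month_earlier(now: datetime) -> datetime:
    """Same time of day, one calendar month back (day clamped, e.g. 31 Mar -> 28 Feb)."""
    if now.month == 1:
        year, month = now.year - 1, 12
    else:
        year, month = now.year, now.month - 1
    day = min(now.day, calendar.monthrange(year, month)[1])
    return now.replace(year=year, month=month, day=day)

def setback_commands(now: datetime) -> List[List[str]]:
    target = one_month_earlier(now)
    return [
        # Turn off NTP first; timedatectl will not set the clock while sync is on.
        ["sudo", "timedatectl", "set-ntp", "false"],
        ["sudo", "timedatectl", "set-time", target.strftime("%Y-%m-%d %H:%M:%S")],
    ]

def run_commands(commands: List[List[str]], dry_run: bool = True) -> None:
    for cmd in commands:
        if dry_run:
            print(shlex.join(cmd))  # show what would be run
        else:
            subprocess.run(cmd, check=True)

run_commands(setback_commands(datetime.now()))
```

To go back to normal time afterwards, re-enable sync with timedatectl set-ntp true.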

James C. Owens
Joined: 16 Apr 09
Posts: 5
Credit: 2,026,505,432
RAC: 1,182,591
Message 51810 - Posted: 14 May 2019 | 14:35:36 UTC - in response to Message 51808.

Is project leadership aware of the licensing expiration? Seems like someone should be keeping a tickler file for this so that renewals could happen before WU's start erroring out.

Retvari Zoltan
Joined: 20 Jan 09
Posts: 2047
Credit: 14,819,836,269
RAC: 2,276,979
Message 51811 - Posted: 14 May 2019 | 16:25:44 UTC - in response to Message 51810.
Last modified: 14 May 2019 | 16:25:53 UTC

Is project leadership aware of the licensing expiration?
Apparently not. That's why this SNAFU.

Seems like someone should be keeping a tickler file for this so that renewals could happen before WU's start erroring out.
True.

Jim1348
Joined: 28 Jul 12
Posts: 695
Credit: 1,371,992,468
RAC: 4
Message 51814 - Posted: 14 May 2019 | 17:37:27 UTC - in response to Message 51811.

There wasn't any notification of the pending shutdown of the Quantum Chemistry (CPU) work units either, or when they might be restarted.
I am not sure that there is any project leadership at the moment.

Keith Myers
Joined: 13 Dec 17
Posts: 288
Credit: 237,915,213
RAC: 149,124
Message 51816 - Posted: 14 May 2019 | 19:51:19 UTC

I'm going to just suspend the project on all my hosts. The fact I have to exclude my Turing cards makes it difficult to work with the project anyway.

I'll just check back in occasionally and see if a new Linux app is available with current licensing.

Erich56
Joined: 1 Jan 15
Posts: 595
Credit: 3,082,964,244
RAC: 1,816,651
Message 51817 - Posted: 14 May 2019 | 20:25:16 UTC - in response to Message 51810.

Seems like someone should be keeping a tickler file for this so that renewals could happen before WU's start erroring out.

Also in the past, license renewals were not done in time and tasks failed. Too bad, but it really seems that the people at GPUGRID simply forget about these things.

Richard Haselgrove
Joined: 11 Jul 09
Posts: 912
Credit: 2,196,923,945
RAC: 962,434
Message 51821 - Posted: 15 May 2019 | 6:46:56 UTC

Just in case anyone is still wondering, I've been sent WU 16485663.

Failed three times on Linux v9.19 hosts, now running normally under Windows v9.22

Confirms that it's an application problem, not a data problem.
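The reasoning here is a controlled comparison: the same workunit, so the same input data, failed on every v9.19 Linux task and ran on the v9.22 Windows app, which rules out the input. A minimal Python sketch of that check, assuming outcomes are copied by hand from the workunit page as (platform, app version, succeeded) tuples; the function names and tuple layout are illustrative only, and the Windows entry is counted as a success even though the post only says it was running normally.

```python
from collections import defaultdict

def outcomes_by_build(results):
    """Tally [successes, failures] per (platform, app_version) for one workunit."""
    table = defaultdict(lambda: [0, 0])
    for platform, version, ok in results:
        table[(platform, version)][0 if ok else 1] += 1
    return dict(table)

def looks_like_app_problem(results) -> bool:
    """True when one build only fails while another build succeeds on the same input."""
    table = outcomes_by_build(results)
    some_build_only_fails = any(s == 0 and f > 0 for s, f in table.values())
    some_build_succeeds = any(s > 0 for s, _ in table.values())
    return some_build_only_fails and some_build_succeeds

wu_16485663 = [
    ("Linux", "9.19", False),
    ("Linux", "9.19", False),
    ("Linux", "9.19", False),
    ("Windows", "9.22", True),  # "running normally", treated as a success here
]
print(looks_like_app_problem(wu_16485663))  # True
```

If every build fails on the same workunit, the check returns False, which points back at the data rather than the application.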

Richard Haselgrove
Joined: 11 Jul 09
Posts: 912
Credit: 2,196,923,945
RAC: 962,434
Message 51822 - Posted: 15 May 2019 | 10:06:39 UTC

Got a PM reply from Toni:

Oh gosh, thanks ...

:-)

rod4x4
Joined: 4 Aug 14
Posts: 94
Credit: 1,594,427,169
RAC: 1,617,573
Message 51823 - Posted: 15 May 2019 | 13:54:40 UTC - in response to Message 51822.

Got a PM reply from Toni:

Hey Richard, thanks for raising this with admins.
Much appreciated!

STARBASEn
Joined: 17 Feb 09
Posts: 70
Credit: 1,003,056,251
RAC: 47,739
Message 51824 - Posted: 15 May 2019 | 15:34:35 UTC

So hopefully we will be back up and running shortly :). Thanks for bringing it to Toni's attention.

Aurum
Joined: 12 Jul 17
Posts: 110
Credit: 7,368,016,843
RAC: 4,730,024
Message 51825 - Posted: 15 May 2019 | 16:32:52 UTC

Will someone tell us when the FUBAR has finished???

Michael H.W. Weber
Joined: 9 Feb 16
Posts: 58
Credit: 583,364,698
RAC: 35,760
Message 51828 - Posted: 16 May 2019 | 6:14:49 UTC

The problem is still not resolved...

Michael.
____________
President of Rechenkraft.net - Germany's first and largest distributed computing organization.

Erich56
Joined: 1 Jan 15
Posts: 595
Credit: 3,082,964,244
RAC: 1,816,651
Message 51830 - Posted: 16 May 2019 | 8:38:08 UTC - in response to Message 51823.

Got a PM reply from Toni:

Hey Richard, thanks for raising this with admins.
Much appreciated!

What surprises me though is that no one from GPUGRID found out by themselves :-(

STARBASEn
Joined: 17 Feb 09
Posts: 70
Credit: 1,003,056,251
RAC: 47,739
Message 51838 - Posted: 16 May 2019 | 20:47:42 UTC

I aborted all my GPU WUs to let someone with Windows run them. I was hoping the certificate would be renewed by now so I could finish the ones I had suspended before they failed, since I had time invested in them. No such luck :-(. There's barely enough calendar time left to finish them anyway.

Matt Kowal
Joined: 27 May 14
Posts: 9
Credit: 91,818,751
RAC: 135,544
Message 51839 - Posted: 16 May 2019 | 20:52:30 UTC

Toni responded in this thread: http://www.gpugrid.net/forum_thread.php?id=4925&nowrap=true#51834

We are aware of the problem. We'd like to do a major version upgrade rather than continue fixing the old one. For the time being, I'm deprecating the app for linux so crunching goes on on Windows rather than erroring out.

Zalster
Joined: 26 Feb 14
Posts: 177
Credit: 4,121,030,726
RAC: 1,227,485
Message 51841 - Posted: 16 May 2019 | 22:01:32 UTC - in response to Message 51839.

So it looks like time to find a new project for the majority of my machines. Only have 1 that still runs M$

mmonnin
Joined: 2 Jul 16
Posts: 265
Credit: 647,845,139
RAC: 1,301
Message 51844 - Posted: 16 May 2019 | 22:41:24 UTC

I came back from the Pent to this. :( Thought my computers borked.

Keith Myers
Joined: 13 Dec 17
Posts: 288
Credit: 237,915,213
RAC: 149,124
Message 51845 - Posted: 17 May 2019 | 1:01:26 UTC

So does anyone want to explain how a BOINC wrapper works? The docs don't really say anything about the mechanics involved.

What pre-requisites are there?

Anyone running a BOINC wrapper on other projects and care to elaborate?

tullio
Joined: 8 May 18
Posts: 157
Credit: 40,042,195
RAC: 53,262
Message 51846 - Posted: 17 May 2019 | 2:46:52 UTC

LHC@home uses a BOINC wrapper. Windows, Mac OS X and other Linux distros can all run their programs, which are written for Scientific Linux. You must have VirtualBox installed.
Tullio

mmonnin
Joined: 2 Jul 16
Posts: 265
Credit: 647,845,139
RAC: 1,301
Message 51859 - Posted: 17 May 2019 | 22:04:04 UTC - in response to Message 51846.
Last modified: 17 May 2019 | 22:05:14 UTC

LHC@home uses a boincwrapper. All Windows, MAC OSX and other Linux distros can run their programs written in Scientific Linux. You must have VirtualBox installed.
Tullio


Nope, that's even more separation from the client, including the OS and environment details like specific libc versions. In the case of LHC, they give the choice of VBox or setting up CVMFS and Singularity on your own; both are included in the vbox.vdi file.

https://boinc.berkeley.edu/trac/wiki/WrapperApp

If you DON'T want to build progress % complete, checkpointing and GPU device # handling into your app, then the wrapper can do that.

Don't expect it to be as efficient, as there is now another layer between the exe doing the calculations and the hardware.
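A rough Python sketch of that division of labour: the science program is launched unmodified, told its GPU through an ordinary command-line argument, and writes its progress to a plain file that the wrapper polls and passes on. This is not the BOINC wrapper's code; run_wrapped, read_fraction and the report callback are hypothetical stand-ins (the real wrapper is C++ and talks to the client through the BOINC API), and checkpointing is left out.

```python
import subprocess
import sys
import tempfile
import time
from pathlib import Path

def read_fraction(progress_file: Path):
    """Return the fraction the science app last wrote, or None if unreadable."""
    try:
        return float(progress_file.read_text().strip())
    except (FileNotFoundError, ValueError):
        return None  # not written yet, or caught mid-write

def run_wrapped(cmd, progress_file: Path, report, poll_seconds: float = 0.1) -> int:
    """Launch an unmodified program and relay its progress until it exits."""
    child = subprocess.Popen(cmd)
    while child.poll() is None:
        fraction = read_fraction(progress_file)
        if fraction is not None:
            report(fraction)  # stand-in for the call that informs the client
        time.sleep(poll_seconds)
    final = read_fraction(progress_file)
    if final is not None:
        report(final)  # pick up the last value written before exit
    return child.returncode

# Demo: a stand-in "science app" that writes its progress three times.
WORKER = (
    "import pathlib, sys, time\n"
    "p = pathlib.Path(sys.argv[1])\n"
    "for i in range(1, 4):\n"
    "    p.write_text(str(i / 3))\n"
    "    time.sleep(0.2)\n"
)

with tempfile.TemporaryDirectory() as tmp:
    progress = Path(tmp) / "fraction_done"
    seen = []
    code = run_wrapped(
        [sys.executable, "-c", WORKER, str(progress), "--device", "0"],
        progress,
        seen.append,
    )
    print(code, seen[-1])  # 0 1.0
```

The polling loop is also where the extra overhead mentioned above comes from, although here it is a sleep between file reads rather than anything on the compute path.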

bcavnaugh
Joined: 8 Nov 13
Posts: 54
Credit: 745,461,800
RAC: 2,148,871
Message 51863 - Posted: 18 May 2019 | 4:00:45 UTC
Last modified: 18 May 2019 | 4:01:40 UTC

So we can no longer run this BOINC GPU Project under BOINC version 7.9.3 on Ubuntu 18.04.2 LTS [4.15.0-51-generic|libc 2.27 (Ubuntu GLIBC 2.27-3ubuntu1)] Running NVIDIA GeForce GTX 1080 Ti (4095MB) driver: 390.11?

Retvari Zoltan
Joined: 20 Jan 09
Posts: 2047
Credit: 14,819,836,269
RAC: 2,276,979
Message 51866 - Posted: 18 May 2019 | 9:59:43 UTC - in response to Message 51863.

So we can no longer run this BOINC GPU Project under BOINC version 7.9.3 on Ubuntu 18.04.2 LTS [4.15.0-51-generic|libc 2.27 (Ubuntu GLIBC 2.27-3ubuntu1)] Running NVIDIA GeForce GTX 1080 Ti (4095MB) driver: 390.11?
Correction:
We can not run this BOINC GPU Project (GPUGrid) on any Linux distro for a who-knows-how-long time period.

Aurum
Joined: 12 Jul 17
Posts: 110
Credit: 7,368,016,843
RAC: 4,730,024
Message 51875 - Posted: 18 May 2019 | 15:27:10 UTC

I bet it won't be long before we get Linux WUs again. In the meantime there's Asteroids, Einstein, MilkyWay & SETI to keep one busy.

MarkJ
Volunteer moderator
Project tester
Volunteer tester
Joined: 24 Dec 08
Posts: 732
Credit: 197,194,445
RAC: 0
Message 51883 - Posted: 19 May 2019 | 3:43:46 UTC - in response to Message 51845.
Last modified: 19 May 2019 | 3:45:54 UTC

So does anyone want to explain how a BOINC wrapper works? The docs don't really say anything about the mechanics involved.

From what I understand, it's a wrapper program they put around their normal (non-BOINC) science app, and it is used to invoke that app. No pre-reqs. No need for VBox. That way the wrapper handles the BOINC interaction and allows the use of a non-BOINC app.

See https://boinc.berkeley.edu/trac/wiki/WrapperApp for docs.
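To make "no pre-reqs, no VBox" concrete: the wrapper is driven by a small XML job description naming which native program(s) to run and with what arguments. A minimal Python sketch that reads such a file, assuming the job_desc / task / application / command_line / fraction_done_filename tag names from the WrapperApp page linked above (check that page for the full list of tags); the program and file names are placeholders. Because the wrapper simply launches native executables, the wrapped app has to be built for the host platform, unlike an app shipped inside a VirtualBox image.

```python
import shlex
import xml.etree.ElementTree as ET

# Hypothetical job description in the style shown on the WrapperApp page.
JOB_XML = """
<job_desc>
    <task>
        <application>worker</application>
        <command_line>--input in.dat --output out.dat</command_line>
        <fraction_done_filename>progress.txt</fraction_done_filename>
    </task>
</job_desc>
"""

def read_tasks(xml_text: str):
    """List the programs a wrapper would launch, in order, with their arguments."""
    root = ET.fromstring(xml_text.strip())
    tasks = []
    for task in root.findall("task"):
        tasks.append({
            "application": task.findtext("application"),
            "args": shlex.split(task.findtext("command_line") or ""),
            "fraction_done_file": task.findtext("fraction_done_filename"),
        })
    return tasks

for t in read_tasks(JOB_XML):
    print(t["application"], t["args"], t["fraction_done_file"])
```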
____________
BOINC blog

Keith Myers
Joined: 13 Dec 17
Posts: 288
Credit: 237,915,213
RAC: 149,124
Message 51886 - Posted: 19 May 2019 | 17:56:49 UTC - in response to Message 51883.

Thanks, I had already read that document and was and still am confused. I gather it is not a VM. So assume you don't need virtualization on the cpu?

Why does BOINC offer versions of BOINC+Virtual Box if this mechanism does not require VBox?

Does VBox do more or less than a wrapper? What are the limitations of a wrapper compared to VBox?

Does the application wrapped in a wrapper have to be native code for the platform? With a VM you could run an app not native to the platform.

mmonnin
Joined: 2 Jul 16
Posts: 265
Credit: 647,845,139
RAC: 1,301
Message 51887 - Posted: 19 May 2019 | 20:45:18 UTC - in response to Message 51886.

Thanks, I had already read that document and was and still am confused. I gather it is not a VM. So assume you don't need virtualization on the cpu?

Why does BOINC offer versions of BOINC+Virtual Box if this mechanism does not require VBox?

Does VBox do more or less than a wrapper? What are the limitations of a wrapper compared to VBox?

Does the application wrapped in a wrapper have to be native code for the platform? With a VM you could run an app not native to the platform.


The wrapper does not need VBox. It's just another interface that performs the BOINC-related functions, while the project's 'math.exe' (or whatever) that does the crunching ONLY performs calculations.

VBox can set up the entire OS environment to satisfy all the specifics needed to crunch. If a project needs extra programs that don't typically come with an OS or aren't normally installed by people, those can be included in the VBox image. Again taking LHC as the example, Singularity and CVMFS are included in the image. They can also make one VBox image for both Windows and Linux host OSs.

Aurum
Joined: 12 Jul 17
Posts: 110
Credit: 7,368,016,843
RAC: 4,730,024
Message 51888 - Posted: 19 May 2019 | 23:36:21 UTC

Is the BOINC wrapper a memory hog like virtualbox???

mmonnin
Joined: 2 Jul 16
Posts: 265
Credit: 647,845,139
RAC: 1,301
Message 51889 - Posted: 20 May 2019 | 0:07:34 UTC
Last modified: 20 May 2019 | 0:09:35 UTC

I'm trying to think of projects that use it. Going through my project folders, it looks like DrugDiscovery CPU Goofy, MindModeling and CAS used it. DHEP, Gerasium, Moo, SRBase, Enigma, YoYo and Yafu are active projects that have "wrapper" in the exe name. Some YoYo ECM tasks can use around 8 GB, but I think that's the data, as it's limited to certain task types. Still, none of these other projects comes close to LHC ATLAS using 10 GB. VBox apps are huge because they're an entire image.

It seems like most GPUGrid crunching is done on Windows, as the stats have only gone down from about 600M to 400M credits per day.

Keith Myers
Joined: 13 Dec 17
Posts: 288
Credit: 237,915,213
RAC: 149,124
Message 51890 - Posted: 20 May 2019 | 2:31:55 UTC

That still shows the Linux hosts responsible for 1/3 of the total credit. And since the percentage of Linux hosts is 37% compared to 54% for Windows hosts, the Linux hosts are showing a greater percentage of higher production hosts compared to Windows hosts.

It would benefit the project to return the Linux hosts to participation.

Richard Haselgrove
Joined: 11 Jul 09
Posts: 912
Credit: 2,196,923,945
RAC: 962,434
Message 51891 - Posted: 20 May 2019 | 11:24:17 UTC - in response to Message 51890.

It would benefit the project to return the Linux hosts to participation.

Which is why the PM which got Toni's attention had the subject line

Research being delayed - Linux apps broken

:-)

bcavnaugh
Joined: 8 Nov 13
Posts: 54
Credit: 745,461,800
RAC: 2,148,871
Message 52084 - Posted: 14 Jun 2019 | 1:02:09 UTC

Been a while; any news?

Retvari Zoltan
Joined: 20 Jan 09
Posts: 2047
Credit: 14,819,836,269
RAC: 2,276,979
Message 52426 - Posted: 8 Aug 2019 | 20:27:39 UTC

Now the license of the Windows app has expired.
I have the feeling that this project is more important for us than for the GPUGrid team, if there's such an entity at all.

JStateson
Joined: 31 Oct 08
Posts: 154
Credit: 2,550,603,228
RAC: 3,070,780
Message 52427 - Posted: 8 Aug 2019 | 21:52:42 UTC - in response to Message 52426.

Now the license of the Windows app has expired.
I have the feeling that this project is more important for us than for the GPUGrid team, if there's such an entity at all.


August is the vacation month in Italy. Looking at the "About" page, I don't see a lot of diversity. They probably took off a week to get their heads out of the quantum clouds and socialize with the opposite sex.

w1hue
Joined: 28 Sep 09
Posts: 16
Credit: 68,627,575
RAC: 60,496
Message 52429 - Posted: 9 Aug 2019 | 1:06:31 UTC

August is vacation month in Italy . . .

Most likely most take off the whole month . . . not just a week.

Jim1348
Joined: 28 Jul 12
Posts: 695
Credit: 1,371,992,468
RAC: 4
Message 52430 - Posted: 9 Aug 2019 | 1:51:07 UTC

They are in Spain, so I always figured they would head to Majorca. No one ever denied it at any rate.

robertmiles
Joined: 16 Apr 09
Posts: 436
Credit: 499,137,746
RAC: 358,574
Message 52431 - Posted: 9 Aug 2019 | 12:03:56 UTC - in response to Message 51792.

Same here, of course. But I haven't seen anyone from the project around here for a while. Is anyone at home?

It looks to me like the two main researchers are about to get a flood of workunits that failed due to all of the tasks giving an error. If so, they will have to notify the programmer or programmers, and start an effort to fix the problem. If they're able to read and write in English, they'll then have little worthwhile to do other than tell us what happened, and when they expect a fix.

wolfman1360
Joined: 19 Feb 17
Posts: 5
Credit: 29,672,725
RAC: 318,443
Message 52705 - Posted: 23 Sep 2019 | 20:54:15 UTC - in response to Message 51786.

Am I to assume this has been fixed and I can add my Linux machine here? Or are there no WUs for Linux as of yet?
I know I'm crunching okay under Windows...

Retvari Zoltan
Joined: 20 Jan 09
Posts: 2047
Credit: 14,819,836,269
RAC: 2,276,979
Message 52706 - Posted: 23 Sep 2019 | 21:48:12 UTC - in response to Message 52705.
Last modified: 23 Sep 2019 | 21:49:43 UTC

Am I to assume this has been fixed and I can add my Linux machine here?
It's been fixed, though only the Windows app has been released to the production line.
You can add your Linux machine, but it will receive only beta test tasks for a while.

Or are there no WUs for Linux as of yet?
The workunits are common to both platforms, but the new Linux app will be put into the production line only when the new Windows app is working as it should.

I know I'm crunching okay under Windows...
Me too.

Keith Myers
Joined: 13 Dec 17
Posts: 288
Credit: 237,915,213
RAC: 149,124
Message 52708 - Posted: 23 Sep 2019 | 23:53:39 UTC - in response to Message 52706.

Am I to assume this has been fixed and I can add my Linux machine here?
It's been fixed, though only the Windows app has been released to the production line.
You can add your Linux machine, but it will receive only beta test tasks for a while.

Or are there no WUs for Linux as of yet?
The workunits are common, but the new Linux app will be put into the production line only when the new Windows app is working as it should be.

I know I'm crunching okay under Windows...
Me too.

I am receiving non-Toni test tasks today for my Linux host. Looks like normal project work.
https://www.gpugrid.net/result.php?resultid=21405079
https://www.gpugrid.net/result.php?resultid=21405557
https://www.gpugrid.net/result.php?resultid=21405187
https://www.gpugrid.net/result.php?resultid=21405090

Richard Haselgrove
Joined: 11 Jul 09
Posts: 912
Credit: 2,196,923,945
RAC: 962,434
Message 52712 - Posted: 24 Sep 2019 | 7:34:51 UTC - in response to Message 52708.

I am receiving non-Toni test tasks today for my Linux host. Looks like normal project work.
https://www.gpugrid.net/result.php?resultid=21405079
https://www.gpugrid.net/result.php?resultid=21405557
https://www.gpugrid.net/result.php?resultid=21405187
https://www.gpugrid.net/result.php?resultid=21405090

Yes, 'Application version: New version of ACEMD v2.06 (cuda100)' is the new normal.

Aurum
Joined: 12 Jul 17
Posts: 110
Credit: 7,368,016,843
RAC: 4,730,024
Message 52714 - Posted: 24 Sep 2019 | 13:58:59 UTC

Being in check-in mode for months has got me so confused. I thought Toni asked not to run acemd3 on Linux as that's not what she needs to test. Or are we now good to go on Linux WUs???

Jim1348
Joined: 28 Jul 12
Posts: 695
Credit: 1,371,992,468
RAC: 4
Message 52715 - Posted: 24 Sep 2019 | 14:17:21 UTC - in response to Message 52714.

I thought Toni asked not to run acemd3 on Linux as that's not what she needs to test.

Yes, that is what he said. I am just surprised that they send them to Linux machines at all. Can't they block them?

Retvari Zoltan
Joined: 20 Jan 09
Posts: 2047
Credit: 14,819,836,269
RAC: 2,276,979
Message 52716 - Posted: 24 Sep 2019 | 17:44:05 UTC - in response to Message 52712.

I am receiving non-Toni test tasks today for my Linux host. Looks like normal project work.
Yes, 'Application version: New version of ACEMD v2.06 (cuda100)' is the new normal.
I received such tasks too. These are from the short queue (which is empty now, though).
I think Toni moves some workunits from the short queue into the "New version of ACEMD" queue from time to time, to serve as a somewhat longer test.

mmonnin
Joined: 2 Jul 16
Posts: 265
Credit: 647,845,139
RAC: 1,301
Message 52717 - Posted: 24 Sep 2019 | 19:57:42 UTC

I've received only 1 since he said that. If the admins only want Windows hosts to receive the tasks, they could always deprecate the Linux app.

rod4x4
Joined: 4 Aug 14
Posts: 94
Credit: 1,594,427,169
RAC: 1,617,573
Message 52723 - Posted: 26 Sep 2019 | 0:52:23 UTC - in response to Message 52716.

I received such tasks too. These are from the short queue (which is empty now, though).
I think Toni put some workunits from the short queue to the "New version of ACEMD" queue from time to time to serve as a bit longer test.

Agreed.
My Windows hosts do not process from the short queue, only from the long queue and test queue.
I am receiving ADRIA short work units from the test queue. This would seem to indicate ADRIA is becoming familiar with creating ACEMD3 work units.
We are getting closer to full release of ACEMD3!
