
Message boards : Server and website : Welcome back Gpugrid

Betting Slip
Joined: 5 Jan 09
Posts: 670
Credit: 2,498,095,550
RAC: 0
Message 46106 - Posted: 9 Jan 2017 | 11:17:25 UTC

Nice to see you reconnected.

PappaLitto
Joined: 21 Mar 16
Posts: 511
Credit: 4,652,742,755
RAC: 2,555,673
Message 46107 - Posted: 9 Jan 2017 | 11:29:58 UTC

My house is nice and chilly now: -22°C outside.

Erich56
Joined: 1 Jan 15
Posts: 1091
Credit: 6,643,281,926
RAC: 5,849,001
Message 46108 - Posted: 9 Jan 2017 | 11:39:16 UTC

I am asking just out of curiosity: what was the reason for this lengthy outage?

Logan Carr
Joined: 12 Aug 15
Posts: 240
Credit: 64,069,811
RAC: 0
Message 46109 - Posted: 9 Jan 2017 | 13:16:37 UTC - in response to Message 46108.

I checked the BOINC project website last night and it said that BOINC was down. This is probably the reason GPUGrid was down. Also, I don't know if anybody's mentioned this, but my upload was pending with a message saying "project backoff" or something like that.

Hope this is helpful

-Logan

Riaan
Joined: 16 Dec 10
Posts: 4
Credit: 19,812,500
RAC: 0
Message 46111 - Posted: 9 Jan 2017 | 18:15:25 UTC

I can't seem to find any explanation for the downtime, or an apology for it.

Maybe my GPU cycles are better spent on a project that monitors its systems over the weekend and has better uptime.

Since they don't seem to look after their own systems, why would they care about my hard-earned data?

Or at least point me to the post that shows you care about us.

Beyond
Joined: 23 Nov 08
Posts: 1112
Credit: 6,162,416,256
RAC: 0
Message 46112 - Posted: 9 Jan 2017 | 18:43:26 UTC - in response to Message 46109.

I checked the BOINC project website last night and it said that BOINC was down. This is probably the reason GPUGrid was down. Also, I don't know if anybody's mentioned this, but my upload was pending with a message saying "project backoff" or something like that. Hope this is helpful. -Logan

Logan, the BOINC site being down has nothing to do with GPUGrid. Apparently the GPUGrid server crashed during the weekend and nobody noticed. It sure did create a crazy backlog of WUs trying to upload. :-(

Logan Carr
Joined: 12 Aug 15
Posts: 240
Credit: 64,069,811
RAC: 0
Message 46113 - Posted: 9 Jan 2017 | 19:55:35 UTC - in response to Message 46112.
Last modified: 9 Jan 2017 | 19:56:53 UTC

I checked the BOINC project website last night and it said that BOINC was down. This is probably the reason GPUGrid was down. Also, I don't know if anybody's mentioned this, but my upload was pending with a message saying "project backoff" or something like that. Hope this is helpful. -Logan

Logan, the BOINC site being down has nothing to do with GPUGrid. Apparently the GPUGrid server crashed during the weekend and nobody noticed. It sure did create a crazy backlog of WUs trying to upload. :-(



Ah, alright. Thanks for letting me know! The timing must have just coincided then, haha.

My assumption is rather that the scientists might have taken the weekend off and maybe that's why it wasn't noticed right away. We all need to step away from our jobs sometimes, so maybe that's what they did.

Either way, the website is back up and that's all that counts, right? Let's all try to think positively about these situations. Also, if this website is ever down for a long time again, check GPUGrid's Twitter account. They post useful updates there.



I don't mean to lecture, if it comes across that way. I'm just trying to spread positive vibes :)
____________
Cruncher/Learner in progress.

Bedrich Hajek
Joined: 28 Mar 09
Posts: 467
Credit: 8,393,622,716
RAC: 10,044,394
Message 46115 - Posted: 9 Jan 2017 | 22:51:42 UTC - in response to Message 46108.

I am asking just out of curiosity: what was the reason for this lengthy outage?



I would like to know as well.


And somehow, I received 5 ghost units during this outage!



Erich56
Joined: 1 Jan 15
Posts: 1091
Credit: 6,643,281,926
RAC: 5,849,001
Message 46117 - Posted: 10 Jan 2017 | 6:21:23 UTC - in response to Message 46113.

My assumption is rather that the scientists might have taken the weekend off and maybe that's why it wasn't noticed right away.

It's no problem if the scientists themselves took the weekend off; they produced plenty of WUs during the last week anyway.
However, I was a little surprised that there was not even one IT person on standby who would have noticed on Saturday evening that there was a problem, one that got even worse by Sunday morning.

John
Joined: 15 Oct 11
Posts: 17
Credit: 81,085,378
RAC: 0
Message 46118 - Posted: 10 Jan 2017 | 15:57:24 UTC - in response to Message 46117.

My assumption is rather that the scientists might have taken the weekend off and maybe that's why it wasn't noticed right away.

It's no problem if the scientists themselves took the weekend off; they produced plenty of WUs during the last week anyway.
However, I was a little surprised that there was not even one IT person on standby who would have noticed on Saturday evening that there was a problem, one that got even worse by Sunday morning.


I was wondering the same: nobody checked on the server(s) for approximately 2 days.
I did not get my 24-hour bonus because of this.
I know, small potatoes. :)

Stefan
Project administrator
Project developer
Project tester
Project scientist
Joined: 5 Mar 13
Posts: 348
Credit: 0
RAC: 0
Message 46140 - Posted: 10 Jan 2017 | 22:56:54 UTC

There was a server crash. Sometimes it can take us a day to notice if we are not actively monitoring everything. Sorry for any inconvenience it caused. Maybe it's best to send us an email if it happens again.

wiyosaya
Joined: 22 Nov 09
Posts: 114
Credit: 589,114,683
RAC: 0
Message 46157 - Posted: 12 Jan 2017 | 15:43:41 UTC - in response to Message 46140.

Thanks for the update.

Unfortunately, the WU that I received before the crash, and which finished without error, uploaded but was not credited. That has happened before, though not often, and these were extenuating circumstances, so I am not all that concerned.

captainjack
Joined: 9 May 13
Posts: 171
Credit: 2,370,679,288
RAC: 2,404,045
Message 46160 - Posted: 12 Jan 2017 | 17:17:51 UTC

Stefan said:


Maybe it's best to send us an email if it happens again.


Where do we send the email when the GPUGrid site is unavailable?

PappaLitto
Joined: 21 Mar 16
Posts: 511
Credit: 4,652,742,755
RAC: 2,555,673
Message 46168 - Posted: 13 Jan 2017 | 12:34:40 UTC

I had two BNBS2 WUs run for 100k+ seconds and then fail with a validation error. Can anyone explain this?

Retvari Zoltan
Joined: 20 Jan 09
Posts: 2343
Credit: 16,201,255,749
RAC: 851
Message 46174 - Posted: 14 Jan 2017 | 17:36:28 UTC - in response to Message 46168.
Last modified: 14 Jan 2017 | 17:38:02 UTC

I had two BNBS2 WUs run for 100k+ seconds and then fail with a validation error. Can anyone explain this?

Perhaps your host had a power outage, and these GPUGrid tasks restarted from 0%.
In such cases it is practical to abort the workunits manually, as there's no point in spending time and electricity crunching them.

Here are two excerpts from the stderr.txt of your failed tasks:
1st:
# GPU 2 : 73C
# GPU [GeForce GTX 970] Platform [Windows] Rev [3212] VERSION [65]
# SWAN Device 0 :
# Name : GeForce GTX 970

2nd:
# GPU 0 : 73C
# GPU [GeForce GTX 690] Platform [Windows] Rev [3212] VERSION [65]
# SWAN Device 1 :
# Name : GeForce GTX 690

Note that in both excerpts there's no line explaining why the application exited between the 1st and the 2nd line, which is usually the sign of a dirty shutdown.
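If you want to check a long stderr output for this pattern instead of eyeballing it, here's a minimal Python sketch. It assumes you've copied the stderr text from the task's page into a string. The two regular expressions are inferred only from the excerpts above, and the function name find_dirty_restarts is hypothetical; real GPUGrid stderr output may contain other formats, so treat a match as a hint to look closer, not proof of a power outage.

```python
import re

# Assumed patterns, inferred only from the two stderr excerpts above;
# this is not a documented GPUGrid log format.
TEMP_LINE = re.compile(r"^# GPU \d+\s*:\s*\d+C$")          # e.g. "# GPU 2 : 73C"
START_LINE = re.compile(r"^# GPU \[.*\] Platform \[.*\]")  # e.g. "# GPU [GeForce GTX 970] Platform [Windows] ..."


def find_dirty_restarts(stderr_text: str) -> list[str]:
    """Return every start banner that directly follows a temperature line.

    A restart with no exit message logged in between suggests the previous
    run ended abruptly (e.g. a power cut) rather than shutting down cleanly.
    """
    # Ignore blank lines so extra spacing in a pasted log doesn't matter.
    lines = [ln.strip() for ln in stderr_text.splitlines() if ln.strip()]
    suspicious = []
    # Compare each line with the one right before it.
    for prev, cur in zip(lines, lines[1:]):
        if START_LINE.match(cur) and TEMP_LINE.match(prev):
            suspicious.append(cur)
    return suspicious


if __name__ == "__main__":
    sample = """\
# GPU [GeForce GTX 970] Platform [Windows] Rev [3212] VERSION [65]
# SWAN Device 0 :
# Name : GeForce GTX 970
# GPU 2 : 73C
# GPU [GeForce GTX 970] Platform [Windows] Rev [3212] VERSION [65]
"""
    for banner in find_dirty_restarts(sample):
        print("Possible dirty restart before:", banner)
```

The first start banner in a log is never flagged, because nothing runs before it; only a banner sitting right after a temperature reading, with no exit message in between, is reported.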

PappaLitto
Joined: 21 Mar 16
Posts: 511
Credit: 4,652,742,755
RAC: 2,555,673
Message 46194 - Posted: 16 Jan 2017 | 2:29:01 UTC

How did you get that information, Zoltan? I've been curious to see some of your WUs.

Retvari Zoltan
Joined: 20 Jan 09
Posts: 2343
Credit: 16,201,255,749
RAC: 851
Message 46229 - Posted: 18 Jan 2017 | 22:32:40 UTC - in response to Message 46194.

How did you get that information, Zoltan? I've been curious to see some of your WUs.

Every host has a list of workunits. If you click the ID of a finished task (or its name in the other view; it's the first column of the task list), you can see detailed information about that task. The second part of that page is the "stderr output", which the task generates while it is running.
