Welcome back Gpugrid

Message boards : Server and website : Welcome back Gpugrid

Author	Message
Betting Slip \| Joined: 5 Jan 09 \| Posts: 670 \| Credit: 2,498,095,550 \| RAC: 0	Message 46106 - Posted: 9 Jan 2017 \| 11:17:25 UTC
	Nice to see you reconnected.
	ID: 46106 \| Rating: 0

PappaLitto \| Joined: 21 Mar 16 \| Posts: 511 \| Credit: 4,652,742,755 \| RAC: 2,555,673	Message 46107 - Posted: 9 Jan 2017 \| 11:29:58 UTC
	My house is nice and chilly now; it's -22C outside.
	ID: 46107 \| Rating: 0

Erich56 \| Joined: 1 Jan 15 \| Posts: 1091 \| Credit: 6,643,281,926 \| RAC: 5,849,001	Message 46108 - Posted: 9 Jan 2017 \| 11:39:16 UTC
	I am asking just out of curiosity: what was the reason for this lengthy outage?
	ID: 46108 \| Rating: 0

Logan Carr \| Joined: 12 Aug 15 \| Posts: 240 \| Credit: 64,069,811 \| RAC: 0	Message 46109 - Posted: 9 Jan 2017 \| 13:16:37 UTC - in response to Message 46108.
	I checked the BOINC project website last night and it said that BOINC was down. This is probably the reason GPUGrid was down. Also, I don't know if anybody has said this, but my upload was pending with a message like "project backoff". Hope this is helpful. -Logan
	ID: 46109 \| Rating: 0

Riaan \| Joined: 16 Dec 10 \| Posts: 4 \| Credit: 19,812,500 \| RAC: 0	Message 46111 - Posted: 9 Jan 2017 \| 18:15:25 UTC
	I can't seem to find any reason for the downtime, nor an apology for it. Maybe my GPU cycles are better spent on a project that monitors its systems over a weekend and has better uptime. Since they don't seem to look after their own systems, why would they care about my hard-worked data? Or at least point me to the post that shows you care about us.
	ID: 46111 \| Rating: 0

Beyond \| Joined: 23 Nov 08 \| Posts: 1112 \| Credit: 6,162,416,256 \| RAC: 0	Message 46112 - Posted: 9 Jan 2017 \| 18:43:26 UTC - in response to Message 46109.
	Logan Carr wrote (Message 46109): I checked the BOINC project website last night and it said that BOINC was down. This is probably the reason GPUGrid was down. Also, I don't know if anybody has said this, but my upload was pending with a message like "project backoff". Hope this is helpful. -Logan
	Logan, the BOINC site being down has nothing to do with GPUGrid. Apparently the GPUGrid server crashed during the weekend and nobody noticed. It sure did create a crazy backlog of WUs trying to upload. :-(
	ID: 46112 \| Rating: 0

Logan Carr \| Joined: 12 Aug 15 \| Posts: 240 \| Credit: 64,069,811 \| RAC: 0	Message 46113 - Posted: 9 Jan 2017 \| 19:55:35 UTC - in response to Message 46112. Last modified: 9 Jan 2017 \| 19:56:53 UTC
	Logan Carr wrote (Message 46109): I checked the BOINC project website last night and it said that BOINC was down. This is probably the reason GPUGrid was down. Also, I don't know if anybody has said this, but my upload was pending with a message like "project backoff". Hope this is helpful. -Logan
	Beyond wrote (Message 46112): Logan, the BOINC site being down has nothing to do with GPUGrid. Apparently the GPUGrid server crashed during the weekend and nobody noticed. It sure did create a crazy backlog of WUs trying to upload. :-(
	Ah, alright. Thanks for letting me know! The timing must have been just right then, haha. My assumption is rather that the scientists might have taken the weekend off and maybe that's why it wasn't noticed right away. We all need to step away from our jobs sometimes, so maybe that's what they did. Either way, the website is back up and that's all that counts, right? Let's all try to think positively about these situations. Also, if the website is ever down for much longer, check GPUGrid's Twitter account. They post useful stuff there. I don't mean to lecture, if it comes across that way. I'm just trying to make positive vibes :)
	____________ Cruncher/Learner in progress.
	ID: 46113 \| Rating: 0

Bedrich Hajek \| Joined: 28 Mar 09 \| Posts: 467 \| Credit: 8,393,622,716 \| RAC: 10,044,394	Message 46115 - Posted: 9 Jan 2017 \| 22:51:42 UTC - in response to Message 46108.
	Erich56 wrote (Message 46108): I am asking just out of curiosity: what was the reason for this lengthy outage?
	I would like to know as well. And somehow, I received 5 ghost units during this outage!
	ID: 46115 \| Rating: 0

Erich56 \| Joined: 1 Jan 15 \| Posts: 1091 \| Credit: 6,643,281,926 \| RAC: 5,849,001	Message 46117 - Posted: 10 Jan 2017 \| 6:21:23 UTC - in response to Message 46113.
	Logan Carr wrote (Message 46113): My assumption is rather that the scientists might have taken the weekend off and maybe that's why it wasn't noticed right away.
	It's no problem if the scientists themselves took the weekend off; they produced plenty of WUs last week anyway. However, I was a little surprised that there was not even one IT person on some kind of standby who would have noticed on Saturday evening that there was a problem, one that got even worse by Sunday morning.
	ID: 46117 \| Rating: 0

John \| Joined: 15 Oct 11 \| Posts: 17 \| Credit: 81,085,378 \| RAC: 0	Message 46118 - Posted: 10 Jan 2017 \| 15:57:24 UTC - in response to Message 46117.
	Logan Carr wrote (Message 46113): My assumption is rather that the scientists might have taken the weekend off and maybe that's why it wasn't noticed right away.
	Erich56 wrote (Message 46117): It's no problem if the scientists themselves took the weekend off; they produced plenty of WUs last week anyway. However, I was a little surprised that there was not even one IT person on some kind of standby who would have noticed on Saturday evening that there was a problem, one that got even worse by Sunday morning.
	Was wondering the same... nobody checking on the server(s) for approx. 2 days... Did not get my 24 hr bonus because of this. I know, small potatoes... :)
	ID: 46118 \| Rating: 0

Stefan (Project administrator, Project developer, Project tester, Project scientist) \| Joined: 5 Mar 13 \| Posts: 348 \| Credit: 0 \| RAC: 0	Message 46140 - Posted: 10 Jan 2017 \| 22:56:54 UTC
	There was a server crash. Sometimes it can take us a day to notice if we are not currently actively monitoring everything. Sorry for any inconvenience caused by it. Maybe best send us a mail if it happens again.
	ID: 46140 \| Rating: 0

wiyosaya \| Joined: 22 Nov 09 \| Posts: 114 \| Credit: 589,114,683 \| RAC: 0	Message 46157 - Posted: 12 Jan 2017 \| 15:43:41 UTC - in response to Message 46140.
	Thanks for the update. Unfortunately, the WU that I had received before the crash, and which finished without error, was uploaded but not credited. That has happened before, but not that often, and these were extenuating circumstances, so I am not all that concerned.
	ID: 46157 \| Rating: 0

captainjack \| Joined: 9 May 13 \| Posts: 171 \| Credit: 2,370,679,288 \| RAC: 2,404,045	Message 46160 - Posted: 12 Jan 2017 \| 17:17:51 UTC
	Stefan said: "Maybe best send us a mail if it happens again."
	Where do we send the email when the GPUGRID site is unavailable?
	ID: 46160 \| Rating: 0

PappaLitto \| Joined: 21 Mar 16 \| Posts: 511 \| Credit: 4,652,742,755 \| RAC: 2,555,673	Message 46168 - Posted: 13 Jan 2017 \| 12:34:40 UTC
	I had two BNBS2 WUs run for 100k+ seconds and then fail with a validation error. Can anyone explain this?
	ID: 46168 \| Rating: 0

Retvari Zoltan \| Joined: 20 Jan 09 \| Posts: 2343 \| Credit: 16,201,255,749 \| RAC: 851	Message 46174 - Posted: 14 Jan 2017 \| 17:36:28 UTC - in response to Message 46168. Last modified: 14 Jan 2017 \| 17:38:02 UTC
	PappaLitto wrote (Message 46168): I had two BNBS2 WUs run for 100k+ seconds and had a validation error, can anyone explain this?
	Perhaps your host had a power outage, and these GPUGrid tasks restarted from 0%. In such cases it is practical to abort the workunits manually, as there's no point in spending time and electricity crunching them. Here are two excerpts from the stderr.txt of your failed tasks:
	1st:
	# GPU 2 : 73C
	# GPU [GeForce GTX 970] Platform [Windows] Rev [3212] VERSION [65]
	# SWAN Device 0 :
	# Name : GeForce GTX 970
	2nd:
	# GPU 0 : 73C
	# GPU [GeForce GTX 690] Platform [Windows] Rev [3212] VERSION [65]
	# SWAN Device 1 :
	# Name : GeForce GTX 690
	Note that there's no line explaining the reason for the application's exit between the 1st and the 2nd line, which is usually the sign of a dirty shutdown.
	ID: 46174 \| Rating: 0
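
The check described in the post above can be roughed out in a few lines of Python. This is only a sketch under assumptions: the two regular expressions are inferred solely from the excerpts quoted in the post, the function name `find_unexplained_restarts` is hypothetical, and it reads each excerpt as a temperature line followed immediately by a start-up banner. Real stderr output may contain other line shapes, and a hit is a hint of a dirty shutdown, not proof.

```python
import re

# Patterns inferred only from the excerpts quoted in the post (assumption);
# they are not taken from any GPUGrid or BOINC documentation.
TEMP_LINE = re.compile(r"^#\s*GPU\s+\d+\s*:\s*\d+C\s*$")
START_LINE = re.compile(r"^#\s*GPU\s+\[.+?\]\s+Platform\s+\[.+?\]")


def find_unexplained_restarts(stderr_text):
    """Return 1-based positions (counted among non-blank lines) of start-up
    banners that directly follow a temperature reading, i.e. restarts with
    no other message logged in between."""
    lines = [ln.strip() for ln in stderr_text.splitlines() if ln.strip()]
    hits = []
    for i in range(1, len(lines)):
        if START_LINE.match(lines[i]) and TEMP_LINE.match(lines[i - 1]):
            hits.append(i + 1)
    return hits


# The first excerpt from the post, used as sample input.
sample = """\
# GPU 2 : 73C
# GPU [GeForce GTX 970] Platform [Windows] Rev [3212] VERSION [65]
# SWAN Device 0 :
# Name : GeForce GTX 970
"""

print(find_unexplained_restarts(sample))
```

To try it on a real task, paste the "stderr output" section of the task page into a string and pass it to the function; an empty list means no banner was found directly after a temperature line.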

PappaLitto \| Joined: 21 Mar 16 \| Posts: 511 \| Credit: 4,652,742,755 \| RAC: 2,555,673	Message 46194 - Posted: 16 Jan 2017 \| 2:29:01 UTC
	How did you get that information, Zoltan? I've been curious to see some of your WUs.
	ID: 46194 \| Rating: 0

Retvari Zoltan \| Joined: 20 Jan 09 \| Posts: 2343 \| Credit: 16,201,255,749 \| RAC: 851	Message 46229 - Posted: 18 Jan 2017 \| 22:32:40 UTC - in response to Message 46194.
	PappaLitto wrote (Message 46194): How did you get that information zoltan? I've been curious to see some of your WUs
	Every host computer has a list of workunits. If you click on the ID of a finished task (or its name in the other view; it's the first column of the task list), you can see detailed information about that task. The second part is the "stderr output", which is generated by the task while it is running.
	ID: 46229 \| Rating: 0
