Message boards : News : Server up again

GDF
Volunteer moderator
Project administrator
Project developer
Project tester
Volunteer developer
Volunteer tester
Project scientist
Joined: 14 Mar 07
Posts: 1895
Credit: 629,356
RAC: 0
Level
Gly
Message 20255 - Posted: 27 Jan 2011 | 10:01:25 UTC

The server is up again and all should get back to normal very quickly.

gdf

Betting Slip
Joined: 5 Jan 09
Posts: 589
Credit: 2,039,762,925
RAC: 1,511,935
Level
Phe
Message 20303 - Posted: 31 Jan 2011 | 15:25:53 UTC - in response to Message 20255.

Server down again

Richard Haselgrove
Joined: 11 Jul 09
Posts: 790
Credit: 1,423,662,395
RAC: 1,365,680
Level
Met
Message 20304 - Posted: 31 Jan 2011 | 17:03:23 UTC

I'm getting

GPUGRID 31/01/2011 16:53:28 [error] Error reported by file upload server: can't open file

on all file uploads - apparently after all data has successfully transferred. (task 3631188)

dataman
Joined: 18 Sep 08
Posts: 36
Credit: 100,352,867
RAC: 0
Level
Cys
Message 20308 - Posted: 31 Jan 2011 | 18:21:38 UTC - in response to Message 20304.

Same here. No downloads or uploads. I have 13 uploads stuck in the pipe. It will take a bit to get these sorted. :)

skgiven
Volunteer moderator
Project tester
Volunteer tester
Joined: 23 Apr 09
Posts: 3968
Credit: 1,834,906,524
RAC: 276,970
Level
His
Message 20309 - Posted: 31 Jan 2011 | 18:48:24 UTC - in response to Message 20308.

Yeah, it looks like the gpugrid_file_deleter program failed on the server.
It might be down to changes in task size, a bad batch, or just a random service failure, i.e. I know nothing.

For the next few hours I'm turning my systems off and doing some dusting ;)

Those who can micromanage might want to suspend the pending uploads and keep an eye on the server status page.

Some crunchers might want to hook up to MilkyWay (or another GPU project), keep a low cache (0.01 days) and wait it out.
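
For anyone who would rather set the small cache in a file than through the BOINC Manager, here is a minimal sketch in Python that builds the contents of a `global_prefs_override.xml` file. Mapping the "cache" to the `work_buf_min_days` and `work_buf_additional_days` preferences is an assumption, and `build_override` is a hypothetical helper name; check the preference names against your client version before relying on this.

```python
import xml.etree.ElementTree as ET


def build_override(min_days=0.01, additional_days=0.0):
    """Build override XML for a small work cache (hypothetical helper).

    Assumption: the cache size is controlled by work_buf_min_days and
    work_buf_additional_days inside a <global_preferences> element.
    """
    root = ET.Element("global_preferences")
    ET.SubElement(root, "work_buf_min_days").text = str(min_days)
    ET.SubElement(root, "work_buf_additional_days").text = str(additional_days)
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    # Print the XML so it can be reviewed before saving it by hand.
    print(build_override())
```

The output would be saved as `global_prefs_override.xml` in the BOINC data directory, and the client has to re-read its local preferences before the change takes effect.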

Richard Haselgrove
Joined: 11 Jul 09
Posts: 790
Credit: 1,423,662,395
RAC: 1,365,680
Level
Met
Message 20310 - Posted: 31 Jan 2011 | 18:50:56 UTC - in response to Message 20309.
Last modified: 31 Jan 2011 | 18:52:35 UTC

Would a failed 'deleter' daemon really cause this error? Does their data storage really fill up that quickly? Feels more like a NAS mounting error to me.

[Edit - especially as there are - reportedly - only a few files waiting to be deleted]
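
To make the two hypotheses concrete (storage full versus storage not mounted), here is a rough Python sketch of the kind of check a server admin could run against the upload directory. Everything here is an assumption for illustration: the function name `check_upload_dir`, the placeholder path, and the 1 GiB threshold are hypothetical and not taken from the GPUGRID server setup.

```python
import os
import shutil


def check_upload_dir(path, min_free_bytes=1 << 30, expect_mount=False):
    """Return a list of problems found with an upload directory.

    Hypothetical diagnostic: checks that the directory exists, is
    writable, has some free space, and (optionally) is a mount point.
    """
    problems = []
    if not os.path.isdir(path):
        # A NAS share that failed to mount often shows up as a missing
        # directory (or as an empty one, which this check cannot detect).
        problems.append("missing or not a directory")
        return problems
    if expect_mount and not os.path.ismount(path):
        problems.append("not a mount point")
    if not os.access(path, os.W_OK):
        problems.append("not writable")
    free = shutil.disk_usage(path).free
    if free < min_free_bytes:
        problems.append("low free space: %d bytes" % free)
    return problems


if __name__ == "__main__":
    # Placeholder path; the real upload directory is not known here.
    for problem in check_upload_dir("/path/to/project/upload"):
        print(problem)
```

A full disk would show up as "low free space", while a failed NAS mount would more likely show up as "missing or not a directory" or "not a mount point".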

skgiven
Volunteer moderator
Project tester
Volunteer tester
Joined: 23 Apr 09
Posts: 3968
Credit: 1,834,906,524
RAC: 276,970
Level
His
Message 20312 - Posted: 31 Jan 2011 | 19:07:24 UTC - in response to Message 20310.

Possibly, but for all I know they have stopped the service.
Yes, data storage is a recurring issue.
It could well be a NAS issue. Good point, because if it is a NAS issue it's probably down to the technicians to fix, not the research team, and that means it could be down until tomorrow morning.

Stoneageman
Joined: 25 May 09
Posts: 211
Credit: 12,251,130,346
RAC: 8,744,322
Level
Trp
Message 20313 - Posted: 31 Jan 2011 | 19:24:50 UTC - in response to Message 20312.

Bags me first in queue when it gets sorted, lol

Hypernova
Joined: 16 Nov 10
Posts: 22
Credit: 24,712,746
RAC: 0
Level
Pro
Message 20316 - Posted: 31 Jan 2011 | 21:28:00 UTC - in response to Message 20312.

Even though the upload/download server is displayed as running, I cannot download or upload anything.
Here are the messages I get:
1/31/2011 10:21:58 PM GPUGRID Started upload of p18-IBUCH_4_mutEGFR_110124-6-20-RND8641_1_0
1/31/2011 10:21:58 PM GPUGRID Started upload of p18-IBUCH_4_mutEGFR_110124-6-20-RND8641_1_1
1/31/2011 10:22:05 PM GPUGRID [error] Error reported by file upload server: can't open file
1/31/2011 10:22:05 PM GPUGRID Temporarily failed upload of p18-IBUCH_4_mutEGFR_110124-6-20-RND8641_1_0: transient upload error
1/31/2011 10:22:05 PM GPUGRID Backing off 56 min 8 sec on upload of p18-IBUCH_4_mutEGFR_110124-6-20-RND8641_1_0
1/31/2011 10:22:05 PM GPUGRID Started upload of p18-IBUCH_4_mutEGFR_110124-6-20-RND8641_1_2
1/31/2011 10:22:15 PM GPUGRID [error] Error reported by file upload server: can't open file
1/31/2011 10:22:15 PM GPUGRID Temporarily failed upload of p18-IBUCH_4_mutEGFR_110124-6-20-RND8641_1_1: transient upload error
1/31/2011 10:22:15 PM GPUGRID Backing off 45 min 37 sec on upload of p18-IBUCH_4_mutEGFR_110124-6-20-RND8641_1_1
1/31/2011 10:22:15 PM GPUGRID Started upload of p18-IBUCH_4_mutEGFR_110124-6-20-RND8641_1_3
1/31/2011 10:22:21 PM GPUGRID [error] Error reported by file upload server: can't open file
1/31/2011 10:22:21 PM GPUGRID [error] Error reported by file upload server: can't open file
1/31/2011 10:22:21 PM GPUGRID Temporarily failed upload of p18-IBUCH_4_mutEGFR_110124-6-20-RND8641_1_2: transient upload error
1/31/2011 10:22:21 PM GPUGRID Backing off 36 min 21 sec on upload of p18-IBUCH_4_mutEGFR_110124-6-20-RND8641_1_2
1/31/2011 10:22:21 PM GPUGRID Temporarily failed upload of p18-IBUCH_4_mutEGFR_110124-6-20-RND8641_1_3: transient upload error
1/31/2011 10:22:21 PM GPUGRID Backing off 3 min 26 sec on upload of p18-IBUCH_4_mutEGFR_110124-6-20-RND8641_1_3
1/31/2011 10:23:22 PM GPUGRID Started upload of p18-IBUCH_4_mutEGFR_110124-6-20-RND8641_1_7
1/31/2011 10:23:22 PM GPUGRID Started upload of p40-IBUCH_1_wtEGFR_110121-7-20-RND6307_0_0
1/31/2011 10:23:23 PM GPUGRID [error] Error reported by file upload server: can't open file
1/31/2011 10:23:23 PM GPUGRID Temporarily failed upload of p18-IBUCH_4_mutEGFR_110124-6-20-RND8641_1_7: transient upload error
1/31/2011 10:23:23 PM GPUGRID Backing off 1 min 0 sec on upload of p18-IBUCH_4_mutEGFR_110124-6-20-RND8641_1_7
1/31/2011 10:23:28 PM GPUGRID [error] Error reported by file upload server: can't open file
1/31/2011 10:23:28 PM GPUGRID Temporarily failed upload of p40-IBUCH_1_wtEGFR_110121-7-20-RND6307_0_0: transient upload error
1/31/2011 10:23:28 PM GPUGRID Backing off 1 min 0 sec on upload of p40-IBUCH_1_wtEGFR_110121-7-20-RND6307_0_0

12 Teraflops going to sleep.....
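
For anyone with a long list of stuck uploads, a log like the one above can be summarized per file. The following is a minimal Python sketch, assuming the client messages keep exactly the "Temporarily failed upload of ..." and "Backing off N min N sec on upload of ..." wording shown in this post; the function name `summarize_uploads` is hypothetical, and back-off lines in any other format (for example with hours) are simply ignored.

```python
import re
from collections import defaultdict

# Matches e.g. "Temporarily failed upload of <file>: transient upload error"
FAILED_RE = re.compile(r"Temporarily failed upload of (\S+): (.+)$")
# Matches e.g. "Backing off 56 min 8 sec on upload of <file>"
BACKOFF_RE = re.compile(r"Backing off (?:(\d+) min )?(\d+) sec on upload of (\S+)")


def summarize_uploads(log_text):
    """Return {filename: {"failures": int, "backoff_sec": int or None}}."""
    summary = defaultdict(lambda: {"failures": 0, "backoff_sec": None})
    for line in log_text.splitlines():
        failed = FAILED_RE.search(line)
        if failed:
            summary[failed.group(1)]["failures"] += 1
            continue
        backoff = BACKOFF_RE.search(line)
        if backoff:
            minutes, seconds, name = backoff.groups()
            # Keep the most recent back-off seen for this file, in seconds.
            summary[name]["backoff_sec"] = int(minutes or 0) * 60 + int(seconds)
    return dict(summary)


if __name__ == "__main__":
    sample = (
        "1/31/2011 10:22:05 PM GPUGRID Temporarily failed upload of "
        "p18-IBUCH_4_mutEGFR_110124-6-20-RND8641_1_0: transient upload error\n"
        "1/31/2011 10:22:05 PM GPUGRID Backing off 56 min 8 sec on upload of "
        "p18-IBUCH_4_mutEGFR_110124-6-20-RND8641_1_0\n"
    )
    for name, info in summarize_uploads(sample).items():
        print(name, info["failures"], info["backoff_sec"])
```

Pasting the full message log into `summarize_uploads` gives one entry per result file, which makes it easier to see which uploads will retry soon and which are backed off for close to an hour.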

GDF
Volunteer moderator
Project administrator
Project developer
Project tester
Volunteer developer
Volunteer tester
Project scientist
Joined: 14 Mar 07
Posts: 1895
Credit: 629,356
RAC: 0
Level
Gly
Message 20317 - Posted: 31 Jan 2011 | 22:02:33 UTC - in response to Message 20316.

Fixed.
gdf

skgiven
Volunteer moderator
Project tester
Volunteer tester
Joined: 23 Apr 09
Posts: 3968
Credit: 1,834,906,524
RAC: 276,970
Level
His
Message 20318 - Posted: 31 Jan 2011 | 22:12:12 UTC - in response to Message 20317.

Thanks, and just to confirm, I have downloaded new tasks and uploaded finished work. Most tasks should report back automatically in a reasonable time, but if you don't have work, do a manual update.

Retvari Zoltan
Joined: 20 Jan 09
Posts: 1844
Credit: 10,645,595,194
RAC: 9,957,536
Level
Trp
Message 20320 - Posted: 31 Jan 2011 | 22:38:46 UTC - in response to Message 20318.

I did a manual update: all 4 tasks were uploaded successfully, and two of them were successfully reported. I received two new WUs. But the other two WUs cannot be reported, and I'm still receiving:

2011.01.31. 23:34:38 GPUGRID Message from server: Server can't open database

Retvari Zoltan
Joined: 20 Jan 09
Posts: 1844
Credit: 10,645,595,194
RAC: 9,957,536
Level
Trp
Message 20321 - Posted: 31 Jan 2011 | 22:47:02 UTC - in response to Message 20320.

It's ok now. The remaining two tasks were successfully reported, and I received two more WUs.

Retvari Zoltan
Joined: 20 Jan 09
Posts: 1844
Credit: 10,645,595,194
RAC: 9,957,536
Level
Trp
Message 20322 - Posted: 31 Jan 2011 | 22:50:32 UTC - in response to Message 20321.

But the error message persists:

2011.01.31. 23:45:49 GPUGRID Message from server: Server can't open database

I don't understand this error message; everything seems to be working fine.
