Message boards : News : Server up again

Profile GDF
Volunteer moderator
Project administrator
Project developer
Project tester
Volunteer developer
Volunteer tester
Project scientist
Joined: 14 Mar 07
Posts: 1914
Credit: 629,356
RAC: 0
Level
Gly
Message 20255 - Posted: 27 Jan 2011 | 10:01:25 UTC

The server is up again and all should get back to normal very quickly.

gdf

Betting Slip
Joined: 5 Jan 09
Posts: 668
Credit: 2,498,095,550
RAC: 1
Level
Phe
Message 20303 - Posted: 31 Jan 2011 | 15:25:53 UTC - in response to Message 20255.

Server down again
____________
Radio Caroline, the world's most famous offshore pirate radio station.
Great music since April 1964. Support Radio Caroline Team -
Radio Caroline

Richard Haselgrove
Joined: 11 Jul 09
Posts: 883
Credit: 1,832,471,945
RAC: 1,228,982
Level
His
Message 20304 - Posted: 31 Jan 2011 | 17:03:23 UTC

I'm getting

GPUGRID 31/01/2011 16:53:28 [error] Error reported by file upload server: can't open file

on all file uploads - apparently after all data has successfully transferred. (task 3631188)

Profile dataman
Joined: 18 Sep 08
Posts: 36
Credit: 100,352,867
RAC: 0
Level
Cys
Message 20308 - Posted: 31 Jan 2011 | 18:21:38 UTC - in response to Message 20304.

Same here. No downloads or uploads. I have 13 uploads stuck in the pipe. It will take a bit to get these sorted. :)

Profile skgiven
Volunteer moderator
Project tester
Volunteer tester
Joined: 23 Apr 09
Posts: 3968
Credit: 1,930,355,360
RAC: 236,297
Level
His
Message 20309 - Posted: 31 Jan 2011 | 18:48:24 UTC - in response to Message 20308.

Yeah, looks like the gpugrid_file_deleter program failed on the server.
It might be to do with changes in task size, a bad batch, or just a random service failure; in other words, I don't actually know.

For the next few hours I'm turning my systems off and doing some dusting ;)

Those of you who can micromanage might want to suspend the pending uploads and keep an eye on the server status page.

Some crunchers might want to hook up to MilkyWay (or another GPU project), keep a low cache (0.01 days) and wait it out.

Richard Haselgrove
Joined: 11 Jul 09
Posts: 883
Credit: 1,832,471,945
RAC: 1,228,982
Level
His
Message 20310 - Posted: 31 Jan 2011 | 18:50:56 UTC - in response to Message 20309.
Last modified: 31 Jan 2011 | 18:52:35 UTC

Would a failed 'deleter' daemon really cause this error? Does their data storage really fill up that quickly? Feels more like a NAS mounting error to me.

[Edit - especially as there are - reportedly - only a few files waiting to be deleted]

Profile skgiven
Volunteer moderator
Project tester
Volunteer tester
Joined: 23 Apr 09
Posts: 3968
Credit: 1,930,355,360
RAC: 236,297
Level
His
Message 20312 - Posted: 31 Jan 2011 | 19:07:24 UTC - in response to Message 20310.

Possibly, but for all I know they have stopped the service.
Yes, data storage is a recurring issue.
It could well be a NAS issue. Good point, because if it is a NAS issue it's probably down to the technicians to fix, not the research team, and that means it could be down until tomorrow morning.

Profile Stoneageman
Joined: 25 May 09
Posts: 211
Credit: 14,164,331,913
RAC: 6,146,814
Level
Trp
Message 20313 - Posted: 31 Jan 2011 | 19:24:50 UTC - in response to Message 20312.

Bags me first in queue when it gets sorted, lol

Hypernova
Joined: 16 Nov 10
Posts: 22
Credit: 24,712,746
RAC: 0
Level
Pro
Message 20316 - Posted: 31 Jan 2011 | 21:28:00 UTC - in response to Message 20312.

Even if the upload/download server is displayed as running, I cannot download or upload anything.
Here are the messages I get:
1/31/2011 10:21:58 PM GPUGRID Started upload of p18-IBUCH_4_mutEGFR_110124-6-20-RND8641_1_0
1/31/2011 10:21:58 PM GPUGRID Started upload of p18-IBUCH_4_mutEGFR_110124-6-20-RND8641_1_1
1/31/2011 10:22:05 PM GPUGRID [error] Error reported by file upload server: can't open file
1/31/2011 10:22:05 PM GPUGRID Temporarily failed upload of p18-IBUCH_4_mutEGFR_110124-6-20-RND8641_1_0: transient upload error
1/31/2011 10:22:05 PM GPUGRID Backing off 56 min 8 sec on upload of p18-IBUCH_4_mutEGFR_110124-6-20-RND8641_1_0
1/31/2011 10:22:05 PM GPUGRID Started upload of p18-IBUCH_4_mutEGFR_110124-6-20-RND8641_1_2
1/31/2011 10:22:15 PM GPUGRID [error] Error reported by file upload server: can't open file
1/31/2011 10:22:15 PM GPUGRID Temporarily failed upload of p18-IBUCH_4_mutEGFR_110124-6-20-RND8641_1_1: transient upload error
1/31/2011 10:22:15 PM GPUGRID Backing off 45 min 37 sec on upload of p18-IBUCH_4_mutEGFR_110124-6-20-RND8641_1_1
1/31/2011 10:22:15 PM GPUGRID Started upload of p18-IBUCH_4_mutEGFR_110124-6-20-RND8641_1_3
1/31/2011 10:22:21 PM GPUGRID [error] Error reported by file upload server: can't open file
1/31/2011 10:22:21 PM GPUGRID [error] Error reported by file upload server: can't open file
1/31/2011 10:22:21 PM GPUGRID Temporarily failed upload of p18-IBUCH_4_mutEGFR_110124-6-20-RND8641_1_2: transient upload error
1/31/2011 10:22:21 PM GPUGRID Backing off 36 min 21 sec on upload of p18-IBUCH_4_mutEGFR_110124-6-20-RND8641_1_2
1/31/2011 10:22:21 PM GPUGRID Temporarily failed upload of p18-IBUCH_4_mutEGFR_110124-6-20-RND8641_1_3: transient upload error
1/31/2011 10:22:21 PM GPUGRID Backing off 3 min 26 sec on upload of p18-IBUCH_4_mutEGFR_110124-6-20-RND8641_1_3
1/31/2011 10:23:22 PM GPUGRID Started upload of p18-IBUCH_4_mutEGFR_110124-6-20-RND8641_1_7
1/31/2011 10:23:22 PM GPUGRID Started upload of p40-IBUCH_1_wtEGFR_110121-7-20-RND6307_0_0
1/31/2011 10:23:23 PM GPUGRID [error] Error reported by file upload server: can't open file
1/31/2011 10:23:23 PM GPUGRID Temporarily failed upload of p18-IBUCH_4_mutEGFR_110124-6-20-RND8641_1_7: transient upload error
1/31/2011 10:23:23 PM GPUGRID Backing off 1 min 0 sec on upload of p18-IBUCH_4_mutEGFR_110124-6-20-RND8641_1_7
1/31/2011 10:23:28 PM GPUGRID [error] Error reported by file upload server: can't open file
1/31/2011 10:23:28 PM GPUGRID Temporarily failed upload of p40-IBUCH_1_wtEGFR_110121-7-20-RND6307_0_0: transient upload error
1/31/2011 10:23:28 PM GPUGRID Backing off 1 min 0 sec on upload of p40-IBUCH_1_wtEGFR_110121-7-20-RND6307_0_0

12 Teraflops going to sleep.....
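
For anyone who wants to see at a glance how long each stuck upload has been deferred, client messages like the ones above can be summarised with a short script. This is only a minimal sketch in Python: the function name `summarize_upload_log` and the regular expression are assumptions built from the message shapes visible in this log ("Backing off N min M sec on upload of FILE"). They are not part of BOINC or GPUGRID, and any other back-off wording the client may print is not handled.

```python
import re

# Assumption: these patterns mirror only the message shapes quoted in this thread.
BACKOFF_RE = re.compile(r"Backing off (?:(\d+) min )?(\d+) sec on upload of (\S+)")
ERROR_TEXT = "Error reported by file upload server: can't open file"


def summarize_upload_log(lines):
    """Return (error_count, {filename: backoff_seconds}) for the given log lines."""
    errors = 0
    backoffs = {}
    for line in lines:
        if ERROR_TEXT in line:
            errors += 1
        match = BACKOFF_RE.search(line)
        if match:
            minutes = int(match.group(1) or 0)
            seconds = int(match.group(2))
            # A later back-off message for the same file overwrites the earlier one.
            backoffs[match.group(3)] = minutes * 60 + seconds
    return errors, backoffs


# Sample lines copied from the messages quoted in this thread.
SAMPLE_LOG = [
    "1/31/2011 10:22:05 PM GPUGRID [error] Error reported by file upload server: can't open file",
    "1/31/2011 10:22:05 PM GPUGRID Temporarily failed upload of p18-IBUCH_4_mutEGFR_110124-6-20-RND8641_1_0: transient upload error",
    "1/31/2011 10:22:05 PM GPUGRID Backing off 56 min 8 sec on upload of p18-IBUCH_4_mutEGFR_110124-6-20-RND8641_1_0",
    "1/31/2011 10:22:15 PM GPUGRID [error] Error reported by file upload server: can't open file",
    "1/31/2011 10:22:15 PM GPUGRID Backing off 45 min 37 sec on upload of p18-IBUCH_4_mutEGFR_110124-6-20-RND8641_1_1",
]

if __name__ == "__main__":
    error_count, backoff_seconds = summarize_upload_log(SAMPLE_LOG)
    print(f"{error_count} upload errors")
    for name, secs in sorted(backoff_seconds.items()):
        print(f"{name}: retry deferred {secs} s")
```

To use it on a real log, paste the client messages into a text file and pass its lines to `summarize_upload_log`. A dictionary keyed by file name keeps only the most recent back-off per file, which is usually the one that matters while waiting for the server to come back.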

Profile GDF
Volunteer moderator
Project administrator
Project developer
Project tester
Volunteer developer
Volunteer tester
Project scientist
Joined: 14 Mar 07
Posts: 1914
Credit: 629,356
RAC: 0
Level
Gly
Message 20317 - Posted: 31 Jan 2011 | 22:02:33 UTC - in response to Message 20316.

Fixed.
gdf

Profile skgiven
Volunteer moderator
Project tester
Volunteer tester
Joined: 23 Apr 09
Posts: 3968
Credit: 1,930,355,360
RAC: 236,297
Level
His
Message 20318 - Posted: 31 Jan 2011 | 22:12:12 UTC - in response to Message 20317.

Thanks, and just to confirm: I have downloaded new tasks and uploaded finished work. Most tasks should report back automatically in a reasonable time, but if you don't have work, do a manual update.

Profile Retvari Zoltan
Joined: 20 Jan 09
Posts: 1965
Credit: 12,929,017,594
RAC: 11,499,090
Level
Trp
Message 20320 - Posted: 31 Jan 2011 | 22:38:46 UTC - in response to Message 20318.

I did a manual update. All 4 tasks were uploaded successfully, and two of them were successfully reported. I received two new WUs. But the other two WUs cannot be reported, and I'm still receiving:

2011.01.31. 23:34:38 GPUGRID Message from server: Server can't open database

Profile Retvari Zoltan
Joined: 20 Jan 09
Posts: 1965
Credit: 12,929,017,594
RAC: 11,499,090
Level
Trp
Message 20321 - Posted: 31 Jan 2011 | 22:47:02 UTC - in response to Message 20320.

It's ok now. The remaining two tasks were successfully reported, and I received two more WUs.

Profile Retvari Zoltan
Joined: 20 Jan 09
Posts: 1965
Credit: 12,929,017,594
RAC: 11,499,090
Level
Trp
Message 20322 - Posted: 31 Jan 2011 | 22:50:32 UTC - in response to Message 20321.

But the error message persists:

2011.01.31. 23:45:49 GPUGRID Message from server: Server can't open database

I don't understand this error message; everything seems to be working fine.
