Message boards : Server and website : http error with HIV workunits

Author Message
Profile Hydropower
Joined: 3 Apr 09
Posts: 70
Credit: 6,003,024
RAC: 0
Level
Ser
Message 11292 - Posted: 24 Jul 2009 | 19:49:47 UTC
Last modified: 24 Jul 2009 | 19:53:20 UTC

I have had two issues today where an HIV workunit (635688 and 582818) stopped downloading with an HTTP error, on all files in the package as far as I can tell. Even after letting it run its course, it did not download, and eventually I had to cancel the workunits to keep going. Is this a workunit-related issue? The connection to the server was fine: other packages right before and right after it downloaded without problems. What is the best course of action in cases like these?

Profile Hydropower
Joined: 3 Apr 09
Posts: 70
Credit: 6,003,024
RAC: 0
Level
Ser
Message 11305 - Posted: 25 Jul 2009 | 10:27:15 UTC - in response to Message 11292.

Two more cases today. I have noticed that the issue seems to be caused by THREE download threads being started simultaneously, whereas normally only TWO threads are allowed. I hope this provides some insight into the issue.

Mark Henderson
Joined: 21 Dec 08
Posts: 51
Credit: 26,320,167
RAC: 0
Level
Val
Message 11322 - Posted: 26 Jul 2009 | 6:23:07 UTC

I just had to abort the transfers on 2 HIV workunits. They stalled in download with an HTTP error.

7/26/2009 2:16:57 AM GPUGRID Temporarily failed download of 9-KASHIF_HIVPR_dim_ba5-22-9-KASHIF_HIVPR_dim_ba5-21-100-RND0604_2: HTTP error

7/26/2009 2:17:58 AM GPUGRID Temporarily failed download of 9-KASHIF_HIVPR_dim_ba5-22-9-KASHIF_HIVPR_dim_ba5-21-100-RND0604_1: HTTP error

Profile [AF>HFR>RR] Jim PROFIT
Joined: 3 Jun 07
Posts: 107
Credit: 30,296,137
RAC: 0
Level
Val
Message 11326 - Posted: 26 Jul 2009 | 8:34:37 UTC

Same for me today.

Profile MarkJ
Volunteer moderator
Project tester
Volunteer tester
Joined: 24 Dec 08
Posts: 730
Credit: 196,835,345
RAC: 536
Level
Ile
Message 11327 - Posted: 26 Jul 2009 | 9:03:23 UTC

And you can add me to the list. As with the other guys, it's been doing this for the last couple of days.

26/07/2009 6:55:30 PM GPUGRID [error] File 76-KASHIF_HIVPR_dim_ba5-24-76-KASHIF_HIVPR_dim_ba5-23-100-RND8018_1 has wrong size: expected 1210492, got 0
26/07/2009 6:55:30 PM GPUGRID Started download of 76-KASHIF_HIVPR_dim_ba5-24-76-KASHIF_HIVPR_dim_ba5-23-100-RND8018_1
26/07/2009 6:55:30 PM GPUGRID [error] File 76-KASHIF_HIVPR_dim_ba5-24-76-KASHIF_HIVPR_dim_ba5-23-100-RND8018_2 has wrong size: expected 1210492, got 0
26/07/2009 6:55:30 PM GPUGRID Started download of 76-KASHIF_HIVPR_dim_ba5-24-76-KASHIF_HIVPR_dim_ba5-23-100-RND8018_2
26/07/2009 6:55:31 PM GPUGRID Temporarily failed download of 76-KASHIF_HIVPR_dim_ba5-24-76-KASHIF_HIVPR_dim_ba5-23-100-RND8018_1: HTTP error
26/07/2009 6:55:31 PM GPUGRID Backing off 1 min 0 sec on download of 76-KASHIF_HIVPR_dim_ba5-24-76-KASHIF_HIVPR_dim_ba5-23-100-RND8018_1
26/07/2009 6:55:31 PM GPUGRID Temporarily failed download of 76-KASHIF_HIVPR_dim_ba5-24-76-KASHIF_HIVPR_dim_ba5-23-100-RND8018_2: HTTP error
26/07/2009 6:55:31 PM GPUGRID Backing off 1 min 0 sec on download of 76-KASHIF_HIVPR_dim_ba5-24-76-KASHIF_HIVPR_dim_ba5-23-100-RND8018_2
26/07/2009 6:55:31 PM GPUGRID [error] File 76-KASHIF_HIVPR_dim_ba5-24-76-KASHIF_HIVPR_dim_ba5-23-100-RND8018_3 has wrong size: expected 410624, got 0
26/07/2009 6:55:31 PM GPUGRID Started download of 76-KASHIF_HIVPR_dim_ba5-24-76-KASHIF_HIVPR_dim_ba5-23-100-RND8018_3
26/07/2009 6:55:31 PM GPUGRID [error] File 76-KASHIF_HIVPR_dim_ba5-24-pdb_file has wrong size: expected 3442503, got 0
26/07/2009 6:55:31 PM GPUGRID Started download of 76-KASHIF_HIVPR_dim_ba5-24-pdb_file
26/07/2009 6:55:32 PM GPUGRID Temporarily failed download of 76-KASHIF_HIVPR_dim_ba5-24-76-KASHIF_HIVPR_dim_ba5-23-100-RND8018_3: HTTP error
26/07/2009 6:55:32 PM GPUGRID Backing off 1 min 0 sec on download of 76-KASHIF_HIVPR_dim_ba5-24-76-KASHIF_HIVPR_dim_ba5-23-100-RND8018_3
26/07/2009 6:55:32 PM GPUGRID Temporarily failed download of 76-KASHIF_HIVPR_dim_ba5-24-pdb_file: HTTP error
26/07/2009 6:55:32 PM GPUGRID Backing off 1 min 0 sec on download of 76-KASHIF_HIVPR_dim_ba5-24-pdb_file
26/07/2009 6:55:32 PM GPUGRID [error] File 76-KASHIF_HIVPR_dim_ba5-24-par_file has wrong size: expected 8402771, got 0
26/07/2009 6:55:32 PM GPUGRID Started download of 76-KASHIF_HIVPR_dim_ba5-24-par_file
26/07/2009 6:55:33 PM GPUGRID Temporarily failed download of 76-KASHIF_HIVPR_dim_ba5-24-par_file: HTTP error
26/07/2009 6:55:33 PM GPUGRID Backing off 1 min 0 sec on download of 76-KASHIF_HIVPR_dim_ba5-24-par_file
26/07/2009 6:55:33 PM GPUGRID [error] File 76-KASHIF_HIVPR_dim_ba5-24-myfile.enc has wrong size: expected 872, got 0
26/07/2009 6:55:33 PM GPUGRID Started download of 76-KASHIF_HIVPR_dim_ba5-24-myfile.enc
26/07/2009 6:55:34 PM GPUGRID Temporarily failed download of 76-KASHIF_HIVPR_dim_ba5-24-myfile.enc: HTTP error
26/07/2009 6:55:34 PM GPUGRID Backing off 1 min 0 sec on download of 76-KASHIF_HIVPR_dim_ba5-24-myfile.enc
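
For anyone who wants to tally these failures from their own message log, here is a minimal sketch. It assumes only the line shape visible in the logs posted in this thread ("Temporarily failed download of <file>: <reason>"); the function name `count_failed_downloads` is made up for illustration and is not part of BOINC.

```python
import re
from collections import Counter

# Assumed line shape, taken from the logs in this thread:
#   "<timestamp> GPUGRID Temporarily failed download of <file>: HTTP error"
FAILED = re.compile(r"Temporarily failed download of (?P<name>\S+): (?P<reason>.+)$")


def count_failed_downloads(lines):
    """Return a Counter mapping file name -> number of failed download attempts."""
    counts = Counter()
    for line in lines:
        match = FAILED.search(line.rstrip("\n"))
        if match:
            counts[match.group("name")] += 1
    return counts


if __name__ == "__main__":
    sample = [
        "26/07/2009 6:55:31 PM GPUGRID Temporarily failed download of "
        "76-KASHIF_HIVPR_dim_ba5-24-pdb_file: HTTP error",
        "26/07/2009 6:55:31 PM GPUGRID Backing off 1 min 0 sec on download of "
        "76-KASHIF_HIVPR_dim_ba5-24-pdb_file",
    ]
    # Print the most frequently failing files first.
    for name, attempts in count_failed_downloads(sample).most_common():
        print(attempts, name)
```

Files that keep reappearing with a high count are the ones worth aborting by hand.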

____________
BOINC blog

Toni
Volunteer moderator
Project administrator
Project developer
Project scientist
Joined: 9 Dec 08
Posts: 748
Credit: 4,285,282
RAC: 0
Level
Ala
Message 11329 - Posted: 26 Jul 2009 | 10:08:54 UTC - in response to Message 11327.
Last modified: 26 Jul 2009 | 10:56:39 UTC

We stopped some HIV WUs two days ago, but they left behind remnants. Please abort them at will.

Profile Hydropower
Joined: 3 Apr 09
Posts: 70
Credit: 6,003,024
RAC: 0
Level
Ser
Message 11344 - Posted: 26 Jul 2009 | 17:26:26 UTC - in response to Message 11329.

Thanks. I'll have to because I'm being bombarded with them now.. :( and they block processing.

Toni
Volunteer moderator
Project administrator
Project developer
Project scientist
Joined: 9 Dec 08
Posts: 748
Credit: 4,285,282
RAC: 0
Level
Ala
Message 11346 - Posted: 26 Jul 2009 | 18:20:59 UTC - in response to Message 11344.

We'll try to cancel them server-side asap, thanks for your patience.

Profile Hydropower
Joined: 3 Apr 09
Posts: 70
Credit: 6,003,024
RAC: 0
Level
Ser
Message 11347 - Posted: 26 Jul 2009 | 18:34:49 UTC - in response to Message 11346.

Thanks for that, much appreciated.

Ross*
Joined: 6 May 09
Posts: 34
Credit: 442,860,201
RAC: 0
Level
Gln
Message 11363 - Posted: 27 Jul 2009 | 8:55:38 UTC - in response to Message 11346.

We'll try to cancel them server-side asap, thanks for your patience.

I have also had 20 WUs with errors: errors in download, stuck in mid-download, errors in computation, etc.
A typical example is below:
27/07/2009 10:44:29 a.m. GPUGRID Finished download of 91-KASHIF_HIVPR_sub_so_ba1-5-91-KASHIF_HIVPR_sub_so_ba1-4-100-RND7343_2
27/07/2009 10:44:29 a.m. GPUGRID Started download of 91-KASHIF_HIVPR_sub_so_ba1-5-91-KASHIF_HIVPR_sub_so_ba1-4-100-RND7343_3
27/07/2009 10:45:17 a.m. GPUGRID Finished download of 91-KASHIF_HIVPR_sub_so_ba1-5-91-KASHIF_HIVPR_sub_so_ba1-4-100-RND7343_3
27/07/2009 10:45:17 a.m. GPUGRID Started download of 91-KASHIF_HIVPR_sub_so_ba1-5-pdb_file
27/07/2009 10:46:44 a.m. GPUGRID Finished download of 77-GIANNI_BINDX119-29-par_file
27/07/2009 10:46:44 a.m. GPUGRID Started download of 91-KASHIF_HIVPR_sub_so_ba1-5-psf_file
27/07/2009 10:46:44 a.m. GPUGRID [error] MD5 check failed for 77-GIANNI_BINDX119-29-par_file
27/07/2009 10:46:44 a.m. GPUGRID [error] expected c2605a4451ad8240f29215f84cb6de7e, got d8298542b27b3e9c7a3396c23444223c
27/07/2009 10:46:44 a.m. GPUGRID [error] Checksum or signature error for 77-GIANNI_BINDX119-29-par_file
Plus other strange behaviour.
Is it all sorted out now?
Ross
____________
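
The "MD5 check failed" lines above mean the client hashed the downloaded file and got a digest different from the one the server advertised. To repeat that comparison by hand on a file in your project directory, a minimal sketch using Python's standard `hashlib` follows. The helper names `md5_of_file` and `matches_expected` are hypothetical, and this is not the BOINC client's own code.

```python
import hashlib


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of the file at `path`, read in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        # Read fixed-size chunks so large input files are never held in memory whole.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_expected(path, expected_md5):
    """True if the file's MD5 equals the expected hex digest (case-insensitive)."""
    return md5_of_file(path) == expected_md5.strip().lower()
```

A mismatch on a file of the right size usually points at a corrupted or substituted download rather than a client problem, which would be consistent with the log showing a different file name finishing than the one that was started.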

zpm
Joined: 2 Mar 09
Posts: 159
Credit: 13,639,818
RAC: 0
Level
Pro
Message 11364 - Posted: 27 Jul 2009 | 9:24:37 UTC - in response to Message 11363.
Last modified: 27 Jul 2009 | 9:40:16 UTC

7/27/2009 5:21:29 AM GPUGRID [error] File 35-KASHIF_HIVPR_dim_ba3-26-35-KASHIF_HIVPR_dim_ba3-25-100-RND7138_1 has wrong size: expected 1210492, got 0
7/27/2009 5:21:29 AM GPUGRID Started download of 35-KASHIF_HIVPR_dim_ba3-26-35-KASHIF_HIVPR_dim_ba3-25-100-RND7138_1
7/27/2009 5:21:29 AM GPUGRID [error] File 35-KASHIF_HIVPR_dim_ba3-26-35-KASHIF_HIVPR_dim_ba3-25-100-RND7138_2 has wrong size: expected 1210492, got 0


I know you guys are working on it... and I'm about to try to get new work. I knew something was wrong with my BOINC, as my stomach got me up 5 hrs early.

As of 5:30 am EST it's still not downloading, and my stomach is being fed: Pizza Hut lasagna. Hey, 10 minutes later, 1 task got through. I guess GPUGRID got jealous of SETI@home.

Profile [AF>HFR>RR] Jim PROFIT
Joined: 3 Jun 07
Posts: 107
Credit: 30,296,137
RAC: 0
Level
Val
Message 11367 - Posted: 27 Jul 2009 | 10:08:10 UTC

And again this morning, but just on one host! Always the same.
Can't monitor all the time.

Please do something.

Toni
Volunteer moderator
Project administrator
Project developer
Project scientist
Joined: 9 Dec 08
Posts: 748
Credit: 4,285,282
RAC: 0
Level
Ala
Message 11370 - Posted: 27 Jul 2009 | 10:31:59 UTC - in response to Message 11367.
Last modified: 27 Jul 2009 | 10:36:38 UTC

We cancelled the faulty WUs. Hopefully the change propagates fast to your clients.

BTW, if anyone still has faulty downloads, or notices that the faulty downloads were removed without intervention, can you please report here, so that we can figure out how quickly clients get informed of such things? Thanks

zpm
Joined: 2 Mar 09
Posts: 159
Credit: 13,639,818
RAC: 0
Level
Pro
Message 11371 - Posted: 27 Jul 2009 | 10:36:32 UTC - in response to Message 11370.

yep, change seems to have fixed the servers...

Ross*
Joined: 6 May 09
Posts: 34
Credit: 442,860,201
RAC: 0
Level
Gln
Message 11373 - Posted: 27 Jul 2009 | 11:05:58 UTC - in response to Message 11370.

I have 2 WUs almost completed.
The test will be when they are replaced.
Hopefully the GeForce will conquer.
Cheers
Ross
____________

Profile Beyond
Joined: 23 Nov 08
Posts: 1073
Credit: 4,627,401,504
RAC: 388,497
Level
Arg
Message 11415 - Posted: 28 Jul 2009 | 5:45:42 UTC

This would have to happen while I'm on vacation. Just got home to find 2 GPUs stuck on these bad WUs :-(

Toni
Volunteer moderator
Project administrator
Project developer
Project scientist
Joined: 9 Dec 08
Posts: 748
Credit: 4,285,282
RAC: 0
Level
Ala
Message 11444 - Posted: 28 Jul 2009 | 17:00:08 UTC - in response to Message 11415.

Uhm.. that means that the clients do not really obey cancellation requests...

Profile Beyond
Joined: 23 Nov 08
Posts: 1073
Credit: 4,627,401,504
RAC: 388,497
Level
Arg
Message 11446 - Posted: 28 Jul 2009 | 18:08:39 UTC
Last modified: 28 Jul 2009 | 18:10:33 UTC

I had to cancel them both manually. They probably didn't cancel because they were stuck with download errors. It's way worse than a normally bad WU though because they took the GPUs out of action until I got home to intervene.

fractal
Joined: 16 Aug 08
Posts: 87
Credit: 989,788,876
RAC: 1,199,586
Level
Glu
Message 11447 - Posted: 28 Jul 2009 | 19:02:26 UTC - in response to Message 11446.
Last modified: 28 Jul 2009 | 19:20:25 UTC

Ok, from my log files


7/26/2009 12:03:03 AM|GPUGRID|Sending scheduler request: To report completed tasks. Requesting 82299 seconds of work, reporting 3 completed tasks
7/26/2009 12:03:08 AM|GPUGRID|Scheduler request completed: got 1 new tasks
7/26/2009 12:03:10 AM|GPUGRID|Started download of 3-KASHIF_HIVPR_dim_ba5-25-LICENSE
7/26/2009 12:03:10 AM|GPUGRID|Started download of 3-KASHIF_HIVPR_dim_ba5-25-COPYRIGHT
7/26/2009 12:03:11 AM|GPUGRID|Temporarily failed download of 3-KASHIF_HIVPR_dim_ba5-25-LICENSE: HTTP error
7/26/2009 12:03:11 AM|GPUGRID|Backing off 1 min 0 sec on download of 3-KASHIF_HIVPR_dim_ba5-25-LICENSE
7/26/2009 12:03:11 AM|GPUGRID|Temporarily failed download of 3-KASHIF_HIVPR_dim_ba5-25-COPYRIGHT: HTTP error
7/26/2009 12:03:11 AM|GPUGRID|Backing off 1 min 0 sec on download of 3-KASHIF_HIVPR_dim_ba5-25-COPYRIGHT
7/26/2009 12:03:11 AM|GPUGRID|Started download of 3-KASHIF_HIVPR_dim_ba5-25-3-KASHIF_HIVPR_dim_ba5-24-100-RND4733_1
7/26/2009 12:03:11 AM|GPUGRID|Started download of 3-KASHIF_HIVPR_dim_ba5-25-3-KASHIF_HIVPR_dim_ba5-24-100-RND4733_2
7/26/2009 12:03:13 AM|GPUGRID|Temporarily failed download of 3-KASHIF_HIVPR_dim_ba5-25-3-KASHIF_HIVPR_dim_ba5-24-100-RND4733_1: HTTP error
7/26/2009 12:03:13 AM|GPUGRID|Backing off 1 min 0 sec on download of 3-KASHIF_HIVPR_dim_ba5-25-3-KASHIF_HIVPR_dim_ba5-24-100-RND4733_1
7/26/2009 12:03:13 AM|GPUGRID|Temporarily failed download of 3-KASHIF_HIVPR_dim_ba5-25-3-KASHIF_HIVPR_dim_ba5-24-100-RND4733_2: HTTP error
7/26/2009 12:03:13 AM|GPUGRID|Backing off 1 min 0 sec on download of 3-KASHIF_HIVPR_dim_ba5-25-3-KASHIF_HIVPR_dim_ba5-24-100-RND4733_2
7/26/2009 12:03:13 AM|GPUGRID|Started download of 3-KASHIF_HIVPR_dim_ba5-25-3-KASHIF_HIVPR_dim_ba5-24-100-RND4733_3
7/26/2009 12:03:13 AM|GPUGRID|Started download of 3-KASHIF_HIVPR_dim_ba5-25-pdb_file
7/26/2009 12:03:14 AM|GPUGRID|Temporarily failed download of 3-KASHIF_HIVPR_dim_ba5-25-3-KASHIF_HIVPR_dim_ba5-24-100-RND4733_3: HTTP error
7/26/2009 12:03:14 AM|GPUGRID|Backing off 1 min 0 sec on download of 3-KASHIF_HIVPR_dim_ba5-25-3-KASHIF_HIVPR_dim_ba5-24-100-RND4733_3
7/26/2009 12:03:14 AM|GPUGRID|Temporarily failed download of 3-KASHIF_HIVPR_dim_ba5-25-pdb_file: HTTP error
7/26/2009 12:03:14 AM|GPUGRID|Backing off 1 min 0 sec on download of 3-KASHIF_HIVPR_dim_ba5-25-pdb_file
7/26/2009 12:03:14 AM|GPUGRID|Started download of 3-KASHIF_HIVPR_dim_ba5-25-psf_file
7/26/2009 12:03:14 AM|GPUGRID|Started download of 3-KASHIF_HIVPR_dim_ba5-25-par_file
7/26/2009 12:03:14 AM|Docking@Home|Sending scheduler request: To fetch work. Requesting 120956 seconds of work, reporting 0 completed tasks
7/26/2009 12:03:15 AM|GPUGRID|Temporarily failed download of 3-KASHIF_HIVPR_dim_ba5-25-psf_file: HTTP error
7/26/2009 12:03:15 AM|GPUGRID|Backing off 1 min 0 sec on download of 3-KASHIF_HIVPR_dim_ba5-25-psf_file
7/26/2009 12:03:15 AM|GPUGRID|Started download of 3-KASHIF_HIVPR_dim_ba5-25-myfile.enc
7/26/2009 12:03:17 AM|GPUGRID|Temporarily failed download of 3-KASHIF_HIVPR_dim_ba5-25-myfile.enc: HTTP error
7/26/2009 12:03:17 AM|GPUGRID|Backing off 1 min 0 sec on download of 3-KASHIF_HIVPR_dim_ba5-25-myfile.enc
7/26/2009 12:03:18 AM|GPUGRID|Temporarily failed download of 3-KASHIF_HIVPR_dim_ba5-25-par_file: HTTP error
7/26/2009 12:03:18 AM|GPUGRID|Backing off 1 min 0 sec on download of 3-KASHIF_HIVPR_dim_ba5-25-par_file


My logfile is now full of messages like


7/28/2009 12:06:37 PM|GPUGRID|Started download of 3-KASHIF_HIVPR_dim_ba5-25-LICENSE
7/28/2009 12:06:38 PM|GPUGRID|Temporarily failed download of 3-KASHIF_HIVPR_dim_ba5-25-LICENSE: HTTP error
7/28/2009 12:06:38 PM|GPUGRID|Backing off 1 hr 22 min 54 sec on download of 3-KASHIF_HIVPR_dim_ba5-25-LICENSE


taken just now.
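
The jump from "1 min 0 sec" to "1 hr 22 min 54 sec" is what a randomised exponential backoff looks like after many consecutive failures. The sketch below is a generic illustration of that idea, not BOINC's actual algorithm: the 60-second base, the 4-hour cap, and the function names are all assumptions picked for the example.

```python
import random


def backoff_seconds(failures, base=60.0, cap=4 * 3600.0, rng=random):
    """Delay before the next retry after `failures` consecutive failures (>= 1).

    The upper bound doubles with each failure until it reaches `cap`; the
    actual delay is drawn at random between `base` and that bound so that
    many clients do not all retry at the same moment.
    """
    ceiling = min(cap, base * 2 ** (failures - 1))
    return rng.uniform(base, ceiling)


def format_delay(seconds):
    """Format a delay in the style used by the log lines in this thread."""
    seconds = int(seconds)
    hours, remainder = divmod(seconds, 3600)
    minutes, secs = divmod(remainder, 60)
    if hours:
        return f"{hours} hr {minutes} min {secs} sec"
    return f"{minutes} min {secs} sec"


if __name__ == "__main__":
    for attempt in range(1, 9):
        print(attempt, format_delay(backoff_seconds(attempt)))
```

With a scheme like this, a transfer that can never succeed just keeps retrying at longer intervals, which matches what the log shows two days after the first failure.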

I went in to abort the work units, but they were not on the Tasks page, so I manually aborted the transfers. That seemed to clean things up.

Interestingly, my quad core linux box got

07/28/09 12:21:48|GPUGRID|Sending scheduler request: To fetch work. Requesting 222626 seconds of work, reporting 0 completed tasks
07/28/09 12:21:58|GPUGRID|Scheduler request completed: got 5 new tasks
07/28/09 12:22:00|GPUGRID|Started download of acemd_6.66_x86_64-pc-linux-gnu__cuda
07/28/09 12:22:00|GPUGRID|Started download of libcufft.so.2.1
07/28/09 12:22:48|GPUGRID|Finished download of libcufft.so.2.1
07/28/09 12:22:48|GPUGRID|Started download of libcudart.so.2.1


5 tasks for one GPU? Come on, it was bad enough when it gave me 4, but 5??
____________

Profile Stoneageman
Joined: 25 May 09
Posts: 211
Credit: 14,154,874,863
RAC: 5,906,883
Level
Trp
Message 11491 - Posted: 30 Jul 2009 | 6:44:37 UTC

Yet more failed downloads, starting from 29th 22:22hrs. Really is a pita! That's using client 6.6.36

Profile GDF
Volunteer moderator
Project administrator
Project developer
Project tester
Volunteer developer
Volunteer tester
Project scientist
Joined: 14 Mar 07
Posts: 1914
Credit: 629,356
RAC: 0
Level
Gly
Message 11492 - Posted: 30 Jul 2009 | 7:46:15 UTC - in response to Message 11491.

There might be some WUs which survived the server-side cancellation.
gdf
