Message boards : Number crunching : have a lot of stuck tasks, abort some?

JStateson
Joined: 31 Oct 08
Posts: 164
Credit: 2,798,610,332
RAC: 1,394,838
Message 53140 - Posted: 27 Nov 2019 | 4:34:55 UTC
Last modified: 27 Nov 2019 | 5:05:38 UTC

I have never seen this many stuck tasks before, other than when my router was turned off. My other systems are running fine, even those with GPUGRID tasks. It looks like all tasks completed without error but cannot upload. A restart of BOINC did not help.

GPUGRID initial_1344-ELISA_GSN4V1-8-100-RND6294_0_1 15.093 3817.50 K 00:23:01 - 16:48:28 0.00 Kbps Upload pending (Retry in: 02:31:01), retried: 8 JYSArea51
GPUGRID initial_1344-ELISA_GSN4V1-8-100-RND6294_0_2 20.116 3817.50 K 00:26:10 0.71 Kbps Uploading JYSArea51
GPUGRID initial_1344-ELISA_GSN4V1-8-100-RND6294_0_9 0.850 67761.89 K 00:22:58 - 17:47:06 0.00 Kbps Upload pending (Retry in: 03:29:39), retried: 8 JYSArea51
GPUGRID initial_1381-ELISA_GSN0V1-9-100-RND4251_0_1 1.683 3816.54 K 00:02:34 0.00 Kbps Upload pending, retried: 1 JYSArea51
GPUGRID initial_1381-ELISA_GSN0V1-9-100-RND4251_0_2 1.683 3816.54 K 00:02:34 0.00 Kbps Upload pending, retried: 1 JYSArea51
GPUGRID initial_1381-ELISA_GSN0V1-9-100-RND4251_0_9 0.094 68042.38 K 00:02:36 0.00 Kbps Upload pending, retried: 1 JYSArea51
GPUGRID initial_1512-ELISA_GSN0V1-6-100-RND5965_0_1 3.359 3816.54 K 00:05:13 0.00 Kbps Upload pending, retried: 2 JYSArea51
GPUGRID initial_1512-ELISA_GSN0V1-6-100-RND5965_0_2 3.359 3816.54 K 00:05:12 0.00 Kbps Upload pending, retried: 2 JYSArea51
GPUGRID initial_1512-ELISA_GSN0V1-6-100-RND5965_0_9 0.094 67987.78 K 00:02:36 0.00 Kbps Upload pending, retried: 1 JYSArea51
GPUGRID initial_1719-ELISA_GSN0V1-5-100-RND4368_0_1 1.683 3816.54 K 00:02:37 0.00 Kbps Upload pending, retried: 1 JYSArea51
GPUGRID initial_1719-ELISA_GSN0V1-5-100-RND4368_0_2 1.683 3816.54 K 00:02:35 0.00 Kbps Upload pending, retried: 1 JYSArea51
GPUGRID initial_1719-ELISA_GSN0V1-5-100-RND4368_0_9 0.095 67536.57 K 00:02:36 0.00 Kbps Upload pending, retried: 1 JYSArea51
GPUGRID test265-TONI_GSNTEST3-11-100-RND0660_0_1 1.682 3817.50 K 00:02:36 0.00 Kbps Upload pending, retried: 1 JYSArea51
GPUGRID test265-TONI_GSNTEST3-11-100-RND0660_0_2 1.682 3817.50 K 00:02:34 0.00 Kbps Upload pending, retried: 1 JYSArea51
GPUGRID test265-TONI_GSNTEST3-11-100-RND0660_0_9 0.094 68081.72 K 00:02:36 0.00 Kbps Upload pending, retried: 1 JYSArea51
GPUGRID test360-TONI_GSNTEST3-6-100-RND5366_0_1 18.440 3817.50 K 00:23:32 0.71 Kbps Uploading JYSArea51
GPUGRID test360-TONI_GSNTEST3-6-100-RND5366_0_2 10.064 3817.50 K 00:15:35 0.00 Kbps Upload pending, retried: 6 JYSArea51
GPUGRID test360-TONI_GSNTEST3-6-100-RND5366_0_9 0.470 68081.16 K 00:12:57 0.00 Kbps Upload pending, retried: 5 JYSArea51


[EDIT] A reboot of Windows got things going. I suspect the first 67 MB files caused a problem, which was compounded by subsequent ones of the same size all trying to upload concurrently. I need to figure out a way to stop this. I have three 1070 Ti boards, but the network can't seem to handle the large files when all tasks finish at nearly the same time.

In other news, I got my first Linux cuda100 task. It is running on a GTX 1660 Ti.

captainjack
Joined: 9 May 13
Posts: 160
Credit: 1,221,230,882
RAC: 16,853
Message 53155 - Posted: 27 Nov 2019 | 13:49:14 UTC - in response to Message 53140.

JStateson wrote:

I suspect the first 67 MB files caused a problem, which was compounded by subsequent ones of the same size all trying to upload concurrently. I need to figure out a way to stop this.


Just a thought: in the cc_config.xml file there is an option,
<max_file_xfers_per_project>N</max_file_xfers_per_project>
Maybe that would help.

klepel
Joined: 23 Dec 09
Posts: 178
Credit: 3,040,852,052
RAC: 1,532,010
Message 53157 - Posted: 27 Nov 2019 | 15:32:13 UTC

I am not alone! You are not alone!

I also observe slow uploads of finished WUs on all my computers. BOINC reports about 6 KBps.

I have particular problems with this computer (http://www.gpugrid.net/show_host_detail.php?hostid=512293): uploads stall for hours! Yes, the computer also has some climateprediction.net files to upload, but other projects have no problems uploading and downloading at faster speeds.

It reminds me of the bandwidth problems GRIDCOIN had with their IT department (not giving sufficient bandwidth to GPUGRID) years ago. Could somebody from the project look into it? Maybe make the WUs longer, so the server does not get hammered by so many computers at the same time?

Keith Myers
Joined: 13 Dec 17
Posts: 500
Credit: 479,234,119
RAC: 1,520,642
Message 53160 - Posted: 27 Nov 2019 | 16:16:43 UTC

Since I run multiple projects on the same hosts, I need to provide sufficient network communication threads for all the uploads/downloads.

It does not help that I have asymmetrical upload/download speeds because of ADSL2. My 1 Mbps upload link is not big enough to handle the large result files from GPUGrid without some strain. I am not having any issues uploading, though, as long as the project servers are accepting connections.

I know that they will take at least half an hour to upload. I use these parameters in cc_config.xml:

<max_file_xfers>16</max_file_xfers>
<max_file_xfers_per_project>8</max_file_xfers_per_project>
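
As a rough check on the half-hour figure, here is a back-of-the-envelope Python sketch of the minimum upload time for one task's result files on a 1 Mbps link. The file sizes are taken from the transfer list in the first post. The assumptions, none of which are confirmed in this thread, are that "K" means 1000 bytes, that the link runs at its full rated speed, and that there is no protocol overhead or retry. The function name `upload_seconds` is just for illustration.

```python
def upload_seconds(file_sizes_kb, link_mbps):
    """Best-case transfer time in seconds.

    Assumes 1 K = 1000 bytes and 1 Mbps = 1,000,000 bits/s, and ignores
    protocol overhead, retries, and any other traffic on the link.
    """
    total_bits = sum(file_sizes_kb) * 1000 * 8
    return total_bits / (link_mbps * 1_000_000)


if __name__ == "__main__":
    # Sizes (in K) of the three larger result files of one task,
    # as listed in the first post of this thread.
    task_files_kb = [3817.50, 3817.50, 68081.16]
    seconds = upload_seconds(task_files_kb, link_mbps=1.0)
    print(f"best case: {seconds:.0f} s ({seconds / 60:.1f} min)")
```

Under those assumptions this comes to roughly ten minutes per task as a best case, so a few tasks finishing together, or anything else sharing the link, could plausibly push the total past half an hour.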

klepel
Joined: 23 Dec 09
Posts: 178
Credit: 3,040,852,052
RAC: 1,532,010
Message 53162 - Posted: 27 Nov 2019 | 18:19:26 UTC - in response to Message 53160.

I use these parameters in cc_config.xml
<max_file_xfers>16</max_file_xfers>
<max_file_xfers_per_project>8</max_file_xfers_per_project>


So the cc_config.xml would look like:
<cc_config>
  <options>
    <max_file_xfers>16</max_file_xfers>
    <max_file_xfers_per_project>8</max_file_xfers_per_project>
  </options>
</cc_config>
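
For anyone who would rather script this than edit the file by hand, here is a minimal Python sketch that produces the same layout. The function names `build_cc_config` and `parse_limits` are made up for illustration and are not part of BOINC. The sketch only builds the text and reads it back; where the file should be saved depends on your BOINC data directory, so that step is left out.

```python
import xml.etree.ElementTree as ET


def build_cc_config(max_file_xfers: int, max_file_xfers_per_project: int) -> str:
    """Return cc_config.xml text containing only the two transfer limits."""
    for value in (max_file_xfers, max_file_xfers_per_project):
        if not isinstance(value, int) or value < 1:
            raise ValueError("transfer limits must be positive integers")
    lines = [
        "<cc_config>",
        "  <options>",
        f"    <max_file_xfers>{max_file_xfers}</max_file_xfers>",
        f"    <max_file_xfers_per_project>{max_file_xfers_per_project}"
        "</max_file_xfers_per_project>",
        "  </options>",
        "</cc_config>",
    ]
    return "\n".join(lines) + "\n"


def parse_limits(xml_text: str) -> dict:
    """Read the limits back, as a sanity check that the text is well-formed."""
    options = ET.fromstring(xml_text).find("options")
    return {child.tag: int(child.text) for child in options}


if __name__ == "__main__":
    text = build_cc_config(16, 8)
    print(text, end="")
    print(parse_limits(text))
```

Smaller values, for example one transfer per project, should keep several large result files from uploading at the same time, which is what the original post was after. The client has to re-read its config files, or be restarted, before a change takes effect.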

JStateson
Joined: 31 Oct 08
Posts: 164
Credit: 2,798,610,332
RAC: 1,394,838
Message 53164 - Posted: 27 Nov 2019 | 20:01:58 UTC

All tasks finally uploaded after a reboot.

I need to configure the maximum allowable number of transfers.

Thanks!

Dingo
Joined: 1 Nov 07
Posts: 20
Credit: 74,630,803
RAC: 357,625
Message 53245 - Posted: 1 Dec 2019 | 10:39:07 UTC

I have a few tasks that have not uploaded for a while, and all show "Upload pending (Project backoff)". Do I just let them sit there and wait until they upload? I have tried stopping and starting BOINC, but that did not fix it.

GPUGRID initial_1687-ELISA_GSN0V1-8-100-RND7000_0_0 1.054 10.65 K 00:00:20 - 15:10:42 0.00 Kbps Upload pending (Project backoff: 00:27:19) Rack-01
GPUGRID initial_1687-ELISA_GSN0V1-8-100-RND7000_0_1 0.003 3816.54 K 00:00:18 - 14:50:55 0.00 Kbps Upload pending (Project backoff: 00:27:19) Rack-01
GPUGRID initial_1687-ELISA_GSN0V1-8-100-RND7000_0_2 0.003 3816.54 K 00:00:11 - 12:41:05 0.00 Kbps Upload pending (Project backoff: 00:27:19) Rack-01
GPUGRID initial_1687-ELISA_GSN0V1-8-100-RND7000_0_9 0.000 68065.03 K 00:00:07 - 12:29:29 0.00 Kbps Upload pending (Project backoff: 00:27:19) Rack-01
GPUGRID initial_1687-ELISA_GSN0V1-8-100-RND7000_0_10 100.000 0.27 K 00:00:39 - 12:37:38 0.00 Kbps Upload pending (Project backoff: 00:27:19) Rack-01
GPUGRID initial_1440-ELISA_GSN0V1-9-100-RND9376_0_0 1.057 10.62 K 00:00:42 - 12:39:17 0.00 Kbps Upload pending (Project backoff: 00:27:19) Rack-01
GPUGRID initial_1440-ELISA_GSN0V1-9-100-RND9376_0_1 0.003 3816.54 K 00:00:22 - 12:24:37 0.00 Kbps Upload pending (Project backoff: 00:27:19) Rack-01
GPUGRID initial_1440-ELISA_GSN0V1-9-100-RND9376_0_2 0.003 3816.54 K 00:00:21 - 12:20:38 0.00 Kbps Upload pending (Project backoff: 00:27:19) Rack-01
GPUGRID initial_1440-ELISA_GSN0V1-9-100-RND9376_0_9 0.000 68067.50 K 00:00:04 - 12:08:51 0.00 Kbps Upload pending (Project backoff: 00:27:19) Rack-01
GPUGRID initial_1440-ELISA_GSN0V1-9-100-RND9376_0_10 100.000 0.27 K 00:00:03 - 06:35:08 0.00 Kbps Upload pending (Project backoff: 00:27:19) Rack-01
GPUGRID initial_1509-ELISA_GSN0V1-9-100-RND3769_0_0 1.022 10.99 K 00:00:14 - 08:49:10 0.00 Kbps Upload pending (Project backoff: 00:06:46) bundy-2
GPUGRID initial_1509-ELISA_GSN0V1-9-100-RND3769_0_1 0.003 3816.54 K 00:00:16 - 07:01:25 0.00 Kbps Upload pending (Project backoff: 00:06:46) bundy-2
GPUGRID initial_1509-ELISA_GSN0V1-9-100-RND3769_0_2 0.003 3816.54 K 00:00:11 - 07:53:31 0.00 Kbps Upload pending (Project backoff: 00:06:46) bundy-2
GPUGRID initial_1509-ELISA_GSN0V1-9-100-RND3769_0_9 0.000 68066.14 K 00:00:08 - 06:20:56 0.00 Kbps Upload pending (Project backoff: 00:06:46) bundy-2
GPUGRID initial_1509-ELISA_GSN0V1-9-100-RND3769_0_10 100.000 0.27 K 00:00:07 - 06:05:22 0.00 Kbps Upload pending (Project backoff: 00:06:46) bundy-2
GPUGRID initial_1622-ELISA_GSN4V1-14-100-RND6105_2_0 1.031 10.90 K 00:00:20 - 16:13:14 0.00 Kbps Upload pending (Project backoff: 00:35:15) bundy-3
GPUGRID initial_1622-ELISA_GSN4V1-14-100-RND6105_2_1 0.003 3817.50 K 00:00:20 - 14:14:37 0.00 Kbps Upload pending (Project backoff: 00:35:15) bundy-3
GPUGRID initial_1622-ELISA_GSN4V1-14-100-RND6105_2_2 0.003 3817.50 K 00:00:11 - 13:01:50 0.00 Kbps Upload pending (Project backoff: 00:35:15) bundy-3
GPUGRID initial_1622-ELISA_GSN4V1-14-100-RND6105_2_9 0.000 68080.19 K 00:00:10 - 12:35:48 0.00 Kbps Upload pending (Project backoff: 00:35:15) bundy-3
GPUGRID initial_1622-ELISA_GSN4V1-14-100-RND6105_2_10 100.000 0.27 K 00:00:07 - 11:41:37 0.00 Kbps Upload pending (Project backoff: 00:35:15) bundy-3

stiwi
Joined: 18 Jun 12
Posts: 1
Credit: 74,611,751
RAC: 4,962
Message 53246 - Posted: 1 Dec 2019 | 10:45:48 UTC - in response to Message 53245.

01.12.2019 10:51:50 | GPUGRID | [error] Error reported by file upload server: Server is out of disk space


We have to wait until they fix it :)

ServicEnginIC
Joined: 24 Sep 10
Posts: 195
Credit: 1,429,675,986
RAC: 783,187
Message 53247 - Posted: 1 Dec 2019 | 10:46:33 UTC - in response to Message 53245.

This is being discussed in this other thread:
http://www.gpugrid.net/forum_thread.php?id=5027

Toni
Volunteer moderator
Project administrator
Project developer
Project scientist
Joined: 9 Dec 08
Posts: 947
Credit: 4,353,973
RAC: 930
Message 53250 - Posted: 1 Dec 2019 | 13:44:56 UTC - in response to Message 53247.

Please be patient; there is no need to abort.
