Message boards : News : Server problems

Author Message
Toni
Volunteer moderator
Project administrator
Project developer
Project tester
Project scientist
Joined: 9 Dec 08
Posts: 1006
Credit: 5,068,599
RAC: 0
Level
Ser
Message 57135 - Posted: 5 Jul 2021 | 9:00:34 UTC

We are experiencing multiple issues, e.g. hitting an undocumented upload size limit, and others. Please be patient.

Richard Haselgrove
Joined: 11 Jul 09
Posts: 1616
Credit: 8,056,644,351
RAC: 19,330,304
Level
Tyr
Message 57137 - Posted: 5 Jul 2021 | 9:07:36 UTC - in response to Message 57135.

https://blog.hubspot.com/website/413-request-entity-too-large?

Ian&Steve C.
Joined: 21 Feb 20
Posts: 1064
Credit: 40,231,533,983
RAC: 55,339
Level
Trp
Message 57141 - Posted: 5 Jul 2021 | 12:36:35 UTC - in response to Message 57135.

Thanks :)

Profile robertmiles
Joined: 16 Apr 09
Posts: 503
Credit: 747,645,933
RAC: 51,513
Level
Lys
Message 57155 - Posted: 5 Jul 2021 | 18:39:49 UTC - in response to Message 57135.

We are experiencing multiple issues, e.g. hitting an undocumented upload size limit, and others. Please be patient.

Have you tried file compression, or splitting the large file into multiple pieces that are recombined after they are uploaded?
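The splitting idea can be sketched in a few lines. This is only an illustration of the general technique, not how BOINC or the GPUGRID upload handler works; the function names and the `.partNNNN` naming scheme are made up for the example.

```python
from pathlib import Path


def split_file(path, chunk_size):
    """Write the file's contents into numbered pieces of at most chunk_size bytes."""
    parts = []
    with path.open("rb") as src:
        index = 0
        while True:
            chunk = src.read(chunk_size)
            if not chunk:
                break
            # Hypothetical naming scheme: result.bin.part0000, result.bin.part0001, ...
            part = path.with_name(f"{path.name}.part{index:04d}")
            part.write_bytes(chunk)
            parts.append(part)
            index += 1
    return parts


def join_files(parts, dest):
    """Concatenate the pieces, in the given order, back into one file."""
    with dest.open("wb") as out:
        for part in parts:
            out.write(part.read_bytes())
```

Each piece stays below whatever size the server accepts, at the cost of needing a server-side step that reassembles them in order.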

RJ The Bike Guy
Joined: 2 Apr 20
Posts: 20
Credit: 35,363,533
RAC: 0
Level
Val
Message 57165 - Posted: 6 Jul 2021 | 10:08:52 UTC

Is this project still going? I rarely get work units anymore.

Richard Haselgrove
Joined: 11 Jul 09
Posts: 1616
Credit: 8,056,644,351
RAC: 19,330,304
Level
Tyr
Message 57397 - Posted: 30 Sep 2021 | 16:24:14 UTC - in response to Message 57137.

https://blog.hubspot.com/website/413-request-entity-too-large?

And another:

Thu 30 Sep 2021 17:11:04 BST | GPUGRID | Started upload of e2s116_e1s44p0f981-ADRIA_AdB_KIXCMYB_HIP-0-2-RND2716_2_9
Thu 30 Sep 2021 17:11:05 BST | GPUGRID | [http] [ID#7007] Received header from server: Server: Apache/2.4.6 (CentOS) OpenSSL/1.0.1e-fips mod_auth_gssapi/1.3.1 mod_auth_kerb/5.4 mod_fcgid/2.3.9 PHP/5.4.16 mod_wsgi/3.4 Python/2.7.5
Thu 30 Sep 2021 17:11:05 BST | GPUGRID | [http] [ID#7007] Sent header to server: Content-Length: 540810054
Thu 30 Sep 2021 17:11:06 BST | GPUGRID | [http] [ID#7007] Received header from server: HTTP/1.1 413 Request Entity Too Large




Task 32645986

You'll need to re-configure Apache.
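To see how large the rejected upload was, the two relevant log lines can be parsed directly. The sketch below assumes the http_debug line format shown in the excerpt; `summarize_upload` is a hypothetical helper name. Apache's `LimitRequestBody` directive is one setting that produces a 413 response, but the thread does not say which limit was actually in force on the server.

```python
import re

LOG = """\
Thu 30 Sep 2021 17:11:05 BST | GPUGRID | [http] [ID#7007] Sent header to server: Content-Length: 540810054
Thu 30 Sep 2021 17:11:06 BST | GPUGRID | [http] [ID#7007] Received header from server: HTTP/1.1 413 Request Entity Too Large
"""


def summarize_upload(log_text):
    """Return (bytes_sent, http_status); either is None if its line is absent."""
    size = re.search(r"Sent header to server: Content-Length: (\d+)", log_text)
    status = re.search(r"Received header from server: HTTP/\d\.\d (\d{3})", log_text)
    return (
        int(size.group(1)) if size else None,
        int(status.group(1)) if status else None,
    )


size, status = summarize_upload(LOG)
# Convert bytes to MiB (2**20 bytes) for readability.
print(f"{size / 2**20:.1f} MiB sent, server answered {status}")
```

For the excerpt above this reports an upload of roughly 516 MiB answered with status 413.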

Diplomat
Joined: 1 Sep 10
Posts: 15
Credit: 661,399,648
RAC: 29,782
Level
Lys
Message 57398 - Posted: 30 Sep 2021 | 16:32:15 UTC

+1 can't upload 27079276

mrchips
Joined: 9 May 21
Posts: 16
Credit: 1,275,555,500
RAC: 3,069,506
Level
Met
Message 57399 - Posted: 30 Sep 2021 | 16:50:14 UTC

I'm getting "Peer certificate cannot be authenticated with given CA certificates".

Pop Piasa
Joined: 8 Aug 19
Posts: 252
Credit: 458,054,251
RAC: 0
Level
Gln
Message 57400 - Posted: 30 Sep 2021 | 17:02:04 UTC

Also seeing message:
9/30/2021 10:55:06 AM | GPUGRID | Scheduler request failed: Peer certificate cannot be authenticated with given CA certificates

That message jogged a memory.
Could it be another license expiry like last year about this time?
____________
"Together we crunch
To check out a hunch
And wish all our credit
Could just buy us lunch"


Piasa Tribe - Illini Nation

Richard Haselgrove
Joined: 11 Jul 09
Posts: 1616
Credit: 8,056,644,351
RAC: 19,330,304
Level
Tyr
Message 57401 - Posted: 30 Sep 2021 | 17:27:07 UTC - in response to Message 57400.

I doubt it - the licence expiry related to the science application (ACEMD), and the Peer certificate relates to https internet traffic.

But I've just dealt with a Peer certificate error reported for two other projects, on the BOINC web site. So something might be brewing.

Lo and behold, I've got

30/09/2021 18:09:14 | GPUGRID | Scheduler request failed: Peer certificate cannot be authenticated with given CA certificates

on this machine, and I've got a task just 7.5% short of finishing. Alarm bells are starting to ring.

Keith Myers
Joined: 13 Dec 17
Posts: 1330
Credit: 7,042,942,459
RAC: 15,385,321
Level
Tyr
Message 57402 - Posted: 30 Sep 2021 | 17:28:15 UTC - in response to Message 57400.

Nope. Not that issue. I already pinged Toni about the license renewal.
He said this year it is not his responsibility.

Keith Myers
Joined: 13 Dec 17
Posts: 1330
Credit: 7,042,942,459
RAC: 15,385,321
Level
Tyr
Message 57403 - Posted: 30 Sep 2021 | 17:32:33 UTC

Went back through my BOINC logs and my GPUGrid scheduler connections have been working fine and normal.

Wonder what is different with your two hosts?

Richard Haselgrove
Joined: 11 Jul 09
Posts: 1616
Credit: 8,056,644,351
RAC: 19,330,304
Level
Tyr
Message 57404 - Posted: 30 Sep 2021 | 18:13:00 UTC - in response to Message 57403.
Last modified: 30 Sep 2021 | 18:18:49 UTC

1) This problem only started happening today - I'll try to find a start time.
2) It only affects Windows hosts - Linux is not affected.

Hourly contact at 14:21 today was good, failed at 15:22 - times are British, UTC+1

bozz4science
Joined: 22 May 20
Posts: 109
Credit: 74,786,176
RAC: 500,630
Level
Thr
Message 57405 - Posted: 30 Sep 2021 | 18:23:30 UTC
Last modified: 30 Sep 2021 | 18:30:35 UTC

I do see the same issue on my Windows host as well. Just happened a few hours ago.

Edit: Adding the project on a Win machine via the BOINC manager also temporarily fails.

Steve Jones
Joined: 28 Oct 18
Posts: 3
Credit: 70,648,040
RAC: 0
Level
Thr
Message 57406 - Posted: 30 Sep 2021 | 20:49:32 UTC

If the project is using LetsEncrypt, then apparently there are some fairly widespread issues with that at the moment.

bozz4science
Joined: 22 May 20
Posts: 109
Credit: 74,786,176
RAC: 500,630
Level
Thr
Message 57407 - Posted: 30 Sep 2021 | 21:27:17 UTC

It turned out that an expired HTTPS certificate was the culprit. The workarounds suggested on the main BOINC or WUprop forum worked fine, and behavior is back to normal for me.

bibi
Joined: 4 May 17
Posts: 14
Credit: 13,301,744,643
RAC: 39,333,664
Level
Trp
Message 57414 - Posted: 1 Oct 2021 | 11:50:14 UTC
Last modified: 1 Oct 2021 | 11:54:04 UTC

For Windows:
- make a backup of "C:\Program Files\BOINC\ca-bundle.crt"
- open the file as administrator in an editor (it is a plain text file)
- delete everything from the line "DST Root CA X3" through the next "-----END CERTIFICATE-----"
- restart BOINC
The required "ISRG Root X1" certificate is already in the file.


see https://boinc.berkeley.edu/forum_thread.php?id=14413
and https://letsencrypt.org/certificates/
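
The manual edit described above can also be expressed as a short text filter. This is a minimal sketch, assuming the bundle is laid out as the steps imply: a label line ("DST Root CA X3") followed some lines later by the matching "-----END CERTIFICATE-----". The function name `remove_cert_block` is made up for the example, and the code only transforms text; it does not touch any file.

```python
def remove_cert_block(bundle_text, label="DST Root CA X3"):
    """Drop the lines from the label line through the next END CERTIFICATE line."""
    kept = []
    skipping = False
    for line in bundle_text.splitlines(keepends=True):
        stripped = line.strip()
        if not skipping and stripped == label:
            # Start of the unwanted entry: drop the label line itself.
            skipping = True
            continue
        if skipping:
            # Drop everything up to and including the closing PEM marker.
            if stripped == "-----END CERTIFICATE-----":
                skipping = False
            continue
        kept.append(line)
    return "".join(kept)
```

To apply it, read the text of ca-bundle.crt, keep a backup copy, and write the filtered result back from a process with administrator rights. If the label is present but no closing marker follows, everything after the label is dropped, so check the result before replacing the original.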

jjch
Joined: 10 Nov 13
Posts: 100
Credit: 15,444,100,388
RAC: 832,992
Level
Trp
Message 57433 - Posted: 3 Oct 2021 | 18:07:12 UTC
Last modified: 3 Oct 2021 | 18:52:24 UTC

I just tried this on one of my hosts and the upload started up right away.

I copied the edited ca-bundle file to another system and that seems to work too.

Need to go through the rest of them and copy the updated file now.

Toni
Volunteer moderator
Project administrator
Project developer
Project tester
Project scientist
Joined: 9 Dec 08
Posts: 1006
Credit: 5,068,599
RAC: 0
Level
Ser
Message 57449 - Posted: 4 Oct 2021 | 12:40:09 UTC - in response to Message 57433.
Last modified: 4 Oct 2021 | 12:40:48 UTC

Seems an instance of https://it.slashdot.org/story/21/10/02/2318202/millions-experience-browser-problems-after-long-anticipated-expiration-of-lets-encrypt-certificate

Richard Haselgrove
Joined: 11 Jul 09
Posts: 1616
Credit: 8,056,644,351
RAC: 19,330,304
Level
Tyr
Message 57450 - Posted: 4 Oct 2021 | 13:08:09 UTC - in response to Message 57449.

Yes - and the Windows version of BOINC wasn't ready to handle it. The secondary problem was BOINC's continued use of an old version of OpenSSL, long deprecated.

There's some hope of an emergency release of a replacement Windows client within (possibly) 24 hours. I've also been kicking some butt at GitHub, and I've extracted a sort-of promise that there will be an improved version of BOINC across all platforms by mid-December.

mrchips
Joined: 9 May 21
Posts: 16
Credit: 1,275,555,500
RAC: 3,069,506
Level
Met
Message 57462 - Posted: 4 Oct 2021 | 17:44:22 UTC

I can't make a copy of the .crt / security file, let alone edit it as a text file. Of course I have all the permissions.

I have not been able to get tasks for 4 days.


Profile phi1258
Joined: 30 Jul 16
Posts: 4
Credit: 1,555,158,536
RAC: 4
Level
His
Message 57470 - Posted: 4 Oct 2021 | 22:35:57 UTC - in response to Message 57414.

Bibi, thanks for the workaround. Works like a charm.

For Windows:
- make a backup of "C:\Program Files\BOINC\ca-bundle.crt"
- open the file as administrator in an editor (it is a plain text file)
- delete everything from the line "DST Root CA X3" through the next "-----END CERTIFICATE-----"
- restart BOINC
The required "ISRG Root X1" certificate is already in the file.


see https://boinc.berkeley.edu/forum_thread.php?id=14413
and https://letsencrypt.org/certificates/



candido
Joined: 12 Jun 11
Posts: 12
Credit: 150,069,999
RAC: 0
Level
Ile
Message 57483 - Posted: 5 Oct 2021 | 9:51:59 UTC - in response to Message 57414.

Thanks bibi
Solved my problem immediately!
Candido

jiipee
Joined: 4 Jun 15
Posts: 19
Credit: 8,018,279,946
RAC: 10,549,467
Level
Tyr
Message 57587 - Posted: 12 Oct 2021 | 7:42:29 UTC - in response to Message 57414.

For Windows:
- make a backup of "C:\Program Files\BOINC\ca-bundle.crt"
- open the file as administrator in an editor (it is a plain text file)
- delete everything from the line "DST Root CA X3" through the next "-----END CERTIFICATE-----"
- restart BOINC
The required "ISRG Root X1" certificate is already in the file.

This didn't help. The last successful contact from a Windows host with the project servers was on Sep 30. Linux hosts work fine.

Log file on a Win10 host says:


...
12.10.2021 10.26.04 | GPUGRID | update requested by user
12.10.2021 10.26.05 | GPUGRID | Fetching scheduler list
12.10.2021 10.26.07 | | Project communication failed: attempting access to reference site
12.10.2021 10.26.08 | | Internet access OK - project servers may be temporarily down.
...

Richard Haselgrove
Joined: 11 Jul 09
Posts: 1616
Credit: 8,056,644,351
RAC: 19,330,304
Level
Tyr
Message 57589 - Posted: 12 Oct 2021 | 8:32:30 UTC - in response to Message 57587.
Last modified: 12 Oct 2021 | 8:33:47 UTC

Fixing the ca-bundle.crt file is certainly a cure for the current problem.

1) An alternative way of fixing it is to download a whole new file from the BOINC workaround thread.
2) Make sure that you place the replacement in the correct place for your system. The location C:\Program Files\BOINC\ca-bundle.crt is BOINC's default on Windows, but you had the choice during installation to put it anywhere else you wanted.

jiipee
Joined: 4 Jun 15
Posts: 19
Credit: 8,018,279,946
RAC: 10,549,467
Level
Tyr
Message 57595 - Posted: 13 Oct 2021 | 9:00:14 UTC - in response to Message 57589.

Fixing the ca-bundle.crt file is certainly a cure for the current problem.

1) An alternative way of fixing it is to download a whole new file from the BOINC workaround thread.
2) Make sure that you place the replacement in the correct place for your system. The location C:\Program Files\BOINC\ca-bundle.crt is BOINC's default on Windows, but you had the choice during installation to put it anywhere else you wanted.

Unfortunately, this didn't help either. The BOINC client seems to be installed in the default location, c:\program files\boinc, since Task Manager shows boinc.exe running from there. I replaced the c:\program files\boinc\ca-bundle.crt file with the one from the BOINC workaround thread and rebooted, but no dice.

That same Win10 host is currently running WCG and Rosetta tasks, so I believe its network connection is basically ok.

Richard Haselgrove
Joined: 11 Jul 09
Posts: 1616
Credit: 8,056,644,351
RAC: 19,330,304
Level
Tyr
Message 57596 - Posted: 13 Oct 2021 | 9:26:00 UTC - in response to Message 57595.

OK, it seems to be a local problem specific to your machine, rather than the general problem affecting all Windows users.

You could get more detailed information by enabling 'http_debug' logging in the Event Log. Set it in BOINC Manager under Options -> Event Log options, update the project, then turn the option off again. (It produces a lot of output, so it's unwise to leave it running continuously.)

See if you can see the cause of the problem, or post the output relating to GPUGrid here and we can take a look.

mmonnin
Joined: 2 Jul 16
Posts: 337
Credit: 7,175,151,065
RAC: 17,333,088
Level
Tyr
Message 57601 - Posted: 13 Oct 2021 | 10:45:12 UTC

I've had to remove the project and re-attach even after replacing the crt file. Others have had to do the same.

jiipee
Joined: 4 Jun 15
Posts: 19
Credit: 8,018,279,946
RAC: 10,549,467
Level
Tyr
Message 57602 - Posted: 13 Oct 2021 | 17:52:55 UTC - in response to Message 57601.

I've had to remove the project and re-attach even after replacing the crt file. Others have had to do the same.

Ok, thanks. I'll give that a try tomorrow.

jiipee
Joined: 4 Jun 15
Posts: 19
Credit: 8,018,279,946
RAC: 10,549,467
Level
Tyr
Message 57603 - Posted: 13 Oct 2021 | 18:04:37 UTC - in response to Message 57596.

You could get more detailed information by enabling 'http_debug' logging in the Event Log. Set it in BOINC Manager under Options -> Event Log options, update the project, then turn the option off again. (It produces a lot of output, so it's unwise to leave it running continuously.)

Thanks, I'll check first what extra information that gives.

jiipee
Joined: 4 Jun 15
Posts: 19
Credit: 8,018,279,946
RAC: 10,549,467
Level
Tyr
Message 57604 - Posted: 14 Oct 2021 | 12:27:09 UTC - in response to Message 57596.
Last modified: 14 Oct 2021 | 12:28:52 UTC

You could get more detailed information by enabling 'http_debug' logging in the Event Log. Set it in BOINC Manager under Options -> Event Log options, update the project, then turn the option off again. (It produces a lot of output, so it's unwise to leave it running continuously.)

See if you can see the cause of the problem, or post the output relating to GPUGrid here and we can take a look.

Hello.

Here's what the http_debug logging option wrote to the log file:


14.10.2021 15.15.29 | | Re-reading cc_config.xml
14.10.2021 15.15.29 | | Not using a proxy
14.10.2021 15.15.29 | | log flags: file_xfer, sched_ops, task, http_debug
14.10.2021 15.15.35 | GPUGRID | update requested by user
14.10.2021 15.15.36 | | [http] HTTP_OP::init_get(): https://www.gpugrid.net/notices.php?userid=163138&auth=xxxxxxxxxxxxxxx_censored_xxxxxxxxxxxxxx
14.10.2021 15.15.36 | | [http] HTTP_OP::libcurl_exec(): ca-bundle set
14.10.2021 15.15.37 | | [http] [ID#0] Info: Connection 134 seems to be dead!
14.10.2021 15.15.37 | | [http] [ID#0] Info: Closing connection 134
14.10.2021 15.15.37 | | [http] [ID#0] Info: timeout on name lookup is not supported
14.10.2021 15.15.37 | | [http] [ID#0] Info: Hostname was NOT found in DNS cache
14.10.2021 15.15.37 | | [http] [ID#0] Info: Trying 84.89.134.145...
14.10.2021 15.15.37 | | [http] [ID#0] Info: Connected to www.gpugrid.net (84.89.134.145) port 443 (#135)
14.10.2021 15.15.37 | | [http] [ID#0] Info: successfully set certificate verify locations:
14.10.2021 15.15.37 | | [http] [ID#0] Info: CAfile: C:\Program Files\BOINC\ca-bundle.crt
14.10.2021 15.15.37 | | [http] [ID#0] Info: CApath: none
14.10.2021 15.15.37 | | [http] [ID#0] Info: SSLv3, TLS Unknown, Unknown (22):
14.10.2021 15.15.37 | | [http] [ID#0] Info: SSLv3, TLS handshake, Client hello (1):
14.10.2021 15.15.37 | | [http] [ID#0] Info: SSLv2, Unknown (22):
14.10.2021 15.15.37 | | [http] [ID#0] Info: SSLv3, TLS handshake, Server hello (2):
14.10.2021 15.15.37 | | [http] [ID#0] Info: SSLv2, Unknown (22):
14.10.2021 15.15.37 | | [http] [ID#0] Info: SSLv3, TLS handshake, CERT (11):
14.10.2021 15.15.37 | | [http] [ID#0] Info: SSLv2, Unknown (21):
14.10.2021 15.15.37 | | [http] [ID#0] Info: SSLv3, TLS alert, Server hello (2):
14.10.2021 15.15.37 | | [http] [ID#0] Info: SSL certificate problem: unable to get local issuer certificate
14.10.2021 15.15.37 | | [http] [ID#0] Info: Closing connection 135
14.10.2021 15.15.37 | | [http] HTTP error: Peer certificate cannot be authenticated with given CA certificates
14.10.2021 15.15.39 | GPUGRID | [http] HTTP_OP::init_get(): https://www.gpugrid.net/
14.10.2021 15.15.39 | GPUGRID | [http] HTTP_OP::libcurl_exec(): ca-bundle set
14.10.2021 15.15.39 | GPUGRID | Fetching scheduler list
14.10.2021 15.15.40 | GPUGRID | [http] [ID#1] Info: timeout on name lookup is not supported
14.10.2021 15.15.40 | GPUGRID | [http] [ID#1] Info: Hostname was found in DNS cache
14.10.2021 15.15.40 | GPUGRID | [http] [ID#1] Info: Trying 84.89.134.145...
14.10.2021 15.15.40 | GPUGRID | [http] [ID#1] Info: Connected to www.gpugrid.net (84.89.134.145) port 443 (#136)
14.10.2021 15.15.40 | GPUGRID | [http] [ID#1] Info: successfully set certificate verify locations:
14.10.2021 15.15.40 | GPUGRID | [http] [ID#1] Info: CAfile: C:\Program Files\BOINC\ca-bundle.crt
14.10.2021 15.15.40 | GPUGRID | [http] [ID#1] Info: CApath: none
14.10.2021 15.15.40 | GPUGRID | [http] [ID#1] Info: SSLv3, TLS Unknown, Unknown (22):
14.10.2021 15.15.40 | GPUGRID | [http] [ID#1] Info: SSLv3, TLS handshake, Client hello (1):
14.10.2021 15.15.40 | GPUGRID | [http] [ID#1] Info: SSLv2, Unknown (22):
14.10.2021 15.15.40 | GPUGRID | [http] [ID#1] Info: SSLv3, TLS handshake, Server hello (2):
14.10.2021 15.15.40 | GPUGRID | [http] [ID#1] Info: SSLv2, Unknown (22):
14.10.2021 15.15.40 | GPUGRID | [http] [ID#1] Info: SSLv3, TLS handshake, CERT (11):
14.10.2021 15.15.40 | GPUGRID | [http] [ID#1] Info: SSLv2, Unknown (21):
14.10.2021 15.15.40 | GPUGRID | [http] [ID#1] Info: SSLv3, TLS alert, Server hello (2):
14.10.2021 15.15.40 | GPUGRID | [http] [ID#1] Info: SSL certificate problem: unable to get local issuer certificate
14.10.2021 15.15.40 | GPUGRID | [http] [ID#1] Info: Closing connection 136
14.10.2021 15.15.40 | GPUGRID | [http] HTTP error: Peer certificate cannot be authenticated with given CA certificates
14.10.2021 15.15.41 | | Project communication failed: attempting access to reference site
14.10.2021 15.15.41 | | [http] HTTP_OP::init_get(): http://www.google.com/
14.10.2021 15.15.41 | | [http] HTTP_OP::libcurl_exec(): ca-bundle set
14.10.2021 15.15.42 | | [http] [ID#0] Info: timeout on name lookup is not supported
14.10.2021 15.15.42 | | [http] [ID#0] Info: Hostname was NOT found in DNS cache
14.10.2021 15.15.42 | | [http] [ID#0] Info: Trying 142.250.74.36...
14.10.2021 15.15.42 | | [http] [ID#0] Info: Connected to www.google.com (142.250.74.36) port 80 (#137)
14.10.2021 15.15.42 | | [http] [ID#0] Sent header to server: GET / HTTP/1.1
14.10.2021 15.15.42 | | [http] [ID#0] Sent header to server: User-Agent: BOINC client (windows_x86_64 7.6.9)
14.10.2021 15.15.42 | | [http] [ID#0] Sent header to server: Host: www.google.com
14.10.2021 15.15.42 | | [http] [ID#0] Sent header to server: Accept: */*
14.10.2021 15.15.42 | | [http] [ID#0] Sent header to server: Accept-Encoding: deflate, gzip
14.10.2021 15.15.42 | | [http] [ID#0] Sent header to server: Content-Type: application/x-www-form-urlencoded
14.10.2021 15.15.42 | | [http] [ID#0] Sent header to server: Accept-Language: fi_FI
14.10.2021 15.15.42 | | [http] [ID#0] Sent header to server:
14.10.2021 15.15.42 | | [http] [ID#0] Received header from server: HTTP/1.1 200 OK
14.10.2021 15.15.42 | | [http] [ID#0] Received header from server: Date: Thu, 14 Oct 2021 12:15:42 GMT
14.10.2021 15.15.42 | | [http] [ID#0] Received header from server: Expires: -1
14.10.2021 15.15.42 | | [http] [ID#0] Received header from server: Cache-Control: private, max-age=0
14.10.2021 15.15.42 | | [http] [ID#0] Received header from server: Content-Type: text/html; charset=ISO-8859-1
14.10.2021 15.15.42 | | [http] [ID#0] Received header from server: P3P: CP="This is not a P3P policy! See g.co/p3phelp for more info."
14.10.2021 15.15.42 | | [http] [ID#0] Received header from server: Content-Encoding: gzip
14.10.2021 15.15.42 | | [http] [ID#0] Received header from server: Server: gws
14.10.2021 15.15.42 | | [http] [ID#0] Received header from server: Content-Length: 6541
14.10.2021 15.15.42 | | [http] [ID#0] Received header from server: X-XSS-Protection: 0
14.10.2021 15.15.42 | | [http] [ID#0] Received header from server: X-Frame-Options: SAMEORIGIN
14.10.2021 15.15.42 | | [http] [ID#0] Received header from server: Set-Cookie: NID=511=rNZDMOeL-yoBDIxpS-MwJhLLHVbgNw8z6PNKY_0SER61QEYVR9fr5LRnkfMy59elOCzNmteggAsEiyy_ihMnRzYIxFPpHbp0wDnmDuX__eFZPRz-XBkqgysOUBBHtzxRW0gT0hLCYNtUn0QIqz1wp5dHbA0Ujw_Mi0TgLIsrzOE; expires=Fri, 15-Apr-2022 12:15:42 GMT; path=/; domain=.google.com; HttpOnly
14.10.2021 15.15.42 | | [http] [ID#0] Received header from server:
14.10.2021 15.15.42 | | [http] [ID#0] Info: Connection #137 to host www.google.com left intact
14.10.2021 15.15.42 | | Internet access OK - project servers may be temporarily down.
14.10.2021 15.15.47 | | Re-reading cc_config.xml
14.10.2021 15.15.47 | | Not using a proxy
14.10.2021 15.15.47 | | log flags: file_xfer, sched_ops, task
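
Since http_debug output is long, a small filter can pull out just the lines that report the failure. This is a minimal sketch that assumes the message texts shown in the log above; `tls_failures` is a hypothetical helper name, and the sample is a three-line excerpt of that log.

```python
def tls_failures(log_text):
    """Return the log lines that report a certificate or HTTP-level error."""
    markers = ("SSL certificate problem", "HTTP error")
    return [
        line.strip()
        for line in log_text.splitlines()
        if any(marker in line for marker in markers)
    ]


sample = """\
14.10.2021 15.15.40 | GPUGRID | [http] [ID#1] Info: SSLv3, TLS handshake, CERT (11):
14.10.2021 15.15.40 | GPUGRID | [http] [ID#1] Info: SSL certificate problem: unable to get local issuer certificate
14.10.2021 15.15.40 | GPUGRID | [http] HTTP error: Peer certificate cannot be authenticated with given CA certificates
"""

for line in tls_failures(sample):
    print(line)
```

Only the second and third sample lines are printed; the handshake line is skipped because it carries neither marker.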

Richard Haselgrove
Joined: 11 Jul 09
Posts: 1616
Credit: 8,056,644,351
RAC: 19,330,304
Level
Tyr
Message 57606 - Posted: 14 Oct 2021 | 13:59:25 UTC - in response to Message 57604.

Yup, that's the one.

14.10.2021 15.15.40 | GPUGRID | [http] [ID#1] Info: SSL certificate problem: unable to get local issuer certificate
14.10.2021 15.15.40 | GPUGRID | [http] HTTP error: Peer certificate cannot be authenticated with given CA certificates

The certificates you're offering (from BOINC) can't get it together with the certificates being offered by GPUGrid.

The quickest solution is to give BOINC a better set of certificates to offer. Edit the bundle, use somebody else's hacked version, or use the newer version available on the BOINC message board.

The 'local problem' I suggested earlier might simply be that some versions of Windows, or some local installations, cache frequently-used files and don't notice that the original source has changed.

You'll have to work through that one on your own system, I'm afraid - the simple file replacement worked on all my machines, without any further action.

Steps I'd suggest, in order:
1) Update the file
2) Check that the file has really updated - not been blocked by an anti-virus program.
3) Restart BOINC
4) Restart the computer
5) Detach from GPUGrid and re-attach (it worked for mmonnin, but I can't see why)
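
Step 2 can be checked mechanically rather than by eye. The sketch below assumes the bundle carries plain-text label lines such as "DST Root CA X3" and "ISRG Root X1", as the workaround instructions earlier in the thread imply; `bundle_status` is a hypothetical helper and it only inspects text that has already been read.

```python
def bundle_status(bundle_text):
    """Summarize whether a CA bundle looks like the fixed or the stale version."""
    return {
        # The root that the workaround says must stay in the file.
        "has_isrg_root_x1": "ISRG Root X1" in bundle_text,
        # The expired root that the workaround removes.
        "has_dst_root_ca_x3": "DST Root CA X3" in bundle_text,
        "certificate_count": bundle_text.count("-----BEGIN CERTIFICATE-----"),
    }
```

On a default Windows install the text could come from `Path(r"C:\Program Files\BOINC\ca-bundle.crt").read_text(errors="replace")`. If `has_dst_root_ca_x3` is still true after the replacement, the file BOINC reads has not actually changed.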

Profile Retvari Zoltan
Joined: 20 Jan 09
Posts: 2343
Credit: 16,201,255,749
RAC: 0
Level
Trp
Message 57607 - Posted: 14 Oct 2021 | 15:10:30 UTC - in response to Message 57587.

For Windows:
- make a backup of "C:\Program Files\BOINC\ca-bundle.crt"
- open the file as administrator in an editor (it is a plain text file)
- delete everything from the line "DST Root CA X3" through the next "-----END CERTIFICATE-----"
- restart BOINC
The required "ISRG Root X1" certificate is already in the file.

This didn't help. The last successful contact from a Windows host with the project servers was on Sep 30. Linux hosts work fine.
The key to successfully modifying a file in the "C:\Program Files" folder is to do it with elevated privileges (administrator access rights); otherwise you can't write or modify files in that folder.
To achieve this, copy the following to your clipboard:
notepad "C:\Program Files\BOINC\ca-bundle.crt"
Press Windows key + R (this brings up the "Run" dialog box)
Paste the text into the input field
Press CTRL+SHIFT+ENTER, then click Yes in the "User Account Control" dialog box
Delete the unnecessary certificate
Save the changes and exit Notepad
Restart BOINC

Profile Retvari Zoltan
Joined: 20 Jan 09
Posts: 2343
Credit: 16,201,255,749
RAC: 0
Level
Trp
Message 57629 - Posted: 16 Oct 2021 | 23:32:46 UTC
Last modified: 16 Oct 2021 | 23:38:44 UTC

Has anyone tried BOINC manager v7.16.20?
Does this version contain the updated certificates?

---- EDIT ----
I did.
This version has the updated certificate file.

jjch
Joined: 10 Nov 13
Posts: 100
Credit: 15,444,100,388
RAC: 832,992
Level
Trp
Message 57630 - Posted: 17 Oct 2021 | 16:43:55 UTC

Thanks for the update Retvari! I manually edited and replaced all the certificates on my Win systems earlier. When I have time I will start testing v7.16.20.

[CSF] Aleksey Belkov
Joined: 26 Dec 13
Posts: 86
Credit: 1,258,731,270
RAC: 480,669
Level
Met
Message 57633 - Posted: 18 Oct 2021 | 1:50:32 UTC
Last modified: 18 Oct 2021 | 1:51:10 UTC

As always, you can get the current CA bundle from the curl project site:

https://curl.se/docs/caextract.html

Or use the direct URL to the latest CA bundle:

https://curl.se/ca/cacert.pem
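
Fetching the bundle can be scripted. This is a rough sketch using only the Python standard library; the URL is the one quoted above, while `fetch_bundle` and `looks_like_pem_bundle` are hypothetical helper names. The sanity check is deliberately crude and only guards against saving an error page in place of the bundle.

```python
import urllib.request

CA_BUNDLE_URL = "https://curl.se/ca/cacert.pem"  # direct URL quoted in the post


def looks_like_pem_bundle(data):
    """Cheap sanity check: the data contains at least one PEM certificate block."""
    return (
        b"-----BEGIN CERTIFICATE-----" in data
        and b"-----END CERTIFICATE-----" in data
    )


def fetch_bundle(url=CA_BUNDLE_URL, timeout=30):
    """Download the bundle and refuse anything that is not PEM-shaped."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        data = response.read()
    if not looks_like_pem_bundle(data):
        raise ValueError("download does not look like a PEM certificate bundle")
    return data
```

The returned bytes would then be written over BOINC's ca-bundle.crt, after backing up the old file and with administrator rights on Windows.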

jiipee
Joined: 4 Jun 15
Posts: 19
Credit: 8,018,279,946
RAC: 10,549,467
Level
Tyr
Message 57634 - Posted: 19 Oct 2021 | 8:57:25 UTC - in response to Message 57606.
Last modified: 19 Oct 2021 | 8:59:36 UTC

Steps I'd suggest, in order:
1) Update the file
2) Check that the file has really updated - not been blocked by an anti-virus program.
3) Restart BOINC
4) Restart the computer
5) Detach from GPUGrid and re-attach (it worked for mmonnin, but I can't see why)

I downloaded the cert bundle from the BOINC message board again and copied it over the previous one with elevated privileges (CMD.EXE started as Administrator), but no dice this time either. I didn't find any evidence of that host using the old cert bundle, dated sometime in 2015 (perhaps when BOINC was installed).

I didn't detach from and re-attach to the project at this point. I wanted to try updating the BOINC manager first.

Has anyone tried BOINC manager v7.16.20?
Does this version contain the updated certificates?

---- EDIT ----
I did.
This version has the updated certificate file.

This solved the problem. Now that host is again communicating with GPUGRID.

Many thanks to all of you helping to sort this out :)

Richard Haselgrove
Joined: 11 Jul 09
Posts: 1616
Credit: 8,056,644,351
RAC: 19,330,304
Level
Tyr
Message 57635 - Posted: 19 Oct 2021 | 11:23:55 UTC - in response to Message 57634.

Glad you got it sorted in the end.

Just for the record, BOINC v7.16.20 is an update for the whole BOINC package - Manager, client, and all the twiddly bits that glue it all together. It will be the client update that solved this particular problem - the Manager wasn't implicated this time.

jjch
Joined: 10 Nov 13
Posts: 100
Credit: 15,444,100,388
RAC: 832,992
Level
Trp
Message 57641 - Posted: 23 Oct 2021 | 3:59:20 UTC

Version 7.16.20 seems to be working well on the Windows clients.

KLiK
Joined: 18 Nov 14
Posts: 9
Credit: 215,167,121
RAC: 625
Level
Leu
Message 57774 - Posted: 7 Nov 2021 | 21:52:01 UTC - in response to Message 57414.

For Windows:
- make a backup of "C:\Program Files\BOINC\ca-bundle.crt"
- open the file as administrator in an editor (it is a plain text file)
- delete everything from the line "DST Root CA X3" through the next "-----END CERTIFICATE-----"
- restart BOINC
The required "ISRG Root X1" certificate is already in the file.


see https://boinc.berkeley.edu/forum_thread.php?id=14413
and https://letsencrypt.org/certificates/

Tried this one...if you do, GPUgrid will not work - but neither would WCG. 🤦‍♂️🙄

So I restored the .old certificate file & now WCG works.

Also, Chrome reported GPUgrid as having OLD certificates, & after the removal GPUgrid would not connect again.

Admins & techs, when will this be fixed?
____________


non-profit org. Play4Life in Zagreb, Croatia, EU

Keith Myers
Joined: 13 Dec 17
Posts: 1330
Credit: 7,042,942,459
RAC: 15,385,321
Level
Tyr
Message 57775 - Posted: 8 Nov 2021 | 2:18:22 UTC

Just update the entire BOINC package. Don't bother with the certificate file edit when the entire new package just works. BOINC version 7.16.20

https://boinc.berkeley.edu/download_all.php

There is nothing wrong with any of the projects if you just fix the BOINC client in the first place.

xnaas
Joined: 15 Nov 19
Posts: 4
Credit: 53,568,092
RAC: 0
Level
Thr
Message 57835 - Posted: 14 Nov 2021 | 16:56:31 UTC - in response to Message 57775.

My client is up-to-date, but I haven't gotten any jobs from GPUGrid in a very, very long time. I've tried resetting the project, removing and re-adding it, etc. with no luck. In my GPUGrid settings here on the website, I'm enabled to get any task from any project type.

My other projects are getting tasks. Rosetta stopped, but that's because it apparently wants VirtualBox now, but I don't want to install that version of BOINC w/ VB.

OS: Windows 10 64-bit
BOINC: 7.16.20 (x64)

Are there just no projects right now other than the Python beta? Is said beta Linux only?

Erich56
Joined: 1 Jan 15
Posts: 1120
Credit: 8,875,620,176
RAC: 33,427,509
Level
Tyr
Message 57836 - Posted: 14 Nov 2021 | 17:13:14 UTC - in response to Message 57835.

Are there just no projects right now other than the Python beta? Is said beta Linux only?

Your assumption is correct, unfortunately :-(

Obviously, they are now concentrating only on the Python beta, so there are no ACEMD tasks; I am afraid it will stay like this for a while :-(

xnaas
Joined: 15 Nov 19
Posts: 4
Credit: 53,568,092
RAC: 0
Level
Thr
Message 57837 - Posted: 14 Nov 2021 | 17:38:32 UTC - in response to Message 57836.

That's fine. At least I know it's not just me. :)

Profile Michael H.W. Weber
Joined: 9 Feb 16
Posts: 71
Credit: 607,916,391
RAC: 0
Level
Lys
Scientific publications
watwatwatwatwatwatwatwat
Message 57858 - Posted: 20 Nov 2021 | 8:36:21 UTC - in response to Message 57397.

https://blog.hubspot.com/website/413-request-entity-too-large?

And another:

Thu 30 Sep 2021 17:11:04 BST | GPUGRID | Started upload of e2s116_e1s44p0f981-ADRIA_AdB_KIXCMYB_HIP-0-2-RND2716_2_9
Thu 30 Sep 2021 17:11:05 BST | GPUGRID | [http] [ID#7007] Received header from server: Server: Apache/2.4.6 (CentOS) OpenSSL/1.0.1e-fips mod_auth_gssapi/1.3.1 mod_auth_kerb/5.4 mod_fcgid/2.3.9 PHP/5.4.16 mod_wsgi/3.4 Python/2.7.5
Thu 30 Sep 2021 17:11:05 BST | GPUGRID | [http] [ID#7007] Sent header to server: Content-Length: 540810054
Thu 30 Sep 2021 17:11:06 BST | GPUGRID | [http] [ID#7007] Received header from server: HTTP/1.1 413 Request Entity Too Large




Task 32645986

You'll need to re-configure Apache.
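
For scale: the Content-Length in the log above works out to roughly 516 MiB. A minimal Python sketch, assuming the BOINC http debug line format shown above (the function name `summarize_upload` is made up for illustration), pulls the upload size and the server's status code out of such a log. Which server setting imposed the limit isn't stated in this thread; Apache's `LimitRequestBody` directive is one candidate, but that is a guess.

```python
import re

# Two lines copied from the log quoted above.
LOG = """\
Thu 30 Sep 2021 17:11:05 BST | GPUGRID | [http] [ID#7007] Sent header to server: Content-Length: 540810054
Thu 30 Sep 2021 17:11:06 BST | GPUGRID | [http] [ID#7007] Received header from server: HTTP/1.1 413 Request Entity Too Large
"""

def summarize_upload(log_text):
    """Return (content_length_bytes, http_status); either may be None if absent."""
    size = re.search(r"Sent header to server: Content-Length: (\d+)", log_text)
    status = re.search(r"Received header from server: HTTP/\d\.\d (\d{3})", log_text)
    return (
        int(size.group(1)) if size else None,
        int(status.group(1)) if status else None,
    )

size_bytes, status = summarize_upload(LOG)
size_mib = size_bytes / 2**20  # bytes -> MiB
print(f"upload of {size_mib:.0f} MiB rejected with HTTP {status}")
```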

Has this trashing of results finally been resolved after months?
I have stopped contributing with the RTX 3080 (and all other cards) until this is fixed.

Michael.
____________
President of Rechenkraft.net - Germany's first and largest distributed computing organization.

Profile Retvari Zoltan
Joined: 20 Jan 09
Posts: 2343
Credit: 16,201,255,749
RAC: 0
Level
Trp
Scientific publications
watwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwat
Message 57899 - Posted: 27 Nov 2021 | 23:27:09 UTC
Last modified: 27 Nov 2021 | 23:42:53 UTC

NET::ERR_CERT_DATE_INVALID
Subject: www.ps3grid.net
Issuer: R3
Expires on: 2021. nov. 27.
Current date: 2021. nov. 28.
I can reach the forum only through http.
All of my hosts are set to https, they can't reach the servers because the certificate has expired.

GPUGRID | Scheduler request failed: SSL peer certificate or SSH remote key was not OK
Please fix it as soon as possible.
It could take days, or even weeks, so I've suspended GPUGrid on my hosts.
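
The date comparison behind that browser error can be reproduced with Python's standard `ssl` module. This is only a sketch: the function name `cert_expired` is illustrative, and the time of day in the example is an assumption, since the browser message above only gives the date.

```python
import ssl
from datetime import datetime, timezone

def cert_expired(not_after, now):
    """True if `now` is at or past the certificate's notAfter time.

    not_after: string in the certificate text format, e.g. 'Nov 27 23:59:59 2021 GMT'.
    now: timezone-aware datetime.
    """
    expires = datetime.fromtimestamp(
        ssl.cert_time_to_seconds(not_after), tz=timezone.utc
    )
    return now >= expires

# Expiry date from the message above; the 23:59:59 time is assumed.
NOT_AFTER = "Nov 27 23:59:59 2021 GMT"
print(cert_expired(NOT_AFTER, datetime(2021, 11, 28, tzinfo=timezone.utc)))
```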

kotenok2000
Joined: 18 Jul 13
Posts: 78
Credit: 78,050,793
RAC: 1,301,924
Level
Thr
Scientific publications
wat
Message 57901 - Posted: 28 Nov 2021 | 7:33:58 UTC

Is there a way to make BOINC overlook certificate problems?

Erich56
Joined: 1 Jan 15
Posts: 1120
Credit: 8,875,620,176
RAC: 33,427,509
Level
Tyr
Scientific publications
watwatwatwatwatwatwatwatwat
Message 57902 - Posted: 28 Nov 2021 | 8:16:50 UTC - in response to Message 57899.

NET::ERR_CERT_DATE_INVALID
Subject: www.ps3grid.net
Issuer: R3
Expires on: 2021. nov. 27.
Current date: 2021. nov. 28.
I can reach the forum only through http.
All of my hosts are set to https, they can't reach the servers because the certificate has expired.
GPUGRID | Scheduler request failed: SSL peer certificate or SSH remote key was not OK
Please fix it as soon as possible.
It could take days, or even weeks, so I've suspended GPUGrid on my hosts.


New tasks cannot be downloaded either :-(


Richard Haselgrove
Joined: 11 Jul 09
Posts: 1616
Credit: 8,056,644,351
RAC: 19,330,304
Level
Tyr
Scientific publications
watwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwat
Message 57903 - Posted: 28 Nov 2021 | 9:00:33 UTC

It also prevents the upload of completed work:

28/11/2021 08:51:47 | GPUGRID | [http] [ID#27114] Info: TLSv1.2 (OUT), TLS alert, certificate expired (557):
28/11/2021 08:51:47 | GPUGRID | [http] [ID#27114] Info: SSL certificate problem: certificate has expired
28/11/2021 08:51:47 | GPUGRID | [http] HTTP error: SSL peer certificate or SSH remote key was not OK
28/11/2021 08:51:48 | GPUGRID | Temporarily failed upload of e1s741_0-ADRIA_BanditGPCR_APJ_b0-0-1-RND2532_5_10: transient HTTP error
28/11/2021 08:51:48 | GPUGRID | Backing off 00:07:19 on upload of e1s741_0-ADRIA_BanditGPCR_APJ_b0-0-1-RND2532_5_10

kotenok2000
Joined: 18 Jul 13
Posts: 78
Credit: 78,050,793
RAC: 1,301,924
Level
Thr
Scientific publications
wat
Message 57904 - Posted: 28 Nov 2021 | 9:02:46 UTC

Post here when it is fixed
I have subscribed.

Richard Haselgrove
Joined: 11 Jul 09
Posts: 1616
Credit: 8,056,644,351
RAC: 19,330,304
Level
Tyr
Scientific publications
watwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwat
Message 57907 - Posted: 28 Nov 2021 | 11:14:53 UTC

I have managed (slowly, carefully) to upload and report completed tasks, by editing the urls in client_state.xml

And I have also managed (even more slowly, even more carefully, and with an error on the first attempt) to download and run new work.

But it's not for the faint-hearted: please restore normal service as soon as possible!

[CSF] Aleksey Belkov
Joined: 26 Dec 13
Posts: 86
Credit: 1,258,731,270
RAC: 480,669
Level
Met
Scientific publications
watwatwatwatwatwatwatwatwatwatwatwatwatwat
Message 57914 - Posted: 28 Nov 2021 | 14:13:06 UTC - in response to Message 57907.

I have managed (slowly, carefully) to upload and report completed tasks, by editing the urls in client_state.xml

If you change the project URL from secure HTTPS to insecure HTTP, it's wise to first change the authentication key (<authenticator></authenticator>) in account_www.gpugrid.net.xml to the weak account key (https://gpugrid.net/weak_auth.php), to prevent account abuse if someone sniffs the HTTP traffic.
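
A rough Python sketch of that key swap, for anyone who prefers a script to a text editor. The sample XML and both key strings are placeholders, and the helper name `set_authenticator` is made up; a real account file holds more elements, and this assumes it parses as well-formed XML. Stop the BOINC client and back up the file before touching it.

```python
import xml.etree.ElementTree as ET

def set_authenticator(account_xml, weak_key):
    """Return account_xml with the <authenticator> text replaced by weak_key."""
    root = ET.fromstring(account_xml)
    node = root.find(".//authenticator")
    if node is None:
        raise ValueError("no <authenticator> element found")
    node.text = weak_key
    return ET.tostring(root, encoding="unicode")

# Placeholder content only; not a real account file or real keys.
SAMPLE = (
    "<account>"
    "<master_url>https://www.gpugrid.net/</master_url>"
    "<authenticator>FULL_ACCOUNT_KEY</authenticator>"
    "</account>"
)
print(set_authenticator(SAMPLE, "WEAK_ACCOUNT_KEY"))
```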

Erich56
Joined: 1 Jan 15
Posts: 1120
Credit: 8,875,620,176
RAC: 33,427,509
Level
Tyr
Scientific publications
watwatwatwatwatwatwatwatwat
Message 57932 - Posted: 28 Nov 2021 | 16:11:51 UTC - in response to Message 57907.
Last modified: 28 Nov 2021 | 16:22:14 UTC

Richard wrote:

And I have also managed (even more slowly, even more carefully, and with an error on the first attempt) to download and run new work.

Oh, I see, so this was you :-)
I was wondering why the number of unsent tasks on the project status page was slightly decreasing.

Jim1348
Joined: 28 Jul 12
Posts: 819
Credit: 1,591,285,971
RAC: 0
Level
His
Scientific publications
watwatwatwatwatwatwatwatwatwatwatwatwatwatwat
Message 57933 - Posted: 28 Nov 2021 | 16:15:50 UTC - in response to Message 57907.

But it's not for the faint-hearted: please restore normal service as soon as possible!

I am out after only a couple of days. It was fun while it lasted.

Profile ServicEnginIC
Joined: 24 Sep 10
Posts: 579
Credit: 8,879,137,024
RAC: 17,737,965
Level
Tyr
Scientific publications
watwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwat
Message 57935 - Posted: 28 Nov 2021 | 17:06:53 UTC - in response to Message 57907.
Last modified: 28 Nov 2021 | 17:07:17 UTC

I have managed (slowly, carefully) to upload and report completed tasks, by editing the urls in client_state.xml

Thank you, Richard, once more.
Guided by your clue, I uploaded all my finished tasks without losing their deserved bonus.
But I reverted everything to https, and I'm waiting for the problem to be solved on the project's side before downloading new work.
The weekend is a bad moment for the IT staff on duty at a university to notice and solve emerging problems...
(We usually think of infrastructure maintenance staff as being omnipresent, but surely they also need to take their weekly rest to recharge :-)

marsinph
Joined: 11 Feb 18
Posts: 41
Credit: 579,891,424
RAC: 0
Level
Lys
Scientific publications
wat
Message 57936 - Posted: 28 Nov 2021 | 17:23:58 UTC - in response to Message 57907.

I have managed (slowly, carefully) to upload and report completed tasks, by editing the urls in client_state.xml

And I have also managed (even more slowly, even more carefully, and with an error on the first attempt) to download and run new work.

But it's not for the faint-hearted: please restore normal service as soon as possible!



Hello Richard,
I have tried your proposal (editing).
I found my pending upload WU, but there is still a problem! The error is now "transient" (the CA error has disappeared), but I am still unable to report.
As you say, it is a possible solution, but a very slow one.
There are more and more problems with this project.
Thank you for all your information.
Best regards

Richard Haselgrove
Joined: 11 Jul 09
Posts: 1616
Credit: 8,056,644,351
RAC: 19,330,304
Level
Tyr
Scientific publications
watwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwat
Message 57938 - Posted: 28 Nov 2021 | 17:40:45 UTC

Since then, I've posted full details of the basic technique in message 57912, and possible reasons for the failure to report in message 57930.

Both are in the Stuck on 'uploading' thread, Server and website area. Read the whole thread for context.

Keith Myers
Joined: 13 Dec 17
Posts: 1330
Credit: 7,042,942,459
RAC: 15,385,321
Level
Tyr
Scientific publications
watwatwatwatwat
Message 57942 - Posted: 28 Nov 2021 | 18:43:53 UTC

Just tried the trick but it did not work. Can't get past 24 hour backoff.

TrevG
Joined: 19 Mar 14
Posts: 5
Credit: 14,682,787
RAC: 0
Level
Pro
Scientific publications
watwatwatwatwatwatwatwat
Message 57954 - Posted: 29 Nov 2021 | 10:01:37 UTC - in response to Message 57942.
Last modified: 29 Nov 2021 | 10:10:05 UTC

The project is back OK now.
Try a reset or re-attaching and it should pick up, as it did for me just now.
I had a 'fun' few hours forcing a download of 100Mb, but only got to 60% before the plug was pulled server side.
I see that the previous Cuda 1101 zip file is no longer attached, only Cuda101. I had wondered if this was a factor, apart from file size and expired certs, but probably not.
However, the larger Cuda file is still only downloading at ~8KBps, so the unit won't be running for quite a while.

Richard Haselgrove
Joined: 11 Jul 09
Posts: 1616
Credit: 8,056,644,351
RAC: 19,330,304
Level
Tyr
Scientific publications
watwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwat
Message 57955 - Posted: 29 Nov 2021 | 10:20:03 UTC - in response to Message 57954.

Just try a normal update, or a retry on any stalled transfers, before going any further.

A full project reset shouldn't be needed, and all those extra transfers will just slow down the project recovery for everyone.

bozz4science
Joined: 22 May 20
Posts: 109
Credit: 74,786,176
RAC: 500,630
Level
Thr
Scientific publications
wat
Message 57956 - Posted: 29 Nov 2021 | 10:31:13 UTC - in response to Message 57955.
Last modified: 29 Nov 2021 | 10:32:05 UTC

Having worked through your excellent instructions, Richard, I finally succeeded in getting the pending uploads through. I forgot to set NNW, so the manager requested 2 new tasks. However, I didn't bother to fix the new work download issue, and aborted the 2 pending downloads. This morning, when all was fixed, I repeatedly got the "scheduler request completed – downloads stalled" message and so reattached the project. It took less than 1.5 min to download all necessary project files, and now I am back to business as usual. Only this approach seemed to clear that annoying scheduler message.

mmonnin
Joined: 2 Jul 16
Posts: 337
Credit: 7,175,151,065
RAC: 17,333,088
Level
Tyr
Scientific publications
watwatwatwatwat
Message 57957 - Posted: 29 Nov 2021 | 11:43:13 UTC

Site is back to normal. No need for funny business or resets.
No more Unsecure message when browsing.
Uploads/Downloads work as intended.

Profile GDF
Volunteer moderator
Project administrator
Project developer
Project tester
Volunteer developer
Volunteer tester
Project scientist
Joined: 14 Mar 07
Posts: 1957
Credit: 629,356
RAC: 0
Level
Gly
Scientific publications
watwatwatwatwat
Message 57961 - Posted: 29 Nov 2021 | 16:25:02 UTC - in response to Message 57957.

there was a problem with the certificate renewal. Now is fine.

Profile GDF
Volunteer moderator
Project administrator
Project developer
Project tester
Volunteer developer
Volunteer tester
Project scientist
Joined: 14 Mar 07
Posts: 1957
Credit: 629,356
RAC: 0
Level
Gly
Scientific publications
watwatwatwatwat
Message 57962 - Posted: 29 Nov 2021 | 16:25:07 UTC - in response to Message 57957.

there was a problem with the certificate renewal. Now is fine.

marsinph
Joined: 11 Feb 18
Posts: 41
Credit: 579,891,424
RAC: 0
Level
Lys
Scientific publications
wat
Message 57970 - Posted: 29 Nov 2021 | 19:52:59 UTC - in response to Message 57961.

there was a problem with the certificate renewal. Now is fine.



Everyone has known that for three days!!!
It took three days to solve the problem!
Don't be surprised that more and more users are leaving your project.
Only the ones who race for Formula Boinc are still following.
Thank you, Richard Haselgrove, for your help.

marsinph
Joined: 11 Feb 18
Posts: 41
Credit: 579,891,424
RAC: 0
Level
Lys
Scientific publications
wat
Message 57972 - Posted: 29 Nov 2021 | 19:58:34 UTC - in response to Message 57955.

Just try a normal update, or a retry on any stalled transfers, before going any further.

A full project reset shouldn't be needed, and all those extra transfers will just slow down the project recovery for everyone.



Thank you, Richard, for all your help. Why don't you join this team as a computer scientist?
You are everywhere, with a very deep knowledge of BOINC.
I am writing now, after the post from the site admin who says "it was a certificate problem".
All of us knew that! The admin's reaction was just very late!
Best regards from Belgium. (Sorry for my English, I try to do my best.)



____________

Richard Haselgrove
Joined: 11 Jul 09
Posts: 1616
Credit: 8,056,644,351
RAC: 19,330,304
Level
Tyr
Scientific publications
watwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwat
Message 57974 - Posted: 29 Nov 2021 | 21:16:51 UTC - in response to Message 57972.

Thanks for the kind words. Trying to solve these little problems goes some way to keeping those little grey cells in working order.

I did - many years ago - try to put forward the concept of 'technical moderators' as a specialist position within BOINC: people with technical knowledge who could bridge the gap between the mass of volunteers and the project scientists or administrators. There's a need for people who can decipher [5-year old voice on] Mummeeeee - it's not working! [5-year old voice off] and turn it into a technical description of what needs to be tweaked.

The idea never took off (the project side couldn't see the need), but I've gone on trying to live the dream.

Profile Bill F
Joined: 21 Nov 16
Posts: 32
Credit: 114,748,150
RAC: 198,088
Level
Cys
Scientific publications
wat
Message 57977 - Posted: 30 Nov 2021 | 3:42:40 UTC - in response to Message 57974.
Last modified: 30 Nov 2021 | 3:43:02 UTC

Richard... and because you try, there will be a small, quiet, wonderful place in heaven for you.

Bill F

Richard Haselgrove
Joined: 11 Jul 09
Posts: 1616
Credit: 8,056,644,351
RAC: 19,330,304
Level
Tyr
Scientific publications
watwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwat
Message 57979 - Posted: 30 Nov 2021 | 14:02:27 UTC

Ta. Continuing on our theme of 'things the volunteers have noticed, and project admin might like to take a look at...'

We are now running ACEMD3 v2.19, deployed on 10 Nov 2021. The first tasks had a data error, but we've now been running ADRIA_BanditGPCR tasks successfully since Friday 26 November.

The apps come in two flavours, cuda 101 and cuda 1121. I'll let the owners of Ampere cards pursue their own private grief, but I'm worried about the rest of us.

The machines I run here all have GTX 1660 series cards - all modern and efficient, and fast enough to complete these tasks in under 24 hours. Five cards have returned four tasks each, and all twenty have validated.

All machines have tried both cuda101 and cuda1121, and on four out of five cuda1121 is clearly faster than cuda101. The fifth is a bit ambiguous.

That means that BOINC should be moving towards issuing cuda1121 preferentially. In fact, it should have reached that point by now - we must have completed well over 100 tasks globally since this version was launched.

But all four of my 'clear advantage' machines are currently running cuda101, and only the ambiguous one is trying cuda1121 again. That's the wrong way round.

Why? Looking at the details for each of our computers, there's a link for "Application details: Show". That brings up the history for that computer, running each application that it has tried.

The crucial lines here are 'Number of tasks completed' and 'Average processing rate'. Once 'Number of tasks completed' reaches 11, the server should compare the APRs and preferentially assign the fastest app for new work, when there's a choice.

But my hosts are showing zero tasks completed, despite the 'Consecutive valid tasks' count being filled in. If the project-global count of completed tasks (which we can't inspect directly) is also not filling in properly, that would explain the bias towards cuda101. But I can't explain why the stats aren't being recorded properly.
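
The selection rule described above can be sketched in a few lines of Python. This is not the BOINC scheduler's actual code: the function name and the sample numbers are invented, and the handling of versions below the threshold (here they are simply left out of the comparison) is an assumption. It does show why a 'tasks completed' count stuck at zero would stop the faster app from ever being preferred.

```python
MIN_COMPLETED = 11  # threshold quoted in the post above

def pick_app_version(stats):
    """Pick the app version with the highest average processing rate (APR).

    stats maps a plan class name to (tasks_completed, apr).
    Versions below MIN_COMPLETED are ignored (an assumption of this sketch).
    Returns None when no version has enough completed tasks to compare.
    """
    eligible = {
        name: apr
        for name, (completed, apr) in stats.items()
        if completed >= MIN_COMPLETED
    }
    if not eligible:
        return None
    return max(eligible, key=eligible.get)

# Hypothetical numbers, for illustration only.
healthy = {"cuda101": (12, 100.0), "cuda1121": (15, 130.0)}
stuck = {"cuda101": (0, 100.0), "cuda1121": (0, 130.0)}
print(pick_app_version(healthy))
print(pick_app_version(stuck))
```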

Richard Haselgrove
Joined: 11 Jul 09
Posts: 1616
Credit: 8,056,644,351
RAC: 19,330,304
Level
Tyr
Scientific publications
watwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwat
Message 57986 - Posted: 1 Dec 2021 | 9:57:21 UTC - in response to Message 57979.

Well, I don't know how it happened, but after writing all that yesterday...

Today's rotation has brought me a clean sweep of cuda1121 tasks across all five machines. Coincidence, or a tweak to the server? No way of knowing externally, but it's good news both for the project (the science will be done more quickly) and for the volunteers.

Profile Michael H.W. Weber
Joined: 9 Feb 16
Posts: 71
Credit: 607,916,391
RAC: 0
Level
Lys
Scientific publications
watwatwatwatwatwatwatwat
Message 58006 - Posted: 1 Dec 2021 | 23:07:32 UTC - in response to Message 57962.

there was a problem with the certificate renewal. Now is fine.

Nice.
How about the problem that the server does not accept result file sizes above approx. 500 MB and instead of crediting them appropriately throws these in the bin? Has this issue been solved as well?

Michael.
____________
President of Rechenkraft.net - Germany's first and largest distributed computing organization.

Profile Retvari Zoltan
Joined: 20 Jan 09
Posts: 2343
Credit: 16,201,255,749
RAC: 0
Level
Trp
Scientific publications
watwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwat
Message 58007 - Posted: 1 Dec 2021 | 23:15:17 UTC - in response to Message 58006.

How about the problem that the server does not accept result file sizes above approx. 500 MB and instead of crediting them appropriately throws these in the bin? Has this issue been solved as well?
Kind of.
The present workunits are much shorter, thus their result file is much shorter (~270MB) as well.

Profile Michael H.W. Weber
Joined: 9 Feb 16
Posts: 71
Credit: 607,916,391
RAC: 0
Level
Lys
Scientific publications
watwatwatwatwatwatwatwat
Message 58013 - Posted: 2 Dec 2021 | 1:47:14 UTC - in response to Message 58007.

How about the problem that the server does not accept result file sizes above approx. 500 MB and instead of crediting them appropriately throws these in the bin? Has this issue been solved as well?
Kind of.
The present workunits are much shorter, thus their result file is much shorter (~270MB) as well.

Well, that's not a solid solution to the problem.

Michael.
____________
President of Rechenkraft.net - Germany's first and largest distributed computing organization.
