Back-end broken (/tmp out of space?)

Message boards : Number crunching : Back-end broken (/tmp out of space?)
Matthew McCleary

Joined: 11 Oct 19
Posts: 31
Credit: 6,255,862,310
RAC: 2,959
Message 3396 - Posted: 27 Aug 2021, 20:25:47 UTC

Maybe something is broken on the back-end at Collatz. I got a bunch of errored-out work units earlier today and my throughput was way down. I also have 10M credit that has not yet been reported to BOINC, and when I load my profile page I get this at the top:

Warning: mysqli_query(): (HY000/3): Error writing file '/tmp/MYvY3IdB' (Errcode: 28 - No space left on device) in /home/boincadm/projects/collatz/html/inc/db.inc on line 57

Warning: mysqli_fetch_object() expects parameter 1 to be mysqli_result, boolean given in /home/boincadm/projects/collatz/html/inc/db.inc on line 67

Warning: mysqli_free_result() expects parameter 1 to be mysqli_result, boolean given in /home/boincadm/projects/collatz/html/inc/db.inc on line 76


I also realize it's hunting season, but hopefully we can still get this fixed before too long.
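
For anyone wanting to check the same thing on their own machine: Errcode 28 is the OS-level "no space left on device" error, and the two later warnings look like follow-on effects, since mysqli_query() returns false on failure and that boolean is then passed to mysqli_fetch_object() and mysqli_free_result(). Below is a minimal Python sketch, assuming a Unix-like host where /tmp is the temporary directory. The helper names describe_errcode and free_fraction are made up for illustration and have nothing to do with the project's db.inc.

```python
import errno
import os
import shutil


def describe_errcode(code: int) -> str:
    """Return the symbolic errno name plus the OS message for a numeric code."""
    name = errno.errorcode.get(code, "UNKNOWN")
    return f"{name}: {os.strerror(code)}"


def free_fraction(path: str) -> float:
    """Return the fraction (0.0 to 1.0) of the filesystem at `path` that is free."""
    usage = shutil.disk_usage(path)
    if usage.total == 0:
        # Some pseudo-filesystems report a zero size; treat them as full.
        return 0.0
    return usage.free / usage.total


if __name__ == "__main__":
    # 28 is the code shown in the MySQL warning.
    print(describe_errcode(28))
    # Assumption: /tmp exists and is where temporary files are written.
    print(f"/tmp free: {free_fraction('/tmp'):.1%}")
```

If the free fraction is at or near zero, that would be consistent with the warning, though only the server admin can confirm what actually filled the disk.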
Matthew McCleary

Joined: 11 Oct 19
Posts: 31
Credit: 6,255,862,310
RAC: 2,959
Message 3397 - Posted: 27 Aug 2021, 20:35:06 UTC
Last modified: 27 Aug 2021, 20:35:16 UTC

Yup, according to https://boinc.thesonntags.com/collatz/server_status.php, a bunch of stuff is down, including the sieve_validator. I guess that explains it.
Cereberus

Joined: 23 Nov 11
Posts: 13
Credit: 7,059,735
RAC: 27,742
Message 3398 - Posted: 28 Aug 2021, 14:37:07 UTC - in response to Message 3396.  

Some of my WUs are reported as failed. A bit annoying.
Gordon Lack

Joined: 14 Apr 12
Posts: 12
Credit: 689,696,968
RAC: 412
Message 3402 - Posted: 31 Aug 2021, 9:55:58 UTC

The failed ones (mine at least) all fail within 2s.
This has happened every time there has been a problem in the past.
Basically a bad set of jobs gets generated(?) and has to flush through the system (each one gets sent to 6 systems before it "dies").
Just be patient and expect a few very quick failures for a few weeks.
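
The flushing behaviour described in this post can be sketched as a toy model. This is only an illustration under the assumptions stated in the post (a bad job fails on every host and is dropped after 6 attempts). WorkUnit, MAX_ERROR_RESULTS, and attempts_to_flush are hypothetical names, not the project's scheduler code.

```python
from dataclasses import dataclass

# From the post: each bad job gets sent to 6 systems before it "dies".
MAX_ERROR_RESULTS = 6


@dataclass
class WorkUnit:
    name: str
    is_bad: bool
    errors: int = 0
    retired: bool = False


def send_to_host(wu: WorkUnit) -> bool:
    """Simulate one host attempt; return True if the result was usable."""
    if wu.retired:
        return False
    if wu.is_bad:
        # Assumption: a bad unit errors out on every host it is sent to.
        wu.errors += 1
        if wu.errors >= MAX_ERROR_RESULTS:
            wu.retired = True
        return False
    # Simplification: a good unit is done after one success (no validation quorum).
    wu.retired = True
    return True


def attempts_to_flush(units: list[WorkUnit]) -> int:
    """Count host attempts needed until every unit is retired."""
    attempts = 0
    for wu in units:
        while not wu.retired:
            send_to_host(wu)
            attempts += 1
    return attempts


if __name__ == "__main__":
    batch = [WorkUnit(f"bad_{i}", is_bad=True) for i in range(3)]
    batch.append(WorkUnit("good_0", is_bad=False))
    print(attempts_to_flush(batch))
```

The point of the model is just that every bad job costs several quick failures spread across different hosts, which is why the errors keep trickling in for a while after the underlying problem is fixed.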
Tigers_Dave

Joined: 23 Sep 12
Posts: 195
Credit: 80,231,709,401
RAC: 84,729,949
Message 3404 - Posted: 1 Sep 2021, 20:24:34 UTC

Slicker
Project administrator

Joined: 11 Jun 09
Posts: 78
Credit: 943,644,517
RAC: 0
Message 3406 - Posted: 2 Sep 2021, 0:20:31 UTC - in response to Message 3402.  

The failed ones (mine at least) all fail within 2s.
This has happened every time there has been a problem in the past.
Basically a bad set of jobs gets generated(?) and has to flush through the system (each one gets sent to 6 systems before it "dies").
Just be patient and expect a few very quick failures for a few weeks.


Correct. Very well said. I am trying to go through all the workunit and result files and make sure there aren't any orphaned records in the database and vice versa.
rcthardcore

Joined: 15 May 10
Posts: 12
Credit: 134,567,058
RAC: 63
Message 3427 - Posted: 30 Sep 2021, 19:01:41 UTC

The project servers must be down again. Can't get any work. Checked server status page. It said everything was working.
lanbrown

Joined: 8 Sep 19
Posts: 10
Credit: 3,611,739,401
RAC: 2,152,091
Message 3428 - Posted: 30 Sep 2021, 19:18:47 UTC - in response to Message 3427.  

Nope, check the other threads for today and you'll see what the issue is.
KAMasud

Joined: 20 Oct 11
Posts: 48
Credit: 4,191,262,790
RAC: 8,539,677
Message 3434 - Posted: 1 Oct 2021, 5:35:50 UTC - in response to Message 3428.  

Nope, check the other threads for today and you'll see what the issue is.

_________________________

What is the issue? The certificate is fine. BOINC is the latest version. CPDN, WCG and WUProp are fine. At least on my computers, they are fine. Frustrating. Even the BOINC forums are scratching their heads.
lanbrown

Joined: 8 Sep 19
Posts: 10
Credit: 3,611,739,401
RAC: 2,152,091
Message 3439 - Posted: 1 Oct 2021, 16:11:52 UTC - in response to Message 3434.  

Nope, check the other threads for today and you'll see what the issue is.

_________________________

What is the issue? The certificate is fine. BOINC is the latest version. CPDN, WCG and WUProp are fine. At least on my computers, they are fine. Frustrating. Even the BOINC forums are scratching their heads.


No, they are not. The issue has been identified. A root certificate that ships only with the Windows version of the client has expired, and the sites/projects that use Let's Encrypt certificates are the ones having the issue. Remove the expired certificate from the file and the issue goes away. Look at the links I provided; they take you to a thread that has the fix. Or you can wait until October 5th, when a new Windows client is released that should have the updated file.
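
For those who prefer not to hand-edit the file, removing one certificate from a PEM bundle can be sketched roughly as follows. This is a minimal Python sketch and not the fix from the linked thread. It assumes the file is a plain-text PEM bundle and that you already know the SHA-256 fingerprint of the expired certificate, which this thread does not give. PEM_BLOCK, fingerprint, and drop_certificates are hypothetical names.

```python
import hashlib
import re
import ssl

# Matches one PEM-encoded certificate, including its BEGIN/END marker lines.
PEM_BLOCK = re.compile(
    r"-----BEGIN CERTIFICATE-----.*?-----END CERTIFICATE-----",
    re.DOTALL,
)


def fingerprint(pem_block: str) -> str:
    """Return the SHA-256 fingerprint (lowercase hex) of one PEM block."""
    der = ssl.PEM_cert_to_DER_cert(pem_block)
    return hashlib.sha256(der).hexdigest()


def drop_certificates(bundle_text: str, unwanted: set) -> str:
    """Return the bundle without any certificate whose fingerprint is in `unwanted`.

    Note: any comment or label text between certificate blocks is not preserved.
    """
    kept = [
        block
        for block in PEM_BLOCK.findall(bundle_text)
        if fingerprint(block) not in unwanted
    ]
    return "\n".join(kept) + ("\n" if kept else "")
```

To use it, read the bundle into a string, pass it through drop_certificates with the fingerprint of the expired root, and write the result to a new file. Keep a backup of the original so it can be restored if the client stops connecting.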



©2022 Jon Sonntag; All rights reserved