Hundreds of tasks failing

Message boards : Number crunching : Hundreds of tasks failing

candido
Joined: 19 Dec 10
Posts: 1
Credit: 1,003,167,384
RAC: 0
Message 3201 - Posted: 16 Mar 2021, 20:30:27 UTC

Most of the tasks are failing with computation errors, and the computers they are resent to fail too.
Any ideas?
Thanks
C

PS: Example
<core_client_version>7.14.2</core_client_version>
<![CDATA[
<message>
(unknown error) - exit code -102 (0xffffff9a)</message>
<stderr_txt>
Collatz Conjecture Sieve 1.30 Windows x86_64 for OpenCL
Written by Slicker (Jon Sonntag) of team SETI.USA
Based on the AMD Brook+ kernels by Gipsel of team Planet 3DNow!
Sieve code and OpenCL optimization provided by Sosiris of team BOINC@Taiwan
Collatz Config Settings:
verbose 1 (yes)
kernels/reduction 48
threads 2^6 (64)
lut_size 17 (1048576 bytes)
sieve_size 2^30 (51085096 bytes)
sleep 1
cache_sieve 1 (yes)
reducecpu 0 (no)
Processor Type NVIDIA
Max Dimensions 3
Max Work Items 1024 1024 64
Max Work Groups 1024
Max Kernel Threads 256
Device Vendor NVIDIA Corporation
Name GeForce GTX 1660 Ti
Driver Version 457.74
OpenCL Version OpenCL 1.2 CUDA
Device Vendor NVIDIA Corporation
Name GeForce GTX 1660 Ti
Driver Version 457.74
OpenCL Version OpenCL 1.2 CUDA
worker: error reading input file.
Error -102. Processing Aborted.
17:01:10 (7368): called boinc_finish(-102)

</stderr_txt>
]]>
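The hex number in the `<message>` line is not a second error. It is the same exit code, -102, written as an unsigned 32-bit (two's-complement) value. Here is a small Python sketch of the conversion. The helper names are made up for illustration.

```python
def as_unsigned32(code: int) -> int:
    """Return the 32-bit two's-complement bit pattern of a signed exit code."""
    return code & 0xFFFFFFFF


def as_signed32(value: int) -> int:
    """Interpret a 32-bit unsigned value as a signed integer."""
    value &= 0xFFFFFFFF
    # If the top bit is set, the value represents a negative number.
    return value - 0x1_0000_0000 if value & 0x8000_0000 else value


print(hex(as_unsigned32(-102)))  # 0xffffff9a
print(as_signed32(0xFFFFFF9A))   # -102
```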

mikey
Joined: 11 Aug 09
Posts: 963
Credit: 24,557,133,931
RAC: 46,225
Message 3202 - Posted: 16 Mar 2021, 21:46:48 UTC - in response to Message 3201.  

Most of the tasks are failing with computation errors. And the computers to where they are resend fail too.
Any ideas?
Thanks
C

You've had this same problem for a while and haven't fixed it yet!

ERR_READ -102 - BOINC has a problem reading from the drive. Maybe you do not have rights to read from the BOINC directory.

Solution: Make sure you have rights in your operating system to read from the drive. Check your drive for consistency, in Windows using chkdsk.

https://boinc.mundayweb.com/wiki/index.php?title=Error_code_-100_to_-110_explained
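The read-rights suggestion can be tested by opening every file, rather than only inspecting permission bits. The following is a minimal Python sketch. `check_readable` is a hypothetical helper, not a BOINC tool. The BOINC client may also run under a different account than yours, so a clean result only hints that the files are readable; it does not prove it.

```python
import os
from pathlib import Path


def check_readable(directory: str) -> list[str]:
    """List files under `directory` that the current user cannot open and read."""
    unreadable = []
    for root, _dirs, files in os.walk(directory):
        for name in files:
            path = Path(root) / name
            try:
                with open(path, "rb") as fh:
                    fh.read(1)  # actually read a byte, not just check permissions
            except OSError as exc:
                unreadable.append(f"{path}: {exc}")
    return unreadable
```

To use it, point it at your BOINC `projects` folder. On a default Windows install this is probably `C:\ProgramData\BOINC\projects`, but that path is an assumption, so adjust it to your setup. Print whatever the function returns. An empty list means every file opened fine for your account.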

abhi506
Joined: 7 Nov 17
Posts: 1
Credit: 9,358,606,598
RAC: 8,418,739
Message 3206 - Posted: 17 Mar 2021, 7:09:21 UTC - in response to Message 3202.  

Facing the same problem myself. I see 0 KB input files on my Windows system, so it must be something with the project itself rather than a user-side error. I remember this issue happening earlier as well. I'm checking for work twice a day and have temporarily moved to other GPU projects until the situation stabilizes. Hope this helps.

Doug
Joined: 13 Dec 18
Posts: 37
Credit: 8,695,693,523
RAC: 401,855
Message 3207 - Posted: 17 Mar 2021, 7:51:59 UTC - in response to Message 3206.  

I was working on optimizing my RTX 2080 and things started failing left and right. The fact I was in the middle of changing configs didn't help. Glad to hear it's not me.

IDEA
Joined: 30 May 17
Posts: 119
Credit: 37,173,545,890
RAC: 6
Message 3209 - Posted: 17 Mar 2021, 11:58:20 UTC

UK Times:

Wed 17 Mar 05:35:31 2021 | collatz | Started upload of collatz_sieve_d4fa341a-8993-4b45-987b-2f1fb5d622b2_0_r179307990_0
Wed 17 Mar 05:35:33 2021 | collatz | [error] Error reported by file upload server: can't write file collatz_sieve_d4fa341a-8993-4b45-987b-2f1fb5d622b2_0_r179307990_0: No space left on server
Wed 17 Mar 05:35:33 2021 | collatz | Temporarily failed upload of collatz_sieve_d4fa341a-8993-4b45-987b-2f1fb5d622b2_0_r179307990_0: transient upload error

Later followed by:

Wed 17 Mar 06:09:56 2021 | collatz | Computation for task collatz_sieve_d25e0415-4a43-43ce-9941-4eff51679842_0 finished
Wed 17 Mar 06:09:56 2021 | collatz | Output file collatz_sieve_d25e0415-4a43-43ce-9941-4eff51679842_0_r1141664431_0 for task collatz_sieve_d25e0415-4a43-43ce-9941-4eff51679842_0 absent

And then all tasks failing after that :(
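When many tasks fail at once, you can scan the event log for the symptoms quoted above (the upload server out of space, then output files reported absent) to see when the trouble started. The following is a rough Python sketch. It assumes the log was copied out as plain text in the `timestamp | project | message` layout shown in these excerpts. The function names and the symptom list are illustrative only and are not part of BOINC.

```python
from typing import NamedTuple, Optional


class LogEntry(NamedTuple):
    timestamp: str
    project: str
    message: str


# Substrings taken from the excerpts in this thread; crude, but enough to spot the pattern.
SYMPTOMS = ("No space left on server", "transient upload error", "absent")


def parse_line(line: str) -> Optional[LogEntry]:
    """Split a 'timestamp | project | message' line; return None if it doesn't fit."""
    parts = line.split(" | ", 2)
    if len(parts) != 3:
        return None
    timestamp, project, message = (p.strip() for p in parts)
    return LogEntry(timestamp, project, message)


def find_symptoms(lines, project: str = "collatz") -> list[LogEntry]:
    """Return entries for `project` whose message contains one of SYMPTOMS."""
    hits = []
    for line in lines:
        entry = parse_line(line)
        if entry and entry.project == project and any(s in entry.message for s in SYMPTOMS):
            hits.append(entry)
    return hits


sample = [
    "Wed 17 Mar 05:35:33 2021 | collatz | Temporarily failed upload of x: transient upload error",
    "Wed 17 Mar 06:09:56 2021 | collatz | Computation for task y finished",
]
for hit in find_symptoms(sample):
    print(hit.timestamp, hit.message)
```

A plain substring match like `absent` could also catch unrelated messages. This is fine for a quick look, but it is not a reliable classifier.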

Tigers_Dave
Joined: 23 Sep 12
Posts: 201
Credit: 91,032,493,251
RAC: 98,887,666
Message 3210 - Posted: 17 Mar 2021, 20:20:07 UTC
Last modified: 17 Mar 2021, 20:21:25 UTC

Hey, IDEA, glad to see you are still hanging around here.

I just received 87 AMD tasks for one of my MacPro 5,1s and every one of them failed with the "Computation Error" problem. Then, as soon as those tasks were reported to the server, the server responded by saying that communication is deferred for 24 hours. Bummer.

Tigers_Dave
Joined: 23 Sep 12
Posts: 201
Credit: 91,032,493,251
RAC: 98,887,666
Message 3211 - Posted: 17 Mar 2021, 20:25:11 UTC - in response to Message 3210.  
Last modified: 17 Mar 2021, 20:26:37 UTC

Used the "Update" command to force communications with the server. Got 174 AMD and NVIDIA tasks. The first AMD and NVIDIA tasks appear to be crunching stably. Crossing my fingers.

Can't hang around here at the office watching over this unit. The National Weather Service just issued a Tornado Watch and campus has been closed. Gotta run for home and batten down the hatches.

conf [MM]
Joined: 8 Jul 10
Posts: 1
Credit: 6,355,437,886
RAC: 3
Message 3213 - Posted: 18 Mar 2021, 5:16:23 UTC

Same here on an NVIDIA RTX 3090. Nearly every work unit ends in an error.

[AF>Dell>UniversdeDell.com] Athar
Joined: 7 Nov 10
Posts: 5
Credit: 2,832,454,732
RAC: 0
Message 3216 - Posted: 18 Mar 2021, 12:19:51 UTC
Last modified: 18 Mar 2021, 12:25:17 UTC

Me too. I looked at the BOINC directory where tasks are saved, and of approximately 1,162 files, 818 are 0 KB (and at least 342 are 1 KB). It's a project issue, definitely not on my side.


Edit: Just saw that the project is "shut down for maintenance":
18/03/2021 13:24:19 | collatz | Project is temporarily shut down for maintenance
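You don't have to eyeball file sizes to get a tally like the one above. A short script can count them. The following is a minimal Python sketch. `classify_sizes` is a hypothetical helper. The 1,024-byte cutoff is an assumption chosen to roughly match how Windows Explorer rounds small files up to "1 KB".

```python
from collections import Counter
from pathlib import Path


def classify_sizes(directory: str) -> Counter:
    """Tally the regular files directly inside `directory` by size class."""
    counts = Counter()
    for path in Path(directory).iterdir():
        if not path.is_file():
            continue  # skip subfolders and anything that isn't a regular file
        size = path.stat().st_size
        if size == 0:
            counts["empty (0 bytes)"] += 1
        elif size <= 1024:
            counts["tiny (1-1024 bytes)"] += 1
        else:
            counts["larger"] += 1
    return counts
```

Point `classify_sizes` at the Collatz folder under your BOINC `projects` directory. The exact folder name depends on the project URL, so it is not spelled out here. A large "empty" count would match what several posters saw.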

IDEA
Joined: 30 May 17
Posts: 119
Credit: 37,173,545,890
RAC: 6
Message 3220 - Posted: 18 Mar 2021, 15:43:48 UTC - in response to Message 3210.  

Hey, IDEA, glad to see you are still hanging around here.


I've still got a couple of old iMacs crunching Collatz because I can't find much else for them to crunch :(

Looks like this problem is still persisting though as boincstats shows everybody has 0 credit for the last 24 hours and the server is responding with "temporarily down for maintenance".

PS. Hope the hatch battening was successful and the weather bypassed you?

Tigers_Dave
Joined: 23 Sep 12
Posts: 201
Credit: 91,032,493,251
RAC: 98,887,666
Message 3225 - Posted: 19 Mar 2021, 16:04:07 UTC - in response to Message 3220.  

Hey, IDEA, glad to see you are still hanging around here.


I've still got a couple of old iMacs crunching Collatz because I can't find much else for them to crunch :(

Looks like this problem is still persisting though as boincstats shows everybody has 0 credit for the last 24 hours and the server is responding with "temporarily down for maintenance".

PS. Hope the hatch battening was successful and the weather bypassed you?


We got lucky with the weather. No hail and no tornadoes, although we did get quite a bit of rain.

I am hopeful that something has been happening to resolve the issues with the project, as we no longer receive the "temporarily down for maintenance" message. Moreover, several hours ago I regained access to the C@H home page and message boards. But, obviously, much remains to be resolved. As my computers run out of Collatz work, I am reconnecting them to Einstein@Home, another one of my long-time faves, as its support for Macs is pretty good. I am also thinking about the balance between work on behalf of Collatz and work on behalf of Einstein. In part this is because Einstein work units run a bit cooler than Collatz work units and the steady-state temperature in my office is 27-30 C.

KAMasud
Joined: 20 Oct 11
Posts: 48
Credit: 4,654,522,722
RAC: 39,703
Message 3233 - Posted: 20 Mar 2021, 3:38:57 UTC - in response to Message 3220.  

Hey, IDEA, glad to see you are still hanging around here.


I've still got a couple of old iMacs crunching Collatz because I can't find much else for them to crunch :(

Looks like this problem is still persisting though as boincstats shows everybody has 0 credit for the last 24 hours and the server is responding with "temporarily down for maintenance".

PS. Hope the hatch battening was successful and the weather bypassed you?

_______________________
Hatch battening? Not shifted to MacGregor hatches yet? You seem to be of my vintage. Steam winches and windlasses, by any chance? Those were fun.

mikey
Joined: 11 Aug 09
Posts: 963
Credit: 24,557,133,931
RAC: 46,225
Message 3235 - Posted: 20 Mar 2021, 12:48:04 UTC - in response to Message 3234.  

Hey, IDEA, glad to see you are still hanging around here.


I've still got a couple of old iMacs crunching Collatz because I can't find much else for them to crunch :(

Looks like this problem is still persisting though as boincstats shows everybody has 0 credit for the last 24 hours and the server is responding with "temporarily down for maintenance".

PS. Hope the hatch battening was successful and the weather bypassed you?

_______________________
Hatch battening? Not shifted to MacGregor hatches yet? You seem to be of my vintage. Steam winches and windlasses, by any chance? Those were fun.


And THAT could be the problem with Collatz... I'm just guessing here, as I have zero insider info, but the admin hosts the project in Minnesota or somewhere up there in the upper center of the US. That area just got whacked pretty good by a snowstorm, and he could still be recovering from it.

Gordon Lack
Joined: 14 Apr 12
Posts: 12
Credit: 693,112,270
RAC: 163,365
Message 3238 - Posted: 20 Mar 2021, 23:28:01 UTC - in response to Message 3216.  

Me too, I looked at my BOINC directory where tasks are saved, and on (approx.) 1162 files, 818 are 0kb... (and at least 342 are 1kb), it's a project issue, not on my side for sure.
I suspect this is a consequence of the server problem in general.
Perhaps the server ran out of disk space and so created jobs with empty input files?

I suspect that there is a batch of these, but they'll work through the system and it will then get back to normal.

After a hundred or so failures across my systems they are now receiving valid jobs and back to "normal" processing.

[AF>Dell>UniversdeDell.com] Athar
Joined: 7 Nov 10
Posts: 5
Credit: 2,832,454,732
RAC: 0
Message 3240 - Posted: 21 Mar 2021, 10:50:50 UTC
Last modified: 21 Mar 2021, 10:55:25 UTC

Nah, I'm still getting a LOT of 0 KB files.

I think the system will wait for 3 errors in a row on each work unit before cancelling it.

Gordon Lack
Joined: 14 Apr 12
Posts: 12
Credit: 693,112,270
RAC: 163,365
Message 3241 - Posted: 21 Mar 2021, 14:38:34 UTC - in response to Message 3240.  

I think the system will wait for 3 errors in a row on each work unit before cancelling it.
That's "working through the system".
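The guess above matches how BOINC generally handles work that keeps failing. Each errored result is resent to another host until the number of errored results reaches a per-work-unit cap (BOINC's server-side setting is `max_error_results`), and then the work unit is given up. The following is a simplified Python model, not BOINC's actual code. The cap of 3 is the thread's guess, not a confirmed Collatz setting. Other limits that real BOINC tracks, such as total results, are left out. Because a work unit with an empty input file fails on every host, counting errors "in a row" and counting total errors give the same answer here.

```python
from dataclasses import dataclass, field


@dataclass
class WorkUnit:
    """Toy model of one work unit and the results reported for it."""
    name: str
    # Assumption: 3 is the limit guessed in the thread, not a confirmed project setting.
    max_error_results: int = 3
    outcomes: list[bool] = field(default_factory=list)

    def report(self, ok: bool) -> str:
        """Record one returned task and decide what happens to the work unit next."""
        self.outcomes.append(ok)
        if ok:
            return "success"
        if self.outcomes.count(False) >= self.max_error_results:
            return "cancelled"  # too many errors: stop resending
        return "resend"  # hand a new copy to another host


# Hypothetical work unit whose input file is empty, so every host fails it.
broken = WorkUnit("collatz_sieve_example")
print([broken.report(ok=False) for _ in range(3)])  # ['resend', 'resend', 'cancelled']
```

With a cap of 3, each bad work unit uses up three tasks on three different machines before it disappears. That would explain why the failures took a while to work through the system.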

[AF>Dell>UniversdeDell.com] Athar
Joined: 7 Nov 10
Posts: 5
Credit: 2,832,454,732
RAC: 0
Message 3242 - Posted: 21 Mar 2021, 20:08:40 UTC - in response to Message 3241.  

😂🤣😂 I was not really awake 🤣


©2022 Jon Sonntag; All rights reserved