Hundreds of tasks failing
Author | Message |
---|---|
candido Joined: 19 Dec 10 Posts: 1 Credit: 1,003,167,384 RAC: 0 |
Most of the tasks are failing with computation errors, and the computers to which they are resent fail too. Any ideas? Thanks, C. PS: Example <core_client_version>7.14.2</core_client_version> <![CDATA[ <message> (unknown error) - exit code -102 (0xffffff9a)</message> <stderr_txt> Collatz Conjecture Sieve 1.30 Windows x86_64 for OpenCL Written by Slicker (Jon Sonntag) of team SETI.USA Based on the AMD Brook+ kernels by Gipsel of team Planet 3DNow! Sieve code and OpenCL optimization provided by Sosiris of team BOINC@Taiwan Collatz Config Settings: verbose 1 (yes) kernels/reduction 48 threads 2^6 (64) lut_size 17 (1048576 bytes) sieve_size 2^30 (51085096 bytes) sleep 1 cache_sieve 1 (yes) reducecpu 0 (no) Processor Type NVIDIA Max Dimensions 3 Max Work Items 1024 1024 64 Max Work Groups 1024 Max Kernel Threads 256 Device Vendor NVIDIA Corporation Name GeForce GTX 1660 Ti Driver Version 457.74 OpenCL Version OpenCL 1.2 CUDA Device Vendor NVIDIA Corporation Name GeForce GTX 1660 Ti Driver Version 457.74 OpenCL Version OpenCL 1.2 CUDA worker: error reading input file. Error -102. Processing Aborted. 17:01:10 (7368): called boinc_finish(-102) </stderr_txt> ]]> |
Joined: 11 Aug 09 Posts: 963 Credit: 24,557,133,931 RAC: 46,225 |
"Most of the tasks are failing with computation errors, and the computers to which they are resent fail too." You've had this same problem for a while and haven't fixed it yet! ERR_READ -102 - BOINC has a problem reading from the drive. Maybe you do not have rights to read from the BOINC directory. Solution: Make sure you have rights in your operating system to read from the drive. Check your drive for consistency, in Windows using chkdsk. https://boinc.mundayweb.com/wiki/index.php?title=Error_code_-100_to_-110_explained |
abhi506 Joined: 7 Nov 17 Posts: 1 Credit: 9,358,547,555 RAC: 8,416,077 |
Facing the same problem myself. I see 0 KB input files on my Windows system, so it must be something with the project itself rather than a user side error. I remember this issue happening earlier as well. I'm checking for work twice a day and temporarily moved to other GPU projects till the situation stabilizes. Hope this helps. |
Joined: 13 Dec 18 Posts: 37 Credit: 8,695,693,523 RAC: 401,855 |
I was working on optimizing my RTX 2080 and things started failing left and right. The fact I was in the middle of changing configs didn't help. Glad to hear it's not me. |
Joined: 30 May 17 Posts: 119 Credit: 37,173,545,890 RAC: 6 |
UK Times: Wed 17 Mar 05:35:31 2021 | collatz | Started upload of collatz_sieve_d4fa341a-8993-4b45-987b-2f1fb5d622b2_0_r179307990_0 Wed 17 Mar 05:35:33 2021 | collatz | [error] Error reported by file upload server: can't write file collatz_sieve_d4fa341a-8993-4b45-987b-2f1fb5d622b2_0_r179307990_0: No space left on server Wed 17 Mar 05:35:33 2021 | collatz | Temporarily failed upload of collatz_sieve_d4fa341a-8993-4b45-987b-2f1fb5d622b2_0_r179307990_0: transient upload error Later followed by: Wed 17 Mar 06:09:56 2021 | collatz | Computation for task collatz_sieve_d25e0415-4a43-43ce-9941-4eff51679842_0 finished Wed 17 Mar 06:09:56 2021 | collatz | Output file collatz_sieve_d25e0415-4a43-43ce-9941-4eff51679842_0_r1141664431_0 for task collatz_sieve_d25e0415-4a43-43ce-9941-4eff51679842_0 absent And then all tasks failing after that :( |
Tigers_Dave Joined: 23 Sep 12 Posts: 201 Credit: 91,031,880,121 RAC: 98,835,090 |
Hey, IDEA, glad to see you are still hanging around here. I just received 87 AMD tasks for one of my MacPro 5,1s and every one of them failed with the "Computation Error" problem. Then, as soon as those tasks were reported to the server, the server responded by saying that communication is deferred for 24 hours. Bummer. |
Tigers_Dave Joined: 23 Sep 12 Posts: 201 Credit: 91,031,880,121 RAC: 98,835,090 |
Used the "Update" command to force communications with the server. Got 174 AMD and NVIDIA tasks. The first AMD and NVIDIA tasks appear to be crunching stably. Crossing my fingers. Can't hang around here at the office watching over this unit. The National Weather Service just issued a Tornado Watch and campus has been closed. Gotta run for home and batten down the hatches. |
Joined: 8 Jul 10 Posts: 1 Credit: 6,355,437,886 RAC: 3 |
Same here on an NVIDIA RTX 3090. Nearly every work unit is marked as an error. |
Joined: 7 Nov 10 Posts: 5 Credit: 2,832,454,732 RAC: 0 |
Me too. I looked at my BOINC directory where tasks are saved, and of (approx.) 1162 files, 818 are 0 KB (and at least 342 are 1 KB). It's a project issue, not on my side for sure. Edit: Just saw that the project is "shut down for maintenance": 18/03/2021 13:24:19 | collatz | Project is temporarily shut down for maintenance |
Joined: 30 May 17 Posts: 119 Credit: 37,173,545,890 RAC: 6 |
"Hey, IDEA, glad to see you are still hanging around here." I've still got a couple of old iMacs crunching Collatz because I can't find much else for them to crunch :( Looks like this problem is still persisting though, as boincstats shows everybody has 0 credit for the last 24 hours and the server is responding with "temporarily down for maintenance". PS. Hope the hatch battening was successful and the weather bypassed you? |
Tigers_Dave Joined: 23 Sep 12 Posts: 201 Credit: 91,031,880,121 RAC: 98,835,090 |
Hey, IDEA, glad to see you are still hanging around here. We got lucky with the weather. No hail and no tornadoes, although we did get quite a bit of rain. I am hopeful that something has been happening to resolve the issues with the project, as we no longer receive the "temporarily down for maintenance" message. Moreover, several hours ago I regained access to the C@H home page and message boards. But, obviously, much remains to be resolved. As my computers run out of Collatz work, I am reconnecting them to Einstein@Home, another one of my long-time faves, as its support for Macs is pretty good. I am also thinking about the balance between work on behalf of Collatz and work on behalf of Einstein. In part this is because Einstein work units run a bit cooler than Collatz work units and the steady-state temperature in my office is 27-30 C. |
KAMasud Joined: 20 Oct 11 Posts: 48 Credit: 4,654,522,722 RAC: 39,703 |
"Hey, IDEA, glad to see you are still hanging around here." _______________________ Hatch battening? Not shifted as yet to MacGregor hatches? You seem to be my vintage. Steam winches and windlass by any chance? Those were fun. |
Joined: 11 Aug 09 Posts: 963 Credit: 24,557,133,931 RAC: 46,225 |
"Hey, IDEA, glad to see you are still hanging around here." And THAT could be the problem with Collatz... Just guessing here, as I have zero insider info, but the Admin does host the Project in Minnesota or somewhere up there in the upper center of the US, and it just got whacked pretty good with a snowstorm, so he could still be recovering from it. |
Gordon Lack Joined: 14 Apr 12 Posts: 12 Credit: 693,112,270 RAC: 163,365 |
"Me too, I looked at my BOINC directory where tasks are saved, and on (approx.) 1162 files, 818 are 0kb... (and at least 342 are 1kb), it's a project issue, not on my side for sure." I suspect this is a consequence of the server problem in general. Perhaps it was out of disk space, which resulted in jobs being created with empty input files? I suspect that there is a batch of these, but they'll work through the system and it will then get back to normal. After a hundred or so failures across my systems, they are now receiving valid jobs and are back to "normal" processing. |
Joined: 7 Nov 10 Posts: 5 Credit: 2,832,454,732 RAC: 0 |
Nah, I'm still getting a LOT of 0 KB files. I think the system will wait for 3 errors in a row per unit before cancelling them. |
Gordon Lack Joined: 14 Apr 12 Posts: 12 Credit: 693,112,270 RAC: 163,365 |
"I think the system will wait for the 3 errors in a row per units to cancel them." That's "working through the system". |
Joined: 7 Nov 10 Posts: 5 Credit: 2,832,454,732 RAC: 0 |
😂🤣😂 I was not really awake 🤣 |
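
For anyone who wants to repeat the two checks discussed in this thread on their own host, here is a minimal Python sketch. It shows why the log prints exit code -102 as 0xffffff9a (the same value read as an unsigned 32-bit number) and counts empty versus non-empty files in a folder, as was done by hand for the 0 KB input files. The function names `as_unsigned_32` and `count_by_size` are made up for illustration, and the folder is passed on the command line because the location of the BOINC project directory differs between installs; none of this comes from BOINC itself.

```python
from pathlib import Path
import sys


def as_unsigned_32(exit_code: int) -> str:
    """Format a signed exit code as the 32-bit hex value shown in the task log."""
    return f"0x{exit_code & 0xFFFFFFFF:08x}"


def count_by_size(directory: Path) -> dict:
    """Count regular files directly inside `directory`, split into empty and non-empty."""
    counts = {"empty": 0, "non_empty": 0}
    for entry in directory.iterdir():
        if entry.is_file():
            key = "empty" if entry.stat().st_size == 0 else "non_empty"
            counts[key] += 1
    return counts


if __name__ == "__main__":
    # -102 is the exit code reported in the first post of this thread.
    print(as_unsigned_32(-102))
    # Optional: pass the folder to inspect as the first argument (hypothetical usage).
    if len(sys.argv) > 1:
        print(count_by_size(Path(sys.argv[1])))
```

A large share of empty files points at the input files rather than at local read permissions, which matches what several posters above concluded.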
©2022 Jon Sonntag; All rights reserved