Computation errors

Message boards : Number crunching : Computation errors

AT Hiker
Joined: 26 Sep 19
Posts: 5
Credit: 105,524,215
RAC: 0
Message 2092 - Posted: 21 Nov 2019, 2:29:55 UTC

I am experiencing computation errors. They occur about 2 seconds into each WU (work unit).

The last time this happened there was some sort of system problem.

Anyone else having computational errors?

Anthony Ayiomamitis
Joined: 21 Jan 15
Posts: 14
Credit: 8,934,221,670
RAC: 1
Message 2093 - Posted: 21 Nov 2019, 10:02:53 UTC - in response to Message 2092.  

Same here but with me it started following a Win 10 update. Coincidence?

mikey
Joined: 11 Aug 09
Posts: 732
Credit: 22,079,867,274
RAC: 808,277
Message 2094 - Posted: 21 Nov 2019, 11:58:28 UTC - in response to Message 2093.  

Same here but with me it started following a Win 10 update. Coincidence?


Yes, it is a coincidence, because it's happening on Linux machines too. It's being caused by the project sending out files that are empty, i.e. with nothing inside them to process. However, some units that work just fine are getting through, so you can either switch to another project for a bit or keep working through them.

David Riese
Joined: 23 Sep 12
Posts: 128
Credit: 26,856,242,362
RAC: 84,766,285
Message 2097 - Posted: 21 Nov 2019, 12:58:28 UTC - in response to Message 2093.  

It appears to be a coincidence, as this problem has been affecting a variety of my Macs, running MacOS 10.11 - 10.14. The good news is that the problem is dissipating, as my RAC is beginning to rebound. My individual Macs are recovering at different rates. This is probably due to the fact that I have not standardized my Collatz task cache across my machines; some have as few as 2 days of work, whereas others have as many as 10 days of work. So, the cache of affected tasks is being cleared at different rates.

David Riese
Joined: 23 Sep 12
Posts: 128
Credit: 26,856,242,362
RAC: 84,766,285
Message 2098 - Posted: 21 Nov 2019, 13:00:38 UTC - in response to Message 2094.  

Hey, Mikey, I see that you have increased your hardware commitment to Collatz. Your RAC is more than 40M/day - very impressive! I may have to perform some more upgrades to stay in front of you - I have an RX580 that has yet to find a home!

Jack-Hiker
Joined: 3 Apr 16
Posts: 5
Credit: 180,238,914,423
RAC: 98,162,407
Message 2100 - Posted: 21 Nov 2019, 19:55:33 UTC - in response to Message 2092.  

Same here

mikey
Joined: 11 Aug 09
Posts: 732
Credit: 22,079,867,274
RAC: 808,277
Message 2101 - Posted: 21 Nov 2019, 22:42:11 UTC - in response to Message 2098.  

Hey, Mikey, I see that you have increased your hardware commitment to Collatz. Your RAC is more than 40M/day - very impressive! I may have to perform some more upgrades to stay in front of you - I have an RX580 that has yet to find a home!


I am coming for you!!! LOL!!!
I got 2 new-to-me GPUs a couple of weeks ago and decided to see what I can do if I mostly focus on a GPU project.
I'm also earning GridCoins right now so it makes sense to focus a bit on a project.

mikey
Joined: 11 Aug 09
Posts: 732
Credit: 22,079,867,274
RAC: 808,277
Message 2102 - Posted: 21 Nov 2019, 22:43:29 UTC - in response to Message 2100.  

Same here


Mine are coming back too!

crweaver
Joined: 22 Jan 11
Posts: 1
Credit: 3,594,439,351
RAC: 3,858,959
Message 2104 - Posted: 21 Nov 2019, 22:58:22 UTC
Last modified: 21 Nov 2019, 22:59:49 UTC

On one computer, I'm getting errors, but not on the other. The first has an nvidia gamer driver, the other - the working one - a studio driver. Both have gtx 1060 cards and are win10 machines. Both have the latest versions of their drivers. Hope that helps.

kjohnson
Joined: 4 Jul 18
Posts: 8
Credit: 1,758,319,451
RAC: 883
Message 2105 - Posted: 22 Nov 2019, 0:14:37 UTC

Whew, this makes me feel much better. My 1080 was crunching just fine until a few days ago, and now I have over 1200 failed tasks. I was unable to determine what was wrong, so I hope it clears up with newly generated work.

Thanks all!

Miklos M.
Joined: 2 Oct 13
Posts: 15
Credit: 3,231,503,801
RAC: 451,232
Message 2107 - Posted: 22 Nov 2019, 15:17:01 UTC - in response to Message 2092.  

Likewise, many errors after 1-2 seconds.

mikey
Joined: 11 Aug 09
Posts: 732
Credit: 22,079,867,274
RAC: 808,277
Message 2108 - Posted: 22 Nov 2019, 18:00:49 UTC - in response to Message 2107.  

Likewise, many errors after 1-2 seconds.


I think it's starting to end. The problem is files with no data in them, and most of my GPUs are now processing files with data in them. I'm still getting some blank ones, though, and I don't know if they are resends or new ones.

Gordon Lack
Joined: 14 Apr 12
Posts: 6
Credit: 327,905,386
RAC: 719,655
Message 2110 - Posted: 22 Nov 2019, 23:23:47 UTC - in response to Message 2108.  
Last modified: 22 Nov 2019, 23:24:10 UTC

Likewise, many errors after 1-2 seconds.


I think it's starting to end. The problem is files with no data in them, and most of my GPUs are now processing files with data in them. I'm still getting some blank ones, though, and I don't know if they are resends or new ones.


Good point.
Each failing file will get tried 6 times before it is flagged as an error.

My failures are becoming less frequent, and those that do fail are now on their fifth or sixth attempt.

Padanian
Joined: 28 May 10
Posts: 13
Credit: 2,392,304,259
RAC: 6,518
Message 2113 - Posted: 23 Nov 2019, 18:54:07 UTC - in response to Message 2110.  

Same here

seanr22a
Joined: 3 Oct 19
Posts: 14
Credit: 13,898,704,321
RAC: 9,088,998
Message 2114 - Posted: 23 Nov 2019, 21:30:12 UTC

Still very bad. The batch with deadline 12/7/19 16:28 was all errors, several hundred of them. The batch that I received before that, with deadline 12/7/19 15:16, was better, with 'just' every 7th or 8th job in error. Now that rig has a project backoff of 23 hours, so it's more or less banned because of all the errors.

Padanian
Joined: 28 May 10
Posts: 13
Credit: 2,392,304,259
RAC: 6,518
Message 2115 - Posted: 24 Nov 2019, 11:34:54 UTC - in response to Message 2114.  
Last modified: 24 Nov 2019, 11:35:10 UTC

Still very bad. The batch with deadline 12/7/19 16:28 was all errors, several hundred of them. The batch that I received before that, with deadline 12/7/19 15:16, was better, with 'just' every 7th or 8th job in error. Now that rig has a project backoff of 23 hours, so it's more or less banned because of all the errors.


I've been backed off for 24 hours too, with some hundreds of WUs errored out.
Looks like the affected batch was generated on November 17th.

Gordon Lack
Joined: 14 Apr 12
Posts: 6
Credit: 327,905,386
RAC: 719,655
Message 2116 - Posted: 24 Nov 2019, 13:19:26 UTC - in response to Message 2104.  
Last modified: 24 Nov 2019, 13:19:47 UTC

On one computer, I'm getting errors, but not on the other. The first has an nvidia gamer driver, the other - the working one - a studio driver. Both have gtx 1060 cards and are win10 machines. Both have the latest versions of their drivers. Hope that helps.


The reported error is:
error reading input file
so the problem is nothing to do with what set-up you have - it's in the job data being sent.
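
To illustrate the failure mode described here (input files arriving with no data in them), a small script could list the zero-byte files in a directory. This is only a sketch: the function name is made up for illustration, and it assumes the inputs sit as ordinary files in a directory you point it at, which nothing in this thread confirms.

```python
from pathlib import Path


def find_empty_files(directory):
    """Return the sorted names of zero-byte regular files in `directory`.

    Hypothetical helper: `directory` is whatever folder you want to check;
    where the project actually stores its inputs is an assumption.
    """
    return sorted(
        p.name
        for p in Path(directory).iterdir()
        if p.is_file() and p.stat().st_size == 0
    )
```

Calling find_empty_files on a folder returns just the names of the empty files, so an empty list would mean every file there has at least some data in it.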

mikey
Joined: 11 Aug 09
Posts: 732
Credit: 22,079,867,274
RAC: 808,277
Message 2119 - Posted: 25 Nov 2019, 0:45:49 UTC - in response to Message 2116.  

On one computer, I'm getting errors, but not on the other. The first has an nvidia gamer driver, the other - the working one - a studio driver. Both have gtx 1060 cards and are win10 machines. Both have the latest versions of their drivers. Hope that helps.


The reported error is:
error reading input file
so the problem is nothing to do with what set-up you have - it's in the job data being sent.


YES, the problem is that the units were being sent out with blank files. Once they have all been worked through, everything should be back to normal again. I have 835 workunits in progress right now, so we are going through them!! I also have 3844 workunits that have had errors; before this started I had fewer than 10!! The max number of errors for each workunit is 6, and today I saw a bunch of _4 and a few _2 at the end of the tasks that had problems. Since it starts at _0, we ARE getting there.
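
The suffix counting above can be sketched in a few lines. This assumes task names end in _N with N starting at 0, and that the limit is 6 as quoted in this thread; both come from the posts here rather than from the server configuration, and the function names and the sample task name are made up for illustration.

```python
MAX_ATTEMPTS = 6  # limit quoted in the thread; an assumption, not read from the server


def attempt_number(task_name):
    """Return the 1-based attempt number encoded in a task name's _N suffix.

    Raises ValueError if the name does not end in an underscore plus digits.
    """
    _, _, suffix = task_name.rpartition("_")
    return int(suffix) + 1


def attempts_left(task_name, max_attempts=MAX_ATTEMPTS):
    """Return how many more attempts remain before the assumed limit is hit."""
    return max(0, max_attempts - attempt_number(task_name))
```

For a hypothetical task named example_wu_4, attempt_number gives 5 (the fifth try) and attempts_left gives 1, which matches the observation that a bunch of _4 tasks means the bad workunits are nearly used up.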

David Riese
Joined: 23 Sep 12
Posts: 128
Credit: 26,856,242,362
RAC: 84,766,285
Message 2121 - Posted: 26 Nov 2019, 5:05:05 UTC - in response to Message 2119.  
Last modified: 26 Nov 2019, 5:19:10 UTC

Ignore the comment below. The Mac in question performed an auto upgrade and its NVIDIA web driver was rendered incompatible. I updated the NVIDIA driver, and now it is able to crunch Collatz GPU tasks to completion. Arrgggh, I hate when I overlook something basic like that ....

-----

I am not sure we are making progress. Several of the tasks that halted prematurely due to the characteristic computational error were created earlier today. Here are some examples:

https://boinc.thesonntags.com/collatz/result.php?resultid=53680760
https://boinc.thesonntags.com/collatz/result.php?resultid=53680761
https://boinc.thesonntags.com/collatz/result.php?resultid=53680762
https://boinc.thesonntags.com/collatz/result.php?resultid=53680763
https://boinc.thesonntags.com/collatz/result.php?resultid=53680764
https://boinc.thesonntags.com/collatz/result.php?resultid=53680765
https://boinc.thesonntags.com/collatz/result.php?resultid=53680766
https://boinc.thesonntags.com/collatz/result.php?resultid=53680767
https://boinc.thesonntags.com/collatz/result.php?resultid=53680768

More than 900 tasks have halted prematurely on this single computer (850946 - a MacPro 5,1 with a GTX1070).

mikey
Joined: 11 Aug 09
Posts: 732
Credit: 22,079,867,274
RAC: 808,277
Message 2122 - Posted: 26 Nov 2019, 12:01:51 UTC - in response to Message 2121.  

Ignore the comment below. The Mac in question performed an auto upgrade and its NVIDIA web driver was rendered incompatible. I updated the NVIDIA driver, and now it is able to crunch Collatz GPU tasks to completion. Arrgggh, I hate when I overlook something basic like that ....

-----

More than 900 tasks have halted prematurely on this single computer (850946 - a MacPro 5,1 with a GTX1070).


It happens A LOT after Windows does its updates too.


©2020 Jon Sonntag; All rights reserved