Computation errors

Message boards : Number crunching : Computation errors

James Lee*
Joined: 10 Sep 15
Posts: 12
Credit: 26,131,897,058
RAC: 42,297,884
Message 1824 - Posted: 23 Jun 2019, 15:40:57 UTC

For some reason, all of my machines are now getting comp errors on this project. I have tried rebooting, but to no avail. What has changed, and what do I need to do? Or is it a problem with the WUs themselves?

James
Steve Dodd
Joined: 30 Jul 09
Posts: 55
Credit: 37,141,881,233
RAC: 69,432
Message 1825 - Posted: 23 Jun 2019, 17:31:04 UTC - in response to Message 1824.  

I've gotten some of those WUs myself. Doesn't seem consistent, though, as I've also gotten new WUs that have worked fine. Just have to wait until Jon gets back and straightens things out. :)
James Lee*
Joined: 10 Sep 15
Posts: 12
Credit: 26,131,897,058
RAC: 42,297,884
Message 1828 - Posted: 23 Jun 2019, 20:53:29 UTC - in response to Message 1825.  

Just for the heck of it, I reduced the buffer sizes I was using in the config file, and two machines started crunching again. But that did not work for all the machines getting errors. I even installed the latest version of MSI Afterburner, and that did not help. Just wondering if there was a Win7 update that could be involved... Or just wait for Jon... Sheesh...

James
Steve Dodd
Joined: 30 Jul 09
Posts: 55
Credit: 37,141,881,233
RAC: 69,432
Message 1829 - Posted: 23 Jun 2019, 23:40:40 UTC

Here's a new wrinkle. When the "new" WUs don't fail, they are taking about twice as long to run (at least on one of my machines)!
David Riese
Joined: 23 Sep 12
Posts: 136
Credit: 29,616,099,029
RAC: 87,572,350
Message 1830 - Posted: 24 Jun 2019, 1:52:09 UTC - in response to Message 1828.  

It doesn't appear to be specific to Windows, as my OS X 10.11 and 10.13 Macs have generated an unusually high number of computation errors.
sir sant
Joined: 17 Nov 10
Posts: 6
Credit: 4,565,763,909
RAC: 0
Message 1832 - Posted: 24 Jun 2019, 3:02:52 UTC

Just got home, and my GTX 1060 3GB is producing mostly errors, and the project is apparently out of work:
https://www.upload.ee/image/10132105/collateral_errors.png
I haven't updated anything in a while; switching to PG for now.
David Riese
Joined: 23 Sep 12
Posts: 136
Credit: 29,616,099,029
RAC: 87,572,350
Message 1835 - Posted: 24 Jun 2019, 4:29:14 UTC

Jon has returned from Wisconsin and fixed the file uploading problem. Alas, it does not appear that the computation error problem has been fixed. Any idea if the problem is restricted to a specific brand and model of board? I am having problems with my only NVIDIA GTX 1060 and one of my three NVIDIA GTX 980s - all in Macs.
David Riese
Joined: 23 Sep 12
Posts: 136
Credit: 29,616,099,029
RAC: 87,572,350
Message 1836 - Posted: 24 Jun 2019, 4:31:12 UTC - in response to Message 1829.  

I am having a similar problem on at least one of my machines; when my GTX 1060 is capable of crunching a work unit, it takes approximately twice as long as before.
BarryAZ
Joined: 21 Aug 09
Posts: 47
Credit: 56,596,610,323
RAC: 91,219,149
Message 1838 - Posted: 24 Jun 2019, 4:59:01 UTC - in response to Message 1835.  

It seems to affect one or two of my GTX 1060s, but not all of them.
valterc
Joined: 21 Sep 09
Posts: 2
Credit: 26,494,420,900
RAC: 11,732,998
Message 1839 - Posted: 24 Jun 2019, 8:24:49 UTC - in response to Message 1838.  

Try downloading some work units. If you then look at the Collatz directory inside BOINC's projects folder, you may notice that at least half of the input files are zero bytes in size. This happens on a number of different machines, so it is certainly a server problem (this kind of behavior seems to be the result of a 'disk full' condition).
JugNut
Joined: 26 Sep 11
Posts: 5
Credit: 2,123,485,909
RAC: 0
Message 1840 - Posted: 24 Jun 2019, 9:03:29 UTC - in response to Message 1839.  
Last modified: 24 Jun 2019, 9:18:05 UTC

LOL, you beat me to it by a few minutes, valterc.
Yeah, the ones with 1 KB or more run fine; those with 0 KB will all fail.

Hopefully this info is of help to slicker.
Cruncher Pete
Joined: 17 Jun 09
Posts: 1
Credit: 7,493,482,776
RAC: 256,835
Message 1841 - Posted: 24 Jun 2019, 11:37:50 UTC - in response to Message 1824.  

I can confirm James's post. I hadn't checked my machines all day, but yesterday everything appeared OK. Now I have 683 computation errors and 20 tasks ready to report.
BarryAZ
Joined: 21 Aug 09
Posts: 47
Credit: 56,596,610,323
RAC: 91,219,149
Message 1842 - Posted: 24 Jun 2019, 16:27:14 UTC - in response to Message 1841.  

The curious thing is that some systems run clean, while others with identical configurations generate only comp errors, and still others generate a mix of comp errors and clean work units.

The handful of systems that are in comp-error city are now busily running GPUGrid or Moo until the problem is resolved on the project side.
Tackleway
Joined: 29 Sep 13
Posts: 24
Credit: 3,285,756,065
RAC: 1,741,475
Message 1843 - Posted: 24 Jun 2019, 17:45:10 UTC - in response to Message 1842.  

I've noticed that when work units are received singly (not as a batch), they are okay,
but when a block of units arrives, they are mostly failures. Check
C:\ProgramData\BOINC\projects\boinc.thesonntags.com_collatz and look for files with Size 0 KB.
These will fail (I abort them in BOINC Manager), but all the 1 KB files are fine.
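For a big queue, the check Tackleway describes can be scripted instead of eyeballed in Explorer. Below is a minimal Python sketch, assuming the default Windows BOINC data directory from the post above; `find_empty_files` is a hypothetical helper, not anything shipped with BOINC. Explorer's "0 KB" means exactly 0 bytes (Explorer rounds any non-empty file up to at least 1 KB), so the script simply looks for zero-length files. It only lists them; the affected tasks still have to be aborted in BOINC Manager.

```python
from pathlib import Path

# Default Windows location quoted in the thread (an assumption for your
# machine); adjust if BOINC's data directory lives elsewhere.
COLLATZ_DIR = Path(r"C:\ProgramData\BOINC\projects\boinc.thesonntags.com_collatz")


def find_empty_files(project_dir):
    """Return the regular files in project_dir whose size is exactly 0 bytes.

    Hypothetical helper for spotting truncated downloads; a missing
    directory simply yields an empty list.
    """
    project_dir = Path(project_dir)
    if not project_dir.is_dir():
        return []
    return sorted(
        entry for entry in project_dir.iterdir()
        if entry.is_file() and entry.stat().st_size == 0
    )


if __name__ == "__main__":
    # Print just the file names so they can be matched up in BOINC Manager.
    for path in find_empty_files(COLLATZ_DIR):
        print(path.name)
```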
BarryAZ
Joined: 21 Aug 09
Posts: 47
Credit: 56,596,610,323
RAC: 91,219,149
Message 1844 - Posted: 24 Jun 2019, 18:10:08 UTC - in response to Message 1843.  

My suspicion is that the project's internal filter, which should catch these bad-boy work units, is not doing its job and is simply allowing the comp-error work units to be recycled and sent out again.
James Lee*
Joined: 10 Sep 15
Posts: 12
Credit: 26,131,897,058
RAC: 42,297,884
Message 1845 - Posted: 24 Jun 2019, 18:25:55 UTC

Barry and Tackle, thank you for the leads. Barry, LOL, I had switched over to PrimeGrid, but I have now set my store-work limit to 0.01 days of work and switched back to Collatz on three machines I can watch, to test Tackle's theory. This way I will get only one WU at a time. It does seem like Jon may want to check his filtering. I'll post back after running for a few hours. (Just for your info, I run a few 1070s, a few 1060s, a few 1050 Tis, a 980 Ti, a 750 Ti, a 660 Ti, and a 660. So I test EVERYTHING... LOL)
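The "store work" limit James mentions corresponds to BOINC's computing preference "Store at least X days of work". A minimal Python sketch, assuming you set it through a local global_prefs_override.xml in the BOINC data directory (C:\ProgramData\BOINC on a default Windows install) rather than through the Manager or the website; `build_prefs_override` is a hypothetical helper. Be aware that writing this file replaces any other local overrides it already holds, and the work buffer applies to every project on the host, not just Collatz.

```python
import xml.etree.ElementTree as ET


def build_prefs_override(min_days, additional_days=0.0):
    """Return the XML text for a minimal BOINC global_prefs_override.xml.

    Hypothetical helper: it only sets the two work-buffer preferences;
    anything not listed keeps coming from the normal (web/Manager) prefs.
    """
    root = ET.Element("global_preferences")
    # "Store at least X days of work"
    ET.SubElement(root, "work_buf_min_days").text = str(min_days)
    # "Store up to an additional X days of work"
    ET.SubElement(root, "work_buf_additional_days").text = str(additional_days)
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    # 0.01 days matches the setting James describes.
    print(build_prefs_override(0.01))
```

After saving the output as global_prefs_override.xml, the running client can be told to reload it with `boinccmd --read_global_prefs_override`. Keeping the buffer this small means the client asks for very few WUs per request, which is the effect James relied on to receive work units singly.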
JohnPetterson
Joined: 30 Aug 09
Posts: 2
Credit: 8,034,042,779
RAC: 10,955,451
Message 1846 - Posted: 24 Jun 2019, 19:03:13 UTC

FWIW: No failures on AMD 7870, Intel 4400, and Intel 4400. Failures on Nvidia 1030, AMD 570, Intel 4600, and Intel 630. It almost looks like older GPUs work and newer GPUs have failures. These are all running Collatz 1.30 on Windows 7 or 10.
Gordon Lack
Joined: 14 Apr 12
Posts: 6
Credit: 350,919,233
RAC: 713,707
Message 1847 - Posted: 24 Jun 2019, 20:04:38 UTC - in response to Message 1835.  
Last modified: 24 Jun 2019, 20:08:45 UTC

Any idea if the problem is restricted to a specific brand and model of board?

I've had some failures on an Intel GPU, so it seems unlikely.

PS The BoincStats data are back now too.
James Lee*
Joined: 10 Sep 15
Posts: 12
Credit: 26,131,897,058
RAC: 42,297,884
Message 1849 - Posted: 24 Jun 2019, 22:17:21 UTC

After switching to receiving only one WU at a time, and getting rid of all the queued files with 0 size, I have had NO more failures on any of my machines!

Thanks guys!

Something for Jon to look at :)

James
Tackleway
Joined: 29 Sep 13
Posts: 24
Credit: 3,285,756,065
RAC: 1,741,475
Message 1850 - Posted: 24 Jun 2019, 22:34:13 UTC - in response to Message 1849.  
Last modified: 24 Jun 2019, 22:35:22 UTC

I've had a couple of 0s turn up since my last post, but this seems like the way to go for now :)


©2020 Jon Sonntag; All rights reserved