Computation errors

Message boards : Number crunching : Computation errors

BarryAZ
Joined: 21 Aug 09
Posts: 47
Credit: 53,687,615,713
RAC: 88,259,630
Message 1851 - Posted: 24 Jun 2019, 23:00:53 UTC - in response to Message 1845.  

By the way -- I've had no problems on a 1660ti I deployed last week -- it takes under 13 minutes per WU.
ID: 1851 · Rating: 0
mikey
Joined: 11 Aug 09
Posts: 732
Credit: 22,079,867,274
RAC: 808,277
Message 1852 - Posted: 25 Jun 2019, 10:19:29 UTC - in response to Message 1851.  

By the way -- I've had no problems on a 1660ti I deployed last week -- it takes under 13 minutes per WU.


Your 1660Ti numbers:
770.22 0.20 37,625.13 Collatz Sieve v1.30 (opencl_nvidia_gpu)
windows_x86_64

And my 1660Ti numbers:
346.82 1.42 29,795.14 Collatz Sieve v1.30 (opencl_nvidia_gpu)
windows_x86_64

I'm using these optimization codes for mine:
verbose=1
kernels_per_reduction=48
threads=7
lut_size=17
sieve_size=30
cache_sieve=1
sleep=0

I run 1 wu at a time.

You get more credits per wu but also take twice as long to run a wu.
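
To put the two 1660Ti rows on the same footing, here is a rough sketch that converts each one to credits per hour. It assumes the three numbers in each row are run time (seconds), CPU time (seconds) and credit, which is how the task list normally lays them out; the class and field names are just for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TaskResult:
    """One task row. Column meaning (run time, CPU time, credit) is an assumption."""
    label: str
    run_time_s: float
    cpu_time_s: float
    credit: float

    def credits_per_hour(self) -> float:
        # Credit earned per hour of wall-clock run time, one task at a time.
        return self.credit / self.run_time_s * 3600.0


# Figures copied from the two rows quoted in this post.
barry = TaskResult("BarryAZ 1660Ti", 770.22, 0.20, 37625.13)
mikey = TaskResult("mikey 1660Ti", 346.82, 1.42, 29795.14)

for task in (barry, mikey):
    print(f"{task.label}: {task.credits_per_hour():,.0f} credits/hour")

speedup = mikey.credits_per_hour() / barry.credits_per_hour()
print(f"ratio: {speedup:.2f}x")
```

On these two rows the shorter task earns fewer credits per WU but comes out ahead per hour, by a factor of about 1.76.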
ID: 1852 · Rating: 0
James Lee*
Joined: 10 Sep 15
Posts: 12
Credit: 23,873,065,227
RAC: 69,222,083
Message 1854 - Posted: 25 Jun 2019, 11:27:08 UTC

Comp errors have crept back in. A few times I even got a 0 file size on a single WU file download. There is still a major problem with WU files being sent out at 0 KB. ALL of those fail.

Seems the filtering problem still needs to be fixed. Where is the fixer?

James.
ID: 1854 · Rating: 0
BarryAZ
Joined: 21 Aug 09
Posts: 47
Credit: 53,687,615,713
RAC: 88,259,630
Message 1855 - Posted: 25 Jun 2019, 16:56:57 UTC - in response to Message 1854.  

James, the Collatz project is a one-person project -- he was out of town over the weekend and has other 'life' to balance.

I suspect he is trying to figure this one out, and it will simply take a certain amount of guesswork and luck on his part and a fair amount of patience on our part.
ID: 1855 · Rating: 0
Kombizahl
Joined: 29 Sep 09
Posts: 2
Credit: 751,578,989
RAC: 30,941
Message 1857 - Posted: 25 Jun 2019, 18:17:07 UTC

My rig sometimes gets errors, but not on all tasks.
ID: 1857 · Rating: 0
BarryAZ
Joined: 21 Aug 09
Posts: 47
Credit: 53,687,615,713
RAC: 88,259,630
Message 1858 - Posted: 25 Jun 2019, 19:16:25 UTC - in response to Message 1851.  

So I have three "classes" of Collatz systems in the current computation error situation:

1) More than half my systems process work units without any computation errors.

2) A few systems have a mix of computation error work units and clean work units.

3) Several systems have no luck at all -- all work units received are comp error work units.

There doesn't seem to be any specific pattern here -- I have a mix of systems (Windows 10, 8.1 and 7) and a mix of GPUs (GTX 1050, 1050ti, 1060, 1060ti, and a single 1660ti).
ID: 1858 · Rating: 0
David Riese
Joined: 23 Sep 12
Posts: 128
Credit: 26,858,307,845
RAC: 84,806,783
Message 1859 - Posted: 25 Jun 2019, 19:16:48 UTC - in response to Message 1855.  

Echoing Barry's comments, this appears to be a tough nut to crack and we should have patience while Jon tries to sort it out. By the way, in my experience, the tasks that terminate with an error in 2 seconds don't hurt my machines' throughput very much - typically after no more than 5-6 such tasks, the machine will start working on a task that can be crunched to completion. So, relatively speaking, this aspect of the problem doesn't waste a lot of machine time and electricity.

However, in my experience, throughput is heavily impacted by the fact that tasks that can be crunched to completion now take twice as long to complete. For example, my computer 850946 (Mac Pro 5,1; GTX 1060) used to take 12-13 minutes to crunch a GPU task but now takes 24-25 minutes to crunch a GPU task. I also note that CPU usage has increased from 12-13 seconds per task to 160-170 seconds per task. On the other hand, credit per task has gone up: this machine used to average 27-31K credits/completed task but now averages 35-39K credits/completed task. So, even though it takes twice as long to crunch a task to completion, the throughput of this machine has been cut by only ~40%. Fortunately, only two of my boxes have been affected by this problem and I hope that you too have had only a fraction of your boxes affected. Hang in there! I am sure that Jon is working as hard as he can to fix things.
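
For anyone who wants to redo the throughput arithmetic on their own numbers, here is a small sketch. It takes the midpoint of each quoted range, which is a simplifying assumption, and the function names are made up for illustration.

```python
def midpoint(low: float, high: float) -> float:
    """Middle of a quoted range, e.g. 12-13 minutes -> 12.5."""
    return (low + high) / 2.0


def credits_per_minute(credits: float, minutes: float) -> float:
    """Throughput of a machine running one task at a time."""
    return credits / minutes


# Ranges quoted above for computer 850946 (GTX 1060).
before = credits_per_minute(midpoint(27_000, 31_000), midpoint(12, 13))
after = credits_per_minute(midpoint(35_000, 39_000), midpoint(24, 25))

# Fraction of throughput lost, as a value between 0 and 1.
drop = 1.0 - after / before

print(f"before: {before:,.0f} credits/min")
print(f"after:  {after:,.0f} credits/min")
print(f"drop:   {drop:.0%}")
```

With midpoints the drop works out to roughly a third. Using the extremes of the quoted ranges moves it anywhere from the low 20s to the mid 40s in percent, so the ~40% estimate is within that spread.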
ID: 1859 · Rating: 0
Steve Dodd
Joined: 30 Jul 09
Posts: 55
Credit: 37,141,881,233
RAC: 1,823,270
Message 1860 - Posted: 25 Jun 2019, 22:59:24 UTC - in response to Message 1859.  

David,
I had the same problem with WUs taking twice as long (but only on one machine). The fix for that problem, in my case, was to add the parameters back into the collatz_sieve_1.30_windows_x86_64__opencl_nvidia_gpu file, which seemed to have been cleared. I hope that's all your problem is too.

Steve
ID: 1860 · Rating: 0
David Riese
Joined: 23 Sep 12
Posts: 128
Credit: 26,858,307,845
RAC: 84,806,783
Message 1861 - Posted: 26 Jun 2019, 2:37:11 UTC - in response to Message 1860.  

David,
I had the same problem with WUs taking twice as long (but only on one machine). The fix for that problem, in my case, was to add the parameters back into the collatz_sieve_1.30_windows_x86_64__opencl_nvidia_gpu file, which seemed to have been cleared. I hope that's all your problem is too.

Steve


Steve:

Thanks for the suggestion! Yep, the custom config file had been replaced by a blank file. Restored the custom file and the throughput returned. Hoorah!
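
In case it helps anyone else guard against this, here is a minimal sketch of a check that copies a saved backup over the config file when the config turns up missing or blank. The file names below are hypothetical stand-ins, not the real config file name, and the demo runs in a throwaway directory rather than the actual project folder.

```python
import shutil
import tempfile
from pathlib import Path


def restore_if_blank(config_path: Path, backup_path: Path) -> bool:
    """Copy the backup over the config when the config is missing or blank.

    Returns True if a restore happened, False if the config already had content.
    """
    if config_path.exists() and config_path.read_text().strip():
        return False  # config still has settings; leave it alone
    shutil.copyfile(backup_path, config_path)
    return True


# Demo in a temporary directory; both file names are hypothetical.
with tempfile.TemporaryDirectory() as tmp:
    config = Path(tmp) / "app_settings.config"
    backup = Path(tmp) / "app_settings.config.bak"
    backup.write_text("threads=8\nsieve_size=28\n")
    config.write_text("")  # simulate the file having been cleared
    print(restore_if_blank(config, backup))  # first call restores the file
    print(restore_if_blank(config, backup))  # second call finds content
```

To use it for real you would point the two paths at your own config file and a backup copy you keep outside the project folder.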

Regards,

Dave
ID: 1861 · Rating: 0
BarryAZ
Joined: 21 Aug 09
Posts: 47
Credit: 53,687,615,713
RAC: 88,259,630
Message 1862 - Posted: 26 Jun 2019, 5:18:17 UTC - in response to Message 1859.  

One of the other factors with computation error work units is that as all the error work units are sent back, it appears they are dumped right into the workstation queue to be downloaded. So comp error work units are getting recycled.
ID: 1862 · Rating: 0
Tackleway
Joined: 29 Sep 13
Posts: 24
Credit: 3,229,739,207
RAC: 1,764,617
Message 1863 - Posted: 26 Jun 2019, 15:05:21 UTC - in response to Message 1862.  

One of the other factors with computation error work units is that as all the error work units are sent back, it appears they are dumped right into the workstation queue to be downloaded. So comp error work units are getting recycled.


You're quite correct. I've looked back at some of my 600 aborted units, and they're being recycled over and over to other users!
ID: 1863 · Rating: 0
Jack-Hiker
Joined: 3 Apr 16
Posts: 5
Credit: 180,242,876,762
RAC: 98,370,023
Message 1864 - Posted: 26 Jun 2019, 18:48:30 UTC - in response to Message 1863.  

I have had to change my config file as follows:
verbose=1
kernels_per_reduction=48
sleep=1
threads=8
lut_size=17
sieve_size=28
cache_sieve=1
These settings seem to work fine with EVGA 2080 graphics cards. The downside is that comp time is now about 360 sec. Before, comp time was about 205 sec.
--------------------------------
Before, the config was as below, and I got nothing but comp errors:
verbose=1
kernels_per_reduction=50
sleep=1
threads=8
lut_size=18
sieve_size=30
cache_sieve=1

Hope this info is of some use. I wonder if Collatz is wanting to award fewer credits.
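
To make the difference between the two configs easier to see, here is a short sketch that parses both key=value lists and prints only the settings that changed. The parser and helper names are my own, for illustration only; this is not how the app itself reads the file.

```python
def parse_config(text: str) -> dict[str, str]:
    """Parse simple key=value lines, skipping blanks and lines without '='."""
    settings: dict[str, str] = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or "=" not in line:
            continue
        key, value = line.split("=", 1)
        settings[key.strip()] = value.strip()
    return settings


def diff_configs(old: dict[str, str], new: dict[str, str]) -> dict[str, tuple]:
    """Return {key: (old_value, new_value)} for every setting that differs."""
    keys = sorted(old.keys() | new.keys())
    return {k: (old.get(k), new.get(k)) for k in keys if old.get(k) != new.get(k)}


# The two configs exactly as posted above.
FAILING = """verbose=1
kernels_per_reduction=50
sleep=1
threads=8
lut_size=18
sieve_size=30
cache_sieve=1"""

WORKING = """verbose=1
kernels_per_reduction=48
sleep=1
threads=8
lut_size=17
sieve_size=28
cache_sieve=1"""

changes = diff_configs(parse_config(FAILING), parse_config(WORKING))
for key, (old_value, new_value) in changes.items():
    print(f"{key}: {old_value} -> {new_value}")
```

Only kernels_per_reduction, lut_size and sieve_size differ between the two, so those are the three to experiment with one at a time.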
ID: 1864 · Rating: 0
Steve Dodd
Joined: 30 Jul 09
Posts: 55
Credit: 37,141,881,233
RAC: 1,823,270
Message 1865 - Posted: 26 Jun 2019, 20:22:42 UTC - in response to Message 1864.  
Last modified: 26 Jun 2019, 20:24:34 UTC

Jack,
Bet you could get away with a threads setting of 9 and reduce the time just a bit more. And not to be redundant, but did you check the file sizes in the project directory when you were getting all of the compute errors? If they were all 0, then it wouldn't matter what your settings were. (Sorry if I'm being obvious.)
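
If checking by hand gets tedious, a few lines of script can list the zero-byte files in a folder. This is only a sketch: the demo builds a throwaway directory with made-up file names, and you would swap in the path to your own project directory.

```python
import tempfile
from pathlib import Path


def zero_byte_files(directory: Path) -> list[Path]:
    """Return the regular files directly inside `directory` whose size is 0 bytes."""
    return sorted(
        p for p in directory.iterdir() if p.is_file() and p.stat().st_size == 0
    )


# Demo on a temporary directory; the file names are hypothetical.
with tempfile.TemporaryDirectory() as tmp:
    project_dir = Path(tmp)
    (project_dir / "good_wu.txt").write_text("data")
    (project_dir / "empty_wu.txt").write_text("")
    for path in zero_byte_files(project_dir):
        print(path.name)
```

Any file it lists was downloaded empty, so changing the config settings would not help for those tasks.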
ID: 1865 · Rating: 0


©2020 Jon Sonntag; All rights reserved