Computation errors
Author | Message |
---|---|
BarryAZ Joined: 21 Aug 09 Posts: 47 Credit: 72,395,848,977 RAC: 90,483,733 |
By the way -- I've had no problems on a 1660ti I deployed last week -- it takes under 13 minutes per WU. |
Joined: 11 Aug 09 Posts: 805 Credit: 24,496,510,371 RAC: 2,626,508 |
BarryAZ wrote: "By the way -- I've had no problems on a 1660ti I deployed last week -- it takes under 13 minutes per WU." Your 1660Ti numbers (run time in seconds, CPU time in seconds, credit): 770.22 0.20 37,625.13 Collatz Sieve v1.30 (opencl_nvidia_gpu) windows_x86_64. And my 1660Ti numbers: 346.82 1.42 29,795.14 Collatz Sieve v1.30 (opencl_nvidia_gpu) windows_x86_64. I'm using these optimization codes for mine: verbose=1 kernels_per_reduction=48 threads=7 lut_size=17 sieve_size=30 cache_sieve=1 sleep=0. I run 1 WU at a time. You get more credits per WU but also take twice as long to run a WU. |
James Lee* Joined: 10 Sep 15 Posts: 12 Credit: 26,241,480,048 RAC: 560,100 |
Comp error problems have crept back in. A few times I even got a 0 file size on a single WU file download. There is still a major problem with WU files being sent out with a 0 KB size. ALL of those fail. It seems the filtering problem still needs to be fixed. Where is the fixer? James. |
BarryAZ Joined: 21 Aug 09 Posts: 47 Credit: 72,395,848,977 RAC: 90,483,733 |
James, the Collatz project is a one-person project -- he was out of town over the weekend and has other 'life' to balance. I suspect he is trying to figure this one out, and it will simply take a certain amount of guesswork and luck on his part and a fair amount of patience on our part. |
Kombizahl Joined: 29 Sep 09 Posts: 2 Credit: 985,680,217 RAC: 1,460,334 |
My rig sometimes gets errors, but not on all tasks. |
BarryAZ Joined: 21 Aug 09 Posts: 47 Credit: 72,395,848,977 RAC: 90,483,733 |
So I have three "classes" of Collatz systems in the current computation error situation: 1) More than half my systems process work units without any computation errors. 2) A few systems have an intermix of computation error work units and clean work units. 3) Several systems have no luck at all -- all work units received are comp error work units. There doesn't seem to be anything specific at all here -- I have a mix of systems (Windows 10, 8.1 and 7) and a mix of GPUs (GTX 1050, 1050ti, 1060, 1060ti, and a single 1660ti). |
David Riese Joined: 23 Sep 12 Posts: 152 Credit: 46,710,816,633 RAC: 120,601,101 |
Echoing Barry's comments, this appears to be a tough nut to crack and we should have patience while Jon tries to sort it out. By the way, in my experience, the tasks that terminate with an error in 2 seconds don't hurt my machines' throughput very much - typically after no more than 5-6 such tasks, the machine will start working on a task that can be crunched to completion. So, relatively speaking, this aspect of the problem doesn't waste a lot of machine time and electricity. However, in my experience, throughput is heavily impacted by the fact that tasks that can be crunched to completion now take twice as long to complete. For example, my computer 850946 (Mac Pro 5,1; GTX 1060) used to take 12-13 minutes to crunch a GPU task but now takes 24-25 minutes to crunch a GPU task. I also note that CPU usage has increased from 12-13 seconds per task to 160-170 seconds per task. At the same time, this machine used to average 27-31K credits/completed task but now averages 35-39K credits/completed task. So, even though it takes twice as long to crunch a task to completion, the throughput of this machine has been cut by only ~40%. Fortunately, only two of my boxes have been affected by this problem and I hope that you too have had only a fraction of your boxes affected. Hang in there! I am sure that Jon is working as hard as he can to fix things. |
Joined: 30 Jul 09 Posts: 55 Credit: 37,297,578,656 RAC: 70,509 |
David, I had the same problem with WUs taking twice as long (but only on one machine). The fix for that problem, in my case, was to add the parameters back into the collatz_sieve_1.30_windows_x86_64__opencl_nvidia_gpu file, which seemed to have been cleared. I hope that's all your problem is too. Steve |
David Riese Joined: 23 Sep 12 Posts: 152 Credit: 46,710,816,633 RAC: 120,601,101 |
David, Steve: Thanks for the suggestion! Yep, the custom config file had been replaced by a blank file. Restored the custom file and the throughput returned. Hoorah! Regards, Dave |
BarryAZ Joined: 21 Aug 09 Posts: 47 Credit: 72,395,848,977 RAC: 90,483,733 |
One of the other factors with computation error work units is that as all the error work units are sent back, it appears they are dumped right into the workstation queue to be downloaded. So comp error work units are getting recycled. |
Tackleway Joined: 29 Sep 13 Posts: 24 Credit: 3,572,856,980 RAC: 1,616,309 |
BarryAZ wrote: "One of the other factors with computation error work units is that as all the error work units are sent back, it appears they are dumped right into the workstation queue to be downloaded. So comp error work units are getting recycled." You're quite correct. I've looked back at some of my 600 aborted units and they're being recycled over and over to other users! |
Jack-Hiker Joined: 3 Apr 16 Posts: 5 Credit: 207,789,012,424 RAC: 60,104,517 |
I have had to change my config file as follows: verbose=1 kernels_per_reduction=48 sleep=1 threads=8 lut_size=17 sieve_size=28 cache_sieve=1. These settings seem to work fine with EVGA 2080 graphics cards. The downside is that comp time is now about 360 sec.; before, it was about 205 sec. My previous config (below) got nothing but comp errors: verbose=1 kernels_per_reduction=50 sleep=1 threads=8 lut_size=18 sieve_size=30 cache_sieve=1. Hope this info is of some use. I wonder if Collatz is wanting to award fewer credits. |
Joined: 30 Jul 09 Posts: 55 Credit: 37,297,578,656 RAC: 70,509 |
Jack, bet you could get away with a threads setting of 9 and reduce the time just a bit more. And not to be redundant, but did you check the file sizes in the project directory when you were getting all of the compute errors? If they were all 0, then it wouldn't matter what your settings were. (Sorry if I'm being obvious) |
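
The file-size check discussed in this thread (0 KB work unit downloads, and a config file that had been replaced by a blank file) can be sketched as a small script. This is only a minimal sketch, not a project tool: `find_empty_files` and `report` are hypothetical helper names, and the project directory is something you pass in yourself, since its location depends on your BOINC installation.

```python
from pathlib import Path


def find_empty_files(project_dir):
    """Return the files directly inside project_dir whose size is 0 bytes, sorted by path."""
    root = Path(project_dir)
    # Only regular files are checked; subdirectories are skipped.
    return sorted(p for p in root.iterdir() if p.is_file() and p.stat().st_size == 0)


def report(project_dir):
    """Print one line per 0-byte file (or a note that none were found) and return the list."""
    empty = find_empty_files(project_dir)
    if not empty:
        print(f"No 0-byte files in {project_dir}")
    for path in empty:
        print(f"0 bytes: {path.name}")
    return empty
```

Call `report(...)` with the path to the Collatz project folder inside your BOINC data directory. Any file it lists is a candidate for the problems described above: an empty work unit file will fail regardless of settings, and an empty config file means the custom parameters need to be restored.
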
©2021 Jon Sonntag; All rights reserved