"Computational error"

Mike Belanger
Joined: 6 Apr 20
Posts: 6
Credit: 28,599,621
RAC: 0
Message 2501 - Posted: 14 Apr 2020, 15:30:01 UTC
Last modified: 14 Apr 2020, 15:30:49 UTC

Getting a lot of "computational error" messages only a few seconds into crunching the WUs, at least on my iMac. Why?

Mike Belanger
Joined: 6 Apr 20
Posts: 6
Credit: 28,599,621
RAC: 0
Message 2502 - Posted: 14 Apr 2020, 15:32:26 UTC - in response to Message 2501.  

Seems to be running OK on my two Windows machines, though.

TimeLord04
Joined: 30 Aug 18
Posts: 398
Credit: 344,280,772
RAC: 0
Message 2503 - Posted: 14 Apr 2020, 23:31:49 UTC
Last modified: 15 Apr 2020, 0:00:15 UTC

Since 3-29-2020, I too, have had issues running Collatz on my Hackintosh.
For over a year, Hackintosh-Andromeda, (iMac 18,3 - [5K Retina 27" - 2017] Profile),
was running 2 Units at a time on my MacVidCards' NVIDIA GTX-1070 8GB Card.
It ran this way FLAWLESSLY until 3-29.

Suddenly, I'm getting "Compute Errors". Now, from 3-18 to 3-28, the System was
OFF as I was on a trip. So, the 29th of March was the first day I turned Andromeda
back ON, and I took that time to Update the System to High Sierra 10.13.6 - (17G12034),
and the new NVIDIA Web Driver 387.10.10.10.40.135. THEN I resumed Crunching.
That's when I noticed the "Compute Errors"...

I thought the new NVIDIA Driver may have been the issue. I spent the next week and a
half combing through other Software Packages I had Installed on Andromeda. I even
retested Blizzard's StarCraft:Remastered. SC:R and ALL other Software ran fine.
SC:R running fine at least indicated that OpenGL on the new Driver was working fine.

FINALLY, I rolled back the Update of MacOS and went back to (17G11023), and the
(.134) NVIDIA Web Driver. I then Resumed BOINC and STILL got "Compute Errors".

After MUCH thought, I decided to Reboot to Windows 7 Pro SP-1 x64. There, under
the Prometheus Profile, I ran BOINC and Relaunched Collatz. The difference here is
that the Windows-NVIDIA (WHQL) 388.13 Driver DOES allow OpenCL to run on BOTH
of my GPUs, (I have a Secondary EVGA GTX-1050 2GB Card in the System), and Collatz
ran 1 Unit Per Card and ALL Units completed NORMALLY.

I went back into Andromeda, Reinstalled the (.134) Driver on (17G11023). THEN Resumed
Collatz AGAIN, AND STILL got "Compute Errors"!

On a complete whim, I opened the app_config.xml File with TextEdit, and changed to
running JUST 1 Unit on the 1070 at a time. VOILA!!!! I can now Crunch Collatz again
in MacOS!!!!!
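
For readers who have not edited this file before, here is a minimal sketch of the kind of change described above, written as a short Python script that emits an app_config.xml. The post does not show the actual file, so the app name `collatz_sieve`, the `cpu_usage` value, and the helper name `build_app_config` are all assumptions for illustration. In BOINC's app_config.xml, a `gpu_usage` of 1.0 reserves a whole GPU per task (one unit at a time), while 0.5 lets two tasks share a card.

```python
# Template follows the documented BOINC app_config.xml layout.
APP_CONFIG_TEMPLATE = """<app_config>
  <app>
    <name>{name}</name>
    <gpu_versions>
      <gpu_usage>{gpu_usage}</gpu_usage>
      <cpu_usage>{cpu_usage}</cpu_usage>
    </gpu_versions>
  </app>
</app_config>
"""


def build_app_config(app_name: str, tasks_per_gpu: int, cpu_usage: float = 1.0) -> str:
    """Return app_config.xml text that runs `tasks_per_gpu` tasks on each GPU."""
    if tasks_per_gpu < 1:
        raise ValueError("tasks_per_gpu must be at least 1")
    # Fraction of a GPU each task reserves: 1.0 = one task per card, 0.5 = two.
    gpu_usage = 1.0 / tasks_per_gpu
    return APP_CONFIG_TEMPLATE.format(
        name=app_name, gpu_usage=gpu_usage, cpu_usage=cpu_usage
    )


if __name__ == "__main__":
    # "collatz_sieve" is an assumed app name, not taken from the post.
    print(build_app_config("collatz_sieve", tasks_per_gpu=1))
```

The output would be saved as app_config.xml in the project's folder inside the BOINC data directory. The client picks it up after a restart or after choosing "Read config files" in BOINC Manager.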

SEE Thread here: MacOS High Sierra - (17G12034) - Web Driver 387.10.10.10.40.135

From that point, I Re-Updated to (17G12034) and the (.135) Driver and still, all is well
on the 1070 now running ONLY 1 Unit at a time.

I think it's possible, (for my issue), that my 1070 Card, (after running 2 Units at a time for
over a year), with MacOS Limitations of NOT allowing GPU Fan Speed Controls to ramp
up the fan on the GPU under Load has somehow DAMAGED the Compute Cores of
my 1070.... Can't prove that, though.


[Hackintosh-Andromeda - System Specs:]

Profile: iMac 18,3.

i7 7700K 4.2GHz, 4c/8t.
Gigabyte GA-H270-HD3 MOBO
CoolerMaster Hyper212-EVO
32GB Corsair Vengeance LPX DDR4-CL14 2400MHz RAM - (4x 8GB)
Fenvi FV-T919 WiFi-AC & Bluetooth 4 PCI-e x1 Card.
Lite-On DVD-Burner
Silverstone FS303B 3-Bay Hot Swap Bay.
(Bay-1) - Samsung VNAND 860 Pro 1TB SSD - High Sierra 10.13.6 - (17G12034), (APFS), w/Web Driver 387.10.10.10.40.135 & CUDA Driver 418.163.
(Bay-2) - Samsung VNAND 860 Pro 1TB SSD - Win 7 Pro SP-1 x64 w/NVIDIA (WHQL) Driver 388.13.
(Bay-3) - Western Digital Black SATA 2TB HD - MacOS Time Machine/Games - (HFS+), 1TB Each Partition.
One MacVidCards' GTX-1070 8GB GDDR5 VRAM - Low Power - (one 8 Pin Connector)
One EVGA GTX-1050 2GB GDDR5 VRAM.
Rosewill, NightHawk-117, E-ATX Case.
Corsair HX750i Platinum 750 Watt PSU.
[EDIT:] Forgot my Mouse and Keyboard...
Logitech M510 (USB Fob) Wireless Mouse.
MacAlly iKeySlim Full Sized Keyboard with 10-Key.
SabreNT USB Audio.
Logitech AK5370 USB Microphone
Logitech Extreme3DPro USB Joystick
UNITEK 10-Port USB 3.0 Hub.
ASUS VE278 27" Monitor on HDMI on GTX-1070.
Altec Lansing 45 2.1 Speaker Set.


TimeLord04
Have TARDIS will travel!!!
Come along, K-9!

KAREL
Joined: 15 Mar 16
Posts: 21
Credit: 45,654,040
RAC: 0
Message 2637 - Posted: 10 May 2020, 13:28:15 UTC

Hi all, I made a video on YouTube benchmarking WUs: RADEON VII Gigabyte // Collatz Conjecture 1_WUs BOINC. The link is https://www.youtube.com/watch?v=ZNzuN4S80Ek

RADEON R9 290X GIGABYTE // Collatz Conjecture 1_WUs BOINC
https://www.youtube.com/watch?v=q29ihVlw8t4

really impressed with this card

nomis
Joined: 3 May 11
Posts: 1
Credit: 100,900,606
RAC: 16,607
Message 2758 - Posted: 13 Jun 2020, 18:52:24 UTC

Leaving the project now after several computation errors, each after spending almost 2 days on the task.
Other projects do not have this frustrating problem and give much more credit (except for Rosetta, which is even worse).

mikey
Joined: 11 Aug 09
Posts: 927
Credit: 24,523,632,110
RAC: 0
Message 2759 - Posted: 13 Jun 2020, 20:50:33 UTC - in response to Message 2758.  

That's because you are running them on your CPU, not your graphics card.

KAMasud
Joined: 20 Oct 11
Posts: 48
Credit: 4,131,507,138
RAC: 8,531,345
Message 3254 - Posted: 28 Mar 2021, 16:50:42 UTC

Two hundred and ninety WUs errored out in the last few days. Why?

mikey
Joined: 11 Aug 09
Posts: 927
Credit: 24,523,632,110
RAC: 0
Message 3255 - Posted: 29 Mar 2021, 11:36:41 UTC - in response to Message 3254.  

BOINC error -102 means it is having problems finding your drive. Are you running multiple units at once? If so, try using the optimization codes to run only one unit at a time and see if that helps. Both of your PCs are having the exact same problem; one just has a lot more of it than the other.
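
As a cross-check on the error number, here is a small sketch that decodes a BOINC exit status. Only -102 comes from this thread. The symbolic names are recalled from BOINC's `lib/error_numbers.h` and should be verified against the source for your client version, and the helper name `describe_exit_status` is hypothetical. Under that header, -102 would be `ERR_READ`, a failed read, which is broader than a missing drive.

```python
# Small subset of BOINC error codes (assumed from lib/error_numbers.h; verify before relying on it).
BOINC_ERRORS = {
    -102: "ERR_READ",   # a read failed
    -103: "ERR_WRITE",  # a write failed
    -108: "ERR_FOPEN",  # a file could not be opened
}


def describe_exit_status(status: int) -> str:
    """Format an exit status in decimal, as a 32-bit hex value, and by name if known."""
    unsigned = status & 0xFFFFFFFF  # two's-complement view, as task logs often show it
    name = BOINC_ERRORS.get(status, "unknown")
    return f"{status} (0x{unsigned:08X}) {name}"


if __name__ == "__main__":
    print(describe_exit_status(-102))
```

Comparing the decoded name with the stderr output on the task's result page is usually more informative than the number alone.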

KAMasud
Joined: 20 Oct 11
Posts: 48
Credit: 4,131,507,138
RAC: 8,531,345
Message 3257 - Posted: 30 Mar 2021, 4:04:01 UTC - in response to Message 3255.  

NO. I am not running multiple WUs. Absolutely not. Nothing has changed even an iota. This started after the last Collatz fiasco. My cache size is set to 10+10 days, so these errors showed up late in my case, but there has been grumbling on the Boinc Stats ShoutBox. Today's count, which is the record to date, is 345. Better to point all the microscopes at the Collatz server. I know it is easier to pick up the corner of a carpet and shove everything underneath, which means my end.
Stacie, I have a Jinni on my computer now, or maybe a ghost.
I can assure everyone that I do not put a leash on my hard drives and take them out for a walk, and neither do they play "Peek a Boo".
Should I detach from the project and then re-attach? Let me think.

KAMasud
Joined: 20 Oct 11
Posts: 48
Credit: 4,131,507,138
RAC: 8,531,345
Message 3258 - Posted: 30 Mar 2021, 11:21:48 UTC

Now error rate 1639. Einstein and GPUGrid can easily find my hard drives.

mikey
Joined: 11 Aug 09
Posts: 927
Credit: 24,523,632,110
RAC: 0
Message 3259 - Posted: 30 Mar 2021, 12:09:39 UTC - in response to Message 3258.  

1639 seems to be a SQL Server installation problem that occurs when the folder name contains a space, e.g. 'SQL Server 2020' has 2 spaces in it and will fail. BOINC doesn't use SQL Server on your PC, so I have no clue what that means as a BOINC error code.

kotenok2000
Joined: 16 Jul 16
Posts: 5
Credit: 41,125,052
RAC: 833
Message 3261 - Posted: 30 Mar 2021, 21:16:22 UTC
Last modified: 30 Mar 2021, 21:17:04 UTC

4 tasks all crashed
https://boinc.thesonntags.com/collatz/workunit.php?wuid=102580094

mikey
Joined: 11 Aug 09
Posts: 927
Credit: 24,523,632,110
RAC: 0
Message 3264 - Posted: 31 Mar 2021, 10:43:59 UTC - in response to Message 3261.  

Have you installed this?
https://visualstudio.microsoft.com/downloads/

The front page of the project says it needs to be installed; maybe the one you have is older?

KAMasud
Joined: 20 Oct 11
Posts: 48
Credit: 4,131,507,138
RAC: 8,531,345
Message 3266 - Posted: 31 Mar 2021, 12:21:35 UTC

The problem is that we have been computing with stock equipment and OS for years. All the tasks that have crashed had already crashed five times on different machines. After six crashes, "bug in the script" is written in red. If you do not believe me, go and check for yourselves. My crashed count is now 1633 WUs. It has never happened before, so why now? If it cannot access the hard drive, well and good. But then how on earth are Einstein, GPUGrid and CPDN WUs finding the hard drive? Instead of giving off-the-cuff answers, it is time to think. I have removed one machine, which was crashing just about all the WUs, but I am not happy with the other one either. 1633 WUs amount to quite a few days of wasted energy and time, plus checking the computers for problems which are non-existent. Maybe in the mind.

Tigers_Dave
Joined: 23 Sep 12
Posts: 195
Credit: 79,658,060,533
RAC: 86,750,230
Message 3273 - Posted: 2 Apr 2021, 15:18:25 UTC - in response to Message 3266.  
Last modified: 2 Apr 2021, 15:19:00 UTC

I agree with your frustration. I just brought a "new" Mac Mini to the project on March 30 < https://boinc.thesonntags.com/collatz/show_host_detail.php?hostid=879920 >. At last check, that computer has attempted to crunch 2191 tasks. 213 of those tasks prematurely terminated with the "Error while computing". With all due respect to Mikey, who has been a tremendous help to me and other members of the Collatz community over the years, I don't think what he has proposed applies to these errors. Rather, I think there is a problem with the project.

Yet, I can't criticize Slicker. As others have noted quite eloquently, this is a volunteer effort ("labor of love") for him and life events can impact his ability to solve the project's problems. Moreover, having participated in DC projects since June 2001, I think that Collatz has been remarkably stable. For example, even today I can only run one class of Einstein AMD GPU tasks on my Macs without having a validation error rate in excess of 25%.

I have allocated approximately 20% of my GPU resources to Einstein and I allocate 100% of my available CPU resources to Rosetta. But, otherwise, I am going to stick with Collatz. Nonetheless, I would understand if you and others decide to leave Collatz.

KAMasud
Joined: 20 Oct 11
Posts: 48
Credit: 4,131,507,138
RAC: 8,531,345
Message 3275 - Posted: 2 Apr 2021, 19:15:53 UTC

Still too early to comment, but I think some problems are self-created. CPDN (Climate) still uses 32-bit libs; those WUs are in Unix. Changed, played around with beyond me. On the front page, it has this in writing, but you have to read the page carefully to find that part. My WUs were erroring out in Windows and I was getting a "Signal 11"? After running around, I found out it was a Unix signal.
This project also has the same story written on the front page, but in all my years I never had this problem before, so I did not pay much attention. How could anyone, when the silly WUs are marked x64? Mikey pointed out this fact in another thread that I came across today. I have downloaded Visual Studio 2019 and installed it. I have stopped BOINC from uploading the results and so far (keeping my fingers crossed) no WU has errored out in the last twelve hours. I will observe for another twelve hours, then allow BOINC to upload. I hope the problem has been solved; otherwise, back to square one.

Tigers_Dave
Joined: 23 Sep 12
Posts: 195
Credit: 79,658,060,533
RAC: 86,750,230
Message 3276 - Posted: 3 Apr 2021, 2:10:22 UTC - in response to Message 3273.  

At last check of the aforementioned computer, 229 tasks have prematurely terminated (with the "Error while computing" message) out of 2609 total tasks. So, it would appear that I am still receiving problematic tasks, although the percentage of problematic tasks appears to be dropping.
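
To put numbers on "dropping", here is a short arithmetic sketch using only the two snapshots reported for this host (213 errors in 2191 tasks, then 229 errors in 2609 tasks). The function name `error_rate` is just an illustrative helper.

```python
def error_rate(errors: int, total: int) -> float:
    """Return the share of failed tasks as a percentage."""
    return 100.0 * errors / total


# Snapshots reported in the thread: (errored tasks, total tasks).
first = (213, 2191)
second = (229, 2609)

# Tasks and errors that arrived between the two snapshots.
new_errors = second[0] - first[0]
new_tasks = second[1] - first[1]

if __name__ == "__main__":
    print(f"first snapshot:  {error_rate(*first):.2f}%")
    print(f"second snapshot: {error_rate(*second):.2f}%")
    print(f"between the two: {error_rate(new_errors, new_tasks):.2f}% "
          f"({new_errors} of {new_tasks})")
```

The cumulative rate only moves from about 9.7% to about 8.8%, but the tasks received between the two checks failed at under 4%, which is the clearer sign of improvement.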

KAMasud
Joined: 20 Oct 11
Posts: 48
Credit: 4,131,507,138
RAC: 8,531,345
Message 3277 - Posted: 3 Apr 2021, 4:03:19 UTC

The only thing I can say is copied and pasted from the Home Page:
"Windows applications require the Microsoft Visual C++ Redistributable for Visual Studio 2017. It is recommended to install both the x86 and x64 versions since BOINC may decide to run the 32-bit app if no 64-bit work units are available when requesting work".
After so many years of sleeping peacefully and letting us sleep peacefully, someone has decided to send out 32-bit apps. Ask Mikey or Slicker. No crashed WUs to report so far. I wish I knew the reason why, in simple language.
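
For anyone wanting to check whether both runtimes are present, here is a rough sketch. It assumes a 64-bit Windows install, where 64-bit system DLLs live in System32 and 32-bit ones in SysWOW64, and it treats the presence of `vcruntime140.dll` (shipped by the Visual C++ 2015 and later redistributables) as a stand-in for "installed". That file check is a heuristic of mine and not something the project documents; the function name `missing_runtimes` is hypothetical.

```python
import os
from pathlib import Path
from typing import List

# Core runtime DLL installed by the Visual C++ 2015+ redistributables.
RUNTIME_DLL = "vcruntime140.dll"


def missing_runtimes(windows_dir: Path) -> List[str]:
    """Return the architectures whose runtime DLL is absent under `windows_dir`."""
    locations = {
        "x64": windows_dir / "System32" / RUNTIME_DLL,  # 64-bit DLLs on 64-bit Windows
        "x86": windows_dir / "SysWOW64" / RUNTIME_DLL,  # 32-bit DLLs on 64-bit Windows
    }
    return [arch for arch, path in locations.items() if not path.is_file()]


if __name__ == "__main__":
    root = Path(os.environ.get("SystemRoot", r"C:\Windows"))
    missing = missing_runtimes(root)
    if missing:
        print("Runtime DLL not found for:", ", ".join(missing))
    else:
        print("Both x86 and x64 runtime DLLs were found.")
```

If "x86" shows up as missing on a 64-bit machine, that would fit the pattern described here, where only the 32-bit app fails.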

mikey
Joined: 11 Aug 09
Posts: 927
Credit: 24,523,632,110
RAC: 0
Message 3278 - Posted: 3 Apr 2021, 10:29:25 UTC - in response to Message 3277.  

The Project makes both types of tasks, but lately fewer machines with a 32-bit OS are coming here (too old and slow, I am guessing), meaning all the 64-bit tasks are being gobbled up by others, leaving only 32-bit tasks to be sent out. I think Slicker may need to dig into how many 32-bit tasks are being crunched on 64-bit machines because there are no more 64-bit tasks, and adjust the amounts of each type of task being made.

KAMasud
Joined: 20 Oct 11
Posts: 48
Credit: 4,131,507,138
RAC: 8,531,345
Message 3279 - Posted: 3 Apr 2021, 11:01:17 UTC
Last modified: 3 Apr 2021, 11:05:03 UTC

The error rate has slowed down, but the count now stands at 1701. Oh well! Better than before. Frustrating.


©2022 Jon Sonntag; All rights reserved