Computation Errors

Gero-T

Joined: 9 Oct 16
Posts: 9
Credit: 57,231,987,399
RAC: 0
Message 1226 - Posted: 7 Nov 2018, 14:13:04 UTC - in response to Message 1225.  
Last modified: 7 Nov 2018, 14:14:34 UTC

my *.config file looks like this:
verbose=1
kernels_per_reduction=48
threads=8
lut_size=17
cache_sieve=1
sieve_size=30

Increasing threads beyond 8 results in errors on my PC.
This is for 1070 and 1080 cards.

step2000
Joined: 1 Aug 13
Posts: 16
Credit: 3,944,129,812
RAC: 7,087,852
Message 1228 - Posted: 8 Nov 2018, 14:16:57 UTC - in response to Message 1226.  

I would increase it, as the 8-way division could be the issue with your RAM/video combo. Try 9 or 11 and see if that hangs. If so, going back to 7 would be my direction. You also said it fails, but is it a hard fail? Does the system lock up, etc.?

If you are also running other processes on these systems, you may need to back things down to share the resources properly. Often other processes just eat up the GPU, for things like GSYNC and even Adobe apps.
Just me being a Math Geek!

Gero-T
Joined: 9 Oct 16
Posts: 9
Credit: 57,231,987,399
RAC: 0
Message 1235 - Posted: 10 Nov 2018, 0:09:04 UTC - in response to Message 1228.  

No, no. The main problem is: Error: GPU steps do not match CPU steps. Workunit processing aborted.

Beyond that, it helped to remove the sleep setting (sleep=1 or sleep=0) and to set threads=8. Be warned: with threads=9 or more, the WU fails.
I think the problem with the new Nvidia drivers (4xx.xx) on Windows 10 is solved.
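
The workaround described in this post (dropping the sleep entry and keeping threads at 8) could be sketched as a small script. This is only an illustration: the helper name apply_workaround is made up, the cap of 8 is what this poster reported working rather than a documented limit, and the config is assumed to be plain key=value lines as shown earlier in the thread.

```python
def apply_workaround(config_text, max_threads=8):
    """Return config text with any sleep entry removed and threads capped.

    Assumption: the file is plain key=value lines, as posted in this thread.
    The cap of 8 comes from one user's report, not from project documentation.
    """
    result = []
    for line in config_text.splitlines():
        key, sep, value = line.partition("=")
        key = key.strip()
        if sep and key == "sleep":
            continue  # drop sleep=0 or sleep=1 entirely
        if sep and key == "threads" and value.strip().isdigit():
            # lower the value only if it exceeds the cap
            line = f"{key}={min(int(value), max_threads)}"
        result.append(line)
    return "\n".join(result)


example = "verbose=1\nthreads=10\nsleep=1\nlut_size=17"
print(apply_workaround(example))
```

Lines the script does not recognize are passed through unchanged, so the other settings stay exactly as written.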

BobMALCS
Joined: 7 Feb 14
Posts: 2
Credit: 214,068,556
RAC: 98,056
Message 1288 - Posted: 1 Dec 2018, 20:48:57 UTC

Installed NVIDIA 417.01 and Collatz immediately crashed out with Error: GPU steps do not match CPU steps. Workunit processing aborted.

Other projects are running ok.

I'll wait until this problem is reported fixed before running Collatz again.

Padanian
Joined: 28 May 10
Posts: 8
Credit: 1,407,385,311
RAC: 206,577
Message 1290 - Posted: 2 Dec 2018, 8:47:54 UTC
Last modified: 2 Dec 2018, 8:48:12 UTC

Same here with 416.34.
A bunch of WUs failed on me with threads=10.

mikey
Joined: 11 Aug 09
Posts: 458
Credit: 11,729,045,177
RAC: 10,174,984
Message 1291 - Posted: 2 Dec 2018, 11:42:53 UTC - in response to Message 1288.  

Nope, the 400 series drivers don't work here with anything older than the brand new 2000 series cards; I've been through several.
Just uninstall the brand new driver. Unless you uninstalled the older drivers, they are still in there and will take over after a reboot, and you will be crunching again.

BobMALCS
Joined: 7 Feb 14
Posts: 2
Credit: 214,068,556
RAC: 98,056
Message 1294 - Posted: 3 Dec 2018, 1:09:01 UTC - in response to Message 1291.  

If all, or even some, of the other projects I run on the NVIDIA GPU failed with the 400 series drivers I would reinstall the latest 300 driver. However, Collatz is the only one to fail.

I'll stay with the 400 series.

mikey
Joined: 11 Aug 09
Posts: 458
Credit: 11,729,045,177
RAC: 10,174,984
Message 1295 - Posted: 3 Dec 2018, 11:04:55 UTC - in response to Message 1294.  

Sounds good, happy crunching.

Shadak
Joined: 6 Nov 09
Posts: 1
Credit: 4,846,985
RAC: 0
Message 1309 - Posted: 10 Dec 2018, 19:14:43 UTC - in response to Message 1294.  

I have no problems with the 416 driver (GeForce 1070).

nedmanjo
Joined: 7 Feb 16
Posts: 35
Credit: 3,183,984,590
RAC: 2,238
Message 1739 - Posted: 7 May 2019, 22:02:02 UTC

Random error while computing, usually 5 - 6 seconds into processing. The error repeats every several hours.

- Outcome Computation error
- Client state Compute error
- Exit status -102 (0xFFFFFF9A) ERR_READ

System Config: Supermicro SYS-7047R-TRF 4U Server, X9DA7 MB, two Xeon E5-2697 V2, two Nvidia GTX 1080 TI

Have used DDU to clear the drivers and reinstalled older driver, v388.13. No difference with newer driver.
GPU is running at stock factory settings.

Configuration:

verbose=1
kernels_per_reduction=48
threads=9
lut_size=18
reduce_CPU=0
sieve_size=30
cache_sieve=1
sleep=0

Have tried this:

verbose=1
kernels_per_reduction=48
threads=8
lut_size=17
reduce_CPU=0
sieve_size=30
cache_sieve=1
sleep=0

Now, these are two cards, not new; I'm currently running one card at a time due to thermals (summer weather). Both cards behave similarly. Pretty sure it's not the cards.

Any ideas?
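
For reference, the two configurations posted above differ in only two settings. A minimal sketch that confirms this, assuming the files are plain key=value lines (the helper names parse_config and diff_configs are made up for illustration):

```python
def parse_config(text):
    """Parse key=value lines into a dict (assumed format, as posted above)."""
    settings = {}
    for line in text.splitlines():
        key, sep, value = line.partition("=")
        if sep:
            settings[key.strip()] = value.strip()
    return settings


def diff_configs(first, second):
    """Return {key: (first_value, second_value)} for keys whose values differ."""
    a, b = parse_config(first), parse_config(second)
    return {k: (a.get(k), b.get(k))
            for k in sorted(set(a) | set(b)) if a.get(k) != b.get(k)}


original = """verbose=1
kernels_per_reduction=48
threads=9
lut_size=18
reduce_CPU=0
sieve_size=30
cache_sieve=1
sleep=0"""

tried = """verbose=1
kernels_per_reduction=48
threads=8
lut_size=17
reduce_CPU=0
sieve_size=30
cache_sieve=1
sleep=0"""

print(diff_configs(original, tried))
# {'lut_size': ('18', '17'), 'threads': ('9', '8')}
```

Only threads and lut_size changed between the two attempts, so the remaining settings have not yet been ruled out as a factor.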

mikey
Joined: 11 Aug 09
Posts: 458
Credit: 11,729,045,177
RAC: 10,174,984
Message 1740 - Posted: 8 May 2019, 10:14:06 UTC - in response to Message 1739.  

Stop using the config files for a day and see if the errors go away. The config files push the card harder than normal, and yours could just be getting old and unable to handle it.

nedmanjo
Joined: 7 Feb 16
Posts: 35
Credit: 3,183,984,590
RAC: 2,238
Message 1744 - Posted: 8 May 2019, 22:23:11 UTC - in response to Message 1740.  

That's a possibility. I'll give it a try. Any knowledge about the meaning of Exit status -102 (0xFFFFFF9A) ERR_READ?

step2000
Joined: 1 Aug 13
Posts: 16
Credit: 3,944,129,812
RAC: 7,087,852
Message 1745 - Posted: 10 May 2019, 11:43:50 UTC - in response to Message 1744.  

The hex memory address is where the application failed to read the memory point. A read error in this range is strange. You could have other issues happening that are masked. I would try running with no config file, and I would also move your RAM on the motherboard, as I think it is slot one related.
Just me being a Math Geek!

mikey
Joined: 11 Aug 09
Posts: 458
Credit: 11,729,045,177
RAC: 10,174,984
Message 1748 - Posted: 10 May 2019, 23:48:34 UTC - in response to Message 1744.  

ERR_READ -102 - BOINC has a problem reading from the drive. Maybe you do not have rights to read from the BOINC directory.

Solution: Make sure you have rights in your operating system to read from the drive. Check your drive for consistency, in Windows using chkdsk.

https://boinc.mundayweb.com/wiki/index.php?title=Error_code_-100_to_-110_explained
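
As a side note, the hexadecimal value in "Exit status -102 (0xFFFFFF9A)" is the same number as -102 written as an unsigned 32-bit value, not a separate code. A short sketch of the conversion (the helper name to_signed32 is made up for illustration):

```python
def to_signed32(value):
    """Interpret an unsigned 32-bit value as a signed two's-complement integer."""
    value &= 0xFFFFFFFF
    # If the top bit is set, the value represents a negative number.
    return value - 0x100000000 if value & 0x80000000 else value


print(to_signed32(0xFFFFFF9A))   # -102
print(hex(-102 & 0xFFFFFFFF))    # 0xffffff9a
```

So the two values in the error line are one exit code shown in two notations.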

San-Fernando-Valley
Joined: 13 Apr 17
Posts: 5
Credit: 1,579,959,743
RAC: 6,258
Message 1749 - Posted: 11 May 2019, 6:38:41 UTC - in response to Message 1739.  

My two cents worth of opinions start here:

This error 102 appeared first ON or after May 1st --- BEFORE this date everything was fine!

My rigs work OK on ALL other projects.
It is NOT a SSD or HDD problem.
NOR is it an access rights problem to any files.
NOR is it an OS (WIN7 or WIN10) specific problem.
NOR is it a GPU or its driver version issue.

I would bet that something must have changed on the project side!

As others have said: just ignore it ... which I don't really want to support or accept.
These types of errors usually tend to increase in frequency and become more complex.

End of my two cents ...

HAPPY crunching to all.
©2019 Jon Sonntag; All rights reserved