Optimizing Collatz Sieve
Message boards : Number crunching : Optimizing Collatz Sieve

Author Message
Anthony Ayiomamitis
Joined: 21 Jan 15
Posts: 48
Credit: 1,047,242,999
RAC: 10,082,642
Message 21977 - Posted: 3 Feb 2016, 9:36:45 UTC

Any suggestions for an Intel HD 4000 integrated GPU? I currently need about 75 minutes to complete a work unit.

Thanks.

MindCrime
Joined: 27 Feb 14
Posts: 6
Credit: 349,003,349
RAC: 934
Message 21990 - Posted: 11 Feb 2016, 18:00:34 UTC - in response to Message 21977.
Last modified: 11 Feb 2016, 18:01:09 UTC

Any suggestions for an Intel HD 4000 integrated GPU? I currently need about 75 minutes to complete a work unit.

Thanks.


That's a good question. I don't have one with me, but if I did I would approach it like overclocking a CPU: open up the .config, see what the current values are, and bump the settings from there. I imagine threads and lut_size are the most sensitive, just like on the rest of the GPUs. I'm interested in how much improvement can be had over the defaults on the Intel GPUs.

Joe
Joined: 16 Sep 09
Posts: 5
Credit: 902,215,476
RAC: 1,783,657
Message 21996 - Posted: 14 Feb 2016, 23:15:00 UTC

Does anybody have good values for my AMD Radeon R9 390 (Grenada)?

Slicker
Volunteer moderator
Project administrator
Project developer
Project tester
Project scientist
Joined: 11 Jun 09
Posts: 2525
Credit: 740,580,099
RAC: 1
Message 22006 - Posted: 17 Feb 2016, 16:41:58 UTC - in response to Message 21990.

Any suggestions for an Intel HD 4000 integrated GPU? I currently need about 75 minutes to complete a work unit.

Thanks.


That's a good question. I don't have one with me, but if I did I would approach it like overclocking a CPU: open up the .config, see what the current values are, and bump the settings from there. I imagine threads and lut_size are the most sensitive, just like on the rest of the GPUs. I'm interested in how much improvement can be had over the defaults on the Intel GPUs.


The embedded Intel GPUs really aren't very fast. I found that when tweaking the settings, my laptop would throttle the GPU due to it overheating. So, even though it would run with higher values for the settings, it actually ran faster with settings lower than the max such that it wouldn't overheat.

MindCrime
Joined: 27 Feb 14
Posts: 6
Credit: 349,003,349
RAC: 934
Message 22032 - Posted: 24 Feb 2016, 20:00:56 UTC - in response to Message 21996.

Does anybody have good values for my AMD Radeon R9 390 (Grenada)?



I see a 390x making ~5m/day with the following

verbose 1 (yes)
kernels/reduction 32
threads 2^8 (256)
lut_size 17 (1048576 bytes)
sieve_size 2^30 (51085096 bytes)
sleep 1
cache_sieve 1 (yes)
reducecpu 0 (no)

HAL9000
Joined: 19 Nov 09
Posts: 15
Credit: 104,993,705
RAC: 0
Message 22040 - Posted: 28 Feb 2016, 9:47:30 UTC - in response to Message 22032.
Last modified: 28 Feb 2016, 9:49:49 UTC

Does anybody have good values for my AMD Radeon R9 390 (Grenada)?



I see a 390x making ~5m/day with the following

verbose 1 (yes)
kernels/reduction 32
threads 2^8 (256)
lut_size 17 (1048576 bytes)
sieve_size 2^30 (51085096 bytes)
sleep 1
cache_sieve 1 (yes)
reducecpu 0 (no)

I'm using this configuration on my R9 390x with run times between 55 & 65 seconds.
verbose=1
kernels_per_reduction=48
threads=8
lut_size=17
sleep=1
reduce_cpu=0
sieve_size=30
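
For anyone comparing the two listings above: the verbose output and the .config file appear to describe the same settings, with threads and lut_size stored as powers of two. Here is a minimal Python sketch of that reading. It assumes the key=value layout shown in this thread and assumes 8 bytes per lookup-table entry, which is only inferred from the "lut_size 17 (1048576 bytes)" line and is not confirmed by the project. The function names are made up for illustration.

```python
def parse_config(text):
    """Parse key=value lines into a dict of ints, skipping anything else."""
    settings = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or "=" not in line:
            continue
        key, value = line.split("=", 1)
        settings[key.strip()] = int(value.strip())
    return settings


def describe(settings, lut_entry_bytes=8):
    """Expand the power-of-two settings.

    lut_entry_bytes=8 is an assumption inferred from the verbose output.
    """
    info = {}
    if "threads" in settings:
        info["threads_per_work_group"] = 2 ** settings["threads"]
    if "lut_size" in settings:
        info["lut_bytes"] = (2 ** settings["lut_size"]) * lut_entry_bytes
    return info


example = """verbose=1
kernels_per_reduction=48
threads=8
lut_size=17
sleep=1
reduce_cpu=0
sieve_size=30
"""

print(describe(parse_config(example)))
```

With these assumptions, threads=8 comes out as 256 threads and lut_size=17 as 1048576 bytes, which matches the verbose listing.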

Joe
Joined: 16 Sep 09
Posts: 5
Credit: 902,215,476
RAC: 1,783,657
Message 22056 - Posted: 3 Mar 2016, 20:18:46 UTC - in response to Message 22040.

Does anybody have good values for my AMD Radeon R9 390 (Grenada)?



I see a 390x making ~5m/day with the following

verbose 1 (yes)
kernels/reduction 32
threads 2^8 (256)
lut_size 17 (1048576 bytes)
sieve_size 2^30 (51085096 bytes)
sleep 1
cache_sieve 1 (yes)
reducecpu 0 (no)

I'm using this configuration on my R9 390x with run times between 55 & 65 seconds.
verbose=1
kernels_per_reduction=48
threads=8
lut_size=17
sleep=1
reduce_cpu=0
sieve_size=30


This works great!!! My run times are now between 65 and 72 seconds. Thanks a lot!

Chris S
Joined: 12 Jul 09
Posts: 257
Credit: 89,016,367
RAC: 0
Message 22060 - Posted: 5 Mar 2016, 14:53:36 UTC

Hi guys.

Just installed my new GTX 950 card in my Q6600 machine with 8 GB RAM, and I'm getting results of 430 seconds for a credit of 4739. It's a simple basic install out of the box, no tweaking or anything. But can it do any better? What do others get with this card?

Many thanks in advance.
____________
Why is there only one Monopolies Commission?

HAL9000
Joined: 19 Nov 09
Posts: 15
Credit: 104,993,705
RAC: 0
Message 22061 - Posted: 6 Mar 2016, 5:30:23 UTC - in response to Message 22060.

Hi guys.

Just installed my new GTX 950 card in my Q6600 machine with 8 GB RAM, and I'm getting results of 430 seconds for a credit of 4739. It's a simple basic install out of the box, no tweaking or anything. But can it do any better? What do others get with this card?

Many thanks in advance.

Doing a quick look through the top 1000 hosts, I found 2 other GTX 950s running stock cfgs with similar run times. One host was a bit faster, running ~390 seconds, but had a faster 8-core CPU.

Chris S
Joined: 12 Jul 09
Posts: 257
Credit: 89,016,367
RAC: 0
Message 22062 - Posted: 6 Mar 2016, 10:51:07 UTC - in response to Message 22061.

OK thanks Hal. I was looking at the advice you gave to Mindcrime about sieve-size etc.

HAL9000
Joined: 19 Nov 09
Posts: 15
Credit: 104,993,705
RAC: 0
Message 22065 - Posted: 6 Mar 2016, 18:39:21 UTC - in response to Message 22062.

OK thanks Hal. I was looking at the advice you gave to Mindcrime about sieve-size etc.

How I worked out my values.

threads:
Should match the GPU's max work group size from clinfo, expressed as a power of two: 2^7=128, 2^8=256, 2^9=512.

sieve_size:
Increased to the point where any higher value caused computation errors.

lut_size:
Increased until it caused slower run times. In GPUz I would also see increased Memory controller load. Anything over 50% load here caused slower run times.

kernels_per_reduction:
I started with the default and increased it in steps of 8. I found that using step sizes other than 8 resulted in a non-linear change in run times. I'm unsure if 8 was the magic number for me because it is also my value for threads.
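
The threads rule above boils down to taking the base-2 logarithm of the max work group size that clinfo reports (128 gives 7, 256 gives 8, 512 gives 9). A small Python sketch of that conversion follows. The function name is hypothetical, and the work group size is assumed to have been read off clinfo by hand.

```python
def threads_exponent(max_work_group_size):
    """Return n such that 2**n == max_work_group_size."""
    size = max_work_group_size
    # A power of two has exactly one bit set, so size & (size - 1) is zero.
    if size <= 0 or size & (size - 1):
        raise ValueError("expected a positive power of two")
    return size.bit_length() - 1


for size in (128, 256, 512):
    print(size, "-> threads =", threads_exponent(size))
```

The other three settings were found by trial and error in the post above, so there is no formula to sketch for them.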

Chris S
Joined: 12 Jul 09
Posts: 257
Credit: 89,016,367
RAC: 0
Message 22067 - Posted: 7 Mar 2016, 9:41:39 UTC

Again, thanks for the reply Hal, but I really haven't got the time to do all that experimenting. What I was hoping for was some recommended settings and which file to put them in. OK, that is piggybacking on others' hard work, but it all helps the project. I'm sure the card can do more, but I'm happy to stay as I am.

Joe
Joined: 16 Sep 09
Posts: 5
Credit: 902,215,476
RAC: 1,783,657
Message 22080 - Posted: 9 Mar 2016, 16:25:34 UTC - in response to Message 22067.

Again, thanks for the reply Hal, but I really haven't got the time to do all that experimenting. What I was hoping for was some recommended settings and which file to put them in. OK that is piggybacking on others hard work, but it all helps the project. I'm sure the card can do more, but I'm happy to stay as I am.


You can try this:
verbose=1
kernels_per_reduction=48
threads=8
lut_size=15
sleep=0
cache_sieve=1
reduce_cpu=0
sieve_size=30

With this I save about 50% of the time per WU. It works well with my GTX 680 and my 980 Ti.
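
If you keep a few sets of values around, a short helper can turn them into the key=value text used in this thread. This is only a sketch: the function name is made up, and it deliberately does not write to disk, because the config file's name and location are not given here.

```python
def render_config(settings):
    """Render a dict of integer settings as key=value lines."""
    lines = []
    for key, value in settings.items():
        if not isinstance(value, int) or value < 0:
            raise ValueError(f"{key} must be a non-negative integer")
        lines.append(f"{key}={value}")
    return "\n".join(lines) + "\n"


# Values copied from the post above (reported for a GTX 680 and a 980 Ti).
nvidia_settings = {
    "verbose": 1,
    "kernels_per_reduction": 48,
    "threads": 8,
    "lut_size": 15,
    "sleep": 0,
    "cache_sieve": 1,
    "reduce_cpu": 0,
    "sieve_size": 30,
}

print(render_config(nvidia_settings), end="")
```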

Matt Kowal
Joined: 21 Mar 14
Posts: 34
Credit: 673,520,211
RAC: 497,490
Message 22081 - Posted: 10 Mar 2016, 8:23:07 UTC - in response to Message 22080.

AMD 7970 @ 1 GHz

verbose=1
threads=8
kernels_per_reduction=64
sieve_size=30
lut_size=16
reduce_cpu=0
sleep=1

WUs process in about 113 seconds. This card is in a dual GPU setup with an Nvidia 760, and I run the display off the 2500K's integrated graphics, so poor video response is not a concern.

Chris S
Joined: 12 Jul 09
Posts: 257
Credit: 89,016,367
RAC: 0
Message 22088 - Posted: 11 Mar 2016, 6:43:57 UTC

Many grateful thanks to Joe and Matt. I tried the settings that Joe suggested on my GTX950 as they seemed to work for his Nvidia cards.

    Previously - 422 secs for credit of 4572 = 10.8 cr/sec

    Afterwards - 229 secs for credit of 3596 = 15.7 cr/sec

An apparent increase of 45% output in terms of credit! Possibly some minor extra tweaking could be used, but I'll happily stick with that for the moment.
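
The arithmetic behind those figures, as a quick Python check using only the numbers quoted above:

```python
def credit_rate(credit, seconds):
    """Credits earned per second of run time."""
    return credit / seconds


before = credit_rate(4572, 422)
after = credit_rate(3596, 229)
gain = (after / before - 1) * 100

print(f"before: {before:.1f} cr/sec")
print(f"after:  {after:.1f} cr/sec")
print(f"gain:   {gain:.0f}%")
```

Note that the credit per work unit went down while the rate went up, so the comparison only makes sense per second, not per WU.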

I do have one last general Collatz question, if I might take a few more moments of your time. On every BOINC project that I know of, the best settings are the GPU core clock at the maximum consistent with stability, and the memory clock wound as low as possible to save on power and heat output. It certainly works at SETI.

But I'm told that Collatz also requires high memory speeds as well due to the type of calculations it does. Is this correct?

HAL9000
Joined: 19 Nov 09
Posts: 15
Credit: 104,993,705
RAC: 0
Message 22093 - Posted: 11 Mar 2016, 17:09:54 UTC - in response to Message 22088.

Many grateful thanks to Joe and Matt. I tried the settings that Joe suggested on my GTX950 as they seemed to work for his Nvidia cards.
    Previously - 422 secs for credit of 4572 = 10.8 cr/sec

    Afterwards - 229 secs for credit of 3596 = 15.7 cr/sec

An apparent increase of 45% output in terms of credit! Possibly some minor extra tweaking could be used, but I'll happily stick with that for the moment.

I do have one last general Collatz question, if I might take a few more moments of your time. On every BOINC project that I know of, the best settings are the GPU core clock at the maximum consistent with stability, and the memory clock wound as low as possible to save on power and heat output. It certainly works at SETI.

But I'm told that Collatz also requires high memory speeds as well due to the type of calculations it does. Is this correct?


I would say that is likely true, and other system usage matters as well. I went from running 0 CPU tasks to running 2 climate tasks, and Collatz run times went from ~60s to ~90s on my 390X.

Unrelated to your GTX950 but for this thread.
I found values for the HD 6370M in my notebook. Stock values produced some lag.

Great deal of lag and unusable system (note: crunches just fine):
verbose=1
kernels_per_reduction=40
threads=7
lut_size=17
sleep=1
reduce_cpu=0
sieve_size=28

No lag and usable system:
verbose=1
kernels_per_reduction=32
threads=7
lut_size=17
sleep=1
reduce_cpu=0
sieve_size=24

Chris S
Joined: 12 Jul 09
Posts: 257
Credit: 89,016,367
RAC: 0
Message 22094 - Posted: 12 Mar 2016, 8:11:41 UTC

Just a quick addendum on the GTX950.

I wanted to check the config file settings, and rather than just opening it and closing it, like an idiot I used the edit option and save. But I forgot to save it as "all files". So the settings weren't being used. I had to reset the project and start again to solve it. Sorry to any wingmen about that.

The first time around there were both X86 and X86-64 config files, so using a 64 bit rig I amended just the 64 bit one. It all worked fine. The second time around after the reset there was only the X86 one, so I amended that. Upon checking this morning the times had doubled again, and there was the other X86-64 config file that had appeared overnight. I amended that and we are now cooking again!

The moral is be careful when you amend and save config files, and check back if you reset or start from scratch on a 64 bit machine. Dumbos R us.
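
Both pitfalls above can be checked with a few lines of Python: list every config file in the project folder (so neither the x86 nor the x86-64 one gets missed) and flag any that picked up a stray extension. This sketch assumes the files end in .config and that the botched save added .txt, which is the usual result of not choosing "all files" in a text editor; the thread does not give the actual file names, so the ones in the demo are invented.

```python
import tempfile
from pathlib import Path


def check_config_files(directory):
    """Return (config_names, misnamed_names) for files in directory."""
    root = Path(directory)
    configs = sorted(p.name for p in root.glob("*.config"))
    # Files like "something.config.txt" are not picked up by the pattern above.
    misnamed = sorted(p.name for p in root.glob("*.config.txt"))
    return configs, misnamed


# Demo in a throwaway directory; these file names are made up.
with tempfile.TemporaryDirectory() as tmp:
    Path(tmp, "example_x86.config").write_text("lut_size=15\n")
    Path(tmp, "example_x86_64.config.txt").write_text("lut_size=15\n")
    configs, misnamed = check_config_files(tmp)
    print("config files:", configs)
    print("probably misnamed:", misnamed)
```

Point it at your own project folder instead of the temporary directory to use it for real.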

Alfred Zimmel
Joined: 4 Feb 10
Posts: 1
Credit: 149,849,003
RAC: 1,509,958
Message 22175 - Posted: 28 Mar 2016, 3:58:59 UTC

Today I played a bit with optimizations and got these for a GTX980ti:

lut_size=18
kernels_per_reduction=64
sieve_size=30

(everything else is left at its default value)

This resulted in:

- 1:06 min. per WU (instead of 2:26)
- 400 watts power consumption (instead of 360)
- around 3400 credits per WU (instead of about 4400)

The system is a bit laggy now but still usable.
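
To put those three numbers together, here is a rough Python calculation of credit per second and credit per kilojoule for both runs. It uses the approximate figures from the post ("around 3400", "about 4400") and treats the wattage as constant over the whole work unit, so the results are ballpark values only.

```python
def summarize(label, seconds, credits, watts):
    """Print and return credit rate and credit per kilojoule for one run."""
    rate = credits / seconds             # credits per second
    kilojoules = watts * seconds / 1000  # energy used per work unit
    per_kj = credits / kilojoules        # credits per kilojoule
    print(f"{label}: {rate:.1f} cr/sec, {kilojoules:.1f} kJ per WU, {per_kj:.1f} cr/kJ")
    return rate, per_kj


# 2:26 min = 146 s at 360 W; 1:06 min = 66 s at 400 W.
stock_rate, stock_eff = summarize("default", 146, 4400, 360)
tuned_rate, tuned_eff = summarize("tuned", 66, 3400, 400)
print(f"credit rate gain: {(tuned_rate / stock_rate - 1) * 100:.0f}%")
```

On these figures the tuned settings come out ahead both per second and per unit of energy, despite the higher power draw.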

ExtraTerrestrial Apes
Joined: 22 Aug 09
Posts: 56
Credit: 262,359,591
RAC: 207,890
Message 22317 - Posted: 30 Apr 2016, 9:13:23 UTC

Slicker, would it make sense to read some parameters of the OpenCL device upon app startup to set it optimally? Building upon HAL's comment those could be:

threads:
Should match the GPU's max work group size from clinfo, expressed as a power of two: 2^7=128, 2^8=256, 2^9=512.

and lut_size:
clinfo can read out the amount of L2 cache, so setting something that fits in there is better than setting a too small value just to be safe for any GPU. I.e. for my HD530 that's 524 kB, an unusually large size for such a relatively weak GPU. Increasing the lut size showed very nice performance gains (I'm approaching 2x speed-up in my testing now) over the default setting.

You could easily support a manual override of this as well: "if the user has set anything in the .config file, use this value instead".
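
A rough sketch of this auto-detect-with-override idea, in Python rather than the app's own code. It assumes the device values (max work group size, cache size) have already been read, for example from clinfo, and that each lookup-table entry takes 8 bytes. That entry size is only inferred from the "lut_size 17 (1048576 bytes)" line earlier in the thread. All function names here are hypothetical.

```python
def pick_lut_size(cache_bytes, entry_bytes=8):
    """Largest n such that 2**n entries of entry_bytes fit in cache_bytes.

    entry_bytes=8 is an assumption, not a documented value.
    """
    entries = cache_bytes // entry_bytes
    if entries < 1:
        raise ValueError("cache too small for a single entry")
    return entries.bit_length() - 1


def effective_settings(max_work_group_size, cache_bytes, user_config):
    """Start from detected values, then let anything in user_config win."""
    detected = {
        "threads": max_work_group_size.bit_length() - 1,
        "lut_size": pick_lut_size(cache_bytes),
    }
    detected.update(user_config)  # manual override from the .config file
    return detected


# Device with a 256-thread work group limit and 512 KiB of cache.
print(effective_settings(256, 512 * 1024, {}))
print(effective_settings(256, 512 * 1024, {"lut_size": 17}))
```

Whether fitting the table into L2 is always the best choice is the poster's suggestion, not something this sketch can confirm; the override keeps manual tuning possible either way.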

MrS
____________
Scanning for our furry friends since Jan 2002

HAL9000
Joined: 19 Nov 09
Posts: 15
Credit: 104,993,705
RAC: 0
Message 22325 - Posted: 1 May 2016, 17:23:20 UTC - in response to Message 22040.

Does anybody have good values for my AMD Radeon R9 390 (Grenada)?



I see a 390x making ~5m/day with the following

verbose 1 (yes)
kernels/reduction 32
threads 2^8 (256)
lut_size 17 (1048576 bytes)
sieve_size 2^30 (51085096 bytes)
sleep 1
cache_sieve 1 (yes)
reducecpu 0 (no)

I'm using this configuration on my R9 390x with run times between 55 & 65 seconds.
verbose=1
kernels_per_reduction=48
threads=8
lut_size=17
sleep=1
reduce_cpu=0
sieve_size=30


I found this configuration works well when running two tasks at once on my R9 390X. Run times are mostly in the 80-85 second range, with a few running 115 seconds.
verbose=1
kernels_per_reduction=32
threads=8
lut_size=17
sleep=1
reduce_cpu=0
sieve_size=30

Still using driver 15.12, as I had several problems with 16.3.2 on my system.
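
To compare this with the single-task setup quoted above, divide the run time by the number of tasks running at once. A tiny Python sketch using the run times reported in the two posts; note that the two configurations also differ in kernels_per_reduction (48 vs 32), so this is not a like-for-like test.

```python
def seconds_per_task(run_time, concurrent_tasks):
    """Effective wall-clock seconds per finished task."""
    return run_time / concurrent_tasks


single = [seconds_per_task(t, 1) for t in (55, 65)]
double = [seconds_per_task(t, 2) for t in (80, 85)]

print("one task at a time:", single)
print("two tasks at once: ", double)
```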



Copyright © 2018 Jon Sonntag; All rights reserved.