Optimizing Collatz Sieve

Message boards : Number crunching : Optimizing Collatz Sieve

koschi
Joined: 4 Nov 09
Posts: 1
Credit: 29,471,002
RAC: 0
Message 23650 - Posted: 27 Dec 2016, 9:04:30 UTC - in response to Message 23628.
Last modified: 27 Dec 2016, 9:04:49 UTC

Anyone have some good settings for a GTX 1060? Currently using;
verbose=1
kernels_per_reduction=48
threads=8
lut_size=17
sleep=1
reduce_cpu=0
sieve_size=26
cache_size=1

Tasks are taking about 15 minutes each. That seems a bit long with the GPU showing 98% load.


I'm down to 780 seconds (28030 credit) on my overclocked GTX1060 3GB and 13708 seconds (28842 credit) on my overclocked GT730 using a higher sieve_size of 30. This gave me a 7% runtime reduction over the previously used sieve_size=28. I didn't try 26 though...
Unfortunately credits decreased by 5%, so tuning and potentially sacrificing desktop performance might not be worth it.
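
The trade-off above can be checked with a little arithmetic. This is only a sketch: the function names are made up for illustration, the figures are the ones quoted in this post, and it assumes the 5% credit drop is per task. Under that assumption, a 7% shorter runtime with 5% less credit still works out to slightly more credit per second.

```python
def credit_per_second(credit: float, runtime_s: float) -> float:
    """Credit earned per second of runtime for one task."""
    return credit / runtime_s


def relative_rate_change(runtime_reduction: float, credit_reduction: float) -> float:
    """Fractional change in credit/second when both runtime and credit shrink.

    Both arguments are fractions, e.g. 0.07 for a 7% reduction.
    """
    return (1.0 - credit_reduction) / (1.0 - runtime_reduction) - 1.0


# Figures quoted in the post above.
gtx1060_rate = credit_per_second(28030, 780)
gt730_rate = credit_per_second(28842, 13708)
# Assumption: the 5% credit decrease applies per task.
change = relative_rate_change(runtime_reduction=0.07, credit_reduction=0.05)

print(f"GTX 1060: {gtx1060_rate:.2f} credit/s")
print(f"GT 730:   {gt730_rate:.2f} credit/s")
print(f"sieve_size 28 -> 30: {change:+.1%} credit/s")
```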

Jon Fox
Joined: 6 Sep 09
Posts: 36
Credit: 351,380,599
RAC: 260,574
Message 23660 - Posted: 29 Dec 2016, 9:16:16 UTC - in response to Message 21584.

I wanted to update this thread with the optimization results I am seeing for the larger CPU WUs. The optimized larger CPU WUs show a 28-32% decrease in runtime, CPU time, and, of course, credit compared with similar WUs run with the default configuration settings.

My config settings:

Collatz Config Settings:
verbose 1 (yes)
lut_size 18 (2097152 bytes)
sieve_size 2^30 (51085096 bytes)
cache_sieve 1 (yes)
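
The byte count logged next to lut_size above is consistent with a table of 2^lut_size entries at 8 bytes each. The 8-byte entry width is inferred from that one logged number, not taken from the application's source, so treat the following as a rough sketch for estimating the table size at other lut_size values:

```python
# Assumption inferred from the log line above: 2**18 * 8 == 2097152.
ASSUMED_BYTES_PER_ENTRY = 8


def lut_bytes(lut_size: int, bytes_per_entry: int = ASSUMED_BYTES_PER_ENTRY) -> int:
    """Estimated lookup-table size in bytes for a given lut_size exponent."""
    return (1 << lut_size) * bytes_per_entry


for exponent in (17, 18, 20):
    size = lut_bytes(exponent)
    print(f"lut_size={exponent}: {size} bytes ({size / 2**20:.0f} MiB)")
```

The sieve_size byte count in the same log does not follow a simple power-of-two rule, so no attempt is made to estimate it here.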


Here's the start-up log:

    Tue Dec 27 10:51:24 2016 | | Starting BOINC client version 7.6.33 for x86_64-apple-darwin
    Tue Dec 27 10:51:24 2016 | | log flags: file_xfer, task, slot_debug
    Tue Dec 27 10:51:24 2016 | | Libraries: libcurl/7.47.1 OpenSSL/1.0.2g zlib/1.2.8 c-ares/1.10.0
    Tue Dec 27 10:51:24 2016 | | Data directory: /Library/Application Support/BOINC Data
    Tue Dec 27 10:51:24 2016 | | OpenCL: NVIDIA GPU 0: GeForce GT 750M (driver version 10.14.20 355.10.05.15f03, device version OpenCL 1.2, 1024MB, 1024MB available, 178 GFLOPS peak)
    Tue Dec 27 10:51:24 2016 | | OpenCL CPU: Intel(R) Core(TM) i5-4570S CPU @ 2.90GHz (OpenCL driver vendor: Apple, driver version 1.1, device version OpenCL 1.2)
    Tue Dec 27 10:51:24 2016 | | App version needs CUDA but GPU doesn't support it
    Tue Dec 27 10:51:24 2016 | Moo! Wrapper | Application uses missing NVIDIA GPU
    Tue Dec 27 10:51:29 2016 | | Host name: Jons-iMac.local
    Tue Dec 27 10:51:29 2016 | | Processor: 4 GenuineIntel Intel(R) Core(TM) i5-4570S CPU @ 2.90GHz [x86 Family 6 Model 60 Stepping 3]
    Tue Dec 27 10:51:29 2016 | | Processor features: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clfsh ds acpi mmx fxsr sse sse2 ss htt tm pbe pni pclmulqdq dtes64 mon dscpl vmx smx est tm2 ssse3 fma cx16 tpr pdcm sse4_1 sse4_2 x2apic movbe popcnt aes pcid xsave osxsave seglim64 tsctmr avx rdrand f16c
    Tue Dec 27 10:51:29 2016 | | OS: Mac OS X 10.12.2 (Darwin 16.3.0)
    Tue Dec 27 10:51:29 2016 | | Memory: 8.00 GB physical, 780.01 GB virtual
    Tue Dec 27 10:51:29 2016 | | Disk: 930.71 GB total, 779.77 GB free
    Tue Dec 27 10:51:29 2016 | | Local time is UTC -5 hours



--
Jon

dem0707
Joined: 7 Dec 15
Posts: 6
Credit: 2,013,541,880
RAC: 0
Message 23694 - Posted: 6 Jan 2017, 23:14:13 UTC

I just want to throw in my 2 cents' worth on this optimization thing. Sadly, I have no experience with the new 10XX GPUs, and I have not tinkered with my setups since the new 8x-sized WUs started. But I have had good luck with the following on my 5 GTX 980s:

    <app_config>
    <app>
    <name>collatz_sieve</name>
    <max_concurrent>5</max_concurrent>
    <gpu_versions>
    <gpu_usage>1.0</gpu_usage>
    <cpu_usage>0.4</cpu_usage>
    </gpu_versions>
    </app>
    </app_config>


I have always found that for Collatz, running more than 1 WU per GPU will only slow things down.
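
For anyone who wants to generate the app_config.xml above rather than type it, here is a minimal sketch. In BOINC's app_config.xml, gpu_usage is the fraction of one GPU reserved per task, so 1.0 means one task per GPU and 0.5 means two. The function name build_app_config and its parameters are made up for this example.

```python
import xml.etree.ElementTree as ET


def build_app_config(app_name: str, tasks_per_gpu: int, cpu_usage: float,
                     max_concurrent: int) -> str:
    """Return app_config.xml text that runs `tasks_per_gpu` tasks on each GPU."""
    root = ET.Element("app_config")
    app = ET.SubElement(root, "app")
    ET.SubElement(app, "name").text = app_name
    ET.SubElement(app, "max_concurrent").text = str(max_concurrent)
    gpu = ET.SubElement(app, "gpu_versions")
    # gpu_usage is the fraction of one GPU reserved per task:
    # 1.0 -> one task per GPU, 0.5 -> two tasks per GPU.
    ET.SubElement(gpu, "gpu_usage").text = str(1.0 / tasks_per_gpu)
    ET.SubElement(gpu, "cpu_usage").text = str(cpu_usage)
    return ET.tostring(root, encoding="unicode")


# Five GPUs, one task each, matching the settings in the post above.
xml_text = build_app_config("collatz_sieve", tasks_per_gpu=1,
                            cpu_usage=0.4, max_concurrent=5)
print(xml_text)
```

The output is a single unindented line of XML; the element content is the same as the hand-written file above.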

    verbose=0
    kernels_per_reduction=48
    threads=8
    lut_size=17
    sleep=0
    cache_sieve=1
    reduce_cpu=0
    sieve_size=30


And don't be afraid to go for the faster time per WU even if the credit is less. I found that the credit per second was always better. I think that with lut_size optimized, a GPU can get a WU done in less time with less work (thus less credit) by not having to recalculate some of the same things over and over. I think this is even more important with the new, bigger WUs.

Happy New Year to everyone, and keep on Crunching(@EVGA)!!!

David

bcavnaugh
Joined: 24 Mar 14
Posts: 19
Credit: 1,620,803,392
RAC: 812,434
Message 23702 - Posted: 9 Jan 2017, 0:47:21 UTC - in response to Message 23694.
Last modified: 9 Jan 2017, 0:52:31 UTC

Thanks for the info. My GTX Titans take 40-45 minutes to complete one task using:

    <app_config>
    <app>
    <name>collatz_sieve</name>
    <max_concurrent>2</max_concurrent>
    <gpu_versions>
    <gpu_usage>1.0</gpu_usage>
    <cpu_usage>0.5</cpu_usage>
    </gpu_versions>
    </app>
    </app_config>

I will try your settings and see if I can get back down to 8-10 minutes, which is how long they used to run before the run times were made longer.
My GTX 1080 cards with the same settings take about 12 minutes.
My AMD 290X cards ran well too, but they now take 20 minutes; before the change they took less than 6 minutes to complete.

I ran 2 tasks per GPU all last year with the above times, but now you really cannot do this anymore. What a bummer.

Where do I put this?

verbose=0
kernels_per_reduction=48
threads=8
lut_size=17
sleep=0
cache_sieve=1
reduce_cpu=0
sieve_size=30
____________
Crunching@EVGA The Number One Team in the BOINC Community.
Folding@EVGA The Number One Team in the Folding@Home Community.

EG
Joined: 9 Jun 13
Posts: 74
Credit: 28,731,858,336
RAC: 27,522,226
Message 23706 - Posted: 9 Jan 2017, 7:35:05 UTC - in response to Message 23702.
Last modified: 9 Jan 2017, 7:35:49 UTC

.........

I ran 2 tasks per GPU all last year with the above times, but now you really cannot do this anymore. What a bummer.

Where do I put this?

verbose=0
kernels_per_reduction=48
threads=8
lut_size=17
sleep=0
cache_sieve=1
reduce_cpu=0
sieve_size=30


You paste that into the two ****.config files contained in the Collatz project directory and save each one as a plain ASCII text file.

Put it in both, and then I would ditch the app_config file as you won't need it anymore.
Running a single WU will run faster on AMD hardware than running two.
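
The manual step described above could also be scripted. This is a rough sketch, assuming the project directory contains only the Collatz *.config files you want to overwrite. The helper names and the '*.config' glob pattern are illustrative and not part of BOINC or Collatz; the settings are the ones quoted above.

```python
from pathlib import Path

# Settings quoted in the post above.
SETTINGS = {
    "verbose": 0,
    "kernels_per_reduction": 48,
    "threads": 8,
    "lut_size": 17,
    "sleep": 0,
    "cache_sieve": 1,
    "reduce_cpu": 0,
    "sieve_size": 30,
}


def render_settings(settings: dict) -> str:
    """Render settings as key=value text, one setting per line."""
    return "".join(f"{key}={value}\n" for key, value in settings.items())


def write_settings(project_dir: Path, settings: dict) -> list:
    """Overwrite every *.config file found directly in project_dir.

    Returns the list of files written. Matching on '*.config' is an
    assumption; check that it only picks up the Collatz config files.
    """
    text = render_settings(settings)
    written = []
    for config_path in sorted(project_dir.glob("*.config")):
        # encoding="ascii" keeps the file plain ASCII, as the post advises.
        config_path.write_text(text, encoding="ascii")
        written.append(config_path)
    return written


# Show what would be written; call write_settings() with your own
# project directory to actually update the files.
print(render_settings(SETTINGS), end="")
```

Stop BOINC (or at least suspend the project) before overwriting the files, and keep a backup of the originals in case a setting does not suit your card.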
____________

Grigory Kostykov
Joined: 7 Oct 12
Posts: 19
Credit: 3,429,137,042
RAC: 0
Message 23711 - Posted: 10 Jan 2017, 2:24:06 UTC - in response to Message 23706.

C:\ProgramData\BOINC\projects\boinc.thesonntags.com_collatz\

PUT THIS: (best for 1080)

verbose=1
kernels_per_reduction=48
sleep=1
threads=9
lut_size=17
reduce_CPU=0
sieve_size=30
cache_sieve=1

INSIDE THIS

collatz_sieve_1.21_windows_x86_64__opencl_nvidia_gpu.config

It will utilize 99-100% of the GPU.

Run only one task at a time!

Luke Formosa
Joined: 17 Mar 14
Posts: 1
Credit: 113,192,841
RAC: 165,688
Message 23947 - Posted: 4 Apr 2017, 22:06:23 UTC

Can anyone point me in the right direction for a GTX 960M (yep, a laptop card) and an Intel 4720HQ (HD 4600 iGPU)?

Currently I'm using the following (found on the forums) as a starting point, but tasks take 1-2 hours to complete on GPU (and 8 hours on the iGPU), so optimisation is going very slowly!

verbose=1
kernels_per_reduction=56
threads=8
lut_size=17
sleep=1
reduce_cpu=0
sieve_size=28

Joe
Joined: 16 Sep 09
Posts: 5
Credit: 902,215,476
RAC: 1,783,657
Message 23950 - Posted: 6 Apr 2017, 16:06:09 UTC - in response to Message 23711.

C:\ProgramData\BOINC\projects\boinc.thesonntags.com_collatz\

PUT THIS: (best for 1080)

verbose=1
kernels_per_reduction=48
sleep=1
threads=9
lut_size=17
reduce_CPU=0
sieve_size=30
cache_sieve=1

INSIDE THIS

collatz_sieve_1.21_windows_x86_64__opencl_nvidia_gpu.config

It will utilize 99-100% of the GPU.

Run only one task at a time!


Thanks, that works great for my 1080 Ti! About 4:40 for one WU. With my old config I needed about 6:15 for one WU...
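
To put a number on that improvement, here is a small sketch that converts the two m:ss times quoted above into seconds and reports the relative runtime reduction. The helper names are made up for this example.

```python
def to_seconds(clock: str) -> int:
    """Convert an 'm:ss' string such as '4:40' to seconds."""
    minutes, seconds = clock.split(":")
    return int(minutes) * 60 + int(seconds)


def runtime_reduction(old: str, new: str) -> float:
    """Fraction by which the runtime dropped going from old to new."""
    old_s, new_s = to_seconds(old), to_seconds(new)
    return (old_s - new_s) / old_s


# Times quoted in the post above: old config 6:15, new config 4:40.
reduction = runtime_reduction("6:15", "4:40")
print(f"{reduction:.1%} shorter runtime")
```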

ill87
Joined: 6 Jun 17
Posts: 1
Credit: 44,074,136
RAC: 0
Message 24049 - Posted: 7 Jun 2017, 5:40:04 UTC

A little bit of thread necromancy...

Seeing some great results with
verbose=1
kernels_per_reduction=48
threads=8
lut_size=17
sleep=1
reduce_cpu=0
sieve_size=29

However I think I can do a little bit better with a little help from my friends?

Intel(R) Core(TM) i7-7700K CPU @ 4.20GHz [Family 6 Model 158 Stepping 9](8 processors) NVIDIA GeForce GTX 1080 (4095MB) driver: 382.33 OpenCL: 1.02

The 1080 is an ASUS STRIX and runs at 1607 MHz/5005 MHz GPU/memory clocks with no problem; see here: https://www.newegg.ca/Product/Product.aspx?Item=N82E16814126116

I have a super fast SSD and a 4.5 GHz OC on the processor (though presently I am only doing GPU work).

Is there any way I can modify this further to enhance my runtime?

As you can see, there is a very similar PC on my account as well, with the same processor and SSD but a GTX 1070, also an ASUS STRIX. I am getting similar run times on both, and while I know it's not always the case, I assume the 1080 should beat the 1070 if the settings are optimized.
I'm seeing 95% usage at these settings, but my cooling is good and I want 100%.


Thanks in advance for reading and for your help.

NN
Joined: 27 Jun 10
Posts: 3
Credit: 100,263,120
RAC: 2,117
Message 24120 - Posted: 6 Jul 2017, 14:27:30 UTC - in response to Message 24049.
Last modified: 6 Jul 2017, 14:40:18 UTC

Hello ill87,

I found a way to catch this last 5% with my good ol' GTX 650: JUST START A MOVIE (not only the player) and stop (don't close) it!
I don't know why, but GPU-Z then shows a usage of 100% and the runtime decreases from nearly 1h49min to about 1h44min.
I hope this works on your machine as well, please let me know about it :)

My settings are:
kernels_per_reduction=2
threads=7
lut_size=20
sieve_size=29
reduce_cpu=0
verbose=1
on a Windows 7 box (Core i7 2600) with the 382.33 driver.

With greetings

NN

JOHN
Joined: 8 Feb 10
Posts: 2
Credit: 1,706,791,080
RAC: 9,576,612
Message 24139 - Posted: 23 Jul 2017, 19:21:09 UTC

verbose=1
kernels_per_reduction=48
sleep=1
threads=9
lut_size=18
reduce_CPU=0
sieve_size=30
cache_sieve=1
I've been using this on my 1080 Ti and getting times of 4:10-4:20. I've got the core clock turned up to 150 and the memory to 175. It stays under 150 temp-wise.

Eudy Carvalhaes
Joined: 5 Aug 17
Posts: 3
Credit: 44,022,198
RAC: 1
Message 24169 - Posted: 7 Aug 2017, 4:29:31 UTC - in response to Message 23628.
Last modified: 7 Aug 2017, 4:32:46 UTC

Anyone have some good settings for a GTX 1060? Currently using;
verbose=1
kernels_per_reduction=48
threads=8
lut_size=17
sleep=1
reduce_cpu=0
sieve_size=26
cache_size=1

Tasks are taking about 15 minutes each. That seems a bit long with the GPU showing 98% load.


This is what I'm using (GTX 1060, 3GB) (the significant differences from your settings are threads=10, sieve_size=30):


verbose=0
kernels_per_reduction=48
sleep=1
threads=10
lut_size=17
reduce_CPU=0
sieve_size=30
cache_sieve=0


It's taking around 14 minutes per WU.

Previously, when I had no optimization, it was taking around 30 minutes/WU.
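
To see exactly which settings differ between the quoted list and mine, the two can be diffed mechanically. This is a sketch with made-up helper names. Lower-casing the keys is a convenience of the sketch; whether the application treats reduce_CPU and reduce_cpu as the same key is not stated anywhere in this thread.

```python
def parse_config(text: str) -> dict:
    """Parse key=value lines into a dict with lower-cased keys."""
    settings = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or "=" not in line:
            continue
        key, value = line.split("=", 1)
        settings[key.strip().lower()] = value.strip()
    return settings


def diff_configs(old: dict, new: dict) -> dict:
    """Map each differing key to an (old_value, new_value) pair."""
    keys = sorted(set(old) | set(new))
    return {k: (old.get(k), new.get(k)) for k in keys if old.get(k) != new.get(k)}


quoted = parse_config("""
verbose=1
kernels_per_reduction=48
threads=8
lut_size=17
sleep=1
reduce_cpu=0
sieve_size=26
cache_size=1
""")

mine = parse_config("""
verbose=0
kernels_per_reduction=48
sleep=1
threads=10
lut_size=17
reduce_CPU=0
sieve_size=30
cache_sieve=0
""")

# A value of None means the key is missing from that list.
for key, (before, after) in diff_configs(quoted, mine).items():
    print(f"{key}: {before} -> {after}")
```

Besides threads and sieve_size, the diff also surfaces that the quoted list has cache_size where the other posts in this thread use cache_sieve.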

HTH

Eudy Carvalhaes
Joined: 5 Aug 17
Posts: 3
Credit: 44,022,198
RAC: 1
Message 24174 - Posted: 8 Aug 2017, 20:53:32 UTC - in response to Message 24169.
Last modified: 8 Aug 2017, 20:54:38 UTC

Increased kernels_per_reduction to 64.
Now each WU is crunched in a little bit less than 14 min, on average.

For a GTX 1060 3GB:


verbose=0
kernels_per_reduction=64
sleep=1
threads=10
lut_size=17
reduce_CPU=0
sieve_size=30
cache_sieve=0

Eudy Carvalhaes
Joined: 5 Aug 17
Posts: 3
Credit: 44,022,198
RAC: 1
Message 24187 - Posted: 16 Aug 2017, 6:39:15 UTC - in response to Message 24174.

I've just found a better setting yet for my GTX 1060 3GB:


verbose=0
kernels_per_reduction=64
sleep=1
threads=9
lut_size=17
reduce_CPU=0
sieve_size=30
cache_sieve=0

Here are my valid tasks.

HassanShebli
Joined: 28 Oct 10
Posts: 10
Credit: 16,976,418
RAC: 15
Message 24261 - Posted: 13 Sep 2017, 11:27:28 UTC

Greetings:

I have an AMD 6970 and I am using only the GPU on Collatz.

Is there a way to crunch two WUs at once? Is it faster than a single WU?

Please give me a file that I can copy to my directory.

Thanks

mikey
Joined: 11 Aug 09
Posts: 3242
Credit: 1,688,614,392
RAC: 6,003,828
Message 24315 - Posted: 1 Oct 2017, 21:29:11 UTC - in response to Message 24261.

Greetings:

I have an AMD 6970 and I am using only the GPU on Collatz.

Is there a way to crunch two WUs at once? Is it faster than a single WU?

Please give me a file that I can copy to my directory.

Thanks


Yes, you can, but no, it's not faster!



Copyright © 2018 Jon Sonntag; All rights reserved.