Optimizing the apps

Arnulf

Joined: 30 Oct 17
Posts: 3
Credit: 829,542,626
RAC: 8,107,547
Message 69 - Posted: 20 Apr 2018, 15:04:05 UTC

First of all, thanks to those who do the hard work of getting Collatz up and running!

I have removed and re-added the project, and in the process deleted most of my notes on how I optimized the GPU client.
Just as a test of my recent AMD RX Vega 64, I copied and pasted a setup from an Nvidia 1080 that I had taken some notes on.
As you can see from this result: https://boinc.thesonntags.com/collatz/result.php?resultid=72530 it seems to work OK, at least better than the standard blank .config file.

The new units are trundling by at a rate of about one every 6 minutes 20 to 30 seconds.

This is my .config:

verbose=1
kernels_per_reduction=48
threads=8
lut_size=17
sieve_size=30
cache_sieve=1
sleep=1
reduce_cpu=0

Does anyone have the original first post on optimizing?
It mentioned the parameters and how you could adjust them towards faster results.
The following posts had many good recommendations for different GPUs - I miss them too! :D

With regards, Arnulf
ID: 69
mikey
Joined: 11 Aug 09
Posts: 83
Credit: 4,858,524,563
RAC: 37,899,070
Message 71 - Posted: 20 Apr 2018, 15:32:42 UTC - in response to Message 69.  

This should help:

For Linux:
It may be in boinc-client in your home directory. It could also be in /var/lib/boinc-client

For Windows:
C:\ProgramData\boinc\projects\collatz\ ...BOTH config files

"Here are some of my experiences per setting:

verbose=1 Leave on so you can check your results

kernels_per_reduction=48 Very little effect; if anything, higher values give a smoother desktop/video. I have yet to cause a crash with this setting.

threads=8 This is the most sensitive setting, IMO. You can't go over 8 on ATI; 8 seems best for most Nvidias, though you might try 9 on a newer 900 series.

lut_size=15 2nd most critical. Adjust it up until times begin to slow down. I believe Nvidia (especially Maxwell) can go rather high; I have 17 on a 750 Ti and 16 on a 7870.

sleep=0 haven't touched it

cache_sieve=1 haven't touched it

reduce_cpu=0 haven't touched it

sieve_size=30 Third most critical; 30 is pretty high for most cards. I believe my 7970 ran at 30, the 7870 at 29, 28 for the 750 Ti and 26 for the GTX 560. If you crank this down to, say, 24 and see your GPU utilization is below 99%, turn it up until you're at 99% or it crashes." (A small helper for applying these changes one at a time is sketched after the card list below.)


AMD 290X
verbose=1
items_per_kernel=22
kernels_per_reduction=12
threads=8
lut_size=16
sleep=1

AMD 390X
verbose=1
kernels_per_reduction=48
threads=8
lut_size=17
sleep=1
reduce_cpu=0
sieve_size=30

AMD 480
verbose=1
kernels_per_reduction=48
threads=8
lut_size=17
sieve_size=30
sleep=1
cache_sieve=1
reduce_cpu=0

Nvidia 1060
verbose=0
kernels_per_reduction=48
sleep=1
threads=10
lut_size=17
reduce_cpu=0
sieve_size=30
cache_sieve=0

Nvidia 1070
verbose=1
kernels_per_reduction=48
threads=8
lut_size=17
sleep=1
reduce_cpu=0
sieve_size=28

Nvidia 1080
verbose=1
kernels_per_reduction=48
sleep=1
threads=8
lut_size=17
reduce_cpu=0
sieve_size=25
cache_sieve=1

Dual Nvidia 1080s
verbose=1
kernels_per_reduction=48
sleep=1
threads=8
lut_size=17
reduce_cpu=0
sieve_size=29
cache_sieve=1

Nvidia 1080Ti
verbose=1
kernels_per_reduction=48
sleep=1
threads=9
lut_size=18
reduce_cpu=0
sieve_size=30
cache_sieve=1

AMD 7850
verbose=1
kernels_per_reduction=10
threads=7
lut_size=16
sleep=1
reduce_cpu=1

AMD 7970
verbose=1
threads=8
kernels_per_reduction=64
sieve_size=30
lut_size=16
reduce_cpu=0
sleep=1

Nvidia 660
verbose=1
kernels_per_reduction=48
threads=8
lut_size=15
sleep=0
cache_sieve=1
reduce_cpu=0
sieve_size=30

Nvidia 680
lut_size=18
kernels_per_reduction=64
sieve_size=30
verbose=1
threads=10
sleep=1
reduce_cpu=0

Nvidia 660Ti
verbose=1
kernels_per_reduction=48
threads=8
lut_size=15
sleep=0
cache_sieve=1
reduce_cpu=0
sieve_size=30

Nvidia 980Ti
lut_size=18
kernels_per_reduction=64
sieve_size=30
verbose=1
threads=10
sleep=1
reduce_cpu=0

Nvidia 980
verbose=0
kernels_per_reduction=48
threads=8
lut_size=17
sleep=0
cache_sieve=1
reduce_cpu=0
sieve_size=30

Tesla 10
verbose=1
kernels_per_reduction=48
threads=8
lut_size=15
sleep=0
cache_sieve=1
reduce_cpu=0
sieve_size=30

Fury Nano
verbose=1
kernels_per_reduction=48
threads=8
lut_size=17
sleep=0
cache_sieve=1
reduce_cpu=0
sieve_size=30

Nvidia 750Ti
verbose=1
kernels_per_reduction=48
threads=8
lut_size=17
sleep=0
cache_sieve=1
reduce_cpu=0
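
As referenced above, here is a minimal sketch (my own, in Python, not official Collatz tooling) for changing one setting at a time while you follow the tuning procedure. Point it at your own project directory (see the Linux/Windows locations near the top of this post); the config file name is whatever yours is called.

import sys
from pathlib import Path

def set_option(config_path, key, value):
    """Set key=value in a Collatz-style config file (plain key=value lines)."""
    path = Path(config_path)
    lines = path.read_text().splitlines() if path.exists() else []
    out, found = [], False
    for line in lines:
        # Match the key on the left of '=', ignoring case and whitespace.
        if "=" in line and line.split("=", 1)[0].strip().lower() == key.lower():
            out.append(f"{key}={value}")
            found = True
        else:
            out.append(line)
    if not found:
        out.append(f"{key}={value}")
    path.write_text("\n".join(out) + "\n")

if __name__ == "__main__":
    # Example: python set_option.py collatz.config sieve_size 29
    set_option(sys.argv[1], sys.argv[2], sys.argv[3])
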
ID: 71
Arnulf

Joined: 30 Oct 17
Posts: 3
Credit: 829,542,626
RAC: 8,107,547
Message 72 - Posted: 20 Apr 2018, 17:07:51 UTC - in response to Message 71.  

Thanks!

My findings so far on the Vega 64 (small sample size, but I have seen consistent numbers today, and I'm using 06:06 as the reference time with the settings I mentioned in the first post):

lut_size=17 as the reference value; will test other settings.
- Changing lut_size from 17 to 18 slows processing down a lot, going from around 06:06 up to 11:06, 5 minutes more!
- Setting it to 16 results in a small increase in processing time, ending at 06:28.

I'm leaving it at 17 for now.

kernels_per_reduction=48 as the reference value; will test other settings.
- Changing it to 64 seems to have no effect.
- Changing it to 32 results in a slight increase in processing time, 3-5 secs.

I'm leaving it at 48 for now.

sieve_size=30 as the reference value; will test other settings.
- Changing it to 32 makes the tasks fail.
- Changing it to 28 results in an increase in processing time, almost a minute slower.

Conclusion for now: I will use the setup I mentioned in the first post.
If I find other settings that speed up processing I will post them here!
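
For anyone repeating this kind of A/B test, a minimal sketch (mine, not Arnulf's script) that converts the mm:ss times above into percent change against the 06:06 reference:

def to_seconds(mmss):
    """Convert an 'mm:ss' task time to seconds."""
    m, s = mmss.split(":")
    return int(m) * 60 + int(s)

ref = to_seconds("06:06")  # reference time: 366 s
for label, t in [("lut_size=18", "11:06"), ("lut_size=16", "06:28")]:
    cur = to_seconds(t)
    print(f"{label}: {cur} s ({(cur - ref) / ref:+.1%} vs reference)")
# lut_size=18: 666 s (+82.0% vs reference)
# lut_size=16: 388 s (+6.0% vs reference)
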
ID: 72
Cautilus

Joined: 29 Jul 14
Posts: 4
Credit: 1,710,247,258
RAC: 35,105,194
Message 81 - Posted: 21 Apr 2018, 10:52:50 UTC

This is what I'm using on my TITAN V, roughly 3 minutes per task with 2 tasks running concurrently:
verbose=1
kernels_per_reduction=48
threads=7
lut_size=19
sleep=1
reduce_cpu=0
sieve_size=30
ID: 81
nedmanjo

Joined: 7 Feb 16
Posts: 8
Credit: 667,680,816
RAC: 779,417
Message 83 - Posted: 21 Apr 2018, 14:19:13 UTC - in response to Message 71.  
Last modified: 21 Apr 2018, 14:19:22 UTC

Any chance anyone has an optimization for a Titan Black? Also, what's the best way to determine if the changes are improving computational speed? Do you just run a task and check whether the task time is decreasing or increasing?
ID: 83
nedmanjo

Joined: 7 Feb 16
Posts: 8
Credit: 667,680,816
RAC: 779,417
Message 84 - Posted: 21 Apr 2018, 14:22:18 UTC - in response to Message 83.  

One other question. What are your thoughts on running 1 task per GPU or more? I've run as many as 4 on each Titan Black GPU.
ID: 84
Renato

Joined: 19 Apr 11
Posts: 12
Credit: 4,712,043,662
RAC: 14,511,720
Message 88 - Posted: 21 Apr 2018, 20:56:19 UTC

I have found that with one WU on an Nvidia 1080 Ti the GPU is only ~65-70% busy.
That's why I run 3 WUs in parallel; that also ensures the pauses when WUs finish and start are filled in.
One WU needs 5:32; three WUs in parallel need 11:12 each.
I control the GPU load/temperature with MSI Afterburner.
ID: 88
EG

Joined: 9 Jun 13
Posts: 7
Credit: 30,161,400,316
RAC: 79,207
Message 105 - Posted: 23 Apr 2018, 1:26:57 UTC - in response to Message 88.  
Last modified: 23 Apr 2018, 1:27:22 UTC

I have found that with one WU on an Nvidia 1080 Ti the GPU is only ~65-70% busy.
That's why I run 3 WUs in parallel; that also ensures the pauses when WUs finish and start are filled in.
One WU needs 5:32; three WUs in parallel need 11:12 each.
I control the GPU load/temperature with MSI Afterburner.


Sounds to me like you're not using the configuration file....

Start using it and you will soon drop to one per GPU....

Two (or more) per GPU for me resulted in WUs erroring out at a far greater rate than any benefit gained.
ID: 105
mikey
Joined: 11 Aug 09
Posts: 83
Credit: 4,858,524,563
RAC: 37,899,070
Message 109 - Posted: 23 Apr 2018, 11:17:00 UTC - in response to Message 105.  

I have found that with one WU on an Nvidia 1080 Ti the GPU is only ~65-70% busy.
That's why I run 3 WUs in parallel; that also ensures the pauses when WUs finish and start are filled in.
One WU needs 5:32; three WUs in parallel need 11:12 each.
I control the GPU load/temperature with MSI Afterburner.


Sounds to me like you're not using the configuration file....

Start using it and you will soon drop to one per GPU....

Two (or more) per GPU for me resulted in WUs erroring out at a far greater rate than any benefit gained.


You are correct, he is NOT using the optimization codes; he's running each unit at about 600 seconds or more!! My own 1080Ti is doing them in about 250 seconds per workunit, one at a time.
ID: 109
mikey
Joined: 11 Aug 09
Posts: 83
Credit: 4,858,524,563
RAC: 37,899,070
Message 110 - Posted: 23 Apr 2018, 11:21:19 UTC - in response to Message 88.  

I have found that with one WU on an Nvidia 1080 Ti the GPU is only ~65-70% busy.
That's why I run 3 WUs in parallel; that also ensures the pauses when WUs finish and start are filled in.
One WU needs 5:32; three WUs in parallel need 11:12 each.
I control the GPU load/temperature with MSI Afterburner.


Stop running several at a time and put the following codes in the "*.config" file in C:\ProgramData\boinc\projects\collatz

verbose=1
kernels_per_reduction=48
sleep=1
threads=9
lut_size=18
reduce_cpu=0
sieve_size=30
cache_sieve=1

Use Notepad if you are running Windows to copy and paste the lines; no need to stop BOINC, they will take effect on the very next unit you run.

Currently your workunits are taking about 650 seconds each; my 1080Ti is doing one unit at a time in about 250 seconds each with those codes, and the GPU is loaded at 100%.
ID: 110
Renato

Joined: 19 Apr 11
Posts: 12
Credit: 4,712,043,662
RAC: 14,511,720
Message 118 - Posted: 23 Apr 2018, 14:10:26 UTC - in response to Message 110.  
Last modified: 23 Apr 2018, 14:16:19 UTC

Currently your workunits are taking about 650 seconds each; my 1080Ti is doing one unit at a time in about 250 seconds each with those codes, and the GPU is loaded at 100%.


I don't agree with you.
I run three WUs simultaneously and each needs ~11:50; that means 11:50 = 710 sec, and 710 / 3 ≈ 236 sec per WU.
Of course, the difference from your 250 sec is small.
I have now activated your configuration and get ~230 sec.
You could probably test dozens of configurations until you find the optimum.
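
A quick check of that arithmetic as a small sketch (the helper name is mine, not from the thread): with N tasks in parallel, the effective time per WU is the wall-clock time divided by N.

def effective_seconds(mmss, concurrent):
    """Wall-clock 'mm:ss' per task divided by the number of concurrent tasks."""
    m, s = mmss.split(":")
    return (int(m) * 60 + int(s)) / concurrent

print(effective_seconds("05:32", 1))  # one at a time: 332.0 s
print(effective_seconds("11:50", 3))  # three in parallel: ~236.7 s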

At the moment I am in 3rd place: https://boinc.thesonntags.com/collatz/top_hosts.php
ID: 118
EG

Joined: 9 Jun 13
Posts: 7
Credit: 30,161,400,316
RAC: 79,207
Message 119 - Posted: 23 Apr 2018, 17:29:39 UTC - in response to Message 118.  

.......
At the moment I am in 3rd place: https://boinc.thesonntags.com/collatz/top_hosts.php


Don't mean to burst your bubble, but......

You're basing your idea of how well your machine is producing on one-week-old RAC?

It's going to take two months of solid, steady crunching for RAC to stabilize enough to be any judge of how well a machine is producing.

I've got a machine that is producing 12.8 million a day right now all by its lonesome, but its RAC is only 3.2 million.

So you're producing at 227 seconds per WU on average; that is roughly 380.5 WUs per day (86,400 sec/day divided by your 227 sec per WU).

That 380.5 WUs per day times the 27,200 credit per WU average you're getting equals 10.35 million per day.

Pencils out right where a 1080Ti should be: twice the RAC you're now demonstrating.

We have been doing this a while.

That machine will probably be down below 20th position when the dual-machine RACs catch up.... Pretty much average for a single 1080Ti....
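
Working those numbers as a quick sketch (variable names are mine):

sec_per_wu = 227            # average seconds per WU
credit_per_wu = 27_200      # average credit per WU
wus_per_day = 86_400 / sec_per_wu   # ~380.6 WUs/day
print(wus_per_day * credit_per_wu)  # ~10,352,775 credit/day, i.e. ~10.35 million
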
ID: 119
Anthony Ayiomamitis

Joined: 21 Jan 15
Posts: 9
Credit: 2,388,025,014
RAC: 10,076,740
Message 120 - Posted: 23 Apr 2018, 18:59:23 UTC
Last modified: 23 Apr 2018, 19:05:24 UTC

My GTX 1060 has gone from 13 minutes per task (pre-downtime) to about 16 minutes (post-downtime), and this is using the original parms I had in place before the temporary downtime. I know that others are still producing results at around 13 minutes per unit. I have tried various changes but no luck. Can someone assist me in this regard?

What is puzzling is that I applied my original GTX 1060 parms to my GTX 1080, since I did not have a copy of the parms for the latter saved, and my times are as expected at just under 6 minutes per task (i.e. 345-350 seconds).

Thanks!

------------- cut here --------------
verbose=0
kernels_per_reduction=48
threads=10
lut_size=17
sieve_size=30
sleep=1
cache_sieve=1
reduce_cpu=0
---------------------------------------
ID: 120
Slicker
Project administrator

Joined: 11 Jun 09
Posts: 38
Credit: 766,049,588
RAC: 706,753
Message 124 - Posted: 23 Apr 2018, 21:20:24 UTC

sleep=0 Any setting other than 0 will likely slow things down, as it literally sleeps after submitting the kernel to the GPU.

cache_sieve=1 Any setting other than 1 will add several seconds to the run time, as it will re-create the sieve for each WU rather than re-using it.

reduce_cpu=0 This does nothing in the current version. The summarization of the kernels is done on the GPU, and then the best result is checked on the CPU to make sure they match. If not, it reports an error.
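
To illustrate the sleep behavior, a rough sketch only (not the actual Collatz source; enqueue_kernel and wait_done are placeholder callables):

import time

def run_kernels(enqueue_kernel, wait_done, n_kernels, sleep_ms=0):
    for _ in range(n_kernels):
        enqueue_kernel()                   # submit work to the GPU
        if sleep_ms > 0:
            # Sleep instead of polling: lower CPU use, slower turnaround.
            time.sleep(sleep_ms / 1000.0)
        wait_done()                        # block until the kernel completes
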
ID: 124
Brent

Joined: 25 Jun 14
Posts: 6
Credit: 210,558,241
RAC: 169,317
Message 125 - Posted: 23 Apr 2018, 21:31:34 UTC - in response to Message 71.  
Last modified: 23 Apr 2018, 21:32:53 UTC

This should help:

[... per-setting tuning guide quoted from Message 71 above ...]


Thank you for this info. I am successfully running with the default values (no config file) and would like to optimize my settings. What would help greatly is knowing the default settings, to use as a starting point, since I have not been able to find any settings for my NVIDIA GT 730 GPU. While I fully realize this is not a powerful GPU, it is nevertheless better than nothing. Any help in this area would be greatly appreciated, since I am not a developer or programmer.

Brent
ID: 125
Cautilus

Joined: 29 Jul 14
Posts: 4
Credit: 1,710,247,258
RAC: 35,105,194
Message 130 - Posted: 24 Apr 2018, 2:22:49 UTC

Slicker, thanks for the info; I'll update my config accordingly.

I'll give this a try with my TITAN V when I get home and see if I get an improvement:
verbose=1
cache_sieve=1
kernels_per_reduction=48
threads=7
lut_size=19
sleep=0
sieve_size=30
ID: 130
mikey
Joined: 11 Aug 09
Posts: 83
Credit: 4,858,524,563
RAC: 37,899,070
Message 138 - Posted: 24 Apr 2018, 10:42:46 UTC - in response to Message 120.  

My GTX 1060 has gone from 13 minutes per task (pre-downtime) to about 16 minutes (post-downtime), and this is using the original parms I had in place before the temporary downtime. I know that others are still producing results at around 13 minutes per unit. I have tried various changes but no luck. Can someone assist me in this regard?

What is puzzling is that I applied my original GTX 1060 parms to my GTX 1080, since I did not have a copy of the parms for the latter saved, and my times are as expected at just under 6 minutes per task (i.e. 345-350 seconds).

Thanks!

------------- cut here --------------
verbose=0
kernels_per_reduction=48
threads=10
lut_size=17
sieve_size=30
sleep=1
cache_sieve=1
reduce_cpu=0
---------------------------------------


1060 codes are:
verbose=0
kernels_per_reduction=48
sleep=1
threads=10
lut_size=17
reduce_cpu=0
sieve_size=30
cache_sieve=0

So you are doing just fine; as suggested, you can play with the settings, but some will definitely make things slower.
ID: 138
mikey
Joined: 11 Aug 09
Posts: 83
Credit: 4,858,524,563
RAC: 37,899,070
Message 139 - Posted: 24 Apr 2018, 10:44:27 UTC - in response to Message 118.  

Currently your workunits are taking about 650 seconds each; my 1080Ti is doing one unit at a time in about 250 seconds each with those codes, and the GPU is loaded at 100%.


I don't agree with you.
I run three WUs simultaneously and each needs ~11:50; that means 11:50 = 710 sec, and 710 / 3 ≈ 236 sec per WU.
Of course, the difference from your 250 sec is small.
I have now activated your configuration and get ~230 sec.
You could probably test dozens of configurations until you find the optimum.

At the moment I am in 3rd place: https://boinc.thesonntags.com/collatz/top_hosts.php


Yes, there's always the chance that each GPU will respond positively to some changes.
ID: 139
JOHN

Joined: 8 Feb 10
Posts: 2
Credit: 3,179,270,747
RAC: 15,151,334
Message 155 - Posted: 24 Apr 2018, 21:55:46 UTC

This is where I have my 1080 Ti set currently:

verbose=1
kernels_per_reduction=48
threads=8
lut_size=18
sleep=0
cache_sieve=1
reduce_cpu=0
sieve_size=30

They're running between 235 and 245 sec. Not making anywhere near the credit I was making before the issues started; was getting around 9-10 million a day, now down to a little over 3 million. Oh well.
ID: 155