Posts by HAL9000
1) Message boards : Windows : HD 4890 getting no WUs, says no OpenCL support (Message 24232)
Posted 141 days ago by HAL9000
The HD4000 series only ever had beta OpenCL support, which is a shame since the RX 570 has only fractionally better PPW for DP.
I would probably stick with Cat 12.8, 12.10, or 13.4 for your HD4890. Cat 13.1 has a known compiler bug in APP runtime 1084.4, so it should be avoided for GPGPU processing.

I think Moo is the last project that still supports Brook/CAL processing; the rest have switched to OpenCL. I still run my pair of HD6870's on Moo every so often using the Brook/CAL app. It is faster than the OpenCL app and doesn't require as much overhead. My HD4870 sadly sits on a shelf next to my HD3850.
2) Message boards : Number crunching : Optimizing Collatz Sieve (Message 22351)
Posted 625 days ago by HAL9000
Slicker, would it make sense to read some parameters of the OpenCL device upon app startup to set it optimally? Building upon HAL's comment those could be:

threads:
Should match the GPU's Max work group size from clinfo. 2^7=128, 2^8=256, 2^9=512

and lut_size:
clinfo can read out the amount of L2 cache, so setting something that fits in there is better than setting a too small value just to be safe for any GPU. I.e. for my HD530 that's 524 kB, an unusually large size for such a relatively weak GPU. Increasing the lut size showed very nice performance gains (I'm approaching 2x speed-up in my testing now) over the default setting.

You could easily support a manual override of this as well: "if the user has set anything in the .config file, use this value instead".

MrS

Which clinfo value are you reading to determine the GPU cache size?
For my R9 390X clinfo shows Cache size: 16384. However it has 16 64KB blocks of L2 cache or a total of 1024KB of L2 cache.
For my HD6870 clinfo shows Cache size: 0. However it has 4 128KB blocks of L2 cache or a total of 512KB of L2 cache.
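
A minimal sketch, in Python, of the startup logic MrS suggests above. It is not the Collatz app's code. It assumes the max work group size and the real L2 size are already known (as noted, clinfo's "Cache size" may not be the L2 total), and it assumes 8 bytes per lookup-table entry, inferred from the app reporting lut_size 17 as 1048576 bytes. All function names are hypothetical.

```python
LUT_ENTRY_BYTES = 8  # assumption: inferred from "lut_size 17 (1048576 bytes)"


def threads_exponent(max_work_group_size: int) -> int:
    """Return n such that 2**n is the largest power of two <= the work group size."""
    if max_work_group_size < 1:
        raise ValueError("work group size must be positive")
    return max_work_group_size.bit_length() - 1


def lut_size_for_cache(l2_bytes: int, entry_bytes: int = LUT_ENTRY_BYTES) -> int:
    """Return the largest lut_size whose table (2**lut_size entries) fits in l2_bytes."""
    entries = l2_bytes // entry_bytes
    if entries < 1:
        raise ValueError("cache too small for even one entry")
    return entries.bit_length() - 1


def pick(detected: int, override=None) -> int:
    """Prefer a value the user set in the config file over the detected one."""
    return detected if override is None else override


# Example with the L2 totals quoted above: 1024KB (R9 390X) and 512KB (HD6870).
print(threads_exponent(256), lut_size_for_cache(1024 * 1024), lut_size_for_cache(512 * 1024))
```

The L2 size is passed in rather than read from the device because, as the two cards above show, the value clinfo reports is not a reliable source for it.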
3) Message boards : Number crunching : Optimizing Collatz Sieve (Message 22325)
Posted 629 days ago by HAL9000
Does anybody have good values for my AMD Radeon R9 390 (Grenada)?



I see a 390x making ~5m/day with the following

verbose 1 (yes)
kernels/reduction 32
threads 2^8 (256)
lut_size 17 (1048576 bytes)
sieve_size 2^30 (51085096 bytes)
sleep 1
cache_sieve 1 (yes)
reducecpu 0 (no)

I'm using this configuration on my R9 390x with run times between 55 & 65 seconds.
verbose=1
kernels_per_reduction=48
threads=8
lut_size=17
sleep=1
reduce_cpu=0
sieve_size=30


I found this configuration works well when running two tasks at once on my R9 390X. Run times are mostly in the 80-85 second range, with a few running 115 seconds.
verbose=1
kernels_per_reduction=32
threads=8
lut_size=17
sleep=1
reduce_cpu=0
sieve_size=30

Still using driver 15.12, as I had several problems with 16.3.2 on my system.
4) Message boards : Number crunching : Optimizing Collatz Sieve (Message 22093)
Posted 680 days ago by HAL9000
Many grateful thanks to Joe and Matt. I tried the settings that Joe suggested on my GTX950 as they seemed to work for his Nvidia cards.
    Previously - 422 secs for credit of 4572 = 10.8 cr/sec

    Afterwards - 229 secs for credit of 3596 = 15.7 cr/sec

An apparent increase of 45% output in terms of credit! Possibly some minor extra tweaking could be used, but I'll happily stick with that for the moment.
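
The arithmetic behind those figures can be checked with a short Python sketch. It uses only the run times and credits quoted above; the function name is hypothetical.

```python
def credit_rate(credit: float, seconds: float) -> float:
    """Credit earned per second of run time."""
    return credit / seconds


before = credit_rate(4572, 422)   # stock settings
after = credit_rate(3596, 229)    # after applying the suggested settings
gain_percent = (after / before - 1) * 100

print(f"{before:.1f} cr/sec -> {after:.1f} cr/sec, gain {gain_percent:.0f}%")
```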

I do have one last general Collatz question if I might take a few more moments of your time. On every Boinc project that I know of, the best settings are with the GPU core clock at maximum consistent with stability, and winding the memory clock as low as possible to save on power and heat output. Certainly works at seti.

But I'm told that Collatz also requires high memory speeds as well due to the type of calculations it does. Is this correct?


I would say that is likely true, and other system usage matters as well. I went from running 0 CPU tasks to running 2 climate tasks and Collatz run times went from ~60s to ~90s on my 390X.

Unrelated to your GTX950 but for this thread.
I found values for the HD 6370M in my notebook. Stock values produced some lag.

Great deal of lag and unusable system (note: crunches just fine):
verbose=1
kernels_per_reduction=40
threads=7
lut_size=17
sleep=1
reduce_cpu=0
sieve_size=28

No lag and usable system:
verbose=1
kernels_per_reduction=32
threads=7
lut_size=17
sleep=1
reduce_cpu=0
sieve_size=24
5) Message boards : Number crunching : Your best result so far? (Message 22086)
Posted 681 days ago by HAL9000
http://boinc.thesonntags.com/collatz/highest_steps.php


When does this list get updated? I'm pretty sure it has many updates to it.

It's at the bottom of the page.

Last Update: November 15,2013
I guess something isn't running to generate it.
6) Message boards : Number crunching : Optimizing Collatz Sieve (Message 22065)
Posted 685 days ago by HAL9000
OK thanks Hal. I was looking at the advice you gave to Mindcrime about sieve-size etc.

How I worked out my values.

threads:
Should match the GPU's Max work group size from clinfo. 2^7=128, 2^8=256, 2^9=512

sieve_size:
Increased to the point where any higher value caused computation errors.

lut_size:
Increased until it caused slower run times. In GPUz I would also see increased Memory controller load. Anything over 50% load here caused slower run times.

kernels_per_reduction:
I started with the default and increased it in steps of 8. I found that using steps other than 8 resulted in a non-linear change in run times. I'm unsure if 8 was the magic number for me because it is the value for thread size.
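
The kernels_per_reduction procedure above could be automated roughly as follows. This is a hedged Python sketch, not part of the app: `measure` is a hypothetical callback that runs one task with the given value and returns its run time in seconds, and `limit` is an arbitrary upper bound chosen for illustration.

```python
from typing import Callable


def tune_kernels_per_reduction(
    measure: Callable[[int], float],
    start: int,
    step: int = 8,
    limit: int = 128,
) -> int:
    """Raise the value in fixed steps and stop as soon as run time stops improving."""
    best_value = start
    best_time = measure(start)
    value = start + step
    while value <= limit:
        run_time = measure(value)
        if run_time >= best_time:
            break  # no longer getting faster, keep the previous value
        best_value, best_time = value, run_time
        value += step
    return best_value


# Usage with a made-up timing curve that is fastest at 48:
fake_measure = lambda k: 55.0 + abs(k - 48)
print(tune_kernels_per_reduction(fake_measure, start=32))
```

The same loop shape fits lut_size: step it up by 1 and stop when run times get slower.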
7) Message boards : Number crunching : Optimizing Collatz Sieve (Message 22061)
Posted 686 days ago by HAL9000
Hi guys.

Just installed my new GTX950 card in my Q6600 machine with 8Gb ram, and I'm getting results of 430seconds for a credit of 4739. It's a simple basic install out of the box, no tweaking or anything. But can it do any better? What do others get with this card?

Many thanks in advance.

Doing a quick look through the top 1000 hosts I found 2 other GTX 950's running stock cfgs with similar run times. One host was a bit faster, running ~390 seconds, but had a faster 8 core CPU.
8) Message boards : Number crunching : Optimizing Collatz Sieve (Message 22040)
Posted 692 days ago by HAL9000
Does anybody have good values for my AMD Radeon R9 390 (Grenada)?



I see a 390x making ~5m/day with the following

verbose 1 (yes)
kernels/reduction 32
threads 2^8 (256)
lut_size 17 (1048576 bytes)
sieve_size 2^30 (51085096 bytes)
sleep 1
cache_sieve 1 (yes)
reducecpu 0 (no)

I'm using this configuration on my R9 390x with run times between 55 & 65 seconds.
verbose=1
kernels_per_reduction=48
threads=8
lut_size=17
sleep=1
reduce_cpu=0
sieve_size=30
9) Message boards : Number crunching : Benchmarking the collatz application (Message 20958)
Posted 899 days ago by HAL9000
For SETI@home the lunatics selected or modified some tasks so that benchmarking wouldn't take as long.

Could some tasks from Collatz be shortened or some shortish ones generated for benchmarking purposes?

The purpose of the benchmark is to do offline testing to tune the app and/or parameters. So credits & results wouldn't even be a factor.
10) Message boards : Number crunching : Errors on CUDA workunit (Message 20736)
Posted 929 days ago by HAL9000
I don't think the errors are limited to the CUDA workunits. I have also received these same kinds of errors trying to use the OpenCL app for Nvidia and the Intel OpenCL app as well. I had one work unit run and validate, and another one fail after a couple of seconds.

http://boinc.thesonntags.com/collatz/results.php?userid=45606

It's almost like there are 2 threads about the same issue
http://boinc.thesonntags.com/collatz/forum_thread.php?id=1279
11) Message boards : Number crunching : Errors on CUDA workunit (Message 20705)
Posted 933 days ago by HAL9000
I released a 6.05 version of the CUDA app. It _should_ fix the bug where the CPU and GPU numbers don't match at times.

It also includes a new config parameter called "lut_size" where lut stands for look up table. The default for the lut_size is 12 rather than the previous 20. This allows the lookup table to fit in most GPU's cache which increases performance. It also reduces the memory footprint so it will run better on older GPUs (e.g. 8400GS w/ 256MB RAM or a mobile GPU which uses shared system RAM).

Will there be an update for opencl_amd_gpu & opencl_intel_gpu as well?
12) Message boards : Number crunching : Errors on CUDA workunit (Message 20672)
Posted 939 days ago by HAL9000
Yes problem STILL around. Hal, that is what I suspect also - corruption of data and/or other hardware damage from the power surge/lightning strike.

6 more (so far) tasks for me have crashed since the project came back on-line

Only you can access your full task list. Everyone else only gets to see an error message. However I suspect that the tasks you are referring to are mostly on this host.
13) Message boards : Number crunching : Errors on CUDA workunit (Message 20669)
Posted 939 days ago by HAL9000
Another indication this problem has nothing to do with 32-bit apps on 64-bit platforms.
Have a look at this host:

http://boinc.thesonntags.com/collatz/show_host_detail.php?hostid=127401

It is a 32-bit Windows system and it has exactly the same problems.

Tom

Looking back it seems like this started after June 10th.
Although nothing else seemed to happen around that same time.
14) Message boards : Number crunching : Computation Errors (Message 20643)
Posted 945 days ago by HAL9000
Possible.
But only few are failing. Most wus are ok.

These errors started suddenly around a week ago. Nothing on the clients was changed.

Yeah, in the CUDA issue thread it looks like they think it is some 64-bit vs 32-bit issue, but for opencl_amd_gpu there is only the 32-bit app. Given that the new 6.05 app was put up on June 1st, I would hazard a guess that is when it started. However, I started getting the same "At offset 1769472 got 446 from the GPU when expecting 534 Error: GPU steps do not match CPU steps. Workunit processing aborted." error on my iGPU around the same time, and those apps are from April 2014.

Also it looks like my HD5750 is getting those about 25% of the time. At least they are basically instant fails, rather than tying up the GPU for hours.

I guess I should add that my 5 hosts with HD3450's running the ati14 app, which has both a 64-bit & a 32-bit version, have had 0 of these errors.
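
For readers unfamiliar with the "GPU steps do not match CPU steps" message, here is a rough Python sketch of that kind of cross-check. It is an illustration only and not the project's code: it assumes the textbook Collatz step count (halve if even, 3n+1 if odd, until reaching 1), which may differ from how the app counts steps, and the function names are hypothetical.

```python
def collatz_steps(n: int) -> int:
    """Count steps for n to reach 1 under the standard Collatz rule."""
    if n < 1:
        raise ValueError("n must be a positive integer")
    steps = 0
    while n != 1:
        n = 3 * n + 1 if n % 2 else n // 2
        steps += 1
    return steps


def first_mismatch(start: int, gpu_steps: list):
    """Compare GPU-reported step counts against a CPU recount.

    Returns (offset, got, expected) for the first disagreement, or None.
    """
    for offset, got in enumerate(gpu_steps):
        expected = collatz_steps(start + offset)
        if got != expected:
            return offset, got, expected
    return None


# A simulated bad GPU result at offset 1 (7 takes 16 steps, not 15):
print(first_mismatch(6, [8, 15, 3]))
```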
15) Message boards : Number crunching : iGPU high CPU usage (Message 20553)
Posted 954 days ago by HAL9000
I have my little BayTrail machine setup to use its iGPU for Collatz as a backup project when SETI@home runs out of AP for it. However it looks like it is running almost entirely on the CPU.

I'm already reserving a core for the GPU in my app_config.xml
<app_config>
  <app>
    <name>mini_collatz</name>
    <gpu_versions>
      <gpu_usage>1.0</gpu_usage>
      <cpu_usage>1.0</cpu_usage>
    </gpu_versions>
  </app>
</app_config>

I'm not as familiar with Collatz as I am with SETI, but from the looks of the "Optimizing Collatz v6.xx OpenCL and CUDA Applications" thread, I may need to use values lower than default for items_per_kernel & kernels_per_reduction, unless there is some known app/driver compatibility issue I should be aware of.






Copyright © 2018 Jon Sonntag; All rights reserved.