Posts by ExtraTerrestrial Apes
1) Message boards : Windows : Intel GPU error (Message 24136)
Posted 184 days ago by ExtraTerrestrial Apes
Follow-up: using the Display Driver Uninstaller to clean the Intel driver, followed by a reinstall of the current one solved my problem.

MrS
2) Message boards : Windows : Intel GPU error (Message 23972)
Posted 277 days ago by ExtraTerrestrial Apes
I have had the same problem since the Windows 10 Creators Update, but I can't get CC to work again. The VS runtimes are installed (I reinstalled them to make sure). I've downloaded the current driver from Intel (15.45.16.4627) and a prior one from last fall (15.45.10.4542). I removed the current driver, rebooted without network, installed the new one and rebooted. Device Manager shows it as 21.45.16.4627 and it doesn't work with CC (same error Forretrio got). If I install the older one, it shows up as 21....4542 and still doesn't work.

What else could I try?
MrS
3) Message boards : Number crunching : Optimizing Collatz Sieve (Message 22443)
Posted 609 days ago by ExtraTerrestrial Apes
Good catch! The 1st post says that 6 is actually the minimum for threads. And it makes sense that values that are too small aren't allowed, since we've got vector ALUs and not scalar ones. I wrote that the difference between those thread values was quite small, 3% at most, so maybe I simply didn't average over enough WUs.

Regarding lut_size: yes, values 1 to 2 higher than what I wrote perform better. But you should see increased DRAM power consumption with that (doesn't matter much) and, more importantly, increased memory bandwidth consumption, which probably slows down your other tasks. Hence I would not generally recommend this, especially if your CPU is also feeding a fast discrete GPU.

MrS
4) Message boards : Number crunching : Optimizing Collatz Sieve (Message 22431)
Posted 610 days ago by ExtraTerrestrial Apes
Update: reducing the number of threads improved performance on my HD530. Surprisingly, the optimum is 1, whereas AMD and nVidia prefer much higher values. The difference is small (~3% going from 4 threads to 1), but it has been consistent in my measurements nevertheless. In the end I would recommend these values:

Core i3 and lower
threads=1
kernels_per_reduction=64
lut_size=18
sieve_size=26


Core i5
threads=1
kernels_per_reduction=64
lut_size=19
sieve_size=26


Core i7
threads=1
kernels_per_reduction=64
lut_size=20
sieve_size=26


CPUs with Crystal Well (Iris Pro) may be able to profit further from far higher lut_size values. And again, I didn't test screen responsiveness, but decreasing the number of threads should not have made it any worse.
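As a small illustration, the per-tier values above can be kept in one table and written out as key=value lines. This is a hypothetical helper, not part of the Collatz app: the `RECOMMENDED` table and `render_config` are made-up names, and the sketch only reproduces the layout shown in this post. Note that a later reply in this thread (Message 22443) points out that 6 is the minimum for threads.

```python
# Hypothetical helper: per-tier Collatz Sieve settings as posted above.
# Only lut_size differs between the tiers.
RECOMMENDED: dict[str, dict[str, int]] = {
    "Core i3 and lower": {"threads": 1, "kernels_per_reduction": 64, "lut_size": 18, "sieve_size": 26},
    "Core i5": {"threads": 1, "kernels_per_reduction": 64, "lut_size": 19, "sieve_size": 26},
    "Core i7": {"threads": 1, "kernels_per_reduction": 64, "lut_size": 20, "sieve_size": 26},
}


def render_config(settings: dict[str, int]) -> str:
    """Return the settings as key=value lines, in insertion order."""
    return "".join(f"{key}={value}\n" for key, value in settings.items())


if __name__ == "__main__":
    print(render_config(RECOMMENDED["Core i5"]), end="")
```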
5) Message boards : Number crunching : Optimizing Collatz Sieve (Message 22372)
Posted 623 days ago by ExtraTerrestrial Apes
Which clinfo value are you reading to determine the GPU cache size?
For my R9 390X clinfo shows Cache size: 16384. However, it has 16 blocks of 64 KB L2 cache, for a total of 1024 KB of L2 cache.
For my HD6870 clinfo shows Cache size: 0. However, it has 4 blocks of 128 KB L2 cache, for a total of 512 KB of L2 cache.

Ouch! Apparently I was a bit naive, thinking that OpenCL would provide means to reliably check which hardware it is running on (to optimize for this at run-time). Or would there be other means? I have no experience programming OpenCL.

MrS
6) Message boards : Number crunching : Optimizing Collatz Sieve (Message 22371)
Posted 623 days ago by ExtraTerrestrial Apes
I have now settled on:
threads=4
kernels_per_reduction=64
lut_size=18
sieve_size=26

I hesitate to recommend these values generally, though, as my display isn't attached to the iGPU and I'm not measuring responsiveness, so this config may lag like hell.

I tested all other parameters, and they did not make any statistically significant difference in these ranges:

threads=4..6 nearly identical, 7+ slower
kernels_per_reduction=32..64 doesn't matter, did not test lower
sieve_size=25..29, slight tendency towards worse results at 30

The threads showed the biggest response, so I should also check lower values.
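Whether a difference of a few percent is real or just noise depends on how many WUs are averaged. A minimal sketch of one common check, Welch's t statistic on two runtime samples, is below; `compare_runtimes` is a hypothetical name, and the |t| of roughly 2 threshold is only a rule of thumb for moderately sized samples, not a formal test.

```python
import math
import statistics


def compare_runtimes(a: list[float], b: list[float]) -> tuple[float, float]:
    """Compare two runtime samples (seconds per WU) for two settings.

    Returns (relative change of the mean from a to b, Welch's t statistic).
    Each sample needs at least two runtimes. A |t| well below ~2 suggests
    the difference is within run-to-run noise.
    """
    mean_a, mean_b = statistics.mean(a), statistics.mean(b)
    # Standard error of the difference of means, not assuming equal variances (Welch).
    se = math.sqrt(statistics.variance(a) / len(a) + statistics.variance(b) / len(b))
    rel_change = (mean_b - mean_a) / mean_a
    if se == 0:
        # No spread in either sample: any difference counts as "infinitely" significant.
        t = math.copysign(math.inf, mean_b - mean_a) if mean_b != mean_a else 0.0
    else:
        t = (mean_b - mean_a) / se
    return rel_change, t
```

For example, two made-up samples of five runtimes each whose means differ by 3% give t = -3 when each sample has a variance of 2.5 s²; with fewer WUs or more spread, the same 3% would fall inside the noise.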

MrS
7) Message boards : Number crunching : Optimizing Collatz Sieve (Message 22347)
Posted 627 days ago by ExtraTerrestrial Apes
I've checked the impact on the GPU utilization of my GTX970 running POEM:

without CC: 92.8%
LUT 18 (2 MB): 92.8%
LUT 19 (4 MB): 91.8%
LUT 20 (8 MB): 87.5%

This makes a lot of sense, since my i3 has 3 MB of L3 cache. With LUT=18 my iGPU would reach 211k RAC (if it runs CC 24/7), whereas with LUT=20 it would reach 232k RAC. I'm going with the slower config (LUT=18) now, because for me the work POEM does is worth (a lot) more than CC.
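The table above implies that each step in lut_size doubles the table: 18 gives 2 MB, 19 gives 4 MB, 20 gives 8 MB. A minimal sketch of that arithmetic, assuming 8-byte entries (inferred from 2 MB at lut_size=18, not taken from the app's source) and ignoring that the L3 is shared with the CPU cores:

```python
MiB = 1024 * 1024
ENTRY_BYTES = 8  # assumption inferred from the table above: lut_size=18 -> 2 MB


def lut_bytes(lut_size: int) -> int:
    """Approximate LUT footprint: 2**lut_size entries of ENTRY_BYTES each (assumed layout)."""
    return ENTRY_BYTES * 2**lut_size


def fits_in_cache(lut_size: int, cache_bytes: int) -> bool:
    """True if the whole table would fit into a cache of the given size."""
    return lut_bytes(lut_size) <= cache_bytes


if __name__ == "__main__":
    l3 = 3 * MiB  # L3 size of the Core i3 from the post
    for n in (18, 19, 20):
        status = "fits in" if fits_in_cache(n, l3) else "exceeds"
        print(f"lut_size={n}: {lut_bytes(n) // MiB} MB, {status} the 3 MB L3")
```

Only lut_size=18 fits, which matches the utilization numbers: POEM stays at 92.8% with LUT 18 and drops as soon as the table spills out of the L3. In exchange, LUT=20 would give about 10% more RAC (232k vs. 211k) while POEM's GPU utilization falls by about 5.7% relative (92.8% to 87.5%).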

And generally both are respectable numbers, especially considering that the iGPU needs less than 10 W for that (I have mine running at reduced voltage)!

MrS
8) Message boards : Number crunching : Optimizing Collatz Sieve (Message 22343)
Posted 628 days ago by ExtraTerrestrial Apes
Then go for this config :)
lut_size=20

... and maybe higher in a few days, if you want to test it. In the worst case, CC and other BOINC projects would become slower.

MrS
9) Message boards : Number crunching : Optimizing Collatz Sieve (Message 22339)
Posted 629 days ago by ExtraTerrestrial Apes
I'm still testing, but so far I can say that increasing the LUT size improved the throughput the most, by far. I'm currently working with a value of 20, i.e. an 8 MB Look-Up Table. I could probably improve things further by going to larger numbers, but would not generally recommend doing so:

I'm using DDR4-3200 dual channel, i.e. pretty fast memory. When increasing the LUT size, I can see the power consumption of my DRAM rising from 1.7 W to around 3.7 W (HWiNFO64, for example, shows this). I know that if I run SETI and go above 4 W, the performance of other tasks suffers, which I want to avoid. My i3 only has 3 MB of L3 cache, so an 8 MB LUT exceeds the cache by a significant amount, hence the increased main memory access. On a fast GPU this should reduce performance, but apparently it's faster for the HD530 to access main memory than to recompute the values stored in the LUT. With an i5 or i7 you may want to go higher; with slower main memory or other demanding tasks you may want to reduce it.

Apart from that, I'd add a word of caution about changing the sieve size: increasing it reduced my runtimes significantly, but it reduced the credits as well. The "seconds per credit" remain almost constant, with the default (26) performing best in my case.
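Comparing sieve sizes by runtime alone is misleading when the credit changes along with it; seconds per credit is the fairer metric. A minimal sketch of that comparison, where `seconds_per_credit` and `best_sieve_size` are hypothetical helpers and the (runtime, credit) pairs would come from your own task results:

```python
from typing import Dict, List, Tuple

Result = Tuple[float, float]  # (runtime in seconds, credit granted) for one WU


def seconds_per_credit(results: List[Result]) -> float:
    """Total runtime divided by total credit over a set of WUs."""
    total_time = sum(runtime for runtime, _ in results)
    total_credit = sum(credit for _, credit in results)
    return total_time / total_credit


def best_sieve_size(by_size: Dict[int, List[Result]]) -> int:
    """Return the sieve_size with the lowest seconds per credit."""
    return min(by_size, key=lambda size: seconds_per_credit(by_size[size]))
```

Summing runtimes and credits before dividing, rather than averaging per-WU ratios, weights each WU by the work it contains, so a few long tasks at a larger sieve_size are compared fairly against many short ones.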

MrS
10) Message boards : Number crunching : Optimizing Collatz Sieve (Message 22326)
Posted 631 days ago by ExtraTerrestrial Apes
Can the app be tested in a stand-alone mode, i.e. with a small yet representative task and without BOINC? This might be neat to find the optimal parameters automatically.
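If such a stand-alone mode exists, finding the optimal parameters could be a simple exhaustive search. A minimal sketch, assuming a hypothetical `run_benchmark` hook that writes the settings to the app's .config, runs the small reference task and returns a cost such as seconds per credit; nothing here is an existing Collatz or BOINC tool:

```python
import itertools
from typing import Callable, Dict, Mapping, Sequence, Tuple


def grid_search(
    run_benchmark: Callable[[Dict[str, int]], float],
    grid: Mapping[str, Sequence[int]],
) -> Tuple[Dict[str, int], float]:
    """Evaluate every parameter combination and return the lowest-scoring one.

    `run_benchmark` is a hypothetical hook (lower score is better).
    """
    keys = list(grid)
    best_settings, best_score = None, float("inf")
    for values in itertools.product(*(grid[key] for key in keys)):
        settings = dict(zip(keys, values))
        score = run_benchmark(settings)
        if score < best_score:
            best_settings, best_score = settings, score
    if best_settings is None:
        raise ValueError("grid produced no finite scores")
    return best_settings, best_score
```

The number of runs is the product of the list lengths (e.g. 3 thread values × 3 LUT sizes × 2 sieve sizes = 18), so the reference task needs to be short, and each setting should be run often enough to average out noise.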

MrS
11) Message boards : Number crunching : Optimizing Collatz Sieve (Message 22317)
Posted 632 days ago by ExtraTerrestrial Apes
Slicker, would it make sense to read some parameters of the OpenCL device upon app startup to set them optimally? Building upon HAL's comment, those could be:

threads:
Should match the GPU's max work group size from clinfo: 2^7=128, 2^8=256, 2^9=512
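The mapping above is just a base-2 logarithm of the reported work group size. A minimal sketch, assuming the work group size used by the app is 2**threads and clamping to the minimum of 6 mentioned in a later reply in this thread (Message 22443); `threads_from_work_group_size` is a hypothetical name:

```python
MIN_THREADS = 6  # minimum for threads, per Message 22443 in this thread


def threads_from_work_group_size(max_work_group_size: int) -> int:
    """Map clinfo's max work group size to a threads value (work group = 2**threads, assumed)."""
    if max_work_group_size < 1:
        raise ValueError("max_work_group_size must be positive")
    # Largest exponent whose power of two does not exceed the reported size.
    exponent = max_work_group_size.bit_length() - 1
    return max(MIN_THREADS, exponent)
```

A reported size of 256 gives threads=8; a size that isn't a power of two is rounded down to the nearest one.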

and lut_size:
clinfo can read out the amount of L2 cache, so setting something that fits in there is better than setting a value that's too small just to be safe for any GPU. For my HD530 that's 524 kB, an unusually large size for such a relatively weak GPU. Increasing the LUT size showed very nice performance gains over the default setting (I'm approaching a 2x speed-up in my testing now).

You could easily support a manual override of this as well: "if the user has set anything in the .config file, use this value instead".
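A minimal sketch of that startup logic, assuming 8-byte LUT entries (inferred from the 2 MB at lut_size=18 quoted elsewhere in this thread) and a hypothetical fallback value for drivers that report no usable cache size; `lut_size_for_cache` is a made-up name, and reading the .config file is left out:

```python
from typing import Optional

ENTRY_BYTES = 8  # assumption: lut_size=18 -> 2 MB implies 8-byte entries


def lut_size_for_cache(cache_bytes: int, override: Optional[int] = None, fallback: int = 16) -> int:
    """Pick the largest lut_size whose table fits into the reported cache.

    `override` stands for a value the user set in the .config file and always wins.
    `fallback` is a hypothetical default for drivers that report 0 or nonsense.
    """
    if override is not None:
        return override
    entries = cache_bytes // ENTRY_BYTES
    if entries < 1:
        return fallback
    # Largest n with 2**n <= entries.
    return entries.bit_length() - 1
```

If the HD530's 524 kB is 524288 bytes (512 KiB), this yields 16, below the 18 to 20 that measured fastest, so a cache-based value is at best a conservative starting point. The fallback also matters because, as the clinfo readings quoted in this thread show (Cache size: 0 on an HD6870), the reported value can't always be trusted.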

MrS
12) Message boards : Number crunching : Intel HD 530 - Functional Driver (Message 22311)
Posted 634 days ago by ExtraTerrestrial Apes
I haven't had a Skylake driver problem with Collatz, but to avoid those initial failures you need to have the x86 and x64 "Microsoft Visual C++ Redistributable" installed. I'm not sure whether it's 2012 or 2013; I've got both, and since then the problems have disappeared. There's a thread on this somewhere, but I can't find it right now.

MrS
13) Message boards : News : Intel GPUs Supported (Message 17099)
Posted 1656 days ago by ExtraTerrestrial Apes
Seems like the Intel bandwagon is slowly getting up to cruise speed: Einstein now also supports Intel GPUs!

MrS
14) Message boards : Number crunching : Not Requesting Tasks.... I want tasks! (Message 17077)
Posted 1659 days ago by ExtraTerrestrial Apes
Is anyone still using them? I know Aqua did at some point, but didn't hear of any other ones.

MrS
15) Message boards : Number crunching : How to: reduce CPU usage while running on Intel GPU (Message 17035)
Posted 1665 days ago by ExtraTerrestrial Apes
The goal is to crunch on the iGPU without using a full CPU thread and without losing much performance. This can be achieved with the settings I propose in the 1st post. But the performance of each WU drops with these settings, hence the need to run 3 of them concurrently to get approximately the same throughput as before.
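The reasoning behind running 3 WUs is plain throughput arithmetic: if each WU takes about 3 times as long with the reduced-CPU settings, 3 concurrent WUs finish as many per hour as 1 did before. A minimal sketch, where `throughput_per_hour` is a hypothetical helper and the runtime must be measured at the concurrency you actually run:

```python
def throughput_per_hour(concurrent_tasks: int, runtime_s: float) -> float:
    """WUs finished per hour when `concurrent_tasks` run side by side, each taking `runtime_s` seconds."""
    return concurrent_tasks * 3600.0 / runtime_s


if __name__ == "__main__":
    # Hypothetical runtimes: one WU alone in 600 s vs. three WUs sharing the iGPU at 1800 s each.
    print(throughput_per_hour(1, 600.0), throughput_per_hour(3, 1800.0))
```

Since concurrent tasks compete for the same iGPU, per-task runtime usually grows with concurrency, so the comparison only holds when both runtimes are measured under their real load.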

As I wrote somewhere here.. whatever 4.09 changed didn't change anything for me. I still need to provide the *.config and the app_config.

@TPL: could you explain the problem, and what exactly you're doing? So far I only get from your post that there's some problem.

MrS
16) Message boards : Number crunching : How to: reduce CPU usage while running on Intel GPU (Message 17029)
Posted 1666 days ago by ExtraTerrestrial Apes
OK, I tracked down the root of my inconsistencies: it's caused by running several POEM OpenCL tasks on an nVidia GPU along with CC on the iGPU. At some point this seems to disturb the iGPU crunching: GPU utilization, clock speeds and performance drop. I didn't notice this during the 1st large batch of tests, since there was no POEM work available at that time.

MrS
17) Message boards : Number crunching : Crunching with an Intel GPU? (Message 17028)
Posted 1666 days ago by ExtraTerrestrial Apes
So far 1350 MHz at ~1.10 V seems to be stable for me.. this is getting impressive :)
Well, by "stable" I mean no calculation errors in stand-alone mode or while crunching. 3D stuff will obviously break at some point.. but that's what my GTX660Ti is for.

MrS
18) Message boards : Number crunching : How to: reduce CPU usage while running on Intel GPU (Message 17026)
Posted 1666 days ago by ExtraTerrestrial Apes
The tests I wrote about in the 1st post were done with driver 9.18.10.3071 for Win8 64. Now I've got access to a 2nd system with an HD4000, and there, running 3 CC tasks with the settings from the 1st post results in ~85% GPU load, and it's not even reaching its top turbo speed consistently.

The main difference I can see is the driver version, according to GPU-Z:
1st system: 9.18.10.3071
2nd system: 9.18.10.3165

Noteworthy is that CPU usage is down even further for the 2nd system. I might try running 4 WUs on this rig tomorrow.

MrS
19) Message boards : Number crunching : Crunching with an Intel GPU? (Message 17022)
Posted 1666 days ago by ExtraTerrestrial Apes
For my taste 1.35 V is far too much for those delicate 22 nm transistors. I wouldn't want to go any higher than 1.20 V, although 1.25 V is probably also still fine. Besides, energy efficiency should take a huge hit at 1.35 V.

So far I have gradually lowered my iGPU voltage at 1250 MHz. Previously I had been at an offset of 0.08 V, but now I'm still stable at stock voltage (set by Intel for 1150 MHz). Yesterday I then made a bold move and went straight to 1300 MHz at stock voltage.. still seems to work.

@Zy: using a normal 3D graphics benchmark should work in the sense of "if it's stable here, it's probably also stable while crunching". I'm not sure how Intel's GPU works, but often the ALUs can take higher clock speeds than the fixed-function units for normal rendering (hence nVidia's idea with the hot clock from G80 until Fermi).

What I tried now was to run 3DMark (Valley is probably better) and CC in stand-alone mode, as I've got a reference result for that. It hasn't thrown any errors yet, though. And neither has normal crunching.

MrS
20) Message boards : Number crunching : Not Requesting Tasks.... I want tasks! (Message 17016)
Posted 1667 days ago by ExtraTerrestrial Apes
You could run these in stand-alone mode as well. But if you need more statistics, the app_info could indeed be of good use.

MrS




Copyright © 2018 Jon Sonntag; All rights reserved.