using both nvidia-gpu and intel-gpu
Message boards : Number crunching : using both nvidia-gpu and intel-gpu

Author Message
Werinbert
Send message
Joined: 7 May 13
Posts: 24
Credit: 100,197,949
RAC: 0
Message 20004 - Posted: 30 Oct 2014, 0:21:01 UTC

My main crunching computer is running solos on my nvidia card (750Ti) and on my intel iGPU (HD4000). The problem is that the iGPU WU in reality needs a full CPU core, not 0.538. I thought about modifying the app_config file, but this would also affect the nvidia work units (which are running just fine with minimal CPU use... I am running CUDA).

The only solution I have come up with is to manually set BOINC to use only 90% of my cores. Has anyone come up with a better solution?
____________
"For those who have so little patience that they equate a single day to eternity: yes, the project is dead. For all the others, the project is back online. :-)" -- Slicker

Profile Slicker
Volunteer moderator
Project administrator
Project developer
Project tester
Project scientist
Avatar
Send message
Joined: 11 Jun 09
Posts: 2525
Credit: 740,580,099
RAC: 2
Message 20007 - Posted: 31 Oct 2014, 3:54:43 UTC

http://boinc.berkeley.edu/wiki/Client_configuration

Take another look at app_config.xml, specifically the app_version element, which lets you specify a plan class (cuda55, opencl_intel_gpu, etc.) so you can set avg_ncpus separately for each plan class.
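For example, here is a sketch of an app_config.xml that sets avg_ncpus per plan class. The plan-class strings and the 0.1 CPU figure for CUDA below are assumptions for illustration; check your tasks' properties or the project's applications page for the exact names:

```xml
<app_config>
  <app_version>
    <app_name>solo_collatz</app_name>
    <plan_class>cuda55</plan_class>
    <avg_ncpus>0.1</avg_ncpus>
    <ngpus>1.0</ngpus>
  </app_version>
  <app_version>
    <app_name>solo_collatz</app_name>
    <plan_class>opencl_intel_gpu</plan_class>
    <avg_ncpus>1.0</avg_ncpus>
    <ngpus>1.0</ngpus>
  </app_version>
</app_config>
```

The file goes in the project's folder under the BOINC data directory; re-read the config files (or restart the client) for it to take effect.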

Profile FalconFly
Avatar
Send message
Joined: 25 Oct 09
Posts: 12
Credit: 207,961,802
RAC: 0
Message 20016 - Posted: 31 Oct 2014, 12:34:01 UTC - in response to Message 20007.
Last modified: 31 Oct 2014, 12:49:31 UTC

From what I've seen, when running fully optimized (!), Collatz needs only a tiny fraction of a core (even on high-end GPUs).

What little CPU it needs, it takes by running (by default) at a slightly higher priority than CPU tasks.
The key is using highly optimized parameters; then your CPU can easily be fully loaded with other tasks without any problems, while your GPUs run at maximum potential.

Under such optimized conditions, Collatz will still take any free CPU core and put a high workload on it - but it basically won't affect runtimes at all (I'm not sure what the CPU core is doing all that time, but from the looks of it, it seems to be useless overhead?).

Only when running with the extremely conservative default values does a free CPU core do wonders for computing times (which will still be about 2x to 2.5x slower than running fully optimized, while additionally having all CPU cores available for other tasks).

I'm running all the GPUs I have with anywhere from 0.1% to a maximum of ~8% CPU load (R9 290), with all CPUs fully loaded on other tasks.
That way, Collatz is perfectly suited for running alongside very CPU-intensive projects (e.g. SIMAP).
On less CPU-intensive projects (e.g. Constellation), the CPU load taken will be higher (up to a full core) but still doesn't affect Collatz runtimes.

For your GTX 750Ti I'd recommend:
verbose=1
threads=10
items_per_kernel=21
kernels_per_reduction=9
sleep=1


For your HD4000 I'd recommend:
verbose=1
threads=8
items_per_kernel=20
kernels_per_reduction=9
sleep=1


I'd strongly suggest you try those parameters and then observe the changed Collatz behaviour. It should work just as you need, plus you'll literally double your Collatz output (at the price of higher GPU temperatures due to 100% workload).
____________

Profile FalconFly
Avatar
Send message
Joined: 25 Oct 09
Posts: 12
Credit: 207,961,802
RAC: 0
Message 20023 - Posted: 1 Nov 2014, 20:03:53 UTC - in response to Message 20016.

Errata:

It seems either my estimate for optimum HD4000 parameters is off - or the intel OpenCL application behaves differently.

Either way, after testing with my own HD4000, I saw full CPU utilization (required for performance), so my previous statement seems wrong, at least for intel OpenCL :p
____________

Werinbert
Send message
Joined: 7 May 13
Posts: 24
Credit: 100,197,949
RAC: 0
Message 20026 - Posted: 3 Nov 2014, 5:39:14 UTC

I have been testing the values given by FalconFly.

GTX 750Ti:
verbose=1
threads=10 (previously I had 8)
items_per_kernel=21 (previously I had 20)
kernels_per_reduction=9
sleep=1

I have seen no change in run time or in credit per unit time, so this is a wash.

HD4000:
(previously I used the defaults)
verbose=1
threads=8
items_per_kernel=20
kernels_per_reduction=9
sleep=1

I haven't run enough tests yet to be sure of my answer... the run time has gone down by about 10-15% (this is good), but I now have a noticeable (if small) amount of lag (not so good). As I often use this computer for other things, I'm not sure I like the lag and will probably go back to the defaults. It is nice to have these numbers, but they do not solve my problem.

As to my original question, I will shortly test the solution given by Slicker. I read a while ago about the new app_version tag in the app_config file, and promptly forgot all about it as I never expected to use it. Thanks for the reminder. I fully expect this to solve my problem (so long as I can find the right plan class names and keep myself from entering any typos).
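One way to find the exact plan class names is to look for the plan_class tags in client_state.xml in the BOINC data directory. A minimal sketch, run here against an inline sample for illustration; on a real machine, point the grep at your actual state file (e.g. /var/lib/boinc-client/client_state.xml on many Linux installs, or client_state.xml in the BOINC data folder on Windows):

```shell
# Build a small sample resembling the <app_version> entries in client_state.xml
cat > /tmp/client_state_sample.xml <<'EOF'
<app_version>
    <app_name>solo_collatz</app_name>
    <plan_class>cuda55</plan_class>
</app_version>
<app_version>
    <app_name>solo_collatz</app_name>
    <plan_class>opencl_intel_gpu</plan_class>
</app_version>
EOF

# List the distinct plan classes the client knows about
grep -o '<plan_class>[^<]*</plan_class>' /tmp/client_state_sample.xml | sort -u
```

Copying the strings straight out of the state file avoids exactly the kind of typo mentioned above.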
____________
"For those who have so little patience that they equate a single day to eternity: yes, the project is dead. For all the others, the project is back online. :-)" -- Slicker

Profile FalconFly
Avatar
Send message
Joined: 25 Oct 09
Posts: 12
Credit: 207,961,802
RAC: 0
Message 20029 - Posted: 5 Nov 2014, 2:37:21 UTC - in response to Message 20026.
Last modified: 5 Nov 2014, 2:39:38 UTC

I've done some additional tests, and it seems the intel HD4000 isn't taking the harsh optimization too well.

I'd say this should work better:

verbose=1
threads=8
items_per_kernel=17 (Note: 18 might work as well)
kernels_per_reduction=8
sleep=1

Also note that, of course, the HD4000 (like any iGPU) varies greatly in performance depending on other CPU workload (since it's basically always RAM-bandwidth starved, and any CPU task competes for that limited bandwidth).
That actually makes it a bit tough to find the sweet spot for optimization, as runtimes have a much greater variance depending on the CPU load from other tasks.
____________

Werinbert
Send message
Joined: 7 May 13
Posts: 24
Credit: 100,197,949
RAC: 0
Message 20030 - Posted: 5 Nov 2014, 22:09:45 UTC

Using app_config.xml with the following:

<app_config>
  <app_version>
    <app_name>solo_collatz</app_name>
    <plan_class>opencl_intel_gpu</plan_class>
    <avg_ncpus>1.0</avg_ncpus>
    <ngpus>1.0</ngpus>
  </app_version>
</app_config>

I am still getting 0.538 CPUs, not the 1.0 I hoped for, on the intel GPU. I am probably missing something; I just don't know what.
____________
"For those who have so little patience that they equate a single day to eternity: yes, the project is dead. For all the others, the project is back online. :-)" -- Slicker

Werinbert
Send message
Joined: 7 May 13
Posts: 24
Credit: 100,197,949
RAC: 0
Message 20031 - Posted: 6 Nov 2014, 0:31:29 UTC - in response to Message 20030.

I stand corrected: the app_config is working almost as expected. The iGPU WU is in fact using 1 CPU thread for processing; however, it still shows up in BOINC Manager as 0.538.
____________
"For those who have so little patience that they equate a single day to eternity: yes, the project is dead. For all the others, the project is back online. :-)" -- Slicker

Profile arkayn
Volunteer tester
Avatar
Send message
Joined: 30 Aug 09
Posts: 219
Credit: 676,877,192
RAC: 23,722
Message 20033 - Posted: 6 Nov 2014, 15:03:18 UTC - in response to Message 20031.

I stand corrected: the app_config is working almost as expected. The iGPU WU is in fact using 1 CPU thread for processing; however, it still shows up in BOINC Manager as 0.538.


Have you restarted BOINC since you edited your app_config? BOINC will show the old values until you either get new work units or restart the client.
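If you'd rather not restart, recent clients can also re-read app_config.xml on the fly - via Options > Read config files in the Manager, or from the command line (assuming boinccmd is on your PATH; on some installs you may need to run it from the BOINC directory):

```shell
# Ask the running BOINC client to re-read its config files
# (cc_config.xml and any project app_config.xml)
boinccmd --read_cc_config
```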
____________

Werinbert
Send message
Joined: 7 May 13
Posts: 24
Credit: 100,197,949
RAC: 0
Message 20035 - Posted: 7 Nov 2014, 1:18:35 UTC - in response to Message 20033.

I stand corrected: the app_config is working almost as expected. The iGPU WU is in fact using 1 CPU thread for processing; however, it still shows up in BOINC Manager as 0.538.


Have you restarted BOINC since you edited your app_config? BOINC will show the old values until you either get new work units or restart the client.


Yeah, I realized that the old values were being shown. I also found out by inadvertent testing that just changing the app_info file's name (so it would not load) and then re-reading the config files via the Advanced menu will not reset the app_info values to default. I am happy to say everything is working as I want in various scenarios.

Now my current endeavor is to run 8 PrimeGrid 321 (LLR) tasks along with solos on my GPU and iGPU and see how far it heats up, as I have installed a new cooler on my machine. [OC'ing will come sooner or later.]
____________
"For those who have so little patience that they equate a single day to eternity: yes, the project is dead. For all the others, the project is back online. :-)" -- Slicker

Profile FalconFly
Avatar
Send message
Joined: 25 Oct 09
Posts: 12
Credit: 207,961,802
RAC: 0
Message 20037 - Posted: 7 Nov 2014, 2:22:42 UTC - in response to Message 20035.

Keep in mind that the work units currently being processed will still run with their old parameters.

Only when a new work unit is started will the new parameters be used.

The same holds for app_info.xml, AFAIK; currently running work units have that data stored in, and read from, client_state.xml in the BOINC main directory.
Should changes need to take effect immediately, editing client_state.xml (in the right spot) will do the trick.
____________




Copyright © 2018 Jon Sonntag; All rights reserved.