Workload balancing NVIDIA and AMD/ATI GPU broken

Author Message
OM
Joined: 8 May 10
Posts: 2
Credit: 2,648,657,631
RAC: 3,244,023
Message 22197 - Posted: 2 Apr 2016, 12:04:45 UTC

Hi!

I have been crunching Collatz for a while.
Now, there is a problem with getting work units for my rig.
It has one NVIDIA and one AMD/ATI GPU which have similar speed.
The BOINC client requests a similar amount of work for both types, but gets work for only one type, even if the other GPU is idle.
The problem is that the job cache of 100 work units then fills with work units for one GPU type only, so the other GPU remains idle.

The only workaround so far is to abort some work units, say 30, after which the request for work succeeds.
Here are some logs from the boinc client:


02.04.2016 13:58:51 | | Starting BOINC client version 7.6.22 for windows_x86_64

02.04.2016 13:58:52 | | OS: Microsoft Windows 7: Ultimate x64 Edition, Service Pack 1, (06.01.7601.00)



31.03.2016 20:36:59 | Collatz Conjecture | Reporting 5 completed tasks
31.03.2016 20:36:59 | Collatz Conjecture | Requesting new tasks for NVIDIA GPU and AMD/ATI GPU
31.03.2016 20:36:59 | Collatz Conjecture | [sched_op] CPU work request: 0.00 seconds; 0.00 devices
31.03.2016 20:36:59 | Collatz Conjecture | [sched_op] NVIDIA GPU work request: 422537.02 seconds; 0.00 devices
31.03.2016 20:36:59 | Collatz Conjecture | [sched_op] AMD/ATI GPU work request: 420586.75 seconds; 0.00 devices
31.03.2016 20:37:01 | Collatz Conjecture | Scheduler request completed: got 5 new tasks
31.03.2016 20:37:01 | Collatz Conjecture | [sched_op] Server version 703
31.03.2016 20:37:01 | Collatz Conjecture | Project requested delay of 303 seconds
31.03.2016 20:37:01 | Collatz Conjecture | [sched_op] estimated total CPU task duration: 0 seconds
31.03.2016 20:37:01 | Collatz Conjecture | [sched_op] estimated total NVIDIA GPU task duration: 0 seconds
31.03.2016 20:37:01 | Collatz Conjecture | [sched_op] estimated total AMD/ATI GPU task duration: 1149 seconds


31.03.2016 20:42:05 | Collatz Conjecture | Reporting 4 completed tasks
31.03.2016 20:42:05 | Collatz Conjecture | Requesting new tasks for NVIDIA GPU and AMD/ATI GPU
31.03.2016 20:42:05 | Collatz Conjecture | [sched_op] CPU work request: 0.00 seconds; 0.00 devices
31.03.2016 20:42:05 | Collatz Conjecture | [sched_op] NVIDIA GPU work request: 423062.67 seconds; 0.00 devices
31.03.2016 20:42:05 | Collatz Conjecture | [sched_op] AMD/ATI GPU work request: 419954.59 seconds; 0.00 devices
31.03.2016 20:42:07 | Collatz Conjecture | Scheduler request completed: got 4 new tasks
31.03.2016 20:42:07 | Collatz Conjecture | [sched_op] Server version 703
31.03.2016 20:42:07 | Collatz Conjecture | Project requested delay of 303 seconds
31.03.2016 20:42:07 | Collatz Conjecture | [sched_op] estimated total CPU task duration: 0 seconds
31.03.2016 20:42:07 | Collatz Conjecture | [sched_op] estimated total NVIDIA GPU task duration: 0 seconds
31.03.2016 20:42:07 | Collatz Conjecture | [sched_op] estimated total AMD/ATI GPU task duration: 919 seconds



02.04.2016 10:55:39 | Collatz Conjecture | Reporting 29 completed tasks
02.04.2016 10:55:39 | Collatz Conjecture | Requesting new tasks for NVIDIA GPU and AMD/ATI GPU
02.04.2016 10:55:39 | Collatz Conjecture | [sched_op] CPU work request: 0.00 seconds; 0.00 devices
02.04.2016 10:55:39 | Collatz Conjecture | [sched_op] NVIDIA GPU work request: 417068.61 seconds; 0.00 devices
02.04.2016 10:55:39 | Collatz Conjecture | [sched_op] AMD/ATI GPU work request: 432000.00 seconds; 1.00 devices
02.04.2016 10:55:42 | Collatz Conjecture | Scheduler request completed: got 29 new tasks
02.04.2016 10:55:42 | Collatz Conjecture | [sched_op] Server version 703
02.04.2016 10:55:42 | Collatz Conjecture | Project requested delay of 303 seconds
02.04.2016 10:55:42 | Collatz Conjecture | [sched_op] estimated total CPU task duration: 0 seconds
02.04.2016 10:55:42 | Collatz Conjecture | [sched_op] estimated total NVIDIA GPU task duration: 6184 seconds
02.04.2016 10:55:42 | Collatz Conjecture | [sched_op] estimated total AMD/ATI GPU task duration: 0 seconds


In the first two logs, work is requested for both GPUs but only AMD/ATI work is received.
The last log shows the same behavior in the other direction: only NVIDIA work is received, although the AMD/ATI GPU is idle.

Is there any setting in the BOINC client or the Collatz application that changes this behavior?
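For anyone who wants to check their own event log for the same pattern, here is a minimal Python sketch that compares requested and granted seconds per resource type. It only assumes the `[sched_op]` line format shown in the logs above; the function names `summarize` and `starved` are made up for illustration and are not part of BOINC.

```python
import re

# Patterns for the "[sched_op]" lines as they appear in the logs above.
REQUEST = re.compile(
    r"\[sched_op\] (CPU|NVIDIA GPU|AMD/ATI GPU) work request: "
    r"([\d.]+) seconds; ([\d.]+) devices"
)
GRANT = re.compile(
    r"\[sched_op\] estimated total (CPU|NVIDIA GPU|AMD/ATI GPU) "
    r"task duration: (\d+) seconds"
)


def summarize(lines):
    """Collect requested/granted seconds per resource for one scheduler exchange."""
    summary = {}
    for line in lines:
        m = REQUEST.search(line)
        if m:
            entry = summary.setdefault(m.group(1), {})
            entry["requested"] = float(m.group(2))
            entry["devices"] = float(m.group(3))
            continue
        m = GRANT.search(line)
        if m:
            summary.setdefault(m.group(1), {})["granted"] = float(m.group(2))
    return summary


def starved(summary):
    """Resources that asked for work but were granted none."""
    return sorted(
        name
        for name, entry in summary.items()
        if entry.get("requested", 0) > 0 and entry.get("granted", 0) == 0
    )


# Lines copied from the third log block above.
SAMPLE = [
    "02.04.2016 10:55:39 | Collatz Conjecture | [sched_op] CPU work request: 0.00 seconds; 0.00 devices",
    "02.04.2016 10:55:39 | Collatz Conjecture | [sched_op] NVIDIA GPU work request: 417068.61 seconds; 0.00 devices",
    "02.04.2016 10:55:39 | Collatz Conjecture | [sched_op] AMD/ATI GPU work request: 432000.00 seconds; 1.00 devices",
    "02.04.2016 10:55:42 | Collatz Conjecture | [sched_op] estimated total CPU task duration: 0 seconds",
    "02.04.2016 10:55:42 | Collatz Conjecture | [sched_op] estimated total NVIDIA GPU task duration: 6184 seconds",
    "02.04.2016 10:55:42 | Collatz Conjecture | [sched_op] estimated total AMD/ATI GPU task duration: 0 seconds",
]

if __name__ == "__main__":
    print(starved(summarize(SAMPLE)))  # ['AMD/ATI GPU']
```

Run against the third log block, it reports the AMD/ATI GPU as the resource that asked for work and got none.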
Thank you for your help.

Regards, OM

Profile Slicker
Volunteer moderator
Project administrator
Project developer
Project tester
Project scientist
Joined: 11 Jun 09
Posts: 2525
Credit: 740,580,099
RAC: 2
Message 22199 - Posted: 2 Apr 2016, 17:09:10 UTC - in response to Message 22197.


As you have found out, BOINC may request work for all three resource types (CPU, NVIDIA GPU, and AMD/ATI GPU), but the server only sends work for the first one it finds. The others are ignored until subsequent requests. So, if the cache isn't full, it only ever gets work for the first one. About all you can do for now is to reduce your cache to a size small enough that the first NVIDIA request fills it; the next request will then get AMD WUs.
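To illustrate why the cache size matters here, below is a rough toy model in Python. It is not BOINC scheduler code. It assumes the server serves only the first resource type that still asks for work, that the host is capped at 100 tasks as mentioned in this thread, and that a task takes about 213 seconds (6184 seconds for 29 NVIDIA tasks in the third log). The batch size of 30 and the function name `simulate` are arbitrary choices for the sketch, and task completion is not modelled.

```python
TASK_LIMIT = 100  # job cache limit mentioned in this thread


def simulate(target_seconds, task_seconds=213, rounds=20, batch=30):
    """Toy model of repeated scheduler requests.

    Per request, only the first resource type that still wants work is
    served. The client stops asking for a type once its queued work
    reaches target_seconds (the cache size expressed in seconds).
    """
    order = ["NVIDIA GPU", "AMD/ATI GPU"]
    queued = {name: 0 for name in order}  # tasks in the cache per type
    for _ in range(rounds):
        free_slots = TASK_LIMIT - sum(queued.values())
        if free_slots <= 0:
            break  # cache is full, nothing more can be sent
        for name in order:
            shortfall = target_seconds - queued[name] * task_seconds
            if shortfall > 0:
                needed = -(-shortfall // task_seconds)  # ceiling division
                queued[name] += min(batch, free_slots, needed)
                break  # other types are ignored until the next request
    return queued


if __name__ == "__main__":
    # 432000 s (5 days) is the request size seen in the third log.
    print(simulate(432000))  # {'NVIDIA GPU': 100, 'AMD/ATI GPU': 0}
    # A one-hour cache is satisfied by a single request per type.
    print(simulate(3600))    # {'NVIDIA GPU': 17, 'AMD/ATI GPU': 17}
```

Under these assumptions, 100 tasks hold only about 21300 seconds of work, so a 5-day request for the first type can never be satisfied before the task limit is reached, and the second type is never served. With a small cache, the first type stops asking after one request and the next request goes to the other type.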

I am working on getting a Hyper-V VM running Ubuntu LTS Server for BOINC up and running so I can move the Collatz server off the current server to a more robust box. Supporting 19 platforms and plan classes is taxing enough for the machine. Doubling or tripling that for multiple WU sizes literally doubles or triples the processing. I also still have to work out the contention issue in creating WUs, since each work generator daemon wants to access the same record in the same table at the same time. When there are lots of WUs to generate (e.g. a fast GPU wants 200 of them), the other daemons can time out while waiting for access to the record.

OM
Joined: 8 May 10
Posts: 2
Credit: 2,648,657,631
RAC: 3,244,023
Message 22204 - Posted: 3 Apr 2016, 14:21:44 UTC - in response to Message 22199.

Thank you for your reply.


About all you can do for now is to reduce your cache size to a small enough size that the first nVidia request will fill it and it will then get AMD WUs on the next request.


As the third log block shows, the AMD/ATI GPU remains idle because the server only sends work for NVIDIA. The fact that AMD/ATI is idle does not lead to new work units for it. The job cache remains filled with NVIDIA work units. Nine consecutive requests within one hour returned no jobs for AMD/ATI.

The situation occurs even if there are enough work units for both GPU types at the beginning.

Resetting the project has no effect, except that the situation is reversed: no NVIDIA WUs arrive.

Only aborting WUs succeeds in getting new WUs for the other GPU type.
Is there any log flag to find out what is going on?
Is it a problem of the BOINC client or of the server?

Regards, OM

PS: Keep on going, you're doing a great job!




Copyright © 2018 Jon Sonntag; All rights reserved.