Longer Sieve WUs
Message boards : Number crunching : Longer Sieve WUs

1 · 2 · Next
Author Message
Profile Slicker
Volunteer moderator
Project administrator
Project developer
Project tester
Project scientist
Avatar
Send message
Joined: 11 Jun 09
Posts: 2525
Credit: 740,580,099
RAC: 1
Message 23308 - Posted: 9 Nov 2016, 22:28:08 UTC

Given how the fastest GPUs are completing a WU in less than a minute, it is once again time to increase the size of the WUs. I'm shooting for 15 minutes or so per WU for the fast GPUs. That same WU takes about two hours on my AMD HD6970. That means the older, slower GPUs will take much longer. The increase in size _should_ help people get larger caches of WUs and also reduce the network bandwidth and server load.

So far, I've tested WUs that are 8x and 16x larger than the current sieve WUs, and the app seems to handle them OK. I do NOT want to go back to having multiple sizes of WUs: first, most people never selected the appropriate size for their machine, which caused lots of complaints about not getting the ones they wanted; second, the server can't guess very well, since the integer operations benchmark used by the BOINC client leaves something to be desired (e.g., do you really think an Android phone is half the speed of an Intel i7?)

So, what are your thoughts?

Profile JohnMD
Avatar
Send message
Joined: 25 Mar 14
Posts: 29
Credit: 15,481,538
RAC: 0
Message 23311 - Posted: 9 Nov 2016, 23:12:06 UTC - in response to Message 23308.

Collatz has a significant number of users without a suitable GPU - current CPU WUs require 10+ hours of processing.
A factor of 16 will bring this to at least a week for fast CPUs - several weeks for many others.

Could you consider choosing the size dynamically?
For example, if successful WUs are returned in less than - say - 10 minutes of elapsed time, double the size of the following WUs.
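For what it's worth, the doubling heuristic suggested above only takes a few lines. A minimal Python sketch (the function name and threshold handling are hypothetical illustrations, not actual Collatz or BOINC scheduler code):

```python
TARGET_SECONDS = 10 * 60  # the "say, 10 minutes" threshold from this post

def next_wu_size(current_size: int, elapsed_seconds: float, valid: bool) -> int:
    """Double the WU size for a host that returns valid results quickly;
    otherwise keep the size it already has."""
    if valid and elapsed_seconds < TARGET_SECONDS:
        return current_size * 2
    return current_size
```

A fast GPU finishing in 60 seconds would be stepped up to a 2x WU on its next request, while a CPU taking 10+ hours (or a host returning errors) would keep its current size.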

EG
Avatar
Send message
Joined: 9 Jun 13
Posts: 74
Credit: 28,733,364,065
RAC: 27,420,207
Message 23323 - Posted: 10 Nov 2016, 7:17:35 UTC - in response to Message 23308.
Last modified: 10 Nov 2016, 7:21:36 UTC

Given how the fastest GPUs are completing a WU in less than a minute, it is once again time to increase the size of the WUs. I'm shooting for 15 minutes or so per WU for the fast GPUs. That same WU takes about two hours on my AMD HD6970. That means the older, slower GPUs will take much longer. The increase in size _should_ help people get larger caches of WUs and also reduce the network bandwidth and server load.

So far, I've tested WUs that are 8x and 16x larger than the current sieve WUs, and the app seems to handle them OK. I do NOT want to go back to having multiple sizes of WUs: first, most people never selected the appropriate size for their machine, which caused lots of complaints about not getting the ones they wanted; second, the server can't guess very well, since the integer operations benchmark used by the BOINC client leaves something to be desired (e.g., do you really think an Android phone is half the speed of an Intel i7?)

So, what are your thoughts?


As long as the credit per WU is proportional, I see no problem....

1 min WU = 3600 credits....

15 min WU should = 54,000 credits.

This would be the most equitable result; the only difference is that each WU takes longer to complete, which reduces the load on the server.

I could live with that.

And it eliminates the old and slow GPU issue. Everyone still gets the same credit for the same work. The work just takes longer to complete.
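Spelled out, the proportionality rule above is just linear scaling (the 3600-credits-per-minute rate is the figure quoted in this post, not official project policy):

```python
CREDITS_PER_MINUTE = 3600  # rate quoted above: a 1-minute WU = 3600 credits

def credit_for(minutes: float) -> float:
    """Credit scales linearly with WU length under this proposal."""
    return CREDITS_PER_MINUTE * minutes

print(credit_for(1))   # 3600
print(credit_for(15))  # 54000
```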
____________

Profile eXtreme Warhead
Avatar
Send message
Joined: 18 Nov 12
Posts: 15
Credit: 229,064,135
RAC: 123,040
Message 23324 - Posted: 10 Nov 2016, 9:07:40 UTC

As long as the credit per WU is proportional, I see no problem....


I can only speak about the NVIDIA WUs, because my AMD does not get any WUs; I'll have no problem with this. For older cards, 2h on a 6970 (so my 5850 will take much longer) isn't too bad a runtime... that's OK I think.

Profile entigy
Send message
Joined: 1 Jul 10
Posts: 11
Credit: 155,051,150
RAC: 238,721
Message 23340 - Posted: 10 Nov 2016, 19:32:14 UTC - in response to Message 23323.

+1 Here.

I don't mind doing fewer, slower WUs if the credit is increased.

Rymorea
Send message
Joined: 14 Oct 14
Posts: 100
Credit: 200,411,819
RAC: 4
Message 23342 - Posted: 11 Nov 2016, 4:01:21 UTC

+1

If the task time & credit ratio is balanced in a good way :)
____________
Seti@home Classic account User ID 955 member since 8 Sep 1999 classic CPU time 539,770 hours

JLDun
Send message
Joined: 27 Jan 11
Posts: 10
Credit: 192,485
RAC: 242
Message 23343 - Posted: 11 Nov 2016, 4:54:02 UTC
Last modified: 11 Nov 2016, 4:55:23 UTC

Edit

(Ignore me... I misread this as about CPU, not GPU.)
____________

rcthardcore
Send message
Joined: 15 May 10
Posts: 9
Credit: 42,217,413
RAC: 5,791
Message 23377 - Posted: 14 Nov 2016, 18:15:33 UTC

+1 for this. GPU sieve workunits on my GTX 980ti take about 2 minutes to run.

Eric
Send message
Joined: 20 Jan 13
Posts: 12
Credit: 655,848,168
RAC: 0
Message 23386 - Posted: 16 Nov 2016, 1:38:23 UTC - in response to Message 23308.

+1, with corresponding proportional credit adjustments please.

Padanian
Send message
Joined: 28 May 10
Posts: 15
Credit: 685,456,078
RAC: 1,427,050
Message 23425 - Posted: 21 Nov 2016, 15:26:29 UTC

+1

My 750 Ti takes 11 minutes to complete a WU. I could live with 90 minutes or 180 minutes.

el_teniente
Send message
Joined: 9 Sep 15
Posts: 40
Credit: 3,514,908,890
RAC: 7,302,087
Message 23432 - Posted: 23 Nov 2016, 21:25:20 UTC

+1
longer WUs -> fewer server connections, fewer DB records

Rymorea
Send message
Joined: 14 Oct 14
Posts: 100
Credit: 200,411,819
RAC: 4
Message 23580 - Posted: 17 Dec 2016, 8:29:46 UTC
Last modified: 17 Dec 2016, 8:31:24 UTC

I think the longer WUs have started to arrive for GPUs.

Run time - CPU time - Credit - Application (before and after)

287.87 - 18.08 - 4,522.38 - Collatz Sieve v1.21 (opencl_amd_gpu)
2,373.31 - 132.20 - 37,365.24 - Collatz Sieve v1.21 (opencl_amd_gpu)

570.58 - 0.31 - 4,332.22 - Collatz Sieve v1.21 (opencl_nvidia_gpu)
4,419.59 - 0.63 - 37,480.01 - Collatz Sieve v1.21 (opencl_nvidia_gpu)
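As a sanity check on the table above: the new tasks run roughly 8x as long and pay roughly 8x the credit, so credit per second is essentially unchanged. A quick check with the values copied from the table (the pairing and rounding are mine):

```python
# (runtime seconds, credit) pairs for old vs. new WUs, from the table above
amd_old, amd_new = (287.87, 4522.38), (2373.31, 37365.24)
nv_old,  nv_new  = (570.58, 4332.22), (4419.59, 37480.01)

def ratios(old, new):
    """Return (runtime ratio, credit ratio) of the new WU to the old WU."""
    return new[0] / old[0], new[1] / old[1]

print(ratios(amd_old, amd_new))  # both close to 8.2x-8.3x
print(ratios(nv_old, nv_new))    # roughly 7.7x runtime for 8.7x credit
```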
____________
Seti@home Classic account User ID 955 member since 8 Sep 1999 classic CPU time 539,770 hours

Jack
Send message
Joined: 10 Mar 10
Posts: 4
Credit: 83,208,647
RAC: 0
Message 23587 - Posted: 17 Dec 2016, 17:23:16 UTC
Last modified: 17 Dec 2016, 17:42:39 UTC

I guess I've started to get longer WUs for GPU, unless something went wonky with my computer.

<core_client_version>7.6.33</core_client_version>
<![CDATA[
<stderr_txt>
Collatz Conjecture Sieve 1.21 Windows x86_64 for OpenCL
Written by Slicker (Jon Sonntag) of team SETI.USA
Based on the AMD Brook+ kernels by Gipsel of team Planet 3DNow!
Sieve code and OpenCL optimization provided by Sosiris of team BOINC@Taiwan
Collatz Config Settings:
verbose 1 (yes)
kernels/reduction 64
threads 2^10 (1024)
lut_size 18 (2097152 bytes)
sieve_size 2^30 (51085096 bytes)
sleep 1
cache_sieve 1 (yes)
reducecpu 0 (no)
Platform NVIDIA
Device 0000020C05614670
Max Dimensions 3
Max Work Items 1024 1024 64
Max Work Groups 1024
Max Kernel Threads 1024
Device Vendor NVIDIA Corporation
Name GeForce GTX 980 Ti
Driver Version 373.06
OpenCL Version OpenCL 1.2 CUDA
actual threads 1024
Start 3019898951266392342528
Stop 3019899004042950475776
Best 3019898952152652638367
Highest steps 2044
Total steps 358104709262722
Average steps 570
CPU time 0.5625 seconds
Elapsed time 417.903seconds
05:15:50 (8680): called boinc_finish

</stderr_txt>
]]>


and one straggler earlier was this

<core_client_version>7.6.33</core_client_version>
<![CDATA[
<stderr_txt>
Collatz Conjecture Sieve 1.21 Windows x86_64 for OpenCL
Written by Slicker (Jon Sonntag) of team SETI.USA
Based on the AMD Brook+ kernels by Gipsel of team Planet 3DNow!
Sieve code and OpenCL optimization provided by Sosiris of team BOINC@Taiwan
Collatz Config Settings:
verbose 1 (yes)
kernels/reduction 64
threads 2^10 (1024)
lut_size 18 (2097152 bytes)
sieve_size 2^30 (51085096 bytes)
sleep 1
cache_sieve 1 (yes)
reducecpu 0 (no)
Platform NVIDIA
Device 000002C747813F50
Max Dimensions 3
Max Work Items 1024 1024 64
Max Work Groups 1024
Max Kernel Threads 1024
Device Vendor NVIDIA Corporation
Name GeForce GTX 980 Ti
Driver Version 373.06
OpenCL Version OpenCL 1.2 CUDA
actual threads 1024
Start 3009427725890988539904
Stop 3009427732488058306560
Best 3009427726170699087719
Highest steps 1775
Total steps 44769881447430
Average steps 570
CPU time 0.546875 seconds
Elapsed time 52.2847seconds
03:43:25 (5096): called boinc_finish

</stderr_txt>
]]>


EDIT: The cool thing is (I guess you all knew this, but I'm bad at math) that the extra time spent on one WU is worth the same. I could do 7-8 of the previous-length units in the time it takes to do one of these. Although I do feel bad for those with lesser GPUs.
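The "7-8" guess can be read straight from the Start/Stop values in the two stderr logs above: the new task covers exactly 8 times the numeric range of the straggler (and the elapsed times, 417.9 s vs. 52.3 s, scale to match).

```python
# Start/Stop values copied from the two stderr logs above
new_range = 3019899004042950475776 - 3019898951266392342528
old_range = 3009427732488058306560 - 3009427725890988539904

print(new_range / old_range)  # 8.0: the new WU spans exactly 8x the range
```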

Profile [FB]The Temptations
Avatar
Send message
Joined: 10 Jul 10
Posts: 14
Credit: 512,806,085
RAC: 1,285,181
Message 23591 - Posted: 18 Dec 2016, 10:14:47 UTC

If I understand correctly, small graphics cards are penalized: you either have to run 24/7 or buy a big card...
____________

Nige Welch
Send message
Joined: 29 Mar 14
Posts: 1
Credit: 77,844,471
RAC: 460,167
Message 23594 - Posted: 18 Dec 2016, 16:10:59 UTC

I suppose I'll have to be the lone dissenter. I have an old Mac Pro upgraded to 8 cores of 2.66 GHz Xeon goodness with a Quadro FX 3800. It takes (or rather did) 45 minutes on the GPU and 12 hours on the CPU. I get not far off 5000 points per WU. Today I find my work stack has few jobs in it and, according to the log, my work cache is full. Confused, I finally realise a GPU task is taking 6 hours and a CPU one 100 hours!!! I was given no warning of this and am rather pi**ed off. What happens if my rig spends 99 hours on a job and there's a computing/validation error?? Guess it's find another project time.

Tackleway
Send message
Joined: 29 Sep 13
Posts: 53
Credit: 1,736,808,645
RAC: 1,759,677
Message 23596 - Posted: 18 Dec 2016, 20:34:09 UTC - in response to Message 23594.
Last modified: 18 Dec 2016, 20:35:47 UTC

I suppose I'll have to be the lone dissenter. I have an old Mac Pro upgraded to 8 cores of 2.66 GHz Xeon goodness with a Quadro FX 3800. It takes (or rather did) 45 minutes on the GPU and 12 hours on the CPU. I get not far off 5000 points per WU. Today I find my work stack has few jobs in it and, according to the log, my work cache is full. Confused, I finally realise a GPU task is taking 6 hours and a CPU one 100 hours!!! I was given no warning of this and am rather pi**ed off. What happens if my rig spends 99 hours on a job and there's a computing/validation error?? Guess it's find another project time.


My old Dell got two tasks this p.m. with remaining hours of 8+ DAYS to run!
I think not! I've changed it to Micro Coll.. and rejected the huge tasks.
I'll be watching my other toys very carefully over the next week Ho Ho Ho.
____________

fractal
Send message
Joined: 11 Jul 09
Posts: 14
Credit: 1,001,340,489
RAC: 0
Message 23597 - Posted: 18 Dec 2016, 20:45:31 UTC - in response to Message 23308.

I do NOT want to go back to having multiple sizes of WUs since most people never selected the appropriate size for their machine which caused lots of complaints about not getting the ones they wanted.

If we were voting, I would vote for an optional larger WU with appropriately larger credit. Donors would see a benefit from reducing the number of turnarounds; no need to increase it.

And, even though you say you do NOT want to go to it, I would make the default the current set of work. Donors MUST select the newer units if they want them. And, those who can not figure out how to select them, or can not figure out how to run optimized settings can go bleep themselves ;)

So, default doesn't change. AFK drivers continue to get what they got. Attentive donors can get longer work some of the time. The only cost is an additional work queue.

Gero-T
Send message
Joined: 9 Oct 16
Posts: 2
Credit: 18,777,253,939
RAC: 85,901,541
Message 23598 - Posted: 18 Dec 2016, 21:55:31 UTC

For me everything is OK! Longer WUs, bigger credit, less network traffic: who wants to complain?
And again, it's not only on our side with the network etc.; on the server side there is less work too.
And remember: servers are running 24/7...

Profile Slicker
Volunteer moderator
Project administrator
Project developer
Project tester
Project scientist
Avatar
Send message
Joined: 11 Jun 09
Posts: 2525
Credit: 740,580,099
RAC: 1
Message 23600 - Posted: 19 Dec 2016, 2:51:12 UTC - in response to Message 23598.

For me everything is OK! Longer WUs, bigger credit, less network traffic: who wants to complain?
And again, it's not only on our side with the network etc.; on the server side there is less work too.
And remember: servers are running 24/7...


And racking up 1 TB of network traffic a month. Hopefully the larger sieve size will reduce that. There's no way I'm going to spend $500 a month to host it on "the cloud". To keep the slow machines happy, I'll probably create a smaller sieve app that people have to opt in to run, so that by default they get the larger WUs. In the past, 90% of the users never bothered to set their preferences, so millions of micro WUs were being run per day by machines that should have been running the large or huge WUs. Since BOINC continues to base everything on floating point operations per second, and since this project uses ZERO floating point calculations, the estimates are crap (as are the benchmarks, which make an Android seem as powerful as an i7 even though it takes 80 times as long to finish a WU). With no decent way to determine the real processing speed, there's no way, short of writing my own entire BOINC scheduler, to send out appropriately sized WUs to each machine.

EG
Avatar
Send message
Joined: 9 Jun 13
Posts: 74
Credit: 28,733,364,065
RAC: 27,420,207
Message 23601 - Posted: 19 Dec 2016, 3:34:51 UTC - in response to Message 23600.
Last modified: 19 Dec 2016, 3:43:40 UTC

For me everything is OK! Longer WUs, bigger credit, less network traffic: who wants to complain?
And again, it's not only on our side with the network etc.; on the server side there is less work too.
And remember: servers are running 24/7...


And racking up 1 TB of network traffic a month. Hopefully the larger sieve size will reduce that. There's no way I'm going to spend $500 a month to host it on "the cloud". To keep the slow machines happy, I'll probably create a smaller sieve app that people have to opt in to run, so that by default they get the larger WUs. In the past, 90% of the users never bothered to set their preferences, so millions of micro WUs were being run per day by machines that should have been running the large or huge WUs. Since BOINC continues to base everything on floating point operations per second, and since this project uses ZERO floating point calculations, the estimates are crap (as are the benchmarks, which make an Android seem as powerful as an i7 even though it takes 80 times as long to finish a WU). With no decent way to determine the real processing speed, there's no way, short of writing my own entire BOINC scheduler, to send out appropriately sized WUs to each machine.


The longer WU's are running fine for me also, with no perceived drop in credits. Although the difference in credits from one unit to the next is magnified over time, so it shows up more.

I agree that, with the way BOINC is set up, there is nothing that can be done about it. It will average out over time, which is what the high-powered crunchers are showing: we are receiving the same average credit now with the larger WU's as we were getting with the smaller ones, which was my major concern...

An opt-in smaller WU for those with slower HW would probably be the best solution. Or just keep a modicum of the old small units around for those who opt in (since they run in the same average time).

The problem with this is going to be those who like to run their hardware as efficiently as possible to generate the largest PPD possible. If it is found that the small WU's are more efficient at delivering PPD, then those are the WU's that are going to be in demand.

This is what happened last time you had different sized WU's: the smallest WU's delivered the largest PPD, and naturally it followed that those racing at the top set up for whatever generated the most points.

Just something to consider.

I'm all for one size of WU, delivering the same amount of work to every machine and letting each machine deal with it as best it can. Then there is no point in scaling the WU's out by size.

Technology does march on, there is no way around it.

If a smaller WU gives more PPD than a large one over time, someone with efficient hardware is eventually going to run them exclusively, which will force everyone to run the more efficient WU's if they wish to keep up.

Been down that road before Jon, we know where it leads.
____________


Copyright © 2018 Jon Sonntag; All rights reserved.