Posts by marmot

1) Message boards : Cafe : The Last Person To Post Here Wins 7. (Message 1274)
Posted 15 days ago by marmot
Post:
Thanks, Mikey, for the Win!

The new Winning Number is a Post Count Number which I have committed to memory. (...What was that number, again???)


[Rules:]

1. Winner of the last thread starts the new one with a post of these rules.
2. Chooses a future event which triggers the winner, e.g. post number, date-time, and records it privately somehow (yellow sticky, etc.)
3. Everyone has fun posting until the target event is reached.
4. When the target is reached, old winner posts declaration of new winner and also PMs them.
5. Shortly after this a Mod will lock the recently won thread.
6. No double posts. Violator buys a round for the house in The SETI Refugee Pub.
7. Additional rule -- No posts that APPEAR to be blank - these will be ignored or deleted. (Sig lines do not count as part of a post.)

Rules borrowed from SETI's LPTPHW Thread. (Substitute "Rocky's" for "The SETI Refugee Pub".)



Great choice of rule set. Thanks.

I've seen all the Doctor Who episodes at least twice, and some 7 or 8 times, from black-and-white Hartnell to the second season of Tennant, and I have to say that my favorite Doctor/Companion pairing is Eccleston/Piper. Baker/Tamm & Ward is my second favorite.
2) Message boards : Cafe : Last Person to Post---version 6.0 (Message 1262)
Posted 17 days ago by marmot
Post:
What is the requirement for winning this?

Is there a rule, time stamped and stored, to be revealed at the end of the game?
3) Message boards : Cafe : Last Person to Post - Version 5! (Message 1261)
Posted 17 days ago by marmot
Post:
AND MIKEY WINS!!!


WOO HOO!!!

The new game will start asap!!


How did you win?

Steve Dodd didn't fill people in on what the requirement was.

"5. The winner may be chosen on the basis of the date/time of the post, the number of the post in the thread (50th post; 31415th post; ...), the presence/absence of a particular word/phrase in the post, or any other reason selected by the organizer of the episode."

A time-stamped picture of what the winning condition was to be, revealed at the end, would stop people from just picking their friends as winners every time.
4) Message boards : Number crunching : Optimizing the apps (Message 1255)
Posted 20 days ago by marmot
Post:
Martin Orpen says:

This NVIDIA GTX 960 config works OK:

verbose 1 (yes)
kernels/reduction 48
threads 2^8 (256)
lut_size 16 (524288 bytes)
sieve_size 2^30 (51085096 bytes)
sleep 1
cache_sieve 1 (yes)
reducecpu 0 (no)


Marmot said:
Testing to see if I understand these parameters correctly and can use them to guess the settings beforehand, given an older video card with 1GB of RAM.


Looks like my understanding wasn't correct. Kernels are only stored in video RAM one at a time, and RAM used is unaffected by changing kernels_per_reduction=.
threads= also doesn't appear to affect RAM usage.

Martin Orpen reports 2^30 = 51,085,096 bytes


I can't confirm Martin's 51MB figure; are WUs consistent in data set size?
On my old Dell engineering workstation with a Quadro FX 3700M (128 CUDA cores, 1024MB RAM), I measured the following with MSI on Windows 7 and kernels/reduction=32:
        6 threads  7 threads
2^23 -> 28MB       28MB
2^24 -> 31MB       31MB
2^25 -> 36MB       36MB
2^26 -> 45MB       45MB
2^27 -> 59MB       59MB
2^28 -> 93MB       93MB
2^29 -> 147MB
(At 2^29 the video card then black screened, reset driver, failed the WU and had to be restarted, even though it had plenty of video RAM remaining and was under 65C.)
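
To make the growth in these measurements easier to see, here is a small Python sketch that computes the increase in video RAM for each one-step rise in sieve_size. The numbers are copied from the measurements above; the variable names are only for illustration.

```python
# Measured video RAM (MB) per sieve_size exponent, copied from the post.
# The 2^29 value is the single reading taken before the driver reset.
measured_mb = {23: 28, 24: 31, 25: 36, 26: 45, 27: 59, 28: 93, 29: 147}

exponents = sorted(measured_mb)

# Increase in MB each time sieve_size goes up by one.
deltas = [measured_mb[b] - measured_mb[a]
          for a, b in zip(exponents, exponents[1:])]

for exp, delta in zip(exponents[1:], deltas):
    print(f"2^{exp - 1} -> 2^{exp}: +{delta}MB")
```

The increments get larger with every step, though not by an exact factor of two, so a simple "doubles per step" model is only a rough guide to these readings.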


I'm wondering what settings would work best while keeping the machine user friendly.
I'm going to run a test where the machine is also required to play a game at an average of 24 FPS.
Test settings:
sleep=0
>sieve_size=22, kernels/reduction=36 - (frame rate too chaotic at this point so not testing below sieve_size=23)
>sieve_size=23, kernels/reduction=18
>sieve_size=24, kernels/reduction=9
>sieve_size=25, kernels/reduction=4
>sieve_size=26, kernels/reduction=2
>sieve_size=27, kernels/reduction=1
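
The test settings above follow a simple pattern: each time sieve_size goes up by one, kernels/reduction is halved and rounded down. A sketch of that schedule in Python (the helper name `halving_schedule` is hypothetical, not part of the Collatz app):

```python
def halving_schedule(start_sieve, start_kernels, steps):
    """Yield (sieve_size, kernels) pairs, halving kernels (rounded down,
    never below 1) each time sieve_size increases by one."""
    sieve, kernels = start_sieve, start_kernels
    for _ in range(steps):
        yield sieve, kernels
        sieve += 1
        kernels = max(1, kernels // 2)

# Start from the first pair in the list above.
schedule = list(halving_schedule(22, 36, 6))
for sieve, kernels in schedule:
    print(f"sieve_size={sieve}, kernels/reduction={kernels}")
```

Halving one value while the other doubles keeps the product of the two roughly constant across the test runs.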

The choices above meet the requirement of keeping Idle Champions of the Forgotten Realms responsive at 24 FPS. I will post results in a couple of weeks (hopefully).
5) Message boards : Number crunching : Optimizing the apps (Message 1252)
Posted 20 days ago by marmot
Post:

lut_size
default: 10
range: 2...31
definition: "the size (in power of 2) of the lookup table. Chances are that any value over 20 will cause the GPU driver to crash and processing to hang. The default results in 2^10 or 1024 items. Each item uses 8 bytes. So 10 would result in 2^10 * 8 bytes or 8192 bytes. Larger is better so long as it will fit in the GPUs L1/L2 cache. Once it exceeds the cache size, it will actually take longer to complete a WU since it has to read from slower global memory rather than high speed cached memory."
comment: "I [sosiris] choose 16, 65536 items for the look up table because it would fit into the L2$ (512KB) in GCN devices. IMHO it could be 20 for NV GPUs, just like previous apps, because NV GPUs have better caching."



Where do you find the L1/L2 cache sizes of your video card, so you can calculate a starting lut_size?
NVIDIA doesn't list them in its spec sheets and GPU-Z doesn't report them.

I found this blog discussing Fermi architecture (and later) GPUs, and it has a table of typical L1/L2 cache sizes alongside on-board RAM:

Typical high end workstation GPU memory and cache
    Memory:    768 MB -> 6,000 MB
    Bandwidth: 100 GB/s -> 200 GB/s
    L2 Cache:  512 kB -> 768 kB
    L1 Cache:  16 kB -> 48 kB
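
Using the lut_size definition quoted above (2^lut_size items at 8 bytes each), a starting value can be worked out for any given cache size. A minimal Python sketch follows; the function names are made up for illustration, the cache sizes are the typical L2 figures from the table rather than values measured on any specific card, and "kB" is assumed to mean 1024 bytes.

```python
BYTES_PER_ITEM = 8  # per the lut_size definition quoted above

def lut_bytes(lut_size):
    """Size in bytes of a lookup table with 2^lut_size items."""
    return (2 ** lut_size) * BYTES_PER_ITEM

def max_lut_size(cache_bytes, lo=2, hi=31):
    """Largest lut_size in [lo, hi] whose table fits in cache_bytes, or None."""
    best = None
    for size in range(lo, hi + 1):
        if lut_bytes(size) <= cache_bytes:
            best = size
    return best

# Typical L2 sizes from the table (assumed to be KiB).
for cache_kib in (512, 768):
    size = max_lut_size(cache_kib * 1024)
    print(f"{cache_kib} KiB L2 -> lut_size {size} ({lut_bytes(size)} bytes)")
```

This only checks that the table fits. A table that fills the whole cache leaves no room for anything else, so the result is best treated as an upper bound to test downward from.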



If someone knows of a database of GPUs that also lists their L1/L2 cache sizes, or of a data-gathering app that measures them accurately, please drop a link.

6) Message boards : Number crunching : Optimizing the apps (Message 1251)
Posted 20 days ago by marmot
Post:


I agree wholeheartedly with Video and Gaming while Crunching... NOT recommended... EVEN crunching SETI, I CANNOT watch Plex linked media in FF without dropping frames... Audio doesn't drop, but Video IS affected. If I need to watch something on Plex, I Suspend BOINC Activity until I'm done with the Video. As to Games, I DO NOT Game while Crunching...


TL



Just wanted to suggest another option: use TThrottle and RivaTuner's FPS targeting.
It may seem off topic, but if you end up suspending BOINC, then a few hours of WUs making 0% progress is not optimizing Collatz.
Letting it run at reduced capacity is the better outcome.

I actually play Defiance 2050 in low texture mode with RivaTuner targeting 24 FPS (as good as my eyeballs can see) while not suspending any BOINC.
For Defiance, TThrottle is adjusted by reducing the target temperatures so that BOINC is averaging 75% CPU and GPU usage from TThrottle's interactive WU suspensions.

Typically that means dropping the CPU target from 80C to 72C and the GPU target from 70C to 63C, depending on ambient room temperature.

For watching live streaming TV, news or sports, it only takes a couple degrees sliced off the CPU/GPU temps.
7) Message boards : Number crunching : Optimizing the apps (Message 1250)
Posted 20 days ago by marmot
Post:
Martin Orpen says:

This NVIDIA GTX 960 config works OK:

verbose 1 (yes)
kernels/reduction 48
threads 2^8 (256)
lut_size 16 (524288 bytes)
sieve_size 2^30 (51085096 bytes)
sleep 1
cache_sieve 1 (yes)
reducecpu 0 (no)



Testing to see if I understand these parameters correctly and can use them to guess the settings beforehand, given an older video card with 1GB of RAM.

sieve_size
range: 15...32
definition: "controls both the size of the sieve used 2^15 thru 2^32 as well as the items per kernel are they are directly associated with the sieve size. A sieve size of 26 uses approx 1 million items per kernel. Each value higher roughly doubles the amount. Each value lower decreases the amount by about half. Too high a value will crash the video driver."


Martin Orpen reports 2^30 = 51,085,096 bytes
Extrapolating from the definition (each lower value halves the size), 2^26 = 3,192,818 bytes, so each of the ~1 million items is about 3 bytes?
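
That extrapolation is just repeated halving of Martin's reported figure. In Python, treating the "roughly doubles" wording as exact and taking "approx 1 million items" to mean 2^20 (both are assumptions, not anything the app documents):

```python
reported_bytes_at_30 = 51_085_096  # Martin Orpen's figure for sieve_size 30

def estimated_bytes(sieve_size, ref_size=30, ref_bytes=reported_bytes_at_30):
    """Scale the reference figure by a factor of 2 per sieve_size step
    (assumed exact here; the definition only says 'roughly')."""
    return ref_bytes * 2.0 ** (sieve_size - ref_size)

bytes_at_26 = estimated_bytes(26)
items_at_26 = 2 ** 20  # "approx 1 million items" taken as 2^20 (assumption)

print(f"2^26 ~ {bytes_at_26:,.0f} bytes, "
      f"~{bytes_at_26 / items_at_26:.2f} bytes per item")
```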

@Martin Orpen, it would be nice to know how much RAM your video card is actually measured to use per WU. I'm guessing that 48 kernels x 51MB ~= 2.5GB, but there must also be overhead from the GUI and some unknown small overhead from each Collatz WU.

I'm going to reserve 128 MB of the video card's RAM for system GUI usage and a few tabs of quick browsing, which leaves 896 MB available to Collatz WUs:
kernels: 17 or 18
sieve_size: 2^30 (~ 51MB)
17 or 18 x 2^30 ~= 868MB or 919 MB

If kernels are optimal at 48 (@Mikey mentioned they were the most important parameter) then:
kernels: 48
sieve_size: 2^28 (~ 12.7MB)
48 x 2^28 items ~= 613MB (underusing RAM)

Try maximum kernel count:
kernels: 64
sieve_size: 2^28 (~12.7MB)
64 x 2^28 items ~= 817 MB

Might be best to go with 2^29 and adjust kernels till actual measured RAM usage stays under 1 GB under real world circumstances:
kernels: 35
sieve_size: 2^29 (~25.5MB)
35 x 2^29 items ~= 893 MB

Just to see where the next two levels of sieve_size fall:
kernels: 4 ("kernels_per_reduction, default: 32, range: 1...64")
sieve_size: 2^32 (~204.3MB)
4 x 2^32 items ~= 817 MB

kernels: 9
sieve_size: 2^31 (~102.2MB)
9 x 2^31 items ~= 919 MB
(This one is interesting)
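
The estimates above all use the same arithmetic: a per-kernel size scaled from Martin's 2^30 figure, multiplied by the kernel count, and compared against the 896 MB budget. Here is a Python sketch of that model. It only reproduces the guesses in this post; the idea that RAM use scales with kernel count is an untested assumption, and the function names are hypothetical.

```python
REF_BYTES_AT_30 = 51_085_096   # reported size for sieve_size 30
BUDGET_MB = 1024 - 128         # 1 GB card minus 128 MB reserved for the GUI

def sieve_mb(sieve_size):
    """Estimated MB (10^6 bytes) per kernel, doubling per sieve_size step."""
    return REF_BYTES_AT_30 * 2.0 ** (sieve_size - 30) / 1e6

def max_kernels(sieve_size, budget_mb=BUDGET_MB, limit=64):
    """Largest kernel count (capped at limit) whose estimate fits the budget."""
    return min(limit, int(budget_mb // sieve_mb(sieve_size)))

for size in (28, 29, 30, 31, 32):
    k = max_kernels(size)
    print(f"sieve_size={size}: ~{sieve_mb(size):.1f}MB each, "
          f"{k} kernels ~= {k * sieve_mb(size):.0f}MB")
```

For sieve_size 31 this strict budget allows 8 kernels; the 9 tried above comes out slightly over 896 MB under the same model.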

Of course, anyone attempting to run 2 WUs per GPU would need to adjust the settings to fit both into the card's RAM.
Also, I wonder whether, if the settings leave a good amount of video RAM free, the computer could play videos and minor games with minimal stutter, such as with SETI@Home at 99% usage?

I'll test my ideas out on the older laptops that I'm firing up now that it's cold; I'd rather add heat to the house with inefficient older GPUs than with electric heat that provides no BOINC work.




©2018 Jon Sonntag; All rights reserved