Optimizing the apps
Joined: 11 Aug 09 · Posts: 963 · Credit: 24,557,133,931 · RAC: 28,194
Mike, try it, but I don't think there's enough overhead left, since you said you're already using 99-100% of the GPU on one workunit now.
Anthony Ayiomamitis · Joined: 21 Jan 15 · Posts: 14 · Credit: 10,000,363,396 · RAC: 0
Mike, the reason I ask is that I had a work unit cancelled by the server before I even started on it, because someone else had completed it a few minutes earlier. What caught my attention was that it was processed in something like 202 seconds (!) with a GTX 1060. I have the same card and need about 780 seconds per unit. To make matters worse, this fellow had many units completed with very similar run times (some as low as 120 seconds). This led me to wonder whether there are still large efficiency gains to be had.
Joined: 11 Aug 09 · Posts: 963 · Credit: 24,557,133,931 · RAC: 28,194
Mike, I don't know, but you may want to report that to the admin; that's over twice as fast as mine is doing them too!! If it's real we are missing something, but if not it needs to be caught and stopped ASAP!!
Senilix · Joined: 30 Jul 09 · Posts: 4 · Credit: 559,529,710 · RAC: 0
Anthony, there might be a simple explanation for this 'highly optimized' GPU: the runtimes reported to the Collatz server could be plain wrong... I myself stumbled across a task delivered by computer 817800. It's running an NVIDIA GeForce GTX 1060 3GB (same as mine). According to its task list it's processing tasks in 113 to 346 seconds... way faster than my rig (7xx seconds with an optimized config). A closer look at the stderr output files of 817800 shows elapsed times of 28 to 32 minutes (about 1,700 seconds). That matches the rate at which this computer actually reports results: about one WU every 30 minutes, i.e. roughly 48 tasks a day, which fits ~1,700 seconds of real work per task rather than 113-346 seconds. My best guess is that the BOINC client on this computer reports wrong runtimes. It's not the first time I've noticed this happening. Regards, Senilix
Anthony Ayiomamitis · Joined: 21 Jan 15 · Posts: 14 · Credit: 10,000,363,396 · RAC: 0
There might be a simple explanation for this 'highly optimized' GPU: the runtimes reported to the Collatz server could be plain wrong...

This is precisely the user and computer I was referring to in my original message. I also looked at the stderr output files, but I wasn't sure whether it was the stderr times or the reported runtimes that reflected the actual elapsed time. Your times are very similar to mine (7xx seconds).
Joined: 30 May 17 · Posts: 119 · Credit: 37,173,545,890 · RAC: 4
The stderr output shows elapsed times of 23-29 minutes per unit -- nothing like the run times that are being reported.
vseven · Joined: 24 Apr 18 · Posts: 6 · Credit: 1,483,437,063 · RAC: 0
NVIDIA Tesla V100 SXM2. I'd imagine the Titan V would be similar, and possibly the next-gen cards:

verbose=1
kernels_per_reduction=48
threads=7
lut_size=19
sleep=1
reduce_cpu=0
sieve_size=30

Averaging 87 seconds per WU. Tested on a server with 8 cards, so I was cranking out a WU every 11 seconds. Only for 2 hours, unfortunately (all the test time I had).
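A note for anyone copying settings like these: as far as I know they go in the Collatz app's .config file in the BOINC project folder, named after the app executable. The filename below is only a placeholder, and the per-parameter comments are my reading of what the knobs do -- treat them as assumptions, not documentation.

# Placeholder filename -- match it to the app executable actually installed on your host:
#   <BOINC project folder for Collatz>/collatz_sieve_<version>_<platform>.config
verbose=1                  # extra detail (settings, timings) written to stderr (assumption)
kernels_per_reduction=48   # kernels launched per reduction pass (assumption)
threads=7                  # assumption: power-of-two exponent, i.e. 2^7 work items per kernel
lut_size=19                # assumption: power-of-two exponent for the lookup table
sieve_size=30              # assumption: power-of-two exponent for the sieve
sleep=1                    # sleep between kernel completions to cut CPU polling (assumption)
reduce_cpu=0               # 0 appears to favour GPU utilization over lower CPU use (assumption)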
nedmanjo · Joined: 7 Feb 16 · Posts: 38 · Credit: 5,302,003,099 · RAC: 0
Trying to optimize an AMD Vega Frontier.

verbose=1
kernels_per_reduction=48
threads=7
lut_size=16
sieve_size=30
cache_sieve=1
sleep=1
reduce_cpu=0

Running two tasks simultaneously, finishing between 1,000 and 1,100 seconds each. GPU utilization is at 99-100%, with some signs of throttling. Tasks at 6 or 8 threads seem slower, and at 9 they abort. lut_size at 17 seems to cause a pause every few cycles. Is there any benefit to increasing kernels_per_reduction? Any help or guidance would be appreciated.
Joined: 11 Aug 09 · Posts: 963 · Credit: 24,557,133,931 · RAC: 28,194
Trying to optimize an AMD Vega Frontier.

Have you tried running just one unit at a time and then working your way back up again? The code is designed to get the most out of your GPU running one workunit at a time.
nedmanjo · Joined: 7 Feb 16 · Posts: 38 · Credit: 5,302,003,099 · RAC: 0
Currently running one task, ~600 +/- 5 seconds per WU. Current settings:

verbose=1
kernels_per_reduction=48
threads=8
lut_size=17
sieve_size=30
cache_sieve=1
sleep=1
reduce_cpu=0
nedmanjo · Joined: 7 Feb 16 · Posts: 38 · Credit: 5,302,003,099 · RAC: 0
I've increased and decreased the various parameters but can't seem to reduce the time further. My GPU does seem to throttle. Specs say GPU clock speed 1382 MHz "typical", 1600 MHz peak.
- GPU clock cycling between 1100 MHz and 1200 MHz
- Memory clock cycling between 500 and 945 MHz
- Temperature cycling within a few degrees of its limit
nedmanjo · Joined: 7 Feb 16 · Posts: 38 · Credit: 5,302,003,099 · RAC: 0
Switched to the Vega gaming driver so I could use OverdriveNTool to undervolt the card; I'd read that could address the throttling problem. So I undervolted the card and I'm now under 450 seconds per WU. Less power, and it dropped the GPU temp by 5 degrees. Nice! Probably nowhere near optimized, but improved. GPU clock is stable just under 1,200 MHz.
Joined: 11 Aug 09 · Posts: 963 · Credit: 24,557,133,931 · RAC: 28,194
Switched to the Vega gaming driver so I could use OverdriveNTool to undervolt the card; I'd read that could address the throttling problem. So I undervolted the card and I'm now under 450 seconds per WU. Less power, and it dropped the GPU temp by 5 degrees. Nice! Probably nowhere near optimized, but improved. GPU clock is stable just under 1,200 MHz.

That's very good compared to where you were!!
nedmanjo · Joined: 7 Feb 16 · Posts: 38 · Credit: 5,302,003,099 · RAC: 0
Yes, much improved, and much better performance than the two Titan Blacks I retired. Loved those cards, but they were making my electric meter whirl, and I'm no longer in danger of passing out from heat exhaustion. Both of my Titans ran near their thermal limit... it was so hot. The Vega FE is running 20 degrees cooler, and it's one thermal plant instead of two. Did some more tweaking this morning and ran below 400 seconds per WU all day!
Joined: 11 Aug 09 · Posts: 963 · Credit: 24,557,133,931 · RAC: 28,194
Yes, much improved, and much better performance than the two Titan Blacks I retired. Loved those cards, but they were making my electric meter whirl, and I'm no longer in danger of passing out from heat exhaustion. Both of my Titans ran near their thermal limit... it was so hot. The Vega FE is running 20 degrees cooler, and it's one thermal plant instead of two. Did some more tweaking this morning and ran below 400 seconds per WU all day!

That's even better!!
Joined: 30 May 17 · Posts: 119 · Credit: 37,173,545,890 · RAC: 4
So now it's time to turn up the overclocking heat again and see if you can handle two at a time in under 700 seconds each ;)
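For reference, running two tasks per GPU is normally set up with an app_config.xml in the Collatz project folder. The sketch below assumes the application name is collatz_sieve (an assumption -- check client_state.xml or the project's apps page for the actual name on your host) and tells BOINC each task uses half a GPU:

<app_config>
  <app>
    <name>collatz_sieve</name>    <!-- assumption: confirm the app name on your host -->
    <gpu_versions>
      <gpu_usage>0.5</gpu_usage>   <!-- two tasks share one GPU -->
      <cpu_usage>0.05</cpu_usage>  <!-- CPU reserved per task; adjust to taste -->
    </gpu_versions>
  </app>
</app_config>

After editing, "Options -> Read config files" in BOINC Manager (or a client restart) picks up the change.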
nedmanjo · Joined: 7 Feb 16 · Posts: 38 · Credit: 5,302,003,099 · RAC: 0
I think I'd need to be at under 350 seconds per single WU to achieve two WUs at 700 seconds each. Running with current settings; looks steady and thermals are good. We'll see how it does. I've currently got the core steady at ~1,400 MHz with good thermals. Specs indicate it should be able to run steadily above 1,400 MHz, but I'm thinking it will take some time and effort to settle the thermals and steady the core.

Update: over 900 seconds each. Pushed the core clock and voltage, and for my efforts got a black screen and a frozen GUI. The best part came when I rebooted... more black screen. Not a process for the faint of heart.
Joined: 30 May 17 · Posts: 119 · Credit: 37,173,545,890 · RAC: 4
How did you get on? The gains are marginal for a 1080 Ti -- 2 work units in 440 seconds rather than 480 -- but it's the only way to get over the 10,000,000 RAC ceiling, and it keeps the GPU running at 100% without wasting time powering down and up every 4 minutes. And it proves that Renato was correct too ;-)
nedmanjo · Joined: 7 Feb 16 · Posts: 38 · Credit: 5,302,003,099 · RAC: 0
The Vega FE gaming driver was rock solid until yesterday: thermals were good, clock steady, no apparent points of concern. Then, suddenly, my PC locked up, and I've been working to get it settled since. Evidently a finicky card and/or drivers. I had to do a clean uninstall, reboot, install the Radeon Pro Enterprise driver 18.Q2.1, reboot, then install the Radeon Adrenalin driver 18.4.1, and reboot again. The same issue returned. I gave the Crimson blockchain driver a go and it ran well for a while, without tweaking, but then the PC locked up again. So, at the moment, I'm a bit perplexed. The only thing I changed leading up to this was my CPU-based project: I thought I'd run LHC. It's long, CPU-intensive work, but it shouldn't be related; I'm running two E5-2667 v2s and only using 75% of the cores at 75%. I'll see how it does just running Collatz and WUProp, and whether the stability returns.
nedmanjo · Joined: 7 Feb 16 · Posts: 38 · Credit: 5,302,003,099 · RAC: 0
Problem sorted, but not fully understood. It was tied to running LHC, but not at 75% as I assumed: I was running 23 WUs at 100%. So much for going off memory. Motherboard thermals were fine, but I guess this PC couldn't take it. Reduced the CPU use to 75% and I'm stable again. It's not the first time I've run CPU projects at 100%; some projects don't drive high thermals, and when they do I reduce the usage percentage. Lesson learned.
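For anyone wanting to pin those caps down explicitly rather than through the Manager's Computing preferences dialog, they correspond to two values in global_prefs_override.xml in the BOINC data directory. A minimal sketch using the 75%/75% figures from this thread (the numbers are just the ones mentioned above, not a recommendation):

<global_preferences>
  <max_ncpus_pct>75.0</max_ncpus_pct>       <!-- use at most 75% of the CPU cores -->
  <cpu_usage_limit>75.0</cpu_usage_limit>   <!-- use at most 75% of CPU time -->
</global_preferences>

BOINC Manager writes this file itself when you change local Computing preferences, so editing through the Manager is usually the easier route.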