Optimizing the apps

mikey
Joined: 11 Aug 09
Posts: 963
Credit: 24,557,133,931
RAC: 28,194
Message 397 - Posted: 18 May 2018, 10:58:24 UTC - in response to Message 388.  

Mike,

Have you looked at the performance when running two WUs at the same time on one GPU? With my current config, my GPU is running at 99-100% utilization, but I wonder if there is better throughput with two units running simultaneously.

Not that I am complaining with the 345 seconds per unit right now (GTX 1080).


Try it, but I don't think there's enough headroom left since you said you're already using 99-100% of the GPU on one work unit now.
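
If you do want to test it, the usual way to run two tasks per GPU is an app_config.xml in the project's folder under the BOINC data directory. A minimal sketch (assuming the application name here is collatz_sieve; check the client's event log or client_state.xml for the exact name on your machine):

<app_config>
  <app>
    <name>collatz_sieve</name>
    <gpu_versions>
      <gpu_usage>0.5</gpu_usage>
      <cpu_usage>0.5</cpu_usage>
    </gpu_versions>
  </app>
</app_config>

With gpu_usage at 0.5 the client schedules two tasks on each GPU; set it back to 1 to go back to one at a time. The client has to re-read config files (or be restarted) for the change to take effect.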
Anthony Ayiomamitis
Joined: 21 Jan 15
Posts: 14
Credit: 10,000,363,396
RAC: 0
Message 399 - Posted: 18 May 2018, 16:17:43 UTC - in response to Message 397.  
Last modified: 18 May 2018, 16:32:51 UTC

Mike,

Have you looked at the performance when running two WUs at the same time on one GPU? With my current config, my GPU is running at 99-100% utilization, but I wonder if there is better throughput with two units running simultaneously.

Not that I am complaining with the 345 seconds per unit right now (GTX 1080).


Try it, but I don't think there's enough headroom left since you said you're already using 99-100% of the GPU on one work unit now.

The reason I ask is that I had a work unit cancelled by the server before I even started on it, because someone else had completed it a few minutes earlier. What caught my attention was that it was processed in something like 202 seconds (!) with a GTX 1060. I have the same card and need about 780 seconds per unit. To make matters worse, this fellow had completed many units with very similar run times (some as low as 120 seconds). This led me to wonder whether there are still major efficiency gains to be had.
mikey
Joined: 11 Aug 09
Posts: 963
Credit: 24,557,133,931
RAC: 28,194
Message 403 - Posted: 19 May 2018, 10:56:03 UTC - in response to Message 399.  

Mike,

Have you looked at the performance when running two WUs at the same time on one GPU? With my current config, my GPU is running at 99-100% utilization, but I wonder if there is better throughput with two units running simultaneously.

Not that I am complaining with the 345 seconds per unit right now (GTX 1080).


Try it, but I don't think there's enough headroom left since you said you're already using 99-100% of the GPU on one work unit now.

The reason I ask is that I had a work unit cancelled by the server before I even started on it, because someone else had completed it a few minutes earlier. What caught my attention was that it was processed in something like 202 seconds (!) with a GTX 1060. I have the same card and need about 780 seconds per unit. To make matters worse, this fellow had completed many units with very similar run times (some as low as 120 seconds). This led me to wonder whether there are still major efficiency gains to be had.


I don't know, but you may want to report that to the admin; that's over twice as fast as mine is doing them too!! If it's real we are missing something, but if not it needs to be caught and stopped ASAP!!
Senilix
Joined: 30 Jul 09
Posts: 4
Credit: 559,529,710
RAC: 0
Message 407 - Posted: 19 May 2018, 22:56:23 UTC - in response to Message 399.  

Anthony,

There might be a simple explanation for this 'highly optimized' GPU: the runtimes reported to the Collatz server could be plain wrong...

I myself stumbled across a task delivered by computer 817800. It's running a NVIDIA GeForce GTX 1060 3GB (same as mine). According to its task list it's processing tasks in 113 to 346 seconds... way faster than my rig (7xx seconds with an optimized config).

A closer look at the stderr output files of 817800 shows elapsed times of 28 to 32 minutes (about 1,700 seconds). This matches the report times of this computer: about one WU every 30 minutes.
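
A rough back-of-the-envelope check, using illustrative numbers taken from the observations above rather than anything pulled from the task pages, shows why the reported figures can't be right:

# Compare the work-return rate implied by each figure.
reported_runtime_s = 200      # what the task list claims (~113-346 s)
stderr_elapsed_s   = 1700     # what the stderr output shows (~28-32 min)
report_interval_s  = 30 * 60  # the host returns roughly one WU every 30 minutes

print(3600 / reported_runtime_s)  # ~18 WU/hour if the reported runtime were real
print(3600 / stderr_elapsed_s)    # ~2 WU/hour implied by stderr
print(3600 / report_interval_s)   # ~2 WU/hour actually observed

The observed return rate lines up with the stderr elapsed times, not with the runtimes in the task list.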

My best guess is that the BOINC client on this computer is reporting wrong runtimes. It's not the first time I've noticed this happening.

Regards,
Senilix
Anthony Ayiomamitis
Joined: 21 Jan 15
Posts: 14
Credit: 10,000,363,396
RAC: 0
Message 408 - Posted: 20 May 2018, 0:28:30 UTC - in response to Message 407.  
Last modified: 20 May 2018, 0:29:59 UTC

There might be a simple explanation for this 'highly optimized' GPU: the runtimes reported to the Collatz server could be plain wrong...

I myself stumbled across a task delivered by computer 817800. It's running a NVIDIA GeForce GTX 1060 3GB (same as mine). According to its task list it's processing tasks in 113 to 346 seconds... way faster than my rig (7xx seconds with an optimized config).

A closer look at the stderr output files of 817800 shows elapsed times of 28 to 32 minutes (about 1,700 seconds). This matches the report times of this computer: about one WU every 30 minutes.

My best guess is that the BOINC client on this computer is reporting wrong runtimes. It's not the first time I've noticed this happening.

This is precisely the user and computer I was referring to in my original message. I also looked at the stderr output files, but I wasn't sure whether they had perhaps been modified, with BOINC reporting the actual/real time. Your times are very similar to mine (7xx seconds).
IDEA
Joined: 30 May 17
Posts: 119
Credit: 37,173,545,890
RAC: 4
Message 411 - Posted: 20 May 2018, 17:50:43 UTC - in response to Message 408.  

The stderr output shows elapsed times are 23-29 minutes per unit -- nothing like the run times that are being reported.
vseven
Joined: 24 Apr 18
Posts: 6
Credit: 1,483,437,063
RAC: 0
Message 431 - Posted: 24 May 2018, 12:52:52 UTC

NVIDIA Tesla V100 SXM2. I'd imagine the Titan V would be similar, and possibly the next-gen cards:

verbose=1
kernels_per_reduction=48
threads=7
lut_size=19
sleep=1
reduce_cpu=0
sieve_size=30

Averaging 87 seconds per WU. Tested a server with 8 cards, so I was cranking out a WU every 11 seconds. Only for 2 hours, unfortunately (all the test time I had).
nedmanjo
Joined: 7 Feb 16
Posts: 38
Credit: 5,302,003,099
RAC: 0
Message 552 - Posted: 26 Jun 2018, 2:49:27 UTC

Trying to optimize an AMD Vega Frontier.

verbose=1
kernels_per_reduction=48
threads=7
lut_size=16
sieve_size=30
cache_sieve=1
sleep=1
reduce_cpu=0

Running two tasks simultaneously, finishing between 1,000 - 1,100 seconds each. GPU utilization is at 99 - 100%, some signs of throttling.

Tasks at 6 or 8 threads seem slower, at 9 they abort. lut_size at 17 seems to cause a pause every few cycles.

Any benefit to adding kernels_per_reduction?

Any help / guidance would be appreciated.
mikey
Joined: 11 Aug 09
Posts: 963
Credit: 24,557,133,931
RAC: 28,194
Message 556 - Posted: 26 Jun 2018, 11:45:32 UTC - in response to Message 552.  

Trying to optimize an AMD Vega Frontier.

verbose=1
kernels_per_reduction=48
threads=7
lut_size=16
sieve_size=30
cache_sieve=1
sleep=1
reduce_cpu=0

Running two tasks simultaneously, finishing between 1,000 - 1,100 seconds each. GPU utilization is at 99 - 100%, some signs of throttling.

Tasks at 6 or 8 threads seem slower, at 9 they abort. lut_size at 17 seems to cause a pause every few cycles.

Any benefit to adding kernels_per_reduction?

Any help / guidance would be appreciated.


Have you tried going back to running just one unit at a time and then working your way back up again? The code is designed to get the most out of your GPU with one work unit at a time.
nedmanjo
Joined: 7 Feb 16
Posts: 38
Credit: 5,302,003,099
RAC: 0
Message 562 - Posted: 26 Jun 2018, 16:18:00 UTC - in response to Message 556.  

Currently running one task at ~600 +/- 5 seconds each. Current settings:

verbose=1
kernels_per_reduction=48
threads=8
lut_size=17
sieve_size=30
cache_sieve=1
sleep=1
reduce_cpu=0
nedmanjo
Joined: 7 Feb 16
Posts: 38
Credit: 5,302,003,099
RAC: 0
Message 563 - Posted: 26 Jun 2018, 23:44:38 UTC - in response to Message 562.  

I've increased and decreased various parameters but can't seem to reduce the time further. My GPU does seem to throttle.

Specs say GPU clock speed: 1382 MHz “typical”, 1600 MHz peak

- GPU Clock cycling between 1100MHz - 1200MHz
- Memory clock cycling between 500 - 945MHz
- Temperature cycling within a few degrees of limit
nedmanjo
Joined: 7 Feb 16
Posts: 38
Credit: 5,302,003,099
RAC: 0
Message 564 - Posted: 27 Jun 2018, 1:21:22 UTC - in response to Message 563.  

Switched to the Vega gaming driver so I could use OverdriveNTool to undervolt the card. I read that could address the throttling problem. So I undervolted the card, and I'm now under 450 seconds per WU. Less power, and it dropped the GPU temp by 5 degrees. Nice! Probably nowhere near optimized, but improved. GPU clock is stable just under 1,200 MHz.
mikey
Joined: 11 Aug 09
Posts: 963
Credit: 24,557,133,931
RAC: 28,194
Message 565 - Posted: 27 Jun 2018, 10:02:36 UTC - in response to Message 564.  

Switched to the Vega gaming driver so I could use OverdriveNTool to undervolt the card. I read that could address the throttling problem. So I undervolted the card, and I'm now under 450 seconds per WU. Less power, and it dropped the GPU temp by 5 degrees. Nice! Probably nowhere near optimized, but improved. GPU clock is stable just under 1,200 MHz.


That's very good compared to where you were!!
nedmanjo
Joined: 7 Feb 16
Posts: 38
Credit: 5,302,003,099
RAC: 0
Message 568 - Posted: 27 Jun 2018, 22:31:46 UTC - in response to Message 565.  

Yes, much improved, and much improved performance vs. the two Titan Blacks I retired. Loved those cards, but they were making my electric meter whirl, and I'm no longer in danger of passing out from heat exhaustion. Both of my Titans ran near their thermal limit... it was so hot. The Vega FE is running 20 degrees cooler, and it's one thermal plant vs. two. Did some more tweaking this morning and ran below 400 seconds per WU all day!
mikey
Joined: 11 Aug 09
Posts: 963
Credit: 24,557,133,931
RAC: 28,194
Message 569 - Posted: 28 Jun 2018, 11:39:30 UTC - in response to Message 568.  

Yes, much improved, and much improved performance vs. the two Titan Blacks I retired. Loved those cards, but they were making my electric meter whirl, and I'm no longer in danger of passing out from heat exhaustion. Both of my Titans ran near their thermal limit... it was so hot. The Vega FE is running 20 degrees cooler, and it's one thermal plant vs. two. Did some more tweaking this morning and ran below 400 seconds per WU all day!


That's even better!!
IDEA
Joined: 30 May 17
Posts: 119
Credit: 37,173,545,890
RAC: 4
Message 570 - Posted: 28 Jun 2018, 14:46:53 UTC - in response to Message 568.  

So now it's time to turn up the overclocking heat again and see if you can handle 2 at a time in 700s ;)
nedmanjo
Joined: 7 Feb 16
Posts: 38
Credit: 5,302,003,099
RAC: 0
Message 571 - Posted: 28 Jun 2018, 22:31:34 UTC - in response to Message 570.  
Last modified: 28 Jun 2018, 23:26:42 UTC

I think I'd need to be under 350 seconds per single WU to achieve two WUs at 700 seconds each. Running with the current settings; it looks steady and the thermals are good. We'll see how it does. I've currently got the core steady at ~1,400 MHz with good thermals. Specs indicate it should be able to run steadily above 1,400 MHz, but I'm thinking it will definitely take some time and effort to settle the thermals and steady the core.

Update: > 900 seconds each. Pushed the core clock and voltage, and for my efforts got a black screen and a frozen GUI. The best part came when I rebooted... more black screen... not a process for the faint of heart.

IDEA
Joined: 30 May 17
Posts: 119
Credit: 37,173,545,890
RAC: 4
Message 579 - Posted: 3 Jul 2018, 22:43:02 UTC - in response to Message 571.  

How did you get on?

The gains are marginal -- 2 work units in 440s rather than 480s on a 1080 Ti -- but it's the only way to get over the 10,000,000 RAC ceiling, and it keeps the GPU running at 100% without wasting time powering down and up every 4 minutes.
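
In other words (taking the 480 s figure to mean two units run back-to-back at ~240 s each -- my reading, not confirmed):

single_wu_s    = 240                        # one task at a time on the 1080 Ti (480 s for two in sequence)
paired_total_s = 440                        # two concurrent tasks finishing together
effective_wu_s = paired_total_s / 2         # 220 s per WU effective
gain = single_wu_s / effective_wu_s - 1     # ~0.09, i.e. roughly a 9% throughput gain
print(f"{gain:.0%}")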

And it proves that Renato was correct too ;-)
nedmanjo
Joined: 7 Feb 16
Posts: 38
Credit: 5,302,003,099
RAC: 0
Message 598 - Posted: 4 Jul 2018, 22:26:44 UTC - in response to Message 579.  

The Vega FE gaming driver was rock solid until yesterday. Thermals were good, steady clock, no apparent points of concern. Then, suddenly, my PC locked up, and I have been working to get it settled since. Evidently a finicky card and/or drivers. I had to do a clean uninstall, reboot, then install the Radeon Pro Enterprise driver 18.Q2.1, reboot, then install the Radeon Adrenalin driver 18.4.1, reboot. The same issue returned. I gave the Crimson Blockchain driver a go and it ran well for a while without tweaking, but then the PC locked up again.

So, at the moment, I'm a bit perplexed. The only thing I changed leading up to this was my CPU-based project: I thought I'd run LHC. It's long, CPU-intensive work, but it shouldn't be related. I'm running two E5-2667 v2s and only using 75% of the cores at 75% CPU time.

I'll see how it does just running Collatz and WUProp, and whether the stability returns.
nedmanjo
Joined: 7 Feb 16
Posts: 38
Credit: 5,302,003,099
RAC: 0
Message 599 - Posted: 4 Jul 2018, 23:38:03 UTC - in response to Message 598.  

Problem sorted, but not fully understood. It was tied to running LHC, but not at 75% as I assumed; I was running 23 WUs at 100%. So much for going off memory. Motherboard thermals were fine, but I guess this PC couldn't take it. I reduced the CPU use to 75% and I'm stable again. It's not the first time I've run CPU projects at 100%; some projects don't drive high thermals, and when they do I reduce the usage percentage. Lesson learned.
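
For anyone wanting to do the same from a file rather than the manager GUI, those two knobs are (if memory serves) the standard BOINC preferences max_ncpus_pct and cpu_usage_limit, which can be set locally in global_prefs_override.xml in the BOINC data directory -- a sketch, not copied from my actual file:

<global_preferences>
   <max_ncpus_pct>75.0</max_ncpus_pct>
   <cpu_usage_limit>75.0</cpu_usage_limit>
</global_preferences>

The manager's "Read local prefs file" option (or a client restart) picks it up.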