Planned Outage Today

Slicker
Volunteer moderator
Project administrator
Project developer
Project tester
Project scientist
Joined: 11 Jun 09
Posts: 2525
Credit: 740,580,099
RAC: 1
Message 20598 - Posted: 16 Jun 2015, 13:32:13 UTC

The circuit breaker panel will be replaced today, which will require all power to be shut off for an extended period. Unfortunately, I have been given no specific start or finish time, so load up on work units now.

Brent
Joined: 25 Jun 14
Posts: 38
Credit: 182,763,322
RAC: 216,496
Message 20600 - Posted: 16 Jun 2015, 15:13:03 UTC - in response to Message 20598.

And how does one "load up on your work units now"? I have 10 days and an additional 10 days selected in my preferences, and I still have less than a screenful of work units.
____________
Brent

Slicker
Joined: 11 Jun 09
Posts: 2525
Credit: 740,580,099
RAC: 1
Message 20604 - Posted: 16 Jun 2015, 17:54:07 UTC

So long as you have a day's worth, you should be OK. The electrician hasn't even arrived yet so I question whether he can get it done in half a day, but then again, I'm not an electrician so I'm just guessing.

Back to your question: if you are also connected to other projects, it could be that BOINC has decided it should crunch some other project and, regardless of the cache setting, will refuse to get more Collatz work. Also, I seem to recall that recent versions don't necessarily ask for work when you tell them to update. (Isn't that the whole point of updating? But maybe "update" means something else in Southern California.)

The only way I have found to force BOINC to fill a cache for a project is to have no other projects connected. Even if you suspend all the other projects, BOINC still counts any WUs for them as part of the cache. It may still take several communication cycles since it won't necessarily download hundreds of WUs at a time. I think the default is 4 or 8 but that may have changed.

BarryAZ
Joined: 21 Aug 09
Posts: 251
Credit: 13,342,162,753
RAC: 23,677,469
Message 21786 - Posted: 28 Nov 2015, 16:10:05 UTC

I'm wondering, with the increasingly frequent database server crashes, whether something might be done to make them planned instead of unplanned and thus much shorter in duration.

I get it that with the very short sieve units, the processing load has increased a lot.

My own suspicion, perhaps ill-informed, is that the database server is encountering some memory leak (as I suspect it always has), which is made worse by the higher-volume processing.

Since it appears that resolving the actual problem is not an option for whatever reason, how about pre-empting it?

My (admittedly novice) suggestion would be a pair of scripts.

One would take down the database server *gracefully* at a programmed time of day (perhaps every day).

The other would restart the database server about 10 minutes later.

Perhaps something along these lines would restore the server to a 'memory clean slate' each cycle.

Just a thought from one of the users.

Slicker
Joined: 11 Jun 09
Posts: 2525
Credit: 740,580,099
RAC: 1
Message 21791 - Posted: 30 Nov 2015, 20:42:05 UTC - in response to Message 21786.

> I'm wondering, with the increasingly frequent database server crashes, whether something might be done to make them planned instead of unplanned and thus much shorter in duration.

Backups are done every night. If there is an outage at any other time, it requires manually stopping and starting MySQL. The restart command fails; only a separate stop followed by a start works, probably because stop ignores the mysql.sock error and kills the process if it cannot connect.

> I get it that with the very short sieve units, the processing load has increased a lot.

Yep. It sure has. Large WUs are coming which will help considerably, but not until I get the checkpoint encrypted for all the apps and add additional validation on the server.

> My own suspicion, perhaps ill-informed, is that the database server is encountering some memory leak (as I suspect it always has), which is made worse by the higher-volume processing.

The issue is that MySQL stops responding and it is impossible to connect to it when that happens.
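
A minimal sketch of how the "impossible to connect" symptom could be detected automatically. It assumes MySQL is reachable over TCP on the default port 3306 (the thread only mentions mysql.sock, so the host, port, and function name here are hypothetical). A plain connect test only catches refused or timed-out connections; a server that accepts the connection but never answers queries would need a deeper check.

```python
import socket


def can_connect(host: str = "127.0.0.1", port: int = 3306, timeout: float = 5.0) -> bool:
    """Return True if a TCP connection to host:port opens within `timeout` seconds."""
    try:
        # create_connection raises OSError (including timeouts) on failure.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False
```

A monitoring job could call `can_connect()` every few minutes and send an alert when it returns False, which might shorten the time before someone notices the outage.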

> Since it appears that resolving the actual problem is not an option for whatever reason, how about pre-empting it?
>
> My (admittedly novice) suggestion would be a pair of scripts.
>
> One would take down the database server *gracefully* at a programmed time of day (perhaps every day).
>
> The other would restart the database server about 10 minutes later.

Good idea. I thought of it two years ago. It is already part of the backup script. It still happens at other times of the day as well even with it restarting MySQL nightly.
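
For illustration only, the stop-then-start cycle discussed above might look roughly like the sketch below. The actual backup script is not shown in this thread, so the `service mysql stop` and `service mysql start` commands, the pause length, and all function names are assumptions. It uses a separate stop and start rather than a single restart, and it ignores the stop exit code, since a hung server may report an error while being stopped.

```python
import subprocess
import time
from typing import Callable, Sequence

# Hypothetical commands: the real service name and init system are not given in the thread.
STOP_CMD = ["service", "mysql", "stop"]
START_CMD = ["service", "mysql", "start"]


def run_command(cmd: Sequence[str]) -> int:
    """Run a command and return its exit code without raising on failure."""
    return subprocess.run(list(cmd), check=False).returncode


def stop_then_start(
    run: Callable[[Sequence[str]], int] = run_command,
    pause_seconds: float = 10.0,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Stop the database, wait, then start it. Return True if the start succeeded."""
    run(STOP_CMD)  # exit code deliberately ignored
    sleep(pause_seconds)  # give the process time to exit
    return run(START_CMD) == 0
```

The `run` and `sleep` parameters exist only so the sequence can be exercised without touching a real server. As noted in the reply above, a nightly cycle like this does not prevent crashes at other times of day.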

Johannes Prinsloo
Joined: 5 Apr 15
Posts: 2
Credit: 969,966,538
RAC: 12,187
Message 21794 - Posted: 30 Nov 2015, 22:17:15 UTC

On the GPU units, mine downloads about 100 minutes of work. That will not last for a day.

BarryAZ
Joined: 21 Aug 09
Posts: 251
Credit: 13,342,162,753
RAC: 23,677,469
Message 21795 - Posted: 1 Dec 2015, 4:32:04 UTC - in response to Message 21791.

Slicker, thanks for the reply.

I suspect you are more bothered than anyone regarding the frequency of database server crashes.

I'll slink back into the peanut gallery now. <smile>

BarryAZ
Joined: 21 Aug 09
Posts: 251
Credit: 13,342,162,753
RAC: 23,677,469
Message 21796 - Posted: 1 Dec 2015, 4:39:08 UTC - in response to Message 21794.
Last modified: 1 Dec 2015, 4:39:27 UTC

Yup -- for faster processors, one-day outages (or even 12-hour outages) can starve them for sure.

The fastest GPU for Collatz I have is an HD 7850 -- it takes about 8 minutes per current work unit -- so 50 work units is a 400-minute cache.

The bulk of my GPUs are 750 Ti cards -- they run 10 to 12 minutes per current work unit -- so 50 work units is a 500- to 600-minute cache.

What I tend to do when I catch the outage is shift over to the other projects I work with -- mostly GPUGrid and Moo with a bit of POEM on a couple of workstations.

I then check back periodically to see if Collatz is back online.

The database recovery cycle when it does an unfriendly crash seems to run 4 to 6 hours once Slicker is able to do the manual restart. (I could be off base here -- but I suspect the outage cycle includes a significant time component for Slicker to get the notice and then get to the server depending on 'life variables').

AudioElf
Joined: 15 Oct 10
Posts: 4
Credit: 1,481,546,663
RAC: 0
Message 21803 - Posted: 2 Dec 2015, 13:36:51 UTC

My GTX 980 currently has an average turnaround of 98 seconds per WU.

That gives my GPU a cache of 50 * 98 s = 4,900 s, or 81m 40s.
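
The cache arithmetic in the last few posts can be written as a small helper. This is just a sketch; the function name is made up for illustration, and the 50-unit figure is the one quoted in the posts above.

```python
from typing import Tuple


def cache_duration(work_units: int, seconds_per_unit: float) -> Tuple[int, int]:
    """Return how long a cache of work units lasts, as (minutes, seconds)."""
    total_seconds = round(work_units * seconds_per_unit)
    return divmod(total_seconds, 60)


# GTX 980 at 98 seconds per work unit, 50 work units cached
print(cache_duration(50, 98))      # (81, 40)
# HD 7850 at about 8 minutes per work unit
print(cache_duration(50, 8 * 60))  # (400, 0)
```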

BarryAZ
Joined: 21 Aug 09
Posts: 251
Credit: 13,342,162,753
RAC: 23,677,469
Message 21804 - Posted: 2 Dec 2015, 17:47:05 UTC - in response to Message 21803.

With cards that fast and the current database crash rate and recovery rate, you get an opportunity to have the card go into 'rest state' periodically <rueful smile>

Slicker
Joined: 11 Jun 09
Posts: 2525
Credit: 740,580,099
RAC: 1
Message 21810 - Posted: 4 Dec 2015, 15:43:57 UTC - in response to Message 21804.

Speaking of crashes, I got nothing done this past week after my laptop crashed. That's two laptops in two years. Both died less than a month after the warranty expired. The new replacement was DOA. The new, new replacement has the 3-year extended warranty. $250 vs. $2,500 seemed like a good offer considering my track record. The warranty won't keep it from breaking, but at least repairs will be free or I'll get a new one when it does break. I'm still working on getting everything installed and configured, so I haven't had time to see how it will crunch.




Copyright © 2018 Jon Sonntag; All rights reserved.