Server Status

Profile Slicker
Volunteer moderator
Project administrator
Project developer
Project tester
Project scientist
Joined: 11 Jun 09
Posts: 2525
Credit: 740,580,099
RAC: 1
Message 269 - Posted: 29 Jul 2009, 12:38:03 UTC

Server was very unresponsive this morning to web access. Gave it a kick and it looks like it is working now.

Profile STE\/E
Joined: 12 Jul 09
Posts: 581
Credit: 761,710,729
RAC: 0
Message 291 - Posted: 30 Jul 2009, 13:09:54 UTC

Kick it before you go to bed Slicker, it might last thru the night that way ... ;)

Profile Slicker
Volunteer moderator
Project administrator
Project developer
Project tester
Project scientist
Joined: 11 Jun 09
Posts: 2525
Credit: 740,580,099
RAC: 1
Message 293 - Posted: 30 Jul 2009, 14:47:13 UTC - in response to Message 291.

It isn't the server, it is the network line that keeps going down. Sometimes it is just for a minute or so, other times it is for a much longer period. The rocket scientist tech support lackeys I've talked to are useless and refuse to admit the problem is on their end. When they finally agree to send someone out to look at it, it has always been working by the time they get there. Maybe they should spend more money on infrastructure and less money on turtle commercials.

Profile Slicker
Volunteer moderator
Project administrator
Project developer
Project tester
Project scientist
Joined: 11 Jun 09
Posts: 2525
Credit: 740,580,099
RAC: 1
Message 319 - Posted: 31 Jul 2009, 12:44:44 UTC

I'm leaving later this morning on a motorcycle trip and won't be back until late Sunday. I've spent too many hours in front of a computer screen already this summer and need to get out and enjoy it a little. So, if the server has problems this weekend, they won't be fixed until late Sunday night or Monday morning.

Profile STE\/E
Joined: 12 Jul 09
Posts: 581
Credit: 761,710,729
RAC: 0
Message 321 - Posted: 31 Jul 2009, 13:06:11 UTC

Could you install an Automatic Server kicker before you leave ... :P Have a good trip ...

Profile (_KoDAk_)
Joined: 13 Jul 09
Posts: 8
Credit: 16,310,534
RAC: 0
Message 366 - Posted: 3 Aug 2009, 16:57:41 UTC

Results ready to send 994

is the sum of all jobs on all platforms?
all platforms considered one type of job?

Profile Slicker
Volunteer moderator
Project administrator
Project developer
Project tester
Project scientist
Joined: 11 Jun 09
Posts: 2525
Credit: 740,580,099
RAC: 1
Message 377 - Posted: 3 Aug 2009, 23:22:07 UTC - in response to Message 366.

Results ready to send 994

is the sum of all jobs on all platforms?
all platforms considered one type of job?


All platforms use the exact same WUs. CPU and GPU also use the same WUs. Are people having trouble getting work?

Profile Slicker
Volunteer moderator
Project administrator
Project developer
Project tester
Project scientist
Joined: 11 Jun 09
Posts: 2525
Credit: 740,580,099
RAC: 1
Message 414 - Posted: 6 Aug 2009, 15:00:38 UTC

I had to give the database a kick this morning since it wasn't allowing connections. Also, I'm not sure what is going on yet, but there are 400,000 WUs waiting to send when the work generator is set to a cushion of 1000. That's not good. If the number of WUs continues to increase, the server will run out of disk space.

Liuqyn
Joined: 8 Jul 09
Posts: 26
Credit: 164,516,656
RAC: 0
Message 415 - Posted: 6 Aug 2009, 16:43:42 UTC - in response to Message 414.

lol, I guess no one should be complaining about not having work available. as long as the server doesn't crash from being too full.

Profile Slicker
Volunteer moderator
Project administrator
Project developer
Project tester
Project scientist
Joined: 11 Jun 09
Posts: 2525
Credit: 740,580,099
RAC: 1
Message 416 - Posted: 6 Aug 2009, 18:37:16 UTC - in response to Message 415.

lol, I guess no one should be complaining about not having work available. as long as the server doesn't crash from being too full.


All I can figure is that after the database stopped responding the work generator asked it how many WUs were left and the error was "handled" by the boinc code by returning 0 as the count. So, the generator assumed there was no work left and created some more, and more, and more, and more...

There are about 6K fewer WUs now than this a.m. so it looks like it is going down. I wonder how long it will take to return to normal.
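
The failure mode guessed at above, where a failed count query is reported as 0 and the generator keeps topping up, can be sketched in a few lines. This is a hypothetical Python illustration, not the BOINC code: `count_unsent_swallowing_errors`, `count_unsent_or_none`, `wus_to_create` and `dead_database` are made-up names, and only the cushion of 1000 comes from the posts.

```python
from typing import Callable, Optional

CUSHION = 1000  # target number of unsent WUs, as quoted in the thread


def count_unsent_swallowing_errors(query: Callable[[], int]) -> int:
    """Buggy pattern: a failed query looks exactly like an empty queue."""
    try:
        return query()
    except ConnectionError:
        return 0  # the error is "handled" by pretending nothing is queued


def count_unsent_or_none(query: Callable[[], int]) -> Optional[int]:
    """Safer pattern: report "unknown" separately from "zero"."""
    try:
        return query()
    except ConnectionError:
        return None


def wus_to_create(unsent: Optional[int], cushion: int = CUSHION) -> int:
    """Top up to the cushion, but create nothing when the count is unknown."""
    if unsent is None:
        return 0
    return max(0, cushion - unsent)


def dead_database() -> int:
    """Hypothetical stand-in for a database that refuses connections."""
    raise ConnectionError("database is not accepting connections")


# With the database down, the buggy counter asks for a full cushion on every
# pass, while the safer one creates nothing until the count is known again.
buggy = wus_to_create(count_unsent_swallowing_errors(dead_database))
safe = wus_to_create(count_unsent_or_none(dead_database))
```

Run in a loop against a dead database, the first variant would add another full cushion on every pass, which would fit a queue far beyond the configured 1000.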

frankhagen
Joined: 12 Jul 09
Posts: 188
Credit: 14,222,453
RAC: 1,460
Message 417 - Posted: 6 Aug 2009, 19:23:06 UTC - in response to Message 416.

There are about 6K fewer WUs now than this a.m. so it looks like it is going down. I wonder how long it will take to return to normal.


how about coming up with the new app-version to speed up things a little bit? ;)

Profile Gipsel
Volunteer moderator
Project developer
Project tester
Joined: 2 Jul 09
Posts: 279
Credit: 77,516,587
RAC: 76,639
Message 418 - Posted: 6 Aug 2009, 19:55:12 UTC - in response to Message 417.
Last modified: 6 Aug 2009, 19:56:40 UTC

how about coming up with the new app-version to speed up things a little bit? ;)


They are being tested right now. The first incarnations of the new CPU apps had a bug with checkpointing (and one of them sometimes even calculated wrong results). The CAL apps sometimes got confused when several numbers in a WU shared the same highest step count. Those bugs have been straightened out, and once the apps are approved by Slicker (he runs all versions through some kind of test suite), he only has to decide how much larger the new WUs should be. I would suggest a factor of 16 ;) That would be 2^35 numbers (~34 billion) to check for each WU, with each number needing about 600 steps on average to reach 1.

I forgot that Slicker has to put the new algorithm into a CUDA app. That should be easy to do, but it will still take some time to test afterwards. So I guess it will still take a few days.
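
For readers new to the terms above, here is a small illustration of a "step count" and of the tie case (several numbers sharing the highest step count in a range). It is a plain, unoptimized Python sketch with made-up function names, not the algorithm used by the project's CPU, CAL or CUDA apps.

```python
from typing import List, Tuple


def collatz_steps(n: int) -> int:
    """Count the steps for n to reach 1 (halve if even, 3n + 1 if odd)."""
    if n < 1:
        raise ValueError("n must be a positive integer")
    steps = 0
    while n != 1:
        n = n // 2 if n % 2 == 0 else 3 * n + 1
        steps += 1
    return steps


def highest_step_counts(start: int, count: int) -> Tuple[int, List[int]]:
    """Return the highest step count in [start, start + count) together with
    every number that reaches it, so ties are reported instead of dropped."""
    best = -1
    winners: List[int] = []
    for n in range(start, start + count):
        s = collatz_steps(n)
        if s > best:
            best, winners = s, [n]
        elif s == best:
            winners.append(n)
    return best, winners


# Size of the WU suggested in the post: 2**35 numbers.
numbers_per_wu = 2 ** 35
```

For example, `highest_step_counts(12, 2)` reports both 12 and 13, since each needs 9 steps to reach 1.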

Profile Slicker
Volunteer moderator
Project administrator
Project developer
Project tester
Project scientist
Joined: 11 Jun 09
Posts: 2525
Credit: 740,580,099
RAC: 1
Message 721 - Posted: 26 Aug 2009, 14:13:24 UTC

The validator crashed due to a WU that appeared to be valid but would not decrypt. That hasn't happened for a couple of weeks. It could be due to the Comcast network outage yesterday. I think Comcast should be manufacturing pogo sticks instead of running a network with all the ups and downs they have. Anyway, it is back up and running and the problematic WU was handled manually.

Liuqyn
Joined: 8 Jul 09
Posts: 26
Credit: 164,516,656
RAC: 0
Message 725 - Posted: 26 Aug 2009, 20:47:03 UTC - in response to Message 721.

Was that bad WU a resend? Maybe it's from the same batch as a few weeks ago.

riptide
Joined: 7 Aug 09
Posts: 54
Credit: 1,060,610
RAC: 0
Message 735 - Posted: 28 Aug 2009, 15:34:17 UTC

Validator is down.

Profile Slicker
Volunteer moderator
Project administrator
Project developer
Project tester
Project scientist
Joined: 11 Jun 09
Posts: 2525
Credit: 740,580,099
RAC: 1
Message 738 - Posted: 28 Aug 2009, 18:38:21 UTC - in response to Message 735.

Validator is down.

It is back up now.

Profile Slicker
Volunteer moderator
Project administrator
Project developer
Project tester
Project scientist
Joined: 11 Jun 09
Posts: 2525
Credit: 740,580,099
RAC: 1
Message 746 - Posted: 29 Aug 2009, 19:41:36 UTC

Network was down. Again.

I installed a bi-directional signal amp and as soon as I plugged it in, the network came back up. Coincidence? Comcast still denies there was anything wrong with the line even after swapping out the modem. So, now I have to try and convince Comcast to credit me for the $35 for the amplifier.

riptide
Joined: 7 Aug 09
Posts: 54
Credit: 1,060,610
RAC: 0
Message 748 - Posted: 30 Aug 2009, 0:09:23 UTC - in response to Message 746.
Last modified: 30 Aug 2009, 0:10:05 UTC

data-driven web pages     debian                  Running
upload/download server    boinc.thesonntags.com   Running
scheduler                 debian                  Running
feeder                    debian                  Running
transitioner              debian                  Running
file_deleter              debian                  Running
hotpo_work_generator      debian                  Running
hotpo_validator           debian                  Not Running
hotpo_assimilator         debian                  Running
db_purge                  debian                  Running

Profile Slicker
Volunteer moderator
Project administrator
Project developer
Project tester
Project scientist
Joined: 11 Jun 09
Posts: 2525
Credit: 740,580,099
RAC: 1
Message 749 - Posted: 30 Aug 2009, 0:45:07 UTC

The validator is back up now. A bad WU would not decode so I had to manually credit the two good ones and then delete the WU.

Profile Slicker
Volunteer moderator
Project administrator
Project developer
Project tester
Project scientist
Joined: 11 Jun 09
Posts: 2525
Credit: 740,580,099
RAC: 1
Message 750 - Posted: 30 Aug 2009, 0:58:47 UTC - in response to Message 746.

Network was down. Again.

I installed a bi-directional signal amp and as soon as I plugged it in, the network came back up. Coincidence? Comcast still denies there was anything wrong with the line even after swapping out the modem. So, now I have to try and convince Comcast to credit me for the $35 for the amplifier.


Looks like the signal amp has not done the trick. Line just dropped for a few seconds a couple minutes ago.

Copyright © 2018 Jon Sonntag; All rights reserved.