Corrupt Index Caused Authentication Errors

Author Message
Slicker
Volunteer moderator
Project administrator
Project developer
Project tester
Project scientist
Joined: 11 Jun 09
Posts: 2525
Credit: 740,580,099
RAC: 1
Message 20313 - Posted: 5 Mar 2015, 23:58:18 UTC

Summary:
An index on the user table in the database was corrupt and has been fixed. If you have issues or notice something that is out of sync somehow, please let me know.

Details:
The email address index on the user table was corrupted. This kept people from being able to authenticate when connecting. Ever heard that the definition of lunacy is doing the same thing over and over and expecting different results? If that is true, some of you need some serious counselling. ;-)

When people couldn't connect, they attempted to authenticate and BOINC's logic is that if it can't find you, you must be new. So, it created another record because the index used to look up your existing record was corrupt -- or for some lunatics, another and another and another and another... That resulted in multiple user records with the same email address and because of that, I couldn't just drop and re-create the corrupt index because there were now duplicate data values that needed to first be removed.
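
To illustrate the duplicate problem described above, here is a minimal sketch rather than the actual repair. It uses Python's built-in sqlite3 module with an in-memory database. The `user` table, the `email_addr` column, the index name `user_email_addr`, and the sample addresses are all assumptions for illustration; the project's real database and schema may differ. It shows how rows sharing an email address can be listed, and why a unique index cannot be built until they are gone.

```python
import sqlite3


def duplicate_emails(conn):
    """Return (email_addr, count) for every address held by more than one row."""
    return conn.execute(
        "SELECT email_addr, COUNT(*) FROM user "
        "GROUP BY email_addr HAVING COUNT(*) > 1 "
        "ORDER BY email_addr"
    ).fetchall()


def try_create_unique_index(conn):
    """Try to build a unique index; return False if the database rejects it."""
    try:
        conn.execute("CREATE UNIQUE INDEX user_email_addr ON user (email_addr)")
        return True
    except sqlite3.DatabaseError:
        # Duplicate values in email_addr make a unique index impossible.
        return False


def demo():
    """Build a hypothetical user table in which one address appears three times."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE user (id INTEGER PRIMARY KEY, email_addr TEXT)")
    conn.executemany(
        "INSERT INTO user (email_addr) VALUES (?)",
        [("a@example.com",), ("b@example.com",),
         ("a@example.com",), ("a@example.com",)],
    )
    return conn
```

Calling `duplicate_emails(demo())` lists the repeated address with its row count, and `try_create_unique_index` only succeeds once each address is down to a single row.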

So, I cleaned up hundreds of records, each of which had to be manually evaluated to determine whether any other data was associated with the user record. BOINC doesn't enforce any data integrity between tables because it uses no foreign key relationships in the database, so it will allow me to delete records that do have data associated with them, which means I have to be very careful when fixing things. Once they were all removed, I was able to rebuild the index and allow everyone to access the project again.
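
Because nothing in the database stops a delete that would orphan related rows, the check has to be done by hand or by query. The following is a rough sketch of such a query, not the procedure actually used. The `user` and `host` tables and the `email_addr`, `total_credit`, and `userid` columns are assumed names, and "no hosts and zero credit" is only one possible definition of a record with nothing attached.

```python
import sqlite3


def deletable_duplicates(conn):
    """Return ids of user rows that share an email address with another row
    and have neither credit nor hosts attached (candidates only)."""
    rows = conn.execute(
        """
        SELECT u.id
        FROM user AS u
        WHERE u.total_credit = 0
          AND NOT EXISTS (SELECT 1 FROM host AS h WHERE h.userid = u.id)
          AND (SELECT COUNT(*) FROM user AS d
               WHERE d.email_addr = u.email_addr) > 1
        ORDER BY u.id
        """
    ).fetchall()
    return [row[0] for row in rows]


def demo():
    """Hypothetical data: three rows for one address, one row for another."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE user (id INTEGER PRIMARY KEY, email_addr TEXT, total_credit REAL);
        CREATE TABLE host (id INTEGER PRIMARY KEY, userid INTEGER);
        INSERT INTO user VALUES (1, 'a@example.com', 500.0);
        INSERT INTO user VALUES (2, 'a@example.com', 0.0);
        INSERT INTO user VALUES (3, 'a@example.com', 0.0);
        INSERT INTO user VALUES (4, 'b@example.com', 0.0);
        INSERT INTO host VALUES (10, 1);
        INSERT INTO host VALUES (11, 3);
        """
    )
    return conn
```

The result is a list of candidates, not a delete list. If every row for an address is empty, all of them would be returned, so one row per address still has to be kept deliberately.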

crweaver
Joined: 22 Jan 11
Posts: 2
Credit: 924,688,164
RAC: 2,501,870
Message 20314 - Posted: 6 Mar 2015, 2:57:55 UTC

When I ran into this problem on my computer, I looked at my event log which said to 'remove and add', which I did. Hence my outstanding WUs were lost, or 'abandoned'. I'm sure it happened to a few others.

Michael H.W. Weber
Joined: 13 Jan 15
Posts: 7
Credit: 505,198,348
RAC: 529,766
Message 20315 - Posted: 6 Mar 2015, 8:37:37 UTC
Last modified: 6 Mar 2015, 8:43:17 UTC

I registered a new client computer yesterday using the BAM! account manager. It showed the correct details in the BOINC manager (credits, etc.). This morning, I saw the following message in my BOINC manager:

Collatz Conjecture: Notice from server
Ungültiger oder fehlender Kontoschlüssel. Projekt entfernen und neu hinzufügen zum beheben.
05.03.2015 13:47:59

Hence, I detached this client from the project and re-attached it.

Is that sufficient?

And one more thing:
When I use BAM! and select to NOT use WUs that employ the CPU of my machine(s), it still would ALSO download WUs for my CPU. What's wrong there?

Michael.

P.S.: It says "Incorrect or missing account key. Please remove project and add new to solve the issue."
____________

DGG
Joined: 1 Aug 10
Posts: 7
Credit: 48,804,817
RAC: 847
Message 20316 - Posted: 6 Mar 2015, 8:47:25 UTC
Last modified: 6 Mar 2015, 8:50:35 UTC

Thanks Slicker for taking care of our accumulated data and yet still fixing the issue at hand. Although many of us probably seem to gripe endlessly to you folks, we really do appreciate the work you do.

Probably shouldn't have followed the BOINC notice or event message to detach and reattach then? I guess many of us followed that message off the edge of the database cliff.

Slicker
Volunteer moderator
Project administrator
Project developer
Project tester
Project scientist
Joined: 11 Jun 09
Posts: 2525
Credit: 740,580,099
RAC: 1
Message 20317 - Posted: 6 Mar 2015, 14:23:04 UTC

There were 100+ users with anywhere from 2 to 15 accounts each. I removed the ones with no credits, hosts, etc. but it looks like BOINC already started using the new cross project ID (CPID) from the account that it created while the index was corrupt. At least one user has PM'd me stating my fix for his duplicate account didn't work. It looks like it could be another long day of tedious data analysis and database repair.

I still don't understand how "can't connect to the database" means to tell the users to "detach and create a new account". That must be West coast logic. ;-) When we write software here in the Midwest, we take a much more conservative approach. If we can't connect to the database, we don't assume we can insert a new record into it either.
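
The conservative approach can be sketched as follows. This is an illustration only, not BOINC's code: `LookupFailed`, `find_user_id`, and `get_or_create_user` are hypothetical names, and the `user` table with an `email_addr` column is an assumed schema. The point is that "the lookup failed" and "the lookup succeeded and found nobody" are kept apart, and a new record is created only in the second case.

```python
import sqlite3


class LookupFailed(Exception):
    """The lookup could not be completed, so nothing is known about the user."""


def find_user_id(conn, email_addr):
    """Return the user's id, or None if the lookup worked and found nobody."""
    try:
        row = conn.execute(
            "SELECT id FROM user WHERE email_addr = ?", (email_addr,)
        ).fetchone()
    except sqlite3.Error as exc:
        # A database error is not the same thing as "no such user".
        raise LookupFailed(str(exc)) from exc
    return row[0] if row else None


def get_or_create_user(conn, email_addr):
    """Create a row only when a successful lookup shows the address is absent."""
    user_id = find_user_id(conn, email_addr)  # LookupFailed propagates to the caller
    if user_id is not None:
        return user_id
    cur = conn.execute("INSERT INTO user (email_addr) VALUES (?)", (email_addr,))
    return cur.lastrowid
```

Note the limit of this sketch: it only helps when the database reports an error. A corrupt index that silently returns no row would still look like a new user here.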

Helix Von Smelix
Joined: 2 Aug 10
Posts: 43
Credit: 10,190,344,992
RAC: 1,407,656
Message 20318 - Posted: 6 Mar 2015, 14:24:11 UTC

Cheers for your hard work. I sat back and waited (prayed) ;-)

[AF>Quebec]MDodier
Joined: 14 Mar 10
Posts: 14
Credit: 1,467,840,580
RAC: 1,417,104
Message 20319 - Posted: 6 Mar 2015, 16:07:59 UTC

Slicker,

I'm back on track with Collatz,

Thank you for the great job!
____________

David Riese
Joined: 23 Sep 12
Posts: 132
Credit: 4,009,571,978
RAC: 5,357,170
Message 20320 - Posted: 6 Mar 2015, 20:31:12 UTC - in response to Message 20318.

Slicker, I want to echo the others by thanking you for all of your hard work on behalf of us crunchers.
____________

Brent
Joined: 25 Jun 14
Posts: 38
Credit: 181,880,965
RAC: 212,451
Message 20322 - Posted: 7 Mar 2015, 7:21:54 UTC - in response to Message 20317.

I still don't understand how "can't connect to the database" means to tell the users to "detach and create a new account". That must be West coast logic. ;-) When we write software here in the Midwest, we take a much more conservative approach. If we can't connect to the database, we don't assume we can insert a new record into it either.


Perhaps it is because we received a notice from Boinc Manager which said to "remove and add" the Collatz project, which I did. Hence my outstanding WUs were lost, or "abandoned". I lost over 12 hours of work because of this and I'm sure it happened to many others besides myself.

Is there any way to recover the "abandoned" completed workunits?
____________
Brent

Slicker
Volunteer moderator
Project administrator
Project developer
Project tester
Project scientist
Joined: 11 Jun 09
Posts: 2525
Credit: 740,580,099
RAC: 1
Message 20324 - Posted: 9 Mar 2015, 2:45:38 UTC - in response to Message 20322.

I still don't understand how "can't connect to the database" means to tell the users to "detach and create a new account". That must be West coast logic. ;-) When we write software here in the Midwest, we take a much more conservative approach. If we can't connect to the database, we don't assume we can insert a new record into it either.


Perhaps it is because we received a notice from Boinc Manager which said to "remove and add" the Collatz project, which I did. Hence my outstanding WUs were lost, or "abandoned". I lost over 12 hours of work because of this and I'm sure it happened to many others besides myself.

Is there any way to recover the "abandoned" completed workunits?


I can't blame anyone for following the instructions given by the BOINC client or by the account managers. People did what they were told to do. Some were just more persistent than others and kept trying and trying and trying. I'm sure they didn't realize they were making it harder to fix each time.

I wish there was a way to restore the abandoned files, but BOINC literally throws away anything in the project folder when detaching. I'm working on a way to fairly grant credit for the lost work. I just need to figure out the best combination of RAC, last server contact, etc. to figure out who lost work due to the database crash and how much.

Shaun August
Joined: 3 Mar 15
Posts: 1
Credit: 10,006,184
RAC: 0
Message 20326 - Posted: 9 Mar 2015, 11:01:20 UTC - in response to Message 20324.
Last modified: 9 Mar 2015, 11:39:25 UTC

I think, Slicker, that the problem was that initially people followed the instructions as dictated by BOINC, "Give-up and re-join". But when we tried to re-join as a new member, the e-mail address already existed. Trying to re-join as a current member just failed, but offered the option of using their alternate ID.
And when that did not work, they started to believe it was their machine at fault, so rebooted and tried again.
And when that did not work they tried a different machine, if available. The point being that most people will assume that the problem is at their end rather than the server's, particularly as the 'Server Status' page for Collatz indicated all was running fine, and there was nothing noted on the forum.

Only after all has failed do you finally give up trying......for however long you expect the problem will take to fix. So they come back 2-4 hours later and retry some part of the above. I'm guessing that those of us in Europe are guilty of having tried more times than US members just because we started re-trying sooner after things went wrong.
For my part, I'm sorry if doing any/all of the above has contributed to extra work-load for yourself.....your efforts are much appreciated.

Peter Lau
Joined: 3 Dec 14
Posts: 2
Credit: 27,876,499
RAC: 0
Message 20327 - Posted: 10 Mar 2015, 12:41:46 UTC - in response to Message 20326.
Last modified: 10 Mar 2015, 13:29:52 UTC

Exactly!

I, on the other hand, had a computer crash the same night that this debacle happened.
I had to reconfigure and try to save what could be saved and whatever was given. So I'm one of those who tried to connect and reconnect over and over. You know, that "idiot"!

It did not help that BOINC instructed me to "Give up and re-join" after that. But being as I am an idiot and a "QWERTY-challenged person"... Well.

In my defence, I can tell you that I see my computer as a tool and not an instrument.
Furthermore, I challenge everyone who thinks otherwise, everyone who thinks that everybody "should have a supreme instinct" for computers beyond what 99% of the population has, to come and test their knowledge of, for example, arctic survival skills, (military) sniping or skydiving against mine (because if you don't skydive, and know exactly how it works, then you are an idiot...).

Otherwise, Slicker.
You are doing an exemplary job!

;)

Peter Lau
Joined: 3 Dec 14
Posts: 2
Credit: 27,876,499
RAC: 0
Message 20328 - Posted: 10 Mar 2015, 15:15:24 UTC - in response to Message 20327.

Addendum.

If you do not know exactly how an M240 machine gun works, how to disassemble and assemble it again, load it and fire it, you are an idiot!

If you do not know how to operate a Leopard II, then you are an idiot (me being amongst those).

But if I can operate the AI AW sniper rifle, does that make me a genius?

Same, same- but different...

mikey
Joined: 11 Aug 09
Posts: 3242
Credit: 1,691,432,629
RAC: 5,698,452
Message 20329 - Posted: 11 Mar 2015, 10:49:16 UTC - in response to Message 20328.

Addendum.

If you do not know exactly how an M240 machine gun works, how to disassemble and assemble it again, load it and fire it, you are an idiot!

If you do not know how to operate a Leopard II, then you are an idiot (me being amongst those).

But if I can operate the AI AW sniper rifle, does that make me a genius?

Same, same- but different...


I don't think anyone is an "idiot" for knowing or not knowing those, or any other, things. What I did was sit back, do nothing and let tomorrow come and see if the problems persisted. When they did not, I was good to go with no effort on my part. Sometimes the problems are on the OTHER end, not on our end, and with Boinc that seems to be the case more often than not. I'm guessing that's just part of having a few hundred thousand of us all trying to connect to some PC running the Boinc Server side software that I think is free. I guess you get what you pay for, bugs and all. I KNOW Slicker has to spend considerable time and energy every time a new version of the Server side software is released just to make it work 'his way'. I also know my PCs can't ALWAYS be the problem. Since I crunch for multiple projects, and they ALL go down at some point, I prefer to wait out outages and see if just maybe it isn't my PCs that are causing the problems.

I also KNEW from past experience that creating a 'new' account WOULD indeed cause me to lose all my completed work units, and I was NOT prepared to do that on day one of an outage!!

Helix Von Smelix
Joined: 2 Aug 10
Posts: 43
Credit: 10,190,344,992
RAC: 1,407,656
Message 20330 - Posted: 11 Mar 2015, 13:26:23 UTC - in response to Message 20328.

Addendum.

If you do not know exactly how an M240 machine gun works, how to disassemble and assemble it again, load it and fire it, you are an idiot!

If you do not know how to operate a Leopard II, then you are an idiot (me being amongst those).

But if I can operate the AI AW sniper rifle, does that make me a genius?

Same, same- but different...

If I had been using an M240 as long as a PC, I am sure I could.

Helix Von Smelix
Joined: 2 Aug 10
Posts: 43
Credit: 10,190,344,992
RAC: 1,407,656
Message 20331 - Posted: 11 Mar 2015, 13:31:29 UTC

Also, on the project home page the user of the day was corrupt. A large clue, I thought.

Abacurial.com
Joined: 7 Oct 13
Posts: 3
Credit: 12,296,477
RAC: 0
Message 20332 - Posted: 12 Mar 2015, 18:16:32 UTC

I've gotten so used to Collatz being down I just ignore problems until they go away....

Slicker
Volunteer moderator
Project administrator
Project developer
Project tester
Project scientist
Joined: 11 Jun 09
Posts: 2525
Credit: 740,580,099
RAC: 1
Message 20333 - Posted: 12 Mar 2015, 19:24:45 UTC

Hey, if you've got an M240 machine gun, I would RTFM on how to disassemble and reassemble it just so I could experience firing it once!

mikey
Joined: 11 Aug 09
Posts: 3242
Credit: 1,691,432,629
RAC: 5,698,452
Message 20334 - Posted: 13 Mar 2015, 10:41:33 UTC - in response to Message 20333.

Hey, if you've got an M240 machine gun, I would RTFM on how to disassemble and reassemble it just so I could experience firing it once!


You can always go to Las Vegas and shoot one, I think. They have almost every gun available at their public shooting ranges, and yes, even machine guns! It is a big draw for some tourists: go there and shoot up some sand.

Werinbert
Joined: 7 May 13
Posts: 24
Credit: 100,197,949
RAC: 0
Message 20335 - Posted: 13 Mar 2015, 15:35:37 UTC - in response to Message 20334.

Hey, if you've got an M240 machine gun, I would RTFM on how to disassemble and reassemble it just so I could experience firing it once!


You can always go to Las Vegas and shoot one, I think. They have almost every gun available at their public shooting ranges, and yes, even machine guns! It is a big draw for some tourists: go there and shoot up some sand.


I would prefer that to shooting up some servers.
____________
"For those who have so little patience that they equate a single day to eternity: yes, the project is dead. For all the others, the project is back online. :-)" -- Slicker



Copyright © 2018 Jon Sonntag; All rights reserved.