collatz parity help
log in

Advanced search

Message boards : Science : collatz parity help

Author Message
cncr04s
Send message
Joined: 2 May 10
Posts: 15
Credit: 3,444,589
RAC: 0
Message 8678 - Posted: 3 Jul 2010, 11:55:03 UTC
Last modified: 3 Jul 2010, 11:57:25 UTC

I'm having a lot of trouble understanding the step ahead k steps using a parity sequence.

http://en.wikipedia.org/wiki/Collatz_conjecture#As_a_parity_sequence

The "parity" section above gives a way to speed up simulation of the sequence. To jump ahead k steps on each iteration
(using the f function from that section), break up the current number into two parts, b (the k least significant bits, interpreted as an integer),
and a (the rest of the bits as an integer). The result of jumping ahead k steps can be found as:

f k(a 2k+b) = a 3^c[b]+d[b].
The c and d arrays are precalculated for all possible k-bit numbers b, where d [b] is the result of applying the f function k times to b,
and c [b] is the number of odd numbers encountered on the way. For example, if k=5, you can jump ahead 5 steps on each iteration by separating out the 5 least significant bits of a number and using:

c [0...31] = {0,3,2,2,2,2,2,4,1,4,1,3,2,2,3,4,1,2,3,3,1,1,3,3,2,3,2,4,3,3,4,5}
d [0...31] = {0,2,1,1,2,2,2,20,1,26,1,10,4,4,13,40,2,5,17,17,2,2,20,20,8,22,8,71,26,26,80,242}.


Using the above example and k = 5
Take for example the number 25, in 5 steps I know my result should be 44(76 38 19 29 44). I know that 3^c[b]+d[b] ends up as 49, being c=3 and d=22.
I've thought every way of applying the number 49 to come up with the result of 44, and I can not find it.( nor can I use it to come up with any number in the sequence)

I don't completely understand this math function as a C++ programmer. I've spent hours and hours on this and I can't figure out this math function, and apply it to some C++ code in order to help my big number tester for collatz sequences.
If any one could help me clarify what exactly this function is doing in some C or some psuedo code, I would greatly appreciate it.

mda_
Send message
Joined: 1 Jun 10
Posts: 11
Credit: 41,233
RAC: 0
Message 8693 - Posted: 4 Jul 2010, 21:29:15 UTC - in response to Message 8678.

http://www.ieeta.pt/~tos/3x+1.html

cncr04s
Send message
Joined: 2 May 10
Posts: 15
Credit: 3,444,589
RAC: 0
Message 8697 - Posted: 5 Jul 2010, 4:12:20 UTC - in response to Message 8693.

Allot of neat information on that page, but I can't find any thing specific to the speedup of sequence simulation.

Profile Gipsel
Volunteer moderator
Project developer
Project tester
Send message
Joined: 2 Jul 09
Posts: 279
Credit: 77,516,587
RAC: 76,639
Message 8698 - Posted: 5 Jul 2010, 7:24:16 UTC - in response to Message 8678.
Last modified: 5 Jul 2010, 7:36:14 UTC

I'm having a lot of trouble understanding the step ahead k steps using a parity sequence.

http://en.wikipedia.org/wiki/Collatz_conjecture#As_a_parity_sequence

The "parity" section above gives a way to speed up simulation of the sequence. To jump ahead k steps on each iteration
(using the f function from that section), break up the current number into two parts, b (the k least significant bits, interpreted as an integer),
and a (the rest of the bits as an integer). The result of jumping ahead k steps can be found as:

f k(a 2k+b) = a 3^c[b]+d[b].
The c and d arrays are precalculated for all possible k-bit numbers b, where d [b] is the result of applying the f function k times to b,
and c [b] is the number of odd numbers encountered on the way. For example, if k=5, you can jump ahead 5 steps on each iteration by separating out the 5 least significant bits of a number and using:

c [0...31] = {0,3,2,2,2,2,2,4,1,4,1,3,2,2,3,4,1,2,3,3,1,1,3,3,2,3,2,4,3,3,4,5}
d [0...31] = {0,2,1,1,2,2,2,20,1,26,1,10,4,4,13,40,2,5,17,17,2,2,20,20,8,22,8,71,26,26,80,242}.


Using the above example and k = 5
Take for example the number 25, in 5 steps I know my result should be 44(76 38 19 29 44). I know that 3^c{b}+d{b} ends up as 49, being c=3 and d=22.
I've thought every way of applying the number 49 to come up with the result of 44, and I can not find it.( nor can I use it to come up with any number in the sequence)

Actually you don't calculate the the number after k steps (it depends on how you count). In fact, you calculate the number after 5 divisions by two, so that would be f_k+c{b} in case you count every step (3x+1 and the following /2 as two separate ones), so in your example it would be the number after 8 steps in the notation of this project.

f_1(25) = 76
f_2(25) = 38 (would be f_1 if you only count divisions)
f_3(25) = 19 (f_2)
f_4(25) = 58
f_5(25) = 29 (f_3)
f_6(25) = 88
f_7(25) = 44 (f_4)
f_8(25) = 22 (would be f_5 if you only count divisions)

Now, remember that in your example a is zero, b=3, c[b] = 3, and d[b] = 22, the whole thing comes out as:

f_5+c[b](25) = f_8(25) = a*3^c[b] + d[b] = 0*3^3 + 22 = 22

Problem solved.

cncr04s
Send message
Joined: 2 May 10
Posts: 15
Credit: 3,444,589
RAC: 0
Message 8700 - Posted: 5 Jul 2010, 8:51:12 UTC - in response to Message 8698.

Brilliant!

I completely understand this now, the wiki doesn't explain this to thoroughly, at least not to a regular old C++/ assembly coder.
I was very close to your solution, after spending hours reading the sentences over and over, as well as win calc in the bit setting mode, to look at patterns.
The 2 steps I had not yet thought of was, it wasn't 5 steps, but k+c[b], as well as multiplying a (shifted over k bits) by 3^c[b] + d[b].

so far I've tested these new bits of information added to my code, and the verifications of the numbers obtained at k+c[b] steps compared to a regular step by step, seem to be true.

I appreciate your help in this, this will substantially speed up my code, all I need to do are code up some very fast assembly routines to deal with the new operations being preformed.

The basis behind my bigint lib(designed for 3x+1 only) is based off what is stated in this section
http://en.wikipedia.org/wiki/Collatz_conjecture#Experimental_evidence

It is also known that {4,2,1} is the only repeating cycle possible with fewer than 35400 terms.

I aim to test numbers that have more terms then this, I assume terms means illiterations, or the number size.

Profile Gipsel
Volunteer moderator
Project developer
Project tester
Send message
Joined: 2 Jul 09
Posts: 279
Credit: 77,516,587
RAC: 76,639
Message 8701 - Posted: 5 Jul 2010, 10:39:17 UTC - in response to Message 8700.

http://en.wikipedia.org/wiki/Collatz_conjecture#Experimental_evidence
It is also known that {4,2,1} is the only repeating cycle possible with fewer than 35400 terms.

I aim to test numbers that have more terms then this, I assume terms means illiterations, or the number size.

It means the shortest possible cycle (beside the "normal" 4,2,1,4) consists of at least 35,400 steps if it exists.

cncr04s
Send message
Joined: 2 May 10
Posts: 15
Credit: 3,444,589
RAC: 0
Message 8715 - Posted: 6 Jul 2010, 8:45:07 UTC

If any one is interested, I have it up and running with k=16, which seems to fit nicely in my cpu's cache. It's processing at about ~1.3 billion steps per second on 320bit starting integers, which can expand up to as big as memory allows (8 gig in my case);

Profile Gipsel
Volunteer moderator
Project developer
Project tester
Send message
Joined: 2 Jul 09
Posts: 279
Credit: 77,516,587
RAC: 76,639
Message 8716 - Posted: 6 Jul 2010, 10:05:29 UTC - in response to Message 8715.
Last modified: 6 Jul 2010, 10:06:20 UTC

If any one is interested, I have it up and running with k=16, which seems to fit nicely in my cpu's cache. It's processing at about ~1.3 billion steps per second on 320bit starting integers, which can expand up to as big as memory allows (8 gig in my case);

Just for comparison, running the official project applications a HD5870 does about 255 billion steps per second and a 2.4GHz Core2 about 2.37 billion steps per second (the 64bit version I think, but I may be wrong, it was benchmarked a long time ago). But the project currently use only 192bit integers (but detects overflow if it isn't enough). So your implementation appears to be competetive.

By the way, you can go up to k=20 (for k=21 the values in the tables don't fit 32bit integers anymore).

cncr04s
Send message
Joined: 2 May 10
Posts: 15
Credit: 3,444,589
RAC: 0
Message 8717 - Posted: 6 Jul 2010, 10:37:16 UTC - in response to Message 8716.
Last modified: 6 Jul 2010, 10:39:32 UTC

I've tried k=20, but I notice a slowdown compared to k=16, as it doesn't fit into my cpu's cache(AMD phenom), k=16 on my cpu has the fastest speeds. Keep in mind AMD cpu's arent the same as Intels. If the stock cpu apps are using k=20, you might consider testing AMD versions with k=16.
I've also implemented a method of storing all 3 data requirements in a single 8 byte location, as the first access loads at least 8 bytes into the cache, subsequent access bypass memory because it's already loaded, this resulted in a 5% speedup. I'd also take any suggestions you might have on optimizations.

Profile Gipsel
Volunteer moderator
Project developer
Project tester
Send message
Joined: 2 Jul 09
Posts: 279
Credit: 77,516,587
RAC: 76,639
Message 8724 - Posted: 6 Jul 2010, 17:17:31 UTC - in response to Message 8717.
Last modified: 6 Jul 2010, 18:41:55 UTC

I've tried k=20, but I notice a slowdown compared to k=16, as it doesn't fit into my cpu's cache(AMD phenom), k=16 on my cpu has the fastest speeds. Keep in mind AMD cpu's arent the same as Intels. If the stock cpu apps are using k=20, you might consider testing AMD versions with k=16.
I've also implemented a method of storing all 3 data requirements in a single 8 byte location, as the first access loads at least 8 bytes into the cache, subsequent access bypass memory because it's already loaded, this resulted in a 5% speedup. I'd also take any suggestions you might have on optimizations.

Actually I've tested it on a Phenom 9750 and k=20 appeared to be the fastest. And I store it even 16 byte aligned (4x32bits).

But to tell the truth, the 16 byte alignment came from the GPU version, where it was just laziness from my side to pack 4 values into a single LUT. Using smaller ones may buy one or two percent performance, but at that time there were lower hanging fruits for performance gains. And on CPUs a 16 byte alignment is probably still better than aligning on 12 byte borders ;)

That packing of 3 values into 8 byte only works up to k=17 and breaks down at k=18, isn't it?

It should be easy for you to shorten your numbers to 192bits and run the project applications (64bit is definitely the fastest one) side by side to your version for performance comparisons. The checked number range can be seen in plain text in the WU files and also the task details, so it should be easy to set up a test with the same checked numbers (and don't forget to compare the results ;).

Edit:

Just found the benchmarks on the Phenom 9750 (running at 2.55 GHz) I did back then for some reduced size WUs:

k runtime [s] size 20 61.97 16 MB 19 63.01 8 MB 18 64.21 4 MB 17 72.66 2 MB 16 68.20 1 MB 15 69.75 512 kB 14 74.97 256 kB 13 80.09 128 kB 12 88.25 64 kB 10 108.86 16 kB (could use 2 byte per value)


So one indeed see some effects from the cache when increasing the size of the lookup table, but the trend to lower times still prevails.
All runs were done for 402653184 consecutive numbers starting with 2361183348087637911912. Total executed steps were 204,260,372,158, i.e. just short of 3.3 billion steps per second on that 2.55 GHz Phenom.

Running Collatz Conjecture (3x+1) application version 0.1 by Gipsel
Reading input file ... done.
Checking 402653184 numbers starting with 2361183348087637911912

CPU: AMD Phenom(tm) 9750 Quad-Core Processor (4 cores/threads) 2.5492 GHz (2ms)

Initializing lookup table (16384 kB) ... done
needed 1467 steps for 2361183348087997950857
204260372158 total executed steps for 402653184 numbers

WU completed.
CPU time: 61.9688 seconds, wall clock time: 61.91 seconds, CPU frequency: 2.54961 GHz

cncr04s
Send message
Joined: 2 May 10
Posts: 15
Credit: 3,444,589
RAC: 0
Message 8725 - Posted: 6 Jul 2010, 17:51:23 UTC - in response to Message 8724.
Last modified: 6 Jul 2010, 18:00:08 UTC

all my memory is aligned at 8 bytes at least, this is all that is needed for a 64bit cpu, but the function I am using aligns at 16 and 64 as well.

I've tried 17,18,19,20 and they are all between 50% and 10% slower then using k = 16. I have no idea why, other than, k = 16 results in about 512k of memory, most of which can fit into my cpu's L2 cache.
Even just 17 causes a 10% loss in speed.

That packing of 3 values into 8 byte only works up to k=17 and breaks down at k=18, isn't it?

actually no, //{c,p,zz,dddd} - This is how my data is stored, d is 32bits, c is 8 bits,p is 8 bits. p is c that has been pre calculated 3 to the power of c. I only access c to add to the step counter. I need to actually make it so it c will subtract from the step count if necessary, do to the fact that applying the F function 5 times to lets say the number 1 for example would result in a overflow into the repeating sequence, when we calculate the real steps as ending at one.

Profile Gipsel
Volunteer moderator
Project developer
Project tester
Send message
Joined: 2 Jul 09
Posts: 279
Credit: 77,516,587
RAC: 76,639
Message 8726 - Posted: 6 Jul 2010, 18:12:16 UTC - in response to Message 8725.

That packing of 3 values into 8 byte only works up to k=17 and breaks down at k=18, isn't it?

actually no, //{c,p,zz,dddd} - This is how my data is stored, d is 32bits, c is 8 bits,p is 8 bits. p is c that has been pre calculated 3 to the power of c. I only access c to add to the step counter.

The problem I see is that if c is 19 for instance, 3^19 needs 31 bits to store, the maximum occuring offset (d) also needs 31 bits and c itself 5 bits. If you don't partition your p=3^c in some strange way (Is the zz be used for that? But it will hurt the performance later when you have to reconstruct it for the multiplication with a), I don't see how you fit those 67 bits into 64 bits.

I need to actually make it so it c will subtract from the step count if necessary, do to the fact that applying the F function 5 times to lets say the number 1 for example would result in a overflow into the repeating sequence, when we calculate the real steps as ending at one.

For that I use my fourth entry ;)

cncr04s
Send message
Joined: 2 May 10
Posts: 15
Credit: 3,444,589
RAC: 0
Message 8727 - Posted: 6 Jul 2010, 18:18:35 UTC - in response to Message 8726.
Last modified: 6 Jul 2010, 18:38:13 UTC

Edit: I've noticed my blunder of a mistake, i need to apply the F function K times, not just 5 times, that was a relic from when my table was k= 5 =P I'll edit this after i fix it up

Profile Gipsel
Volunteer moderator
Project developer
Project tester
Send message
Joined: 2 Jul 09
Posts: 279
Credit: 77,516,587
RAC: 76,639
Message 8728 - Posted: 6 Jul 2010, 18:38:32 UTC - in response to Message 8727.
Last modified: 6 Jul 2010, 18:39:52 UTC

but if I only apply F function 5 times, there can only be a max of 5 "odds along the way", c can't be greater then 5, can it? or am I missing some part of information.

But then are you using k=5.
Actually, the maximum c is exactly k. So if you use k=5, then c can't be greater than 5 (as in the wikipedia example).

But for k=16, c can be 16 of course and you will need to store 3^16 (43,046,721) as the maximum number in your lookup table for your p, as you have to calculate a*43,046,721 for a number were the last 16 bits are all set (which means you have to calculate 16 times in a row *(3+1)/2).
This can be shortened by shifting the whole number 16 bits to the right (/2^16), multiplying it by 3^16 and adding a quite large number d to it (I think it is 3^16-1=43,046,720).

Edit:
Oops, I didn't refreshed the page. Didn't see that you have found this already.

cncr04s
Send message
Joined: 2 May 10
Posts: 15
Credit: 3,444,589
RAC: 0
Message 8729 - Posted: 6 Jul 2010, 19:40:50 UTC - in response to Message 8728.
Last modified: 6 Jul 2010, 19:44:09 UTC

Alright, I've fixed up my code, it's running better now, k=19 is what i'm using now, it seems to be the top now. as for k=20, I get an annoying infinite loop involving 2 ending up as 2, and then 2 again.. and so on, I have to add a second check to the code to test for this, it's probably not worth it. Although I don't get 3 billion steps per second with my implementation, it's due to the fact it's designed for really large int's that only fit in memory, instead of a 256bit int which can fit easy into 4 registers. I get about 150 million steps per second on 6400 bit int's.
Next step is to write a verification function.


Post to thread

Message boards : Science : collatz parity help


Main page · Your account · Message boards


Copyright © 2018 Jon Sonntag; All rights reserved.