collatz parity help

log in |

Message boards : Science : collatz parity help

Author | Message |
---|---|

I'm having a lot of trouble understanding the step ahead k steps using a parity sequence. The "parity" section above gives a way to speed up simulation of the sequence. To jump ahead k steps on each iteration (using the f function from that section), break up the current number into two parts, b (the k least significant bits, interpreted as an integer), and a (the rest of the bits as an integer). The result of jumping ahead k steps can be found as: f k(a 2k+b) = a 3^c[b]+d[b]. The c and d arrays are precalculated for all possible k-bit numbers b, where d [b] is the result of applying the f function k times to b, and c [b] is the number of odd numbers encountered on the way. For example, if k=5, you can jump ahead 5 steps on each iteration by separating out the 5 least significant bits of a number and using: c [0...31] = {0,3,2,2,2,2,2,4,1,4,1,3,2,2,3,4,1,2,3,3,1,1,3,3,2,3,2,4,3,3,4,5} d [0...31] = {0,2,1,1,2,2,2,20,1,26,1,10,4,4,13,40,2,5,17,17,2,2,20,20,8,22,8,71,26,26,80,242}. Using the above example and k = 5 Take for example the number 25, in 5 steps I know my result should be 44(76 38 19 29 44). I know that 3^c[b]+d[b] ends up as 49, being c=3 and d=22. I've thought every way of applying the number 49 to come up with the result of 44, and I can not find it.( nor can I use it to come up with any number in the sequence) I don't completely understand this math function as a C++ programmer. I've spent hours and hours on this and I can't figure out this math function, and apply it to some C++ code in order to help my big number tester for collatz sequences. If any one could help me clarify what exactly this function is doing in some C or some psuedo code, I would greatly appreciate it. | |

ID: 8678 · Rating: 0 · rate:
/ Reply
Quote
| |

ID: 8693 · Rating: 0 · rate:
/ Reply
Quote
| |

Allot of neat information on that page, but I can't find any thing specific to the speedup of sequence simulation. | |

ID: 8697 · Rating: 0 · rate:
/ Reply
Quote
| |

I'm having a lot of trouble understanding the step ahead k steps using a parity sequence. Actually you don't calculate the the number after k steps (it depends on how you count). In fact, you calculate the number after 5 divisions by two, so that would be f_k+c{b} in case you count every step (3x+1 and the following /2 as two separate ones), so in your example it would be the number after 8 steps in the notation of this project. f_1(25) = 76 f_2(25) = 38 (would be f_1 if you only count divisions) f_3(25) = 19 (f_2) f_4(25) = 58 f_5(25) = 29 (f_3) f_6(25) = 88 f_7(25) = 44 (f_4) f_8(25) = 22 (would be f_5 if you only count divisions)Now, remember that in your example a is zero, b=3, c[b] = 3, and d[b] = 22, the whole thing comes out as:f_5+c[b](25) = f_8(25) = a*3^c[b] + d[b] = 0*3^3 + 22 = 22 Problem solved. | |

ID: 8698 · Rating: 0 · rate:
/ Reply
Quote
| |

Brilliant! It is also known that {4,2,1} is the only repeating cycle possible with fewer than 35400 terms. I aim to test numbers that have more terms then this, I assume terms means illiterations, or the number size. | |

ID: 8700 · Rating: 0 · rate:
/ Reply
Quote
| |

http://en.wikipedia.org/wiki/Collatz_conjecture#Experimental_evidenceIt is also known that {4,2,1} is the only repeating cycle possible with fewer than 35400 terms. It means the shortest possible cycle (beside the "normal" 4,2,1,4) consists of at least 35,400 steps if it exists. | |

ID: 8701 · Rating: 0 · rate:
/ Reply
Quote
| |

If any one is interested, I have it up and running with k=16, which seems to fit nicely in my cpu's cache. It's processing at about ~1.3 billion steps per second on 320bit starting integers, which can expand up to as big as memory allows (8 gig in my case); | |

ID: 8715 · Rating: 0 · rate:
/ Reply
Quote
| |

If any one is interested, I have it up and running with k=16, which seems to fit nicely in my cpu's cache. It's processing at about ~1.3 billion steps per second on 320bit starting integers, which can expand up to as big as memory allows (8 gig in my case); Just for comparison, running the official project applications a HD5870 does about 255 billion steps per second and a 2.4GHz Core2 about 2.37 billion steps per second (the 64bit version I think, but I may be wrong, it was benchmarked a long time ago). But the project currently use only 192bit integers (but detects overflow if it isn't enough). So your implementation appears to be competetive. By the way, you can go up to k=20 (for k=21 the values in the tables don't fit 32bit integers anymore). | |

ID: 8716 · Rating: 0 · rate:
/ Reply
Quote
| |

I've tried k=20, but I notice a slowdown compared to k=16, as it doesn't fit into my cpu's cache(AMD phenom), k=16 on my cpu has the fastest speeds. Keep in mind AMD cpu's arent the same as Intels. If the stock cpu apps are using k=20, you might consider testing AMD versions with k=16. | |

ID: 8717 · Rating: 0 · rate:
/ Reply
Quote
| |

I've tried k=20, but I notice a slowdown compared to k=16, as it doesn't fit into my cpu's cache(AMD phenom), k=16 on my cpu has the fastest speeds. Keep in mind AMD cpu's arent the same as Intels. If the stock cpu apps are using k=20, you might consider testing AMD versions with k=16. Actually I've tested it on a Phenom 9750 and k=20 appeared to be the fastest. And I store it even 16 byte aligned (4x32bits). But to tell the truth, the 16 byte alignment came from the GPU version, where it was just laziness from my side to pack 4 values into a single LUT. Using smaller ones may buy one or two percent performance, but at that time there were lower hanging fruits for performance gains. And on CPUs a 16 byte alignment is probably still better than aligning on 12 byte borders ;) That packing of 3 values into 8 byte only works up to k=17 and breaks down at k=18, isn't it? It should be easy for you to shorten your numbers to 192bits and run the project applications (64bit is definitely the fastest one) side by side to your version for performance comparisons. The checked number range can be seen in plain text in the WU files and also the task details, so it should be easy to set up a test with the same checked numbers (and don't forget to compare the results ;). Edit: Just found the benchmarks on the Phenom 9750 (running at 2.55 GHz) I did back then for some reduced size WUs: k runtime [s] size
20 61.97 16 MB
19 63.01 8 MB
18 64.21 4 MB
17 72.66 2 MB
16 68.20 1 MB
15 69.75 512 kB
14 74.97 256 kB
13 80.09 128 kB
12 88.25 64 kB
10 108.86 16 kB (could use 2 byte per value) So one indeed see some effects from the cache when increasing the size of the lookup table, but the trend to lower times still prevails. All runs were done for 402653184 consecutive numbers starting with 2361183348087637911912. Total executed steps were 204,260,372,158, i.e. just short of 3.3 billion steps per second on that 2.55 GHz Phenom. Running Collatz Conjecture (3x+1) application version 0.1 by Gipsel | |

ID: 8724 · Rating: 0 · rate:
/ Reply
Quote
| |

all my memory is aligned at 8 bytes at least, this is all that is needed for a 64bit cpu, but the function I am using aligns at 16 and 64 as well. That packing of 3 values into 8 byte only works up to k=17 and breaks down at k=18, isn't it? actually no, //{c,p,zz,dddd} - This is how my data is stored, d is 32bits, c is 8 bits,p is 8 bits. p is c that has been pre calculated 3 to the power of c. I only access c to add to the step counter. I need to actually make it so it c will subtract from the step count if necessary, do to the fact that applying the F function 5 times to lets say the number 1 for example would result in a overflow into the repeating sequence, when we calculate the real steps as ending at one. | |

ID: 8725 · Rating: 0 · rate:
/ Reply
Quote
| |

The problem I see is that if c is 19 for instance, 3^19 needs 31 bits to store, the maximum occuring offset (d) also needs 31 bits and c itself 5 bits. If you don't partition your p=3^c in some strange way (Is the zz be used for that? But it will hurt the performance later when you have to reconstruct it for the multiplication with a), I don't see how you fit those 67 bits into 64 bits. I need to actually make it so it c will subtract from the step count if necessary, do to the fact that applying the F function 5 times to lets say the number 1 for example would result in a overflow into the repeating sequence, when we calculate the real steps as ending at one. For that I use my fourth entry ;) | |

ID: 8726 · Rating: 0 · rate:
/ Reply
Quote
| |

Edit: I've noticed my | |

ID: 8727 · Rating: 0 · rate:
/ Reply
Quote
| |

but if I only apply F function 5 times, there can only be a max of 5 "odds along the way", c can't be greater then 5, can it? or am I missing some part of information. But then are you using k=5. Actually, the maximum c is exactly k. So if you use k=5, then c can't be greater than 5 (as in the wikipedia example). But for k=16, c can be 16 of course and you will need to store 3^16 (43,046,721) as the maximum number in your lookup table for your p, as you have to calculate a*43,046,721 for a number were the last 16 bits are all set (which means you have to calculate 16 times in a row *(3+1)/2). This can be shortened by shifting the whole number 16 bits to the right (/2^16), multiplying it by 3^16 and adding a quite large number d to it (I think it is 3^16-1=43,046,720). Edit: Oops, I didn't refreshed the page. Didn't see that you have found this already. | |

ID: 8728 · Rating: 0 · rate:
/ Reply
Quote
| |

Alright, I've fixed up my code, it's running better now, k=19 is what i'm using now, it seems to be the top now. as for k=20, I get an annoying infinite loop involving 2 ending up as 2, and then 2 again.. and so on, I have to add a second check to the code to test for this, it's probably not worth it. Although I don't get 3 billion steps per second with my implementation, it's due to the fact it's designed for really large int's that only fit in memory, instead of a 256bit int which can fit easy into 4 registers. I get about 150 million steps per second on 6400 bit int's. | |

ID: 8729 · Rating: 0 · rate:
/ Reply
Quote
| |

Post to thread

Message boards :
Science :
collatz parity help

Copyright © 2018 Jon Sonntag; All rights reserved.