Article #008: Collatz Conjecture

Published: 01/21/2023

All of the source code for the conducted research can be found in BooleanCube/collatz-conjecture

Collatz Conjecture

The simple piecewise function, caused such large commotion and has gone unsolved for over 8 decades. The simple 3x + 1 problem was rumored to be a Soviet trap designed to slow down American mathematics and science during the space race between the two nations. It was proven ot be effective because even after 8 decades, mathematicians are still working towards the problem by writing scripts to test numbers and catch any edge cases which break the conjecture.

The piecewise function for the 3x+1 problem looks like this:

If x is even, f(x) = x/2, but if x is odd, f(x) = 3x+1. This function is infinitely recursed upon to form a sequence of numbers.

For Example:
f(3) = 10 ->
f(10) = 5 ->
f(5) = 16 ->
f(16) = 8 ->
f(8) = 4 ->
f(4) = 2 ->
f(2) = 1 ->
f(1) = 4 ->
...this then falls into a never-ending loop of 4 - 2 - 1

Definitions

Dropping Time / Delay = Amount of steps to reach 1 from n in the sequence.
Glide = Amount of steps to reach a number stricly less than n in the sequence.

Convergent Sequence = The sequences reaches 1 eventually.
Divergent Sequence = The sequence infinitely increases.
Cyclic Sequence = The sequence never reaches 1 nor is it increasing towards infinity.

Operations

& (Bitwise AND Operation) = Performs AND Operation over the binary expression of 2 base-10 integers. (101001 & 1 = 1)
>> (Bitwise Right Shift Operation) = Right shifts the binary expression of a base-10 integer. (10110 >> 2 = 101)
n & 1 => Determines whether n is even or odd. If n&1 = 0, n is even, but if n&1 = 1, n is odd.
n >> 1 => Basically divides n by 2. By removing the last digit in the binary expression, all bits were shifted down by 1 and divided the value of each bit by 2. This divides the value of the whole number by 2.

The main reason I used these operations rather than traditional and conventional operations was for faster runtime speeds.

Conjecture Verification

The Collatz conjecture states that the orbit of every number under function f eventually reaches 1. This has been proven to be true for the first 2^68 whole numbers after computational testing and verification. Despite being a large number, this covers almost nothing on the real number line and it is not sufficient to prove the conjecture entirely.

I have written some scripts for Collatz Conjecture Verification in Python3 and C++ to test some numbers on my own.

conjecture_verification.py and conjectureVerification.cpp is just optimized brute-forcing towards the problem with a little bit of Dynamic Programming involved since we use the dropping time/delay (amount of steps to reach 1 from n) of previous values to find the dropping time of our current value. For Example: Lets say D(n) = the dropping time for the value of n.
When we are calculating for D(4), we can find the next number in the sequence which is n=2. If D(2) has already been calculated, we can correctly say that D(4) is equal to D(2) + C, C being the amount of steps to get from n to that already calculated value.

convergence_verification.py uses a much more optimized algorithm which only verifies whether numbers are convergent. This removes the necessity to calculate each number's dropping time. The algorithm also uses sieves (sliding windows) to check smaller ranges over time and build a list of numbers with abnormally large glide values instead of checking the dropping times of all numbers in the sieve. With the threshold limit of 2^8, and a sieve size of 2^20, the convergence algorithm was able to successfully verifiy 2^22 numbers in 3.02 seconds with normal Python 3.10 compilers. Using this algorithm with C++ or PyPy3 compilers could reduce the runtime significantly.

Graphing Visualizations

benfords_law.py attempts to grab all the frequencies of the first digits in the step numbers and adds them to a histogram. For the first 100000 numbers, if you track the frequencies of the first digits of numbers in each step and draw a histogram we see a unique shape. 1 being the most frequent and 9 being the least frequent and there's an exponential curve in between. This curvature is more commonly known as Benford's Law and can be found in many use cases in our daily lives. Sorry for the weird formatting! I couldn't figure out how to fix it...

delay_graph.py graphs the relation between n (x-axis) and n's delay (y-axis). This relationship creates a weird graph which has no distinctive shape and we can't express their relation with just 1 simple math expression because of it's complexity.

glide_graph.py graphs the relation between n (x-axis) and n's glide (y-axis). This relationship shows a very frequent pattern of occuring between powers of 2. A glide of 1 shows up for every even number since it's first move returns a step that is lower than itself, so it occurs every 2^1. Similarly, a glide value of 3 shows up in a pattern of every 2^2. A glide value of 6 shows up in a pattern that occurs every 2^4 and this pattern continues on forever with the glide values.

Glide Patterns

We notice the pattern in glide_graph.py script but the steps in the glide seem almost random. It jumps from 1 to 3 to 6 to 8 to 11 and so on. So, I decided to create a table of every glide value of n (within a range of 0-10000) in order to find any patterns with the glide values. So, I recursed through all numbers in the range and added their glide values to a set (to remove duplicates). After adding all glide values to a set, I sorted the set and indexed every glide values from 1-10000. I found a noticeable pattern in the sorted set of glide values lying within the differences between each glide. The differences between each glide formed a 2-3-2-3-2-3-3-2-3-2-3-3 pattern which is also commonly known as a 12-layer octave pattern (found in piano keys). This proves that all glide values must be finite. For a second, you might think "the conjecture hasn't been proven because even though all glide values are finite the glide step could be divergent." But in fact, that is incorrect because if the number at the glide step is divergent, then the glide for that number is infinite which can be proven to be false. If this does actually prove that real numbers can't be divergent, there is still the possibility of a number being cyclic so this doesn't prove the collatz conjecture. The glide sequence has also been compiled down to this single function: f(n+1) = floor(1 + n + n*log(3)/log(2))

Calculator

If you want to test numbers of your own quickly, the collatz_calculator.py script is what you are looking for. You can calculate and measure information about any number (as large as you want) rapidly. Given a number through input, it will calculate the numbers, delay, glide, residue, strength, level, shape of its path, etc.
The number is most likely divergent if the calculator takes more than 3 seconds to measure all parameters for a number.

Terras Theorem

Statement: Let M be a positive integer. Let D(M) be the fraction of numbers < M that do not have finite stopping time. Then the limit of D(M) for M → ∞ is equal to 0.
Essentially, the Terras Theorem states that the Delay Record as n approaches infinity decreases towards 0. terras_theorem.py returns a number's glide value and the parity vector of its stepping sequence. You can then check these with the Delay Records Table that has already been compiled for large numbers and find the Delay Record.

Arithmetic Mean

arithmetic_mean.py generates all points for each step in the number sequence from n with the x-axis being the step count and the y-axis being the value of the number in the sequence at that step count.
For example, if n=4 the points generated would be: (1,4), (2,2), (3,1)
With these coordinates, we would like to find the average amount of decrease or increase between each step that is taken in the sequence to see if we can prove that every number is bound to decrease. To find the average amount of increase/decrease between each step, we draw a line of best fit for all coordinates in each step and then find the slope of that line. We have to find the slope of this line for multiple numbers to find the average over all values. For the first 10000 natural numbers, the average amount of decrease is calculated to be roughly about -0.13792251144898038

Odd Number Analysis Patterns

I was testing the odd numbers to see whether the majority of odd numbers when multiplied by 3 and added by 1 result in even numbers with more than one prime factors of 2. If an even number has more than one prime factor of 2, then it is divisible by 2 more than once making the number less. Let's name such even numbers as splitting numbers. As I was going through all odd numbers from 1, I found a pattern of numbers that produce splitting numbers when applied into the function. Starting from 1, every other odd number produces a splitting even number.
Sequence of odd numbers that produce splitting even numbers: 1, 5, 9, 13, 17, 19, 21, 25, 29, 33, 37...
1 goes to 4-2-1, 5 goes to 16-4-2-1, 9 goes to 28-14-7, but 7 goes to 22-11 (only divides once)
So, there are an even number of odd numbers that produce splitting even numbers and odd numbers that don't. However, some odd numbers produce very large splitting numbers (i.e. have multiple prime factors of 2). Powers of 2 are the biggest of splitting even numbers because they are built only of prime factors of 2 and will take you down to 1 in one clean sweep. This shows that some odd numbers produce large splitting numbers and this means the general trend is downwards providing strong evidence that it might always reach one. But this still doesn't prove that divergent shapes can't occur. "What if I keep finding odd numbers that don't produce splitting numbers?" That means the trend for that number is indeed upwards.

Install

Some scripts such as delay_graph.py, glide_graph.py and benfords_law.py use the matplotlib package for the graphing user interface which is very useful and simple to use.

To be able to use the matplotlib library with the python scripts, you have to install the package. This can be done through a package installer with the matplotlib package available like pip. Learn how to install a stable version of pip.
You can run either of the following commands to install the matplotlib package onto your virtual environment (venv):

                        
    $ pip install matplotlib
    $ python -m pip install matplotlib
    $ python3 -m pip install matplotlib

If you are using a Python version that comes with your Linux distribution, you can also install the matplotlib package via your distribution's package manager.


    $ sudo apt-get install python3-matplotlib  # Debian / Ubuntu
    $ sudo dnf install python3-matplotlib      # Fedora
    $ sudo yum install python3-matplotlib      # Red Hat
    $ sudo pacman -S python-matplotlib         # Arch Linux

References

https://www.ericr.nl/wondrous/
https://oeis.org/A122437
https://youtu.be/094y1Z2wpJg
https://youtu.be/i4OTNm7bRP8

Written by BooleanCube :]