The Prime Finding Machine

The Problem

Let's assume that you want to find a list of all the prime numbers below 100. How would you do that?

An elementary solution

One might first think to take every number below 100, and divide by all the numbers below it to see if it is prime or not.


2 is prime


3 ÷ 2 = 1.5, 3 is prime


4 ÷ 2 = 2, 4 is composite


5 ÷ 2 = 2.5

5 ÷ 3 = 1.6666...

5 ÷ 4 = 1.25, 5 is prime


6 ÷ 2 = 3, 6 is composite


7 ÷ 2 = 3.5

7 ÷ 3 = 2.3333...

7 ÷ 4 = 1.75

7 ÷ 5 = 1.4

7 ÷ 6 = 1.1666..., 7 is prime


8 ÷ 2 = 4, 8 is composite


9 ÷ 2 = 4.5

9 ÷ 3 = 3, 9 is composite

Optimizations

Clearly, this process is very slow, and generating all the primes below a very large number (for example, 2147483647), would be very difficult. How do we optimize this?

Well clearly, if a number has already been proven composite, we don't need to divide by that number when we are testing. This already eliminates many tests, and makes the process much faster:

2 is prime


3 ÷ 2 = 1.5, 3 is prime


4 ÷ 2 = 2, 4 is composite


5 ÷ 2 = 2.5

5 ÷ 3 = 1.6666...

4 is not prime, dividing by 4 is not needed, 5 is prime


6 ÷ 2 = 3, 6 is composite


7 ÷ 2 = 3.5

7 ÷ 3 = 2.3333...

7 ÷ 5 = 1.4, 7 is prime


8 ÷ 2 = 4, 8 is composite


9 ÷ 2 = 4.5

9 ÷ 3 = 3, 9 is composite

Take a look at the process for 7. It only took three steps, while in the previous process, it took five.


But we can do better.


Notice how if we divide a number by another number that is more than half of it (for example, 5 ÷ 3), we never get a whole number. This makes sense, because 2 is the smallest factor a number can have, meaning that the largest factor it can have is half of the original number. By doing that, we also save up on a few tests:


2 is prime


3 ÷ 2 = 1.5, 3 is prime


4 ÷ 2 = 2, 4 is composite


5 ÷ 2 = 2.5


6 ÷ 2 = 3, 6 is composite


7 ÷ 2 = 3.5

7 ÷ 3 = 2.3333..., 7 is prime


8 ÷ 2 = 4, 8 is composite


9 ÷ 2 = 4.5

9 ÷ 3 = 3, 9 is composite

Let's look at the process for 7 again. This time, we only have TWO tests, down from five at the start.

But how low can we go?

We actually only need to check up to the square root of the number. For example, the square root of 7 is approximately 2.6458. This means that we only need to check prime numbers up to that point, which in this case, is just TWO. In fact, for the first 100 numbers, we only need if the number divides primes up to 7, as the square root of 100 is 10, and the largest prime number less than 10 is 7.


2 is prime


3 is prime


4 ÷ 2 = 2, 4 is composite


5 ÷ 2 = 2.5


6 ÷ 2 = 3, 6 is composite


7 ÷ 2 = 3.5, 7 is prime


8 ÷ 2 = 4, 8 is composite


9 ÷ 2 = 4.5

9 ÷ 3 = 3, 9 is composite


Why do we only need to check up to the square root?

Let's assume we wanted to check if 119 was a prime number. Spoiler alert, it's not. The square root of 119 is about 10.9087, meaning that we only need to check up to the largest prime number less than that, or 7. Why is that?

Let's assume that 119, which I'm telling you isn't prime, can be written as 119 = pq, where p and q are positive integers. And let's assume we are doing the opposite of our claim, and assume that both p and q are greater than the square root of 119, meaning that we'd have to check beyond what I am claiming. But if both p and q are greater than the square root of 119 (let's suppose both p = 11 and q = 11, the smallest prime number larger than our square root), then pq must be greater than 119 (in this case, 121), which doesn't make sense because we know that it equals 119, not larger than 119. This means that you only need to check up to the square root of the number to determine whether or not it is prime.

Look at how much more efficient we made this. But there is one problem we will never solve.


Trial division is still extremely slow.

A completely new solution

Let's think of a completely new solution. Maybe that will allow us to make it faster.

Let's start with the list of numbers from 1 to 100, and indicate 1 is not a prime. In my case, I made it red.

Now, we will filter out all the composites a few at a time. Circle the smallest number that isn't red. In this case, that would be 2. That number is prime. Indicate that it is prime, such as making it green.

Now that we have established that 2 is prime, we can filter out all the multiples of 2 by adding 2 each time, and indicate that they are composite by making them red.

Now we will repeat the process. The smallest "empty number" is 3, so it is prime. We will make it green, and then filter out all the multiples of 3 by making them red. Notice how some multiples of 3, such as 6, have already been made red, but that is okay. If we already filtered it, it doesn't matter if we filtered it out again.

See how this is much, MUCH faster than manually doing trial division? Let's continue the process. The smallest "empty number" is 5, as 4 has already been filtered out. We will make that green, and get rid of the multiples of 5 as well.

We will continue this process until all the numbers have been made either green or red. This is what the list for 100 would look like when it's all completed.

In fact, just like trial division, we only need to check up to the square root of the number, which for 100, is 10. This means that we only needed to go up to 7 before we could say that all the composites have been filtered.

As you can see, this process is much MUCH faster than trial division and is therefore a superior method to finding primes.

This is called the Sieve of Eratosthenes, named after Eratosthenes of Cyrene, a mathematician, geographer, poet, astronomer, and music theorist. Imagine putting all of those professions on a résumé.

I really like it, because it is simple, possibly even simpler than trial division, yet it is much faster.

Yes, it is true that there are much faster algorithms to determine if a specific number is prime (For example, if we only wanted to determine if a very specific large number, such as 2147483647, was prime, the Sieve would actually be slower than trial division), but to find lists of prime numbers, the Sieve is extremely good at doing that.

Because 7 is the largest prime less than 10, once we finished filtering all the multiples of 7, we would have filtered out all the composites less than 100, meaning that any unfilled squares will automatically become green.

Putting the mathematics into code

A problem like this seems like the perfect candidate to turn into code, as computers are very good at doing tasks like this. I have the following Python code written for this process

N = int(input())

isPrime = [False, False]

for i in range(2, N+1):

  isPrime.append(True)

j = 2

while (j <= int(N**0.5)):

  if isPrime[j] == True:

    for k in range(2*j, N+1, j):

      isPrime[k] = False

  j+=1

for l in range(N):

  if isPrime[l] == True:

    print(l)

https://replit.com/@MatthewLiu23/Sieve 

You can also find this code on my code reference page.

To your right is the list of all the prime numbers less than 10 million. Originally, I had the numbers listed on this page, but not only did it make this article monstrously long, it also made my entire website so slow that it would take several minutes for pages to load. So instead, here is a text file, where you can download it instead.

It took my repl.it only a few seconds to find all the prime numbers that you see in the list. Pretty fast, isn't it!