<?xml version="1.0" encoding="utf-8"?><feed xmlns="http://www.w3.org/2005/Atom" ><generator uri="https://jekyllrb.com/" version="3.10.0">Jekyll</generator><link href="https://0xkrt26.github.io/math_behind_security/feed.xml" rel="self" type="application/atom+xml" /><link href="https://0xkrt26.github.io/math_behind_security/" rel="alternate" type="text/html" /><updated>2026-04-28T13:25:33+00:00</updated><id>https://0xkrt26.github.io/math_behind_security/feed.xml</id><title type="html">Math behind Security</title><subtitle>The mathematics behind why encryption works — and why it sometimes doesn&apos;t.</subtitle><author><name>0xkrt26</name></author><entry><title type="html">The Accidental Ancestor — How a Computer for Verifying Numbers Shaped Modern Hashing</title><link href="https://0xkrt26.github.io/math_behind_security/2026/04/28/the-accidental-ancestor-Luhn-algorithm.html" rel="alternate" type="text/html" title="The Accidental Ancestor — How a Computer for Verifying Numbers Shaped Modern Hashing" /><published>2026-04-28T00:00:00+00:00</published><updated>2026-04-28T00:00:00+00:00</updated><id>https://0xkrt26.github.io/math_behind_security/2026/04/28/the-accidental-ancestor-Luhn-algorithm</id><content type="html" xml:base="https://0xkrt26.github.io/math_behind_security/2026/04/28/the-accidental-ancestor-Luhn-algorithm.html"><![CDATA[<p>In 1954, Hans Peter Luhn filed a US patent application for a Computer for Verifying Numbers. It is one of the earliest examples of using a mathematical transformation to verify data integrity, a concept that later became a foundation of modern hashing. Today you can find it under the names Luhn Algorithm, Luhn Formula or Modulus 10 Algorithm.</p>

<h3 id="how-does-this-algorithm-work">How does this algorithm work?</h3>

<p>Imagine you have a number. Let’s take</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>3846205
</code></pre></div></div>
<h3 id="step-1">Step 1.</h3>

<p>Find a substitution digit.</p>

<p>To do that, multiply the original digit by two. If the result is 10 or greater, add its digits together. The substitution for the first digit of our example would be</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>3*2 = 6,
</code></pre></div></div>
<p>but if the original digit was 7, its substitution would be</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>7*2 = 14 = 1+4 = 5
</code></pre></div></div>
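<p>The substitution rule can be sketched in a few lines of Python (a helper of my own, not code from the patent). For a single digit d, summing the digits of 2*d is the same as computing 2*d - 9 whenever the double reaches 10:</p>

```python
def substitute(d):
    # Double the digit; if the result has two digits, add them together.
    # For a single digit d, that digit sum equals 2*d - 9 when 2*d >= 10.
    doubled = 2 * d
    return doubled if doubled < 10 else doubled - 9

print(substitute(3))  # 6
print(substitute(7))  # 5
```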
<h3 id="step-2">Step 2.</h3>

<p>Replace every digit on an odd position from the left by its substitution.</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>3846205 -&gt; 6886401
</code></pre></div></div>
<h3 id="step-3">Step 3.</h3>

<p>Determine a check digit.</p>

<p>To do that, add up all the digits of the number from step 2, take the sum modulo 10, and subtract the result from 10. That’s your check digit. (If the sum is already divisible by 10, the check digit is 0.)</p>

<p>In our example, the check digit is 7:</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>6+8+8+6+4+0+1 = 33 mod 10 = 3
10-3 = 7
</code></pre></div></div>
<p><em>In the original patent, Luhn performs the modulo 10 operation each time an addition happens:</em></p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>(6+8) mod 10 = 4
(4+8) mod 10 = 2
etc
</code></pre></div></div>
<p><em>However, this gives the same result.</em></p>

<h3 id="step-4">Step 4.</h3>

<p>Append this check digit to the original number.</p>

<p>The check digit is appended in its original, unsubstituted form, so it lands on an even position and is never doubled during verification. In our example, just like in the original device, it goes in the first position from the right, i.e. the 8th position. (Appending it to the substituted number from step 2 instead would break verification, since those digits would then be substituted a second time.)</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>3846205 -&gt; 38462057
</code></pre></div></div>
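<p>Putting the four steps together in Python (a sketch of my own, not code from the patent; note that the check digit is appended to the original, unsubstituted number, which is the form the verification procedure below expects):</p>

```python
def substitute(d):
    # Double the digit; if the result is 10 or more, add its digits (= subtract 9).
    doubled = 2 * d
    return doubled if doubled < 10 else doubled - 9

def add_check_digit(number):
    digits = [int(ch) for ch in str(number)]
    # Step 2: substitute every digit in an odd position (1st, 3rd, ...) from the left.
    subbed = [substitute(d) if i % 2 == 0 else d for i, d in enumerate(digits)]
    # Step 3: the check digit brings the substituted sum up to a multiple of 10.
    check = (10 - sum(subbed) % 10) % 10
    # Step 4: append the check digit, unsubstituted, to the original number.
    return int(str(number) + str(check))

print(add_check_digit(3846205))  # 38462057
```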
<h3 id="and-how-does-verifying-work">And how does verifying work?</h3>

<p>To verify a number, perform step 2 on the full number, check digit included. Add up all the digits and take the sum modulo 10. The result should be zero.</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>Original number with a check digit: 75689034
With substitutions: 55389064

Verification: 5+5+3+8+9+0+6+4 = 40 mod 10 = 0
</code></pre></div></div>
<p>Otherwise the number is invalid. For example, if instead of the 6 in the original number we accidentally type a 7, verification gives:</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code> 5+5+5+8+9+0+6+4 = 42 mod 10 = 2
</code></pre></div></div>
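<p>The verification procedure can be sketched the same way (again, helper names are my own):</p>

```python
def substitute(d):
    # Double the digit; if the result is 10 or more, add its digits (= subtract 9).
    doubled = 2 * d
    return doubled if doubled < 10 else doubled - 9

def is_valid(number):
    digits = [int(ch) for ch in str(number)]
    # Step 2 again: positions are counted from the left, as in the post, so for
    # these even-length examples the check digit sits on an even position and
    # is left untouched.
    subbed = [substitute(d) if i % 2 == 0 else d for i, d in enumerate(digits)]
    return sum(subbed) % 10 == 0

print(is_valid(75689034))  # True
print(is_valid(75789034))  # False (the 6 mistyped as 7)
```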
<h3 id="does-it-really-work-for-all-the-numbers">Does it really work for all the numbers?</h3>

<p>Almost. The Luhn Algorithm catches all single-digit errors. Let’s take a look at this table, which shows all 10 possible digits and their substitutions:</p>

<table>
  <thead>
    <tr>
      <th>Original digit</th>
      <th>0</th>
      <th>1</th>
      <th>2</th>
      <th>3</th>
      <th>4</th>
      <th>5</th>
      <th>6</th>
      <th>7</th>
      <th>8</th>
      <th>9</th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <td>Substitution</td>
      <td>0</td>
      <td>2</td>
      <td>4</td>
      <td>6</td>
      <td>8</td>
      <td>1</td>
      <td>3</td>
      <td>5</td>
      <td>7</td>
      <td>9</td>
    </tr>
  </tbody>
</table>

<p>We can see that the second row is just a permutation of the first one. This means every original digit gets its own unique substitution, and therefore any single mistyped digit results in a changed checksum. For example:</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>Original digits: 2 6
Substitutions: 4 6
Original sum: 10

Mistyped digits: 3 6
Substitutions: 6 6
Mistyped sum: 12

</code></pre></div></div>
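<p>The permutation property is easy to confirm in Python (a one-off check of my own):</p>

```python
def substitute(d):
    # Double the digit; if the result is 10 or more, add its digits (= subtract 9).
    doubled = 2 * d
    return doubled if doubled < 10 else doubled - 9

# The ten substitutions are exactly the digits 0-9 in a different order,
# i.e. the substitution table is a permutation of the digits.
print(sorted(substitute(d) for d in range(10)))  # [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
```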
<p>The Luhn Algorithm also catches all transpositions of neighboring digits, except swapping 09 and 90: as the table above shows, 0 and 9 are their own substitutions, so exchanging them leaves the sum unchanged.</p>

<h3 id="so-is-it-a-cryptographically-secured-hash-function">So is it a cryptographically secure hash function?</h3>

<p>No. The Luhn Algorithm was created to protect against accidental errors, not malicious attacks. For example, it misses many two-digit errors, because they can leave the sum unchanged:</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>Original digits: 2 6
Substitutions: 4 6
Original sum: 10

Malicious digits: 6 7
Substitutions: 3 7
Malicious sum: 10
</code></pre></div></div>
<p>That’s why nowadays, numbers that require verification, like your credit card number or your ID number, rely on additional layers of security or on stronger algorithms.</p>

<h3 id="further-notes">Further notes</h3>

<p>A year earlier, in 1953, Luhn introduced the concept that would later serve as a foundation for hash tables. In an internal IBM memo, he described the idea of using math to organize data into searchable buckets, which is essentially what we call a hash table nowadays.</p>

<p>As an internal IBM memo, it was never made public, so all the information comes from secondary sources such as this <a href="https://spectrum.ieee.org/hans-peter-luhn-and-the-birth-of-the-hashing-algorithm">IEEE Spectrum article</a>. Rather than repeat it here, I’d recommend reading it directly; it explains the idea with a clear telephone-number database example.</p>

<p>The mathematics behind Luhn’s concept for information storage had to be refined later to improve the transformation, guarantee an even distribution, and minimize collisions. The <a href="https://0xkrt26.github.io/math_behind_security/2026/04/25/searching-smarter-rolling-hashes.html">Rabin-Karp algorithm</a> that I’ve already written about provides some solutions to these problems.</p>

<h3 id="my-sources-and-further-readings">My sources and further readings:</h3>
<p><a href="https://spectrum.ieee.org/hans-peter-luhn-and-the-birth-of-the-hashing-algorithm">IEEE Article about Hans Peter Luhn</a> 
<a href="https://docs.google.com/viewer?url=patentimages.storage.googleapis.com/pdfs/US2950048.pdf">Luhn’s Patent on a Computer for Verifying Numbers</a>
<a href="https://fermatslibrary.com/s/a-new-method-of-recording-and-searching-information">Luhn “A new method of recording and searching information”</a></p>]]></content><author><name>0xkrt26</name></author><summary type="html"><![CDATA[In 1954, Hans Peter Luhn filed a US patent application for a Computer for Verifying Numbers. It is one of the earliest examples of using a mathematical transformation to verify data integrity, a concept that later became a foundation of modern hashing. Today you can find it under the names Luhn Algorithm, Luhn Formula or Modulus 10 Algorithm.]]></summary></entry><entry><title type="html">Searching smarter - Plagiarism Detection with Rolling Hashes</title><link href="https://0xkrt26.github.io/math_behind_security/2026/04/25/searching-smarter-rolling-hashes.html" rel="alternate" type="text/html" title="Searching smarter - Plagiarism Detection with Rolling Hashes" /><published>2026-04-25T00:00:00+00:00</published><updated>2026-04-25T00:00:00+00:00</updated><id>https://0xkrt26.github.io/math_behind_security/2026/04/25/searching-smarter-rolling-hashes</id><content type="html" xml:base="https://0xkrt26.github.io/math_behind_security/2026/04/25/searching-smarter-rolling-hashes.html"><![CDATA[<p>In 1987, Michael O. Rabin and Richard M. Karp introduced rolling hashes and polynomial hashing — concepts that reappear throughout modern cryptography and are widely used for plagiarism detection.</p>

<h3 id="how-does-it-work">How does it work?</h3>

<p>This is what polynomial hash looks like:</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>H = (c1*b^(n-1) + c2*b^(n-2) + ... + cn*b^(0)) mod p,
	where cn = character n of the plaintext
	n = number of characters in the plaintext
	p = suitable prime number
	b = base of the code system (256 etc)
</code></pre></div></div>
<p><em>Note: if we drop the mod p, we’re left with the same algorithm we use to convert numbers between bases, for example from binary to decimal (for binary, b = 2).</em></p>
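<p>As a quick illustration (a sketch of mine; the function name and the small prime p = 101 are arbitrary choices, not from the original paper), here is the formula in Python, evaluated with Horner’s rule:</p>

```python
def poly_hash(s, b=256, p=101):
    # H = c1*b^(n-1) + c2*b^(n-2) + ... + cn*b^0 (mod p),
    # computed with Horner's rule: one multiply-and-add per character.
    h = 0
    for ch in s:
        h = (h * b + ord(ch)) % p
    return h

# Same value as expanding the polynomial directly:
print(poly_hash("abr"))                                   # 4
print((ord('a')*256**2 + ord('b')*256 + ord('r')) % 101)  # 4
```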

<h3 id="but-why-calculate-hash-function-when-we-can-just-straight-ahead-compare-characters">But why calculate a hash at all when we can just compare characters directly?</h3>

<p>That is a legitimate way to compare strings, called naive string comparison, but I wouldn’t recommend it for large texts: it takes too long and isn’t as elegant as the Rabin-Karp algorithm. See for yourself: imagine you have a string of 17 letters, for example “abbrabraarbababra”, and you need to write a program that counts how many times the 3-letter combination “abr” occurs in it. With naive string comparison you’ll probably write something like this:</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>text = "abbrabraarbababra"
pattern = "abr"

for each position i from 0 to length(text) - 3:
    if text[i] == 'a':
        if text[i+1] == 'b':
            if text[i+2] == 'r':
                print "found at position", i
</code></pre></div></div>
<p>This algorithm checks every single character, some of them even multiple times.</p>

<p>The Rabin-Karp algorithm, on the other hand, uses what’s called a rolling hash function, which makes the search much faster and more efficient than naive comparison.</p>

<h3 id="what-makes-rolling-hash-function-faster">What makes the rolling hash faster?</h3>

<p>It works pretty simply. Instead of checking every letter in a triple, you treat the triple as a whole. Imagine you’re a detective looking for a criminal (specific face) among suspects. Naive comparison is like checking every facial feature of every suspect one by one: first eyes, then nose, then mouth. Using a hash is like comparing the complete face at once.</p>

<p>And we don’t even have to calculate the hash for each triple from scratch! The rolling hash derives the next hash value from the previous one using this formula:</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>Hnew = (b*(Hold - c1*b^(n-1)) + cnew) mod p,
</code></pre></div></div>
<p>which removes the first character of the window and appends the next one.</p>
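<p>Putting both formulas together, here is a minimal Rabin-Karp sketch (my own code and variable names, not from the paper) that finds every “abr” in the earlier example string:</p>

```python
def rabin_karp(text, pattern, b=256, p=101):
    n, m = len(text), len(pattern)
    if m > n:
        return []
    bpow = pow(b, m - 1, p)  # weight of the character being dropped from the window
    target = window = 0
    for i in range(m):       # initial hashes of the pattern and the first window
        target = (target * b + ord(pattern[i])) % p
        window = (window * b + ord(text[i])) % p
    matches = []
    for i in range(n - m + 1):
        # On a hash match, compare the actual strings to rule out a collision.
        if window == target and text[i:i + m] == pattern:
            matches.append(i)
        if i + m < n:        # roll: drop text[i], append text[i+m]
            window = (b * (window - ord(text[i]) * bpow) + ord(text[i + m])) % p
    return matches

print(rabin_karp("abbrabraarbababra", "abr"))  # [4, 13]
```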

<h3 id="is-rabin-karp-algorithm-always-efficient">Is Rabin-Karp algorithm always efficient?</h3>

<p>Unfortunately, no. But the good news: even in the worst-case scenario, where all calculated hashes are identical, the efficiency is the same as that of naive comparison. So it’s definitely worth trying.</p>

<h3 id="wait-why-would-hashes-be-identical-arent-they-all-unique">Wait, why would hashes be identical? Aren’t they all unique?</h3>

<p>While it’s our goal to make hashes unique and do our best to avoid collisions, poorly chosen parameters quickly run into the pigeonhole principle: if there are more possible strings than hash values, some strings must share a value, i.e. collide (you can read more about it <a href="https://math.mit.edu/~fgotti/docs/Courses/C.%20Combinatorial%20Analysis/1.%20Pigeonhole%20Principle/Pigeonhole%20Principle.pdf">here</a>). That’s why, even after two hash values match, the Rabin-Karp algorithm still compares the strings themselves to rule out a collision.</p>

<h3 id="but-why-do-collisions-happen">But why do collisions happen?</h3>

<p>To answer this question, let’s take a look at the polynomial hash again. We have two variables that are chosen independently of the plaintext: b and p.</p>

<h3 id="oh-so-collisions-happen-because-b-is-too-small">Oh, so collisions happen because b is too small?</h3>

<p>Not exactly. The number of collisions (and preferably their absence) depends primarily on the chosen value of p: the larger p is, the fewer collisions we’ll get. Why? Let’s imagine:</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>p = 7,

then 1 mod 7 = 1
2 mod 7 = 2
3 mod 7 = 3
...
6 mod 7 = 6
7 mod 7 = 0
	
and...
all over again: 8 mod 7 = 1
9 mod 7 = 2
etc.
</code></pre></div></div>
<p>which gives us only seven unique values. If we took p = 23 instead, we would have 23 unique values, and so on. The bigger the prime we take, the less likely repetitions become. But keep in mind that p shouldn’t be too big either, otherwise it can lead to performance issues.</p>

<h3 id="why-does-p-have-to-be-a-prime-number">Why does p have to be a prime number?</h3>

<p>It doesn’t have to be, but a prime again reduces collisions. Imagine p as the number of baskets we can put our numbers in. If we take p = 12, we have 12 baskets, and if we do the same modulo operations we did for p = 7 in the previous example, we at first end up with the same kind of distribution. But in real life, numbers usually don’t arrive as a perfectly sorted row. Worst-case scenario: we get only numbers divisible by 3. So instead of:</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>1, 2, 3, 4, 5, 6... 11, 12, etc,
</code></pre></div></div>
<p>we get:</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>3, 6, 9, 12, 15, 18, etc.
</code></pre></div></div>
<p>That leads to an uneven distribution, where baskets 0, 3, 6 and 9 are full while all the other baskets stay completely empty. A prime modulus shares no divisors with the inputs, which prevents this worst case and keeps the distribution even.</p>
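<p>A two-line Python demo of my own (with arbitrary sample numbers) makes the difference visible:</p>

```python
multiples_of_3 = [3 * k for k in range(1, 100)]

# Composite modulus sharing a factor with the inputs: only 4 of 12 buckets are hit.
print(sorted({x % 12 for x in multiples_of_3}))  # [0, 3, 6, 9]

# Prime modulus: all 11 buckets get used, so the distribution stays even.
print(len({x % 11 for x in multiples_of_3}))     # 11
```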

<p><em>If you are looking for illustrations and more detailed but simple explanation, you can find them <a href="https://cs.stackexchange.com/a/157627">here</a></em></p>

<h3 id="so-if-the-number-of-collisions-is-defined-by-p-why-would-we-need-to-use-b-why-not-just-adding-character-values-and-mod-them">So if the number of collisions is defined by p, why do we need b at all? Why not just add the character values and mod them?</h3>

<p>The problem is that different strings can produce the same sum. Take this example (instead of ASCII values we’ll just use each letter’s position in the alphabet):</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>be = 2+5 = 7
eb = 5+2 = 7, 
just like cd = 3+4 = 7 
</code></pre></div></div>
<p>The powers of b give each position its own weight, which makes such repetitions far less likely.</p>
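<p>A tiny demo of my own (alphabet positions as above; b = 10 chosen purely for readability):</p>

```python
def weighted_value(s, b=10):
    # Each letter's alphabet position, weighted by its place in the string.
    h = 0
    for ch in s:
        h = h * b + (ord(ch) - ord('a') + 1)
    return h

# Plain sums collide (all three strings sum to 7); weighted values do not.
print(weighted_value("be"), weighted_value("eb"), weighted_value("cd"))  # 25 52 34
```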

<h3 id="but-what-size-for-b-should-we-choose">But what size should we choose for b?</h3>

<p>That depends on the alphabet you’re using. Usually b = 256, which covers all ASCII characters, but it can be smaller or bigger. Just make sure it covers every character your alphabet contains, otherwise that too will lead to collisions. But be careful: if b is too big, it can cause overflow.</p>

<h3 id="and-how-exactly-is-rabin-karp-method-used-in-modern-cryptography">And how exactly is the Rabin-Karp method used in modern cryptography?</h3>

<p>Even though the Rabin-Karp algorithm is nowadays used primarily for spell checking and bioinformatics, like other string-matching algorithms it can also be used in network intrusion detection systems, to inspect the data packets traveling through a network, or in digital forensics, to find evidence within digital data. Rabin-Karp introduced the concept of polynomial hashing; cryptographic hashing, however, required a fundamentally different construction that we’ll talk about next time.</p>

<p><em>P.S. This post was partially inspired by <a href="https://guptadeepak.com/the-evolution-of-hashing-algorithms-from-md5-to-modern-day/">The Evolution of Hashing Algorithms</a>. It credits Rabin-Karp as the origin of hashing history, which, as I found out after deeper research, isn’t quite accurate. Hashing goes back further than 1987, but that’s a story for the next post.</em></p>

<h3 id="my-sources-and-further-readings">My sources and further readings:</h3>
<p><a href="https://people.csail.mit.edu/alinush/6.006-spring-2014/rec06-rabin-karp-spring2011.pdf">MIT Rabin-Karp Algorithm</a>
<a href="https://math.mit.edu/~fgotti/docs/Courses/C.%20Combinatorial%20Analysis/1.%20Pigeonhole%20Principle/Pigeonhole%20Principle.pdf">MIT Pigeonhole</a>
<a href="https://cacm.acm.org/news/in-memoriam-michael-o-rabin/">Michael O. Rabin Biography</a>
<a href="https://www.researchgate.net/publication/332773245_Exact_String_Matching_Algorithms_Survey_Issues_and_Future_Research_Directions">Research about String Matching Algorithms</a>
<a href="https://academic.oup.com/bioinformaticsadvances/article/3/1/vbad162/7413189">Oxford Rabin-Karp in Bioinformatics</a>
<a href="https://www.geeksforgeeks.org/dsa/rabin-karp-algorithm-for-pattern-searching/">More about Rabin-Karp Algorithm</a>
<a href="https://www.geeksforgeeks.org/dsa/hash-functions-and-list-types-of-hash-functions/">More about Hash Functions</a></p>]]></content><author><name>0xkrt26</name></author><summary type="html"><![CDATA[In 1987, Michael O. Rabin and Richard M. Karp introduced rolling hashes and polynomial hashing — concepts that reappear throughout modern cryptography and are widely used for plagiarism detection.]]></summary></entry><entry><title type="html">Short intro</title><link href="https://0xkrt26.github.io/math_behind_security/2026/04/22/intro.html" rel="alternate" type="text/html" title="Short intro" /><published>2026-04-22T00:00:00+00:00</published><updated>2026-04-22T00:00:00+00:00</updated><id>https://0xkrt26.github.io/math_behind_security/2026/04/22/intro</id><content type="html" xml:base="https://0xkrt26.github.io/math_behind_security/2026/04/22/intro.html"><![CDATA[<p>How is my data secured? How was it kept safe throughout history? Can I decrypt it without a key?</p>

<p>At school these questions wouldn’t leave me alone, so I sat down and wrote a program to break the Vigenère cipher (mostly by combining pieces of code I found that were supposed to do it). Later, those questions led me to apply to the Technical University of Munich (TUM) to study Computer Science. And eventually, they’re what helped me realize I belong in Mathematics instead.</p>

<p>So this blog is a side project for my TUM Math application. It goes like this: I pick a topic from the intersection of mathematics and cybersecurity, research it properly, and then break it down here the way I used to explain math to my classmates before our Math Abi. No unnecessary complexity, just the idea itself, as clearly as I can.</p>

<p>If you’ve ever wondered why encryption actually works (or how it gets broken) you’re in the right place.</p>]]></content><author><name>0xkrt26</name></author><summary type="html"><![CDATA[How is my data secured? How was it kept safe throughout history? Can I decrypt it without a key?]]></summary></entry></feed>