Data Structures | 7 min read | Last updated: April 5, 2024

What is a Hash Collision?

A hash function turns data of any size into a fixed-size value, often called a hash, digest, or fingerprint.

  • "Hello" -> Hash() -> a1b2c3
  • "World" -> Hash() -> d4e5f6
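
To make the idea concrete, here is a minimal sketch using Python's standard hashlib module. The helper name fingerprint is made up for this example, and SHA-256 is just one possible choice of hash function. It shows that a 5-byte input and a 500,000-byte input both produce a digest of the same length.

```python
import hashlib

def fingerprint(data: bytes) -> str:
    """Return the SHA-256 digest of `data` as a hex string (hypothetical helper)."""
    return hashlib.sha256(data).hexdigest()

short = fingerprint(b"Hello")            # 5 bytes of input
long = fingerprint(b"Hello" * 100_000)   # 500,000 bytes of input

# Both digests are 64 hex characters (256 bits), whatever the input size.
print(len(short), len(long))
```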

The Collision

Since the output has a fixed size (e.g., 128 bits), there is a limited number of possible hashes (2^128 for a 128-bit hash).
Since there is an unlimited number of possible inputs (every possible book, image, and video), at least two different inputs must produce the same hash. This is known as the pigeonhole principle.

"Cat" -> a1b2c3
"Dog" -> a1b2c3 <-- COLLISION!

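Real hash functions have far too many possible outputs to run into a collision by accident, but the effect is easy to see with a deliberately tiny hash. The sketch below is an illustration only: tiny_hash and find_collision are hypothetical helpers that keep just 16 bits of a SHA-256 digest, leaving 65,536 possible outputs, and then hash made-up strings until two of them match.

```python
import hashlib
from itertools import count

def tiny_hash(text: str, bits: int = 16) -> int:
    """Keep only the top `bits` bits of SHA-256: a deliberately weak toy hash."""
    digest = hashlib.sha256(text.encode("utf-8")).digest()
    return int.from_bytes(digest, "big") >> (256 - bits)

def find_collision(bits: int = 16) -> tuple[str, str]:
    """Hash "input-0", "input-1", ... until two different strings share a hash."""
    seen: dict[int, str] = {}
    for i in count():
        text = f"input-{i}"
        h = tiny_hash(text, bits)
        if h in seen:
            # Two different inputs, one output: a collision.
            return seen[h], text
        seen[h] = text

a, b = find_collision()
print(a, b, tiny_hash(a) == tiny_hash(b))
```

With only 65,536 possible outputs, the loop cannot run past 65,537 inputs without finding a match, and in practice it stops much sooner.
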
Why is this bad?

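  1. Security: If an attacker can craft a malicious file with the same hash as a trusted file (for example, a legitimate Windows system file), any check that identifies files by hash alone, such as an allowlist, will accept the malicious file as the trusted one.
  2. Data Storage: In a hash map (dictionary), keys that collide land in the same bucket, so a lookup has to compare the stored keys one by one to find the right entry. Many collisions turn a fast lookup into a slow scan.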
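
The data storage cost can be sketched with a toy hash map. ChainedMap below is a made-up class for illustration, not how any particular language implements its dictionaries. It resolves collisions by keeping a list of entries per bucket (separate chaining) and counts how many key comparisons a lookup needs.

```python
class ChainedMap:
    """Toy hash map using separate chaining (a list of key/value pairs per bucket)."""

    def __init__(self, buckets: int = 8, hash_fn=hash):
        self._buckets = [[] for _ in range(buckets)]
        self._hash_fn = hash_fn

    def _bucket(self, key):
        return self._buckets[self._hash_fn(key) % len(self._buckets)]

    def put(self, key, value):
        bucket = self._bucket(key)
        for i, (k, _) in enumerate(bucket):
            if k == key:                 # key already stored: overwrite it
                bucket[i] = (key, value)
                return
        bucket.append((key, value))

    def get(self, key):
        """Return (value, comparisons) so the cost of collisions is visible."""
        comparisons = 0
        for k, v in self._bucket(key):
            comparisons += 1
            if k == key:
                return v, comparisons
        raise KeyError(key)

# Worst case: a hash function that sends every key to the same bucket.
colliding = ChainedMap(hash_fn=lambda key: 0)
# Best case for these keys: 128 buckets and an identity hash, one key per bucket.
spread = ChainedMap(buckets=128, hash_fn=lambda key: key)

for n in range(100):
    colliding.put(n, n * n)
    spread.put(n, n * n)

print(colliding.get(99))  # (9801, 100): scanned all 100 entries
print(spread.get(99))     # (9801, 1): found on the first comparison
```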

Avoiding Collisions

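We cannot eliminate them, but we can make them computationally infeasible to find by using hash functions with larger outputs and no known weaknesses (for example, SHA-256 instead of MD5, for which practical collision attacks exist).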
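
To get a rough feel for why output size matters, the sketch below uses the standard birthday approximation: for an ideal hash with N possible outputs, a collision becomes about 50% likely after roughly sqrt(2 * N * ln 2) random inputs. The function name birthday_attempts is made up for this example, and the numbers assume an ideal hash. Real attacks on MD5 are far cheaper than this estimate because they exploit weaknesses in its design, not just its size.

```python
import math

def birthday_attempts(bits: int) -> float:
    """Approximate number of random inputs hashed before a collision is ~50% likely,
    assuming an ideal hash with 2**bits equally likely outputs."""
    return math.sqrt(2 * math.log(2)) * 2.0 ** (bits / 2)

for name, bits in [("16-bit toy hash", 16), ("MD5-sized (128 bits)", 128), ("SHA-256 (256 bits)", 256)]:
    print(f"{name}: about {birthday_attempts(bits):.3g} attempts")
```

Doubling the output size from 128 to 256 bits does not double the work. It multiplies the estimate by 2^64.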