What is a Hash Collision?
A hash function turns data of any size into a fixed-size string (a "fingerprint").
"Hello" -> Hash() -> a1b2c3
"World" -> Hash() -> d4e5f6
The Collision
Since the output size is fixed (e.g., 128 bits), there is a limited number of possible hashes (2^128 for a 128-bit hash).
Since there is an infinite number of possible inputs (every possible book, image, and video), at least two different inputs must produce the same hash. This is the pigeonhole principle: with more inputs than outputs, some output has to be shared.
"Cat" -> a1b2c3
"Dog" -> a1b2c3 <-- COLLISION!
Why is this bad?
- Security: If I can modify a virus file so it has the same hash as a safe Windows file, any tool that trusts files by their hash (such as an antivirus allowlist) will treat the virus as the safe file.
- Data Storage: In a hash map (dictionary), colliding keys land in the same bucket, so retrieval slows down because the computer has to compare the stored keys one by one to find the right entry.
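The data storage case can be sketched with a toy hash map that uses separate chaining (one list per bucket). The class name ChainedMap and the bucket count of 4 are assumptions for illustration. Python's built-in dict resolves collisions differently, so this is not how dict works internally.

```python
class ChainedMap:
    """Toy hash map using separate chaining (a sketch, not production code)."""

    def __init__(self, bucket_count: int = 4) -> None:
        self.buckets = [[] for _ in range(bucket_count)]

    def _bucket(self, key):
        # Keys whose hashes give the same remainder share a bucket.
        return self.buckets[hash(key) % len(self.buckets)]

    def put(self, key, value) -> None:
        bucket = self._bucket(key)
        for i, (existing_key, _) in enumerate(bucket):
            if existing_key == key:
                bucket[i] = (key, value)  # overwrite an existing entry
                return
        bucket.append((key, value))

    def get(self, key):
        # Every entry in the bucket may need a key comparison.
        for existing_key, value in self._bucket(key):
            if existing_key == key:
                return value
        raise KeyError(key)

m = ChainedMap(bucket_count=4)
for n in (1, 5, 9):  # small ints hash to themselves, so all three land in bucket 1
    m.put(n, f"value-{n}")
print(m.get(9))  # found only after comparing against keys 1 and 5
```

The more keys pile into one bucket, the closer a lookup gets to scanning a plain list.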
Avoiding Collisions
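We cannot eliminate them, but we can make them computationally infeasible to find by using larger, well-designed hashes (e.g., SHA-256 instead of MD5, which is only 128 bits and has known practical collision attacks).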
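To get a feel for why size matters, the sketch below uses the standard birthday approximation: for an ideal b-bit hash, about sqrt(2 * ln 2 * 2^b) random hashes are needed for a 50% chance of any collision. The function name is made up, and the figures assume brute force against an ideal hash. Real attacks on MD5 exploit design weaknesses and need far less work than this estimate suggests.

```python
import math

def hashes_for_even_odds(bits: int) -> float:
    """Approximate number of random hashes for a 50% chance of a collision."""
    # Birthday approximation for an ideal hash with 2**bits possible outputs.
    return math.sqrt(2 * math.log(2) * 2.0 ** bits)

for name, bits in (("MD5", 128), ("SHA-256", 256)):
    print(f"{name} ({bits} bits): about {hashes_for_even_odds(bits):.2e} hashes")
```

Under these assumptions the estimate is on the order of 10^19 hashes for a 128-bit output and 10^38 for a 256-bit output, so doubling the bit length squares the brute-force effort.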