File Extensions & Magic Numbers
When you see a file named document.pdf, the .pdf part is the File Extension. It tells you (and the operating system) what kind of data is inside.
However, file extensions are just labels. They are not the file itself.
The Extension Lie
Imagine you have a can of soup. If you peel off the label that says "Tomato Soup" and write "Dog Food" on it, the contents of the can do not change. It is still soup.
The same applies to files.
- Create a text file named hello.txt.
- Rename it to hello.png.
- Try to open it.
Your computer will try to launch an image viewer, because the operating system picks the program based on the extension. The viewer will most likely show an error such as "corrupted file" or "unsupported format." This is because the program expected image data (pixels), but found text data.
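The experiment above can also be run as a short script. This is a minimal sketch, assuming Python 3 and a temporary directory; the function name rename_keeps_bytes is made up for illustration. It shows that renaming changes only the label, not the bytes inside.

```python
import tempfile
from pathlib import Path


def rename_keeps_bytes() -> bool:
    """Rename hello.txt to hello.png and report whether the content is unchanged."""
    with tempfile.TemporaryDirectory() as tmp:
        txt = Path(tmp) / "hello.txt"
        txt.write_bytes(b"Hello, world!\n")
        before = txt.read_bytes()

        # Renaming only changes the name (the "label on the can").
        png = txt.with_suffix(".png")
        txt.rename(png)
        after = png.read_bytes()

        return before == after


print(rename_keeps_bytes())  # True: the file is still plain text
```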
The Truth: Magic Numbers
If extensions are unreliable, how do computers actually know what a file is? They use Magic Numbers (also called File Signatures).
The "Magic Number" is a specific sequence of bytes, usually at the very beginning of a file.
- PDF files start with: 25 50 44 46 (which spells %PDF in ASCII).
- PNG images always start with: 89 50 4E 47 (the byte 89 is not printable ASCII, so hex viewers often show it as a dot, followed by PNG).
- ZIP files usually start with: 50 4B (PK in ASCII, the initials of Phil Katz, the creator of ZIP).
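The signatures listed above can be checked with a few lines of code. The following is a minimal sketch, not a complete file-type detector: the SIGNATURES table and the sniff function are hypothetical names, and only the three formats mentioned here are covered.

```python
# Signatures written as the same hex bytes listed in the text.
SIGNATURES = {
    bytes.fromhex("25 50 44 46"): "pdf",  # b"%PDF"
    bytes.fromhex("89 50 4E 47"): "png",  # b"\x89PNG"
    bytes.fromhex("50 4B"): "zip",        # b"PK"
}


def sniff(data: bytes) -> str:
    """Guess a file type from its leading bytes, ignoring any file name."""
    for magic, kind in SIGNATURES.items():
        if data.startswith(magic):
            return kind
    return "unknown"


print(sniff(b"%PDF-1.7\n"))       # pdf
print(sniff(b"Hello, world!\n"))  # unknown
```

To check a real file, read only its first few bytes, for example with open(path, "rb").read(8), and pass them to sniff.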
How Validation Works
When you upload an image to a secure website, the server does not trust the extension. A hacker could rename virus.exe to photo.jpg.
Instead, the server reads the first few bytes of the file. If the file claims to be a JPG but its first bytes are MZ (4D 5A, the signature for Windows executables), the server rejects it immediately.
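A server-side check along these lines might look like the sketch below. It is an illustration under stated assumptions, not a complete upload filter: accept_upload and EXPECTED_MAGIC are hypothetical names, and the JPEG signature FF D8 FF is not listed in the text above but is the standard start of a JPEG file.

```python
from pathlib import PurePath

# Allowed extensions and the leading bytes each one must have.
EXPECTED_MAGIC = {
    ".jpg": b"\xff\xd8\xff",   # JPEG (assumption: not listed in the text above)
    ".jpeg": b"\xff\xd8\xff",
    ".png": b"\x89PNG",
    ".pdf": b"%PDF",
}


def accept_upload(filename: str, head: bytes) -> bool:
    """Accept a file only if its first bytes match what its extension claims."""
    ext = PurePath(filename).suffix.lower()
    magic = EXPECTED_MAGIC.get(ext)
    if magic is None:
        return False  # extension is not on the allow-list at all
    return head.startswith(magic)


# virus.exe renamed to photo.jpg still begins with "MZ", so it is rejected.
print(accept_upload("photo.jpg", b"MZ\x90\x00"))         # False
print(accept_upload("photo.jpg", b"\xff\xd8\xff\xe0"))   # True
```

A signature check like this only confirms how the file begins. It does not prove the rest of the file is harmless, so real systems usually combine it with other checks.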
MIME Types
On the internet (specifically in HTTP), file extensions matter less. When a web server sends a file to your browser, it sends a Content-Type header whose value is a MIME type (also called a media type).
- .html -> text/html
- .jpg -> image/jpeg
- .json -> application/json
This tells the browser how to render the data, regardless of what the URL ends with.
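As a rough illustration of the server side, the sketch below picks a Content-Type value for a response. It assumes a server that chooses the type from the extension of the file on disk, which is common but not required; CONTENT_TYPES and content_type_for are hypothetical names, and the application/octet-stream fallback is a conventional choice for unknown binary data.

```python
from pathlib import PurePosixPath

# The three mappings from the text above.
CONTENT_TYPES = {
    ".html": "text/html",
    ".jpg": "image/jpeg",
    ".json": "application/json",
}


def content_type_for(path: str) -> str:
    """Return the MIME type a server might send for this file path."""
    ext = PurePosixPath(path).suffix.lower()
    # Fall back to a generic binary type when the extension is unknown.
    return CONTENT_TYPES.get(ext, "application/octet-stream")


print("Content-Type:", content_type_for("/site/index.html"))  # Content-Type: text/html
print("Content-Type:", content_type_for("/files/download"))   # Content-Type: application/octet-stream
```

Python's standard library also ships a mimetypes module whose guess_type function does a similar lookup with a much larger table.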