Text & Encodings | 5 min read | Last updated: March 29, 2024

Character Sets vs. Fonts

People often confuse "Unicode" (a standard) with "Arial" (a font). They do completely different jobs.

Character Set (The Number)

A Character Set (like Unicode or ASCII) is a database of mappings.
It maps an abstract idea to a number.

  • Idea: "Latin Capital Letter A"
  • Number: 65 (U+0041)

The Character Set does not know what "A" looks like. It doesn't know about serifs, bold weight, or pixels. It just knows that 65 = A.
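To make the "idea to number" mapping concrete, here is a small sketch using only Python's standard library. The variable names are illustrative; `ord`, `chr`, and `unicodedata.name` are built-in ways to look up a character's number and its official Unicode name. Nothing here involves a font.

```python
import unicodedata

letter = "A"

# Character -> number (the code point)
code_point = ord(letter)
print(code_point)                  # 65
print(f"U+{code_point:04X}")       # U+0041

# Character -> the abstract idea (its official Unicode name)
print(unicodedata.name(letter))    # LATIN CAPITAL LETTER A

# Number -> character
print(chr(65))                     # A
```

Notice that none of these calls can tell you whether the "A" is bold, serif, or 12 pixels tall. That information lives elsewhere.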

Font (The Drawing)

A Font (like Helvetica or Times New Roman) is a database of drawings (glyphs).
It maps a number to a shape (a glyph), usually stored as a vector outline.

  • Input: 65
  • Output: "Draw a diagonal line up, a diagonal line down, and a horizontal bar."
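As a rough mental model, a font can be pictured as a lookup table from numbers to drawing instructions. The sketch below is a toy, not how font files are actually stored: `TOY_SANS`, `TOY_SERIF`, and `glyph_for` are hypothetical names invented for illustration. It shows that two fonts can answer the same number with different drawings.

```python
# Toy model only: real fonts store outlines in binary tables, not Python lists.
# TOY_SANS and TOY_SERIF are hypothetical fonts made up for this example.
TOY_SANS = {
    65: ["diagonal line up", "diagonal line down", "horizontal bar"],
}
TOY_SERIF = {
    65: ["diagonal line up", "diagonal line down", "horizontal bar",
         "serif at left foot", "serif at right foot"],
}

def glyph_for(font, code_point):
    """Return the drawing instructions for code_point, or None if the font has none."""
    return font.get(code_point)

# Same number (65), different drawings depending on the font.
for name, font in [("sans", TOY_SANS), ("serif", TOY_SERIF)]:
    print(name, glyph_for(font, ord("A")))
```

The character set decides that "A" is 65; each font decides on its own what 65 should look like.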

The "Tofu" Problem (□)

If you see a square box (□) instead of a character, it is usually a Font problem, not a Character Set problem.

  1. The computer knows the number (e.g., Emoji U+1F600).
  2. It asks the current Font: "Do you have a drawing for U+1F600?"
  3. The Font says: "No, I only have drawings for English letters."
  4. If no fallback font has a drawing either, the computer draws the "missing character" symbol (□) instead.
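The steps above can be sketched in a few lines. This is a simplified simulation, assuming a single toy font with no fallback fonts: `TOY_LATIN_FONT`, `MISSING_GLYPH`, and `render` are hypothetical names, and each "drawing" is stood in for by the character itself.

```python
import string

# Hypothetical stand-in for the "missing character" drawing.
MISSING_GLYPH = "□"

# Toy font: it only has "drawings" for basic Latin letters and the space.
TOY_LATIN_FONT = {ord(ch): ch for ch in string.ascii_letters + " "}

def render(text, font):
    """Look up each code point in the font; use the missing glyph when absent."""
    return "".join(font.get(ord(ch), MISSING_GLYPH) for ch in text)

text = "Hi \U0001F600"                 # "Hi " followed by U+1F600
print(render(text, TOY_LATIN_FONT))    # Hi □
print(f"U+{ord(text[-1]):04X}")        # U+1F600
```

The last line is the important part: the text still contains the correct number, U+1F600. Only the drawing is missing, which is why switching to a font that covers the character usually fixes the box.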