Data Structures · 7 min read · Last updated: January 25, 2024

CSV and Delimited Data

Long before JSON existed, and before SQL databases became the default place to keep data, there was a brutally simple idea:

“Let’s put data into plain text, and separate values with a special character.”

That special character is called a delimiter.


What is Delimited Data?

Delimited data is a text format where:

  • each row is one record (like one person, one order, one product)
  • each column is one field (name, age, email, etc.)
  • a delimiter separates columns (comma, semicolon, tab…)

The most common format is CSV (Comma-Separated Values):

Name,Age,Role
Alice,28,Engineer
Bob,34,Designer
Charlie,22,Intern
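To make this concrete, here is a minimal Python sketch that reads the sample above with the standard-library csv module. The sample is embedded as a string so the snippet runs on its own; the variable names are only illustrative.

```python
import csv
import io

# The sample file from above, embedded as a string so the example is self-contained.
data = """Name,Age,Role
Alice,28,Engineer
Bob,34,Designer
Charlie,22,Intern
"""

# DictReader treats the first row as the header and turns each later row
# into a dict keyed by those column names.
people = list(csv.DictReader(io.StringIO(data)))

for person in people:
    print(person["Name"], "is a", person["Role"])
```

Note that Age comes back as the text "28", not as a number.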

A spreadsheet can open this, a database can import it, and a human can read it.


CSV is popular because it’s boring (and boring is powerful)

CSV wins because:

  • It’s human-readable.
  • It’s supported by almost every tool on earth.
  • It’s easy to generate from code.
  • It’s simple to move between systems.

When a company needs to export data, it often chooses CSV because it’s the lowest common denominator: almost any system on the receiving end can read it.


The biggest CSV lie: “comma separated”

In many countries, including much of Europe, the comma is the decimal separator (3,14 rather than 3.14).
So spreadsheet software set to those locales often uses a semicolon as the delimiter instead:

Product;Price;InStock
Keyboard;39,99;true
Mouse;12,50;true
Monitor;199,00;false
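CSV libraries generally let you choose the delimiter. Here is a minimal Python sketch for the semicolon file above, assuming prices always use a comma as the decimal mark and never contain a thousands separator:

```python
import csv
import io

data = """Product;Price;InStock
Keyboard;39,99;true
Mouse;12,50;true
Monitor;199,00;false
"""

# Tell the parser the delimiter is ';' instead of the default ','.
reader = csv.DictReader(io.StringIO(data), delimiter=";")

products = []
for row in reader:
    # The csv module only splits text; interpreting "39,99" is up to us.
    # Assumption: ',' is the decimal mark and there is no thousands separator.
    price = float(row["Price"].replace(",", "."))
    in_stock = row["InStock"] == "true"
    products.append((row["Product"], price, in_stock))

print(products)  # [('Keyboard', 39.99, True), ('Mouse', 12.5, True), ('Monitor', 199.0, False)]
```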

That’s why you’ll hear “CSV” used to mean any delimited format, not just literally comma-separated ones.


Visual: delimiter as a “cut line”

Alice,28,Engineer

  • Field 1: Alice
  • Field 2: 28
  • Field 3: Engineer
  • Delimiters: the two commas between the fields

The delimiter is basically a “cut here” line so software knows where a field ends.


The real CSV problems (the ones that break imports)

1) Commas inside text

What if someone’s job title is “Engineer, Backend”?

If you write:

Alice,28,Engineer, Backend

Now it looks like 4 columns, not 3.

Solution: quoting. Wrap the field in double quotes, and the comma inside becomes plain text:

Name,Age,Role
Alice,28,"Engineer, Backend"
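This is why splitting a line on commas yourself is fragile. A small Python comparison between a naive split and the standard csv module:

```python
import csv

line = 'Alice,28,"Engineer, Backend"'

# Naive approach: cut at every comma, ignoring quotes.
naive = line.split(",")
print(naive)   # ['Alice', '28', '"Engineer', ' Backend"']

# The csv module understands quoting, so the quoted text stays one field.
parsed = next(csv.reader([line]))
print(parsed)  # ['Alice', '28', 'Engineer, Backend']
```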

2) Newlines inside text

A note field might contain multiple lines.

CSV allows this, but only if the field is quoted:

Name,Note
Alice,"Line one
Line two"
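A reader that works line by line would cut this record in half. This Python sketch shows the csv module keeping the quoted newline inside a single field:

```python
import csv
import io

data = 'Name,Note\nAlice,"Line one\nLine two"\n'

# Three physical lines of text...
print(len(data.splitlines()))   # 3

# ...but only two records, because the second newline sits inside quotes.
rows = list(csv.reader(io.StringIO(data)))
print(rows)   # [['Name', 'Note'], ['Alice', 'Line one\nLine two']]
```

When reading from a real file in Python, open it with newline="" (as the csv module documentation recommends) so newlines inside quoted fields are handled correctly.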

3) Quotes inside text

If the text itself contains double quotes, wrap the field in quotes and escape each inner quote by doubling it:

Name,Quote
Alice,"She said ""hello"" to everyone."
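You rarely need to do this escaping by hand, because a CSV writer does it for you. A minimal Python sketch:

```python
import csv
import io

buffer = io.StringIO()
writer = csv.writer(buffer)   # default dialect: comma delimiter, minimal quoting
writer.writerow(["Name", "Quote"])
writer.writerow(["Alice", 'She said "hello" to everyone.'])

output = buffer.getvalue()
# output == 'Name,Quote\r\nAlice,"She said ""hello"" to everyone."\r\n'
# (Python's csv.writer ends rows with "\r\n" by default.)

# Reading it back undoes the doubling.
rows = list(csv.reader(io.StringIO(output)))
```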

CSV vs TSV (tabs)

TSV (Tab-Separated Values) uses the tab character (\t) instead of a comma.

Why some people love TSV:

  • Tabs rarely appear in normal text
  • Less quoting pain

Example TSV (tabs shown as → here for clarity):

Name→Age→Role
Alice→28→Engineer
Bob→34→Designer

In real files, those are tab characters.
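Reading TSV is usually just a matter of changing the delimiter. A Python sketch; the “Engineer, Backend” role is borrowed from the quoting example earlier to show that a comma needs no quoting in TSV:

```python
import csv
import io

# Real tab characters (\t), not arrows.
data = "Name\tAge\tRole\nAlice\t28\tEngineer, Backend\nBob\t34\tDesigner\n"

rows = list(csv.reader(io.StringIO(data), delimiter="\t"))
print(rows[1])   # ['Alice', '28', 'Engineer, Backend']
```

Python also ships a predefined "excel-tab" dialect that does the same thing: csv.reader(f, dialect="excel-tab").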


CSV is not a database (important)

CSV is great for transferring data, but it has limitations:

  • No data types (everything is text unless interpreted)
  • No constraints (no “unique” or “required” columns)
  • No indexes (search is slow for big files)
  • No relationships (no foreign keys)

If your data grows or you need rules, move to a real database (SQL) or a structured format (JSON for nested data).
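The “no data types” limitation is easy to see in code. In this Python sketch every value comes back as a string, and turning it into a number is left to you:

```python
import csv
import io

data = "Name,Age,Role\nAlice,28,Engineer\n"
row = next(csv.DictReader(io.StringIO(data)))

print(repr(row["Age"]))   # '28'  (text, not a number)

# row["Age"] + 1 would raise TypeError, so convert explicitly.
age = int(row["Age"])
print(age + 1)            # 29
```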


Practical advice for working with CSV safely

Choose UTF-8

Always save CSV as UTF-8 to avoid “weird character” issues (accents, Arabic, emojis).
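In code, this means stating the encoding explicitly instead of relying on the system default. A Python sketch; the file name and sample values are made up, and the temporary directory just keeps the example from cluttering your disk:

```python
import csv
import os
import tempfile

rows = [["Name", "City"], ["Zoë", "Zürich"], ["Amira", "القاهرة"]]
path = os.path.join(tempfile.mkdtemp(), "people.csv")   # hypothetical file name

# encoding="utf-8" fixes the byte encoding; newline="" lets the csv module
# manage line endings, as Python's csv documentation recommends.
with open(path, "w", encoding="utf-8", newline="") as f:
    csv.writer(f).writerows(rows)

with open(path, encoding="utf-8", newline="") as f:
    back = list(csv.reader(f))

print(back == rows)   # True
```

If the file is meant to be opened in Excel, some people write it with encoding="utf-8-sig", which adds a byte-order mark that helps Excel recognize UTF-8; whether you need it depends on your Excel version.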

Decide on a delimiter and stick to it

Document whether you use , or ; or tabs.

Include headers

Headers make exports self-describing, and they let importers match columns by name instead of by position.
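Most CSV libraries can write and read headers for you. A Python sketch using DictWriter and DictReader; the records are two of the sample people from the top of the article:

```python
import csv
import io

records = [
    {"Name": "Alice", "Age": 28, "Role": "Engineer"},
    {"Name": "Bob", "Age": 34, "Role": "Designer"},
]

buffer = io.StringIO()
# lineterminator="\n" just keeps the output easy to read; the default is "\r\n".
writer = csv.DictWriter(buffer, fieldnames=["Name", "Age", "Role"], lineterminator="\n")
writer.writeheader()          # writes "Name,Age,Role"
writer.writerows(records)

text = buffer.getvalue()
# Name,Age,Role
# Alice,28,Engineer
# Bob,34,Designer

# The reader looks columns up by header name, not position,
# so reordering columns in the file would not break this line.
roles = [row["Role"] for row in csv.DictReader(io.StringIO(text))]
```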

Validate column count

If you expect 5 columns and you get 6, something is wrong (usually quoting).
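A column-count check takes only a few lines. This Python sketch uses a hypothetical helper, find_bad_rows, with the broken “Engineer, Backend” row from earlier as test input:

```python
import csv
import io

def find_bad_rows(text, expected_columns, delimiter=","):
    """Hypothetical helper: return (record_number, field_count) for each
    record whose field count differs from expected_columns.
    Record numbers start at 1 with the header row."""
    problems = []
    reader = csv.reader(io.StringIO(text), delimiter=delimiter)
    for number, row in enumerate(reader, start=1):
        if len(row) != expected_columns:
            problems.append((number, len(row)))
    return problems

data = "Name,Age,Role\nAlice,28,Engineer, Backend\nBob,34,Designer\n"
print(find_bad_rows(data, expected_columns=3))   # [(2, 4)]
```

Record numbers can differ from line numbers when quoted fields contain newlines, which is why the helper counts records rather than lines.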


Quick FAQ

Why does Excel sometimes “break” my CSV?
Because when you open a CSV file directly, Excel guesses the delimiter, the encoding, and how to read dates. Importing via “Data → From Text/CSV” lets you set these yourself.

Is CSV safe for huge files?
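It works, but parsing large text files is slow, and loading the whole file into memory at once can exhaust RAM. For very large datasets, consider columnar formats such as Parquet or Arrow, or load the data into a database.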
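If you do stay with CSV, one common way to avoid memory trouble is to process the file one row at a time instead of loading it all. A Python sketch with a hypothetical sum_column helper; the tiny demo file stands in for a large export:

```python
import csv
import os
import tempfile

def sum_column(path, column):
    """Hypothetical helper: add up one numeric column while reading the file
    row by row, so memory use does not grow with the number of rows."""
    total = 0.0
    with open(path, encoding="utf-8", newline="") as f:
        for row in csv.DictReader(f):   # DictReader yields rows lazily
            total += float(row[column])
    return total

# Tiny demo file standing in for a large export.
path = os.path.join(tempfile.mkdtemp(), "orders.csv")
with open(path, "w", encoding="utf-8", newline="") as f:
    f.write("Order,Amount\n1,10.5\n2,4.5\n3,5\n")

print(sum_column(path, "Amount"))   # 20.0
```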

Can CSV store nested objects?
Not cleanly. CSV is flat. For nested structure, JSON is better.


Summary

  • CSV is plain text + delimiter
  • The hard parts are quoting, encoding, and tool assumptions
  • If you master those three, CSV becomes one of the most reliable formats you’ll ever use