CSV and Delimited Data
Long before SQL databases or formats like JSON became common, there was a brutally simple idea:
“Let’s put data into plain text, and separate values with a special character.”
That special character is called a delimiter.
What is Delimited Data?
Delimited data is a text format where:
- each row is one record (like one person, one order, one product)
- each column is one field (name, age, email, etc.)
- a delimiter separates columns (comma, semicolon, tab…)
The most common format is CSV, short for Comma-Separated Values:
Name,Age,Role
Alice,28,Engineer
Bob,34,Designer
Charlie,22,Intern
A spreadsheet can open this, a database can import it, and a human can read it.
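To make "a program can read it too" concrete, here is a minimal sketch using Python's standard csv module. The sample table is held in a string rather than a file, and the names SAMPLE and read_records are made up for this illustration.

```python
import csv
import io

# The sample table from above, kept in memory instead of a file.
SAMPLE = "Name,Age,Role\nAlice,28,Engineer\nBob,34,Designer\nCharlie,22,Intern\n"

def read_records(text):
    """Return each data row as a dict keyed by the header row."""
    reader = csv.DictReader(io.StringIO(text, newline=""))
    return list(reader)

records = read_records(SAMPLE)
for record in records:
    # Every value arrives as a string, including Age.
    print(record["Name"], record["Age"], record["Role"])
```

Each row becomes one record and each header becomes a field name, which matches the row/column picture described above.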
CSV is popular because it’s boring (and boring is powerful)
CSV wins because:
- It’s human-readable.
- It’s supported by almost every tool on earth.
- It’s easy to generate from code.
- It’s simple to move between systems.
If a big company wants to export data, they often choose CSV because it’s the lowest common denominator.
The biggest CSV lie: “comma separated”
In many countries, the comma is used as the decimal separator (e.g. 3,14), so it cannot also separate columns without extra quoting.
Spreadsheets set to those locales therefore often use a semicolon as the delimiter:
Product;Price;InStock
Keyboard;39,99;true
Mouse;12,50;true
Monitor;199,00;false
That’s why you’ll hear “CSV” used to mean any delimited format, not literally comma-separated.
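Reading the semicolon variant mostly comes down to telling the parser which delimiter to expect and converting decimal commas yourself. The sketch below assumes prices never contain thousands separators; the function name read_semicolon_prices is hypothetical.

```python
import csv
import io
from decimal import Decimal

SEMICOLON_SAMPLE = (
    "Product;Price;InStock\n"
    "Keyboard;39,99;true\n"
    "Mouse;12,50;true\n"
    "Monitor;199,00;false\n"
)

def read_semicolon_prices(text):
    """Parse semicolon-delimited rows and convert decimal-comma prices."""
    reader = csv.DictReader(io.StringIO(text, newline=""), delimiter=";")
    products = []
    for row in reader:
        products.append({
            "product": row["Product"],
            # Assumption: no thousands separators, so a plain swap is enough.
            "price": Decimal(row["Price"].replace(",", ".")),
            "in_stock": row["InStock"] == "true",
        })
    return products

products = read_semicolon_prices(SEMICOLON_SAMPLE)
print(products[0])
```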
Visual: delimiter as a “cut line”
The delimiter is basically a “cut here” line so software knows where a field ends.
The real CSV problems (the ones that break imports)
1) Commas inside text
What if someone’s job title is: Engineer, Backend?
If you write:
Alice,28,Engineer, Backend
Now it looks like 4 columns, not 3.
Solution: quoting.
Name,Age,Role
Alice,28,"Engineer, Backend"
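In practice you rarely add the quotes by hand. As a small sketch, Python's csv.writer quotes a field on its own when the field contains the delimiter; the helper name to_csv_line is invented for this example.

```python
import csv
import io

def to_csv_line(fields):
    """Serialize one row, letting the csv module decide what needs quoting."""
    buffer = io.StringIO(newline="")
    csv.writer(buffer, lineterminator="\n").writerow(fields)
    return buffer.getvalue()

line = to_csv_line(["Alice", 28, "Engineer, Backend"])
print(line, end="")

# Reading the line back gives three fields again, not four.
parsed = next(csv.reader([line]))
print(parsed)
```

Building rows with string concatenation or ",".join(...) skips this step, which is how the four-column bug usually gets into a file.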
2) Newlines inside text
A note field might contain multiple lines.
CSV allows it, but only if the whole field is wrapped in quotes:
Name,Note
Alice,"Line one
Line two"
3) Quotes inside text
If the text contains double quotes, wrap the field in quotes and escape each inner quote by doubling it:
Name,Quote
Alice,"She said ""hello"" to everyone."
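Both the multi-line case and the doubled-quote case can be checked with a round trip: write the rows, read them back, and compare. This is a sketch with Python's csv module; the second row (Bob) is sample data added for the illustration.

```python
import csv
import io

rows = [
    ["Name", "Note"],
    ["Alice", "Line one\nLine two"],
    ["Bob", 'She said "hello" to everyone.'],
]

# Write: the writer wraps tricky fields in quotes and doubles inner quotes.
buffer = io.StringIO(newline="")
csv.writer(buffer, lineterminator="\n").writerows(rows)
encoded = buffer.getvalue()
print(encoded)

# Read: newline="" keeps the line break inside the quoted field intact.
decoded = list(csv.reader(io.StringIO(encoded, newline="")))
print(decoded == rows)
```

Note that the encoded text spans four physical lines but holds only three records, so counting lines is not a reliable way to count rows.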
CSV vs TSV (tabs)
TSV stands for tab-separated values. It uses a tab character (\t) instead of a comma.
Why some people love TSV:
- Tabs rarely appear in normal text
- Less quoting pain
Example TSV (tabs shown as → here for clarity):
Name→Age→Role
Alice→28→Engineer
Bob→34→Designer
In real files, those are tab characters.
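For a parser, TSV is usually the same job with a different delimiter. A minimal sketch in Python, with the tabs written as \t escapes and read_tsv as a made-up helper name:

```python
import csv
import io

TSV_SAMPLE = "Name\tAge\tRole\nAlice\t28\tEngineer\nBob\t34\tDesigner\n"

def read_tsv(text):
    """Split each line on tab characters instead of commas."""
    return list(csv.reader(io.StringIO(text, newline=""), delimiter="\t"))

table = read_tsv(TSV_SAMPLE)
for row in table:
    print(row)
```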
CSV is not a database (important)
CSV is great for transferring data, but it has limitations:
- No data types (everything is text unless interpreted)
- No constraints (no “unique” or “required” columns)
- No indexes (search is slow for big files)
- No relationships (no foreign keys)
If your data grows or you need rules, move to a real database (SQL) or a structured format (JSON for nested data).
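The "no data types" point is easy to see in code: a parser hands back strings, and any typing is a decision you make afterwards. The sketch below assumes an empty Age should become None; that rule, the blank age for Bob, and the name parse_people are all choices made for this example.

```python
import csv
import io

# Bob's age is left blank here on purpose to show a missing value.
PEOPLE = "Name,Age,Role\nAlice,28,Engineer\nBob,,Designer\n"

def parse_people(text):
    """Convert the Age column to int, treating blanks as missing."""
    people = []
    for row in csv.DictReader(io.StringIO(text, newline="")):
        age_text = row["Age"].strip()
        people.append({
            "name": row["Name"],
            # The file itself cannot say whether "" means 0, unknown, or an error.
            "age": int(age_text) if age_text else None,
            "role": row["Role"],
        })
    return people

people = parse_people(PEOPLE)
print(people)
```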
Practical advice for working with CSV safely
Choose UTF-8
Always save CSV as UTF-8 to avoid “weird character” issues (accents, Arabic, emojis).
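In code, "choose UTF-8" means naming the encoding explicitly instead of relying on the platform default. Here is a sketch in Python that writes to a temporary folder; the file name people.csv and the sample names are invented for the example.

```python
import csv
import os
import tempfile

rows = [["Name", "City"], ["Zoë", "München"], ["ليلى", "القاهرة"]]

def write_utf8_csv(path, rows):
    """Write rows as UTF-8; newline="" lets the csv module control line endings."""
    with open(path, "w", encoding="utf-8", newline="") as handle:
        csv.writer(handle).writerows(rows)

def read_utf8_csv(path):
    """Read the file back using the same explicit encoding."""
    with open(path, "r", encoding="utf-8", newline="") as handle:
        return list(csv.reader(handle))

with tempfile.TemporaryDirectory() as folder:
    path = os.path.join(folder, "people.csv")
    write_utf8_csv(path, rows)
    round_trip = read_utf8_csv(path)

print(round_trip == rows)
```

Some spreadsheet tools detect UTF-8 more reliably when the file starts with a byte order mark; Python's "utf-8-sig" codec writes one if you need it.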
Decide delimiter + stick to it
Document whether you use , or ; or tabs.
Include headers
A header row names each column, so people and import tools can match fields by name instead of guessing by position.
Validate column count
If you expect 5 columns and you get 6, something is wrong (usually quoting).
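A column-count check takes only a few lines. This sketch treats the header row as the source of truth for the expected count and skips blank lines; both are assumptions, and find_bad_rows is a hypothetical helper.

```python
import csv
import io

def find_bad_rows(text, delimiter=","):
    """Return (line_number, field_count) for rows that do not match the header."""
    reader = csv.reader(io.StringIO(text, newline=""), delimiter=delimiter)
    header = next(reader, None)
    if header is None:
        return []  # empty input: nothing to validate
    expected = len(header)
    problems = []
    for row in reader:
        # Blank lines come back as empty lists; they are ignored here.
        if row and len(row) != expected:
            problems.append((reader.line_num, len(row)))
    return problems

BROKEN = "Name,Age,Role\nAlice,28,Engineer, Backend\nBob,34,Designer\n"
bad = find_bad_rows(BROKEN)
print(bad)
```

Running a check like this before an import turns a silent column shift into an explicit list of lines to fix.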
Quick FAQ
Why does Excel sometimes “break” my CSV?
Because Excel guesses the delimiter, the encoding, and data types such as dates. Importing via “Data → From Text/CSV” gives you more control over each of these.
Is CSV safe for huge files?
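It works, but every value has to be parsed from text, and finding anything means scanning the whole file. For very large datasets, consider a columnar format such as Parquet or Arrow, or a database export.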
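If you do stay with CSV for a big file, reading it one row at a time keeps memory use flat. A small sketch in Python follows; the Item/Amount columns and the name sum_column are made up, and an in-memory string stands in for the large file.

```python
import csv
import io

def sum_column(lines, column):
    """Add up one numeric column while holding only one row in memory."""
    total = 0.0
    for row in csv.DictReader(lines):
        total += float(row[column])
    return total

# Any iterable of lines works. For a real file, pass the object returned by
# open(path, encoding="utf-8", newline="") instead of this StringIO.
demo = io.StringIO("Item,Amount\nA,1.5\nB,2.5\n", newline="")
total = sum_column(demo, "Amount")
print(total)
```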
Can CSV store nested objects?
Not cleanly. CSV is flat. For nested structure, JSON is better.
Summary
- CSV is plain text + delimiter
- The hard parts are quoting, encoding, and tool assumptions
- If you master those three, CSV becomes one of the most reliable formats you’ll ever use