I'm writing data to CSV files for customers to consume. I've been asked to make the last column a field which will be empty sometimes. Sometimes every row in the file will have an empty string for the last column. RFC 4180 says that CSV files may not end with a comma, so I'm concerned about breaking parsers. I don't know exactly how different customers will consume the files (e.g. what kinds of parsers they might use).
Example file: header row followed by two data rows
field1,field2,field3
abcdef,ghijkl,
aaaaaa,bbbbbb,cccccc
Is there a standard way of doing this? RFC 4180 mentions double-quoting fields with troublesome characters, but I didn't see it mention empty strings specifically. I'm wondering if a solution like this is likely to be supported by every parser, or whether this isn't necessarily standard:
field1,field2,field3
abcdef,ghijkl,""
aaaaaa,bbbbbb,cccccc
asked Mar 13 at 17:32 by echawkes
- You can safely use the mode in your first example – aborruso Commented Mar 14 at 7:27
- Both solutions are correct in regard to the CSV file format spec, the first one being probably more widely supported – Fravadona Commented Mar 14 at 9:51
2 Answers
The spec doesn't actually say that a file can't end in a comma. It says:
The last field in the record must not be followed by a comma
So your example tells the parser there are still 3 fields; it's just that the last one is empty. That said, I've seen both styles (an empty field and an empty pair of double quotes), and unfortunately a parser has to handle both.
Also worth mentioning: what isn't shown here are the hidden characters, such as CRLF (carriage return and line feed, respectively). So even your first example, if you open it in Notepad++ or the like and turn on "Show all characters", may actually look like this:
field1,field2,field3CRLF
abcdef,ghijkl,CRLF
aaaaaa,bbbbbb,cccccc
(NOTE: Linux is likely to have just LF, whereas Windows will have CRLF.)
So again, you're not technically ending the line in a comma: the CR and/or LF tells the parser that this record is done and that the next line starts the next record.
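To make the hidden characters visible without an editor, here is a minimal sketch using Python's standard csv module (an assumption on my part; the question doesn't say what language writes the files). It writes the example rows to an in-memory buffer and prints the raw text, first with the default quoting and then with QUOTE_ALL, which is one way to get the "" style for empty fields at the cost of quoting every field.

```python
import csv
import io

rows = [
    ["field1", "field2", "field3"],
    ["abcdef", "ghijkl", ""],  # last column is an empty string
    ["aaaaaa", "bbbbbb", "cccccc"],
]

# Default dialect: minimal quoting, "\r\n" as the line terminator.
buf = io.StringIO(newline="")
csv.writer(buf).writerows(rows)
raw = buf.getvalue()
print(repr(raw))
# 'field1,field2,field3\r\nabcdef,ghijkl,\r\naaaaaa,bbbbbb,cccccc\r\n'

# QUOTE_ALL quotes every field, so the empty one is written as "".
buf_quoted = io.StringIO(newline="")
csv.writer(buf_quoted, quoting=csv.QUOTE_ALL).writerows(rows)
raw_quoted = buf_quoted.getvalue()
print(repr(raw_quoted.splitlines()[1]))
# '"abcdef","ghijkl",""'
```

In the default output, the comma in the second record is followed by the empty field and then the line terminator.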
Big picture, you cannot count on all CSV parsers to do even the same thing, let alone the right thing:
Due to lack of a single specification, there are considerable differences among implementations.
I think you can make assumptions about common, popular ones, though, and in my experience, for either of these two inputs:
-- no quote --      -- empty quote --
Col1,Col2,Col3      Col1,Col2,Col3
aaaa,bbbb,          aaaa,bbbb,""
zzzz,yyyy,xxxx      zzzz,yyyy,xxxx
you can expect a good parser to produce a data structure, like:
[
[ Col1, Col2, Col3 ],
[ aaaa, bbbb, ],
[ zzzz, yyyy, xxxx ],
]
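As a quick check of that claim against one popular parser, here is a small sketch using Python's standard csv module. It only shows what this one parser does, not what every parser will do; the parse helper is a name made up for the example.

```python
import csv
import io

no_quote = 'Col1,Col2,Col3\r\naaaa,bbbb,\r\nzzzz,yyyy,xxxx\r\n'
empty_quote = 'Col1,Col2,Col3\r\naaaa,bbbb,""\r\nzzzz,yyyy,xxxx\r\n'

def parse(text):
    """Parse CSV text into a list of records (hypothetical helper)."""
    return list(csv.reader(io.StringIO(text, newline="")))

parsed_plain = parse(no_quote)
parsed_quoted = parse(empty_quote)

print(parsed_plain[1])                # ['aaaa', 'bbbb', '']
print(parsed_plain == parsed_quoted)  # True
```

Both inputs come back as three fields per record, with an empty string in the last position of the second record.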
you can also expect that if you leave the trailing comma off:
Col1,Col2,Col3
aaaa,bbbb
zzzz,yyyy,xxxx
then the parser will see only two fields for the first data record, which goes against RFC 4180's guidance:
Each line should contain the same number of fields throughout the file.
Some parsers care about this discrepancy by default (e.g., Go's encoding/csv); some parsers can be configured to care (e.g., Deno's jsr:@std/csv, npm:csv-parser). I couldn't find an option in Python's csv module for this.
If that input did pass parsing, the consumer would most likely see some data like:
[
[ Col1, Col2, Col3 ],
[ aaaa, bbbb ],
[ zzzz, yyyy, xxxx ],
]
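For a consumer on Python who wants the stricter behaviour anyway, the field-count check can be done by hand. This is a minimal sketch, not a feature of the csv module: read_rectangular is a hypothetical helper that treats the first record's field count as the expected count and raises on any record that differs.

```python
import csv
import io

def read_rectangular(text):
    """Parse CSV text and reject records whose field count differs
    from the first record's (hypothetical helper)."""
    rows = []
    expected = None
    reader = csv.reader(io.StringIO(text, newline=""))
    for record_no, row in enumerate(reader, start=1):
        if expected is None:
            expected = len(row)  # the header sets the expected count
        elif len(row) != expected:
            raise ValueError(
                f"record {record_no}: expected {expected} fields, got {len(row)}"
            )
        rows.append(row)
    return rows

ragged = "Col1,Col2,Col3\r\naaaa,bbbb\r\nzzzz,yyyy,xxxx\r\n"
trailing_comma = "Col1,Col2,Col3\r\naaaa,bbbb,\r\nzzzz,yyyy,xxxx\r\n"

ok_rows = read_rectangular(trailing_comma)  # passes: 3 fields in every record

try:
    read_rectangular(ragged)
    ragged_error = None
except ValueError as exc:
    ragged_error = str(exc)

print(ragged_error)  # record 2: expected 3 fields, got 2
```

The input with the trailing comma passes the check, while the one with the comma left off is rejected.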