I used the pandas library to manipulate the original data down to a simple 2 column csv file and rename it to a text file. The file has triple double quotes that I need replaced with single double quotes. Every line of the file is formatted as:
"""QA12345""","""Some Other Text"""
What I need is:
"QA12345","Some Other Text"
This python snippet wipes out the file after it finishes.
with fileinput.FileInput(input_file, inplace=True) as f:
next(f)
for line in f:
line.replace("""""", '"')
It doesn't work with
line.replace('"""', '"')
either.
I've also tried adjusting the input values to be '"Some Other Text"'
and variations ("""
and '\"'
) but nothing seems to work. I believe the triple quote is causing the issue, but I don't know what I have to do to.
I used the pandas library to manipulate the original data down to a simple 2 column csv file and rename it to a text file. The file has triple double quotes that I need replaced with single double quotes. Every line of the file is formatted as:
"""QA12345""","""Some Other Text"""
What I need is:
"QA12345","Some Other Text"
This python snippet wipes out the file after it finishes.
with fileinput.FileInput(input_file, inplace=True) as f:
next(f)
for line in f:
line.replace("""""", '"')
It doesn't work with
line.replace('"""', '"')
either.
I've also tried adjusting the input values to be '"Some Other Text"'
and variations ("""
and '\"'
) but nothing seems to work. I believe the triple quote is causing the issue, but I don't know what I have to do to.
4 Answers
Reset to default 3That file looks to be in RFC 4180 format. ""
is how you write a single "
inside a double-quoted field.
If double-quotes are used to enclose fields, then a double-quote appearing inside a field must be escaped by preceding it with another double quote. For example:
"aaa","b""bb","ccc"
"""QA12345""","""Some Other Text"""
is the two values "QA12345"
and "Some Other Text"
. If you manually change the """
to "
you're changing the data in the file.
Read it with the csv
library and it will handle the escaping for you. Then, if you actually want to strip the surrounding quotes and change "QA12345"
to QA12345
, do so on the parsed data and write it back out.
import csv
import re
import sys
writer = csv.writer(sys.stdout)
for row in csv.reader(['"""QA12345""","""Some Other Text"""']):
# ['"QA12345"', '"Some Other Text"']
print(row)
# Strip " off the front and back of each item in the row.
row[:] = [ re.sub(r'^"(.*)"$', r'\1', item) for item in row]
# ['QA12345', 'Some Other Text']
print(row)
# QA12345,Some Other Text
writer.writerow(row)
It doesn't work with
line.replace('"""', '"')
replace
does not replace in-place (say that three times fast). It returns a new string. So it would be line = line.replace('"""', '"')
Judging by the OP's code, I guess the aim are
- Edit file in-place
- keep the first line unmodified
- For the rest of the lines, replace 3 double quotes with a single one
My solution is almost the same, but with print()
:
with fileinput.input(input_file, inplace=True) as stream:
print(next(stream), end="")
for line in stream:
print(line.replace('"""', '"'), end="")
That should give the desired result.
You can use:
import pandas as pd
import numpy as np
data = pd.read_csv("input_file.csv", header=None)
np.savetxt("output_file.txt", data, delimiter=",", fmt="%s")
Which, if the input file contains:
"""QA12345""","""Some Other Text"""
Then the output file will contain:
"QA12345","Some Other Text"
fiddle
The built-in csv module can handle this and is (probably) more lightweight than pandas.
Which means that you could just do this:
import csv
IN = "foo_in.csv"
OUT = "foo_out.csv"
with open(IN, mode="r", newline="") as in_data, open(OUT, mode="w", newline="") as out_data:
writer = csv.writer(out_data, quotechar=None)
for row in csv.reader(in_data):
writer.writerow(row)
line.replace(...)
... if this is all your code, then I don't know why you expect it to do anything. Why usefileinput
anyway? – juanpa.arrivillaga Commented Feb 6 at 22:19inplace
then I suggest not, note, it assumes you write tosys.stdout
(e.g. withprint
or even directly...) which is why your file is blank no matter what you do! I suggest just usingopen
and manually creating the backup file. Frankly, thefileinput
approach is very weird. – juanpa.arrivillaga Commented Feb 6 at 22:24for line in f: print(line.replace('"""', '"'), end='')
and don't forget toprint(next(f), end='')
at the top.... keeping in mind, again, that printing to standard output has been implicitly replaced with your file, but this is a very weird way to go about this, IMO – juanpa.arrivillaga Commented Feb 6 at 22:30pd.read_csv()
? I think it will do what you want. – Barmar Commented Feb 6 at 23:34""""""
is a triple double-quoted empty string.'"""'
is a string with 3 double quotes. – Mark Tolonen Commented Feb 6 at 23:47