最新消息:雨落星辰是一个专注网站SEO优化、网站SEO诊断、搜索引擎研究、网络营销推广、网站策划运营及站长类的自媒体原创博客

Using Python to replace triple double quotes with single double quote in CSV - Stack Overflow

programmeradmin1浏览0评论

I used the pandas library to manipulate the original data down to a simple 2 column csv file and rename it to a text file. The file has triple double quotes that I need replaced with single double quotes. Every line of the file is formatted as:

"""QA12345""","""Some Other Text"""

What I need is:

"QA12345","Some Other Text"

This python snippet wipes out the file after it finishes.

with fileinput.FileInput(input_file, inplace=True) as f:
    next(f)
    for line in f:
        line.replace("""""", '"')

It doesn't work with

line.replace('"""', '"') 

either.

I've also tried adjusting the input values to be '"Some Other Text"' and variations (""" and '\"') but nothing seems to work. I believe the triple quote is causing the issue, but I don't know what I have to do to.

I used the pandas library to manipulate the original data down to a simple 2 column csv file and rename it to a text file. The file has triple double quotes that I need replaced with single double quotes. Every line of the file is formatted as:

"""QA12345""","""Some Other Text"""

What I need is:

"QA12345","Some Other Text"

This python snippet wipes out the file after it finishes.

with fileinput.FileInput(input_file, inplace=True) as f:
    next(f)
    for line in f:
        line.replace("""""", '"')

It doesn't work with

line.replace('"""', '"') 

either.

I've also tried adjusting the input values to be '"Some Other Text"' and variations (""" and '\"') but nothing seems to work. I believe the triple quote is causing the issue, but I don't know what I have to do to.

Share Improve this question edited Feb 7 at 0:37 wjandrea 33k9 gold badges69 silver badges97 bronze badges asked Feb 6 at 22:12 ResourceReaperResourceReaper 5772 gold badges10 silver badges27 bronze badges 8
  • 1 you aren't doing anything with line.replace(...)... if this is all your code, then I don't know why you expect it to do anything. Why use fileinput anyway? – juanpa.arrivillaga Commented Feb 6 at 22:19
  • If you are using it just for inplace then I suggest not, note, it assumes you write to sys.stdout (e.g. with print or even directly...) which is why your file is blank no matter what you do! I suggest just using open and manually creating the backup file. Frankly, the fileinput approach is very weird. – juanpa.arrivillaga Commented Feb 6 at 22:24
  • BUT if you want to keep it, you need something like for line in f: print(line.replace('"""', '"'), end='') and don't forget to print(next(f), end='') at the top.... keeping in mind, again, that printing to standard output has been implicitly replaced with your file, but this is a very weird way to go about this, IMO – juanpa.arrivillaga Commented Feb 6 at 22:30
  • 1 Why aren't you using pd.read_csv()? I think it will do what you want. – Barmar Commented Feb 6 at 23:34
  • 2 FYI: """""" is a triple double-quoted empty string. '"""' is a string with 3 double quotes. – Mark Tolonen Commented Feb 6 at 23:47
 |  Show 3 more comments

4 Answers 4

Reset to default 3

That file looks to be in RFC 4180 format. "" is how you write a single " inside a double-quoted field.

  1. If double-quotes are used to enclose fields, then a double-quote appearing inside a field must be escaped by preceding it with another double quote. For example:

    "aaa","b""bb","ccc"

"""QA12345""","""Some Other Text""" is the two values "QA12345" and "Some Other Text". If you manually change the """ to " you're changing the data in the file.

Read it with the csv library and it will handle the escaping for you. Then, if you actually want to strip the surrounding quotes and change "QA12345" to QA12345, do so on the parsed data and write it back out.

import csv
import re
import sys

writer = csv.writer(sys.stdout)
for row in csv.reader(['"""QA12345""","""Some Other Text"""']):
    # ['"QA12345"', '"Some Other Text"']
    print(row)

    # Strip " off the front and back of each item in the row.
    row[:] = [ re.sub(r'^"(.*)"$', r'\1', item) for item in row]

    # ['QA12345', 'Some Other Text']
    print(row)

    # QA12345,Some Other Text
    writer.writerow(row)

It doesn't work with line.replace('"""', '"')

replace does not replace in-place (say that three times fast). It returns a new string. So it would be line = line.replace('"""', '"')

Judging by the OP's code, I guess the aim are

  1. Edit file in-place
  2. keep the first line unmodified
  3. For the rest of the lines, replace 3 double quotes with a single one

My solution is almost the same, but with print():

with fileinput.input(input_file, inplace=True) as stream:
    print(next(stream), end="")
    for line in stream:
        print(line.replace('"""', '"'), end="")

That should give the desired result.

You can use:

import pandas as pd
import numpy as np

data = pd.read_csv("input_file.csv", header=None)
np.savetxt("output_file.txt", data, delimiter=",", fmt="%s")

Which, if the input file contains:

"""QA12345""","""Some Other Text"""

Then the output file will contain:

"QA12345","Some Other Text"

fiddle

The built-in csv module can handle this and is (probably) more lightweight than pandas.

Which means that you could just do this:

import csv

IN = "foo_in.csv"
OUT = "foo_out.csv"

with open(IN, mode="r", newline="") as in_data, open(OUT, mode="w", newline="") as out_data:
    writer = csv.writer(out_data, quotechar=None)
    for row in csv.reader(in_data):
        writer.writerow(row)
发布评论

评论列表(0)

  1. 暂无评论