I'm working on a Java program that imports a CSV file into a PostgreSQL database. Some of the lines in the CSV file contain escaped commas, and commas are also the delimiter. When I import this file with MySQL it works fine, but with PostgreSQL I get an error saying there are more tokens in the data than there are columns to import them into (org.postgresql.util.PSQLException: ERROR: extra data after last expected column).
How can I get COPY to import these lines with the escaped commas treated as text instead of as delimiters?
My COPY command:
"COPY temporary_%s (%s) FROM STDIN with csv";
And for comparison, the MySQL version, which works:
"LOAD DATA LOCAL INFILE ? "
+ "INTO TABLE temporary_%s "
+ "FIELDS TERMINATED BY ',' ENCLOSED BY '\"' "
+ "LINES TERMINATED BY '\n' "
+ "(%s)";
I've been trying to append something like ESCAPE '\\' to the PostgreSQL statement, but it doesn't appear to do anything. Solutions that I've seen suggest wrapping every chunk of data in quotes, but I don't have control over the contents of the CSV file, and it's quite huge (up to ~8 GB), so I'd prefer not to do pre-processing on it if that's not necessary.
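For reference, a rough sketch of the statement with the clause appended (the exact placement of the clause is from memory):

// Appending ESCAPE to the CSV-format statement; as far as I can tell it makes no
// difference here, presumably because in CSV format ESCAPE only applies inside QUOTE-d fields.
"COPY temporary_%s (%s) FROM STDIN with csv ESCAPE '\\'";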
And the Java:
BaseConnection physicalConnection =
        (BaseConnection) connection.unwrap(Class.forName("org.postgresql.core.BaseConnection"));
CopyManager copyManager = new CopyManager(physicalConnection);
FileReader fileReader = new FileReader(fileName);
copyManager.copyIn(String.format(PROCEDURE_POSTGRES, tableName, columns), fileReader);
And a sample line of data with this problem (the \,):
0.2\,4,000088797e02a729c72ff07bb9c5a84ef27e6b7c,1541592473000,1545585645000,0.3.0,1,NULL,NULL
Edit to answer questions: The data type for these columns is varchar(255). What I'd like to see is the commas being inserted as data, e.g. in this case 0.2,4 should be the data in the first column.
Full error message (with names scrubbed for privacy):
2025-02-04 12:00:40 2025-02-04 17:00:40.374 UTC [89725] STATEMENT: COPY table_name (***, ***, ***, ***, ***, ***, ***, ***, ***) FROM STDIN with csv"
2025-02-04 12:03:48 2025-02-04 17:03:48.733 UTC [89890] ERROR: extra data after last expected column
2025-02-04 12:03:48 2025-02-04 17:03:48.733 UTC [89890] CONTEXT: COPY table_name, line 10: " 0.2\,4,000088797e02a729c72ff07bb9c5a84ef27e6b7c,1541592473000,1..."
Also, I found a solution(?) where I don't use CSV. I don't know why this is working, but it is: all the lines are being imported and they look correct. This might turn out to be all I need (doing more testing).
"COPY temporary_%s (%s) FROM STDIN DELIMITER ','";
1 Answer
From here, COPY:
File Formats
Text Format
Backslash characters (\) can be used in the COPY data to quote data characters that might otherwise be taken as row or column delimiters. In particular, the following characters must be preceded by a backslash if they appear as part of a column value: backslash itself, newline, carriage return, and the current delimiter character.
Therefore, using psql:
\! cat backslash.csv
0.2\,4,000088797e02a729c72ff07bb9c5a84ef27e6b7c,1541592473000
.5,000088797e02a729c72ff07bb9c5a84ef27e6b7c,1545585645000
create table csv_import (fld1 varchar, fld2 varchar, fld3 varchar);
\copy csv_import from backslash.csv delimiter ','
COPY 2
select * from csv_import ;
 fld1  |                   fld2                   |     fld3
-------+------------------------------------------+---------------
0.2,4 | 000088797e02a729c72ff07bb9c5a84ef27e6b7c | 1541592473000
.5 | 000088797e02a729c72ff07bb9c5a84ef27e6b7c | 1545585645000
\copy is a client-side (psql) instance of the COPY command, per:
https://www.postgresql.org/docs/current/app-psql.html
Performs a frontend (client) copy. This is an operation that runs an SQL COPY command, but instead of the server reading or writing the specified file, psql reads or writes the file and routes the data between the server and the local file system. This means that file accessibility and privileges are those of the local user, not the server, and no SQL superuser privileges are required.
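Tying this back to the Java code in the question, a minimal sketch of how the text-format statement slots into the existing CopyManager call. The constant name PROCEDURE_POSTGRES comes from the question; the class name and method wrapper are made up for illustration, and everything else is from the snippets above.

import java.io.FileReader;
import java.sql.Connection;

import org.postgresql.copy.CopyManager;
import org.postgresql.core.BaseConnection;

public class CsvImporter {
    // Text format (no CSV keyword): backslash escapes are honoured, so 0.2\,4 loads as 0.2,4.
    // DELIMITER ',' is needed because the text-format default delimiter is a tab.
    // Note: literal NULL fields in the file load as the string 'NULL' in both formats
    // unless COPY's NULL option is set to match.
    private static final String PROCEDURE_POSTGRES =
            "COPY temporary_%s (%s) FROM STDIN DELIMITER ','";

    public static void importFile(Connection connection, String tableName,
                                  String columns, String fileName) throws Exception {
        BaseConnection physicalConnection = connection.unwrap(BaseConnection.class);
        CopyManager copyManager = new CopyManager(physicalConnection);
        try (FileReader fileReader = new FileReader(fileName)) {
            copyManager.copyIn(String.format(PROCEDURE_POSTGRES, tableName, columns), fileReader);
        }
    }
}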
Comments on the question:

1) In CSV format DELIMITER ',' is redundant as that is the default for CSV. 2) ESCAPE is for escaping the QUOTE value, which by default for CSV is ". 3) What type is 0.2\,4 supposed to end up as? 4) What is the complete error message from the Postgres log? Add information as update to question text. – Adrian Klaver Commented Feb 4 at 17:59

0.2\,4 should be a value for a single column; you must put this content between double quotes " in your csv. Or don't treat it as csv, since it's not csv, which seems to work. – Frank Heikens Commented Feb 4 at 18:25

1) […] COPY command and you want to only enter 8 values if I count correctly. 2) Your second form is using the text format. Which from here COPY: "Backslash characters (\) can be used in the COPY data to quote data characters that might otherwise be taken as row or column delimiters." – Adrian Klaver Commented Feb 4 at 18:28

1.) […] \, don't. I may have got the column count wrong in transcription while removing columns to simplify the question and get rid of proprietary data. 2.) edit: ahh, I think I see what you're saying now. So in the CSV format, backslash can't be used to escape data characters, but in the text format it can? – b12 Commented Feb 4 at 18:30

[…] "0.2,4". You won't be able to load that file with COPY. – Laurenz Albe Commented Feb 4 at 21:00
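To illustrate the CSV-format alternative the comments describe (quoting the field instead of backslash-escaping the comma), a quick sketch in the same psql style, reusing the csv_import table from above; the file name quoted.csv is invented:

\! cat quoted.csv
"0.2,4",000088797e02a729c72ff07bb9c5a84ef27e6b7c,1541592473000
\copy csv_import from quoted.csv with csv
COPY 1

The quoted field loads as 0.2,4, the CSV-format counterpart of the backslash escape in text format; since the question's file can't be rewritten to add the quotes, the text-format route shown above remains the practical option.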