I'm using SQL Server and Management Studio.
I was getting data conversion errors when performing a BULK INSERT into a table that I could not resolve. Switching to a format file allowed data to be inserted, but the inserted data was wrong.
I initially had ~2000 rows, but I reduced the data and can reproduce the same issue with a single row.
Here are the database table declarations:
-- Create the Signals table
CREATE TABLE Signals
(
ID INT IDENTITY(1,1) PRIMARY KEY,
SignalName NVARCHAR(255) NOT NULL,
SignalUnit NVARCHAR(50)
);
GO
-- Create the Series table
CREATE TABLE Series
(
ID INT IDENTITY(1,1) PRIMARY KEY,
SeriesType NVARCHAR(255) NOT NULL,
ReferenceSignalID INT,
FOREIGN KEY (ReferenceSignalID) REFERENCES Signals(ID)
);
GO
-- Create the SeriesData table
CREATE TABLE SeriesData
(
ID INT IDENTITY(1,1) PRIMARY KEY,
SeriesID INT,
SignalID INT,
SignalValue FLOAT NULL, -- I queried the precision of this and it is "53" (see the query after this block)
ReferenceSignalValueID INT NULL,
FOREIGN KEY (SeriesID) REFERENCES Series(ID),
FOREIGN KEY (SignalID) REFERENCES Signals(ID),
FOREIGN KEY (ReferenceSignalValueID) REFERENCES SeriesData(ID)
);
GO
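For reference, the precision mentioned in the comment above can be checked with a query along these lines (a minimal sketch using the standard INFORMATION_SCHEMA views and the table/column names declared above):
SELECT DATA_TYPE, NUMERIC_PRECISION
FROM INFORMATION_SCHEMA.COLUMNS
WHERE TABLE_NAME = 'SeriesData'
  AND COLUMN_NAME = 'SignalValue'; -- returns float / 53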
Example CSV content (reduced from ~2000 rows as example):
29,953,0.0,
29,953,0.01000213623046875,
29,953,0.0200042724609375,
There are no headers; the columns represent SeriesID | SignalID | SignalValue | ReferenceSignalValueID (the last column is blank in every row here and should become NULL in the database).
Line-ending characters are CR LF ("\r\n") and the encoding is UTF-8.
I scanned through the file and cannot see any rogue/hidden/invalid characters; all values appear to match the target column types.
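For reference, the raw file content can also be inspected from SSMS with something along these lines (a sketch using the same placeholder path; SINGLE_CLOB reads the whole file back as one character value, so stray characters or odd terminators become visible):
SELECT BulkColumn,
       LEN(BulkColumn) AS CharCount
FROM OPENROWSET(BULK 'myPath\myData.csv', SINGLE_CLOB) AS RawFile;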
SQL code doing the BULK INSERT:
BULK INSERT SeriesData
FROM 'myPath\myData.csv'
WITH (
--FIELDTERMINATOR = ',',
--ROWTERMINATOR = '\r\n', -- '\n' also fails, with even more errors
FORMATFILE = 'myPath\SeriesData.fmt',
CODEPAGE = '65001' -- obtained this from AI chats for UTF-8 encoding
--KEEPNULLS -- no change using this
)
The .fmt file I have contains:
10.0
4
1 SQLINT 0 4 "," 1 SeriesID ""
2 SQLINT 0 4 "," 2 SignalID ""
3 SQLFLT8 0 8 "," 3 SignalValue ""
4 SQLINT 0 4 "\r\n" 4 ReferenceSignalValueID ""
(received this from an AI chat)
The result in the SeriesData table is like this:
ID      SeriesID    SignalID    SignalValue    ReferenceSignalValueID
----------------------------------------------------------------------
6166    3355961     0           NULL           NULL
6167    3355961     0           NULL           NULL
6168    3355961     0           NULL           NULL
For every row of the CSV, the SeriesID is always 3355961 instead of 29, the SignalID is always 0 instead of 953, and the SignalValue is always NULL instead of the actual float value.
The AI chat I was using to debug and learn about how to do this suggested the following .fmt
content (but using this resulted in the same type mismatch error):
10.0
4
1 SQLCHAR 0 12 "," 1 SeriesID ""
2 SQLCHAR 0 12 "," 2 SignalID ""
3 SQLCHAR 0 20 "," 3 SignalValue ""
4 SQLCHAR 0 12 "\r\n" 4 ReferenceSignalValueID ""
Does anyone know how I can get SQL to interpret the values correctly, or is there something else fundamentally wrong?
For further info: previous attempts used FIELDTERMINATOR and ROWTERMINATOR (as commented out above), and in those cases I got the error:
Msg 4864, Level 16, State 1, Line 79
Bulk load data conversion error (type mismatch or invalid character for the specified codepage) for row 1, column 3 (SignalID).
This is concerning, since SignalID should be column 2, not 3. Does this suggest something is wrong with the CSV (or with how SQL reads it), and that the .fmt file approach is insufficient and is distracting from the "real" issue?
1 Answer
Please try the following solution; it works starting from SQL Server 2017 onwards.
Notable points:
- No format file is needed.
- Note the database view dbo.vSeriesData, which accommodates the presence of the IDENTITY column: the BULK INSERT loads data into the target table through the view.
e:\Temp\myData.csv file
29,953,0.0,
29,953,0.01000213623046875,
29,953,0.0200042724609375,
SQL
USE tempdb;
GO
DROP VIEW IF EXISTS dbo.vSeriesData;
DROP TABLE IF EXISTS dbo.SeriesData;
GO
CREATE TABLE dbo.SeriesData (
ID INT IDENTITY(1,1) PRIMARY KEY,
SeriesID INT,
SignalID INT,
SignalValue FLOAT NULL,
ReferenceSignalValueID INT NULL
);
GO
CREATE VIEW dbo.vSeriesData
AS
SELECT SeriesID,
SignalID,
SignalValue,
ReferenceSignalValueID
FROM dbo.SeriesData;
GO
BULK INSERT dbo.vSeriesData
FROM N'e:\Temp\myData.csv'
WITH (
FORMAT='CSV' -- starting from SQL Server 2017 onwards
, FIRSTROW = 1
, FIELDTERMINATOR = ','
, ROWTERMINATOR = '\n');
-- test
SELECT * FROM dbo.SeriesData;
Output
ID | SeriesID | SignalID | SignalValue | ReferenceSignalValueID |
---|---|---|---|---|
1 | 29 | 953 | 0 | NULL |
2 | 29 | 953 | 0.0100021362304688 | NULL |
3 | 29 | 953 | 0.0200042724609375 | NULL |
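The SignalValue figures above are just SSMS's default float display. To confirm that the full values round-tripped from the CSV, something like the following can be used (CONVERT style 3 for float, available from SQL Server 2016 onwards, emits the full precision of the stored value):
SELECT ID,
       CONVERT(varchar(30), SignalValue, 3) AS SignalValueFull
FROM dbo.SeriesData;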
Comments:
- Use SQLCHAR for all columns, because you have text, not native binary ints, in the file. Have you tried FORMAT='CSV'? – Charlieface
- What does SELECT @@VERSION; return? – Yitzhak Khabinsky