
sql server - SQL BULK INSERT Format Issues - Stack Overflow


I'm using SQL Server and Management Studio.

I was getting data conversion errors when performing a BULK INSERT into a table, which I could not resolve. Switching to a format file allowed data to be inserted, but the inserted data was wrong.

I initially had ~2000 rows, but I have reduced the file and can reproduce the same issue with a single row of data.

Here are the database table declarations:

-- Create the Signals table
CREATE TABLE Signals 
(
    ID INT IDENTITY(1,1) PRIMARY KEY,
    SignalName NVARCHAR(255) NOT NULL,
    SignalUnit NVARCHAR(50)
);
GO

-- Create the Series table
CREATE TABLE Series 
(
    ID INT IDENTITY(1,1) PRIMARY KEY,
    SeriesType NVARCHAR(255) NOT NULL,
    ReferenceSignalID INT,
    FOREIGN KEY (ReferenceSignalID) REFERENCES Signals(ID)
);
GO

-- Create the SeriesData table
CREATE TABLE SeriesData 
(
    ID INT IDENTITY(1,1) PRIMARY KEY,
    SeriesID INT,
    SignalID INT,
    SignalValue FLOAT NULL, -- I queried the precision of this and it is "53"
    ReferenceSignalValueID INT NULL,
    FOREIGN KEY (SeriesID) REFERENCES Series(ID),
    FOREIGN KEY (SignalID) REFERENCES Signals(ID),
    FOREIGN KEY (ReferenceSignalValueID) REFERENCES SeriesData(ID)
);
GO

Example CSV content (reduced from ~2000 rows):

29,953,0.0,
29,953,0.01000213623046875,
29,953,0.0200042724609375,

There are no headers; the columns represent SeriesID | SignalID | SignalValue | ReferenceSignalValueID (the last column is blank in every row of this example and should become NULL in the database).

The line endings are CR LF and the encoding is UTF-8.

I scanned through the file and cannot see any rogue/hidden/invalid characters; all values appear to match the target column types.
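
To double-check for hidden bytes such as a UTF-8 BOM, something like the following could be run (a sketch only; 'myPath\myData.csv' is the same placeholder path used below):

-- Inspect the first bytes of the file: a UTF-8 BOM would show up as EF BB BF
-- at the start, and CR LF line endings as 0D 0A.
SELECT CAST(BulkColumn AS VARBINARY(16)) AS FirstBytes
FROM OPENROWSET(BULK 'myPath\myData.csv', SINGLE_BLOB) AS f;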

SQL code doing the BULK INSERT:

BULK INSERT SeriesData
FROM 'myPath\myData.csv'
WITH (
    --FIELDTERMINATOR = ',',
    --ROWTERMINATOR = '\r\n', -- also "\n" has an issue, but more errors
    FORMATFILE = 'myPath\SeriesData.fmt',
    CODEPAGE = '65001' -- obtained this from AI chats for UTF-8 encoding
    --KEEPNULLS -- no change using this
)

The .fmt file I have contains:

10.0
4
1       SQLINT              0       4       ","     1     SeriesID               ""
2       SQLINT              0       4       ","     2     SignalID               ""
3       SQLFLT8             0       8       ","     3     SignalValue            ""
4       SQLINT              0       4       "\r\n"  4     ReferenceSignalValueID ""

(received this from an AI chat)

The result in the SeriesData table is like this:

ID      SeriesID    SignalID    SignalValue    ReferenceSignalValueID
----------------------------------------------------------------------
6166    3355961     0           NULL           NULL
6167    3355961     0           NULL           NULL
6168    3355961     0           NULL           NULL

For every row of the CSV, the SeriesID is always 3355961 instead of 29, the SignalID is always 0 instead of 953, and the SignalValue is always NULL instead of the actual float value.
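
A likely explanation (an assumption, not confirmed in the post): SQLINT and SQLFLT8 in a non-XML format file describe native binary values in the data file rather than text, so the ASCII digits in the CSV are reinterpreted as raw bytes. A rough illustration of that kind of reinterpretation (not the loader's exact byte handling):

-- Treating the text '29,9' as raw bytes and reading those bytes as an integer
-- yields a large meaningless number rather than 29.
SELECT CAST(CAST('29,9' AS BINARY(4)) AS INT) AS ReinterpretedValue;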

The AI chat I was using to debug and learn about how to do this suggested the following .fmt content (but using this resulted in the same type mismatch error):

10.0
4
1       SQLCHAR             0       12      ","     1     SeriesID               ""
2       SQLCHAR             0       12      ","     2     SignalID               ""
3       SQLCHAR             0       20      ","     3     SignalValue            ""
4       SQLCHAR             0       12      "\r\n"  4     ReferenceSignalValueID ""

Does anyone know how I can get SQL to interpret the values correctly, or is there something else fundamentally wrong?

For further info, previous attempts used FIELDTERMINATOR and ROWTERMINATOR (as commented out above), and in those cases I was getting this error:

Msg 4864, Level 16, State 1, Line 79
Bulk load data conversion error (type mismatch or invalid character for the specified codepage) for row 1, column 3 (SignalID).

This is concerning, since SignalID should be column 2, not 3. Does this suggest something is wrong with the CSV or with how SQL Server reads it, and that the .fmt file approach is insufficient and distracting from the "real" issue?
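
One possible explanation for the numbering (again an assumption): without a format file, BULK INSERT maps the CSV fields to the target table's columns by ordinal position, including the IDENTITY column ID, so the third field ("0.0") is offered to SignalID, which is column 3 of the table. The ordinals can be listed with:

-- List the column order of the target table; with no format file, CSV field N
-- is matched to column N, IDENTITY column included.
SELECT column_id, name
FROM sys.columns
WHERE object_id = OBJECT_ID(N'dbo.SeriesData')
ORDER BY column_id;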


asked by Will; edited by marc_s

  • Should definitely be SQLCHAR for all columns, because you have text, not native binary ints, in the file. Have you tried FORMAT='CSV'? – Charlieface
  • What is your SQL Server version (SELECT @@VERSION;)? – Yitzhak Khabinsky

1 Answer


Please try the following solution, which works on SQL Server 2017 and later.

Notable points:

  • No format file is needed.
  • Note the database view dbo.vSeriesData, which accommodates the presence of the IDENTITY column: BULK INSERT loads data into the target table through the view, so the four CSV fields map to the four non-identity columns.

e:\Temp\myData.csv file

29,953,0.0,
29,953,0.01000213623046875,
29,953,0.0200042724609375,

SQL

USE tempdb;
GO

DROP VIEW IF EXISTS dbo.vSeriesData;
DROP TABLE IF EXISTS dbo.SeriesData;
GO

CREATE TABLE dbo.SeriesData (
    ID INT IDENTITY(1,1) PRIMARY KEY,
    SeriesID INT,
    SignalID INT,
    SignalValue FLOAT NULL,
    ReferenceSignalValueID INT NULL
);
GO

CREATE VIEW dbo.vSeriesData
AS 
SELECT SeriesID,
    SignalID,
    SignalValue,
    ReferenceSignalValueID
FROM dbo.SeriesData;
GO

BULK INSERT dbo.vSeriesData
FROM N'e:\Temp\myData.csv'
WITH (
   FORMAT='CSV' -- starting from SQL Server 2017 onwards
   , FIRSTROW = 1
   , FIELDTERMINATOR = ','
   , ROWTERMINATOR = '\n');

-- test
SELECT * FROM dbo.SeriesData;

Output

ID    SeriesID    SignalID    SignalValue           ReferenceSignalValueID
1     29          953         0                     NULL
2     29          953         0.0100021362304688    NULL
3     29          953         0.0200042724609375    NULL
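
If the same load is done against the original SeriesData table, which has FOREIGN KEY constraints on SeriesID, SignalID and ReferenceSignalValueID, it may be worth adding CHECK_CONSTRAINTS, because by default BULK INSERT does not validate FOREIGN KEY or CHECK constraints during the load. A minimal sketch of the same view-based load with that option (same path and view as above):

-- Same view-based load, but with foreign keys validated while rows are inserted.
BULK INSERT dbo.vSeriesData
FROM N'e:\Temp\myData.csv'
WITH (
   FORMAT='CSV'             -- SQL Server 2017 onwards
   , FIELDTERMINATOR = ','
   , ROWTERMINATOR = '\n'
   , CHECK_CONSTRAINTS      -- FK/CHECK constraints are skipped by default
);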