I have a C# console app on .NET 6.0, using EF Core 6.0.35.
I am making an API call to get data (from Geotab) that I filter, transform and insert into my DB. Each insert is 1000 to 2000 records.
My table in SQL Server 2022 (v16.0.1135.2) looks like this:
CREATE TABLE [dbo].[DiagnosticData]
(
    [Id] [bigint] IDENTITY(1,1) NOT NULL,
    [DeviceId] [int] NOT NULL,
    [DiagnosticId] [int] NOT NULL,
    [Value] [float] NOT NULL,
    [DateTime] [datetime2](7) NOT NULL,
    [UpdateDateTime] [datetime2](7) NOT NULL
)
Corresponding model class:
public class DiagnosticData
{
    [Key]
    [DatabaseGenerated(DatabaseGeneratedOption.Identity)]
    public Int64? Id { get; set; }

    [Column("DeviceId")]
    public int DeviceId { get; set; }

    [Column("DiagnosticId")]
    public int DiagnosticId { get; set; }

    [Required]
    public Double Value { get; set; }

    [Required]
    public DateTime DateTime { get; set; }

    [Required]
    public DateTime UpdateDateTime { get; set; } = DateTime.UtcNow;
}
I have tried the following three methods of inserting data:
ONE: Basic EF Core code:
public static async Task BulkInsertAsyncEF<TEntity>(IEnumerable<TEntity> entities) where TEntity : class
{
    await using (PCContext _context = new PCContext())
    {
        await _context.AddRangeAsync(entities);
        await _context.SaveChangesAsync();
    }
}
TWO: (not my preferred method, but more as an experiment to test execution time)
public static async Task BulkInsertSqlAsync(IEnumerable<DiagnosticData> diagnosticData)
{
    await using (PCContext _context = new PCContext())
    {
        StringBuilder sb = new StringBuilder();
        foreach (DiagnosticData dd in diagnosticData)
        {
            sb.Append("INSERT INTO DiagnosticData (DeviceId, DiagnosticId, Value, DateTime, UpdateDateTime) values (");
            sb.Append(dd.DeviceId);
            sb.Append(',');
            sb.Append(dd.DiagnosticId);
            sb.Append(',');
            sb.Append(dd.Value);
            sb.Append(",'");
            sb.Append(dd.DateTime.ToString());
            sb.AppendLine("',getdate());");
        }
        await _context.Database.ExecuteSqlRawAsync(sb.ToString());
    }
}
THREE: Entity Framework Bulk Extensions
public static async Task BulkInsertAsync(IEnumerable<DiagnosticData> entities)
{
    await using (PCContext _context = new PCContext())
    {
        using var transaction = _context.Database.BeginTransaction();
        await _context.BulkInsertAsync(entities, options => { options.IncludeGraph = false; options.PreserveInsertOrder = false; });
        transaction.Commit();
    }
}
They're executed as follows:
IEnumerable<PC.DiagnosticData> diagnosticData = new List<PC.DiagnosticData>();
diagnosticData = statusData.Select(sd => new PC.DiagnosticData()
{
    DateTime = sd.DateTime.Value,
    DeviceId = devices.First(dv => dv.DeviceId == sd.Device.Id.ToString()).Id,
    DiagnosticId = diagnostics.FirstOrDefault(di => di.DiagnosticId == sd.Diagnostic.ToString()).Id,
    Value = sd.Data.Value
});

// await PC.DiagnosticData.BulkInsertSqlAsync(diagnosticData);
await PC.DiagnosticData.BulkInsertAsync(diagnosticData);
// await PC.DiagnosticData.BulkInsertAsyncEF(diagnosticData);
statusData holds the response from the Geotab API.
- Method one takes 10 to 15 seconds to insert ~1000 records, which is way too slow according to my gut instinct (and this page).
- Method two was faster at ~2 seconds for ~1000 records (still too slow!)
- Method three was about the same as method one.
None of the records will EVER exist in the DB already, so an update will never be necessary.
I have tried this on two separate instances of SQL Server and got the same results (one of them is my local dev machine with no other traffic).
The fact that methods one and three take roughly the same amount of time suggests the issue is at the EF level, but what?
1 Answer
If you are looking to do bulk inserts with the shortest processing time, you need to use the bulk copy operations included in the System.Data.SqlClient namespace. Entity Framework builds individual insert statements because it is not designed to use vendor-native bulk APIs, and individual insert statements will always take more time.
The class to use is SqlBulkCopy. This approach is fast, but the obvious trade-off is that the code you write is specific to SQL Server and won't port to other database systems the way Entity Framework code is designed to.
One important detail: you can feed SqlBulkCopy from either a DataTable or a DataReader. A DataTable is built in memory, whereas a DataReader streams rows from a connected data source.
Read more here.
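For illustration, here is a minimal sketch of that approach, assuming a table matching the question's DDL, the DiagnosticData class above, and a connectionString you supply. It is not the asker's code, just one way to wire up SqlBulkCopy with an in-memory DataTable.

using System.Data;                 // DataTable
using System.Data.SqlClient;       // SqlConnection, SqlBulkCopy (Microsoft.Data.SqlClient works the same way)

public static async Task BulkCopyAsync(IEnumerable<DiagnosticData> rows, string connectionString)
{
    // Build an in-memory DataTable whose columns mirror dbo.DiagnosticData
    // (the Id identity column is deliberately left out).
    var table = new DataTable();
    table.Columns.Add("DeviceId", typeof(int));
    table.Columns.Add("DiagnosticId", typeof(int));
    table.Columns.Add("Value", typeof(double));
    table.Columns.Add("DateTime", typeof(DateTime));
    table.Columns.Add("UpdateDateTime", typeof(DateTime));

    foreach (var r in rows)
    {
        table.Rows.Add(r.DeviceId, r.DiagnosticId, r.Value, r.DateTime, r.UpdateDateTime);
    }

    await using var connection = new SqlConnection(connectionString);
    await connection.OpenAsync();

    using var bulkCopy = new SqlBulkCopy(connection)
    {
        DestinationTableName = "dbo.DiagnosticData",
        BatchSize = 2000
    };

    // Map by column name so the identity column and any ordering differences don't matter.
    bulkCopy.ColumnMappings.Add("DeviceId", "DeviceId");
    bulkCopy.ColumnMappings.Add("DiagnosticId", "DiagnosticId");
    bulkCopy.ColumnMappings.Add("Value", "Value");
    bulkCopy.ColumnMappings.Add("DateTime", "DateTime");
    bulkCopy.ColumnMappings.Add("UpdateDateTime", "UpdateDateTime");

    // Stream all rows to SQL Server in one bulk load instead of per-row INSERTs.
    await bulkCopy.WriteToServerAsync(table);
}

With batches of 1,000 to 2,000 rows, a bulk load like this generally finishes in a small fraction of the time of row-by-row INSERTs, since the rows are streamed to SQL Server in a single operation.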