
c# - EF Bulk Extensions is really slow for me, but why? - Stack Overflow


I have a C# console app on .NET 6.0, using EF Core 6.0.35.

I am making an API call to get data (from Geotab) that I filter, transform and insert into my DB. Each insert is 1000 to 2000 records.

My table in SQL Server 2022 (v16.0.1135.2) looks like this:

CREATE TABLE [dbo].[DiagnosticData]
(
    [Id] [bigint] IDENTITY(1,1) NOT NULL,
    [DeviceId] [int] NOT NULL,
    [DiagnosticId] [int] NOT NULL,
    [Value] [float] NOT NULL,
    [DateTime] [datetime2](7) NOT NULL,
    [UpdateDateTime] [datetime2](7) NOT NULL 
)

Corresponding model class:

public class DiagnosticData 
{
    [Key]
    [DatabaseGenerated(DatabaseGeneratedOption.Identity)]
    public Int64? Id { get; set; }
    [Column("DeviceId")]
    public int DeviceId { get; set; }
    [Column("DiagnosticId")]
    public int DiagnosticId { get; set; }
    [Required]
    public Double Value { get; set; }
    [Required]
    public DateTime DateTime { get; set; }
    [Required]
    public DateTime UpdateDateTime {  get; set; } = DateTime.UtcNow;
}

I have tried the following three methods of inserting data:

ONE: Basic EF Core code:

public static async Task BulkInsertAsyncEF<TEntity>(IEnumerable<TEntity> entities) where TEntity : class
{
    await using (PCContext _context = new PCContext())
    {
        await _context.AddRangeAsync(entities);
        await _context.SaveChangesAsync();
    }
}

TWO: (not my preferred method, but more as an experiment to test execution time)

public static async Task BulkInsertSqlAsync(IEnumerable<DiagnosticData> diagnosticData)
{
    await using (PCContext _context = new PCContext())
    {
        StringBuilder sb = new StringBuilder();

        foreach (DiagnosticData dd in diagnosticData)
        {
            sb.Append("INSERT INTO DiagnosticData (DeviceId, DiagnosticId, Value, DateTime, UpdateDateTime) values (");
            sb.Append(dd.DeviceId);
            sb.Append(',');
            sb.Append(dd.DiagnosticId);
            sb.Append(',');
            sb.Append(dd.Value);
            sb.Append(",'");
            sb.Append(dd.DateTime.ToString());
            sb.AppendLine("',getdate());");
        }
        await _context.Database.ExecuteSqlRawAsync(sb.ToString());
    }
}

THREE: Entity Framework Bulk Extensions

public static async Task BulkInsertAsync(IEnumerable<DiagnosticData> entities)
{
    await using (PCContext _context = new PCContext())
    {
        using var transaction = _context.Database.BeginTransaction();
        await _context.BulkInsertAsync(entities, options => { options.IncludeGraph = false; options.PreserveInsertOrder = false; }); 
        transaction.Commit();
    }
}

They're executed as follows:

IEnumerable<PC.DiagnosticData> diagnosticData = new List<PC.DiagnosticData>();
diagnosticData = statusData.Select(sd => new PC.DiagnosticData()
{
    DateTime = sd.DateTime.Value,
    DeviceId = devices.First(dv => dv.DeviceId == sd.Device.Id.ToString()).Id,  
    DiagnosticId = diagnostics.FirstOrDefault(di => di.DiagnosticId == sd.Diagnostic.ToString()).Id,
    Value = sd.Data.Value
}
);

// await PC.DiagnosticData.BulkInsertSqlAsync(diagnosticData);
await PC.DiagnosticData.BulkInsertAsync(diagnosticData);
// await PC.DiagnosticData.BulkInsertAsyncEF(diagnosticData);

statusData holds the response from the Geotab API.

  • Method one takes 10 to 15 seconds to insert ~1000 records, which is way too slow according to my gut instinct (and this page).
  • Method two was faster at ~2 seconds for ~1000 records (still too slow!)
  • Method three was about the same as method one.

None of the records will EVER exist in the DB already, so an update will never be necessary.

I have tried this on two separate instances of SQL Server and got the same results (one is my local dev machine with no other traffic).

The fact that methods one and three take roughly the same amount of time suggests it's an issue at the EF level, but what?

Asked Dec 4, 2024 at 17:11 by Adam Hey; edited Dec 4, 2024 at 17:22 by marc_s.
  • Try without transaction. – Svyatoslav Danyliv Commented Dec 4, 2024 at 17:18
  • @SvyatoslavDanyliv I have and there is no difference – Adam Hey Commented Dec 4, 2024 at 18:48
  • The insert code that EF generates returns the IDs of the inserted entities; this is one of the reasons it is slower. In your method two, you don't do that. Unfortunately, in EF we can't opt out of this. – Alexander Petrov Commented Dec 4, 2024 at 20:15
  • Consider upgrading to a newer version. Read, for example, about the improvements in EF7. Inserting multiple rows can be four times faster than in EF Core 6.0! – Alexander Petrov Commented Dec 4, 2024 at 20:18
  • What is the performance you get if you use BulkInsert instead of BulkInsertAsync? – mjwills Commented Dec 4, 2024 at 22:49

1 Answer


If you are looking to do bulk inserts with the shortest processing time, you need to use the bulk copy operations included in the System.Data.SqlClient namespace. Entity Framework builds individual INSERT statements because it isn't designed to use the database's native bulk-load path, and individual insert statements will always take more time.

The class to use is SqlBulkCopy. This approach is fast, but the obvious tradeoff is that the code you write is specific to SQL Server and won't translate to other DBMSs the way Entity Framework code does.
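
For illustration, here is a minimal sketch of what a SqlBulkCopy-based insert could look like for the dbo.DiagnosticData table from the question. The class and method names, connection string parameter, and batch size are assumptions for the sketch, not anything from the original post; Microsoft.Data.SqlClient exposes the same API if you prefer it over System.Data.SqlClient.

using System.Data;
using System.Data.SqlClient;
using System.Threading.Tasks;

public static class DiagnosticDataBulkCopy
{
    public static async Task BulkCopyAsync(DataTable table, string connectionString)
    {
        // SqlBulkCopy streams rows through SQL Server's native bulk-load path
        // instead of issuing individual INSERT statements.
        using var bulkCopy = new SqlBulkCopy(connectionString)
        {
            DestinationTableName = "dbo.DiagnosticData",
            BatchSize = 2000 // roughly one API call's worth of rows per batch; tune as needed
        };

        // Map only the non-identity columns; SQL Server generates Id.
        bulkCopy.ColumnMappings.Add("DeviceId", "DeviceId");
        bulkCopy.ColumnMappings.Add("DiagnosticId", "DiagnosticId");
        bulkCopy.ColumnMappings.Add("Value", "Value");
        bulkCopy.ColumnMappings.Add("DateTime", "DateTime");
        bulkCopy.ColumnMappings.Add("UpdateDateTime", "UpdateDateTime");

        await bulkCopy.WriteToServerAsync(table);
    }
}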

One important detail: you can feed the data from a DataTable or a DataReader. The DataTable is held in memory, whereas the DataReader streams from a database-connected source.
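
To illustrate the DataTable route, here is a hypothetical helper that flattens the DiagnosticData entities from the question into an in-memory table (the helper name is made up; the column layout mirrors the CREATE TABLE above, minus the IDENTITY column):

using System;
using System.Collections.Generic;
using System.Data;

public static DataTable ToDataTable(IEnumerable<DiagnosticData> entities)
{
    var table = new DataTable();
    table.Columns.Add("DeviceId", typeof(int));
    table.Columns.Add("DiagnosticId", typeof(int));
    table.Columns.Add("Value", typeof(double));
    table.Columns.Add("DateTime", typeof(DateTime));
    table.Columns.Add("UpdateDateTime", typeof(DateTime));

    foreach (var dd in entities)
    {
        // Rows are added positionally, in the same order the columns were declared.
        table.Rows.Add(dd.DeviceId, dd.DiagnosticId, dd.Value, dd.DateTime, dd.UpdateDateTime);
    }

    return table;
}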

Read more here.
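
Putting the two sketches together, the call site from the question could hypothetically become something like this (connectionString is assumed to be available from configuration):

// In place of: await PC.DiagnosticData.BulkInsertAsync(diagnosticData);
DataTable table = ToDataTable(diagnosticData);
await DiagnosticDataBulkCopy.BulkCopyAsync(table, connectionString);

This keeps the per-row work in memory and hands SQL Server one bulk-load stream per batch, which is where most of the per-statement overhead described above goes away.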
