
c# - Why does this approach with TarWriter use a lot of memory with Streams for files?


I originally posted this question: how-can-i-create-a-tar-file-in-memory-from-several-varbinarymax-columns-stored, which was helpful. But I realized we might need to create large .tar files, with files around 2GB each, which may result in ~20GB .tar files. Using a MemoryStream would not work, since I cannot allocate that much memory on the server.

I also looked at how-do-you-pack-a-large-file-into-a-tar, where the answer implies that the method will not use a lot of memory; that approach writes to a tar file on disk.

So my modified approach is to read the varbinary columns from SQL Server and create a temporary file on disk that I then stream back to the calling client.

My current code:

var tempFileName = Path.GetTempFileName();
await using var tarFileStream = File.Create(tempFileName);

await using (var tarWriter = new TarWriter(tarFileStream, leaveOpen: true))
{
    await using var connection = new SqlConnection("YourConnectionString");
    const string query = @"
SELECT
  FileName,
  FileData
FROM FilesTable;
";
    await using var command = new SqlCommand(query, connection);
    await connection.OpenAsync();
    await using var reader = await command.ExecuteReaderAsync(CommandBehavior.SequentialAccess);

    while (await reader.ReadAsync())
    {                            
        var fileName = reader.GetString(0);
        await using var fileDataStream = reader.GetStream(1);

        var entry = new PaxTarEntry(TarEntryType.RegularFile, fileName)
        {
            DataStream = fileDataStream,
        };
        await tarWriter.WriteEntryAsync(entry);
    }
}

tarFileStream.Position = 0;

// Stream tarFileStream to response body..
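
For context, the elided streaming step could be as simple as the following (my sketch, not part of the original question; it assumes an ASP.NET Core action where Response is available):

Response.ContentType = "application/x-tar";
await tarFileStream.CopyToAsync(Response.Body);
// remember to delete tempFileName once tarFileStream has been disposed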

When I try this approach, somewhere around 8GB gets allocated on the LOH during the while-loop for a couple of 1.7GB files stored in SQL Server (FILESTREAM). Inspecting the memory usage in JetBrains dotMemory, the memory is allocated in an underlying MemoryStream of TarWriter, if I'm not mistaken.

Am I missing something, or what could cause the large memory consumption? According to Jon Skeet's answer, I shouldn't be consuming that much memory.

EDIT 1: Tried reading the same file from disk instead of from SQL Server with no problem (basically no memory consumption):

await using var tempFileStream = File.Create(Path.GetTempFileName());
await using (var tarWriter = new TarWriter(tempFileStream, leaveOpen: true))
{
    await using var file = File.Open("C:\\Users\\xyz\\Desktop\\BigFile.txt", FileMode.Open);
    await tarWriter.WriteEntryAsync(new PaxTarEntry(TarEntryType.RegularFile, Guid.NewGuid().ToString())
    {
        DataStream = file
    });
}

So there must be some problem when I set the DataStream property to a stream coming from SqlClient.

dotMemory: (screenshots not shown)

EDIT 2: Reproducible example: reading an 836MB text file and inserting it into the database (without FILESTREAM), then reading it back with GetStream and writing the stream to TarWriter.

using System.Data;
using System.Formats.Tar;
using Microsoft.Data.SqlClient;

const string connectionString = "Integrated Security=true;Data Source=localhost;Initial Catalog=MyTestDatabase;" +
                                "User Id=username;Password=xy;Max pool size=200;Min pool size=10;Connection Timeout=30;" +
                                "Encrypt=false";

await using var connection = new SqlConnection(connectionString);
await using var largeFile = File.Open(@"C:\Users\xyz\Desktop\BigFile.txt", FileMode.Open);

await using var createAndInsertCommand = new SqlCommand(
    """
    CREATE TABLE [dbo].[Temp] ([Id] INT NOT NULL, [Data] VARBINARY (MAX) NOT NULL);
    INSERT INTO dbo.Temp VALUES (@Id, @Data);
    """, 
    connection);
createAndInsertCommand.Parameters.Add("@Id", SqlDbType.Int).Value = 1;
createAndInsertCommand.Parameters.Add("@Data", SqlDbType.VarBinary, size: -1).Value = largeFile;
await createAndInsertCommand.Connection.OpenAsync();
await createAndInsertCommand.ExecuteNonQueryAsync();

// Read from database
await using var command = new SqlCommand("SELECT TOP 1 Data FROM dbo.Temp", connection);
await using var reader = await command.ExecuteReaderAsync(CommandBehavior.SequentialAccess);

if (await reader.ReadAsync())
{
    await using var fileData = reader.GetStream(0);
    
    await using var tempFileStream = File.Create(Path.GetTempFileName());
    await using var tarWriter = new TarWriter(tempFileStream, leaveOpen: true, format: TarEntryFormat.Pax);


    tarWriter.WriteEntry(new PaxTarEntry(TarEntryType.RegularFile, Guid.NewGuid().ToString())
    {
        DataStream = fileData
    });
}


Asked Feb 4 at 14:56 by Filip; edited Feb 6 at 9:51. 22 comments:
  • Can you post exactly what you see in DotMemory? – canton7 Commented Feb 4 at 14:57
  • Also, any reason that you can't stream the output out directly to the response body, rather than going via a file on disk? – canton7 Commented Feb 4 at 15:01
  • Can you show the bit of code indicated by // Stream tarFileStream to response body.. Also, have you removed all possible buffering points? stackoverflow/q/78348901/14868997 – Charlieface Commented Feb 4 at 15:19
  • 1 @canton7 I attached two images from dotMemory, but it looks like the images won't show. Streaming directly might be an option, but it will need some more refactoring of how we write back to the client stream. – Filip Commented Feb 4 at 15:23
  • @Charlieface No I cannot, unfortunately. But I do not think that is of importance here, since the memory gets filled before I return data to the client. I have a breakpoint on: tarFileStream.Position = 0; – Filip Commented Feb 4 at 15:23

1 Answer


The Stream object returned by SQL Server isn't seekable, which makes sense. The TarWriter code treats unseekable streams differently though, depending on the TAR format.
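
A quick way to confirm the seekability difference (my sketch, reusing fileData and tempFileStream from the EDIT 2 example above):

Console.WriteLine(fileData.CanSeek);       // False: SqlClient's SequentialAccess stream is forward-only
Console.WriteLine(tempFileStream.CanSeek); // True: a FileStream is seekable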

Looking at the TarHeader.Write.cs code for PAX, I see that all the data is buffered in a MemoryStream if the input stream isn't seekable; presumably the writer needs the entry's full length up front for the headers, and with an unseekable stream the only way to get it is to buffer everything. The Ustar format doesn't buffer.

Try changing the format to Ustar and using a UstarTarEntry:

await using var tarWriter = new TarWriter(tempFileStream,
    leaveOpen: true,
    format: TarEntryFormat.Ustar);

tarWriter.WriteEntry(new UstarTarEntry(TarEntryType.RegularFile, Guid.NewGuid().ToString())
{
    // The unseekable SqlClient stream is copied straight through to the archive
    DataStream = fileData
});
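
If you need to stay on the PAX format, one possible workaround (my sketch, not from this answer; it reuses fileData and tarWriter from the EDIT 2 example) is to stage each entry in a seekable temporary file first, so the PAX writer can read the length without buffering in memory:

var stagingPath = Path.GetTempFileName();
try
{
    await using (var staging = File.Create(stagingPath))
    {
        await fileData.CopyToAsync(staging); // stream the SQL data to disk
    }

    await using var seekable = File.OpenRead(stagingPath); // CanSeek == true
    await tarWriter.WriteEntryAsync(new PaxTarEntry(TarEntryType.RegularFile, Guid.NewGuid().ToString())
    {
        DataStream = seekable
    });
}
finally
{
    File.Delete(stagingPath);
}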