
blob - JavaScript FileReader Slice Performance - Stack Overflow


I am trying to access the first few lines of text files using the FileApi in JavaScript.

In order to do so, I slice an arbitrary number of bytes from the beginning of the file and hand the blob over to the FileReader.

For large files this takes very long, even though my current understanding is that only the first few bytes of the file need to be accessed.

  • Is there some implementation in the background that requires the whole file to be accessed before it can be sliced?
  • Does it depend on the browser implementation of the FileApi?

I have currently tested this in both Chrome and Edge (Chromium).

Analysis in Chrome using the performance dev tools shows a lot of idle time before reader.onloadend fires and no increase in RAM usage. This might be, however, because the FileApi is implemented in the browser itself and does not show up in the JavaScript performance statistics.

My implementation of the FileReader looks something like this:

const reader = new FileReader();

reader.onloadend = (evt) => {
  if (evt.target.readyState == FileReader.DONE) {
    console.log(evt.target.result.toString());
  }
};

// Slice first 10240 bytes of the file
var blob = files.item(0).slice(0, 1024 * 10);

// Start reading the sliced blob
reader.readAsBinaryString(blob);

This works fine, but as described it performs quite underwhelmingly for large files. I tried it with 10 kB, 100 MB and 6 GB files. The time until the first 10 kB are logged seems to correlate directly with the file size.

Any suggestions on how to improve performance for reading the beginning of a file?
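
For reference, the timing above can be reproduced with a minimal harness along these lines (a sketch; files is the FileList from an input element as above):

const t0 = performance.now();
const reader = new FileReader();
reader.onloadend = () => {
  // Measures from read start until the 10 kB slice is available
  console.log(`first 10 kB available after ${performance.now() - t0} ms`);
};
reader.readAsArrayBuffer(files.item(0).slice(0, 1024 * 10));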


Edit: Using Response and DOM streams as suggested by @BenjaminGruenbaum sadly does not improve the read performance.

var dest = new WritableStream({
    write(str) {
        console.log(str);
    },
});
var blob = files.item(0).slice(0, 1024 * 10);

(blob.stream ? blob.stream() : new Response(blob).body)
  // Decode the binary-encoded response to string
  .pipeThrough(new TextDecoderStream())
  .pipeTo(dest)
  .then(() => {
      console.log('done');
  });


Asked Feb 17, 2021 at 15:19 by kacase. Edited Dec 21, 2021 at 22:07 by Mark Schultheiss.
  • Hey, does using a Response and DOM streams help? I am not sure why readAsBinaryString is slow here, since using .slice on the blob is supposed to only read the part you want. However, what you are describing indicates that it is indeed waiting for the whole file. – Benjamin Gruenbaum, Feb 17, 2021 at 15:29
  • @BenjaminGruenbaum Reading the file using Response and DOM streams works, but sadly does not improve the read performance for large files. – kacase, Feb 17, 2021 at 15:47
  • @BenjaminGruenbaum I added the DOM stream implementation to the question. – kacase, Feb 17, 2021 at 15:53
  • So the FileReader has nothing to do with it? Why not make that clear in the question? To me this really just sounds like your OS taking all this time to touch the file and produce the metadata; nothing slice() can change, I'm afraid. As to why the time your OS takes is relative to the file size, I have no clue. It might be worth testing in other environments, with another hard drive, another file system, etc. – Kaiido, Feb 24, 2021 at 14:00
  • And reading the beginning of a file while the file is loading, rather than after. – 小聪聪到此一游, Feb 26, 2021 at 3:27

6 Answers

Just for kicks, here it is with a worker thread and the File System Access API.

No idea if either of those things helps; I have no 6 GB files. This will at least get the reading off the main thread, so it does help performance in some sense.

Object.assign(
    new Worker(
        URL.createObjectURL(
            new Blob(
                [
                    `self.onmessage = async (e) =>` +
                    `    void postMessage(` +
                    `        (new FileReaderSync())` +
                    `            .readAsText(` +
                    `                (await e.data.getFile())` +
                    `                    .slice(0,1024*10)` +
                    `            )` +
                    `    );`
                ],
                { type: 'application/javascript' }
            )
        )
    ),
    { onmessage: (e) => void console.log(e.data) }
).postMessage(
    (await window.showOpenFilePicker(
        { mode: 'read', startIn: 'documents' }
    )).pop()
);

edit:

I forgot, but you need Chromium for this to run, sorry (tested on Edge). Also, this won't run in a JSFiddle because of web worker security restrictions. You can copy-paste it into the console on google.com, though; for some reason the headers there don't prevent this from running. If this actually does help, please put the worker in its own file, as sketched below (and reformat my artistic negative-space triangle out of existence).
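
A minimal sketch of the same code split into its own worker file (the file name reader-worker.js is hypothetical; the FileReaderSync and showOpenFilePicker calls are the same as above):

// reader-worker.js
self.onmessage = async (e) => {
  // e.data is the FileSystemFileHandle posted from the main thread
  const file = await e.data.getFile();
  // FileReaderSync is only available inside workers; read just the first 10 KiB
  postMessage(new FileReaderSync().readAsText(file.slice(0, 1024 * 10)));
};

// main thread (top-level await: run from a module script or the devtools console)
const worker = new Worker('reader-worker.js');
worker.onmessage = (e) => void console.log(e.data);
const [handle] = await window.showOpenFilePicker({ startIn: 'documents' });
worker.postMessage(handle);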

Stream the uploaded File, rather than reading the entire content.

The following code snippet prints the first 3 lines of the uploaded text file, extracting them in a streaming manner.

The performance gain comes from the fact that it only reads and processes the needed portion of the text file. After the first lines have been received, the stream is closed.

async function lineReader(blob, lineCallback) {
    const stream = blob.stream();
    // Stream the file content instead of reading the entire file
    const reader = stream.pipeThrough(new TextDecoderStream()).getReader();
    try {
        let buffer = ""; // Buffer to hold incomplete lines

        do {
            const {done, value} = await reader.read();
            if (done) break; // Exit when the file has been fully read

            // Append the already-decoded chunk to the buffer
            buffer += value;

            // Process lines in the buffer
            let lines = buffer.split(/\r\n|\n/);
            buffer = lines.pop(); // Save the last line for the next chunk

            for (const line of lines) {
                if (!lineCallback(line)) return;
            }
        } while (true);

        // Process any remaining text in the buffer
        if (buffer) {
            lineCallback(buffer);
        }
    } finally {
        reader.releaseLock();
    }
}

function printFirstLines(file, nrOfLines) {
    const output = document.getElementById("output");
    let lineCount = 0;
    return lineReader(file, line => {
        output.textContent += `Line #${lineCount}: ${line}\n`;
        ++lineCount;
        return lineCount < nrOfLines;
    });
}

// Event listener for file input
document.getElementById("fileInput").addEventListener("change", (event) => {
        console.log('Start processing...');
        const t0 = Date.now();
        const file = event.target.files[0];
        if (file) {
            if (file.stream) {
                printFirstLines(file, 3).then(() => {
                    const t1 = Date.now();
                    console.log(`Completed in ${t1 - t0} ms.`);
                }, err => {
                    console.error(err.message);
                }); // Print first 3 lines
            } else {
                // You could fall back on the slower new Response(blob).body here
                alert("Your browser lacks Blob.stream() support");
            }
        } else {
            alert("No file selected");
        }
    }
);
<html lang="en">
<head>
    <meta charset="UTF-8">
    <title>Test streaming text reader</title>
</head>
<body>
<h1>Test streaming text reader</h1>
<input type="file" id="fileInput" />
<pre id="output"></pre>
</body>
</html>

I'm not seeing any lag running the code below (the .dmg file I tested is ~220 MB).

Try it yourself.

I think that in your case, to get the slice of the file, the browser still has to read the whole file in order to create the slice from the original one.

Subscribing to the progress event could possibly help to access the file contents faster, and then abort the read process:

reader.addEventListener("progress", (event) => {});
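
A hedged sketch of that idea (note the caveat: per the spec, abort() resets reader.result, so this saves I/O time but does not hand you the partial data):

const reader = new FileReader();
reader.addEventListener("progress", (event) => {
  // event.loaded = bytes read so far; abort once enough has arrived
  console.log(`read ${event.loaded} of ${event.total} bytes`);
  if (event.loaded >= 1024 * 10) {
    reader.abort(); // stop reading the rest of the file
  }
});
reader.addEventListener("abort", () => {
  console.log("aborted early");
});
reader.readAsArrayBuffer(files.item(0));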

According to MDN:

The Blob interface's slice() method creates and returns a new Blob object which contains data from a subset of the blob on which it's called.

We can draw the same conclusion from the Chromium README for the Blob class.

How about this!

function readFirstBytes(file, n) {
  return new Promise((resolve, reject) => {
    const reader = new FileReader();
    reader.onload = () => {
      resolve(reader.result);
    };
    reader.onerror = reject;
    reader.readAsArrayBuffer(file.slice(0, n));
  });
}

// Pass an actual File object, e.g. the one selected in a file input
readFirstBytes(files.item(0), 10).then(buffer => {
  console.log(buffer);
});

What you are actually doing is:

  • waiting for the file to be fully read
  • slicing the first bytes
  • converting them to a stream
  • then reading the stream

FileReader doesn't seem to have a streaming API.

Try using the File's stream directly instead, then code your own stream reader.

  const fileInput = document.getElementById('file');
  fileInput.addEventListener('change', async (event) => {
    console.time('read');
    const fileList = event.target.files;
    // Read only the first 1024 bytes via a BYOB (bring-your-own-buffer) reader
    const reader = fileList[0]
      .stream()
      .getReader({ mode: "byob" });
    const firstChunk = await reader.read(new Uint8Array(1024));
    console.log({ firstChunk });
    await reader.cancel();
    console.timeEnd('read');
  });

That way, you never read the full file and the read time should be constant.
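
If getReader({ mode: "byob" }) isn't supported (BYOB readers on Blob streams are comparatively recent), a default reader gives a similar result, at the cost of receiving whatever chunk size the browser picks. A sketch:

const fileInput = document.getElementById('file');
fileInput.addEventListener('change', async (event) => {
  const reader = event.target.files[0].stream().getReader();
  // The first chunk is whatever size the browser chooses to deliver
  const { value: firstChunk } = await reader.read();
  console.log(firstChunk ? firstChunk.subarray(0, 1024) : firstChunk);
  await reader.cancel(); // stop reading the rest of the file
});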
