I am using this package: https://www.npmjs.com/package/xlsx
However, I can have some very large Excel files that could contain up to 1 million rows. I tested with 600K rows, which is about a 15 MB Excel file, and my code is already crashing on localhost.
Is there a way to stream it in? I know the documentation says there is no streaming API, but it talks about buffering?
var reader = new FileReader();
reader.onload = evt => {
  const bstr = evt.target.result;
  const wb = XLSX.read(bstr, { type: "binary" });
  const wsname = wb.SheetNames[0];
  const ws = wb.Sheets[wsname];
  const data = XLSX.utils.sheet_to_json(ws, { header: "A", defval: "" });
};
reader.readAsBinaryString(this.file);
- Just go through the opened/closed issues. For example, this one describes the issue XLS/XLSX has: the whole file must be loaded in memory in the case of compressed formats. It seems like a backend job. – bigless Commented Jul 26, 2018 at 23:09
- @bigless - I have been going through the closed issues trying to find some workaround. I agree that it probably is a backend job, but I first need to display at least 1000 records to the user so they can do some mapping, and then I need to somehow send it to an ASP.NET Core API. – chobo2 Commented Jul 26, 2018 at 23:12
- You could show records only for files that do not exceed the limit. Or do you expect the majority of files to be 15 MB and larger? – bigless Commented Jul 26, 2018 at 23:18
- I think most will be a lot smaller than 15 MB (maybe around 6 MB, which I have not tested yet, so those might be an issue as well). I really just need the first row, which holds the header column names; sometimes people don't use the proper names, so if the columns don't match the expected names, the user can choose which columns map to my expected names (see the `sheetRows` sketch after these comments). – chobo2 Commented Jul 26, 2018 at 23:23
- I am not following; I basically took an example I found and started working with it. My ideal outcome would be to take X number of rows out of the spreadsheet as a preview of sorts. Then, once everything is all good, start sending rows in batches to the server (Core API) and saving them to the DB, while supporting as big a file as possible (otherwise I am going to have to make 2 versions of this: one that shows previews, and one where the file has to be perfect and is sent to the server and processed there). – chobo2 Commented Jul 26, 2018 at 23:33
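Worth noting for the header-only preview discussed above: SheetJS has a `sheetRows` parse option that caps how many rows are parsed. A minimal sketch reusing the question's FileReader setup (the cap of 10 is an arbitrary preview size; the whole file is still read into memory, but far fewer cells are materialized):
var reader = new FileReader();
reader.onload = evt => {
  /* sheetRows limits parsing to the first N rows */
  const wb = XLSX.read(evt.target.result, { type: "binary", sheetRows: 10 });
  const ws = wb.Sheets[wb.SheetNames[0]];
  /* element 0 holds the header cells (A1, B1, ...) */
  const headerRow = XLSX.utils.sheet_to_json(ws, { header: "A", defval: "" })[0];
  console.log(headerRow);
};
reader.readAsBinaryString(this.file);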
3 Answers
When I had to read data from a very large Excel file (about 50 MB), I converted it to CSV on the backend with Excel Interop, which takes less time than pulling the data out of the Excel file directly. Then just read the first n lines via a stream reader, which gives you the data required for the preview, and send those to the front-end. This is what I would do.
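The answer describes doing the conversion on the .NET backend with Excel Interop; purely as an illustration of the "first n lines via a stream reader" step, here is a Node sketch (the file name `converted.csv` and the limit of 1000 are placeholders):
const fs = require('fs');
const readline = require('readline');

/* stream just the first `limit` lines of the converted CSV */
function previewCsv(path, limit, cb) {
  const lines = [];
  const rl = readline.createInterface({ input: fs.createReadStream(path) });
  rl.on('line', (line) => {
    if (lines.length < limit) lines.push(line);
    if (lines.length >= limit) rl.close(); /* stop early; no need to read the rest */
  });
  rl.on('close', () => cb(lines));
}

previewCsv('converted.csv', 1000, (rows) => console.log(rows.length, 'preview rows'));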
SheetJS docs recommend using web workers for processing large datasets.
Source: https://docs.sheetjs.com/docs/demos/bigdata/worker
Parsing and writing large spreadsheets takes time. During the process, if the SheetJS library is running in the web browser, the website may freeze.
Workers provide a way to off-load the hard work so that the website does not freeze during processing. The work is still performed locally. No data is sent to a remote server.
Move your code into a web worker:
worker.js
/* load the standalone SheetJS build inside the worker */
importScripts("https://cdn.sheetjs.com/xlsx-latest/package/dist/xlsx.full.min.js");

/* this callback will run once the main context sends a message */
self.addEventListener('message', (e) => {
  var reader = new FileReader();
  reader.onload = evt => {
    const bstr = evt.target.result;
    const wb = XLSX.read(bstr, { type: "binary" });
    const wsname = wb.SheetNames[0];
    const ws = wb.Sheets[wsname];
    const data = XLSX.utils.sheet_to_json(ws, { header: "A", defval: "" });
    /* pass the result data back to the main context */
    postMessage(data);
  };
  /* e.data is the File object posted from the main context */
  reader.readAsBinaryString(e.data);
}, false);
Control the worker from your main file:
main.js
const worker = new Worker("./worker.js");

worker.onmessage = (response) => {
  /* rows received from the worker live on response.data */
  console.log(response.data);
};

/* post the File object to the worker */
worker.postMessage(this.file);
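A hedged refinement: `postMessage` structured-clones its payload, so shipping all 600K parsed rows back to the main thread is itself expensive. If the UI only needs the preview, the worker can slice before posting (1000 is an arbitrary preview size):
/* worker.js — inside reader.onload, post only the preview slice */
postMessage({ preview: data.slice(0, 1000), totalRows: data.length });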
Try using a stream reader:
var fs = require('fs');
var XLSX = require('xlsx');

function process_RS(stream/*:ReadStream*/, cb/*:(wb:Workbook)=>void*/)/*:void*/ {
  var buffers = [];
  /* collect the chunks, then parse the concatenated buffer */
  stream.on('data', function(data) { buffers.push(data); });
  stream.on('end', function() {
    var buffer = Buffer.concat(buffers);
    var workbook = XLSX.read(buffer, { type: "buffer" });
    /* DO SOMETHING WITH workbook IN THE CALLBACK */
    cb(workbook);
  });
}
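A usage sketch, assuming a local path (`large.xlsx` is a placeholder). Note that this still concatenates every chunk into one buffer before parsing, so it spreads the disk read out over time but does not lower peak memory:
process_RS(fs.createReadStream('large.xlsx'), function(workbook) {
  const ws = workbook.Sheets[workbook.SheetNames[0]];
  const rows = XLSX.utils.sheet_to_json(ws, { header: "A", defval: "" });
  console.log(rows.length, 'rows parsed');
});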