
javascript - How to read very large (> 1GB) tar.gz files in Node.js? - Stack Overflow


I have never had to do this before, so this is probably something really basic, but I thought I'd ask anyway.

What is the right way to read a very large file in Node.js? Say the file is just too large to read all at once. Also say the file could come in as a .zip or .tar.gz.

First question: is it best to decompress the file first and save it to disk (I'm using Stuffit on the Mac to do this now), and then work with that file? Or can you read the IO stream straight from the compressed .zip or .tar.gz version? I guess you'd need to know the format of the content in the compressed file, so you probably have to decompress it (I just found out this .tar.gz file actually contains a .dat file)...
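You don't have to decompress to disk first: Node's built-in zlib module can decompress while the file is being read. The sketch below assumes a gzip-compressed file (zlib alone does not read .zip archives, because a zip is a container of separately compressed entries); `countGunzippedBytes` is a hypothetical name used only for illustration.

```javascript
const fs = require('fs');
const zlib = require('zlib');

// Stream-decompress a .gz file and count the decompressed bytes, without
// writing a decompressed copy to disk or holding the whole file in memory.
async function countGunzippedBytes(path) {
  const source = fs.createReadStream(path);
  const gunzip = zlib.createGunzip();
  // .pipe() does not forward errors, so pass read errors on to gunzip
  source.on('error', (err) => gunzip.destroy(err));
  source.pipe(gunzip);

  let total = 0;
  for await (const chunk of gunzip) {
    total += chunk.length; // each chunk is a Buffer of decompressed data
  }
  return total;
}
```

Usage: `countGunzippedBytes('large.xml.gz').then((n) => console.log(n + ' bytes'));`. Swapping the counting loop for your own per-chunk processing gives you the "read straight from the compressed file" option.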

Then the main issue is: how do I read this large file in Node.js? Say it's a 1GB XML file; where should I look to get started parsing it? (Not how to parse XML as such, but: if you're reading the large file line by line, how do you parse something like XML, which needs to know the context of previous lines?)
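The usual answer is that you never go back to earlier lines: a streaming parser (sax, for instance) keeps its state between chunks, and each new chunk continues from that state. The simplest instance of this is splitting text into lines when a chunk can end in the middle of a line. A minimal sketch; `LineSplitter` is a hypothetical helper, not a Node API (the built-in readline module does the same job for real use).

```javascript
// Carries an incomplete trailing line from one chunk to the next, so lines
// that straddle a chunk boundary are reassembled before being returned.
class LineSplitter {
  constructor() {
    this.tail = ''; // state kept between chunks: text after the last '\n'
  }

  // Accepts a string chunk; returns the lines completed by this chunk.
  push(chunk) {
    const parts = (this.tail + chunk).split('\n');
    this.tail = parts.pop(); // the last piece may be an unfinished line
    return parts;
  }

  // Call once at end of input to flush a final line with no trailing '\n'.
  end() {
    const last = this.tail;
    this.tail = '';
    return last === '' ? [] : [last];
  }
}
```

For real input, feed it string chunks from `fs.createReadStream(file, { encoding: 'utf8' })`, whose decoder keeps multi-byte UTF-8 characters intact across chunk boundaries. A '\r' from Windows line endings is not stripped by this sketch.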

I have seen fs.createReadStream, but I'm afraid to mess around with it... I don't want to explode my computer. Just looking for some pointers in the right direction.
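fs.createReadStream is the safe option here: it reads the file in fixed-size chunks (64 KiB by default for file streams, adjustable with the `highWaterMark` option), so memory use stays bounded however large the file is. A small sketch that reports what it read; `scanFile` is a hypothetical name.

```javascript
const fs = require('fs');

// Reads a file chunk by chunk and reports the total size, the number of
// chunks and the largest single chunk, showing that only one chunk at a
// time needs to be held in memory.
async function scanFile(path, chunkSize = 64 * 1024) {
  let total = 0;
  let chunks = 0;
  let largestChunk = 0;
  const stream = fs.createReadStream(path, { highWaterMark: chunkSize });
  for await (const chunk of stream) {
    chunks += 1;
    total += chunk.length;
    largestChunk = Math.max(largestChunk, chunk.length);
  }
  return { total, chunks, largestChunk };
}
```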

Asked Jun 18, 2012 at 2:20 by Lance Pollard (edited Jun 18, 2012 at 2:29)
  • What do you want to do with it? – Jeremy Rodi, Jun 18, 2012 at 2:37
  • How about: assume it's a very large CSV and I just want to create a database record for each line. – Lance Pollard, Jun 18, 2012 at 2:44
  • You have two issues: 1. Is there a streaming zip file reader for Node? 2. Is there a streaming XML reader (that can use the first stream as input)? Not sure what options are out there, but that might help your search... – Joe, Jun 18, 2012 at 2:47
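For the CSV case from the comments, Node's built-in readline module can turn the (optionally gunzipped) stream into lines, and records can be written in batches. A minimal sketch: `importCsv` and the `insertBatch` callback are hypothetical stand-ins for your own database code, a `.gz` extension is assumed to mean gzip, and the naive `split(',')` assumes no quoted fields containing commas (use a real CSV parser otherwise).

```javascript
const fs = require('fs');
const zlib = require('zlib');
const readline = require('readline');

// Reads a (possibly gzipped) CSV file line by line and hands records to
// insertBatch in groups of batchSize. insertBatch is a placeholder for a
// database call and should return a Promise. Returns the record count.
async function importCsv(path, insertBatch, batchSize = 1000) {
  let input = fs.createReadStream(path);
  if (path.endsWith('.gz')) {
    const gunzip = zlib.createGunzip();
    input.on('error', (err) => gunzip.destroy(err));
    input = input.pipe(gunzip);
  }
  const lines = readline.createInterface({ input, crlfDelay: Infinity });

  let header = null;
  let batch = [];
  let count = 0;
  for await (const line of lines) {
    if (line === '') continue;             // skip blank lines
    const fields = line.split(',');        // naive: no quoted commas
    if (header === null) { header = fields; continue; }
    batch.push(Object.fromEntries(header.map((h, i) => [h, fields[i]])));
    if (batch.length >= batchSize) {
      await insertBatch(batch);            // one batch in flight at a time
      count += batch.length;
      batch = [];
    }
  }
  if (batch.length > 0) {
    await insertBatch(batch);
    count += batch.length;
  }
  return count;
}
```

Batching trades a little memory for far fewer database round trips than one insert per line.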

2 Answers


There is a built-in zlib module for stream decompression, and the sax module (from npm) for streaming XML parsing:

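var fs = require('fs');
var zlib = require('zlib');
var sax = require('sax'); // third-party: npm install sax

var saxStream = sax.createStream();
// add your xml handlers here, for example:
saxStream.on('opentag', function (node) { /* node.name, node.attributes */ });
saxStream.on('error', function (err) { console.error(err); });

// read -> decompress -> parse, one chunk at a time
fs.createReadStream('large.xml.gz').pipe(zlib.createUnzip()).pipe(saxStream);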
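Note that zlib only removes the gzip layer. For a .tar.gz like the one in the question, what comes out is still a tar archive (512-byte headers, each followed by a file's bytes), so the member you want has to be picked out before it reaches sax. npm packages such as tar or tar-stream handle this properly; the dependency-free sketch below only illustrates the idea. `walkTar`, `parseHeader` and `field` are hypothetical names, the layout assumed is POSIX ustar, and checksums, GNU long names, PAX extended headers and base-256 sizes (files over 8 GiB) are not handled.

```javascript
const fs = require('fs');
const zlib = require('zlib');

// Reads a NUL-terminated field from a 512-byte tar header block.
function field(block, start, length) {
  const s = block.toString('utf8', start, start + length);
  const nul = s.indexOf('\0');
  return nul === -1 ? s : s.slice(0, nul);
}

// Extracts name, size (stored as octal text) and type flag from a header.
function parseHeader(block) {
  let name = field(block, 0, 100);
  const prefix = field(block, 345, 155); // ustar path prefix
  if (field(block, 257, 6) === 'ustar' && prefix) name = `${prefix}/${name}`;
  const size = parseInt(field(block, 124, 12).trim() || '0', 8);
  return { name, size, type: field(block, 156, 1) || '0' };
}

// Walks the entries of an uncompressed tar stream (e.g. the output of
// zlib.createGunzip()). onEntry(header) is called for each header and
// onData(header, chunk) receives that entry's bytes piece by piece.
async function walkTar(source, onEntry, onData) {
  let buf = Buffer.alloc(0);
  let header = null;
  let remaining = 0; // body bytes of the current entry still to deliver
  let padding = 0;   // zero bytes padding the body to a 512-byte boundary
  for await (const chunk of source) {
    buf = buf.length ? Buffer.concat([buf, chunk]) : chunk;
    for (;;) {
      if (remaining > 0 || padding > 0) {
        if (buf.length === 0) break;
        if (remaining > 0) {
          const n = Math.min(remaining, buf.length);
          onData(header, buf.subarray(0, n));
          remaining -= n;
          buf = buf.subarray(n);
        } else {
          const n = Math.min(padding, buf.length);
          padding -= n;
          buf = buf.subarray(n);
        }
        continue;
      }
      if (buf.length < 512) break;            // wait for a full header block
      const block = buf.subarray(0, 512);
      buf = buf.subarray(512);
      if (block.every((b) => b === 0)) return; // zero block: end of archive
      header = parseHeader(block);
      onEntry(header);
      remaining = header.size;
      padding = (512 - (header.size % 512)) % 512;
    }
  }
}
```

Usage: `walkTar(fs.createReadStream('big.tar.gz').pipe(zlib.createGunzip()), onEntry, onData)`, where `onData` could write the chunks of the one entry you care about into a sax stream.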

We can also create a .tar.gz of a directory by spawning the system tar command, something like the following:

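var spawn = require('child_process').spawn;
var pathToArchive = './very_large_folder.tar.gz';
var pathToFolder = './very_large_folder';

// c = create, z = gzip, f = write to the given archive file
var tar = spawn('tar', ['czf', pathToArchive, pathToFolder]);
tar.on('error', function (err) {
        // spawn itself failed, e.g. no tar binary on the PATH
        console.log('failed to start tar: ' + err.message);
});
tar.on('exit', function (code) {
        if (code === 0) {
                console.log('completed successfully');
        } else {
                console.log('tar exited with code ' + code);
        }
});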
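If you need to wait for tar (or any other command) before continuing, the same spawn call can be wrapped in a Promise that also rejects when the command cannot be started at all. A minimal sketch; `run` is a hypothetical helper name.

```javascript
const { spawn } = require('child_process');

// Runs a command and resolves with exit code 0 on success. Rejects if the
// process cannot be started (e.g. binary not on PATH) or exits non-zero.
function run(command, args) {
  return new Promise((resolve, reject) => {
    const child = spawn(command, args, { stdio: 'inherit' });
    child.on('error', reject); // spawn failure, e.g. ENOENT
    child.on('close', (code, signal) => {
      if (code === 0) resolve(code);
      else reject(new Error(`${command} exited with ${signal || `code ${code}`}`));
    });
  });
}
```

Usage: `await run('tar', ['czf', './very_large_folder.tar.gz', './very_large_folder']);`. If both 'error' and 'close' fire, the second settle call is simply ignored by the Promise.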

This worked nicely :)
