
javascript - Node.js slicing a very large buffer running out of memory - Stack Overflow


I have a very large base64-encoded string that needs to be read into a byte (Uint8) array, then split into chunks of a specified size, and then each chunk base64-encoded separately. The function below works, but calling .slice or .toString increases heap usage with each call because (I believe) it makes a copy of the buffer. On particularly large base64-encoded strings the application runs out of heap space. How can I split the data into chunks of a specified size and base64-encode them without running out of memory?

const process = function (reallyLargeBase64EncodedString, splitSize) {

    var theBuffer = Buffer.from(reallyLargeBase64EncodedString, 'base64');

    //var tempBuffer = new Buffer(splitSize);
    for (var start = 0; start < theBuffer.length; start += splitSize) {
        //for (var z = 0; z < splitSize; z++) {
        //    tempBuffer.writeUInt8(theBuffer[start + z], z);
        //}
        //var base64EncodedVal = tempBuffer.toString('base64');
        //var base64EncodedVal = theBuffer.buffer.toString('base64', start, start + splitSize);
        var base64EncodedVal = theBuffer.slice(start, start + splitSize).toString('base64');
        //do stuff with the base64 encoded value
    }

};


asked Oct 7, 2016 at 13:16 by Four_0h_Three
  • No, slice does not copy the memory. toString does (see the sketch below these comments). What exactly is the "stuff" you're doing with the strings? – Bergi Commented Oct 7, 2016 at 13:20
  • Right now I'm inserting them into a database. Hmm, it's the toString that's the problem; perhaps changing that row from a string to a blob and just inserting the byte array directly will be better. – Four_0h_Three Commented Oct 7, 2016 at 13:46
  • Is the database insertion asynchronous? You might want to make it sequential so that not all the strings are created in memory at once. – Bergi Commented Oct 7, 2016 at 13:47
  • I was actually doing the insert with that last line, just took the database code out for the example. – Four_0h_Three Commented Oct 7, 2016 at 13:51
  • If you have a "very large buffer" in Node.JS, then you're doing something wrong. – OrangeDog Commented Oct 7, 2016 at 13:56
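As a quick illustration of the point in the comments that Buffer.prototype.slice returns a view over the same memory while toString allocates a new string, here is a minimal sketch (the sizes and byte values are arbitrary):

var big = Buffer.alloc(1024, 0xab);   // one 1 KiB allocation
var view = big.slice(0, 16);          // no copy: a view onto the same bytes

view[0] = 0xff;                       // writes through to the parent buffer
console.log(big[0].toString(16));     // 'ff', proving they share memory

var str = view.toString('base64');    // this *does* allocate a new string
console.log(str);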

1 Answer


I would recommend using node's streaming interface to deal with something that large. If your base64-encoded string is coming from a file or a network request, you can pipe directly from the input into a base64 decode stream such as base64-stream.

In order to chunk the data and re-encode each chunk, you will have to write your own transform stream (a stream that sits between an input and an output). It will look something like this:

// NOTE: the following code has been tested in node 6.
// since it relies on the new Buffer api, it must be run in 5.10+
var Transform = require('stream').Transform;

class ChunkEncode extends Transform {
    constructor(options) {
        super(options);
        this.splitSize = options.splitSize;
        this.buffer = Buffer.alloc(0);
    }

    _transform(chunk, encoding, cb) {
        // chunk is a Buffer; accumulate until we have at least one full split.
        this.buffer = Buffer.concat([this.buffer, chunk]);
        while (this.buffer.length >= this.splitSize) {
            var piece = this.buffer.slice(0, this.splitSize);
            // encode and write back to the stream.
            this.push(piece.toString('base64'));
            // throw in a newline for visibility.
            this.push('\n');
            // chop `splitSize` bytes off the start of our buffer.
            this.buffer = this.buffer.slice(this.splitSize);
        }
        // signal that this chunk has been consumed so the next one can flow.
        cb();
    }

    _flush(cb) {
        // emit whatever is left over once the input ends.
        if (this.buffer.length) {
            this.push(this.buffer.toString('base64'));
            this.push('\n');
        }
        cb();
    }
}

Then you should be able to do something like

var fs     = require('fs');
var base64 = require('base64-stream');

fs.createReadStream('./long-base64-string')
    .pipe(base64.decode())
    .pipe(new ChunkEncode({splitSize : 128}))
    .pipe(process.stdout);

This will log to standard out, but you could just as easily write to a file or a network stream. If you need to manipulate the data further, you can create a write stream, which lets you do something with each chunk of data as it comes in.
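Since the chunks here were ultimately being inserted into a database (per the comments above), a custom write stream also keeps the inserts sequential, so only one encoded chunk is in flight at a time. A minimal sketch, assuming a hypothetical insertChunk(value, callback) database helper (not part of any library here):

var fs       = require('fs');
var base64   = require('base64-stream');
var Writable = require('stream').Writable;

class ChunkInsert extends Writable {
    _write(chunk, encoding, cb) {
        // chunk is one base64-encoded piece (plus the trailing newline).
        // insertChunk is a stand-in for a real database call; passing cb
        // as its completion callback applies backpressure, so the next
        // chunk is not produced until this insert has finished.
        insertChunk(chunk.toString(), cb);
    }
}

fs.createReadStream('./long-base64-string')
    .pipe(base64.decode())
    .pipe(new ChunkEncode({splitSize : 128}))
    .pipe(new ChunkInsert());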
