I generate a ~200,000-element array of objects (using object literal notation inside map rather than new Constructor()), and I save a JSON.stringify'd version of it to disk, where it takes up 31 MB, including newlines and one space per indentation level (JSON.stringify(arr, null, 1)).
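For concreteness, a minimal sketch of the generate-and-save step (the entry fields and the dummy input below are placeholders, not the real JMdict data):

var fs = require('fs');

// Placeholder input; the real entries come from elsewhere.
var rawEntries = [];
for (var i = 0; i < 200000; i++) {
  rawEntries.push({id: i, text: 'entry ' + i});
}

// Build the array with object literals inside map, not new Constructor().
var arr = rawEntries.map(function (entry) {
  return {id: entry.id, text: entry.text};
});

// Write it out with newlines and one space per indentation level.
fs.writeFileSync('JMdict-all.json', JSON.stringify(arr, null, 1));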
Then, in a new node process, I read the entire file into a UTF-8 string and pass it to JSON.parse:
var fs = require('fs');
var arr1 = JSON.parse(fs.readFileSync('JMdict-all.json', {encoding : 'utf8'}));
Node memory usage is about 1.05 GB according to Mavericks' Activity Monitor! Even typing into a Terminal feels laggier on my ancient 4 GB RAM machine.
But if, in a new node process, I load the file's contents into a string, chop it up at element boundaries, and JSON.parse each element individually, ostensibly getting the same object array:
var fs = require('fs');
var arr2 = fs.readFileSync('JMdict-all.json', {encoding: 'utf8'})
  .trim()
  .slice(1, -3)      // drop the opening '[' and the trailing '}\n]'
  .split('\n },')    // split at the boundary between top-level elements
  .map(function (s) {
    return JSON.parse(s + '}');  // restore the closing brace that split removed, then parse
  });
node is using just ~200 MB of memory, with no noticeable system lag. This pattern persists across many restarts of node: JSON.parse-ing the whole array takes a gig of memory, while parsing it element-wise is much more memory-efficient.
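(The numbers above are from Activity Monitor; memory can also be checked from inside node along these lines, though I have not listed those figures here:)

var fs = require('fs');

// Report resident-set size and V8 heap usage in MB.
function reportMemory(label) {
  var m = process.memoryUsage();
  console.log(label + ': rss ' + Math.round(m.rss / 1048576) + ' MB, heapUsed ' +
              Math.round(m.heapUsed / 1048576) + ' MB');
}

var arr1 = JSON.parse(fs.readFileSync('JMdict-all.json', {encoding: 'utf8'}));
reportMemory('whole-array JSON.parse');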
Why is there such a huge disparity in memory usage? Is this a problem with JSON.parse preventing efficient hidden class generation in V8? How can I get good memory performance without slicing-and-dicing strings? Must I use a streaming JSON parser?