I'm extremely new to Node and JS. I have a large TSV file (1.5 GB) that I need to read in and parse into either an array or a JSON object. How would I go about doing that? I don't get an error when I try the code below, but the callback is never entered.
var d3 = require("d3-dsv");
d3.tsvParse("amazon_reviews_us_Mobile_Apps_v1_00.tsv", function(error, data)
{
var sum = 0;
data.forEach(function(d)
{
d.helpful_votes += d.helpful_votes;
sum += d.helpful_votes;
});
console.log("Total Helpful Votes: " + sum);
});
Any help would be appreciated.
Asked Oct 7, 2020 at 3:38 by Roux
Two problems: it should be d3.tsv, not d3.tsvParse, which works only with strings. Also, D3 v5 and above uses the Fetch API, meaning it should be d3.tsv(url).then(etc...). – Gerardo Furtado Commented Oct 7, 2020 at 3:55
@GerardoFurtado I have tried both of these. d3.tsv gives me "function does not exist" and d3.tsv(url).then gives me "fetch is undefined", even when I installed the d3-fetch module. – Roux Commented Oct 7, 2020 at 22:39
What's your D3 version? – Gerardo Furtado Commented Oct 7, 2020 at 22:52
@GerardoFurtado 2.0.0. I installed it by using npm install d3-dsv – Roux Commented Oct 7, 2020 at 22:56
Are you sure? This is 8 years old! – Gerardo Furtado Commented Oct 7, 2020 at 23:14
2 Answers
You need to find a module that provides a streaming parser for TSV files, meaning one that doesn't load the whole file into memory. If your per-line processing is synchronous, you can use Node's built-in readline module:
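const {createInterface} = require("readline");
const {createReadStream} = require("fs");
createInterface({input: createReadStream("amazon_reviews_us_Mobile_Apps_v1_00.tsv")})
    // called once per line; split it into its tab-separated fields
    .on('line', (data) => doSomethingWith(data.split("\t")))
    // readline emits 'close' (not 'end') once the input has been fully read
    .on('close', () => doSomethingWhenDone());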
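To make the readline approach concrete for the vote count in the question, here is a minimal sketch. The helper names `createColumnSummer` and `sumColumnInFile` are made up for illustration, and it assumes the first line of the file is a header row and that fields contain no quoted tabs or embedded newlines (a plain `split("\t")` does not handle those).

```javascript
const { createInterface } = require("readline");
const { createReadStream } = require("fs");

// Hypothetical helper: builds a line handler that treats the first line as
// the header and adds up one numeric column for every following line.
function createColumnSummer(columnName) {
  let columnIndex = -1;
  let headerSeen = false;
  const state = { total: 0, rows: 0 };

  state.onLine = (line) => {
    if (line === "") return; // skip blank lines
    const fields = line.split("\t");
    if (!headerSeen) {
      headerSeen = true;
      columnIndex = fields.indexOf(columnName);
      if (columnIndex === -1) throw new Error("No column named " + columnName);
      return;
    }
    const value = Number(fields[columnIndex]);
    if (!Number.isNaN(value)) state.total += value; // ignore non-numeric cells
    state.rows += 1;
  };
  return state;
}

// Hypothetical helper: streams the file line by line, so the whole file is
// never held in memory at once. Resolves with the column total.
function sumColumnInFile(filePath, columnName) {
  return new Promise((resolve, reject) => {
    const summer = createColumnSummer(columnName);
    const input = createReadStream(filePath, { encoding: "utf8" });
    input.on("error", reject); // e.g. file not found
    const rl = createInterface({ input, crlfDelay: Infinity });
    rl.on("line", (line) => {
      try {
        summer.onLine(line);
      } catch (err) {
        rl.close();
        reject(err);
      }
    });
    rl.on("close", () => resolve(summer.total));
  });
}

// Usage (assumes the file from the question is in the working directory):
// sumColumnInFile("amazon_reviews_us_Mobile_Apps_v1_00.tsv", "helpful_votes")
//   .then((total) => console.log("Total Helpful Votes: " + total));
```

Keeping the per-line logic in its own function means it can be checked with a few hand-written lines before pointing it at the 1.5 GB file.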
You wrote that you want to parse the file and turn it into an array or object of some sort. You'll still need to keep an eye on memory usage, but you could use my scramjet library, which lets you transform the data any way you like:
const {StringStream} = require("scramjet");
const {createReadStream, createWriteStream} = require("fs");
// read the file
StringStream.from(createReadStream("amazon_reviews_us_Mobile_Apps_v1_00.tsv"))
    // parse as CSV, using a tab as the delimiter
    .CSVParse({delimiter: "\t"})
    // whatever you return here replaces the entry;
    // this can be asynchronous too, so you can do requests...
    .map((entry) => doSomething(entry))
    .toJSONArray()
    .pipe(createWriteStream("somefile.json"));
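If adding a dependency is not an option, the same read, transform, write idea can be roughly sketched with Node's built-in modules only. This is not scramjet's API: `tsvLineToObject` and `tsvToJsonLines` are hypothetical names, the output is one JSON object per line (JSON Lines) rather than a single JSON array, the first line is assumed to be the header, and quoted fields and error handling are left out.

```javascript
const { createInterface } = require("readline");
const { createReadStream, createWriteStream } = require("fs");
const { once } = require("events");

// Hypothetical helper: turns one tab-separated line into an object keyed by
// the header names. Missing trailing fields become null.
function tsvLineToObject(header, line) {
  const fields = line.split("\t");
  const row = {};
  header.forEach((name, i) => {
    row[name] = fields[i] === undefined ? null : fields[i];
  });
  return row;
}

// Hypothetical helper: reads the TSV line by line and writes one JSON object
// per output line. Resolves with the number of rows written.
async function tsvToJsonLines(inputPath, outputPath, transform = (row) => row) {
  const rl = createInterface({
    input: createReadStream(inputPath, { encoding: "utf8" }),
    crlfDelay: Infinity,
  });
  const out = createWriteStream(outputPath);
  let header = null;
  let count = 0;

  for await (const line of rl) {
    if (line === "") continue; // skip blank lines
    if (header === null) {
      header = line.split("\t"); // first non-blank line is the header
      continue;
    }
    const row = transform(tsvLineToObject(header, line));
    // Respect backpressure: wait for 'drain' when the write buffer is full.
    if (!out.write(JSON.stringify(row) + "\n")) await once(out, "drain");
    count += 1;
  }
  out.end();
  await once(out, "finish");
  return count;
}

// Usage (file names are the ones used above):
// tsvToJsonLines("amazon_reviews_us_Mobile_Apps_v1_00.tsv", "somefile.json")
//   .then((n) => console.log("Wrote " + n + " rows"));
```

Writing line-delimited JSON keeps memory use flat on both the input and output side, at the cost of the result not being a single JSON document.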
Let me know what you are trying to achieve besides counting, and I'll edit the answer.
BTW, for just counting votes, the solution by @hugo-elhaj-lahsen is also good; I'm not sure why it was downvoted.
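With only d3-dsv installed, use d3.tsvParse, which takes a string of TSV content rather than a file name and returns the parsed rows synchronously, so read the file first and drop the .then. (The promise-based d3.tsv lives in d3-fetch and relies on fetch.) Since your file is very large, one optimisation is to skip the separate forEach over the parsed rows and do the summing in the row function that runs at parsing time. Note that this still loads the whole file into a single string, which a 1.5 GB file may be too large for in Node:
var d3 = require("d3-dsv");
var fs = require("fs");
var sum = 0;
// tsvParse expects the file contents as a string, not a path
var text = fs.readFileSync("amazon_reviews_us_Mobile_Apps_v1_00.tsv", "utf8");
d3.tsvParse(text, d => {
    sum += +d.helpful_votes; // fields are parsed as strings, so coerce to a number
    return d; // Since this is the parser, need to return the parsed object at the end
});
console.log("Total helpful votes", sum);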