I have a CSV file that can contain around a million records. How can I remove the columns whose names start with _ and generate a resulting CSV?
For the sake of simplicity, consider the CSV below:
Sr.No Col1 Col2 _Col3 Col4 _Col5
1 txt png 676766 win 8787
2 jpg pdf 565657 lin 8787
3 pdf jpg 786786 lin 9898
I want the output to be:
Sr.No Col1 Col2 Col4
1 txt png win
2 jpg pdf lin
3 pdf jpg lin
Do I need to read the entire file to achieve this, or is there a better approach?
const csv = require('csv-parser');
const fs = require('fs');
fs.createReadStream('data.csv')
  .pipe(csv())
  .on('data', (row) => {
    // generate a new csv with removing specific column
  })
  .on('end', () => {
    console.log('CSV file successfully processed');
  });
Any help on how I can achieve this would be appreciated.
Thanks.
asked Jul 1, 2020 at 10:40 by opensource-developer (edited Jul 1, 2020 at 11:10)
4 Answers
To anyone who stumbles on this post:
I was able to transform the CSVs with the code below, using the fs and csv modules.
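const fs = require('fs');
const csv = require('csv');

// m.path is the input file and transformedPath the output file (both defined elsewhere)
fs.createReadStream(m.path)
  // Parse tab-delimited input; columns: true turns each row into an object keyed by header
  .pipe(csv.parse({delimiter: '\t', columns: true}))
  .pipe(csv.transform((input) => {
    // Drop the unwanted column from each row
    delete input['_Col3'];
    return input;
  }))
  // Serialize the rows back to CSV, including the header line
  .pipe(csv.stringify({header: true}))
  .pipe(fs.createWriteStream(transformedPath))
  .on('finish', () => {
    console.log('finish....');
  }).on('error', (err) => {
    console.log('error.....', err);
  });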
Source: https://gist.github.com/donmccurdy/6cbcd8cee74301f92b4400b376efda1d
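The code above deletes one column by name, while the question asks for every column starting with _. A minimal sketch of a row filter that covers that case is below. The helper name dropUnderscoreKeys is hypothetical, and it assumes each row is a plain object keyed by header name, as produced when parsing with columns: true.

```javascript
// Hypothetical helper (not part of the csv package): returns a copy of a
// parsed row without the keys whose names start with an underscore.
function dropUnderscoreKeys(row) {
  return Object.fromEntries(
    Object.entries(row).filter(([key]) => !key.startsWith('_'))
  );
}

// Sample row taken from the question's data.
const row = {
  'Sr.No': '1', Col1: 'txt', Col2: 'png', _Col3: '676766', Col4: 'win', _Col5: '8787',
};
console.log(dropUnderscoreKeys(row));
```

Assuming the same pipeline as above, this function could probably be passed as the callback to csv.transform in place of the hard-coded delete.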
Try this with the csv library:
const csv = require('csv');
const fs = require('fs');
const csvString=`col1,col2
value1,value2`
csv.parse(csvString, {columns: true})
.pipe(csv.transform(({col1,col2}) => ({col1}))) // remove col2
.pipe(csv.stringify({header:true}))
.pipe(fs.createWriteStream('./file.csv'))
You can handle this with two npm packages: use https://www.npmjs.com/package/csvtojson to convert your CSV to JSON,
then use https://www.npmjs.com/package/json2csv to convert it back to CSV.
With the second library, if you know exactly which fields you want, you can pass them as options to select only those fields.
const { Parser } = require('json2csv');
const fields = ['field1', 'field2', 'field3'];
const opts = { fields };
try {
const parser = new Parser(opts);
const csv = parser.parse(myData);
console.log(csv);
} catch (err) {
console.error(err);
}
Or you can modify the JSON object manually to drop those columns
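If the wanted fields are not known in advance, the list can be derived from the data instead. The sketch below assumes myData is an array of row objects keyed by header name (the sample rows are copied from the question) and that all rows share the keys of the first row.

```javascript
// Rows as a CSV-to-JSON step might produce them (sample data from the question).
const myData = [
  { 'Sr.No': '1', Col1: 'txt', Col2: 'png', _Col3: '676766', Col4: 'win', _Col5: '8787' },
  { 'Sr.No': '2', Col1: 'jpg', Col2: 'pdf', _Col3: '565657', Col4: 'lin', _Col5: '8787' },
];

// Keep every field whose name does not start with an underscore.
const fields = Object.keys(myData[0]).filter((name) => !name.startsWith('_'));
console.log(fields); // [ 'Sr.No', 'Col1', 'Col2', 'Col4' ]
```

The resulting fields array could then be used as the fields option shown above.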
With this function I accomplished the column removal from a CSV:
function removeCol(csv, col) {
  const lines = csv.split("\n");
  const headers = lines[0].split(",");
  // Locate the column by its trimmed header name
  const index = headers.findIndex((h) => h.trim() === col);
  // Column not found: return the input unchanged instead of removing the last column
  if (index === -1) return csv;
  // Note: a plain split(",") does not handle quoted fields that contain commas
  return lines
    .filter((line) => line !== "") // skip blank lines, e.g. after a trailing newline
    .map((line) => {
      const fields = line.split(",");
      fields.splice(index, 1);
      return fields.join(",");
    })
    .join("\n") + "\n";
}
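The function above works on a whole CSV string held in memory. For a file with around a million records, a line-by-line variant using only Node's built-in fs and readline modules may be preferable. The sketch below is one way to do that. The names createUnderscoreColumnFilter and stripUnderscoreColumns are hypothetical, and it assumes a simple delimiter-separated file with no quoted fields containing the delimiter or newlines.

```javascript
const fs = require('fs');
const readline = require('readline');

// Hypothetical helper: builds a per-line filter. The first line it sees is
// treated as the header and decides which column positions are kept.
function createUnderscoreColumnFilter(delimiter = ',') {
  let keep = null;
  return (line) => {
    const fields = line.split(delimiter);
    if (keep === null) {
      keep = fields.map((name) => !name.trim().startsWith('_'));
    }
    return fields.filter((_, i) => keep[i]).join(delimiter);
  };
}

// Streams inputPath to outputPath one line at a time, so the whole file
// is never held in memory.
async function stripUnderscoreColumns(inputPath, outputPath, delimiter = ',') {
  const filterLine = createUnderscoreColumnFilter(delimiter);
  const out = fs.createWriteStream(outputPath);
  const rl = readline.createInterface({
    input: fs.createReadStream(inputPath),
    crlfDelay: Infinity,
  });
  for await (const line of rl) {
    if (line === '') continue; // skip blank lines
    if (!out.write(filterLine(line) + '\n')) {
      // Wait for the write buffer to drain before reading further
      await new Promise((resolve) => out.once('drain', resolve));
    }
  }
  await new Promise((resolve, reject) => {
    out.on('error', reject);
    out.end(resolve);
  });
}
```

A call such as stripUnderscoreColumns('data.csv', 'out.csv') would then write the filtered file. For input with quoted fields, a real CSV parser such as the ones in the other answers is the safer choice.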