
javascript - Unable to read accented characters from csv file stream in node - Stack Overflow


To start off, I am currently using npm fast-csv, which is a nice CSV reader/writer that is pretty straightforward and simple. What I'm attempting to do is use it in conjunction with iconv to process accented and other non-ASCII characters, and either convert them to an ASCII equivalent or remove them, depending on the character.
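
For the sake of concreteness, the kind of per-field conversion I mean looks roughly like this, sketched with the npm iconv package (cleanField is a made-up helper name; //TRANSLIT substitutes approximate ASCII equivalents and //IGNORE drops characters that have none):

var Iconv = require('iconv').Iconv;
var toAscii = new Iconv('UTF-8', 'ASCII//TRANSLIT//IGNORE');

// Hypothetical helper: accented characters come back as approximate ASCII
// equivalents; characters with no equivalent are dropped entirely.
function cleanField(value) {
    return toAscii.convert(value).toString();
}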

My current process with fast-csv is to bring in a chunk for processing (it comes in as one row) via a read stream, pause the read stream, process the data, pipe the data to a write stream, and then resume the read stream using a callback. fast-csv currently knows where to separate the chunks (rows) based on the format of the data coming in from the read stream.

The entire process looks like this:

var fs = require('fs');
var csv = require('fast-csv');

var stream = fs.createReadStream(inputFileName);

function csvPull(source) {
    // Write the processed rows back out as CSV, headers included.
    var csvWrite = csv.createWriteStream({ headers: true });
    var writableStream = fs.createWriteStream(outputFileName);
    var csvStream = csv()
        .on("data", function (data) {
            // Pause parsing while this row is processed; resume from the
            // callback so rows are handled strictly one at a time.
            csvStream.pause();
            processRow(data, function () {
                csvStream.resume();
            });
        })
        .on("end", function () {
            console.log('END OF CSV FILE');
        });
    csvWrite.pipe(writableStream);
    source.pipe(csvStream);
}

csvPull(stream);

The problem I am currently running into is that, for some reason, when my JavaScript compiles, it does not inherently recognise non-ASCII characters, so I am resorting to using npm iconv-lite to encode the data stream as it comes in to something usable. However, this presents a bigger issue, as fast-csv will no longer know where to split the chunks (rows) due to the now-encoded data. This is a problem because of the sizes of the CSVs I will be working with; it will not be an option to load the entire CSV into the buffer and then decode it.
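
For the sake of concreteness, a streaming decode along these lines would be the goal, sketched here with iconv-lite's decodeStream ('win1252' is only a stand-in for whatever the file's real encoding turns out to be):

var fs = require('fs');
var csv = require('fast-csv');
var iconv = require('iconv-lite');

fs.createReadStream(inputFileName)
    .pipe(iconv.decodeStream('win1252')) // decode bytes to strings on the fly
    .pipe(csv())
    .on("data", function (row) {
        // row arrives as properly decoded strings, so fast-csv can still
        // find the row boundaries before any character cleanup happens
    })
    .on("end", function () {
        console.log('END OF CSV FILE');
    });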

Are there any suggestions on how I might get around this without writing my own CSV parser into my code?


asked Oct 21, 2015 at 15:09 by JSArrakis

2 Answers


Try reading your file with 'binary' for the encoding option. I had to read a few CSVs with some accented characters, and it worked fine with that.

var stream = fs.createReadStream(inputFileName, { encoding: 'binary' });
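
A note on why this can work: in Node, 'binary' is an alias for 'latin1', so every input byte maps to exactly one character. fast-csv can therefore still find its delimiters, and the original bytes remain recoverable field by field. A minimal sketch of that recovery with iconv-lite, where 'win1252' is an assumed source encoding:

var iconv = require('iconv-lite');

// Rebuild the original bytes from the latin1 ('binary') string, then decode
// them with the file's actual encoding ('win1252' here is an assumption).
function decodeField(field) {
    return iconv.decode(Buffer.from(field, 'binary'), 'win1252');
}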

Unless I misunderstand, you should be able to fix this by setting the encoding on the stream to utf-8 (docs).

For the first line:

var stream = fs.createReadStream(inputFileName, {encoding: 'utf8'});

and if needed:

writableStream = fs.createWriteStream(outputFileName, {defaultEncoding: 'utf8'});
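
Note that this assumes the file really is stored as UTF-8; if it was exported as, say, win1252, reading it with the utf8 encoding will turn the accented bytes into replacement characters rather than fix them.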