I have a .txt file that is space-delimited and it contains dupes. I want to remove the dupes, but I'm not finding it an easy task.
The file contains: orange orange apple apple pear
At first, I was getting an error because of the .txt extension. I updated my main file to contain:
const fs = require('fs');
require.extensions['.txt'] = function (module, filename) {
  module.exports = fs.readFileSync(filename, 'utf8');
};
That helped with the errors and I was able to create a const after that:
const fruitList = require('../support/fruitList.txt');
However, I am still unable to remove the dupes. I tried neek and that was not working either.
4 Answers
You can use a Set to remove the duplicates from your array.
let fruitList = ["orange", "orange", "apple", "apple", "pear"];
let fruitSet = new Set(fruitList); // Set {"orange", "apple", "pear"}
// convert back to an array
const newArray = [...fruitSet]; // ["orange", "apple", "pear"]
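Since the .txt require hook returns the file contents as a single string, you need to split it into an array first. A minimal sketch of tying the two steps together, assuming the space-delimited format and the ../support/fruitList.txt path from the question:
const fruitList = require('../support/fruitList.txt');
// split the space-delimited string into an array of words
const fruits = fruitList.trim().split(/\s+/);
// de-duplicate via a Set and spread back into an array
const uniqueFruits = [...new Set(fruits)];
console.log(uniqueFruits); // ["orange", "apple", "pear"]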
An important thing is to catch any errors thrown by readFileSync so you can find out why your file isn't being read. Depending on how your data is formatted, you'll usually want to handle all the delimiters: tabs, spaces and newlines. The code below uses a regex in split to do that and puts all your values in an array. The following filter then uses indexOf to throw out duplicates. Try this:
const fs = require('fs')
try {
  let data = fs.readFileSync('test.txt', 'utf8')
  // split data by tabs, newlines and spaces
  data = data.split(/[\n\t ]+/)
  // keep only the first occurrence of each item, removing duplicates
  const result = data.filter((item, pos) => data.indexOf(item) === pos)
  console.log(result)
} catch (e) {
  console.log('Error:', e.stack)
}
Spreading a Set, as shown in Juan's answer, is a considerably faster method than filter for removing duplicates:
let data = 'orange orange apple apple pear orange orange apple apple pear orange orange apple apple pear orange orange apple apple pear orange orange apple apple pear orange orange apple apple pear orange orange apple apple pear'
data = data.split(/[\n\t ]+/)
console.time('method1')
const firstArr = data.filter((item, pos, arr) => arr.indexOf(item) === pos)
console.timeEnd('method1')
console.time('method2')
const secondArr = [...new Set(data)]
console.timeEnd('method2')
console.log('method1', firstArr, 'method2', secondArr)
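The gap grows with input size: filter with indexOf rescans the array for every element, which is roughly O(n²) overall, while a Set is built in a single pass with near-constant-time lookups, roughly O(n).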
You can do it in a single line:
const fruitList = [...new Set(require('../support/fruitList.txt').split(' '))];
See the thorough discussion in this question.
I have just written a function for my gulp configuration to remove duplicated lines. In my case the separator is \n: I use it to split the file's text into an array I can process, then join the unique lines with the same \n and write them back to the very same file. You can use another separator, for example whitespace (\s) or a symbol such as ',' or ';', to turn your text into an array and remove the duplicated items.
import fs from 'fs';

export async function removeDuplicates() {
  const filePath = './src/ads.txt';
  try {
    // read the file, split it into lines, and keep only the unique ones
    const data = fs.readFileSync(filePath, 'utf-8');
    const lines = data.split('\n');
    const uniqueLines = Array.from(new Set(lines));
    const result = uniqueLines.join('\n');
    // write the de-duplicated text back to the same file
    fs.writeFileSync(filePath, result, 'utf-8');
    console.log('Duplicated lines have been successfully removed.');
  } catch (error) {
    console.error('Error while processing the operation: ', error);
  }
}

export { removeDuplicates as rmvreplicas };
After adding the function to your gulpfile, you can start it by executing:
$ gulp rmvreplicas
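For the space-delimited file from the question, a minimal variant of the same idea would split on whitespace and join with a single space instead of '\n'. The ./support/fruitList.txt path here is hypothetical, used only for illustration:
import fs from 'fs';

export function removeDuplicateWords() {
  const filePath = './support/fruitList.txt'; // hypothetical path for illustration
  const data = fs.readFileSync(filePath, 'utf-8');
  // split on any whitespace instead of '\n'
  const words = data.trim().split(/\s+/);
  // de-duplicate and write the words back, space-separated
  fs.writeFileSync(filePath, [...new Set(words)].join(' '), 'utf-8');
}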