I'm trying to get an array of words from a string like this:
"Exclamation! Question? \"Quotes.\" 'Apostrophe'. Wasn't. 'Couldn't'. \"Didn't\"."
The array is supposed to look like this:
[
"exclamation",
"question",
"quotes",
"apostrophe",
"wasn't",
"couldn't",
"didn't"
]
Currently I'm using this expression:
sentence.toLowerCase().replace(/[^\w\s]/gi, "").split(" ");
The problem is, it removes apostrophes from words like "wasn't", turning it into "wasnt".
I can't figure out how to keep the apostrophes in words such as that.
Any help would be greatly appreciated!
var sentence = "Exclamation! Question? \"Quotes.\" 'Apostrophe'. Wasn't. 'Couldn't'. \"Didn't\".";
console.log(sentence.toLowerCase().replace(/[^\w\s]/gi, "").split(" "));
Asked Apr 8, 2018 at 13:18 by MysteryPancake; edited May 18, 2024 at 5:23
- What should happen to "\"Couldn't do\"" or "'Couldn't do'"? – Bergi, Apr 8, 2018 at 13:21
- Try splitting on whitespace and then removing punctuation at the start and end of each individual word. – Bergi, Apr 8, 2018 at 13:22
- @Bergi I'm trying to only get the words, so in both of those cases it would be "couldn't" and "do". – MysteryPancake, Apr 8, 2018 at 13:22
- @DarrenSweeney I'm not replacing spaces with no spaces, only the characters I don't want. The current expression works; it just removes the apostrophes as well. – MysteryPancake, Apr 8, 2018 at 13:23
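The split-then-trim approach suggested in the comments could look roughly like the sketch below. The helper name `extractWords` is hypothetical, and the sketch assumes that "punctuation" means any run of non-word characters (`\W`) at the start or end of a token.

```javascript
// Hypothetical helper (not from the original thread): split on whitespace,
// then strip non-word characters from both ends of each token.
// Apostrophes inside a word are untouched because only the ends are trimmed.
function extractWords(text) {
  return text
    .toLowerCase()
    .split(/\s+/)
    .map((token) => token.replace(/^\W+|\W+$/g, ""))
    .filter((token) => token.length > 0); // drop tokens that were pure punctuation
}

const sample = "Exclamation! Question? \"Quotes.\" 'Apostrophe'. Wasn't. 'Couldn't'. \"Didn't\".";
console.log(extractWords(sample));
// → ["exclamation", "question", "quotes", "apostrophe", "wasn't", "couldn't", "didn't"]

console.log(extractWords("\"Couldn't do\""));
// → ["couldn't", "do"]
```

Because trimming only touches the ends of each token, a word that legitimately starts or ends with an apostrophe (for example a plural possessive) would lose that apostrophe under this sketch.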
2 Answers
It would be tricky to adapt your own solution, but you could handle apostrophes this way:
const sentence = "Exclamation! Question? \"Quotes.\" 'Apostrophe'. Wasn't. 'Couldn't'. \"Didn't\".";
// Match runs of word characters, optionally joined by apostrophes.
console.log(
  sentence.toLowerCase().match(/\w+(?:'\w+)*/g)
);
Note: changed quantifier from ? to * to allow multiple ' in a word.
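To make the difference between the two quantifiers concrete, here is a small comparison. The sample text is an assumption chosen for illustration; it is not from the original thread.

```javascript
// Illustrative input (an assumption): one word with two apostrophes.
const text = "Rock'n'roll isn't dead";

// With `?`, at most one apostrophe group is allowed per word,
// so "Rock'n'roll" is split after the first group.
const withOptional = text.match(/\w+(?:'\w+)?/g);
console.log(withOptional); // → ["Rock'n", "roll", "isn't", "dead"]

// With `*`, any number of apostrophe groups can follow the first run.
const withStar = text.match(/\w+(?:'\w+)*/g);
console.log(withStar); // → ["Rock'n'roll", "isn't", "dead"]
```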
@revo's answer looks good; here's another option that should work too:
const input = "Exclamation! Question? \"Quotes.\" 'Apostrophe'. Wasn't. 'Couldn't'. \"Didn't\".";
console.log(input.toLowerCase().match(/\b[\w']+\b/g));
Explanation:
- \b matches at the beginning/end of a word,
- [\w']+ matches one or more characters that are letters, digits, underscores or apostrophes (to omit underscores, you can use [a-zA-Z0-9'] instead),
- /g tells the regex to find all occurrences that match the pattern (not just the first one).
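One detail worth checking before swapping in the underscore-free character class: `\b` still treats `_` as a word character, so a token containing an underscore is not split at the underscore but dropped from the results. The sketch below compares the two patterns on an input made up for illustration.

```javascript
// Illustrative input (an assumption): one token with an underscore.
const input = "snake_case isn't split";

// \w includes the underscore, so the whole token is kept.
const withWordClass = input.match(/\b[\w']+\b/g);
console.log(withWordClass); // → ["snake_case", "isn't", "split"]

// Without the underscore in the class, "snake" cannot end on a \b
// (the following "_" is still a word character) and "case" cannot
// start on one, so neither half matches.
const withoutUnderscore = input.match(/\b[a-zA-Z0-9']+\b/g);
console.log(withoutUnderscore); // → ["isn't", "split"]
```

If underscore-separated parts should be kept as separate words, one option is to replace underscores with spaces before matching.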