regex - Javascript: Remove string punctuation and split into words? - Stack Overflow

I'm trying to get an array of words from a string like this:

"Exclamation! Question? \"Quotes.\" 'Apostrophe'. Wasn't. 'Couldn't'. \"Didn't\"."

The array is supposed to look like this:

[
  "exclamation",
  "question",
  "quotes",
  "apostrophe",
  "wasn't",
  "couldn't",
  "didn't"
]

Currently I'm using this expression:

sentence.toLowerCase().replace(/[^\w\s]/gi, "").split(" ");

The problem is, it removes apostrophes from words like "wasn't", turning it into "wasnt".

I can't figure out how to keep the apostrophes in words such as that.

Any help would be greatly appreciated!

var sentence = "Exclamation! Question? \"Quotes.\" 'Apostrophe'. Wasn't. 'Couldn't'. \"Didn't\".";
console.log(sentence.toLowerCase().replace(/[^\w\s]/gi, "").split(" "));

Asked Apr 8, 2018 at 13:18 by MysteryPancake; edited May 18, 2024 at 5:23.
  • What should happen to "\"Couldn't do\"" or "'Couldn't do'"? – Bergi Commented Apr 8, 2018 at 13:21
  • Try splitting on whitespace and then removing punctuation at the start and end of each individual word. – Bergi Commented Apr 8, 2018 at 13:22
  • @Bergi I'm trying to only get the words, so in both of those cases it would be "couldn't" and "do" – MysteryPancake Commented Apr 8, 2018 at 13:22
  • @DarrenSweeney I'm not replacing spaces with no spaces, only the characters I don't want. The current expression works, it just removes the apostrophes as well. – MysteryPancake Commented Apr 8, 2018 at 13:23
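
Bergi's suggestion in the comments (split on whitespace, then strip punctuation from the start and end of each word) could be sketched roughly as follows. The helper name splitThenTrim is hypothetical and not from the thread, and the sketch assumes that "punctuation" means any character outside \w.

```javascript
// Hypothetical helper (not from the thread): split first, then trim each token's edges.
function splitThenTrim(text) {
  return text
    .toLowerCase()
    // Split on runs of whitespace rather than a single space.
    .split(/\s+/)
    // Strip non-word characters from the start and end of each token only,
    // so apostrophes inside a word are left alone.
    .map((token) => token.replace(/^\W+|\W+$/g, ""))
    // Drop tokens that were empty or consisted only of punctuation.
    .filter((token) => token.length > 0);
}

const commentSample = "Exclamation! Question? \"Quotes.\" 'Apostrophe'. Wasn't. 'Couldn't'. \"Didn't\".";
console.log(splitThenTrim(commentSample));
// [ 'exclamation', 'question', 'quotes', 'apostrophe', "wasn't", "couldn't", "didn't" ]
```

Because only the edges of each token are trimmed, surrounding quotes are removed while the apostrophe in "wasn't" survives.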

2 Answers

Working around your own expression would be tricky, but you could match the words directly and allow apostrophes inside them:

const sentence = "Exclamation! Question? \"Quotes.\" 'Apostrophe'. Wasn't. 'Couldn't'. \"Didn't\".";
console.log(
    // Match runs of word characters, optionally joined by single apostrophes.
    sentence.toLowerCase().match(/\w+(?:'\w+)*/g)
);

Note: changed quantifier from ? to * to allow multiple ' in a word.
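
To see what the change from ? to * buys, here is a small comparison. The sample text is made up for illustration (it is not from the question) and was chosen because it contains a word with two apostrophes.

```javascript
// Hypothetical sample text containing a word with two apostrophes.
const quantifierSample = "Rock'n'roll wasn't dead.";

// With ?, a match may contain at most one apostrophe followed by more word characters.
const withOptional = quantifierSample.toLowerCase().match(/\w+(?:'\w+)?/g);
console.log(withOptional); // [ "rock'n", 'roll', "wasn't", 'dead' ]

// With *, a match may contain any number of apostrophe-joined parts.
const withStar = quantifierSample.toLowerCase().match(/\w+(?:'\w+)*/g);
console.log(withStar); // [ "rock'n'roll", "wasn't", 'dead' ]
```

For the sentence in the question both quantifiers give the same result, since none of its words has more than one apostrophe.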

@revo's answer looks good; here's another option that should work too:

const input = "Exclamation! Question? \"Quotes.\" 'Apostrophe'. Wasn't. 'Couldn't'. \"Didn't\".";
console.log(input.toLowerCase().match(/\b[\w']+\b/g));

Explanation:

  • \b matches at a word boundary, i.e. at the beginning or end of a word,
  • [\w']+ matches one or more characters that are letters, digits, underscores or apostrophes (to omit underscores, you can use [a-zA-Z0-9'] instead),
  • the g flag tells the regex to find all matches of the pattern (not just the first one).
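
As a rough illustration of the underscore-free class mentioned in the explanation, the sketch below runs both patterns on a made-up input (part of the question's sentence plus one identifier containing an underscore; the input is an assumption for demonstration only).

```javascript
// Hypothetical input: words from the question plus one token with an underscore.
const mixedInput = "Wasn't. 'Couldn't'. snake_case \"Didn't\".";

// Class from the answer: underscores count as word characters.
const withUnderscores = mixedInput.toLowerCase().match(/\b[\w']+\b/g);
console.log(withUnderscores); // [ "wasn't", "couldn't", 'snake_case', "didn't" ]

// Underscore-free class from the explanation.
const withoutUnderscores = mixedInput.toLowerCase().match(/\b[a-zA-Z0-9']+\b/g);
console.log(withoutUnderscores); // [ "wasn't", "couldn't", "didn't" ]
```

Note that the second pattern skips "snake_case" entirely instead of splitting it into "snake" and "case": \b is still defined in terms of \w, which includes the underscore, so there is no word boundary on either side of the underscore for the match to end or start at.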