The current REGEX I'm using is the following one:
var sentences = fulltext.match(/[^\.!\?]+[\.!\?]+/g);
That returns an array of the sentences, INCLUDING the spaces (I need all the characters). The problem is that it does not work with an ellipsis "...", and I guess it doesn't work with other unconventional forms of punctuation either.
How can I fix my REGEX to match this and other forms of punctuation?
Is there a noob-friendly, example-driven guide to REGEX out there?
asked Jan 25, 2014 at 22:54 by Belohlavek; edited Jan 26, 2014 at 1:53 by BenMorel
Ellipses also have their own character / code point, U+2026 or \u2026, which is distinct from three consecutive periods (U+002E). – Jonathan Lonowski, Jan 25, 2014 at 22:58
Possible duplicate of "Javascript regular expression for punctuation (international)?" – Jonathan Lonowski, Jan 25, 2014 at 23:06
2 Answers
The Unicode code point of the ellipsis is \u2026, so you can use \u2026 to match an ellipsis.
Code:
var fulltext= "First sentence… Second sentence. ";
fulltext.match(/([^.?!;\u2026]+[.?!;\u2026]+)/g);
OUTPUT
["First sentence…", " Second sentence."]
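Two things to watch for with this pattern: text after the last terminator (the trailing space above) is dropped, and so is any punctuation at the very start of the input. Three ASCII dots are already covered by the + on the terminator set; it is the single U+2026 character that has to be added. Since the question asks to keep all the characters, here is a minimal sketch of a variant that keeps those leftovers too. The function name splitSentences is made up for illustration, and the ";" terminator is simply carried over from the pattern above.

```javascript
// Hypothetical helper, not from the original answers.
// First alternative: optional text followed by one or more terminators.
// Second alternative: a trailing fragment that has no terminator at all.
function splitSentences(text) {
  const pattern = /[^.!?;\u2026]*[.!?;\u2026]+|[^.!?;\u2026]+$/g;
  // match() returns null when nothing matches (e.g. an empty string).
  return text.match(pattern) || [];
}

const parts = splitSentences("Wait... What?! Fine\u2026 ok");
console.log(parts);
// Joining the pieces gives back the original string, character for character.
console.log(parts.join("") === "Wait... What?! Fine\u2026 ok");
```

Because every character ends up in exactly one piece, joining the array is a quick way to check that nothing was lost.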
You can just add the ellipsis (and any other punctuation characters) to your character sets.
var input = "First sentence… Second sentence. ";
input.match(/[^\.\?!;…]+[\.\?!;…]+/g);
Result:
["First sentence…", " Second sentence."]
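If the list of terminators keeps growing, it may be easier to build the pattern from a string than to edit the regex literal twice each time. A rough sketch, assuming a made-up helper named buildSentenceRegex; the extra characters in the second call (the interrobang U+203D) are just examples, not something from the question.

```javascript
// Hypothetical helper: builds the same "[^set]+[set]+" pattern from a string
// of terminator characters.
function buildSentenceRegex(terminators) {
  // Escape the characters that are special inside a character class: \ ] ^ -
  const escaped = terminators.replace(/[\\\]\^\-]/g, "\\$&");
  return new RegExp(`[^${escaped}]+[${escaped}]+`, "g");
}

// Same terminator set as the answer above.
const basic = buildSentenceRegex(".?!;\u2026");
console.log("First sentence\u2026 Second sentence. ".match(basic));

// Adding another punctuation mark is now a one-character change.
const extended = buildSentenceRegex(".?!;\u2026\u203D");
console.log("Really\u203D Yes.".match(extended));
```

Like the literal version, this still skips text that has no terminator after it.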