javascript - Regex matching emoticons

We are working on a project where we want users to be able to use both emoji syntax (like :smile:, :heart:, :confused:,:stuck_out_tongue:) as well as normal emoticons (like :), <3, :/, :p)

I'm having trouble with the emoticon syntax because sometimes those character sequences will occur in:

normal strings or URL's - http://example.com
within the emoji syntax - :pencil:

How can I find these emoticon character sequences but not when other characters are near them?

The entire regex I'm using for all the emoticons is huge, so here's a trimmed-down version:

(\:\)|\:\(|<3|\:\/|\:-\/|\:\||\:p)

You can play with a demo of it in action here: http://regexr.com/3a8o5
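To make the two failure cases concrete, here is a minimal sketch that runs the trimmed-down pattern against a string containing a URL and an emoji code. The sample string and the names `NAIVE_RE` and `falsePositives` are made up for illustration.

```javascript
// The trimmed-down pattern from the question, with the global flag so that
// String.prototype.match returns every hit.
const NAIVE_RE = /(:\)|:\(|<3|:\/|:-\/|:\||:p)/g;

// Neither the URL nor the emoji code should produce an emoticon,
// but both contain one of the character sequences.
const falsePositives = 'see http://example.com and :pencil:'.match(NAIVE_RE);

console.log(falsePositives); // [ ':/', ':p' ]
```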

asked Jan 21, 2015 at 21:21 by Chris Barr

Why not split it up in multiple regexes? Also, what you could do is match with boundaries, for example /\b:\)\b/ – elclanrs Commented Jan 21, 2015 at 21:24
If I remember correctly, both Twemoji and Emojione provide JS code to do that with their image sets and there are dozens of implementations of the same thing on Github, NPM, bower etc. – Crissov Commented May 7, 2017 at 12:05

4 Answers

Match emoji first (to take care of the :pencil: example) and then check for a terminating whitespace or newline:

(\:\w+\:|\<[\/\\]?3|[\(\)\\D|\*\$][\-\^]?[\:\;\=]|[\:\;\=B8][\-\^]?[3DOPp\@\$\*\\\)\(\/\|])(?=\s|[\!\.\?]|$)

This regex matches the following (preferring emoji), returning the match in capture group 1:

:( :) :P :p :O :3 :| :/ :\ :$ :* :@
:-( :-) :-P :-p :-O :-3 :-| :-/ :-\ :-$ :-* :-@
:^( :^) :^P :^p :^O :^3 :^| :^/ :^\ :^$ :^* :^@
): (: $: *:
)-: (-: $-: *-:
)^: (^: $^: *^:
<3 </3 <\3
:smile: :hug: :pencil:

It also supports terminal punctuation as a delimiter in addition to white space.

You can see more details and test it here: https://regex101.com/r/aM3cU7/4
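As a rough usage sketch, the pattern can be applied in JavaScript with the `g` flag and `String.prototype.matchAll`. The helper name `findEmoticons` and the sample strings are hypothetical. Note that this version writes `D` literally in the third character class, on the assumption that a `D:` face was intended: `\D` inside a character class matches any non-digit, which would make text such as `note:` match.

```javascript
// Emoji-first pattern with redundant escapes dropped and the `g` flag added.
// Assumption: the third character class is meant to contain a literal "D".
const EMOTICON_RE =
  /(:\w+:|<[\/\\]?3|[()\\D|*$][-^]?[:;=]|[:;=B8][-^]?[3DOPp@$*\\)(\/|])(?=\s|[!.?]|$)/g;

// Hypothetical helper: returns the content of capture group 1 for every match.
function findEmoticons(text) {
  return Array.from(text.matchAll(EMOTICON_RE), (m) => m[1]);
}

console.log(findEmoticons('Nice :pencil: work :) see http://example.com <3'));
// [ ':pencil:', ':)', '<3' ]
```

The `:/` inside the URL is skipped because it is followed by another `/` rather than whitespace, terminal punctuation, or the end of the string.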

Use a positive look-ahead for whitespace:

([\:\<]-?[)(|\\/pP3D])(?:(?=\s))
 |     | |            |
 |     | |            |-> look ahead for the separating whitespace
 |     | |-> last part of the emoticon
 |     |-> it may have a `-` or not
 |-> first part of the emoticon

Since you're using JavaScript, you can also do this without look-arounds by capturing the trailing delimiter in a second group:

/([\:\<]-?[)(|\\/pP3D])(\s|$)/g.exec('hi :) ;D');

Then read the emoticon from capture group 1 of the resulting array and ignore its last entry (most probably a space).
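The two variants can be compared side by side. This is only a sketch: `extractEmoticons` is a hypothetical helper, the sample strings are invented, and `$` is added to the look-ahead so that an emoticon at the very end of the string is also found.

```javascript
// Capture-group variant: the delimiter is consumed and lands in group 2.
const m = /([:<]-?[)(|\\\/pP3D])(\s|$)/.exec('hi :) ;D');
console.log(m[0]); // ":) " (note the trailing space)
console.log(m[1]); // ":)"

// Look-ahead variant: the delimiter is checked but not consumed.
const EMOTICON_RE = /([:<]-?[)(|\\\/pP3D])(?=\s|$)/g;

// Hypothetical helper collecting capture group 1 of every match.
function extractEmoticons(text) {
  return Array.from(text.matchAll(EMOTICON_RE), (match) => match[1]);
}

console.log(extractEmoticons('hi :) <3 http://x.com :P'));
// [ ':)', '<3', ':P' ]
```

Neither variant matches `;D`, because the first character class only allows `:` or `<`.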

I assume these emoticons will commonly be used with spaces before and after. In that case \s may be what you're looking for, as it matches a single whitespace character.

Then your regex would become:

\s+(\:\)|\:\(|<3|\:\/|\:-\/|\:\||\:p)\s
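One thing worth checking with this approach is that the surrounding whitespace is consumed by the match, so an emoticon at the start or end of the string, or directly after another emoticon, is missed. The sketch below illustrates this and shows one possible adjustment; the `(^|\s)` prefix and `(?=\s|$)` suffix are my own assumption, not part of the suggestion above, and all names are hypothetical.

```javascript
const ALTERNATIVES = ':\\)|:\\(|<3|:\\/|:-\\/|:\\||:p';

// Pattern as suggested: whitespace required (and consumed) on both sides.
const SPACED_RE = new RegExp('\\s+(' + ALTERNATIVES + ')\\s', 'g');

// Possible adjustment: allow start of string, and only peek at what follows.
const ANCHORED_RE = new RegExp('(^|\\s)(' + ALTERNATIVES + ')(?=\\s|$)', 'g');

const text = 'hi :) :( bye';
const spaced = Array.from(text.matchAll(SPACED_RE), (m) => m[1]);
const anchored = Array.from(text.matchAll(ANCHORED_RE), (m) => m[2]);

console.log(spaced);   // [ ':)' ]  (the space before ":(" was already consumed)
console.log(anchored); // [ ':)', ':(' ]
```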

You want regex look-arounds regarding spacing. Another answer here suggested a positive look-ahead, though I'd go double-negative:

(?<!\S)(\:\)|\:\(|<3|\:\/|\:-\/|\:\||\:p)(?!\S)

While JavaScript doesn't support (?<!pattern), look-behind can be mimicked:

// $1 holds a preceding non-whitespace character, if any; such matches are returned unchanged.
test_string.replace(/(\S)?(\:\)|\:\(|<3|\:\/|\:-\/|\:\||\:p)(?!\S)/g,
                    function($0, $1) { return $1 ? $0 : replacement_text; });

All I did was prefix your regex with (?<!\S) and suffix it with (?!\S). The prefix ensures the match does not follow a non-whitespace character, so the only valid things before it are whitespace or nothing (start of line). The suffix does the same on the other side, ensuring the match is not followed by a non-whitespace character. See also this more thorough regex walk-through.
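In engines that do support look-behind (it was added to JavaScript in ES2018, so this assumes a reasonably recent browser or Node.js), the double-negative pattern can be used as written. A minimal sketch, where `replaceEmoticons` and the `[emoticon]` placeholder are hypothetical:

```javascript
// Emoticon alternatives from the question, wrapped in the two negative look-arounds.
// Assumes an engine with ES2018 look-behind support.
const EMOTICON_RE = /(?<!\S)(:\)|:\(|<3|:\/|:-\/|:\||:p)(?!\S)/g;

// Hypothetical helper: swaps every free-standing emoticon for a placeholder.
function replaceEmoticons(text, replacement = '[emoticon]') {
  return text.replace(EMOTICON_RE, replacement);
}

console.log(replaceEmoticons('see http://example.com :) and :pencil: <3'));
// prints "see http://example.com [emoticon] and :pencil: [emoticon]"
```

The `:/` in the URL is rejected by the look-behind (it follows `p`), and the `:p` in `:pencil:` is rejected by the look-ahead (it is followed by `e`).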

One of the comments on the question suggested \b (word boundary) markers. I don't recommend these. In fact, that suggestion would do the opposite of what you want: \b:/ will indeed match http://, since there is a word boundary between the p and the :. The same reasoning suggests \B (not a word boundary) instead, e.g. \B:/\B. This is more portable (it works with pretty much all regex engines, while look-arounds do not), so you can choose it if portability matters, but I prefer the look-arounds.
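The boundary behaviour is easy to check directly. The sample strings and variable names below are made up. The last line shows a caveat that seems worth testing before relying on \B: an emoticon that ends in a letter, such as :p, has a word boundary right after it.

```javascript
// \b sits between "p" and ":", so the URL is (wrongly) matched.
const urlWithWordBoundary = /\b:\//.test('http://example.com');

// \B requires the absence of a boundary, so the URL is rejected...
const urlWithNonBoundary = /\B:\/\B/.test('http://example.com');

// ...while a free-standing emoticon is still found.
const freeStanding = /\B:\/\B/.test('hmm :/ ok');

// Caveat: "p" followed by a space is a word boundary, so the trailing \B fails.
const letterEnding = /\B:p\B/.test('lol :p ok');

console.log(urlWithWordBoundary, urlWithNonBoundary, freeStanding, letterEnding);
// true false true false
```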
