javascript - Regex matching emoticons - Stack Overflow

We are working on a project where we want users to be able to use both emoji syntax (like :smile:, :heart:, :confused:, :stuck_out_tongue:) as well as normal emoticons (like :), <3, :/, :p).

I'm having trouble with the emoticon syntax because sometimes those character sequences will occur in:

  • normal strings or URLs - http://example.com
  • within the emoji syntax - :pencil:

How can I find these emoticon character sequences but not when other characters are near them?

The entire regex I'm using for all the emoticons is huge, so here's a trimmed-down version:

(\:\)|\:\(|<3|\:\/|\:-\/|\:\||\:p)

You can play with a demo of it in action here: http://regexr.com/3a8o5
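
To make the problem concrete, here is a minimal sketch that runs the trimmed-down pattern over a few inputs. The helper name `findEmoticons` and the sample strings are assumptions made for illustration, and the `g` flag is added so that every occurrence is reported:

```javascript
// Trimmed-down emoticon pattern from the question, with the global flag added
// so that every occurrence is reported.
const emoticonPattern = /(\:\)|\:\(|<3|\:\/|\:-\/|\:\||\:p)/g;

// Hypothetical helper: list every substring the pattern matches.
function findEmoticons(text) {
  return Array.from(text.matchAll(emoticonPattern), (m) => m[0]);
}

console.log(findEmoticons('I am happy :) and I <3 this')); // [ ':)', '<3' ]
console.log(findEmoticons('see http://example.com'));      // [ ':/' ]  (unwanted)
console.log(findEmoticons('write it down :pencil:'));      // [ ':p' ]  (unwanted)
```

The last two results are the false positives described above: the `:/` inside the URL and the `:p` at the start of `:pencil:`.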

asked Jan 21, 2015 at 21:21 by Chris Barr
  • Why not split it up in multiple regexes? Also, what you could do is match with boundaries, for example /\b:\)\b/ – elclanrs Commented Jan 21, 2015 at 21:24
  • If I remember correctly, both Twemoji and Emojione provide JS code to do that with their image sets and there are dozens of implementations of the same thing on Github, NPM, bower etc. – Crissov Commented May 7, 2017 at 12:05

4 Answers

Match emoji first (to take care of the :pencil: example) and then check for a terminating whitespace or newline:

(\:\w+\:|\<[\/\\]?3|[\(\)\\D|\*\$][\-\^]?[\:\;\=]|[\:\;\=B8][\-\^]?[3DOPp\@\$\*\\\)\(\/\|])(?=\s|[\!\.\?]|$)

This regex matches the following (preferring emoji) returning the match in matching group 1:

:( :) :P :p :O :3 :| :/ :\ :$ :* :@
:-( :-) :-P :-p :-O :-3 :-| :-/ :-\ :-$ :-* :-@
:^( :^) :^P :^p :^O :^3 :^| :^/ :^\ :^$ :^* :^@
): (: $: *:
)-: (-: $-: *-:
)^: (^: $^: *^:
<3 </3 <\3
:smile: :hug: :pencil:

It also supports terminal punctuation as a delimiter in addition to white space.

You can see more details and test it here: https://regex101.com/r/aM3cU7/4
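
As a rough sketch of how this pattern might be used from JavaScript: the helper name `extractTokens` and the sample inputs are assumptions for illustration. The `D` in the third alternative's character class is written as a literal letter here, on the assumption that this was the intent; as the escape `\D` it would match any non-digit character.

```javascript
// Pattern from the answer above. Assumption: the "D" in the third alternative's
// character class is meant as a literal letter (for "D:"); written as the
// escape \D it would match any non-digit character.
const emojiOrEmoticon =
  /(\:\w+\:|\<[\/\\]?3|[\(\)\\D|\*\$][\-\^]?[\:\;\=]|[\:\;\=B8][\-\^]?[3DOPp\@\$\*\\\)\(\/\|])(?=\s|[\!\.\?]|$)/g;

// Hypothetical helper: return capture group 1 of every match.
function extractTokens(text) {
  return Array.from(text.matchAll(emojiOrEmoticon), (m) => m[1]);
}

console.log(extractTokens('Great :) see http://example.com :pencil: and <3!'));
// [ ':)', ':pencil:', '<3' ]
console.log(extractTokens('note: this is plain text')); // []
```

The URL is skipped because the `:/` in it is followed by another `/`, which the look-ahead rejects.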

Use a positive look-ahead for a whitespace character:

([\:\<]-?[)(|\\/pP3D])(?:(?=\s))
 |       |      |         |
 |       |      |         |
 |       |      |         |-> match last separating space
 |       |      |-> match last part of the emoticon
 |       |-> it may have a `-` or not 
 |-> first part of the emoticon

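JavaScript supports look-ahead (though look-behind only arrived in ES2018). If you would rather avoid look-arounds altogether, capture the separator instead:

/([\:\<]-?[)(|\\/pP3D])(\s|$)/g.exec('hi :) ;D');

Then drop the last entry of the resulting array, for example with splice(); it holds the captured separator (a whitespace character, or an empty string at the end of the input). The emoticon itself is in capture group 1.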
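
Since JavaScript regular expressions do support look-ahead, the first pattern in this answer can also be used directly. A minimal sketch: `findWithLookahead` is a hypothetical helper name, and `|$` is added to the look-ahead (an assumption beyond the original pattern) so that an emoticon at the very end of the input also counts:

```javascript
// Emoticon followed by whitespace or end of input. The look-ahead does not
// consume the separator, so nothing has to be trimmed from the result.
const lookaheadPattern = /([:<]-?[)(|\\\/pP3D])(?=\s|$)/g;

// Hypothetical helper: collect capture group 1 of every match.
function findWithLookahead(text) {
  return Array.from(text.matchAll(lookaheadPattern), (m) => m[1]);
}

console.log(findWithLookahead('hi :) <3 http://x :-('));
// [ ':)', '<3', ':-(' ]
```

Note that this only checks what follows the emoticon; it does not look at the character before it.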

I assume these emoticons will commonly be used with spaces before and after. Then \s might be what you're looking for, as it matches any single whitespace character.

Then your regex would become:

\s+(\:\)|\:\(|<3|\:\/|\:-\/|\:\||\:p)\s
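
A short sketch of how this behaves in practice; `findSpaced` and the sample strings are made up for illustration. Because the whitespace on both sides is consumed as part of the match, an emoticon at the very start or end of the input is not found, and of two emoticons separated by a single space only the first is reported:

```javascript
// Pattern from this answer, with the global flag added to find all matches.
const spacedPattern = /\s+(\:\)|\:\(|<3|\:\/|\:-\/|\:\||\:p)\s/g;

// Hypothetical helper: return the emoticon (capture group 1) of every match.
function findSpaced(text) {
  return Array.from(text.matchAll(spacedPattern), (m) => m[1]);
}

console.log(findSpaced('so happy :) today'));           // [ ':)' ]
console.log(findSpaced('http://example.com :pencil:')); // []
console.log(findSpaced(':) hello <3'));                 // []  (start and end are missed)
console.log(findSpaced('a :) :( b'));                   // [ ':)' ]  (shared space is consumed)
```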

You want regex look-arounds regarding spacing. Another answer here suggested a positive look-ahead, though I'd go double-negative:

(?<!\S)(\:\)|\:\(|<3|\:\/|\:-\/|\:\||\:p)(?!\S)

JavaScript has only supported (?<!pattern) since ES2018, but in older engines look-behind can be mimicked (note the g flag, so that every occurrence is handled):

test_string.replace(/(\S)?(\:\)|\:\(|<3|\:\/|\:-\/|\:\||\:p)(?!\S)/g,
                    function($0, $1) { return $1 ? $0 : replacement_text; });

All I did was prefix your pattern with (?<!\S) and suffix it with (?!\S). The prefix ensures the match does not follow a non-whitespace character, so the only valid things before it are whitespace or nothing (start of line). The suffix does the same on the other side, ensuring the match is not followed by a non-whitespace character. See also this more thorough regex walk-through.
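
For comparison, here is a sketch that runs the mimic next to a native look-behind version. Native `(?<!...)` needs an engine with ES2018 look-behind support; the function names and the `[E]` placeholder are hypothetical, and the `g` flag is used so that every occurrence is replaced:

```javascript
// Native look-arounds: requires ES2018 look-behind support.
const nativePattern = /(?<!\S)(\:\)|\:\(|<3|\:\/|\:-\/|\:\||\:p)(?!\S)/g;

// Fallback: capture an optional preceding non-space character and leave the
// text untouched whenever that capture is present.
const mimicPattern = /(\S)?(\:\)|\:\(|<3|\:\/|\:-\/|\:\||\:p)(?!\S)/g;

function replaceNative(text, replacement) {
  return text.replace(nativePattern, () => replacement);
}

function replaceMimic(text, replacement) {
  return text.replace(mimicPattern, (whole, before) =>
    before ? whole : replacement
  );
}

const sample = ':) see http://example.com :pencil: ok <3';
console.log(replaceNative(sample, '[E]'));
// [E] see http://example.com :pencil: ok [E]
console.log(replaceMimic(sample, '[E]'));
// [E] see http://example.com :pencil: ok [E]
```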

One of the comments on the question suggested \b (word boundary) markers. I don't recommend these. In fact, this suggestion would do the opposite of what you want; \b:/ will indeed match the :/ in http:// since there is a word boundary between the p and the :. This kind of reasoning would suggest \B (not a word boundary), e.g. \B:/\B. This is more portable (it works with pretty much all regex parsers while look-arounds do not), and you can choose it in that case, but I prefer the look-arounds.
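
The difference between \b and \B can be checked quickly. In this sketch the sample strings and the `checks` object are invented for illustration. It also shows one caveat: an emoticon that ends in a word character, such as :p, has a word boundary after it when followed by a space, so a trailing \B does not work for it:

```javascript
const checks = {
  // \b sits between "p" and ":" in a URL, so this matches where it should not.
  wordBoundaryInUrl: /\b:\//.test('http://example.com'),
  // \B requires "no boundary" on both sides, which rejects the URL...
  nonBoundaryInUrl: /\B:\/\B/.test('http://example.com'),
  // ...and accepts the emoticon when it is surrounded by spaces.
  nonBoundaryStandalone: /\B:\/\B/.test('hmm :/ ok'),
  // ":p" ends in a word character: between "p" and the space there IS a
  // boundary, so a trailing \B fails and \b is needed on that side.
  trailingNonBoundaryOnP: /\B:p\B/.test('silly :p face'),
  trailingBoundaryOnP: /\B:p\b/.test('silly :p face'),
};

console.log(checks);
// { wordBoundaryInUrl: true, nonBoundaryInUrl: false,
//   nonBoundaryStandalone: true, trailingNonBoundaryOnP: false,
//   trailingBoundaryOnP: true }
```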
