javascript - Wildcards in regular expression - Stack Overflow

I want to create a regular expression that finds the word tjuv ("thief" in Swedish), which can be combined with other words (see the examples below) and/or come in different conjugations.

Examples:

  • cykeltjuv
  • biltjuv
  • tjuvarna
  • inbrottstjuvs

The one below works for tjuv and tjuvs (a thief's), but what about the other conjugations, as well as combinations with other words?

/tjuv(?:s){0,1}/ig

Now that I've learned you a little Swedish it's fair that you learn me some regular expressions ;-)

EDIT: To be more specific, there's actually no case I can think of that shouldn't match with the word tjuv.

What I am doing is searching through phrases where the word tjuv exists, for example (translated to english):

1. När en familj kom hem från en utlandssemester upptäckte de att en inbrottstjuv
   hade varit i farten. <- MATCH!

2. På juldagen hade en cykeltjuv varit framme och stulit en cykel. <- MATCH


3. Violer är blå och rosor är röda <- No 'tjuv' and therefore no match

asked Jan 17, 2013 at 21:32 by holyredbeard; edited Jan 17, 2013 at 21:45
  • /tjuv(?:s){0,1}/ig is much too complicated - use /tjuvs?/ig instead. – speakr Commented Jan 17, 2013 at 21:36
  • First is bike thief? What about just /.*tjuv.*/ ? – Bryan Glazer Commented Jan 17, 2013 at 21:36
  • @BryanGlazer: So it's actually THAT easy? – holyredbeard Commented Jan 17, 2013 at 21:37
  • That would match tjuv surrounded by anything. Do you need something more specific? – Bryan Glazer Commented Jan 17, 2013 at 21:38
  • What kinds of strings should not match? Not matching is the real meat of regex. – Evan Davis Commented Jan 17, 2013 at 21:38
3 Answers

I think this is what you want: the word "tjuv" with other letters before and/or after it:

/[a-z]*tjuv[a-z]*/ig

See it here on Regexr

Note, however, that [a-z] is a character class covering only the ASCII letters a to z (case-insensitive because of the i modifier). Swedish also has letters outside that range, such as å, ä, and ö.

So either you

  • add the missing characters to the character class

or

  • depending on your regex flavour, you can use \p{L} instead.

    \p{L} is a Unicode property that matches any letter in any language. In JavaScript it is only recognized when the u flag is set, so the pattern would then look like:

      /\p{L}*tjuv\p{L}*/igu
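
To make the difference between the two character classes concrete, here is a minimal sketch. It assumes a JavaScript engine with Unicode property escapes (ES2018 or later); the compound "hönstjuvar" is a made-up example chosen only because it contains an ö, and the variable names are illustrative.

```javascript
// Illustrative input: "hönstjuvar" is a made-up compound containing "ö".
const text = "hönstjuvar cykeltjuv";

// ASCII-only class: "ö" is not in [a-z], so the match cannot extend across it.
const asciiOnly = /[a-z]*tjuv[a-z]*/gi;

// Unicode letter class: needs the u flag in JavaScript.
const anyLetter = /\p{L}*tjuv\p{L}*/giu;

const asciiMatches = text.match(asciiOnly);
const letterMatches = text.match(anyLetter);

console.log(asciiMatches);  // [ 'nstjuvar', 'cykeltjuv' ]
console.log(letterMatches); // [ 'hönstjuvar', 'cykeltjuv' ]
```

The ASCII-only pattern still finds the word, but it silently truncates it at the first non-ASCII letter, which matters if the matched text is used afterwards.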
    

I don't think that

/.*tjuv.*/

is a good choice: it matches the entire line of text around the word, not just the word itself. This is better:

\w*(tjuv)\w*

This matches all the words in your list (and any other word with "tjuv" in it).

As far as I understand the question, you are looking for words that contain any string before and/or after tjuv. In regular expressions, you can normally use the dot . to denote an arbitrary character. Therefore tjuv. matches tjuvA, tjuvX, tjuvs, ... If you want an arbitrary number of such characters, use the star *. With tjuv.* you can match tjuvABC, tjuvs, tjuv (then the star expands to zero characters!), ...

So I think /.*tjuv.*/ could be something you want. However, here . also matches whitespace characters, so the regexp also matches all of "something xxxtjuvyyy somethingelse", which might not be what you want.
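
A small sketch of that difference, using two of the phrases from the question. If the goal is only to decide whether a phrase mentions tjuv at all, a plain /tjuv/i test is enough. The \S*tjuv\S* pattern is an assumption added here (it is not from the answers) to show one way of stopping at whitespace; the variable names are illustrative.

```javascript
// Two of the sample phrases from the question.
const phrases = [
  "På juldagen hade en cykeltjuv varit framme och stulit en cykel.",
  "Violer är blå och rosor är röda",
];

// Yes/no check. No g flag, so test() keeps no state between calls.
const containsTjuv = /tjuv/i;
const hits = phrases.filter((phrase) => containsTjuv.test(phrase));

// .* is greedy and also matches spaces, so the match is the whole line.
const wholeLine = phrases[0].match(/.*tjuv.*/i)[0];

// \S* stops at whitespace, so only the surrounding word is matched.
// Note that \S would also accept punctuation directly attached to the word.
const wordOnly = phrases[0].match(/\S*tjuv\S*/i)[0];

console.log(hits.length);              // 1
console.log(wholeLine === phrases[0]); // true
console.log(wordOnly);                 // cykeltjuv
```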

It might be good to see some words that should match (or should not match). More than that, it would be a good idea to specify what programming language you are using.
