I want to create a regular expression that finds the word tjuv (Swedish for "thief"), which can be combined with other words (see the examples below) and/or appear in different inflected forms.
Examples:
- cykeltjuv
- biltjuv
- tjuvarna
- inbrottstjuvs
The one below works for tjuv and tjuvs (a thief's), but what about the other inflected forms, as well as combinations with other words?
/tjuv(?:s){0,1}/ig
Now that I've taught you a little Swedish, it's only fair that you teach me some regular expressions ;-)
EDIT: To be more specific, I can't think of any case where a word containing tjuv shouldn't match.
What I am doing is searching through phrases that contain the word tjuv, for example (English translations in parentheses):
1. När en familj kom hem från en utlandssemester upptäckte de att en inbrottstjuv hade varit i farten. (When a family came home from a holiday abroad, they discovered that a burglar had been at work.) <- MATCH!
2. På juldagen hade en cykeltjuv varit framme och stulit en cykel. (On Christmas Day, a bike thief had been out and stolen a bicycle.) <- MATCH
3. Violer är blå och rosor är röda (Violets are blue and roses are red) <- No 'tjuv' and therefore no match
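If the goal is only to decide whether a phrase contains tjuv at all, as in the three examples, a plain case-insensitive /tjuv/i test may already be enough, with nothing around it. This is a minimal sketch that assumes JavaScript (the question does not name a language) and plain-string phrases:

```javascript
// The example phrases from the question (Swedish originals).
const phrases = [
  "När en familj kom hem från en utlandssemester upptäckte de att en inbrottstjuv hade varit i farten.",
  "På juldagen hade en cykeltjuv varit framme och stulit en cykel.",
  "Violer är blå och rosor är röda",
];

// Deliberately no "g" flag: with "g", RegExp.prototype.test() stores state
// in lastIndex between calls, which can skip matches when the regex is reused.
const containsTjuv = /tjuv/i;

// Keep only the phrases where "tjuv" occurs anywhere (inside compounds too).
const matching = phrases.filter((p) => containsTjuv.test(p));
for (const p of matching) console.log(p);
```

This keeps the first two phrases and drops the third. The patterns in the comments and answers below matter mainly when you want to extract the whole word, not just test for it.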
edited Jan 17, 2013 at 21:45
asked Jan 17, 2013 at 21:32 by holyredbeard
/tjuv(?:s){0,1}/ig is much too complicated. Use /tjuvs?/ig instead. – speakr Commented Jan 17, 2013 at 21:36
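The two patterns are equivalent: (?:s){0,1} and s? both mean "at most one s". A short JavaScript sketch (JavaScript is assumed here) runs both over the question's example words:

```javascript
// Both patterns mean: "tjuv", optionally followed by a single "s".
const verbose = /tjuv(?:s){0,1}/gi;
const concise = /tjuvs?/gi;

const words = ["tjuv", "tjuvs", "cykeltjuv", "tjuvarna", "inbrottstjuvs"];

for (const w of words) {
  // With the "g" flag, match() returns every matched substring (or null)
  // and resets lastIndex first, so reusing the regex objects is safe.
  console.log(w, w.match(verbose), w.match(concise));
}
```

Both return only the tjuv or tjuvs part (for example just "tjuv" out of "tjuvarna"), never the whole compound word. That is why the answers below add characters before and after tjuv.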
Is the first one "bike thief"? What about just /.*tjuv.*/? – Bryan Glazer Commented Jan 17, 2013 at 21:36
@BryanGlazer: So it's actually THAT easy? – holyredbeard Commented Jan 17, 2013 at 21:37
That would match tjuv surrounded by anything. Do you need something more specific? – Bryan Glazer Commented Jan 17, 2013 at 21:38
What kinds of strings should not match? Not matching is the real meat of regex. – Evan Davis Commented Jan 17, 2013 at 21:38
3 Answers
I think this is what you want: the word "tjuv" with other letters before and/or after it:
/[a-z]*tjuv[a-z]*/ig
See it here on Regexr
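To see what the pattern pulls out of running text, here is a small JavaScript sketch (JavaScript is an assumption) applied to the two matching example sentences from the question:

```javascript
const text =
  "När en familj kom hem från en utlandssemester upptäckte de att en inbrottstjuv hade varit i farten. " +
  "På juldagen hade en cykeltjuv varit framme och stulit en cykel.";

// Any run of ASCII letters, then "tjuv", then any run of ASCII letters.
// "g" returns every match; "i" makes [a-z] also cover A-Z.
const wordWithTjuv = /[a-z]*tjuv[a-z]*/gi;

const found = text.match(wordWithTjuv);
console.log(found);
```

Here it returns the whole compounds "inbrottstjuv" and "cykeltjuv" rather than just "tjuv".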
But [a-z] is a character class covering only the ASCII characters a to z (case-insensitive because of the i modifier), and Swedish also has letters outside that range, such as å, ä and ö.
So either you
- add the missing characters to the character class, for example [a-zåäö]
or
- depending on your regex flavour, use \p{L} instead. \p{L} is a Unicode property escape that matches any letter in any language. It would then look like: /\p{L}*tjuv\p{L}*/ig (in JavaScript, \p{...} escapes only work with the u flag, i.e. /\p{L}*tjuv\p{L}*/giu).
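The difference shows up as soon as a non-ASCII letter sits next to tjuv. This JavaScript sketch uses tjuvåka purely as an illustrative input. It assumes the text is in precomposed (NFC) form: a decomposed å is "a" plus a combining mark, and the mark is \p{M}, not \p{L}.

```javascript
// Illustrative input: letters continue with "å" right after "tjuv".
const text = "tjuvåka";

// ASCII-only class: the match stops at the first non-ASCII letter.
const ascii = /[a-z]*tjuv[a-z]*/gi;

// Unicode letter property; in JavaScript this requires the "u" flag (ES2018+).
const unicode = /\p{L}*tjuv\p{L}*/giu;

console.log(text.match(ascii));   // [ 'tjuv' ]
console.log(text.match(unicode)); // [ 'tjuvåka' ]
```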
I don't think that
/.*tjuv.*/
is good: it matches all the text around tjuv on the line, not just the word. This is better:
\w*(tjuv)\w*
It matches all the words from your list (and any other word with "tjuv" in it).
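Here is a JavaScript sketch (JavaScript assumed) of what this pattern returns. It also shows a caveat worth knowing: in JavaScript, \w means only [A-Za-z0-9_], so it stops at å, ä and ö in the same way [a-z] does.

```javascript
const pattern = /\w*(tjuv)\w*/gi;
const text = "cykeltjuv biltjuv tjuvarna inbrottstjuvs";

// matchAll() (ES2020, requires the "g" flag) yields one match object per hit:
// m[0] is the whole word, m[1] is the capture group (the "tjuv" part).
const found = [...text.matchAll(pattern)].map((m) => [m[0], m[1]]);
for (const [word, group] of found) console.log(word, group);

// Caveat: \w is ASCII-only here, so the match ends at "å".
console.log("tjuvåka".match(pattern)); // [ 'tjuv' ]
```

The capture group around tjuv is not needed just to find the words. It only helps if you later want to refer to the tjuv part on its own.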
As far as I understand the question, you are looking for words that contain any string before and/or after tjuv. In regular expressions, you can normally use the dot . to denote an arbitrary character. Therefore tjuv. matches tjuvA, tjuvX, tjuvs, ...
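One consequence of "exactly one arbitrary character" is that a bare tjuv does not match tjuv., because nothing follows it. A quick JavaScript check (JavaScript assumed):

```javascript
// "." stands for exactly one character (any character except line terminators).
const tjuvPlusOne = /tjuv./;

for (const w of ["tjuvA", "tjuvX", "tjuvs", "tjuv"]) {
  console.log(w, tjuvPlusOne.test(w));
}
```

The first three print true. The last prints false, which is why the star is needed.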
If you want an arbitrary number of such characters, use the star *. With tjuv.* you can match tjuvABC, tjuvs, tjuv (in the last case the star expands to zero characters!), ...
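The same three words checked against tjuv.* in a short JavaScript sketch (JavaScript assumed):

```javascript
// ".*" means zero or more arbitrary characters, so plain "tjuv" matches too.
const tjuvThenAnything = /tjuv.*/;

for (const w of ["tjuvABC", "tjuvs", "tjuv"]) {
  // Without the "g" flag, match() returns an array whose [0] is the matched text.
  console.log(w, w.match(tjuvThenAnything)[0]);
}
```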
So I think /.*tjuv.*/ could be what you want. However, here . also matches whitespace characters, so in something xxxtjuvyyy somethingelse the regexp matches the whole string instead of just xxxtjuvyyy, which might not be what you want.
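This JavaScript sketch (JavaScript assumed) shows the whitespace problem on one of the question's sentences. It contrasts /.*tjuv.*/ with \S*tjuv\S*, a swapped-in variant that the answer does not propose. Its \S ("any non-whitespace character") stops at spaces, but it will also pick up adjacent punctuation, e.g. a trailing comma or full stop.

```javascript
const sentence = "På juldagen hade en cykeltjuv varit framme och stulit en cykel.";

// "." also matches spaces, so the match extends to both ends of the line.
const wholeLine = /.*tjuv.*/i;

// \S excludes whitespace, so the match stays within one space-separated token.
const oneToken = /\S*tjuv\S*/i;

console.log(sentence.match(wholeLine)[0]); // the entire sentence
console.log(sentence.match(oneToken)[0]);  // cykeltjuv
```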
It might be good to see some words that should match (or should not match). More than that, it would be a good idea to specify what programming language you are using.