javascript - How to check Bosnian-specific characters in RegEx? - Stack Overflow

I have this Regular Expression pattern, which is quite simple and it validates if the provided string is "alpha" (both uppercase and lowercase):

var pattern = /^[a-zA-Z]+$/gi;

When I trigger pattern.test('Zlatan Omerovic') it returns true, however if I:

pattern.test('Zlatan Omerović');

It returns false and it fails my validation.

In Bosnian language we have these specific characters:

š đ č ć ž

And uppercased:

Š Đ Č Ć Ž

Is it possible to validate these characters (both cases) with JavaScript regular expression?

asked Apr 12, 2013 at 22:05 by user1386320
  • Yes, what have you tried? Protip: just add those between the square brackets. – Fabrício Matté Commented Apr 12, 2013 at 22:06
  • @FabrícioMatté - exactly that, what you see in the question :) – user1386320 Commented Apr 12, 2013 at 22:07
  • I meant, it looks like you just copypasta'd some regex that validates alphabetical characters but ok. If you look into the meaning of those square brackets - a character class - you'd know how to fix such regex. – Fabrício Matté Commented Apr 12, 2013 at 22:09
  • @FabrícioMatté: The character class a-z could well encompass š to a Bosnian. It doesn't in JavaScript, but that doesn't make it illogical from a non-English perspective. – T.J. Crowder Commented Apr 12, 2013 at 22:13
  • @T.J.Crowder I believe JS's character classes' ranges are ASCII code based, no? In that case a-z represents characters 97-122 (and 65-90 with the case-insensitive flag) only. Or these are UTF-8 based, not sure. – Fabrício Matté Commented Apr 12, 2013 at 22:15

3 Answers

Sure, you can just add those characters to the list of characters you're matching. Also, since you're doing a case-insensitive match (the i flag), you don't need the uppercase characters.

var pattern = /^[a-zšđčćž ]+$/gi;

Fiddle here: http://jsfiddle.net/ryanbrill/KB74b/

Here's an alternate pattern that uses Unicode escape sequences, which might be better (embedding the literal characters won't work if the file isn't saved with the proper encoding, for instance):

var pattern = /^[a-z\u0161\u0111\u010D\u0107\u017E ]+$/gi;

http://jsfiddle.net/ryanbrill/KB74b/2/
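
As a rough check of the two patterns above, here is a minimal sketch. The sample strings and variable names are made up for illustration. It also drops the g flag for the reusable patterns, because a regex with g keeps a lastIndex between test() calls, so calling test() repeatedly on the same pattern object can alternate between true and false.

```javascript
// Sketch only: the two patterns from the answer, minus the `g` flag.
// Variable names and sample inputs are illustrative assumptions.
const literalPattern = /^[a-zšđčćž ]+$/i;
const escapedPattern = /^[a-z\u0161\u0111\u010D\u0107\u017E ]+$/i;

const names = [
  "Zlatan Omerovic",  // plain ASCII letters and a space
  "Zlatan Omerović",  // contains ć
  "ŠĐČĆŽ šđčćž",      // uppercase forms are covered by the i flag
  "Zlatan 0merović",  // contains the digit 0, so it should be rejected
];

const results = names.map((name) => ({
  name,
  literal: literalPattern.test(name),
  escaped: escapedPattern.test(name),
}));
console.log(results);

// With `g`, test() is stateful: a successful match moves lastIndex to the
// end of the string, so the next call starts there and fails.
const stateful = /^[a-zšđčćž ]+$/gi;
const firstCall = stateful.test("Zlatan Omerović");
const secondCall = stateful.test("Zlatan Omerović");
console.log(firstCall, secondCall);
```

For whole-string validation with ^ and $, the i flag alone is enough; g only matters when you scan for multiple matches.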

a-zA-Z means exactly that, and in an English-centric way: abcdefghijklmnopqrstuvwxyz. Sadly, with JavaScript's regular expressions, if you want to test other alphabetic characters, you have to list them explicitly. JavaScript doesn't have a locale-sensitive "alpha" definition. You can include the non-English characters either literally (for instance, by putting š in the regular expression) or with Unicode escape sequences (such as \u0161). If the additional Bosnian alphabetic characters in question form a contiguous range, you can use the - notation with them as well, but it has to be separate from the a-z, which is defined in English terms.
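
To make that concrete, the following sketch (variable names are illustrative) checks that none of the ten Bosnian letters fall inside a-zA-Z, that the literal and escaped forms denote the same character, and prints each letter's code point so you can see which ones sit next to each other.

```javascript
// Sketch only: inspect the Bosnian letters from the question.
const bosnianLetters = [..."šđčćžŠĐČĆŽ"];

// a-z and A-Z cover only U+0061..U+007A and U+0041..U+005A,
// so none of these letters are matched by the plain class.
const outsideAsciiRange = bosnianLetters.every(
  (ch) => !/^[a-zA-Z]$/.test(ch)
);

// A literal character and its Unicode escape are the same string.
const sameCharacter = "\u0161" === "š";

// Format each letter with its code point, e.g. "š U+0161".
const codePoints = bosnianLetters.map((ch) => {
  const hex = ch.charCodeAt(0).toString(16).toUpperCase().padStart(4, "0");
  return `${ch} U+${hex}`;
});

console.log(outsideAsciiRange, sameCharacter);
console.log(codePoints.join(", "));
```

Each uppercase/lowercase pair turns out to be adjacent (Š and š are U+0160 and U+0161, for example), but the five pairs are not contiguous with one another, so a single range cannot cover them all.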

To make the test accept the first of your five characters (the S-based one, Š and š), I did:

var pattern = /^[a-zA-Z\u0160-\u0161]+$/g;

Try to add all the symbols you need this way ;)
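
Following that suggestion, one possible completed pattern might look like the sketch below. The isBosnianAlpha helper is a hypothetical name. The g flag is left out so repeated test() calls behave the same, and, like the pattern above, this one does not allow spaces.

```javascript
// Sketch only: each uppercase/lowercase pair is a two-character range.
//   Ć ć = U+0106-U+0107    Č č = U+010C-U+010D    Đ đ = U+0110-U+0111
//   Š š = U+0160-U+0161    Ž ž = U+017D-U+017E
const bosnianAlpha =
  /^[a-zA-Z\u0106-\u0107\u010C-\u010D\u0110-\u0111\u0160-\u0161\u017D-\u017E]+$/;

// Hypothetical helper: true when text is non-empty and contains only
// English or Bosnian letters (no spaces, digits, or punctuation).
function isBosnianAlpha(text) {
  return bosnianAlpha.test(text);
}

console.log(isBosnianAlpha("Omerović"));        // letters only
console.log(isBosnianAlpha("ŠĐČĆŽšđčćž"));      // all ten special letters
console.log(isBosnianAlpha("Zlatan Omerović")); // contains a space
```

To accept full names, add a space inside the character class, as the first answer does.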
