I can't get this JavaScript function to work the way I want:
// matches a String that contains kanji and/or kana character(s)
String.prototype.isKanjiKana = function(){
return !!this.match(/^[\u4E00-\u9FAF|\u3040-\u3096|\u30A1-\u30FA|\uFF66-\uFF9D|\u31F0-\u31FF]+$/);
}
It returns true if the string consists entirely of kanji and/or kana characters, and false if any Latin letters or other characters are present.
I would like it to return true if at least one kanji or kana character is present, rather than only when all of them are.
Thank you in advance for any help!
Asked Sep 8, 2011 by Mikele; edited Apr 4, 2015 by tchrist.
6 Answers
The right answer is not to hardcode ranges. Never put magic numbers in your code! That is a maintenance nightmare: hard to read, hard to write, hard to debug, and hard to maintain. How do you know you got the numbers right? What happens when Unicode adds new characters? No, do not use magic numbers. Please.
The right answer is to use named Unicode scripts, which are a fundamental property of every Unicode code point:
[\p{Han}\p{Hiragana}\p{Katakana}]
That requires the XRegExp plugin for JavaScript.
The real problem is that JavaScript regexes on their own are too primitive to support Unicode properties, and therefore to support Unicode. Maybe that was an acceptable compromise 15 years ago, but today it is nothing less than intolerably negligent, as you yourself have discovered.
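You will also miss a few Common code points that the newer Script Extensions property assigns to kana, but that probably does not matter. You could just add \p{Common} to the set above, though \p{Common} also covers ASCII spaces, digits, and punctuation, so in an "at least one character" test it would match almost any string containing one of those.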
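The Script versus Script Extensions distinction above can be seen with U+30FC (ー, the prolonged sound mark in words like ラーメン). A minimal sketch, assuming an engine that supports native `\p{...}` escapes with the `u` flag (ES2018 or later) rather than XRegExp; the variable names are illustrative only:

```javascript
// U+30FC KATAKANA-HIRAGANA PROLONGED SOUND MARK, as in "ラーメン".
const prolongedSoundMark = "\u30FC";

// Script=Katakana uses the Script property only; there U+30FC is Common.
const byScript = /\p{Script=Katakana}/u;

// Script_Extensions=Katakana also covers Common/Inherited characters
// that are used together with Katakana, which includes U+30FC.
const byScriptExtensions = /\p{Script_Extensions=Katakana}/u;

console.log(byScript.test(prolongedSoundMark));           // false
console.log(byScriptExtensions.test(prolongedSoundMark)); // true
```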
Now that Unicode property escapes are part of the ECMAScript 2018 specification, the following regex can be used natively if the JS engine supports this feature (expanding on @tchrist's answer):
/[\p{Script_Extensions=Han}\p{Script_Extensions=Hiragana}\p{Script_Extensions=Katakana}]/u
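Applied to the original question, the regex can back an "at least one kanji or kana" check. A sketch assuming ES2018 support; `containsKanjiKana` is a hypothetical helper name, used here instead of extending `String.prototype`:

```javascript
// At least one Han, Hiragana or Katakana code point (by Script_Extensions)
// anywhere in the string. No "g" flag, so .test() keeps no lastIndex state.
const KANJI_KANA =
  /[\p{Script_Extensions=Han}\p{Script_Extensions=Hiragana}\p{Script_Extensions=Katakana}]/u;

// Hypothetical helper name, not part of any library.
function containsKanjiKana(str) {
  return KANJI_KANA.test(str);
}

console.log(containsKanjiKana("hello"));         // false
console.log(containsKanjiKana("hello 世界"));    // true
console.log(containsKanjiKana("カタカナ only")); // true
```

Because the class is not anchored, one matching character anywhere is enough, which is exactly the behaviour the question asks for.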
If you want to exclude punctuation from being matched:
/(?!\p{Punctuation})[\p{Script_Extensions=Han}\p{Script_Extensions=Hiragana}\p{Script_Extensions=Katakana}]/u
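The punctuation exclusion matters because some CJK punctuation, such as the ideographic comma 、 (U+3001), lists Han, Hiragana and Katakana in its Script_Extensions. A small comparison, assuming ES2018 property escapes are available; variable names are illustrative:

```javascript
const scx =
  /[\p{Script_Extensions=Han}\p{Script_Extensions=Hiragana}\p{Script_Extensions=Katakana}]/u;
// The negative lookahead rejects a position whose character is punctuation (General_Category P).
const scxNoPunct =
  /(?!\p{Punctuation})[\p{Script_Extensions=Han}\p{Script_Extensions=Hiragana}\p{Script_Extensions=Katakana}]/u;

const ideographicComma = "\u3001"; // 、

console.log(scx.test(ideographicComma));        // true
console.log(scxNoPunct.test(ideographicComma)); // false
console.log(scxNoPunct.test("、あ"));           // true, because あ is not punctuation
```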
/[\u3000-\u303f]|[\u3040-\u309f]|[\u30a0-\u30ff]|[\uff00-\uffef]|[\u4e00-\u9faf]|[\u3400-\u4dbf]/
- Japanese-style punctuation: [\u3000-\u303f]
- Hiragana: [\u3040-\u309f]
- Katakana: [\u30a0-\u30ff]
- Full-width Roman characters and half-width katakana: [\uff00-\uffef]
- Kanji: [\u4e00-\u9faf]|[\u3400-\u4dbf]
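To keep these ranges readable (and partly answer the "magic numbers" objection), they can be named once and merged into a single character class, which is equivalent to the alternation of one-range classes. A sketch only; the `RANGES` keys are hypothetical labels for the ranges listed above:

```javascript
// Hypothetical names for the ranges listed above (all in the BMP, so no "u" flag is needed).
const RANGES = {
  japanesePunctuation: "\\u3000-\\u303f",
  hiragana: "\\u3040-\\u309f",
  katakana: "\\u30a0-\\u30ff",
  fullwidthRomanAndHalfwidthKatakana: "\\uff00-\\uffef",
  kanjiUnified: "\\u4e00-\\u9faf",
  kanjiExtensionA: "\\u3400-\\u4dbf",
};

// One class containing every range: matches if any single character falls in any range.
const JAPANESE = new RegExp("[" + Object.values(RANGES).join("") + "]");

console.log(JAPANESE.test("abc"));     // false
console.log(JAPANESE.test("abcかな")); // true
console.log(JAPANESE.test("ＡＢＣ"));  // true, full-width Latin is in U+FF00-U+FFEF
```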
String.prototype.isKanjiKana = function(){
return !!this.match(/[\u4E00-\u9FAF\u3040-\u3096\u30A1-\u30FA\uFF66-\uFF9D\u31F0-\u31FF]/);
}
Don't anchor it to the beginning and end of the string with ^ and $; the + is also useless in this case, since a single matching character is enough.
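The effect of dropping the anchors can be checked side by side. This sketch uses the question's ranges with the `|` separators removed; the variable names are illustrative:

```javascript
// Anchored with +: every character of the string must be kanji/kana.
const allKanjiKana = /^[\u4E00-\u9FAF\u3040-\u3096\u30A1-\u30FA\uFF66-\uFF9D\u31F0-\u31FF]+$/;
// Unanchored single class: at least one kanji/kana character anywhere.
const anyKanjiKana = /[\u4E00-\u9FAF\u3040-\u3096\u30A1-\u30FA\uFF66-\uFF9D\u31F0-\u31FF]/;

const mixed = "Tokyo 東京";
console.log(allKanjiKana.test(mixed));   // false
console.log(anyKanjiKana.test(mixed));   // true
console.log(anyKanjiKana.test("Tokyo")); // false
```

Without anchors, `X+` succeeds wherever `X` does, so the `+` only makes the match longer and changes nothing for a yes/no test.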
Why not just this? It returns true when the string contains at least one kanji (the range runs from U+4E00 一 to U+9FAF 龯, so kana are not covered).
/[一-龯]/.test(str)
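A quick check of what this range does and does not cover; `hasKanji` is a hypothetical wrapper name:

```javascript
// U+4E00 (一) to U+9FAF (龯): kanji only, no hiragana or katakana.
const hasKanji = (str) => /[一-龯]/.test(str);

console.log(hasKanji("日本語"));   // true
console.log(hasKanji("ひらがな")); // false
console.log(hasKanji("カタカナ")); // false
```

If kana must count too, their ranges (or the Script_Extensions escapes) have to be added to the class.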
/[\u4E00-\u9FAF\u3040-\u3096\u30A1-\u30FA\uFF66-\uFF9D\u31F0-\u31FF]/
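Inside `[...]`, `|` is an ordinary character rather than alternation, so separating ranges with `|` (as in the question's original pattern) makes the class also match a literal pipe. A minimal demonstration with two of the ranges; variable names are illustrative:

```javascript
// "|" inside a character class is just another member of the set.
const withPipes = /[\u3040-\u3096|\u30A1-\u30FA]/;
const withoutPipes = /[\u3040-\u3096\u30A1-\u30FA]/;

console.log(withPipes.test("a|b"));     // true, matches the literal "|"
console.log(withoutPipes.test("a|b"));  // false
console.log(withoutPipes.test("aかb")); // true
```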