javascript - Strange length of accent as "é" string return 2

I have a strange problem that I can't explain. I'm trying to manipulate a string containing an accented character such as "é". The string comes from the name of an image selected through an input of type file.

What I can't understand is why, when I iterate over my string with a for loop, the accented character is split into two characters. Here is an example:

My é is split into two characters, like this: e & ́.

"é".length
=> 2

Is it possible that UTF-8 is involved?

I really don't understand this at all!


asked Sep 2, 2013 at 17:24 by hypee

Which browser are you using? It returns 1 in my Chrome. – MD Sayem Ahmed Commented Sep 2, 2013 at 17:26
In some rare cases it is possible to write this letter with two characters. I read about this in the context of LaTeX. – rekire Commented Sep 2, 2013 at 17:27
Your character also returns 1 on Firefox. – Kevin Ji Commented Sep 2, 2013 at 17:29


2 Answers


They are called Combining Diacritical Marks. They are a "piece" of Unicode: combinable diacritics that can be "chained" onto any base character. In that case the length of the string is clearly 2, because there is the e and the combining acute accent (U+0301). Precomposed characters like àéèìòù have been kept for compatibility, but now any character can be accented :-) Clearly 99% of programmers don't know this, and 99.9% of programs support it very badly. I'm quite sure they could be used as an attack vector somewhere (but I'm not paranoid :-) )

I'll add that even Jon Skeet, in 2009, wasn't sure how they worked: http://codeblog.jonskeet.uk/2009/11/02/omg-ponies-aka-humanity-epic-fail/

You see, I couldn't remember whether combining characters came before or after base characters

:-) :-)

Rather than UTF-8, it's more likely that combining diacritical marks are involved.

>>> "e\u0301"
"é"
>>> "e\u0301".length
2
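
If the goal is to get a length of 1 for such a string, one option is Unicode normalization. The following is only a sketch, assuming an environment that supports `String.prototype.normalize` (ES2015 and later); `normalizeFileName` is a hypothetical helper name, not something from the question.

```javascript
// Decomposed form: "e" followed by U+0301 COMBINING ACUTE ACCENT.
const decomposed = "e\u0301";
// Precomposed form: the single code point U+00E9.
const precomposed = "\u00e9";

console.log(decomposed.length);          // 2
console.log(precomposed.length);         // 1
console.log(decomposed === precomposed); // false, although both render as "é"

// NFC composes a base character and its combining marks
// whenever a precomposed code point exists.
const normalized = decomposed.normalize("NFC");
console.log(normalized.length);          // 1
console.log(normalized === precomposed); // true

// Hypothetical helper: normalize a file name before measuring or comparing it.
function normalizeFileName(name) {
  return name.normalize("NFC");
}
```

Normalization only helps when a precomposed form exists. A base character with an accent that has no precomposed equivalent will still have a length greater than 1 after NFC.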

JavaScript strings are usually encoded as UTF-16, and `length` counts UTF-16 code units, so a precomposed "é" (U+00E9) fits in a single code unit and has length 1.

But characters outside the BMP (those with code points beyond U+FFFF) will have length 2, as they are encoded as two UTF-16 code units (a surrogate pair).
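
As a rough illustration of the surrogate-pair case, the sketch below uses U+1D11E (MUSICAL SYMBOL G CLEF) as an example of a character outside the BMP. It assumes an ES2015 environment; `countCodePoints` is a hypothetical helper name introduced here.

```javascript
// U+1D11E lies outside the BMP, so UTF-16 needs a surrogate pair for it.
const gClef = "\u{1D11E}";

console.log(gClef.length);                      // 2 (UTF-16 code units)
console.log(gClef.codePointAt(0).toString(16)); // "1d11e"

// Hypothetical helper: count code points instead of code units.
// Array.from uses the string iterator, which walks code points.
function countCodePoints(str) {
  return Array.from(str).length;
}

console.log(countCodePoints(gClef));      // 1
// A combining mark is its own code point, so this is still 2.
console.log(countCodePoints("e\u0301"));  // 2
```

Counting code points fixes the surrogate-pair case but not the combining-mark case, which is why the two situations are worth keeping apart.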

>>> "
