I just don't get it.
My case is, that my application is sending all the needed GUI text by JSON at page startup from my PHP server. On my PHP server I have all text special characters written in UTF-8. Example: Für
So on the client side I have exactly the same value, and it gets displayed nicely everywhere except on input fields. When I do this with JavaScript:
document.getElementById('myInputField').value = "FÖr";
Then it is written exactly like that without any transformation into the special character.
Did I understand something wrong in UTF-8 concepts?
Thanks for any hints.
I just don't get it.
My case is, that my application is sending all the needed GUI text by JSON at page startup from my PHP server. On my PHP server I have all text special characters written in UTF-8. Example: Für
So on the client side I have exactly the same value, and it gets displayed nicely everywhere except on input fields. When I do this with JavaScript:
document.getElementById('myInputField').value = "FÖr";
Then it is written exactly like that without any transformation into the special character.
Did I understand something wrong in UTF-8 concepts?
Thanks for any hints.
Share Improve this question edited May 19, 2012 at 16:14 creemama 6,6653 gold badges22 silver badges26 bronze badges asked May 19, 2012 at 13:28 ChrisChris 3133 silver badges16 bronze badges2 Answers
Reset to default 4The notation ü
has nothing particular to do with UTF-8. The use of character references is a mon way of avoiding the need to use UTF-8; they can be used with any encoding, but if you use UTF-8, you don’t need them.
The notation ü
is an HTML notation, not JavaScript. Whether it gets interpreted by HTML rules when it appears inside your JavaScript code depends on the context (like JavaScript inside an HTML document vs. separate JavaScript file). This problem is best avoided by using either characters as such or by using JavaScript notations for characters.
For example, ü
means the same as ü
, i.e. U+00FC, ü (u with diaeresis). The JavaScript notation, for use inside string literals, for this is \u00fc
(\u
followed by exactly four hexadecimal digits). E.g., the following sets the value to “Für”:
document.getElementById('myInputField').value = "F\u00fcr";
Your using whats called HTML entities to encode characters which it not the same as UTF-8, but of course a UTF-8 string can include HTML entities.
I think the problem is that tag attributes can't include HTML entities so you have to use some other encoding when assigning the text input value attribute. I think you have two options:
Decode the HTML entity on the client side. A quite ugly solution to piggyback on the decoder available in the browser (im using jQuery in the example, but you probably get the point).
inputElement.value = $("<p/>").html("FÖr").text();
Another option, which is think is nicer, is to not send HTML entities in the server response but instead use proper UTF-8 encoding for all characters which should work fine when put into text nodes or tag attributes. This assumes the HTML page uses UTF-8 encoding of course.