I'm currently working on a C++ project on macOS, using Clang as my compiler. I've encountered a problem related to the locale settings when dealing with wide characters. Here is a simplified version of my code:
#include <iostream>
#include <locale>
#include <string>
using namespace std;
int main() {
    locale zhLocale("");
    wcin.imbue(zhLocale);
    wcout.imbue(zhLocale);
    wstring input;
    getline(wcin, input);
    wcout << input << endl;
    return 0;
}
and the input is:
你好
output:
你你你好
While debugging, I found that the input variable becomes L"\U00000002\U00000002你你你好".
The input is wrong both when I launch the program normally and when I run it under the debugger.
This is my environment:
$ clang++ --version
Apple clang version 16.0.0 (clang-1600.0.26.6)
Target: arm64-apple-darwin24.3.0
Thread model: posix
InstalledDir: /Library/Developer/CommandLineTools/usr/bin
$ locale
LANG="en_US.UTF-8"
LC_COLLATE="en_US.UTF-8"
LC_CTYPE="en_US.UTF-8"
LC_MESSAGES="en_US.UTF-8"
LC_MONETARY="en_US.UTF-8"
LC_NUMERIC="en_US.UTF-8"
LC_TIME="en_US.UTF-8"
LC_ALL=
I would appreciate it if anyone could help me figure out what's going wrong and how to fix it. Is this a bug in Clang's handling of locale settings on macOS, or am I doing something wrong in my code?
I believe the code is correct, and I expect the output to equal the input.
Asked yesterday by Craven Mueller
- Does it work if you avoid imbuing that locale with the empty name? – Öö Tiib Commented 20 hours ago
- @ÖöTiib It works, but it behaves like cin: in the debugger the input is not indexed by Chinese character, such as ['你', '好'], so I can't traverse each Chinese character. In the debugger the input is [L'\U0000fffd', L'\U00000001', L'\U00000006', L'\0', L'\n']. – Craven Mueller Commented 15 hours ago
- What does "works as cin" mean? Can you edit your question to add a description of the new situation? I do not know which specification you took std::locale("") from, so the code as posted is confusing. – Öö Tiib Commented 21 mins ago
1 Answer
Wide characters on macOS are four bytes; you might be expecting two bytes.
Switch to UTF-8 if that is at all possible.
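If switching to UTF-8 is an option, one way to still walk the text character by character is to read the line into a plain std::string and decode it yourself. The following is only a minimal sketch under that assumption: decode_utf8 is a hypothetical helper written for this answer (it is not a standard library function), and it replaces malformed or truncated sequences with U+FFFD without rejecting overlong encodings or surrogates.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper (not part of the standard library): decodes a UTF-8
// byte string into Unicode code points. Malformed or truncated sequences
// become U+FFFD; overlong encodings and surrogates are not rejected.
std::vector<char32_t> decode_utf8(const std::string& bytes) {
    std::vector<char32_t> out;
    std::size_t i = 0;
    while (i < bytes.size()) {
        unsigned char lead = static_cast<unsigned char>(bytes[i]);
        char32_t cp = 0;
        std::size_t extra = 0;  // number of continuation bytes expected

        if (lead < 0x80)              { cp = lead;        extra = 0; }
        else if ((lead >> 5) == 0x06) { cp = lead & 0x1F; extra = 1; }
        else if ((lead >> 4) == 0x0E) { cp = lead & 0x0F; extra = 2; }
        else if ((lead >> 3) == 0x1E) { cp = lead & 0x07; extra = 3; }
        else { out.push_back(0xFFFD); ++i; continue; }  // invalid lead byte

        // The sequence runs past the end of the input: stop here.
        if (i + extra >= bytes.size()) { out.push_back(0xFFFD); break; }

        bool ok = true;
        for (std::size_t k = 1; k <= extra; ++k) {
            unsigned char b = static_cast<unsigned char>(bytes[i + k]);
            if ((b & 0xC0) != 0x80) { ok = false; break; }  // not 10xxxxxx
            cp = (cp << 6) | (b & 0x3F);
        }

        if (ok) { out.push_back(cp); i += extra + 1; }
        else    { out.push_back(0xFFFD); ++i; }  // resync one byte later
    }
    return out;
}
```

With this, you would read the line with std::getline(std::cin, line) into a std::string, call decode_utf8(line), and loop over the returned vector. For the input 你好 the result should hold two elements, U+4F60 and U+597D, and writing the original std::string back with std::cout keeps the bytes unchanged, so no locale needs to be imbued.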