c++ - Clang locale issue on macOS when handling wide characters - Stack Overflow

I'm currently working on a C++ project on macOS, using Clang as my compiler. I've encountered a problem related to the locale settings when dealing with wide characters. Here is a simplified version of my code:

#include <iostream>
#include <locale>
#include <string>
using namespace std;
int main() {
    locale zhLocale("");
    wcin.imbue(zhLocale);
    wcout.imbue(zhLocale);

    wstring input;
    getline(wcin, input);
    wcout << input << endl;

    return 0;
}

The input is:

你好

The output is:

你你你好

While debugging, I found that the variable input holds L"\U00000002\U00000002你你你好".

The input is wrong both when launching the program and when debugging it.

This is my environment:

$ clang++ --version
Apple clang version 16.0.0 (clang-1600.0.26.6)
Target: arm64-apple-darwin24.3.0
Thread model: posix
InstalledDir: /Library/Developer/CommandLineTools/usr/bin

$ locale                                        
LANG="en_US.UTF-8"
LC_COLLATE="en_US.UTF-8"
LC_CTYPE="en_US.UTF-8"
LC_MESSAGES="en_US.UTF-8"
LC_MONETARY="en_US.UTF-8"
LC_NUMERIC="en_US.UTF-8"
LC_TIME="en_US.UTF-8"
LC_ALL=

I would appreciate it if anyone could help me figure out what's going wrong and how to fix it. Is this a bug in Clang's handling of locale settings on macOS, or am I doing something wrong in my code?

I tried what I believe is correct code, and I expected the output to match the input.

Asked yesterday by Craven Mueller

3 Comments
  • Does it work if you avoid imbuing the locale with the empty name? – Öö Tiib Commented 20 hours ago
  • @ÖöTiib It works, but it behaves like cin: in the debugger, input is not split into individual Chinese characters such as ['你', '好'], so I can't traverse the Chinese characters one by one. The debugger shows input as [L'\U0000fffd', L'\U00000001', L'\U00000006', L'\0', L'\n']. – Craven Mueller Commented 15 hours ago
  • What does "works as cin" mean? Can you edit your question to describe the new situation? I do not know which specification you took std::locale("") from, so the code as posted is confusing. – Öö Tiib Commented 21 mins ago

1 Answer

Wide characters on macOS are four bytes; you might be expecting two bytes.
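
To see what the width is on a given machine, something like the following could be used. It is only a sketch: the helper names wcharBits, fitsInOneWchar and describeWchar are made up for illustration, and it reports the width of wchar_t without saying anything about why wcin misbehaves.

```cpp
#include <cassert>
#include <climits>
#include <cwchar>
#include <string>

// Hypothetical helper: width of wchar_t in bits on the current platform.
constexpr unsigned wcharBits() {
    return static_cast<unsigned>(sizeof(wchar_t) * CHAR_BIT);
}

// Hypothetical helper: can a single wchar_t hold this code point?
// Where wchar_t is 16 bits, code points above U+FFFF need a surrogate pair.
constexpr bool fitsInOneWchar(char32_t codePoint) {
    return static_cast<unsigned long long>(codePoint) <=
           static_cast<unsigned long long>(WCHAR_MAX);
}

// Hypothetical helper: one-line summary, e.g. for logging at startup.
std::string describeWchar() {
    // U+4F60 (你) is in the Basic Multilingual Plane, so it fits
    // whether wchar_t is 16 or 32 bits wide.
    assert(fitsInOneWchar(U'\u4F60'));
    return "wchar_t is " + std::to_string(wcharBits()) + " bits wide";
}
```

Printing describeWchar() from main shows the width directly; with a four-byte wchar_t, each character of 你好 is expected to occupy exactly one element of a wstring.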

Switch to UTF-8 if that is at all possible.
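
One way to do that while still being able to walk over the Chinese characters one by one is to read plain bytes and decode them yourself. The sketch below assumes the terminal sends UTF-8 (as the en_US.UTF-8 settings in the question suggest); decodeUtf8 and demo are hypothetical names, and the decoder is deliberately minimal: it replaces malformed or truncated sequences with U+FFFD but does not reject overlong encodings or surrogates.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical helper: decode UTF-8 bytes into Unicode code points.
// Malformed or truncated sequences are replaced with U+FFFD.
std::u32string decodeUtf8(const std::string& bytes) {
    std::u32string out;
    std::size_t i = 0;
    while (i < bytes.size()) {
        const unsigned char lead = static_cast<unsigned char>(bytes[i]);
        std::size_t extra = 0;  // continuation bytes expected after the lead
        char32_t cp = 0;
        if (lead < 0x80)                { cp = lead;        extra = 0; }
        else if ((lead & 0xE0) == 0xC0) { cp = lead & 0x1F; extra = 1; }
        else if ((lead & 0xF0) == 0xE0) { cp = lead & 0x0F; extra = 2; }
        else if ((lead & 0xF8) == 0xF0) { cp = lead & 0x07; extra = 3; }
        else {
            // Stray continuation byte or invalid lead byte.
            out.push_back(U'\uFFFD');
            ++i;
            continue;
        }
        ++i;
        bool ok = true;
        for (std::size_t k = 0; k < extra; ++k) {
            if (i >= bytes.size()) { ok = false; break; }
            const unsigned char c = static_cast<unsigned char>(bytes[i]);
            if ((c & 0xC0) != 0x80) { ok = false; break; }
            cp = (cp << 6) | (c & 0x3F);  // append the 6 payload bits
            ++i;
        }
        out.push_back(ok ? cp : U'\uFFFD');
    }
    return out;
}

// Usage sketch: the UTF-8 bytes of "你好" decode to two code points.
void demo() {
    const std::u32string cps = decodeUtf8("\xE4\xBD\xA0\xE5\xA5\xBD");
    assert(cps.size() == 2 && cps[0] == 0x4F60 && cps[1] == 0x597D);
}
```

In a program, the line would be read with std::getline(std::cin, line) into a std::string and passed to decodeUtf8; writing the original std::string back to std::cout leaves the bytes untouched, so no locale needs to be imbued on either stream.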
