unicode - When a character in a web page is copied (using CTRL+C) from within a browser, what gets stored in the clipboard: the

Let's say you're on Windows 11 (version 24H2) and using Chrome (version 132). Now, let's say you have a web page open in Chrome and you copy (using Ctrl+C) the following character to the clipboard:

Ϟ

This is character code 990 (U+03DE Greek Letter Koppa), and in UTF-8 it is represented by the byte sequence CF 9E.
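The difference between the character code and its encoded bytes can be checked directly. A minimal sketch in Python (chosen here only for illustration; it is not part of the original question):

```python
# The character from the question, written as an escape so the source
# file's own encoding does not matter.
koppa = "\u03de"

code_point = ord(koppa)                  # the abstract character code (an integer)
utf8_bytes = koppa.encode("utf-8")       # one byte serialization of that code point
utf16_bytes = koppa.encode("utf-16-le")  # another serialization of the same code point

print(code_point, hex(code_point))  # 990 0x3de
print(utf8_bytes.hex(" "))          # cf 9e
print(utf16_bytes.hex(" "))         # de 03
```

The code point stays 990 in every case; only the byte sequence changes with the encoding.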

What gets stored in the clipboard? The character code or byte sequence based on the encoding of the web page?

I have a hunch it's the character code, since that would make whatever you paste it into encoding-agnostic (as long as the destination uses Unicode), but I wanted to ask Stack Overflow to be sure.

asked Jan 31 at 18:09 by user3163495; edited Jan 31 at 20:59 by Remy Lebeau
  • 2 It depends on the browser and the operating system – Daniel A. White Commented Jan 31 at 18:13
  • @DanielA.White I updated my question specifying the browser and operating system. – user3163495 Commented Jan 31 at 20:22

1 Answer

It depends on the browser implementation.

Text can be stored on the Windows clipboard in many different formats: standard formats such as CF_TEXT + CF_LOCALE for ANSI text or CF_UNICODETEXT for Unicode (UTF-16) text, as well as custom registered formats such as CF_HTML (registered under the name "HTML Format").

Apps are encouraged to store as many different formats as is feasible for their purpose, but especially Unicode for text. Any app that then pastes from the clipboard can look at what format(s) are available and decide which one(s) it wants to use. If HTML makes the most sense, it can use that. If Unicode makes the most sense, it can use that instead. And so on.
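The "look at what is available and pick one" step can be sketched as a simple priority lookup. This is an illustrative sketch only: `pick_format` and `editor_priority` are hypothetical names, and the Win32 function GetPriorityClipboardFormat performs a comparable lookup against the real clipboard.

```python
# Standard clipboard format IDs as defined in the Windows SDK headers.
CF_TEXT = 1
CF_OEMTEXT = 7
CF_UNICODETEXT = 13


def pick_format(available, preferred):
    """Return the first format in `preferred` that is also in `available`, or None."""
    available = set(available)
    for fmt in preferred:
        if fmt in available:
            return fmt
    return None


# A plain-text editor might prefer Unicode and fall back to ANSI text.
editor_priority = [CF_UNICODETEXT, CF_TEXT, CF_OEMTEXT]

print(pick_format([CF_TEXT, CF_OEMTEXT, CF_UNICODETEXT], editor_priority))  # 13
print(pick_format([CF_OEMTEXT], editor_priority))                           # 7
print(pick_format([], editor_priority))                                     # None
```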

There are tools/APIs available that let you view what is actually on the clipboard.
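As one possible way to do this without a separate tool, the sketch below enumerates the clipboard formats through the Win32 functions OpenClipboard, EnumClipboardFormats, GetClipboardFormatNameW and CloseClipboard, called via Python's ctypes. The helper names (`format_name`, `list_clipboard_formats`) are made up for this example, the table of predefined formats is deliberately partial, and the enumeration itself only works on Windows.

```python
import ctypes
import sys

# A partial table of predefined format IDs (Windows SDK values).
# Registered formats such as CF_HTML are looked up by name instead.
STANDARD_FORMATS = {1: "CF_TEXT", 7: "CF_OEMTEXT", 13: "CF_UNICODETEXT", 16: "CF_LOCALE"}


def format_name(fmt, lookup_registered=None):
    """Map a clipboard format ID to a readable name."""
    if fmt in STANDARD_FORMATS:
        return STANDARD_FORMATS[fmt]
    name = lookup_registered(fmt) if lookup_registered else None
    return name or f"format #{fmt}"


def list_clipboard_formats():
    """Return the names of all formats currently on the clipboard (Windows only)."""
    user32 = ctypes.windll.user32

    def registered_name(fmt):
        buf = ctypes.create_unicode_buffer(256)
        length = user32.GetClipboardFormatNameW(fmt, buf, len(buf))
        return buf.value if length > 0 else None

    if not user32.OpenClipboard(None):
        raise OSError("could not open the clipboard")
    try:
        names = []
        fmt = user32.EnumClipboardFormats(0)  # 0 starts the enumeration
        while fmt:
            names.append(format_name(fmt, registered_name))
            fmt = user32.EnumClipboardFormats(fmt)
        return names
    finally:
        user32.CloseClipboard()


if __name__ == "__main__" and sys.platform == "win32":
    print(list_clipboard_formats())
```

Run on Windows right after copying something in a browser, this should print a list comparable to the ones shown in this answer, although the exact names and order depend on the browser and version.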

For example, when I copy the Ϟ character using Chrome 132, my clipboard gets these formats:

  • CF_HTML
  • CF_UNICODETEXT
  • "Chromium internal source URL"
  • CF_LOCALE (holding LANGID=0x0409)
  • CF_TEXT
  • CF_OEMTEXT

But when I copy the same character using Firefox 134, my clipboard gets these formats:

  • "DataObject"
  • "text/html"
  • CF_HTML
  • "text/_moz_htmlcontext"
  • "text/_moz_htmlinfo"
  • CF_UNICODETEXT
  • CF_TEXT
  • "text/x-moz-url-priv"
  • "Ole Private Data"
  • CF_LOCALE (holding LANGID=0x0409)
  • CF_OEMTEXT

So, to answer your question:

What gets stored in the clipboard? The character code or byte sequence based on the encoding of the web page?

What gets stored is:

  • UTF-16 Unicode text (CF_UNICODETEXT)
  • AND localized ANSI text, using the user's locale (CF_TEXT + CF_LOCALE)
  • AND UTF-8 encoded HTML (CF_HTML)
  • AND other formats
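To make the differences concrete, here is a rough sketch of what the text payloads could look like for Ϟ. It rests on several assumptions: LANGID 0x0409 is taken to imply the Windows-1252 ANSI code page, Python's `errors="replace"` only approximates whatever conversion Windows actually performs, the `<span>` markup is invented, and the descriptive header that precedes the markup in the real HTML format is left out.

```python
koppa = "\u03de"

# CF_UNICODETEXT: UTF-16LE code units followed by a terminating NUL character.
unicode_payload = (koppa + "\0").encode("utf-16-le")

# CF_TEXT: bytes in the ANSI code page of the locale. Windows-1252 (assumed
# here for LANGID 0x0409) has no koppa, so the character is lost.
ansi_payload = (koppa + "\0").encode("cp1252", errors="replace")

# CF_HTML: the markup is UTF-8 encoded (header omitted, markup hypothetical).
html_fragment = f"<span>{koppa}</span>".encode("utf-8")

print(unicode_payload.hex(" "))  # de 03 00 00
print(ansi_payload)              # b'?\x00'
print(html_fragment)             # b'<span>\xcf\x9e</span>'
```

Under these assumptions, only the Unicode and HTML formats preserve the character, which is one reason a pasting app would normally prefer CF_UNICODETEXT over CF_TEXT.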
