unicode - When a character in a web page is copied (using CTRL+C) from within a browser, what gets stored in the clipboard: the

Let's say you're on Windows 11 (version 24H2) and using Chrome (version 132). Now, let's say you have a web page open in Chrome and you copy (using Ctrl+C) the following character to the clipboard:

Ϟ

This is code point 990 (U+03DE, Greek Letter Koppa), which UTF-8 represents as the byte sequence CF 9E.
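To make the numbers above concrete: 990 is 0x3DE, which falls in the range U+0080 to U+07FF that UTF-8 encodes as two bytes (`110xxxxx 10yyyyyy`). The helper `utf8_two_byte` is hypothetical and written only to show the bit arithmetic. In real code `str.encode("utf-8")` does this for you.

```python
# Sketch of the UTF-8 rule for code points U+0080..U+07FF,
# which are encoded as two bytes: 110xxxxx 10yyyyyy.
def utf8_two_byte(code_point: int) -> bytes:
    if not 0x80 <= code_point <= 0x7FF:
        raise ValueError("only U+0080..U+07FF use the two-byte form")
    lead = 0xC0 | (code_point >> 6)     # top 5 bits of the code point
    trail = 0x80 | (code_point & 0x3F)  # low 6 bits of the code point
    return bytes([lead, trail])


koppa = "\u03de"  # the character from the question, GREEK LETTER KOPPA
print(ord(koppa))                                           # 990
print(utf8_two_byte(ord(koppa)).hex(" "))                   # cf 9e
print(koppa.encode("utf-8") == utf8_two_byte(ord(koppa)))   # True
```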

What gets stored in the clipboard? The character code or byte sequence based on the encoding of the web page?

I have a hunch it's the character code, since that would make pasting encoding-agnostic (as long as the destination uses Unicode), but I wanted to ask StackOverflow to be sure.

Asked Jan 31 at 18:09 by user3163495; edited Jan 31 at 20:59 by Remy Lebeau.

It depends on the browser and the operating system. – Daniel A. White, Jan 31 at 18:13
@DanielA.White I updated my question to specify the browser and operating system. – user3163495, Jan 31 at 20:22


1 Answer


It depends on the browser implementation.

Text can be stored on the Windows clipboard in many different formats: standard formats such as CF_TEXT (together with CF_LOCALE) for ANSI text or CF_UNICODETEXT for Unicode text, as well as custom registered formats such as CF_HTML.
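For a concrete sense of how these formats differ, here is a sketch of the bytes each plain-text format would hold for Ϟ. Assumptions: the user's locale is en-US (LANGID 0x0409), whose ANSI code page is 1252 and OEM code page is 437, and Python's `errors="replace"` stands in for the Windows conversion (Windows may apply its own best-fit mapping instead). The function name `clipboard_payloads` is hypothetical.

```python
# Sketch: bytes each plain-text clipboard format would carry for a string.
# Assumed locale: en-US (ANSI code page 1252, OEM code page 437).
TEXT = "\u03de"  # Ϟ


def clipboard_payloads(text: str) -> dict[str, bytes]:
    return {
        # CF_UNICODETEXT: UTF-16LE code units, terminated by a 16-bit NUL
        "CF_UNICODETEXT": text.encode("utf-16-le") + b"\x00\x00",
        # CF_TEXT: ANSI code page bytes, terminated by an 8-bit NUL;
        # characters the code page lacks become "?" here
        "CF_TEXT": text.encode("cp1252", errors="replace") + b"\x00",
        # CF_OEMTEXT: OEM code page bytes, terminated by an 8-bit NUL
        "CF_OEMTEXT": text.encode("cp437", errors="replace") + b"\x00",
    }


for name, data in clipboard_payloads(TEXT).items():
    print(f"{name:15} {data.hex(' ')}")
```

Neither code page 1252 nor 437 contains koppa, so only the Unicode format preserves the character; this is why a Unicode-aware paste target reads CF_UNICODETEXT rather than the ANSI forms.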

Apps are encouraged to store as many formats as is feasible for their purpose, and especially Unicode for text. Any app that then pastes from the clipboard can look at which formats are available and decide which ones it wants to use. If HTML makes the most sense, it can use that. If Unicode makes the most sense, it can use that instead. And so on.
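The paste-side choice described above can be modeled as "take the first format on my preference list that is actually available", which is roughly what the Win32 `GetPriorityClipboardFormat` function does. This is a hypothetical pure-Python model: the clipboard is a dict from format name to payload, `choose_format` is an illustrative name, and the CF_HTML payload is a placeholder rather than real HTML clipboard data.

```python
# Hypothetical model of the paste side: pick the first preferred format
# that the clipboard actually offers.
from typing import Mapping, Optional, Sequence, Tuple


def choose_format(available: Mapping[str, bytes],
                  preferences: Sequence[str]) -> Optional[Tuple[str, bytes]]:
    for fmt in preferences:
        if fmt in available:
            return fmt, available[fmt]
    return None  # nothing this target understands


clipboard = {
    "CF_HTML": b"...",  # placeholder payload, not real CF_HTML data
    "CF_UNICODETEXT": "\u03de".encode("utf-16-le") + b"\x00\x00",
    "CF_TEXT": b"?\x00",
}

# A rich-text editor might prefer HTML; a plain-text editor, Unicode text;
# an old ANSI-only program can only read CF_TEXT.
rich = choose_format(clipboard, ["CF_HTML", "CF_UNICODETEXT", "CF_TEXT"])
plain = choose_format(clipboard, ["CF_UNICODETEXT", "CF_TEXT"])
legacy = choose_format(clipboard, ["CF_TEXT"])

print(rich[0], plain[0], legacy[0])
print(plain[1][:-2].decode("utf-16-le"))  # drop the 16-bit NUL, then decode
```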

There are tools and APIs that let you view what is actually on the clipboard.
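As one example of such an API, here is a minimal Windows-only sketch that lists the current clipboard formats through the Win32 functions `OpenClipboard`, `EnumClipboardFormats`, `GetClipboardFormatNameW` and `CloseClipboard`, called via `ctypes`. The helper names `format_name` and `list_clipboard_formats` are hypothetical, and only the four predefined formats discussed here are mapped to names. Registered formats are reported by their registered names, so the format this answer calls CF_HTML shows up as "HTML Format".

```python
# Sketch (Windows only): list the formats currently on the clipboard.
import ctypes
import sys

# Predefined format IDs mentioned in this answer; registered formats
# (IDs 0xC000..0xFFFF) are named via GetClipboardFormatNameW instead.
STANDARD_FORMATS = {1: "CF_TEXT", 7: "CF_OEMTEXT", 13: "CF_UNICODETEXT", 16: "CF_LOCALE"}


def format_name(fmt: int, user32=None) -> str:
    if fmt in STANDARD_FORMATS:
        return STANDARD_FORMATS[fmt]
    if user32 is not None and fmt >= 0xC000:
        buf = ctypes.create_unicode_buffer(256)
        if user32.GetClipboardFormatNameW(fmt, buf, len(buf)):
            return f'"{buf.value}"'
    return f"format {fmt}"


def list_clipboard_formats() -> list:
    # ctypes.WinDLL only exists on Windows, so it is touched only here.
    user32 = ctypes.WinDLL("user32", use_last_error=True)
    user32.OpenClipboard.argtypes = [ctypes.c_void_p]
    user32.OpenClipboard.restype = ctypes.c_int
    user32.EnumClipboardFormats.argtypes = [ctypes.c_uint]
    user32.EnumClipboardFormats.restype = ctypes.c_uint
    user32.GetClipboardFormatNameW.argtypes = [ctypes.c_uint, ctypes.c_wchar_p, ctypes.c_int]
    user32.GetClipboardFormatNameW.restype = ctypes.c_int
    user32.CloseClipboard.restype = ctypes.c_int

    # Fails if another process currently has the clipboard open.
    if not user32.OpenClipboard(None):
        raise ctypes.WinError(ctypes.get_last_error())
    try:
        names, fmt = [], 0
        while True:
            fmt = user32.EnumClipboardFormats(fmt)  # 0 starts the enumeration
            if fmt == 0:  # no more formats (or an error)
                break
            names.append(format_name(fmt, user32))
        return names
    finally:
        user32.CloseClipboard()


if __name__ == "__main__" and sys.platform == "win32":
    for name in list_clipboard_formats():
        print(name)
```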

For example, when I copy the Ϟ character using Chrome 132, my clipboard gets these formats:

CF_HTML
CF_UNICODETEXT
"Chromium internal source URL"
CF_LOCALE (holding LANGID=0x0409)
CF_TEXT
CF_OEMTEXT

But, when I copy the same character using Firefox 134, my clipboard gets these formats:

"DataObject"
"text/html"
CF_HTML
"text/_moz_htmlcontext"
"text/_moz_htmlinfo"
CF_UNICODETEXT
CF_TEXT
"text/x-moz-url-priv"
"Ole Private Data"
CF_LOCALE (holding LANGID=0x0409)
CF_OEMTEXT

So, to answer your question:

What gets stored in the clipboard? The character code or byte sequence based on the encoding of the web page?

What gets stored is:

UTF-16 Unicode text
AND localized ANSI text, using the user's locale
AND UTF-8 encoded HTML
AND other formats
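To illustrate the "UTF-8 encoded HTML" item, here is a sketch of a CF_HTML ("HTML Format") payload for the fragment Ϟ, using the `Version`/`StartHTML`/`EndHTML`/`StartFragment`/`EndFragment` header described in Microsoft's HTML Clipboard Format documentation. It is not a reproduction of what Chrome or Firefox actually emit (their payloads include more context markup), and `build_cf_html` is a hypothetical name. The point it shows: the header offsets count UTF-8 bytes, so the one-character fragment spans 2 bytes.

```python
# Sketch of a CF_HTML payload: an ASCII header with byte offsets,
# followed by UTF-8 encoded HTML.
HEADER = (
    "Version:0.9\r\n"
    "StartHTML:{0:010d}\r\n"
    "EndHTML:{1:010d}\r\n"
    "StartFragment:{2:010d}\r\n"
    "EndFragment:{3:010d}\r\n"
)


def build_cf_html(fragment: str) -> bytes:
    prefix = "<html><body><!--StartFragment-->".encode("utf-8")
    body = fragment.encode("utf-8")
    suffix = "<!--EndFragment--></body></html>".encode("utf-8")
    # Every offset is zero-padded to 10 digits, so the header length is fixed.
    header_len = len(HEADER.format(0, 0, 0, 0).encode("ascii"))
    start_html = header_len                       # offset of "<html>"
    start_fragment = start_html + len(prefix)     # just after StartFragment marker
    end_fragment = start_fragment + len(body)     # byte length, not char count
    end_html = end_fragment + len(suffix)
    header = HEADER.format(start_html, end_html, start_fragment, end_fragment)
    return header.encode("ascii") + prefix + body + suffix


print(build_cf_html("\u03de").decode("utf-8"))
```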
