COMPOUND_TEXT versus UTF8_STRING
Keith Packard
keithp at keithp.com
Thu Sep 23 09:37:13 PDT 2004
Around 9 o'clock on Sep 23, Markus Kuhn wrote:
> There is certainly no harm done by encouraging in the next ICCCM version
> the recipients of all the properties where STRING and COMPOUND_TEXT are
> allowed today to also accept UTF8_STRING, in addition to the existing
> STRING and COMPOUND_TEXT ones.
Yes, it seems reasonable to prepare applications to accept UTF-8 in these
strings; that's well within the scope of the existing ICCCM wording, and
given the current Xlib implementation, it's already supported by most
applications today.
> In addition, there is little harm done in using UTF8_STRING whenever the
> text to be transmitted contains at least one character for which STRING
> and COMPOUND_TEXT provide no encoding (think of Ethiopian or Vietnamese
> window titles).
Owen Taylor pointed out that Bruno Haible added UTF-8 support to the Xlib
COMPOUND_TEXT code. However, this does not mean that applications will
actually understand the resulting UTF-8 sequences; it requires that the
receiving application be running a compatible version of Xlib.
> Unless the recipient understands UTF-8 (and therefore
> probably also implements already UTF8_STRING), the data will be
> meaningless to them anyway.
I note that the existing Xlib property conversion functions handle
UTF8_STRING in parallel with COMPOUND_TEXT, meaning that any application
using the X.org Xlib to handle COMPOUND_TEXT will transparently handle
UTF8_STRING already.
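
For a client that reads the raw property bytes itself instead of going
through Xlib, "accepting UTF8_STRING" roughly means dispatching on the
property's type. A minimal sketch, not what Xlib does internally; the
function name and the idea of passing the type as an atom name string are
assumptions made for illustration:

```python
def decode_text_property(type_name: str, data: bytes) -> str:
    """Decode raw property bytes according to the property's type atom name."""
    if type_name == "STRING":
        # ICCCM STRING is ISO Latin-1 (plus tab and newline).
        return data.decode("latin-1")
    if type_name == "UTF8_STRING":
        return data.decode("utf-8")
    # COMPOUND_TEXT needs an ISO 2022 style decoder, which this sketch
    # deliberately leaves out.
    raise ValueError(f"unsupported text property type: {type_name}")
```

A receiver built this way would reject COMPOUND_TEXT, with or without
embedded UTF-8 sequences, which is the situation the options below try to
account for.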
It appears to me that we have two reasonable directions to go:
1) Assume all applications use X.org Xlib functions to handle text
values, in which case they will handle UTF-8 as either
COMPOUND_TEXT or UTF8_STRING (and in which case we should
just use UTF8_STRING).
2) Accept that some applications are not using the X.org Xlib
text handling functions, in which case we cannot use UTF-8
in any form for property values (neither as UTF8_STRING nor
even COMPOUND_TEXT with UTF-8 sequences).
In case 1), we're all set -- just start using UTF8_STRING values for the
standard ICCCM properties and expect that applications will "just work".
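
One conservative reading of case 1, following the quoted suggestion to
switch to UTF8_STRING only when the text needs it, might look like the
sketch below. The helper names are hypothetical, and the character check is
my reading of the ICCCM definition of STRING (Latin-1 graphic characters
plus tab and newline):

```python
def is_icccm_string(text: str) -> bool:
    """True if every character fits ICCCM STRING: Latin-1 graphics, tab, newline."""
    for ch in text:
        cp = ord(ch)
        printable = 0x20 <= cp <= 0x7E or 0xA0 <= cp <= 0xFF
        if not (printable or ch in "\t\n"):
            return False
    return True


def encode_for_property(text: str) -> tuple[str, bytes]:
    """Return (type atom name, property bytes) for a text value."""
    if is_icccm_string(text):
        # Every existing client can read this.
        return "STRING", text.encode("latin-1")
    # No STRING encoding exists for this text, so use UTF-8 directly.
    return "UTF8_STRING", text.encode("utf-8")
```

With this, a title like "xterm" stays a STRING, while a Vietnamese title is
sent as UTF8_STRING.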
In case 2), I see the EWMH as the obvious solution -- set the existing
ICCCM properties using STRING (or, if you must, COMPOUND_TEXT without
UTF-8 sequences) and place the actual data in the EWMH properties.
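
For window titles, case 2 could be sketched as follows, assuming WM_NAME as
the ICCCM property and _NET_WM_NAME as its EWMH counterpart. The function
name is hypothetical, and replacing unrepresentable characters with "?" is
just one possible fallback policy:

```python
def title_properties(title: str) -> list[tuple[str, str, bytes]]:
    """Return (property, type, bytes) triples to set for a window title."""
    # Legacy property: anything outside Latin-1 degrades to "?".
    # Simplification: control characters are not filtered here.
    legacy = title.encode("latin-1", errors="replace")
    return [
        ("WM_NAME", "STRING", legacy),
        # The actual data goes in the EWMH property as UTF-8.
        ("_NET_WM_NAME", "UTF8_STRING", title.encode("utf-8")),
    ]
```

Clients that only know the ICCCM property still get something readable;
EWMH-aware clients get the full title.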
What I don't see the need for is support for UTF-8 sequences in
COMPOUND_TEXT format strings -- given the necessary Xlib support exists
only in a library which transparently handles UTF8_STRING format
properties, there's little reason to add the COMPOUND_TEXT wrapper.
Unfortunately, once we go with the EWMH as the standard, I see no way of
getting out of that; we've essentially said that there are only two
possible encodings for TEXT properties -- STRING and COMPOUND_TEXT
(without UTF-8 sequences).
-keith