Subject: Re: XML_Char
From: Mike Nordell (tamlin@algonet.se)
Date: Mon Dec 18 2000 - 01:51:34 CST


Vlad Harchev wrote:
> Mike, it seems you are wrong.

Nope. Perhaps I didn't express myself clearly enough, but wrong? Nope. :-)

> Whether each "symbol" of UTF-8 string takes one
> or more bytes - it doesn't matter for C compiler.

Oh yea? Ever tried to get the character count of a UTF-8 string at compile
time? :-P
The sizes of a char[] and a wchar_t[] can be deduced at compile time. UTF-8 is
a more "dynamic" data type and doesn't fit the low-level stuff. Actually, I
think a reference C library implementation should have been developed before
this data type was even released, since it's incompatible with just about
everything.
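
To illustrate the difference (a sketch; utf8_length is my own helper, not
from any library):

    #include <cstddef>

    // The element counts of these arrays are compile-time constants:
    static const char narrow[] = "hello";    // sizeof(narrow) == 6
    static const wchar_t wide[] = L"hello";  // 6 * sizeof(wchar_t)

    // But the number of *characters* in a UTF-8 string can only be
    // found by scanning it at run time, skipping continuation bytes
    // (which have the bit pattern 10xxxxxx):
    static std::size_t utf8_length(const unsigned char* s)
    {
        std::size_t n = 0;
        for (; *s != 0; ++s)
            if ((*s & 0xC0) != 0x80)  // not a continuation byte
                ++n;
        return n;
    }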

> The type of UTF8 string is char[] for C compiler

Speaking of compilers, now you're wrong. You can *never* express a UTF-8
string in a char[]; you need an unsigned char[], which is inherently
incompatible with a C/C++ string literal.
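
To illustrate (the exact diagnostic varies by compiler; this is just the
gist):

    const char* ok = "hello";  // fine: a narrow literal is a const char array

    // Ill-formed: char* and unsigned char* are distinct, incompatible
    // pointer types, so the literal cannot convert implicitly.
    // const unsigned char* bad = "hello";

    // The only way through is an explicit cast:
    const unsigned char* ugly =
        reinterpret_cast<const unsigned char*>("hello");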

> since it doesn't care about what's stored in that array.

Well, the compiler doesn't care, of course not. I was speaking from a
conceptual point of view. I care, and I'd imagine that all maintainers-to-be
also care. It just happens to coincide with the fact that the compiler *can't*
implicitly (without emitting diagnostics) convert a C++ string literal into an
"unsigned char*", as the snippet above shows.

But since I'm now responding to this issue, I might add that I think libxml2
is more correct in typedef'ing its character type (xmlChar) to unsigned char,
making it inherently incompatible with a C/C++ string literal.
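
The effect on callers looks roughly like this (emit is a made-up function,
but the typedef matches the libxml2 headers as far as I know):

    typedef unsigned char xmlChar;  // as in the libxml2 headers

    // Hypothetical function taking the unsigned type:
    void emit(const xmlChar* /*text*/) {}

    void caller()
    {
        // emit("hello");  // rejected: const char* won't convert implicitly
        emit(reinterpret_cast<const xmlChar*>("hello"));  // the cast is
                                                          // forced, so the
                                                          // encoding decision
                                                          // stays visible
    }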

But back to the issue that started this thread: why do we even use functions
that only allow XML_Char* as input when we mostly give them C++ string
literals (which are const char[])? Not only is all this casting ugly and
code bloat, it's conceptually wrong.

/Mike - please don't cc


