Subject: Re: XML_Char
From: Vlad Harchev (hvv@hippo.ru)
Date: Mon Dec 18 2000 - 03:19:19 CST


On Mon, 18 Dec 2000, Mike Nordell wrote:

> Vlad Harchev wrote:
> > Mike, it seems you are wrong.
>
> Nope. Perhaps I didn't express myself clearly enough, but wrong? Nope. :-)

 As usual :)
 
> > Whether each "symbol" of a UTF-8 string takes one
> > or more bytes doesn't matter to a C compiler.
>
> Oh yeah? Ever tried to get the character count of a UTF-8 string at compile
> time? :-P

 Fortunately, I haven't :)
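 (At run time it's easy enough, though. A minimal sketch, assuming valid
UTF-8 input - the character count is just the number of bytes that are not
continuation bytes, i.e. not of the form 10xxxxxx:)

#include <cstddef>

// Count the code points of a UTF-8 string by skipping continuation bytes.
static std::size_t utf8_length(const char *s)
{
    std::size_t count = 0;
    for (; *s; ++s)
        if ((static_cast<unsigned char>(*s) & 0xC0) != 0x80)
            ++count;
    return count;
}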

> The size of a char[] or a wchar_t[] can be deduced at compile time. UTF-8
> is a more "dynamic" data type and doesn't fit the low-level stuff.
> Actually, I think a reference C library implementation should have been
> developed before this data type was even released, since it's incompatible
> with just about everything.
>
> > The type of a UTF-8 string is char[] to the C compiler
>
> Speaking of compilers, now you're wrong. You can *never* express a UTF-8
> string in a char[]; you need an unsigned char[], which is inherently
> incompatible with a C/C++ string literal.

 I doubt the C++ spec guarantees such problems - it seems to leave this
aspect implementation-defined (in particular, whether plain char is signed
or unsigned). At least I have never had problems storing byte values > 128
in arrays declared as char[] with any compiler I've used, so it should be
safe for UTF-8 too. Maybe compiler authors are just smart.
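 For illustration, a minimal sketch of what I mean (standard C++, nothing
compiler-specific; the literal below is CYRILLIC SMALL LETTER A encoded as
two UTF-8 bytes):

#include <cstdio>

int main()
{
    const char utf8[] = "\xD0\xB0";  // two bytes >= 0x80 in a plain char[]
    // Whether plain char is signed is implementation-defined, so
    // inspect the bytes through unsigned char:
    unsigned char first = static_cast<unsigned char>(utf8[0]);
    std::printf("first byte = 0x%X\n", first);  // prints 0xD0
    return 0;
}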

> > since it doesn't care what's stored in that array.
>
> Well, the compiler doesn't care, of course not. I was speaking from a
> conceptual POV. I care, and I'd imagine that all maintainers-to-be also
> care. It just happens to coincide with the fact that the compiler *can't*
> implicitly (without emitting diagnostics) convert a C++ string literal
> into an "unsigned char*".

 Yes, I agree with this. But most compilers are smart enough to let you
disable particular warnings, so it doesn't hurt that much.
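 For example, this is the diagnostic in question - a minimal sketch,
assuming XML_Char is typedef'ed to unsigned char as in libxml2
(takes_xml_char is a hypothetical function):

typedef unsigned char XML_Char;

void takes_xml_char(const XML_Char *s);

void caller()
{
    // takes_xml_char("hello");                 // error: no implicit conversion
    takes_xml_char((const XML_Char *)"hello");  // compiles, but needs the cast
}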

> But since I'm now responding to this issue, I might add that I think
> libxml2 is more correct in typedefing XML_Char to unsigned char, making it
> inherently incompatible with a C/C++ string literal.
>
> But back to the issue that started this thread. Why do we even use
> functions that only allow XML_Char* as input when we mostly give them C++
> string literals (which are const char[])? Not only is all this casting
> bad, ugly, and code bloat - it's conceptually wrong.

 Maybe we should provide overloaded wrappers that just cast their args to
XML_Char* and call the original functions?
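 Something like this minimal sketch (xmlDoSomething is a hypothetical name
standing in for any of the functions in question):

typedef unsigned char XML_Char;

void xmlDoSomething(const XML_Char *s);  // the original function

// Overload that accepts C++ string literals and hides the cast.
inline void xmlDoSomething(const char *s)
{
    xmlDoSomething(reinterpret_cast<const XML_Char *>(s));
}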
 
> /Mike - please don't cc
>

 Best regards,
  -Vlad


