Re: XML_Char


Subject: Re: XML_Char
From: Vlad Harchev (hvv@hippo.ru)
Date: Mon Dec 18 2000 - 08:12:28 CST


On Mon, 18 Dec 2000, Mike Nordell wrote:

> Vlad Harchev wrote:
> > > Nope. Perhaps I didn't express myself clear enough, but wrong? Nope. :-)
> >
> > As usual :)
>
> Is there any other way? :-O ;-)

 It doesn't matter :)

> > > Speaking of compilers, now you're wrong. You can *never* express an UTF-8
> > > string in a char[], you need an unsigned char[], that is inherently
> > > incompatible with a C/C++ string literal.
> >
> > I doubt that the C++ spec guarantees such problems - it seems it declares this
> > aspect as implementation-dependent.
>
> That would be the day, when the C++ spec guaranteed *problems* not promises.
> ;-)
>
> AFAIK, it doesn't explicitly mention using negative char values in string
> literals since that is by itself an "implementation specific" thing. Sure, I
> can create a string literal containing "char"-negative values, e.g.
> "foo\xa2bar" which would contain an embedded 162 value that would be usable
> on at least 93% of current platforms *if "casted" to an unsigned char*, but
> would it be (conceptually) reasonable to assign this literal to a char*? No,
> since the usual implementation would evaluate its fourth element to the
> value -94 (or thereabout, I didn't bother to check).

 Are you trying to say that using strings with chars with value > 127 is not
portable? I hope not.
 Most uses of character array members are in functions located in system
libraries, which should be prepared for chars with value > 127. If they are
not, they are broken. Also, the string headers of some systems contain
overloaded declarations of all functions in 3 varieties:
        char*
        unsigned char*
        signed char*
e.g. for strlen, so that the programmer doesn't have to cast anything. Alas,
not every system follows this nice rule.

> > At least I didn't have problems using chars
> > with value > 127 in arrays declared as char[] and never had problems with them
> > with any compilers I had. So it's safe for UTF8 too. Maybe the compiler
> > authors are just smart.
>
> It's not the act of pointing a char* at memory that contains char-negative
> values that is the problem, it's reading from that memory using the char
> data type that is.

 I agree it's inconvenient.
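
 A minimal sketch of the difference, not taken from any real code base: the
same byte is read once through plain char and once through unsigned char. The
function names are made up for illustration. Note that the literal has to be
split, because a hex escape consumes every following hex digit and "\xa2ba"
would otherwise be read as one escape that is too large for a char.

```cpp
#include <cassert>
#include <climits>

// Hypothetical helper: the byte 0xA2 (162) embedded after "foo".
// The literal is split so the escape stops after "a2".
inline const char* sample() {
    return "foo\xa2" "bar";
}

// Read one element through plain char, then widen to int.
// For bytes >= 0x80 the result is negative where char is signed.
inline int as_plain_char(const char* s, int i) {
    return s[i];
}

// Read the same byte through unsigned char: always in the range 0..255.
inline int as_unsigned_char(const char* s, int i) {
    return static_cast<unsigned char>(s[i]);
}

// Usage sketch: only the unsigned view is the same on every platform.
inline void demo_char_views() {
    assert(as_unsigned_char(sample(), 3) == 0xA2);
}
```

 On a platform with a signed 8-bit char the plain read typically gives -94,
on one with an unsigned char it gives 162. The unsigned read gives 162 on both.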
 
> [...]
> > > I was speaking from a
> > > conceptual POV. I care, and I'd imagine that all maintainers-to-be also
> > > care. It just happened to coincide with the fact that the compiler *can't*
> > > implicitly (without emitting diagnostic(s)) convert a C++ string literal
> > > into an "unsigned char*".
> >
> > Yes, I agree with this. But most compilers are smart enough to allow
> > disabling particular warnings, so it doesn't hurt that much.
>
> "Most compilers" and "disable ... warnings" doesn't really cut it when we're
> talking about XP code. Either we comply with C++ or we don't. In this case
> you suggest we don't, and I strongly disagree.

 Yes, XP forces us to play a fair game.

> > > But back to the issue that started this thread. Why do we even use functions
> > > that only allow XML_Char* as input when we mostly give them C++ string
> > > literals (which are const char[])? Not only is all this casting bad, ugly,
> > > wrong and code-bloat. It's conceptually wrong.
> >
> > Maybe we should provide overloaded wrappers that will just cast their
> > args to XML_Char* and call the original functions?
>
> Yes. I think this is the cleanest (i.e. most non-intrusive) suggestion to
> date. Inline wrappers that do the casting.
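
 A rough sketch of what such a wrapper could look like. It assumes XML_Char
is a typedef for unsigned char, and xml_strlen is a made-up stand-in for one
of the original functions, not a real API.

```cpp
#include <cassert>
#include <cstddef>

// Assumption for this sketch only, not copied from the real headers.
typedef unsigned char XML_Char;

// Hypothetical "original" function that only accepts XML_Char strings.
inline std::size_t xml_strlen(const XML_Char* s) {
    std::size_t n = 0;
    while (s[n] != 0) ++n;
    return n;
}

// Overloaded inline wrapper: the cast lives in one place and the call is
// forwarded to the original, so callers can pass string literals directly.
inline std::size_t xml_strlen(const char* s) {
    return xml_strlen(reinterpret_cast<const XML_Char*>(s));
}

// Usage sketch: no cast is needed at the call site.
inline void demo_wrapper() {
    assert(xml_strlen("foo") == 3);
}
```

 Since the wrapper is inline and only forwards, it should cost nothing at run
time, and the original functions stay untouched.
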
>
> Btw, I have to once again admit I made a small error. OK, I didn't look at the
> standard while writing that particular part.
>
> A string literal is not of type "const char[]" but of type "char[]". This is
> an (unfortunate) C inheritance, where "foo"[2] = '\0'; is legal. :-(
>
> And while at bad code, did you know the following was legal?
>
> int index = 2;
> char* pFoo = "foo";
> index[pFoo] = '\0';
>
> Try it, you might be surprised it compiles. :-)

 Yes, I knew that it's legal and that a string literal is indeed char[], not
'const char[]', but I didn't want to shame you :)

 Gcc even has a switch, -fwritable-strings AFAIR, that allows the code above to
work without a segfault.
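
 For anyone who wants to try the reversed subscript safely, here is a small
sketch. The function names are invented for illustration, and it writes to a
local array instead of a string literal, because writing into a literal is not
something a program can rely on without a switch like the one above.

```cpp
#include <cassert>

// Writes '\0' at position `index` using the reversed subscript form.
// index[p] means *(index + p), which is the same element as p[index].
inline void truncate_at(char* p, int index) {
    index[p] = '\0';
}

// Applies truncate_at to a writable copy of "foo" and returns the new length.
inline int truncated_length() {
    char foo[] = "foo";  // an array, so writing to it is well defined
    truncate_at(foo, 2);
    int n = 0;
    while (foo[n] != '\0') ++n;
    return n;
}

// Usage sketch: "foo" cut at index 2 leaves "fo".
inline void demo_subscript() {
    assert(truncated_length() == 2);
}
```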

 But I will be boring as usual - let's fix bugs and flaws first :)
 As usual, I don't have time for this :)

> /Mike - please don't cc
>

 Best regards,
  -Vlad


