Re: POW -- which locales Just Work?


From: Vlad Harchev (hvv@hippo.ru)
Date: Thu Mar 01 2001 - 12:45:21 CST


On Thu, 1 Mar 2001, Sam TH wrote:

> On Thu, Mar 01, 2001 at 08:27:07PM +0400, Vlad Harchev wrote:
> > On Thu, 1 Mar 2001, Karl Ove Hufthammer wrote:
> >
> > > Saving in 'UTF-8' or 'UTF-16' *is* much better than using other
> > > charsets. Not because of AbiWord, but because *other* programs may be
> > > reading AbiWord documents. People implementing XML parsers (which are used
> > > by several programs, e.g. XSLT engines) don't want to implement hundreds
> > > of character encodings, as this will 1) be much work, 2) increase the
> > > size/bloat, and 3) be unnecessary.
> >
> > Hmm, if an XML parser supports only UTF-8 or UTF-16, it's broken.
>
> This is incorrect. The XML Recommendation requires merely that UTF-8
> and UTF-16 be supported.
>
> > People should
> > stick to libxml then. Also, a trivial sed script and iconv can be used to
> > convert an XML file in any encoding to a valid UTF-8 encoded XML file.
> >
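To make the sed + iconv idea concrete, here is a minimal sketch in Python rather than shell. The function name to_utf8 and the sample document are made up for illustration; decoding and re-encoding plays the role of iconv, and the regular expression plays the role of the sed script that fixes the encoding declaration.

```python
import re

def to_utf8(xml_bytes: bytes, source_encoding: str) -> bytes:
    """Hypothetical helper: re-encode an XML document as UTF-8."""
    # The "iconv" step: bytes in the source charset -> Unicode text.
    text = xml_bytes.decode(source_encoding)
    # The "sed" step: rewrite encoding="..." in the XML declaration only,
    # keeping whichever quote character the document used.
    text = re.sub(
        r'^(<\?xml[^>]*?encoding=)(["\'])[^"\']*\2',
        r'\g<1>\g<2>UTF-8\g<2>',
        text,
        count=1,
    )
    return text.encode("utf-8")

# Made-up sample: a Russian word stored in KOI8-R.
word = "\u041f\u0440\u0438\u0432\u0435\u0442"
src = f'<?xml version="1.0" encoding="koi8-r"?><p>{word}</p>'.encode("koi8-r")
out = to_utf8(src, "koi8-r")
```

This assumes the caller already knows the source encoding; a document with no declaration at all is left without one, which XML treats as UTF-8 anyway.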
>
> Well, since we don't intend to give up expat, we have the following
> options:
>
> 1. Continue current practice. This discussion suggests that current
> practice is broken for non-Latin1 encodings.

 I think current practice is preferable (modulo exporting CJK chars as UTF-8
rather than as &#HHHH; character references).
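To show what the two forms look like, here is a small sketch in Python (not AbiWord code; the sample string is made up). It writes the same three CJK characters once as numeric character references and once as raw UTF-8, then checks that an XML parser reads both back identically. Note that XML spells hexadecimal references as &#xHHHH;.

```python
import xml.etree.ElementTree as ET

# Three CJK characters, written with escapes so this file stays ASCII.
text = "\u65e5\u672c\u8a9e"

# Form 1: numeric character references (hexadecimal), in a pure-ASCII file.
as_refs = "".join(f"&#x{ord(ch):X};" for ch in text)
doc_refs = f'<?xml version="1.0" encoding="US-ASCII"?><p>{as_refs}</p>'.encode("ascii")

# Form 2: the characters themselves, with the file encoded as UTF-8.
doc_utf8 = f'<?xml version="1.0" encoding="UTF-8"?><p>{text}</p>'.encode("utf-8")

# Both documents carry the same text once parsed.
parsed_refs = ET.fromstring(doc_refs).text
parsed_utf8 = ET.fromstring(doc_utf8).text

# Size of the payload alone in each form, in bytes.
refs_size = len(as_refs.encode("ascii"))
utf8_size = len(text.encode("utf-8"))
```

For this sample the reference form takes 24 bytes against 9 for UTF-8, and the UTF-8 file stays readable in any UTF-8 aware editor.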
 
> 2. Encode in UTF-8. Vlad suggests that this is bad for single-byte
> encodings that are not Latin1.
 
 And whether it's UTF-8 or not doesn't matter for Latin-1 locales, of course.

> 3. Provide a way for Expat to handle other encodings. This requires
> using the expat functions for unknown encodings. See the expat.h
> header file for more info.
>
> 1 is broken. 2 may be broken, but less so than 1. 3 requires
> coding.
>
> Absent the coding required for 3, I think we should switch to 2. Of
> course, 3 is preferable.

 Sam, sorry, I don't understand what you mean here; maybe I was unclear in my
message. 3) is ALREADY implemented: it has been there since 0.7.12 and works
fine! And of course libxml handles ALL encodings that the system iconv() knows,
so it seems everything is in perfect shape right now and shouldn't be changed!
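
As an illustration of what option 3 buys, here is a sketch using Python's expat binding instead of the AbiWord C code. In C the hook is XML_SetUnknownEncodingHandler from expat.h; to my knowledge the Python binding installs such a handler itself for single-byte encodings that Python has a codec for, so a KOI8-R document parses without being converted first. The sample document is made up.

```python
import xml.parsers.expat

# Made-up sample: a Russian word in a document declared and stored as KOI8-R.
word = "\u041f\u0440\u0438\u0432\u0435\u0442"
doc = f'<?xml version="1.0" encoding="koi8-r"?><p>{word}</p>'.encode("koi8-r")

chunks = []  # expat may deliver character data in several pieces

parser = xml.parsers.expat.ParserCreate()
parser.CharacterDataHandler = chunks.append
parser.Parse(doc, True)  # True marks this as the final chunk of input

parsed_text = "".join(chunks)
```

KOI8-R is not one of the encodings expat knows natively, so a successful parse here means the unknown-encoding path was taken.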
 
 Best regards,
  -Vlad


