Re: POW -- which locales Just Work?


Subject: Re: POW -- which locales Just Work?
From: Vlad Harchev (hvv@hippo.ru)
Date: Thu Mar 01 2001 - 10:33:34 CST


On Thu, 1 Mar 2001, Sam TH wrote:

 Hi,

> On Thu, Mar 01, 2001 at 05:31:55PM +0100, Karl Ove Hufthammer wrote:
> > ----- Original Message -----
> > From: "ha shao" <hashao@chinese.com>
> > To: <abiword-dev@abisource.com>
> > Sent: Thursday, March 01, 2001 2:36 PM
> > Subject: Re: POW -- which locales Just Work?
> >
> > > On Thu, Mar 01, 2001 at 04:09:52PM +0400, hvv@hippo.ru wrote:
> >
> > > > From this sentence one may think that saving in unicode is a better
> > > > approach than saving in native charset. It's wrong - since the charset
> > > > is specified in the xml header, storing documents in any charset will
> > > > work fine (as long as importing system's iconv understands that encoding).
> >
> > Saving in 'UTF-8' or 'UTF-16' *is* much better than using other charsets. Not
> > because of AbiWord, but because *other* programs may be reading AbiWord
> > documents. People implementing XML parsers (which are used by several programs,
> > e.g. XSLT engines) don't want to implement hundreds of character encodings, as
> > this will 1) be much work, 2) increase the size/bloat, and 3) be unnecessary.
>
> Yes. Expat supports only four encodings: UTF-8, UTF-16, ISO-8859-1,
> and US-ASCII. We have to save in one of these. Right now we do
> 8859-1. I really think it would be beneficial to do UTF-8, so that
> things just work for more character encodings.

 Yes, but expat has an interface to "teach" it any 8-bit encoding. I've added
special support for importing XML in an ARBITRARY single-byte encoding (i.e. not
a multibyte encoding) when using expat - that functionality was present in the
1st version of my i18n patch, and was successfully integrated into AW. So,
importing XML files in native Russian encodings works fine with AW compiled with
expat too. The same goes for any other 8-bit encoding. See
XAP_EncodingManager::XAP_XML_UnknownEncodingHandler(...)
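
 A rough illustration of the idea, in Python rather than AW's C++ (this is not
the AW code): to "teach" expat a single-byte charset, an unknown-encoding
handler essentially has to supply a 256-entry table mapping each byte value to
a Unicode code point. The names single_byte_table and parse_text are made up
for this sketch, KOI8-R is only an example charset, and as far as I know
Python's expat binding registers a handler of this kind internally, which is
why the parse below gets past the four built-in encodings.

```python
import xml.parsers.expat


def single_byte_table(encoding):
    """Map each byte value 0..255 to a Unicode code point, or -1 if undefined.

    This is the kind of table an unknown-encoding handler fills in for a
    single-byte charset. Hypothetical helper, not part of AbiWord or expat.
    """
    table = []
    for byte in range(256):
        try:
            table.append(ord(bytes([byte]).decode(encoding)))
        except UnicodeDecodeError:
            table.append(-1)  # byte has no meaning in this charset
    return table


def parse_text(xml_bytes):
    """Parse XML bytes with expat and return the concatenated character data."""
    chunks = []
    parser = xml.parsers.expat.ParserCreate()
    # Character data may arrive in several pieces, so collect and join them.
    parser.CharacterDataHandler = chunks.append
    parser.Parse(xml_bytes, True)
    return "".join(chunks)


# A Russian word, written with escapes so the source file stays pure ASCII.
word = "\u041f\u0440\u0438\u0432\u0435\u0442"
# The document declares KOI8-R in its XML header and is stored in KOI8-R.
doc = ('<?xml version="1.0" encoding="koi8-r"?><p>' + word + "</p>").encode("koi8-r")
text = parse_text(doc)
```

 Here parse_text(doc) hands back the text already converted to Unicode, even
though KOI8-R is not one of expat's built-in encodings.
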
Of course, chars that can't be represented by a character in the native encoding
are exported as &#HHHH; - so nothing is lost.
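
 The export side can be sketched the same way (again Python, not the AW code):
every char that the native charset can't hold is written as a numeric character
reference. The function name encode_with_char_refs is made up for this sketch,
and it assumes the hexadecimal XML form &#xHHHH; is what is wanted. Escaping of
markup characters such as & and < is a separate step and is not shown.

```python
def encode_with_char_refs(text, encoding):
    """Encode text in a native charset, falling back to &#xHHHH; references.

    Hypothetical helper: chars the charset can represent are written as
    native bytes, everything else as a hexadecimal character reference.
    """
    out = bytearray()
    for ch in text:
        try:
            out += ch.encode(encoding)
        except UnicodeEncodeError:
            # Not representable natively: emit the code point as a reference.
            out += b"&#x%X;" % ord(ch)
    return bytes(out)


# Cyrillic "a" fits into KOI8-R, the euro sign (U+20AC) does not.
sample = "\u0430\u20ac"
exported = encode_with_char_refs(sample, "koi8-r")

# Python's built-in error handler does the same job with decimal references.
exported_decimal = sample.encode("koi8-r", errors="xmlcharrefreplace")
```

 Any XML parser resolves such references back to the original chars on import,
whatever the document's declared charset is.
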
     
 So, nothing to worry about, nothing to change/fix.

 Best regards,
  -Vlad


