Re: POW -- which locales Just Work?


Subject: Re: POW -- which locales Just Work?
From: Sam TH (sam@uchicago.edu)
Date: Thu Mar 01 2001 - 12:00:33 CST


On Thu, Mar 01, 2001 at 08:27:07PM +0400, Vlad Harchev wrote:
> On Thu, 1 Mar 2001, Karl Ove Hufthammer wrote:
>
> > Saving in 'UTF-8' or 'UTF-16' *is* much better than using other
> > charsets. Not because of AbiWord, but because *other* programs may be
> > reading AbiWord documents. People implementing XML parsers (which are used
> > by several programs, e.g. XSLT engines) don't want to implement hundreds
> > of character encodings, as this will 1) be much work, 2) increase the
> > size/bloat, and 3) be unnecessary.
>
> Hmm, if an XML parser supports only UTF-8 or UTF-16, it's broken.

This is incorrect. The XML Recommendation requires only that UTF-8
and UTF-16 be supported; support for any other encoding is optional.

> People should
> stick to libxml then. Also, a trivial sed script and iconv can be used to
> convert an XML file in any encoding to a valid UTF-8 encoded XML file.
>
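
The conversion described here has two parts: transcoding the bytes, and
rewriting the encoding declaration so that it no longer names the old
charset. A minimal sketch of both steps, in Python rather than sed and
iconv. The helper name `to_utf8` is made up for illustration, and the
sketch assumes the caller already knows the source encoding:

```python
import re

# Matches the encoding pseudo-attribute of a leading XML declaration,
# e.g. <?xml version="1.0" encoding="koi8-r"?>
_DECL = re.compile(r'^(<\?xml[^>]*?encoding=)(["\'])[A-Za-z][A-Za-z0-9._-]*\2')


def to_utf8(raw, source_encoding):
    """Transcode an XML document to UTF-8 and fix its declaration.

    Hypothetical helper: `source_encoding` must be supplied by the
    caller; nothing here sniffs it from the bytes.
    """
    text = raw.decode(source_encoding)
    # Rewrite only the first declaration, keeping the original quote style.
    text = _DECL.sub(r"\g<1>\g<2>UTF-8\g<2>", text, count=1)
    return text.encode("utf-8")
```

A document with no encoding declaration passes through the regex
unchanged, which is still correct, since UTF-8 is the default when no
declaration is present.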

Well, since we don't intend to give up expat, we have the following
options:

1. Continue current practice. This discussion suggests that current
practice is broken for non-Latin1 encodings.

2. Encode in UTF-8. Vlad suggests that this is bad for single-byte
encodings that are not Latin1.

3. Provide a way for Expat to handle other encodings. This requires
using the expat functions for unknown encodings. See the expat.h
header file for more info.

Option 1 is broken. Option 2 may be broken, but less so than 1.
Option 3 requires coding.

Absent the coding required for 3, I think we should switch to 2. Of
course, 3 is preferable.
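
To give a feel for the coding that option 3 involves: the handler
registered with XML_SetUnknownEncodingHandler fills in an XML_Encoding
structure, whose `map` member is a 256-entry table giving the Unicode
code point for each possible byte, with -1 marking bytes that are
invalid. The sketch below builds such a table in Python purely as an
illustration. The function name `single_byte_map` is made up, it only
covers single-byte encodings, and the real work would be C code inside
AbiWord:

```python
def single_byte_map(encoding):
    """Build a 256-entry byte -> code point table for a single-byte charset.

    Illustrative only: mirrors the shape of the `map` array in expat's
    XML_Encoding struct, using -1 for bytes the charset does not define.
    """
    table = []
    for byte in range(256):
        try:
            table.append(ord(bytes([byte]).decode(encoding)))
        except UnicodeDecodeError:
            table.append(-1)  # byte has no meaning in this encoding
    return table


def decode_with_map(raw, table):
    """Decode bytes through the table, the way a parser would use it."""
    return "".join(chr(table[b]) for b in raw)
```

For example, `decode_with_map("Привет".encode("koi8-r"),
single_byte_map("koi8-r"))` gives back the original string. Multi-byte
encodings need the `convert` callback as well, which this sketch does
not attempt.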

> For non-Latin1 languages, e.g. Russian, conversion to UTF-8 doubles the file
> size and makes it uneditable in plain editors. There are many more
> editors that don't support UTF-8 than XML parsers that don't support all
> encodings understood by iconv(3).
>
> So, PLEASE don't stick to utf8 for all locales.
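
The size claim is easy to check. Cyrillic letters take one byte each in
KOI8-R and two in UTF-8, while ASCII markup takes one byte in both, so
pure Russian text doubles and a marked-up document grows by somewhat
less. A small sketch, with a made-up helper name and a made-up sample
string:

```python
def encoded_sizes(text, encodings=("koi8-r", "utf-8")):
    """Return the byte length of `text` in each of the given encodings."""
    return {enc: len(text.encode(enc)) for enc in encodings}


# Pure Cyrillic text: every character doubles.
print(encoded_sizes("Привет"))         # {'koi8-r': 6, 'utf-8': 12}

# Cyrillic text wrapped in ASCII markup: only the letters grow.
print(encoded_sizes("<p>Привет</p>"))  # {'koi8-r': 13, 'utf-8': 19}
```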

           
        sam th
        sam@uchicago.edu
        http://www.abisource.com/~sam/
        GnuPG Key:
        http://www.abisource.com/~sam/key



