Re: Unicode UCS-2 importer


Subject: Re: Unicode UCS-2 importer
From: Sam TH (sam@uchicago.edu)
Date: Tue May 15 2001 - 09:24:56 CDT


On Tue, May 15, 2001 at 11:23:17PM +1000, Andrew Dunbar wrote:
> Tomas Frydrych wrote:
> >
> > Hi Andrew,
> >
> > I have had a quick look at the importer and overall it looks good,
> > but I would be much happier if we could do without the goto
> > construct, something like
> >
> > if ((error = _writeHeader(fp)) == UT_OK)
> > {
> > error = _parseFile(fp);
> > }
> > fclose(fp);
> > return error;
>
> Actually my code is derived from the Text and UTF-8 importers so
> maybe all three need to be fixed?
>
> > Also, if the file does not contain a BOM marker, the exporter
> > assumes the file is little-endian. Should we not assume the
> > opposite, since, if I am not mistaken, Unicode is big-endian by
> > definition?
>
> Well I'm not sure what the long term solution would be but
> nobody has replied to my Unicode text import post yet.
> The immediate goal is to be able to import Windows Notepad and
> MS Word Unicode files and they use UCS-2 little-endian.
>
> A better solution would either require a parameter to specify
> endianness or two separate importers and exporters. One for
> UCS-2 big-endian and one for UCS-2 little-endian. Ugly.
>
> I'm actually beginning to think we should just have one Text
> importer and one Text exporter. Since they use iconv already,
> we could treat ISO, UTF-8, and UCS-2 little- and big-endian
> as values of an encoding parameter. That'll cut down on
> code duplication too. Any ideas?

This is an excellent idea; the fewer importers we have, the better.
(Actually, my goal would be to have no choices in the open dialog, and
have AbiWord just do the right thing automatically.) However,
according to unicode.org, there are both big-endian and little-endian
UTF-16 encodings, both of which are valid and neither of which has to
have a BOM. That's in addition to documents with a BOM.
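
For what it's worth, the single-importer idea might look roughly like
the sketch below. It is only an illustration: text_to_utf8 is a
made-up name, not an existing AbiWord function, and the encoding
names that iconv accepts (for example "UTF-16LE" or "UCS-2BE") vary
between iconv implementations, so treat them as assumptions.

```c
#include <iconv.h>
#include <stddef.h>

/*
 * Hypothetical helper: convert inlen bytes of text in the encoding
 * named by `from` into UTF-8. The encoding is just a parameter, so
 * one routine can serve ISO, UTF-8 and both UCS-2 byte orders.
 * Returns the number of bytes written to out, or (size_t)-1 on error.
 */
static size_t text_to_utf8(const char *from,
                           const char *in, size_t inlen,
                           char *out, size_t outcap)
{
    iconv_t cd = iconv_open("UTF-8", from);
    char *inp = (char *)in;   /* iconv() takes a non-const pointer */
    char *outp = out;
    size_t inleft = inlen;
    size_t outleft = outcap;
    size_t rc;

    if (cd == (iconv_t)-1)
        return (size_t)-1;    /* encoding name not supported */

    rc = iconv(cd, &inp, &inleft, &outp, &outleft);
    iconv_close(cd);

    if (rc == (size_t)-1)
        return (size_t)-1;    /* invalid input or output buffer too small */

    return outcap - outleft;
}
```

A caller would pass whichever encoding name was chosen or detected,
e.g. text_to_utf8("UTF-16LE", buf, len, out, sizeof out). A real
importer would convert in chunks rather than in one call.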

Is there still a way we can do everything automatically?
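
One possible approach, as a rough sketch only: trust a BOM when there
is one, and otherwise guess from where the zero bytes fall. The names
here (text_encoding, guess_utf16_order) are invented for illustration.
The zero-byte heuristic assumes mostly Latin text and can return
"unknown" or guess wrong on other scripts, and the bytes FF FE can
also begin a UTF-32 little-endian BOM, which this sketch ignores.

```c
#include <stddef.h>

/* Hypothetical result type; not part of any existing API. */
typedef enum { ENC_UNKNOWN, ENC_UTF16_LE, ENC_UTF16_BE } text_encoding;

/* Guess the byte order of a buffer that may hold UCS-2/UTF-16 text. */
static text_encoding guess_utf16_order(const unsigned char *buf, size_t len)
{
    size_t i;
    size_t zero_even = 0;   /* zero bytes at offsets 0, 2, 4, ... */
    size_t zero_odd = 0;    /* zero bytes at offsets 1, 3, 5, ... */

    /* Step 1: a BOM, if present, settles the question. */
    if (len >= 2) {
        if (buf[0] == 0xFF && buf[1] == 0xFE)
            return ENC_UTF16_LE;
        if (buf[0] == 0xFE && buf[1] == 0xFF)
            return ENC_UTF16_BE;
    }

    /*
     * Step 2: no BOM. For mostly Latin text the high byte of each
     * 16-bit unit is zero: the second byte of each pair in
     * little-endian data, the first byte in big-endian data.
     */
    for (i = 0; i + 1 < len; i += 2) {
        if (buf[i] == 0)
            zero_even++;
        if (buf[i + 1] == 0)
            zero_odd++;
    }

    if (zero_odd > zero_even)
        return ENC_UTF16_LE;
    if (zero_even > zero_odd)
        return ENC_UTF16_BE;
    return ENC_UNKNOWN;     /* caller falls back to asking the user */
}
```

When the result is ENC_UNKNOWN we would still need a default or a
prompt, so this does not make the dialog choice disappear entirely.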
           
sam th --- sam@uchicago.edu --- http://www.abisource.com/~sam/
OpenPGP Key: CABD33FC --- http://samth.dyndns.org/key
DeCSS: http://samth.dynds.org/decss



