Re: Unicode UCS-2 importer


Subject: Re: Unicode UCS-2 importer
From: Andrew Dunbar (hippietrail@yahoo.com)
Date: Tue May 15 2001 - 08:23:17 CDT


Tomas Frydrych wrote:
>
> Hi Andrew,
>
> I have had a quick look at the importer and overall it looks good,
> but I would be much happier if we could do without the goto
> construct, something like
>
> if ((error = _writeHeader(fp)) == UT_OK)
> {
>     error = _parseFile(fp);
> }
> fclose(fp);
> return error;

Actually, my code is derived from the Text and UTF-8 importers, so
maybe all three need to be fixed?

> Also, if the file does not contain a BOM, the exporter
> assumes the file is little-endian. Should we not assume the
> opposite, since, if I am not mistaken, Unicode is big-endian by
> definition?

Well, I'm not sure what the long-term solution would be, but
nobody has replied to my Unicode text import post yet.
The immediate goal is to be able to import Windows Notepad and
MS Word Unicode files, and they use UCS-2 little-endian.
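
To make the BOM question concrete, here is a minimal sketch of the
detection step with the no-BOM default passed in by the caller. The
names (Endian, BomResult, detectBom, decodeUcs2) are hypothetical and
not taken from the actual importer; it only illustrates the idea.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical names for illustration; not from the AbiWord sources.
enum class Endian { Little, Big };

struct BomResult {
    Endian endian;     // byte order to decode with
    std::size_t skip;  // number of BOM bytes to skip (0 or 2)
};

// Inspect the first two bytes: FF FE means little-endian, FE FF means
// big-endian. Without a BOM, fall back to the caller's default.
BomResult detectBom(const std::vector<std::uint8_t>& buf, Endian fallback) {
    if (buf.size() >= 2) {
        if (buf[0] == 0xFF && buf[1] == 0xFE) return {Endian::Little, 2};
        if (buf[0] == 0xFE && buf[1] == 0xFF) return {Endian::Big, 2};
    }
    return {fallback, 0};
}

// Decode the buffer into 16-bit code units, skipping any BOM.
// A trailing odd byte is ignored in this sketch.
std::vector<std::uint16_t> decodeUcs2(const std::vector<std::uint8_t>& buf,
                                      Endian fallback) {
    const BomResult bom = detectBom(buf, fallback);
    std::vector<std::uint16_t> out;
    for (std::size_t i = bom.skip; i + 1 < buf.size(); i += 2) {
        const std::uint16_t first = buf[i];
        const std::uint16_t second = buf[i + 1];
        const std::uint16_t unit = (bom.endian == Endian::Little)
            ? static_cast<std::uint16_t>(first | (second << 8))
            : static_cast<std::uint16_t>((first << 8) | second);
        out.push_back(unit);
    }
    return out;
}
```

With the fallback as a parameter, the choice between little-endian
(to match Notepad and MS Word) and big-endian only matters for files
that have no BOM.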

A better solution would require either a parameter to specify
endianness or two separate importers and exporters: one for
UCS-2 big-endian and one for UCS-2 little-endian. Ugly.

I'm actually beginning to think we should just have one Text
importer and one Text exporter. Since they use iconv already,
we could treat ISO, UTF-8, and UCS-2 little-endian and
big-endian as values of a single encoding parameter. That'll
cut down on code duplication too. Any ideas?
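
As a rough sketch of the single-importer idea, the encoding could
simply be a name handed to iconv. The helper toUtf8 below is
hypothetical, and UTF-8 as the target is only an assumption made for
the example. The names "UCS-2LE", "UCS-2BE" and "ISO-8859-1" are
accepted by glibc and GNU libiconv; other iconv implementations may
spell them differently.

```cpp
#include <cassert>
#include <cerrno>
#include <cstddef>
#include <iconv.h>
#include <stdexcept>
#include <string>

// Hypothetical helper: convert `input` from the named source encoding
// to UTF-8. The encoding is an ordinary parameter, so one code path
// covers ISO-8859-x, UTF-8 and both byte orders of UCS-2.
std::string toUtf8(const std::string& input, const char* fromEncoding) {
    iconv_t cd = iconv_open("UTF-8", fromEncoding);
    if (cd == (iconv_t)-1) {
        throw std::runtime_error("unsupported encoding");
    }

    std::string out;
    char buf[256];
    // Some platforms declare the input pointer as char**, hence the cast.
    char* in = const_cast<char*>(input.data());
    std::size_t inLeft = input.size();

    while (inLeft > 0) {
        char* dst = buf;
        std::size_t dstLeft = sizeof buf;
        const std::size_t rc = iconv(cd, &in, &inLeft, &dst, &dstLeft);
        const int err = errno;  // save before any other library call
        out.append(buf, sizeof buf - dstLeft);
        // E2BIG only means the output chunk is full; loop and continue.
        if (rc == (std::size_t)-1 && err != E2BIG) {
            iconv_close(cd);
            throw std::runtime_error("invalid or truncated input");
        }
    }
    iconv_close(cd);
    return out;
}
```

A caller would then pick the encoding from a dialog or from a BOM
check, e.g. toUtf8(data, "UCS-2LE") for a Notepad file.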

Andrew.

> > Here is my first attempt at an importer for UCS-2.
> >
> > This is the native Unicode encoding on Windows 2000 and newer,
> > as produced by Notepad and MS Word. It handles little-endian
> > and big-endian UCS-2 and Microsoft's BOMs.
> >

-- 
http://linguaphile.sourceforge.net





This archive was generated by hypermail 2b25 : Sat May 26 2001 - 03:51:04 CDT