Re: Patch: Multi-encoding Text import/export


Subject: Re: Patch: Multi-encoding Text import/export
From: Sam TH (sam@uchicago.edu)
Date: Sat May 19 2001 - 13:26:06 CDT


On Sat, May 19, 2001 at 11:39:13PM +0500, Vlad Harchev wrote:
> On Sat, 19 May 2001, Sam TH wrote:
>
> Hi,
>
> > On Sat, May 19, 2001 at 06:19:21PM +1000, Andrew Dunbar wrote:
> > > I consider this a pretty important change.
> > >
> > > It allows you to import a text file no matter if
> > > it's an old 8-bit encoding, UTF-8, or UCS-2 as is
> > > used in Windows and Mac OSX.
> > >
> > > It also allows you to export to any of these text
> > > formats - though changes are needed to the rest of
> > > AbiWord to fully support this.
> > >
> > > This also means we will no longer need separate
> > > UTF-8 and UCS-2 importers and exporters and any
> > > .txt file will "just work" - perfect for church
> > > secretaries (:
> > >
> > > Please somebody have a serious look at this!
> > > Feedback much appreciated.
> >
> > This looks really good. A couple quick comments:
> >
> > - _recognizeUCS/UTF8 should definitely be members of a class.
> > IE_Imp_Text_Sniffer is probably the best choice.
> >
> > - All the new functions need doxygen comments.
> >
> > Those two you should fix before someone commits this. They shouldn't
> > be too hard.
> >
> > The third thing is that UTF8 can be various-endian as well, so you
> > probably want to detect that.
>
> You are plain wrong here. UTF-8 is a sequence of bytes, so it can't be
> endian (the ability to recognize a byte's offset from the start of its
> sequence is the key feature of UTF-8).
>

Doh. I was wrong. Sorry.
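
For the archives, here is a rough sketch of the point, in Python rather
than AbiWord's C++. It is not the code from the patch:
sniff_text_encoding is a hypothetical helper, and the BOM-then-strict-decode
order is only an assumption about how such a sniffer might work. It shows
that UCS-2/UTF-16 has two byte orders to check while UTF-8 has only one
possible byte sequence.

```python
import codecs


def sniff_text_encoding(data: bytes) -> str:
    """Hypothetical helper: guess the encoding of a plain-text buffer."""
    # 16-bit encodings store each code unit as two bytes, so the byte
    # order matters and has to be detected (here via the byte order mark).
    if data.startswith(codecs.BOM_UTF16_LE):
        return "utf-16-le"
    if data.startswith(codecs.BOM_UTF16_BE):
        return "utf-16-be"
    # UTF-8 is defined as a sequence of bytes, so there is no second
    # byte order to try: either the buffer decodes strictly or it doesn't.
    try:
        data.decode("utf-8")
        return "utf-8"
    except UnicodeDecodeError:
        # Assumption: anything else is treated as a legacy 8-bit encoding.
        return "8-bit"


text = "h\u00e9"                   # "h" followed by e-acute
utf8 = text.encode("utf-8")        # one possible byte sequence
le = text.encode("utf-16-le")      # low byte of each unit first
be = text.encode("utf-16-be")      # high byte of each unit first

print(utf8, le, be)
print(sniff_text_encoding(codecs.BOM_UTF16_LE + le))
print(sniff_text_encoding(utf8))
print(sniff_text_encoding(b"h\xe9"))  # Latin-1 bytes, not valid UTF-8
```

The two UTF-16 buffers hold the same bytes in swapped pairs, which is
why only the 16-bit case needs an endianness check.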
           
sam th --- sam@uchicago.edu --- http://www.abisource.com/~sam/
OpenPGP Key: CABD33FC --- http://samth.dyndns.org/key
DeCSS: http://samth.dynds.org/decss



