Re: Patch: Multi-encoding Text import/export


Subject: Re: Patch: Multi-encoding Text import/export
From: Sam TH (sam@uchicago.edu)
Date: Sat May 19 2001 - 09:52:01 CDT


On Sat, May 19, 2001 at 06:19:21PM +1000, Andrew Dunbar wrote:
> I consider this a pretty important change.
>
> It allows you to import a text file no matter if
> it's an old 8-bit encoding, UTF-8, or UCS-2 as is
> used in Windows and Mac OSX.
>
> It also allows you to export to any of these text
> formats - though changes are needed to the rest of
> AbiWord to fully support this.
>
> This also means we will no longer need separate
> UTF-8 and UCS-2 importers and exporters and any
> .txt file will "just work" - perfect for church
> secretaries (:
>
> Please somebody have a serious look at this!
> Feedback much appreciated.

This looks really good. A couple quick comments:

- _recognizeUCS/UTF8 should definitely be members of a class.
  IE_Imp_Text_Sniffer is probably the best choice.

- All the new functions need doxygen comments.

Those two you should fix before someone commits this. They shouldn't
be too hard.
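
To illustrate both points, here is a rough sketch of a recognizer written as
a static class member with a doxygen comment. The class name
TextSnifferSketch and the signature are made up for illustration and are not
the actual AbiWord declarations; the check is also deliberately simplified
(it does not reject overlong forms or surrogates).

```cpp
#include <cstddef>

/*!
  Hypothetical stand-in for a sniffer class (not the real AbiWord class).
 */
class TextSnifferSketch
{
public:
    /*!
      Check whether a buffer looks like UTF-8 text.
      \param buf Pointer to the raw bytes to examine
      \param len Number of bytes available in buf
      \return true if every multi-byte sequence is well formed and at
              least one multi-byte sequence is present; plain ASCII
              therefore returns false
     */
    static bool recognizeUTF8(const unsigned char* buf, std::size_t len);
};

bool TextSnifferSketch::recognizeUTF8(const unsigned char* buf, std::size_t len)
{
    bool sawMultiByte = false;
    std::size_t i = 0;
    while (i < len) {
        const unsigned char c = buf[i];
        std::size_t extra = 0;                 // continuation bytes expected
        if (c < 0x80) { ++i; continue; }       // ASCII byte
        else if ((c & 0xE0) == 0xC0) extra = 1;
        else if ((c & 0xF0) == 0xE0) extra = 2;
        else if ((c & 0xF8) == 0xF0) extra = 3;
        else return false;                     // stray continuation or invalid lead

        if (i + extra >= len) return false;    // sequence runs past the buffer
        for (std::size_t k = 1; k <= extra; ++k) {
            if ((buf[i + k] & 0xC0) != 0x80) return false;
        }
        sawMultiByte = true;
        i += extra + 1;
    }
    return sawMultiByte;
}
```

Making it a static member keeps the helper out of the global namespace
while still letting the sniffer call it without an instance.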

The third thing is that UCS-2 can be either big-endian or little-endian
(UTF-8 itself has no byte order), so you probably want to detect that.
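
One common way to detect byte order is to look for a byte-order mark
(U+FEFF) at the start of the buffer. A minimal sketch, assuming hypothetical
names (BomKind, detectBom) that are not taken from the patch or from AbiWord:

```cpp
#include <cstddef>

// Hypothetical names for illustration only.
enum class BomKind { None, Utf8, Ucs2BigEndian, Ucs2LittleEndian };

// Inspect the first bytes of a buffer for an encoded U+FEFF.
// Limitation: FF FE 00 00 (a UTF-32 little-endian mark) is reported
// here as UCS-2 little-endian.
BomKind detectBom(const unsigned char* buf, std::size_t len)
{
    if (len >= 3 && buf[0] == 0xEF && buf[1] == 0xBB && buf[2] == 0xBF)
        return BomKind::Utf8;              // signature only, no byte order implied
    if (len >= 2 && buf[0] == 0xFE && buf[1] == 0xFF)
        return BomKind::Ucs2BigEndian;
    if (len >= 2 && buf[0] == 0xFF && buf[1] == 0xFE)
        return BomKind::Ucs2LittleEndian;
    return BomKind::None;                  // caller has to fall back to heuristics
}
```

Files without a mark still need a heuristic, for example checking which
byte of each 16-bit unit tends to be zero in mostly-ASCII text.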

Question: does our current UTF8 export use a byte-order mark? If not,
it probably should.
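
If the exporter does not write one yet, emitting the mark is a small change.
A sketch, assuming a hypothetical helper withUtf8Bom that works on an
already-encoded UTF-8 string (the real exporter's interface may look quite
different):

```cpp
#include <string>

// Hypothetical helper: return utf8Text with the UTF-8 encoding of U+FEFF
// (EF BB BF) in front, unless the text already starts with it.
std::string withUtf8Bom(const std::string& utf8Text)
{
    static const char kBom[] = "\xEF\xBB\xBF";
    if (utf8Text.compare(0, 3, kBom) == 0)
        return utf8Text;                   // already marked, leave unchanged
    return std::string(kBom) + utf8Text;
}
```

The check for an existing mark keeps a round trip (import, then export)
from stacking up multiple marks at the start of the file.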

Other than that, this is excellent.
           
sam th --- sam@uchicago.edu --- http://www.abisource.com/~sam/
OpenPGP Key: CABD33FC --- http://samth.dyndns.org/key
DeCSS: http://samth.dynds.org/decss



