Re: Patch: Multi-encoding Text import/export


Subject: Re: Patch: Multi-encoding Text import/export
From: Vlad Harchev (hvv@hippo.ru)
Date: Sun May 20 2001 - 14:19:15 CDT


On Mon, 21 May 2001, Andrew Dunbar wrote:

> Vlad Harchev wrote:
> >
> > On Sun, 20 May 2001, Andrew Dunbar wrote:
> >
> > > Vlad Harchev wrote:
> > > >
> > > > On Sun, 20 May 2001, Andrew Dunbar wrote:
> > > >
> > > > > Sam TH wrote:
> > > > > > Other than that, this is excellent.
> > > > >
> > > > > Thanks! I've found that we must make the Text Encoding a per-
> > > > > document feature instead of based entirely on the locale.
> > > > > I need to know how to add an "encoding" field to AbiWord's
> > > > > document class - this will also be very useful for at least
> > > > > the HTML and RTF importers and exporters - probably more.
> > > >
> > > > I think they are needed. Both RTF and HTML formats pretty precisely specify
> > > > encoding (RTF - in some backward way) - so it's not necessary. The only use is
> > > > if someone exported file by (or wants to export for importing into) some
> > > > widely spread non-following specs app. I don't know ones that satisfy both
> > > > conditions :)
> > >
> > > Sorry Vlad. I don't understand if you're saying adding this is
> > > a good or a bad idea. I think it's an essential idea so we can
> > > load an HTML document in Shift-JIS encoding and save it as a plain
> > > text file in EUC-JP encoding on a machine with an English locale.
> > >
> > > Just the kind of thing I use MS Word for now...
> >
> > Why might a user want to save HTML in some particular encoding? HTML can be
> > put on the web in any encoding (if it's mentioned in the header) - and any
> > compliant and reasonable browser will be able to show it regardless of
> > encoding. The only case is when the user is a web developer and needs to have
> > HTML in some particular encoding (for hand-editing) - but such people should
> > also have tools for converting texts between various encodings.
> > So, a church secretary shouldn't have to bother about knowing what an encoding
> > is. Just save in UTF-8 most of the time (or allow selecting the text encoding
> > to write HTML in via the AbiWord preferences file - without any GUI, to save
> > programmer effort) (and probably better to write a generic portable
> > utility to convert text between arbitrary encodings, unrelated to the
> > AbiWord project).
>
> HTML is only one example. It may be company policy that a certain
> encoding is used. Handheld devices with embedded browsers may only
> support certain encodings. RTF also has the concept of a native
> encoding. If I load a document in one filetype and it's in a certain

 In fact, Word itself ignores the native encoding of RTF (as specified by
\ansicpg) - it uses the current font's charset (specified by \fcharset) when
interpreting non-Unicode characters (i.e. those not specified with
\uNNNN).
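
 To illustrate the font-charset-first behaviour described above, here is a
minimal sketch in Python. The names FCHARSET_TO_CODEC, codec_for_run and
decode_run are made up for this example and the table is only a small subset
of charset values; falling back to \ansicpg for an unlisted charset is an
assumption of the sketch, not something Word is known to do.

```python
# Partial mapping from RTF \fcharset values to Python codec names.
# Only a few well-known values are listed; a real importer needs more.
FCHARSET_TO_CODEC = {
    0: "cp1252",    # ANSI (Western)
    128: "cp932",   # Shift-JIS
    204: "cp1251",  # Russian
    238: "cp1250",  # Eastern European
}


def codec_for_run(fcharset, ansicpg=1252):
    # The font charset wins; the \ansicpg code page is only a fallback
    # for charsets missing from the table (assumption of this sketch).
    codec = FCHARSET_TO_CODEC.get(fcharset)
    if codec is None:
        codec = "cp%d" % ansicpg
    return codec


def decode_run(raw, fcharset, ansicpg=1252):
    # Decode the non-Unicode bytes of one text run.
    return raw.decode(codec_for_run(fcharset, ansicpg))
```

 With this, bytes from a run whose font has \fcharset204 decode as cp1251
even if the header says \ansicpg1252.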

> encoding it's reasonable to expect that saving in a different filetype
> will preserve the encoding. And what about plain text? Users might
> like to load a Japanese web page and save as plain text for use with
> legacy software that doesn't yet support UTF-8.
>
> I'm not a church secretary I suppose but I do handle a lot of text
> in many encodings and many formats. I'm a Microsoft hater and find
> MS Word excellent for this job - I want a free alternative.

 I think it would be better to write a generic utility instead of following
MS's way - one that just converts a raw stream from one encoding to
another. Of course Unix has dozens of such utilities - but other platforms
lack them.
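
 A rough sketch of such a converter, in Python and using only the standard
codecs module. The function name transcode_stream and its parameters are
invented for the example; it decodes and re-encodes incrementally, so a
multi-byte character split across two chunks is still handled.

```python
import codecs
import io


def transcode_stream(src, dst, src_encoding, dst_encoding,
                     errors="strict", chunk_size=64 * 1024):
    # src and dst are binary file-like objects.
    decoder = codecs.getincrementaldecoder(src_encoding)(errors)
    encoder = codecs.getincrementalencoder(dst_encoding)(errors)
    while True:
        chunk = src.read(chunk_size)
        if not chunk:
            break
        # The decoder buffers any incomplete trailing byte sequence.
        dst.write(encoder.encode(decoder.decode(chunk)))
    # Flush both stages; a truncated input raises here under "strict".
    tail = decoder.decode(b"", final=True)
    dst.write(encoder.encode(tail, final=True))


if __name__ == "__main__":
    # Shift-JIS in, EUC-JP out, as in the example earlier in the thread.
    source = io.BytesIO("日本語テキスト".encode("shift_jis"))
    target = io.BytesIO()
    transcode_stream(source, target, "shift_jis", "euc_jp")
    print(target.getvalue().decode("euc_jp"))
```

 Because it only touches byte streams, nothing in it depends on AbiWord or on
the machine's locale.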
 
> Anyway I'm willing to implement this feature - I just need some
> pointers on how/where to add a new field to the internal document
> class.

 I don't understand why a new field in the existing document class is needed.
 I think we should add a special exporter/importer that allows specifying the
encoding of the document, under the name of the base exporter/importer with
something appended (e.g. "HTML" and "HTML (encoded)" - as in OpenOffice). If the
user selects the "..(encoded)" version, a dialog pops up and asks what encoding
to use.
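
 The idea can be sketched like this, in Python for brevity rather than in
AbiWord's own C++. All names here (export_html, with_encoding_prompt,
build_exporters) are hypothetical and are not AbiWord APIs; ask_encoding stands
in for the dialog that asks the user which encoding to use.

```python
import html


def export_html(text, encoding="utf-8"):
    # Toy exporter: wrap the text in minimal HTML that declares its encoding.
    # The Python codec name is used as the charset label for simplicity.
    page = ('<html><head><meta charset="%s"></head><body>%s</body></html>'
            % (encoding, html.escape(text)))
    return page.encode(encoding)


def with_encoding_prompt(exporter, ask_encoding):
    # Build the "(encoded)" variant: ask for an encoding, then delegate
    # to the base exporter unchanged.
    def encoded_exporter(text):
        return exporter(text, ask_encoding())
    return encoded_exporter


def build_exporters(ask_encoding):
    # Both entries share one implementation; only the second one prompts.
    return {
        "HTML": export_html,
        "HTML (encoded)": with_encoding_prompt(export_html, ask_encoding),
    }
```

 The document class stays untouched in this scheme: the encoding lives only in
the export call, at the cost of asking the user again on every save.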

 Best regards,
  -Vlad


