Text import


Subject: Text import
From: Andrew Dunbar (hippietrail@yahoo.com)
Date: Sun May 13 2001 - 08:20:02 CDT


Hi all. I've just noticed the Text importer has been changed again so
that a file with a .txt extension will always be loaded according to
the locale even if it's a UTF-8 file. UTF-8 files are therefore forced
to have a .utf8 extension.
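
The behaviour described above amounts to choosing the encoding from the extension alone. A rough Python sketch of that rule follows; the function name and table are hypothetical illustrations, not the actual importer code, and only the two extensions mentioned above are listed.

```python
import locale
import os

# Hypothetical table for illustration: only the extensions named above.
EXTENSION_ENCODINGS = {
    ".txt": None,       # None means "use the locale's encoding"
    ".utf8": "utf-8",
}

def encoding_for_filename(filename):
    """Pick an encoding from the extension alone, ignoring file content."""
    ext = os.path.splitext(filename)[1].lower()
    if ext not in EXTENSION_ENCODINGS:
        raise ValueError("no text importer for extension %r" % ext)
    encoding = EXTENSION_ENCODINGS[ext]
    # Fall back to the locale's preferred encoding for plain .txt files.
    return encoding or locale.getpreferredencoding(False)
```

Under this rule a UTF-8 file named notes.txt is decoded with the locale encoding, which is the problem raised here.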

Though it does seem logical that a .txt file might be defined never to
be UTF-8, real life does not match this assumption. On my machine I
have many Unicode files, both UTF-8 and UCS-2, but *all* of these
files have a .txt extension. I am running Windows 2000.

On W2K and presumably WXP, everything now works with Unicode, including
basics such as Notepad. Notepad provides the option to save as "ANSI",
"Unicode", "Unicode big endian", and "UTF-8". No matter which format
you choose, the file will have a .txt extension.

We need to discuss and plan for files with a .txt extension that are
encoded in something other than ASCII or ISO 8859.
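
One way to handle such .txt files, offered only as a sketch, is to look at the first bytes of the file instead of its name. The Python below assumes the file starts with a byte order mark (which Notepad's Unicode formats generally write) and falls back to the locale encoding otherwise. The function names are hypothetical, and Python's UTF-16 codec stands in for UCS-2.

```python
import codecs
import locale

# Byte order marks and the Python codec each one implies.
# "utf-8-sig" and "utf-16" both consume the BOM while decoding.
BOMS = [
    (codecs.BOM_UTF8, "utf-8-sig"),
    (codecs.BOM_UTF16_LE, "utf-16"),
    (codecs.BOM_UTF16_BE, "utf-16"),
]

def sniff_encoding(data):
    """Guess the encoding of raw bytes from a leading BOM, if any."""
    for bom, name in BOMS:
        if data.startswith(bom):
            return name
    # No BOM found: assume the locale's encoding, as the importer does now.
    return locale.getpreferredencoding(False)

def decode_text(data):
    """Decode raw file bytes using the sniffed encoding."""
    return data.decode(sniff_encoding(data))
```

A file without a BOM is still ambiguous under this approach, so it complements a naming convention and does not replace one.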

* What are the file naming conventions for Mac, Be, *nix for:
  * ASCII/ISO/ANSI
  * Unicode:
    * UTF-8
    * UCS-2

One possibility is to add a "Unicode" importer/exporter with platform-
dependent code which expects UTF-8 on *nix, UCS-2 on Windows, ??? on
others.
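
The platform-dependent choice could look roughly like the Python sketch below. The function name is hypothetical, "utf-16-le" stands in for what Windows calls "Unicode", and the list of *nix platform prefixes is an assumption. Platforms whose convention is still unknown return None.

```python
import sys

def default_unicode_encoding(platform=sys.platform):
    """Return the encoding a generic "Unicode" importer would expect."""
    if platform.startswith("win"):
        # Stand-in for Windows "Unicode" (UCS-2, little endian).
        return "utf-16-le"
    if platform.startswith(("linux", "freebsd", "sunos", "aix")):
        # Assumed set of *nix platform names; extend as needed.
        return "utf-8"
    # Mac, Be and others: convention not yet decided.
    return None
```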

We need to keep in mind that Windows users are not aware of "UCS-2"
since Windows always uses the term "Unicode", but that they may be
aware of UTF-8 as an alternative, cross-platform encoding.

Please let us discuss these issues. In the meantime shall I work on
UCS-2 importers/exporters which expect .ucs2 filenames?

Thanks for your attention.

Andrew Dunbar.

-- 
http://linguaphile.sourceforge.net




This archive was generated by hypermail 2b25 : Sat May 26 2001 - 03:51:04 CDT