File types (was Re: msword doc bug)


Subject: File types (was Re: msword doc bug)
From: Paul Rohr (paul@abisource.com)
Date: Wed Jan 19 2000 - 12:25:40 CST


[ This has segued into a design discussion, so I've pruned abiword-user from
the cc list. ]

There are a total of five discrete issues to be addressed here:

1. What happens when saving a Word document
--------------------------------------------
This is an issue for existing documents, where the user has two choices
under the File menu:

  Save -- try to resave in whatever format we imported it as
  Save As -- choose a format, then save (defaults to import format)

For any format which we know how to import, but not export -- this is
currently true only for Word, but the problem is more general -- our current
behavior is user-hostile. What we *should* do is notify the user that we
can't save in that format, and either:

  - warn them we're switching formats (and let them cancel), or
  - send them to the Save As dialog so they can pick a format themselves

Instead, what we currently do is *change* formats to our .abw default (which
is defensible), but without telling them (ugh) and without changing the file
type (double-ugh). Which leads us to #2 and #4...
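
To make the proposed behavior concrete, here is a minimal sketch of the check that File > Save could make before writing anything. This is not the current AbiWord code: the FileFormat struct, the SaveAction enum and decideSave() are all hypothetical names invented for illustration.

```cpp
#include <string>

// Hypothetical description of a file format (not AbiWord's real type).
struct FileFormat {
    std::string name;    // e.g. "Word"
    std::string suffix;  // e.g. ".doc"
    bool canImport;      // we have an importer for it
    bool canExport;      // we have an exporter for it
};

// What File > Save should do for the current document.
enum class SaveAction {
    SaveInPlace,   // resave in the format we imported
    PromptSaveAs   // tell the user we can't export, then show Save As
};

// Never switch formats silently: if we cannot export the import format,
// hand control back to the user instead of quietly writing .abw.
SaveAction decideSave(const FileFormat& importedAs) {
    return importedAs.canExport ? SaveAction::SaveInPlace
                                : SaveAction::PromptSaveAs;
}
```

With this shape, a document imported from Word (importable but not exportable) always lands in the Save As path, where the user can cancel or pick a format.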

2. Setting file types
----------------------
Any modern GUI OS has realized that it's a lot easier to generate a
user-friendly desktop experience if you can double-click on files to open
them in the right application. To do this, though, the OS needs to know how
to automatically associate each individual file with a particular
application and/or file type.

There are currently at least four different ways this is done, depending on
the OS:

  MacOS    resource fork has magic cookies for filetype & creating app
  Windows  each filename has a suffix, and the OS binds suffixes to apps
  BeOS     uses MIME types for this purpose, but I'm fuzzy on the details
  Unix     (none of the above)

Without devolving into a flamewar about which alternative is "better", the
point is that it's important for us to reliably set any such
platform-specific indication of file type in the appropriate way for that
OS. (On Windows, for example, that means adding the right suffix by
default.)
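
For the Windows case, a rough sketch of "adding the right suffix by default" might look like the following. The helper name withSuffix() is hypothetical, and the other platforms would need their own mechanisms (resource fork, MIME type) rather than anything like this.

```cpp
#include <cstddef>
#include <string>

// Return `path` with its filename suffix replaced by `suffix`
// (or with `suffix` appended if the filename has none).
// Hypothetical helper, for illustration only.
std::string withSuffix(const std::string& path, const std::string& suffix) {
    // Find where the last path component starts.
    std::size_t slash = path.find_last_of("/\\");
    std::size_t baseStart = (slash == std::string::npos) ? 0 : slash + 1;

    // Only treat the last dot as a suffix separator if it lies inside the
    // last component and is not that component's first character.
    std::size_t dot = path.rfind('.');
    if (dot != std::string::npos && dot > baseStart) {
        return path.substr(0, dot) + suffix;
    }
    return path + suffix;
}
```

So a Word import that gets resaved in our native format would go from "letter.doc" to "letter.abw", instead of keeping a suffix that no longer matches its contents.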

3. Making AbiWord files double-clickable
-----------------------------------------
In addition, we also need to do some platform-specific work to register
ourselves with the OS as being capable of handling double-clicks on those
files. For example, Jeff implemented this functionality for Windows in the
following file:

  abi/src/af/xap/win/xap_Win32Slurp.cpp

I suspect further work will be needed to provide similar functionality on
our other platforms.

4. Sniffing file types
-----------------------
My claim is that it's not only OK, but actually preferable, to use some sort
of native file type indicator (NFTI) to figure out how to interpret a file
when opening it. If that's what the OS told users, that's what we should
try first, too.

However, regardless of the OS, sooner or later we'll hit a file whose NFTI
(suffix, resource fork, MIME type, or whatever) is either missing or wrong.
This problem is most pervasive on Unix, where there's no established NFTI
convention to begin with.

In this case, we need a fallback strategy for figuring out what's in the
document so we can open it properly. Those of us who used to write Web
browsers for a living called this process "sniffing". You actually open the
file, look at the first 100 bytes or so, and then guess which importer to
use.

Since our import/export architecture allows an arbitrary number of
importers, any patches to implement sniffing should distribute that logic
among each importer (instead of doing it all in one place). For example,
see how the current suffix-guessing logic gets implemented in:

  abi/src/wp/impexp/xp/ie_imp.cpp

An obvious way to implement sniffing is to open the file once and pass a
copy of the first 10 or 100 bytes or so to each importer in turn via an API
like the existing fpRecognizeSuffix(). Which leads us to #5...
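
Here is a minimal sketch of what per-importer sniffing could look like. Apart from the idea of a suffix recognizer per importer, everything below is an assumption made for illustration: the ImporterEntry struct, the recognizeContents callback, the two toy importers and pickImporter() are hypothetical and are not the actual ie_imp interfaces. It tries the suffix first and only falls back to sniffing when no importer claims the suffix, so it covers the "missing" case but not the "wrong" one.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical per-importer table entry: each importer supplies its own
// suffix check and its own content sniffer.
struct ImporterEntry {
    const char* name;
    bool (*recognizeSuffix)(const std::string& suffix);
    bool (*recognizeContents)(const char* head, std::size_t len);
};

static bool startsWith(const char* head, std::size_t len, const std::string& magic) {
    return len >= magic.size() && std::string(head, magic.size()) == magic;
}

// Toy RTF importer hooks: RTF files begin with "{\rtf".
static bool rtfSuffix(const std::string& s) { return s == ".rtf"; }
static bool rtfContents(const char* h, std::size_t n) {
    return startsWith(h, n, "{\\rtf");
}

// Toy Word importer hooks: OLE2 compound files begin with this 8-byte signature.
static bool wordSuffix(const std::string& s) { return s == ".doc"; }
static bool wordContents(const char* h, std::size_t n) {
    return startsWith(h, n, std::string("\xD0\xCF\x11\xE0\xA1\xB1\x1A\xE1", 8));
}

std::vector<ImporterEntry> defaultImporters() {
    return { {"RTF", rtfSuffix, rtfContents},
             {"Word", wordSuffix, wordContents} };
}

// Trust the suffix if some importer claims it; otherwise hand the same
// head-of-file bytes to each importer in turn and take the first match.
const ImporterEntry* pickImporter(const std::vector<ImporterEntry>& table,
                                  const std::string& suffix,
                                  const char* head, std::size_t len) {
    if (!suffix.empty()) {
        for (const ImporterEntry& e : table)
            if (e.recognizeSuffix(suffix)) return &e;
    }
    for (const ImporterEntry& e : table)
        if (e.recognizeContents(head, len)) return &e;
    return nullptr;  // nobody recognized it; the caller has to ask the user
}
```

The point of the table is that adding a new importer means adding one entry with its own two recognizers, without touching any central guessing code.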

5. Make our format more sniffable
----------------------------------
Our native file format puts XML-style comments *before* the <abiword> tag
which contains the contents of the document (which means it's quite a ways
into the file).

Since most simple sniffers don't look very far into a file, it'd probably be
a Good Thing if we changed our exporter to put those comments immediately
*after* the <abiword> tag instead.

If we're using expat properly, I doubt that this should break file format
compatibility in any serious ways for existing users. In any event, since
those comments get dropped by the importer and readded by the exporter, the
workaround is trivial -- open the document in AbiWord and resave.
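
To illustrate why comment placement matters, here is a sketch of the kind of simple sniffer described above. The function name looksLikeAbiWord() and the 100-byte default window are assumptions for illustration, not existing code.

```cpp
#include <algorithm>
#include <cstddef>
#include <string>

// Hypothetical sniffer: report a match only if the "<abiword" tag starts
// and ends within the first `window` bytes of the file.
bool looksLikeAbiWord(const std::string& contents, std::size_t window = 100) {
    const std::string head =
        contents.substr(0, std::min(window, contents.size()));
    return head.find("<abiword") != std::string::npos;
}
```

A file whose leading comments push the <abiword> tag past byte 100 is rejected by this check, while the same file with the comments moved after the tag is accepted.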

bottom line
-----------
We'd welcome patches which address any of these five issues, but I'd like to
suggest that people focus first on #1, which should be enough to fix
Elizabeth's problem.

Paul


