Re: spellcheck

Eric W. Sink (eric@abisource.com)
Thu, 8 Apr 1999 19:24:31 -0600


>The second thing I'd like to see is dynamic loading of the american(or
>other).hash file... Is there a reason why abiword doesn't figure the
>size/endian/etc through either a configuration option (.abiwordrc) or at
>load time??

Nope -- our current implementation is simply lame.

When we decided to use ispell, librarified, as our spell check engine, we
didn't do enough homework to realize that ispell's dictionary files are
ultra-non-portable. They're dependent on int size, endian-ness, and even
the compiler struct packing conventions. It's gruesome, and we consider it
unacceptable.

The relatively new 'aspell' is an appealing prospect, but the code makes
heavy use of C++ templates, to which we have religious objections. We take
enough flak for using C++. We're very comfortable with the small,
easy-to-understand, highly-portable subset of C++ to which we have
constrained ourselves. Cross-platform is a big priority of the project,
and there are some important platforms which do not have a decent
implementation of C++ templates, and probably never will.

And so, an obvious solution, other than rewriting the spell check engine,
has not appeared.

We need a spell check library which is portable, in C or a suitably boring
subset of C++, which uses dictionary files in such a way that the file
itself can be used on any platform.
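
To make the portability requirement concrete, here is a minimal sketch of
writing an integer field byte by byte in a fixed order, instead of dumping
an in-memory struct to disk. The helper names put_u32_le and get_u32_le are
hypothetical, not part of ispell or AbiWord, and the fixed-width types
assume a compiler that provides <stdint.h>.

```c
#include <stdint.h>

/* Store a 32-bit value as four bytes, least significant byte first.
 * The on-disk layout is then the same on every host, regardless of
 * int size, byte order, or struct padding. (Hypothetical helper.) */
static void put_u32_le(unsigned char *out, uint32_t v)
{
    out[0] = (unsigned char)(v & 0xFFu);
    out[1] = (unsigned char)((v >> 8) & 0xFFu);
    out[2] = (unsigned char)((v >> 16) & 0xFFu);
    out[3] = (unsigned char)((v >> 24) & 0xFFu);
}

/* Read back a value written by put_u32_le. (Hypothetical helper.) */
static uint32_t get_u32_le(const unsigned char *in)
{
    return (uint32_t)in[0]
         | ((uint32_t)in[1] << 8)
         | ((uint32_t)in[2] << 16)
         | ((uint32_t)in[3] << 24);
}
```

A file built only from fields encoded this way could be read on any
platform, at the cost of a decode step at load time.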

Let's see if I understand the dynamic loading idea you're envisioning: we
establish a naming convention for the different variants of the dictionary
hash file and AbiWord determines which one it needs at runtime, looking for
it by name. Other variants wouldn't actually have to be installed, as long
as AbiWord can find the one it wants.
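
As a rough sketch of how that lookup might work, the code below builds a
variant file name from the host's int size and byte order. The naming
scheme ("american-i4-le.hash" and so on) and both function names are
assumptions made for illustration. Note that it does not capture struct
packing, which would need its own component in the name.

```c
#include <stdio.h>
#include <stddef.h>

/* Return 1 on a little-endian host, 0 otherwise, by checking which
 * byte of a known integer sits at the lowest address. */
static int host_is_little_endian(void)
{
    unsigned int probe = 1;
    return *(const unsigned char *)&probe == 1;
}

/* Build a name such as "american-i4-le.hash" into buf.
 * The naming convention is hypothetical.
 * Returns 0 on success, -1 if buf is too small. */
static int hash_variant_name(const char *lang, char *buf, size_t buflen)
{
    int n = snprintf(buf, buflen, "%s-i%u-%s.hash",
                     lang,
                     (unsigned)sizeof(int),
                     host_is_little_endian() ? "le" : "be");
    return (n >= 0 && (size_t)n < buflen) ? 0 : -1;
}
```

AbiWord would then look for that one file by name and report a clear error
if it is missing.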

A solution of this kind would certainly be a step forward from what we've
got, which is just plain broken. However, it strikes me as a hack to work
around the fact that we don't have a decent spell checking library. Not
that I'm saying we shouldn't do it -- in fact, it's the kind of hack I
would probably do, but you'd find sheepish apologies in the code comments.
:-)

FYI: Our dictionary management code is lame in other ways as well. We
intend to introduce language tags in the document structure to facilitate
multilingual documents. For example, a document could have one paragraph in
English, one in Spanish, and one in French, each tagged
appropriately, and the spell checker should use the proper dictionary file
for each one. We haven't done this yet, but it needs to be done.
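
A minimal sketch of the per-paragraph lookup, assuming each paragraph
carries a language tag string. The tag values, the table, and every file
name other than american.hash are hypothetical. None of this exists in
AbiWord yet.

```c
#include <string.h>
#include <stddef.h>

/* One row per supported language: tag -> dictionary file. */
struct dict_entry {
    const char *tag;
    const char *file;
};

/* Hypothetical table; only american.hash is a real file name here. */
static const struct dict_entry dict_table[] = {
    { "en-US", "american.hash" },
    { "es-ES", "espanol.hash"  },
    { "fr-FR", "francais.hash" },
};

/* Return the dictionary file for a paragraph's language tag,
 * or NULL if no dictionary is registered for that tag. */
static const char *dictionary_for_tag(const char *tag)
{
    size_t i;
    for (i = 0; i < sizeof dict_table / sizeof dict_table[0]; i++) {
        if (strcmp(dict_table[i].tag, tag) == 0)
            return dict_table[i].file;
    }
    return NULL;
}
```

The spell checker would call this once per paragraph and skip checking, or
fall back to a default, when the result is NULL.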

I hope this note gives you the information you need.

Eric W. Sink, Software Craftsman
eric@abisource.com


