Re: spellcheck

Shaw Terwilliger (sterwill@postman.abisource.com)
Thu, 8 Apr 1999 19:36:37 -0500


On Thu, Apr 08, 1999 at 05:02:11PM -0700, Darren O. Benham wrote:
> The second thing I'd like to see is dynamic loading of the american(or
> other).hash file... Is there a reason why abiword doesn't figure the
> size/endian/etc through either a configuration option (.abiwordrc) or at
> load time??

We'd like to load them dynamically in the future, but right now we've
just integrated some ispell code, which I don't know well enough to
say whether that will be easy or hard. ispell is a very old, much
extended piece of code.

As for the latter question, have you ever tried writing code to
read 8 different combinations of endianness, word size, and
alphabet out of arbitrary files? That alone isn't too horrible,
but have you ever looked at ispell's code? :) More to the point,
ispell seems to encode lots of compiler-isms into its hash file.
Because it reads out whole structs at a time, the resulting hash
file carries whatever struct padding your compiler chose, and we
can't predict that at all. Converting the endianness of elements
larger than one byte isn't too hard, but again, ispell is mighty
crufty. Word size is... a pain. I should look through the code
more, but I doubt it would be trivial to make ispell use fixed,
non-native data sizes for everything it does.
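
To make the padding and endianness points concrete, here is a
minimal sketch in C. The struct and function names are hypothetical
and are not taken from ispell; the layout is only an illustration
of why dumping whole structs to disk ties a file to one compiler
and machine, and how fixed-order byte decoding avoids that.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical record, NOT ispell's real layout. */
struct example_entry {
    char     flag;    /* 1 byte */
    uint32_t offset;  /* most compilers insert padding before this */
};

/* Bytes the compiler added beyond the two fields themselves.
 * The result depends on the compiler and target, which is why a
 * file written with fwrite(&entry, sizeof entry, 1, fp) is not
 * portable. */
size_t example_padding(void)
{
    return sizeof(struct example_entry)
         - (sizeof(char) + sizeof(uint32_t));
}

/* Reverse the byte order of a 32-bit value. */
uint32_t swap32(uint32_t v)
{
    return (v >> 24)
         | ((v >> 8) & 0x0000ff00u)
         | ((v << 8) & 0x00ff0000u)
         | (v << 24);
}

/* Decode a 32-bit little-endian value one byte at a time. This
 * gives the same answer on any host, whatever its endianness or
 * struct padding. */
uint32_t read_le32(const unsigned char *p, size_t len)
{
    assert(len >= 4);
    return (uint32_t)p[0]
         | ((uint32_t)p[1] << 8)
         | ((uint32_t)p[2] << 16)
         | ((uint32_t)p[3] << 24);
}
```

The byte swap is the easy part. The hard part described above is
that every struct read in the existing code would have to be
replaced by field-by-field decoding like read_le32().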

Beyond word sizes, endianness, and alignment, there is the
issue of alphabets and languages in dictionaries. The ispell
build process is made to let the user pick the language he or
she will be using, then include the word lists appropriate for
the language and applications of ispell on that system, and
then build the program and dictionary hashes. Some languages
need more than the default 26 characters. For example, Danish
extends the dictionaries to 52-character alphabets.
Supporting these at run-time means making AbiWord know how to
load both kinds through ispell. Again, I don't think this is
trivial without essentially duplicating the ispell code base
for each of these alphabet combinations. There are probably
more hash "options" encoded in the files that I can't remember.
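
One way to picture the run-time problem is a loader that at least
detects a mismatch before trusting a hash file. The sketch below
is an assumption for illustration only: the header layout, field
names, and status codes are hypothetical and do not describe
ispell's actual file format. It only refuses incompatible files;
actually loading them is the hard part discussed above.

```c
#include <stdint.h>

/* Hypothetical header recording how a hash file was built. */
struct dict_header {
    uint8_t byte_order;     /* 1 = little-endian, 2 = big-endian */
    uint8_t word_bytes;     /* native word size on the build machine */
    uint8_t alphabet_size;  /* characters the hash was built for */
};

enum dict_status {
    DICT_OK,
    DICT_BAD_ORDER,
    DICT_BAD_WORD_SIZE,
    DICT_BAD_ALPHABET
};

/* Compare the build options stored in the file against those
 * compiled into the running program. Returns the first mismatch
 * found, or DICT_OK if the file can be used as-is. */
enum dict_status dict_check(const struct dict_header *file,
                            const struct dict_header *host)
{
    if (file->byte_order != host->byte_order)
        return DICT_BAD_ORDER;
    if (file->word_bytes != host->word_bytes)
        return DICT_BAD_WORD_SIZE;
    if (file->alphabet_size != host->alphabet_size)
        return DICT_BAD_ALPHABET;
    return DICT_OK;
}
```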

There are other spell checking solutions out there, and we're
looking at them (most notably aspell). Any other suggestions are
welcome, and if you don't find my excuses for not hacking ispell
sufficient, you're always free to send me patches if you feel
like hacking on it. :)

-- 
Shaw Terwilliger

