Re: AbiSource, Ispell, Aspell and beyond...


Subject: Re: AbiSource, Ispell, Aspell and beyond...
From: Paul Rohr (paul@abisource.com)
Date: Fri Feb 25 2000 - 20:01:13 CST


At 10:04 PM 2/22/00 -0500, Kevin Atkinson wrote:
>Do you maintain the session and personal word lists? How do you plan on
>handling storing session dictionaries with a document? What about
>allowing the user to edit the personal dictionary?

You're talking about add, ignore, and ignore all, right? They all work now
from both the context menu and the dialog. (You *have* used AbiWord,
right?)

As for persistence, like other word processors we currently don't store the
"ignore all" list, but it wouldn't be hard to add that to the document
format.

The personal dictionary is stored as a vanilla UTF8 file which is editable
in AbiWord. (Or it will be as soon as we add the code to turn off spell
checking while the dictionary is being edited.) This is really cool. As an
editing UI, it beats the *pants* off any dialog-based UI I've ever seen for
the same functionality, because you can scroll and find in a big viewport,
instead of marching through some teeny little widget.
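
A minimal sketch of reading and updating such a file, for illustration only. It assumes one word per line, which the message does not state, and the function names and file layout are hypothetical rather than AbiWord's actual code.

```python
from pathlib import Path


def load_personal_dictionary(path):
    """Return the set of words in a plain UTF-8 word list.

    Assumes one word per line (an assumption, not a documented format).
    """
    p = Path(path)
    if not p.exists():
        return set()
    text = p.read_text(encoding="utf-8")
    return {line.strip() for line in text.splitlines() if line.strip()}


def add_personal_word(path, word):
    """Add a word and rewrite the file so it stays a clean, sorted list."""
    words = load_personal_dictionary(path)
    words.add(word)
    Path(path).write_text("\n".join(sorted(words)) + "\n", encoding="utf-8")
```

Because the file is plain text, any editor (including the word processor itself) can open and change it without a special dialog.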

>> 4. We can feed it words in UCS2 and not have to take a huge performance
>> hit on the lookup.
>
>By this you mean?

We store all our content internally as Unicode (UCS2), which means that
depending on what charset the ispell hash uses, we often need to take a
speed hit for charset-conversion on a per-word basis. I suspect that this
pales in comparison to the work needed to actually look up words, though.
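
To make the per-word conversion concrete, here is a rough sketch. It assumes the hash stores its entries as Latin-1 bytes; the charset, the word set, and the function name are all hypothetical stand-ins, not the real ispell data structures.

```python
# Hypothetical stand-in for an ispell hash whose entries are Latin-1 bytes.
HASH_CHARSET = "latin-1"
HASH_WORDS = {w.encode(HASH_CHARSET) for w in ("hello", "café", "word")}


def check_word(word):
    """Return True if the Unicode word is found in the byte-keyed hash."""
    try:
        # This encode call is the per-word charset conversion cost.
        key = word.encode(HASH_CHARSET)
    except UnicodeEncodeError:
        # The word cannot be represented in the hash charset,
        # so it cannot match any entry.
        return False
    return key in HASH_WORDS
```

An engine that accepted Unicode input directly would skip the encode step on every lookup.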

>> 5. We get fast, accurate results.
>
>Do you mean suggestion wise? Aspell may take a while (up to a second)
>to come up with suggestions so it's best to first check if the word
>exists and then have aspell come up with the suggestions in the
>background when you are spell checking the document as the user types.

Actually, the operation which really needs to be fast is looking up the word
in the first place, since we're doing that in response to editing
operations, and don't want typing to slow down.

Suggestions can be slower, since we only look them up on demand. However, a
full second of wall time to raise the context menu would probably make the
GUI feel sluggish.
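
The split between the two operations can be sketched as follows. This is only an illustration: the class and method names are hypothetical, and Python's difflib stands in for a real suggestion engine, which would use a very different algorithm.

```python
import difflib


class SpellSession:
    """Toy checker: cheap membership test, costlier suggestions on demand."""

    def __init__(self, words):
        self.words = set(words)
        self.suggest_calls = 0  # counts how often the slow path runs

    def is_known(self, word):
        # Fast path: runs on every editing operation, so it must be cheap.
        return word in self.words

    def suggest(self, word, limit=5):
        # Slow path: only runs when the user asks, e.g. via a context menu.
        self.suggest_calls += 1
        return difflib.get_close_matches(
            word, sorted(self.words), n=limit, cutoff=0.6
        )
```

Typing only ever calls is_known; suggest is deferred until the user right-clicks a flagged word, so its latency affects the menu and not keystrokes.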

>I do NOT recommend this. You should use the system dictionaries if they
>are available. Also Aspell CAN NOT use a simple word list. Double,
>aspell dictionaries have more than byte order problems and should be
>compiled for each architecture just like the binaries are....

This is a distribution issue for us.

1. We *have* to ship dictionaries. Yes, it's certainly a Good Thing to use
pre-installed dictionaries if they're available. One of the reasons we
added the system.profile was so that folks like Darren (who packages AbiWord
for Debian) can specify distro-specific stuff like this.

However, over 50% of our users don't have *any* usable dictionaries
installed on their systems -- remember how many platforms we support? -- and
so we have to ship something for them to use. Otherwise the product won't
Just Work as soon as they run it.

2. Having to build and distribute multiple dictionaries (one per language)
is already more than we've managed to handle to date. I don't even want to
think about the combinatorial explosion of building (and testing) a whole
bunch of architecture-specific variants. Gack! What a nightmare. Anyone
who thinks it isn't hasn't ever tried.

>> It looks like your proposed API is intended to solve a whole different
>> class of problems. Am I missing something here?
>
>It is intended to be GENERIC not AbiSource specific. You can use the
>parts of it that you need. The other parts you can ignore.

I'm sorry. I'm still confused. Is your entire proposed API intended to
be public, or are a lot of those details implementation-specific? I
couldn't figure out which parts I was supposed to zero in on.

I suppose I'm showing my ignorance here, but that sure looks like a large
API for what feels like a pretty simple task. Perhaps my confusion stems
from the fact that you're trying to implement a bunch of ispell
functionality we don't use?

So I guess I'll repeat the question -- what class of problems are you trying
to solve with this API? I've tried to explain what our needs are from a
spell engine, and now I'd like to understand your proposal.

Thanks.

Paul,
API minimalist


