Re: AbiSource, Ispell, Aspell and beyond...


Subject: Re: AbiSource, Ispell, Aspell and beyond...
From: Kevin Atkinson (kevinatk@home.com)
Date: Fri Feb 25 2000 - 20:29:27 CST


On Fri, 25 Feb 2000, Paul Rohr wrote:

> At 10:04 PM 2/22/00 -0500, Kevin Atkinson wrote:
> >Do you maintain the session and personal word lists? How do you plan on
> >handling storing session dictionaries with a document? What about
> >allowing the user to edit the personal dictionary?
>
> You're talking about add, ignore, and ignore all, right? They all work now
> from both the context menu and the dialog. (You *have* used AbiWord,
> right?)

No, no, no. Do YOU maintain the session and personal word lists or does
ispell? When coming up with near misses can the personal and session
words appear as near misses? This is an important question so please
don't ignore it.
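
To make the question concrete, here is a minimal Python sketch of a checker that owns the personal and session lists itself, so that words from both can show up as near misses. The class name `WordLists` and its methods are hypothetical, and `difflib.get_close_matches` is only a stand-in for a real suggestion algorithm; it is not what ispell or aspell does.

```python
from difflib import get_close_matches


class WordLists:
    """Hypothetical holder for the main, personal and session word lists."""

    def __init__(self, main_words):
        self.main = set(main_words)   # dictionary shipped with the engine
        self.personal = set()         # kept across sessions ("add")
        self.session = set()          # this session only ("ignore all")

    def add_to_personal(self, word):
        self.personal.add(word)

    def add_to_session(self, word):
        self.session.add(word)

    def check(self, word):
        # A word is accepted if any of the three lists contains it.
        return word in self.main or word in self.personal or word in self.session

    def suggest(self, word, limit=5):
        # Candidates are drawn from all three lists, so personal and
        # session words can be offered as near misses too.
        candidates = sorted(self.main | self.personal | self.session)
        return get_close_matches(word, candidates, n=limit, cutoff=0.6)
```

If the application keeps these lists and the engine only sees the main dictionary, the `suggest` step cannot offer personal or session words, which is the point of the question.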

> >> 4. We can feed it words in UCS2 and not have to take a huge performance
> >> hit on the lookup.
> >
> >By this you mean?
>
> We store all our content internally as Unicode (UCS2), which means that
> depending on what charset the ispell hash uses, we often need to take a
> speed hit for charset-conversion on a per-word basis. I suspect that this
> pales in comparison to the work needed to actually look up words, though.

Probably yes. Aspell will need to convert it as well, since it is 8-bit
and will be for the foreseeable future. Before telling me about the
benefits of Unicode, please read http://aspell.sourceforge.net/international/.
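
As a rough illustration of the per-word conversion being discussed, the Python sketch below turns a UCS-2 word into an 8-bit charset before a lookup. The function names, the choice of ISO-8859-1, and the toy dictionary are assumptions made for the example; they say nothing about how ispell or aspell store their data.

```python
def ucs2_to_8bit(ucs2_bytes, charset="iso-8859-1"):
    """Convert little-endian UCS-2 bytes to the engine's 8-bit charset.

    Returns None when the word contains a character the charset lacks.
    UCS-2 is decoded with the UTF-16 codec, which agrees with it for
    characters in the Basic Multilingual Plane.
    """
    text = ucs2_bytes.decode("utf-16-le")
    try:
        return text.encode(charset)
    except UnicodeEncodeError:
        return None


def lookup(ucs2_bytes, dictionary, charset="iso-8859-1"):
    """Convert one word, then test membership in an 8-bit word set."""
    encoded = ucs2_to_8bit(ucs2_bytes, charset)
    return encoded is not None and encoded in dictionary
```

The conversion is a single pass over the word, which is consistent with the guess above that it costs little next to the lookup itself.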

> >> 5. We get fast, accurate results.
> >
> >Do you mean suggestion wise? Aspell may take a while (up to a second)
> >to come up with suggestions so it's best to first check if the word
> >exists and then have aspell come up with the suggestions in the
> >background when you are spell checking the document as the user types.

> Actually, the operation which really needs to be fast is looking up the word
> in the first place, since we're doing that in response to editing
> operations, and don't want typing to slow down.

Lookups will definitely be as fast as ispell, if not faster.

> Suggestions can be slower, since we only look them up on demand. However, a
> full second of wall time to raise the context menu would probably make the
> GUI feel sluggish.

A full second happens only in bad cases with short words. There is also
a suggestion mode in aspell which is over 5 times faster but doesn't give
quite as good results.
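
The split between a fast lookup and slower suggestions could be arranged roughly as in the Python sketch below: the lookup runs inline, and suggestions for unknown words are computed on a worker thread so they are usually ready by the time a menu is raised. All names here are hypothetical, the word set is a toy, and `difflib.get_close_matches` merely stands in for a real, slower suggestion routine.

```python
from concurrent.futures import ThreadPoolExecutor
from difflib import get_close_matches

WORDS = {"spell", "checker", "word", "fast"}   # toy dictionary
_executor = ThreadPoolExecutor(max_workers=1)  # background worker


def is_known(word):
    """Fast path: a plain membership test, cheap enough to run per keystroke."""
    return word in WORDS


def slow_suggest(word):
    """Stand-in for an expensive suggestion routine."""
    return get_close_matches(word, sorted(WORDS), n=3, cutoff=0.6)


def on_word_typed(word, pending):
    """Check the word at once; queue suggestions only if it is unknown."""
    if is_known(word):
        return True
    pending[word] = _executor.submit(slow_suggest, word)
    return False


def suggestions_for(word, pending):
    """Called when the user asks for suggestions; waits only if still running."""
    return pending[word].result()
```

With this arrangement typing never waits on the suggestion routine, and raising the context menu blocks only when the background job has not finished yet.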

> >I do NOT recommend this. You should use the system dictionaries if they
> >are available. Also Aspell CAN NOT use a simple word list. Double,
> >aspell dictionaries have more than byte order problems and should be
> >compiled for each architecture just like the binaries are....
>
> This is a distribution issue for us.
>
> 1. We *have* to ship dictionaries. Yes, it's certainly a Good Thing to use
> pre-installed dictionaries if they're available. One of the reasons we
> added the system.profile was so that folks like Darren (who packages AbiWord
> for Debian) can specify distro-specific stuff like this.
>
> However, over 50% of our users don't have *any* usable dictionaries
> installed on their systems -- remember how many platforms we support? -- and
> so we have to ship something for them to use. Otherwise the product won't
> Just Work as soon as they run it.
>
> 2. Having to build and distribute multiple dictionaries (one per language)
> is already more than we've managed to handle to date. I don't even want to
> think about the combinatorial explosion of building (and testing) a whole
> bunch of architecture-specific variants. Gack! What a nightmare. Anyone
> who thinks it isn't hasn't ever tried.

You have to distribute architecture-specific binaries, right? Anyway, I
am not going to argue with you, as I don't have experience in this.

>
> >> It looks like your proposed API is intended to solve a whole different
> >> class of problems. Am I missing something here?
> >
> >It is intended to be GENERIC not AbiSource specific. You can use the
> >parts of it that you need. The other parts you can ignore.
>
> I'm sorry. I'm still confused. Is your entire proposed API intended to
> be public, or are a lot of those details implementation-specific? I
> couldn't figure out which parts I was supposed to zero in on.

All public.

> I suppose I'm showing my ignorance here, but that sure looks like a large
> API for what feels like a pretty simple task. Perhaps my confusion stems
> from the fact that you're trying to implement a bunch of ispell
> functionality we don't use?

As I said before, aspell is a lot smarter than ispell, so it needs more
information.

> So I guess I'll repeat the question -- what class of problems are you trying
> to solve with this API? I've tried to explain what our needs are from a
> spell engine, and now I'd like to understand your proposal.

To provide a generic spell checker interface that any project can use
which will use the best engine available and will let the engine work to
its full potential.
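
Purely to illustrate the idea of a generic interface that picks the best engine available, a Python sketch might look like the following. Every class, method and attribute name is hypothetical and invented for this example; none of it is taken from the actual proposed API, and the two engines are toys.

```python
from abc import ABC, abstractmethod


class SpellEngine(ABC):
    """Hypothetical engine interface; not the API proposed in this thread."""

    priority = 0  # higher means "better" when several engines are usable

    @classmethod
    @abstractmethod
    def available(cls):
        """Return True if this engine can be used on this system."""

    @abstractmethod
    def check(self, word):
        """Return True if the word is spelled correctly."""

    @abstractmethod
    def suggest(self, word):
        """Return a list of replacement candidates."""


class BasicEngine(SpellEngine):
    """Toy engine that is always present but offers no suggestions."""

    priority = 1
    words = {"spell", "check"}

    @classmethod
    def available(cls):
        return True

    def check(self, word):
        return word in self.words

    def suggest(self, word):
        return []


class SmartEngine(BasicEngine):
    """Toy engine that is preferred when installed."""

    priority = 2
    installed = False  # assumption: set by some install-time probe

    @classmethod
    def available(cls):
        return cls.installed

    def suggest(self, word):
        # Toy rule: offer known words sharing the first letter.
        return sorted(w for w in self.words if w[:1] == word[:1])


def best_engine(engine_classes):
    """Instantiate the highest-priority engine that reports itself available."""
    usable = [cls for cls in engine_classes if cls.available()]
    if not usable:
        raise RuntimeError("no spell engine available")
    return max(usable, key=lambda cls: cls.priority)()
```

A caller would write `engine = best_engine([SmartEngine, BasicEngine])` and use only `check` and `suggest`, ignoring whatever else a given engine offers.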

Will an example of how to use it help?

---
Kevin Atkinson
kevinatk@home.com
http://metalab.unc.edu/kevina/