Re: AbiSource, Ispell, Aspell and beyond...


From: Kevin Atkinson (kevinatk@home.com)
Date: Tue Feb 22 2000 - 21:04:01 CST


Paul Rohr wrote:

> I'm thus thrilled to hear that you're interested in providing a simpler API
> for us.

Thanks.
>
> However, I think you're radically overestimating our API needs for swapping
> out spell checkers. Have you seen what our current API into ispell is?
>
> -- snip --
>
> typedef struct _sp_suggestions {
>     int count;
>     short *score;
>     unsigned short **word;
> } sp_suggestions;
>
> int SpellCheckInit(char *hashname);
> void SpellCheckCleanup(void);
> int SpellCheckNWord16(const unsigned short *word16, int length);
> int SpellCheckSuggestNWord16(const unsigned short *word16, int length,
>                              sp_suggestions *sg);
>
> -- snip --
>
> Essentially, *all* we use ispell for is to look up words and offer
> suggestions. That's it. Everything else we do ourselves already (because
> it's easy).

By that you mean...?

Do you maintain the session and personal word lists? How do you plan on
handling storing session dictionaries with a document? What about
allowing the user to edit the personal dictionary?

Aspell needs more information than a simple-minded spell checker to do
its job as well as it can. For example, the replacement pairs need to be
communicated back to it.

> When I envision a replacement API, its main features are:
>
> 1. We can set a default locale via ISO/IETF language codes (ie, en-US),
> instead of having to refer to fixed filenames in a pre-existing namespace
> (american.hash).

That can easily be done via the configure class.
>
> 2. When we've got a span of content flagged in another language (say
> pt-BR), the lookup will use the appropriate dictionary, if available.
>
> 3. If there are multiple dictionaries being used, we can specify where to
> look for them on disk, along with a fallback order.
>
> 4. We can feed it words in UCS2 and not have to take a huge performance hit
> on the lookup.

By this you mean?
>
> 5. We get fast, accurate results.

Do you mean suggestion-wise? Aspell may take a while (up to a second)
to come up with suggestions, so when you are spell checking the document
as the user types it's best to first check whether the word exists and
then have Aspell come up with the suggestions in the background.
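
A minimal sketch of that "check first, suggest later" split, in C. It is
not Aspell's API: every name here (word_exists, slow_suggest,
suggest_queue, and so on) is hypothetical, the dictionary is a
three-word stub, and an idle-time work queue stands in for a real
background thread so the example needs no threading library.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for a real dictionary. */
static const char *const KNOWN_WORDS[] = { "hello", "spell", "world" };
#define N_KNOWN (sizeof KNOWN_WORDS / sizeof KNOWN_WORDS[0])

/* Fast path: exact lookup only, cheap enough to run as the user types. */
static int word_exists(const char *word)
{
    for (size_t i = 0; i < N_KNOWN; i++)
        if (strcmp(KNOWN_WORDS[i], word) == 0)
            return 1;
    return 0;
}

/* Slow path: placeholder for an expensive suggestion search. It just
   returns the first known word sharing the first letter, or NULL. */
static const char *slow_suggest(const char *word)
{
    for (size_t i = 0; i < N_KNOWN; i++)
        if (KNOWN_WORDS[i][0] == word[0])
            return KNOWN_WORDS[i];
    return NULL;
}

#define MAX_PENDING 16

typedef struct {
    char word[64];
    const char *suggestion;     /* filled in by run_one_pending() */
} pending_item;

typedef struct {
    pending_item items[MAX_PENDING];
    size_t count;               /* words queued so far */
    size_t done;                /* words already processed */
} suggest_queue;

/* Called as the user types. Returns 1 if the word is known, 0 if it
   was queued for suggestions, -1 if the queue is full. */
static int check_as_you_type(suggest_queue *q, const char *word)
{
    assert(q != NULL && word != NULL);
    if (word_exists(word))
        return 1;
    if (q->count == MAX_PENDING)
        return -1;
    pending_item *it = &q->items[q->count++];
    strncpy(it->word, word, sizeof it->word - 1);
    it->word[sizeof it->word - 1] = '\0';
    it->suggestion = NULL;
    return 0;
}

/* Called when the application is idle. Processes at most one queued
   word and returns 1 if any work was done, 0 if the queue was empty. */
static int run_one_pending(suggest_queue *q)
{
    assert(q != NULL);
    if (q->done == q->count)
        return 0;
    pending_item *it = &q->items[q->done++];
    it->suggestion = slow_suggest(it->word);
    return 1;
}
```

The editor can underline a misspelled word as soon as
check_as_you_type() returns 0, and only needs the suggestion once the
user actually asks for it.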
>
> 6. We can control the in-memory profile of dictionary usage. For example,
> it's probably reasonable to keep an efficient dictionary for the default
> language in (virtual) memory. Likewise, it'd probably be nice if secondary
> dictionaries were demand-loaded on an as-needed basis, but that's not a
> requirement, and getting the caching behavior tuned right could certainly
> wait.

Aspell's main dictionaries are memory mapped, so this solves a bunch of
those problems. If memory mapping fails (does Win32 support it?), Aspell
will simply allocate enough memory and load them in. (It currently does
not do this, because I want to know if the mmap failed.)
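
The map-then-fall-back behaviour described above could look roughly
like this on a POSIX system. This is an illustration, not Aspell's
actual loader: dict_image, dict_open and dict_close are made-up names,
and the allow_mmap flag exists only so the fallback path can be
exercised on purpose.

```c
#include <assert.h>
#include <fcntl.h>
#include <stdlib.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

typedef struct {
    void  *data;
    size_t size;
    int    is_mapped;   /* 1: came from mmap, 0: came from malloc */
} dict_image;

/* Fallback: read the whole file into a heap buffer. 0 on success. */
static int load_by_reading(int fd, size_t size, dict_image *out)
{
    char *buf = malloc(size);
    if (buf == NULL)
        return -1;
    size_t got = 0;
    while (got < size) {
        ssize_t n = read(fd, buf + got, size - got);
        if (n <= 0) {           /* error or unexpected end of file */
            free(buf);
            return -1;
        }
        got += (size_t)n;
    }
    out->data = buf;
    out->size = size;
    out->is_mapped = 0;
    return 0;
}

/* Map the file read-only. If mapping fails (or allow_mmap is 0),
   allocate memory and load the file instead. Returns 0 on success. */
static int dict_open(const char *path, int allow_mmap, dict_image *out)
{
    assert(path != NULL && out != NULL);
    int fd = open(path, O_RDONLY);
    if (fd < 0)
        return -1;
    struct stat st;
    if (fstat(fd, &st) != 0 || st.st_size <= 0) {
        close(fd);
        return -1;
    }
    size_t size = (size_t)st.st_size;
    int rc = -1;
    if (allow_mmap) {
        void *p = mmap(NULL, size, PROT_READ, MAP_PRIVATE, fd, 0);
        if (p != MAP_FAILED) {
            out->data = p;
            out->size = size;
            out->is_mapped = 1;
            rc = 0;
        }
    }
    if (rc != 0)
        rc = load_by_reading(fd, size, out);
    close(fd);                  /* a mapping stays valid after close */
    return rc;
}

static void dict_close(dict_image *img)
{
    if (img->is_mapped)
        munmap(img->data, img->size);
    else
        free(img->data);
    img->data = NULL;
    img->size = 0;
}
```

The caller sees the same pointer-and-size pair either way. The
is_mapped flag is kept so the failure of the mapping is still visible
to whoever wants to know about it.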

> 7. We can ship one high-quality dictionary per language per spell engine,
> without having to worry about endianness or byte size. Likewise, we
> wouldn't want to have to worry about negotiating IP licenses for the
> dictionary content.
>

I do NOT recommend this. You should use the system dictionaries if they
are available. Also, Aspell CAN NOT use a simple word list. In addition,
Aspell dictionaries have more than byte-order problems and should be
compiled for each architecture, just like the binaries are....
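
To illustrate why a compiled dictionary is tied to the machine that
built it, here is a sketch of a header check a loader might perform.
The layout is invented for the example and is not Aspell's real file
format; the assumption that type sizes are among the problems beyond
byte order is mine, not something stated above.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define DICT_ORDER_MARK 0x01020304u

/* Hypothetical header written at the front of a compiled dictionary. */
typedef struct {
    uint32_t order_mark;    /* reads back swapped on an opposite-endian host */
    uint8_t  size_of_long;  /* sizeof(long) on the machine that built it */
    uint8_t  size_of_ptr;   /* sizeof(void *) on the machine that built it */
} dict_header;

/* Header describing the machine this code is running on. */
static dict_header make_header(void)
{
    dict_header h;
    memset(&h, 0, sizeof h);
    h.order_mark   = DICT_ORDER_MARK;
    h.size_of_long = (uint8_t)sizeof(long);
    h.size_of_ptr  = (uint8_t)sizeof(void *);
    return h;
}

/* Returns 0 if the dictionary matches this machine, -1 if the byte
   order differs, -2 if the basic type sizes differ. */
static int header_check(const dict_header *h)
{
    assert(h != NULL);
    if (h->order_mark != DICT_ORDER_MARK)
        return -1;
    if (h->size_of_long != sizeof(long) || h->size_of_ptr != sizeof(void *))
        return -2;
    return 0;
}

/* Reverse the four bytes of a 32-bit value, to simulate a header
   that was written on a machine with the other byte order. */
static uint32_t swap32(uint32_t v)
{
    return (v >> 24) | ((v >> 8) & 0x0000FF00u) |
           ((v << 8) & 0x00FF0000u) | (v << 24);
}
```

A check like this can only reject a foreign dictionary, it cannot
repair one, which is the argument for compiling the dictionary on each
target rather than shipping a single binary file.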

> It looks like your proposed API is intended to solve a whole different class
> of problems. Am I missing something here?

It is intended to be GENERIC, not AbiSource-specific. You can use the
parts of it that you need and ignore the rest.

-- 
Kevin Atkinson
kevinatk@home.com
http://metalab.unc.edu/kevina/


