Re: AbiSource, Ispell, Aspell and beyond...


Subject: Re: AbiSource, Ispell, Aspell and beyond...
From: Paul Rohr (paul@abisource.com)
Date: Tue Feb 22 2000 - 20:40:22 CST


Hi Kevin,

I'm glad to "meet" you at last! :-) As you've probably realized, we've
been lusting after your algorithms for quite a while, but we aren't
willing to take on the portability burden of the advanced C++ features
you're good at.

I'm thus thrilled to hear that you're interested in providing a simpler API
for us.

However, I think you're radically overestimating our API needs for swapping
out spell checkers. Have you seen what our current API into ispell is?

  -- snip --

typedef struct _sp_suggestions {
   int count;
   short *score;
   unsigned short **word;
} sp_suggestions;
   
int SpellCheckInit(char *hashname);
void SpellCheckCleanup(void);
int SpellCheckNWord16(const unsigned short *word16, int length);
int SpellCheckSuggestNWord16(const unsigned short *word16, int length,
                             sp_suggestions *sg);

  -- snip --

Essentially, *all* we use ispell for is to look up words and offer
suggestions. That's it. Everything else we do ourselves already (because
it's easy).

When I envision a replacement API, its main features are:

1. We can set a default locale via ISO/IETF language codes (e.g., en-US),
instead of having to refer to fixed filenames in a pre-existing namespace
(american.hash).

2. When we've got a span of content flagged in another language (say
pt-BR), the lookup will use the appropriate dictionary, if available.

3. If there are multiple dictionaries being used, we can specify where to
look for them on disk, along with a fallback order.

4. We can feed it words in UCS-2 and not have to take a huge performance hit
on the lookup.

5. We get fast, accurate results.

6. We can control the in-memory profile of dictionary usage. For example,
it's probably reasonable to keep an efficient dictionary for the default
language in (virtual) memory. Likewise, it'd probably be nice if secondary
dictionaries were demand-loaded on an as-needed basis, but that's not a
requirement, and getting the caching behavior tuned right could certainly
wait.

7. We can ship one high-quality dictionary per language per spell engine,
without having to worry about endianness or byte size. Likewise, we
wouldn't want to have to worry about negotiating IP licenses for the
dictionary content.

8. (Optional) We can choose which spell engine(s) to use at runtime.

We haven't specified an API at this level yet, but I suspect that it should
be pretty straightforward.
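
To make items 1 through 4 concrete, here is a rough sketch of one shape
such an interface could take. Everything in it is an assumption made for
illustration: the sk_* names, the return codes, and the
"<dir>/<lang>.dict" file layout are invented, and none of it is an
existing AbiWord, ispell, or aspell interface. Dictionaries are tiny
in-memory tables rather than files on disk, and suggestions, caching, and
engine selection are left out entirely.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* All sk_* names are hypothetical, invented for this sketch. */

typedef unsigned short sk_ucs2;     /* one UCS-2 code unit (assumed 16 bits) */

typedef struct {
    const char *lang;               /* language tag, e.g. "en-US" */
    const sk_ucs2 *const *words;    /* NULL-terminated list of 0-terminated words */
} sk_dict;

typedef struct {
    const sk_dict *dicts;           /* dictionaries currently available */
    size_t ndicts;
    const char *default_lang;       /* used when a span's language has no dictionary */
} sk_speller;

/* Build the file name an engine might probe in one search directory.
 * The "<dir>/<lang>.dict" layout is an assumption of this sketch.
 * Returns 0 on success, -1 if buf is too small. */
int sk_dict_path(const char *dir, const char *lang, char *buf, size_t size)
{
    int n = snprintf(buf, size, "%s/%s.dict", dir, lang);
    return (n >= 0 && (size_t)n < size) ? 0 : -1;
}

/* Find the dictionary tagged exactly with lang, or NULL if there is none. */
static const sk_dict *sk_find(const sk_speller *sp, const char *lang)
{
    size_t i;
    if (lang == NULL)
        return NULL;
    for (i = 0; i < sp->ndicts; i++)
        if (strcmp(sp->dicts[i].lang, lang) == 0)
            return &sp->dicts[i];
    return NULL;
}

/* Compare a counted UCS-2 word with a 0-terminated dictionary entry. */
static int sk_same_word(const sk_ucs2 *word, int length, const sk_ucs2 *entry)
{
    int i;
    for (i = 0; i < length; i++)
        if (entry[i] == 0 || entry[i] != word[i])
            return 0;
    return entry[length] == 0;
}

/* Look up word in the dictionary for lang, falling back to the default
 * language. Returns 1 if found, 0 if not found, -1 if no dictionary applies. */
int sk_check(const sk_speller *sp, const char *lang,
             const sk_ucs2 *word, int length)
{
    const sk_dict *d;
    const sk_ucs2 *const *w;

    assert(sp != NULL && word != NULL && length >= 0);
    d = sk_find(sp, lang);
    if (d == NULL)
        d = sk_find(sp, sp->default_lang);
    if (d == NULL)
        return -1;
    for (w = d->words; *w != NULL; w++)
        if (sk_same_word(word, length, *w))
            return 1;
    return 0;
}

/* Tiny sample data so the sketch can be exercised. */
static const sk_ucs2 w_hello[] = { 'h', 'e', 'l', 'l', 'o', 0 };
static const sk_ucs2 w_ola[]   = { 'o', 'l', 0x00E1, 0 };
static const sk_ucs2 *const en_words[] = { w_hello, NULL };
static const sk_ucs2 *const pt_words[] = { w_ola, NULL };
static const sk_dict sample_dicts[] = {
    { "en-US", en_words },
    { "pt-BR", pt_words }
};
static const sk_speller sample = { sample_dicts, 2, "en-US" };
```

With the sample data, a word in a span tagged pt-BR is checked against
the pt-BR table, while a span tagged fr-FR has no dictionary of its own
and is checked against the en-US default. A real engine would presumably
use a hash or trie rather than a linear scan, and would walk a list of
search directories with something like sk_dict_path() before giving up
on a language.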

It looks like your proposed API is intended to solve a whole different class
of problems. Am I missing something here?

Paul

PS: I should note that we haven't done a great job of encapsulating the
"rest" of the spell-related logic. It's currently split among the following
three locations:

  abi/src/text/fmt/xp/fl_BlockLayout.cpp
  abi/src/text/fmt/xp/fv_View.cpp
  abi/src/wp/ap/xp/ap_Dialog_Spell.cpp

It would be cleaner if we concentrated this further, but it's not really
that much code, and it works well enough as is. That work can easily wait
until we're ready to add spelling logic to another AbiSuite app, when
we'll need to move the spell dialog to XAP code anyway. Without knowing
the exact needs of that app, we're unlikely to get the API right in any
case. :-)


