POW -- ispell support for non-English languages

Robert Sievers (bob@abisource.com)
Sat, 30 Oct 1999 12:47:59 -0500



Justin and Paul have done a tremendous job getting spell support into
AbiWord. Moreover, there are a number of foreign language ispell
dictionaries available at
http://fmg-www.cs.ucla.edu/geoff/ispell-dictionaries.html, thus allowing
people all over the world to benefit from this work.

Unfortunately, any time a word is typed in that contains an umlaut, a grave
accent, an acute accent, or any other such character, the word is always
marked mis-spelled. We aren't quite sure what the exact problem is, and
therein lies the reason for this Project Of the Week.

This may or may not be a quick and easy POW. We're looking for someone to
dive in and figure out whether this is a problem with ispell, a problem
within AbiWord, or a problem lurking somewhere in between. Most of this
POW is about actually doing the legwork to figure out where the problem is
in the first place. For all we know, there is some line of code in ispell
that looks like this:

#define MAX_CHAR 128 /* TODO: change to 256 for high-bit chars */

Obviously that is an exaggeration, but the point remains that we frankly do
not know enough about ispell to hazard a guess as to where to begin.
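
To make the exaggeration concrete, here is a minimal sketch of how a limit
like that would reject accented words. Everything in it is hypothetical:
HYPOTHETICAL_MAX_CHAR and first_rejected_byte() are names made up for
illustration and are not taken from the ispell or AbiWord sources.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical limit, mirroring the exaggerated example above.
 * This is NOT taken from the ispell sources. */
#define HYPOTHETICAL_MAX_CHAR 128

/* Returns the offset of the first byte at or above the limit,
 * or -1 if every byte in the word is below it. */
int first_rejected_byte(const char *word)
{
    assert(word != NULL);
    /* Read through unsigned char: plain char may be signed, which
     * would turn a byte such as 0xFC into a negative value. */
    const unsigned char *bytes = (const unsigned char *)word;
    for (size_t i = 0; bytes[i] != '\0'; ++i) {
        if (bytes[i] >= HYPOTHETICAL_MAX_CHAR)
            return (int)i;
    }
    return -1;
}
```

With a check like this, a plain ASCII word passes, while a word holding an
ISO-8859-1 u-umlaut (byte 0xFC) is flagged at the offset of that byte. If
the real problem has this shape, it may also hide behind a signed char used
as an array index, so that is worth watching for while stepping through.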

To complete this POW, start by grabbing an international ispell dictionary
from the URL above. Open AbiWord, and type a word containing high-bit
characters. Step through the code, and see why ispell thinks the word is
mis-spelled. Take a look at what AbiWord is sending over the wall,
and how it does or doesn't match ispell's internal format for saving words
with high-bit characters. Continue to track down the reason why words are
being marked mis-spelled when they shouldn't until you know what section of
code needs to be changed.
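
One way to compare the two sides is to dump the raw bytes of the word at
the point where it is handed to the spell checker. The helper below is only
a sketch under that assumption: hex_dump_word() is a made-up name, and
where to call it from is exactly what the investigation has to find out.

```c
#include <assert.h>
#include <stdio.h>
#include <stddef.h>

/* Writes the bytes of word into out as space-separated hex pairs,
 * e.g. "66 FC 72". Returns the number of bytes dumped, or -1 if
 * out is too small to hold the full dump. */
int hex_dump_word(const char *word, char *out, size_t out_size)
{
    assert(word != NULL && out != NULL);
    if (out_size == 0)
        return -1;
    out[0] = '\0';

    const unsigned char *bytes = (const unsigned char *)word;
    size_t used = 0;
    int count = 0;
    for (size_t i = 0; bytes[i] != '\0'; ++i) {
        /* Two hex digits, plus a leading space after the first byte. */
        size_t need = (count > 0) ? 3 : 2;
        if (used + need + 1 > out_size)   /* +1 for the terminating NUL */
            return -1;
        snprintf(out + used, out_size - used,
                 (count > 0) ? " %02X" : "%02X", (unsigned)bytes[i]);
        used += need;
        ++count;
    }
    return count;
}
```

The same visible word can arrive as different byte sequences. A u-umlaut is
the single byte FC in ISO-8859-1 but the two bytes C3 BC in UTF-8, so
"f", u-umlaut, "r" dumps as "66 FC 72" in one case and "66 C3 BC 72" in the
other. If the dump of what AbiWord sends does not match the bytes the
dictionary was built with, that mismatch is a strong candidate for the bug.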

At that point, either change it yourself, or clue us in as to where the
problem lies. If the problem is in ispell, you will have much more of an
understanding of what to do than anyone else given the time spent doing the
investigation.

The place to start is in abi/src/other/spell

Normally, there would be a more complete list of instructions presented
here as to how to get started, but in this case, that's the whole point of
the POW. This might very well be a five-minute fix, but it will take three
hours to find it. Then again, it might take ten minutes to find it, two
hours to think about the solutions, and another hour to start writing some
code. Even if the fix itself turns out to be more complicated than you can
handle, getting to the root of the problem gets us A LOT closer to being
able to fix it.

Enjoy!

Bob

For more background on the whole POW / ZAP / SHAZAM concept, see the
following introduction:

http://www.abisource.com/mailinglists/abiword-dev/99/September/0097.html

Robert Sievers
Open Source Evangelist


