From: Andrew Dunbar (hippietrail@yahoo.com)
Date: Sat May 03 2003 - 08:34:02 EDT
--- Raphael Finkel <raphael@cs.uky.edu> wrote:
> Unfortunately, Aspell does not support Unicode, so
> far as I can tell.
I think the Aspell developers consider it too hard.
They say they'd have to move away from their current
internal table-based approach, and they don't want to.
> I grabbed aspell-0.50.3, and it still has utf-8
> support on the todo list.
> Aspell seems to supersede pspell which supersedes
> ispell; I don't know of any
> spell checker (except for my own, which doesn't
> generate suggestion lists) that
> handles utf-8.
Aspell is the successor to Pspell. It's been redesigned
recently and this version is being called "The New
Aspell".
There is another open source spell checker: the one
that is part of OpenOffice. It was originally called
"MySpell". I have no idea whether it can handle
Unicode.
> My Yiddish ispell dictionary is usable, but it never
> suggests removing a letter
> or adding a letter (that's a two-byte change!), and
> it can't be set to ignore
> Yiddish open and close quotes, because their first
> byte in UTF-8 looks just
> like the first byte of many other letters.
You shouldn't handle UTF-8 as bytes, since that makes
these kinds of operations too fiddly to implement.
What you want to do is first convert any UTF-8 strings
into either UTF-16 or UTF-32. Use UTF-16 if you only
ever need to handle 16-bit code points (the Basic
Multilingual Plane). Use UTF-32 if you want to handle
all of Unicode.
Now each character is always a single 16-bit word or
32-bit long, and much easier to handle.
Once you have processed your string, for example by
trying to add or remove a letter, convert it back to
UTF-8 if that's what is needed.
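
The decode, edit, re-encode round trip described above can be sketched as follows. This is only an illustration in Python, not Aspell, ispell, or AbiWord code. Python strings are sequences of code points, so decoding the bytes plays the role of the conversion to UTF-32. The function names, the dictionary set, and the choice of alphabet (the Hebrew letters U+05D0 to U+05EA) are all assumptions made for the example.

```python
# Hypothetical sketch: names and alphabet are assumptions, not from any real spell checker.
# Hebrew letters alef (U+05D0) through tav (U+05EA), including final forms.
ALPHABET = "".join(chr(cp) for cp in range(0x05D0, 0x05EB))


def deletions(word):
    """Every string obtained by removing exactly one code point."""
    return {word[:i] + word[i + 1:] for i in range(len(word))}


def insertions(word, alphabet=ALPHABET):
    """Every string obtained by inserting one alphabet letter at any position."""
    return {
        word[:i] + ch + word[i:]
        for i in range(len(word) + 1)
        for ch in alphabet
    }


def suggest(raw, dictionary):
    """Take a UTF-8 byte string and return dictionary words one letter away.

    The bytes are decoded first, so all edits happen on whole characters,
    and the results are encoded back to UTF-8 at the end.
    """
    word = raw.decode("utf-8")
    candidates = deletions(word) | insertions(word)
    return sorted(c.encode("utf-8") for c in candidates if c in dictionary)
```

Because the edits work on code points, removing or adding one letter is a single operation even though each of these letters takes two bytes in UTF-8. A call such as suggest(misspelled_bytes, {"\u05e9\u05dc\u05d5\u05dd"}) would return the matching word as UTF-8 bytes if the input is one inserted or deleted letter away from it.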
> By the way, I don't see my Yiddish dictionary on the
> download page for AbiWord
> internationalizations; it should be placed there,
> even though ispell's ability
> to manipulate it is substandard. At least it
> generally shows what words are
> misspelled.
If somebody doesn't get onto this, please file a bug
report in Bugzilla. I'd do it myself but I don't know
how.
I'm working on the spellchecker code right now so let
me know if you do come up with a new spellchecker
class and I can look at integrating it. Good luck!
Andrew Dunbar.
> Raphael
=====
http://linguaphile.sourceforge.net/cgi-bin/translator.pl http://www.abisource.com
This archive was generated by hypermail 2.1.4 : Sat May 03 2003 - 08:46:12 EDT