Re: correctWord API

From: Andrew Dunbar (hippietrail@yahoo.com)
Date: Sat May 03 2003 - 08:34:02 EDT

    --- Raphael Finkel <raphael@cs.uky.edu> wrote:
    > Unfortunately, Aspell does not support Unicode, so
    > far as I can tell.

    I think the Aspell guys think it's too hard.
    They say they'd have to move away from their current
    internal table-based approach and they don't want to.

    > I grabbed aspell-0.50.3, and it still has utf-8
    > support on the todo list.
    > Aspell seems to supersede pspell which supersedes
    > ispell; I don't know of any
    > spell checker (except for my own, which doesn't
    > generate suggestion lists) that
    > handles utf-8.

    Aspell is the successor to Pspell. It's been
    redesigned recently and this version is being called
    "The New Aspell".

    There is another open-source spell checker: the one
    that is part of OpenOffice. It was originally called
    "myspell". I have no idea if it can handle Unicode.

    > My Yiddish ispell dictionary is usable, but it never
    > suggests removing a letter
    > or adding a letter (that's a two-byte change!), and
    > it can't be set to ignore
    > Yiddish open and close quotes, because their first
    > byte in UTF-8 looks just
    > like the first byte of many other letters.

    You shouldn't handle UTF-8 as bytes, since that makes
    all these kinds of operations too fiddly to implement.
    What you want to do is first convert any UTF-8 strings
    into either UTF-16 or UTF-32. Use UTF-16 if you only
    ever need code points that fit in 16 bits (the Basic
    Multilingual Plane). Use UTF-32 if you want to handle
    all of Unicode.
    Now each character is always a single 16-bit word or
    32-bit long, and much easier to handle.
    Once you have processed your string (for example,
    tried adding or removing a letter), convert it back
    to UTF-8 (if that's what is needed).
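
    To make the idea concrete, here is a rough sketch in Python
    rather than in the AbiWord or ispell code. Python's str
    already stores one element per code point, so it stands in
    for the UTF-32 array. The function names and the sample word
    are made up for illustration only.

```python
from typing import List


def deletions(word_utf8: bytes) -> List[bytes]:
    """Return every variant of the word with exactly one character removed."""
    # Decode first: each element of `chars` is one code point, not one byte.
    chars = word_utf8.decode("utf-8")
    variants = [chars[:i] + chars[i + 1:] for i in range(len(chars))]
    # Convert back to UTF-8 only after the edit has been made.
    return [v.encode("utf-8") for v in variants]


def insertions(word_utf8: bytes, alphabet: str) -> List[bytes]:
    """Return every variant of the word with one letter from `alphabet` added."""
    chars = word_utf8.decode("utf-8")
    variants = [chars[:i] + letter + chars[i:]
                for i in range(len(chars) + 1)
                for letter in alphabet]
    return [v.encode("utf-8") for v in variants]


if __name__ == "__main__":
    # Hypothetical sample: four Hebrew-script letters, two UTF-8 bytes each.
    word = "\u05e9\u05dc\u05d5\u05dd".encode("utf-8")
    print(len(word), "bytes,", len(word.decode("utf-8")), "characters")
    for candidate in deletions(word):
        print(candidate.decode("utf-8"))
```

    Working on the decoded string, dropping or adding a letter is
    a single-element change, so a suggestion generator no longer
    has to treat it as a two-byte edit.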

    > By the way, I don't see my Yiddish dictionary on the
    > download page for AbiWord
    > internationalizations; it should be placed there,
    > even though ispell's ability
    > to manipulate it is substandard. At least it
    > generally shows what words are
    > misspelled.

    If nobody gets onto this, please file a bug report in
    Bugzilla. I'd add the dictionary to the download page
    myself but I don't know how.

    I'm working on the spellchecker code right now so let
    me know if you do come up with a new spellchecker
    class and I can look at integrating it. Good luck!

    Andrew Dunbar.

    > Raphael

    =====
    http://linguaphile.sourceforge.net/cgi-bin/translator.pl http://www.abisource.com

