Re: More Widespread Adoption of Enchant ...

From: Martin Sevior <msevior_at_gmail.com>
Date: Wed Sep 14 2011 - 03:10:03 CEST

Dear Kevin,

With regard to your request for a separate mailing list for enchant:
this summer an abiword GSoC student, Chen Xiajian, worked extensively
to expand the Enchant functionality to allow it to be used for
hyphenation.

This was performed in direct consultation with the abiword community.
At this point I would not like to have enchant discussion removed from
the abiword-dev mailing list.

With regard to enchant being used as a distro-wide solution for linux
distributions, I believe that this was always its grand goal. It is
used by both GNOME and KDE already. What do you see as being needed
for adoption by firefox and/or google chrome?

Cheers

Martin

On Wed, Sep 14, 2011 at 8:55 AM, Kevin Atkinson <kevina@gnu.org> wrote:
>
> I thought you might be interested in this post.  Basically I think before
> Aspell can move forward Enchant needs to become the system spell checker so
> that people can actually use Aspell from major applications such as
> Firefox.
>
> Also, I would like to be able to follow the development of Enchant but don't
> want to sift through all the other AbiWord development traffic, thus I
> would like to formally request that a separate mailing list be created for
> Enchant.  I don't care where it is, it can still be hosted on abisource if
> need be.
>
> ---------- Forwarded message ----------
> Date: Mon, 12 Sep 2011 01:48:07 -0600 (MDT)
> From: Kevin Atkinson <kevina@gnu.org>
> Reply-To: aspell-devel@gnu.org
> To: aspell-announce@gnu.org, aspell-devel@gnu.org
> Subject: [aspell-devel] Aspell's Future
>
> Aspell is not dead.  In this post I will outline how I see Aspell
> moving forward.  Please circulate this post as you see fit, but keep
> in mind that I still consider this document a draft, so expect the
> occasional spelling (yes, even a spell checker doesn't catch
> everything) and grammar mistakes, as well as perhaps a few (hopefully
> minor) factual errors.
>
> Please direct all discussion to aspell-devel@gnu.org.  If you are not
> subscribed, don't worry, I will approve your posts in a timely fashion.
> Please direct grammar and factual corrections directly to me at
> kevina@gnu.org.
>
> INTRO
> =====
>
> In recent years the development of Aspell has stagnated, but I have
> never really lost interest in Aspell.  The problem was that I just did
> not know how to move forward in light of Hunspell slowly taking the
> role that I meant for Aspell to take.  However, after giving it a lot
> of thought, I have finally figured out how Aspell, and spell checking
> in general on GNU/Linux and other Free Unix-like operating systems, can
> move forward.
>
> For a long time I had two goals for Aspell:
>
>  1. Do a superior job of suggesting possible replacements for a
>     misspelled word than just about any other spell checker out there
>     for the English language (see http://aspell.net/test/cur/, also see
>     http://suggest.aspell.net)
>
>  2. Become the standard system spell checker for GNU/Linux and
>     other Free Unix-like operating systems.
>
> I have succeeded in the first goal but, due to Hunspell slowly taking
> over the role of a system spell checker [1], I'm failing on the second.
> Unfortunately the fact that Hunspell is taking over as a system spell
> checker makes it increasingly difficult for users to take advantage of
> Aspell's high suggestion quality.  Right now Aspell is still in most
> distributions, and users can still use it with many applications
> (Open/LibreOffice, Firefox, Thunderbird, and Google Chrome being some
> notable exceptions), but unless I do something this may no longer be
> the case.
>
> ([1] See http://fedoraproject.org/wiki/Releases/FeatureDictionary and
> http://wiki.ubuntu.com/ConsolidateSpellingLibs)
>
> For a long time I thought about ways to regain Aspell's status as the
> standard system spell checker, but after giving it a lot of thought I
> have decided that this goal is no longer worth pursuing.  One of
> Hunspell's advantages over Aspell is that it has better support for
> many languages thanks to its support for compounding and complex
> morphology, so I thought that if I could add support for these
> features I could support the same set of languages that Hunspell does,
> and convince Linux distributions to consider Aspell over Hunspell as
> the one true spell checker; however after many years, I finally
> decided that it wasn't worth it.  Since Aspell was originally designed
> to be able to support multiple backends I briefly considered making
> Hunspell a backend for Aspell; however, as Aspell's multiple backend
> support has never really been tested it would probably be more trouble
> than it's worth, especially in light of Enchant
> (http://www.abisource.com/projects/enchant/), which is a meta-spell
> checker that already has working support for multiple backends.
>
> Hence, the way forward for Aspell, and spell checking in general, is
> to make Enchant the system spell checker.  Using Enchant will not only
> allow users to take advantage of Aspell's superior suggestion quality
> for the English language, it will also add proper support for the
> Finnish language by being able to use Voikko
> (http://voikko.sourceforge.net/) instead of Hunspell, which at the
> time of this writing has poor support for the Finnish language.
>
> In addition, using a meta-spell checker as the system spell checker
> will pave the way for more advanced forms of spell checking than either
> Aspell or Hunspell support, such as taking into account frequency or
> context information.
>
> The rest of this post will outline how I see Aspell moving forward by
> making Enchant the system spell checker for GNU/Linux and other Free
> Unix like operating systems.
>
>
> THE WAY FORWARD FOR ASPELL
> ==========================
>
> As already mentioned the way forward centers around making Enchant the
> system spell checker.  Here is how I see that happening:
>
>  1. Get any applications that use Hunspell directly to use Enchant.
>     The primary applications of concern are Firefox, Thunderbird,
>     LibreOffice (and maybe OpenOffice), and Google Chrome.
>
>  2. Convince the Enchant project (and all distributions using it) to
>     prefer Aspell over Hunspell for the English language.
>
>  3. Convert all applications that use Aspell directly to also use
>     Enchant.
>
>  4. Enhance Enchant so that it can better support both Aspell's and
>     Hunspell's advanced features.  At a minimum Enchant will need to
>     be able to work with encodings other than UTF-8 and provide some
>     sort of way for applications to talk to the backend spell checker
>     engine directly.
>
>  5. Once Enchant is sufficiently enhanced to support Aspell's features,
>     abolish the current C ABI and instead have applications use
>     Aspell through Enchant.  Also convert the Aspell utility to be a
>     more generic Enchant front-end.
>
>  6. Eventually distill Aspell so that it is nothing but a plugin for
>     Enchant.  Encourage Hunspell and other spell checkers to go the
>     same route.
>
> Step (1) is probably the most difficult as right now there is a lot of
> inertia towards making Hunspell the one and only spell checker [1].  I
> believe this is a mistake.  Hunspell is a good spell checker which
> supports a lot of languages but making any one spell checker engine
> _the_ only spell checker is a mistake.  Different spell checkers have
> different strengths and weaknesses and it does not make sense to have
> one spell checker used for every language.  Furthermore, right now the
> Finnish language is not well supported by Hunspell, so for every
> program that uses Hunspell directly, a plugin for Voikko, the Finnish
> spell checker, needs to be written.  If Enchant were used instead this
> would not be an issue.  In addition neither Aspell nor Hunspell is
> well equipped to support languages such as Thai which don't have any
> sort of separation between words [2].
>
> ([2] There is a Thai dictionary for Hunspell but it is only useful
> once the words are already separated somehow, perhaps using the
> zero-width space (ZWSP) marker.
> http://www.thaivisa.com/forum/topic/444360-thai-in-openoffice-on-ubuntu-lucid-lynx/,
> http://openoffice.org/bugzilla/show_bug.cgi?id=43583)
>
> Step (2) is technically very easy, it is simply adding a line to the
> enchant ordering file.  However, right now there seems to be the
> perception that Aspell is this legacy spell checker that needs to be
> eventually eliminated [1], and that no one would want to use Aspell
> over Hunspell unless they have to.  So, even if the change makes it
> into Enchant, I'm not sure that the change will stick as the various
> Linux distributions might remove the line in their packaged version.
> Furthermore, changing the default spell checker will change the
> personal dictionary used, which will likely lead to confusion.  Thus,
> (2) should likely only be done after (1) and, furthermore, I do not
> want to be the one who has to push the change; I hope others can
> eventually see Aspell's advantage for the English language and want to
> use it.
>
> Step (3) will eventually happen on its own, again due to the
> perception that Aspell is this legacy spell checker that needs to be
> eventually eliminated.  However, once (1) is done I will help push
> (3).
>
> Step (4) does not really depend on the previous steps and in fact
> should happen in parallel to them. Furthermore this step is something
> I am willing to do most of the work for.
>
> For (4), by supporting encodings other than UTF-8 I mean adding
> support for conversion between UTF-8 and other encodings so that users
> of the Enchant library can use whatever encoding they are storing the
> document in and the backend can use whatever encoding is most
> efficient.  This will avoid unnecessary conversions to and from UTF-8
> when neither the users of the library nor the backend is using UTF-8
> internally.
>
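A minimal sketch of the conversion layer described for step (4), written in Python purely for illustration; it is not Enchant's API. The function names (`same_encoding`, `to_backend`) are hypothetical, and Python's Unicode strings stand in as the pivot between encodings. What it shows is the pass-through case: when the application and the backend already agree on an encoding, nothing is converted.

```python
import codecs


def same_encoding(a: str, b: str) -> bool:
    """True if two encoding names refer to the same codec (e.g. 'utf8' and 'UTF-8')."""
    return codecs.lookup(a).name == codecs.lookup(b).name


def to_backend(word: bytes, app_encoding: str, backend_encoding: str) -> bytes:
    """Re-encode a word from the application's encoding to the backend's.

    Hypothetical helper: a real library would do this inside its API so
    that callers never see the backend's encoding.
    """
    if same_encoding(app_encoding, backend_encoding):
        # Both sides agree, so hand the bytes over untouched.
        return word
    # Otherwise convert once, directly between the two encodings.
    return word.decode(app_encoding).encode(backend_encoding)
```

With these definitions, `to_backend(b"caf\xe9", "latin-1", "ISO-8859-1")` returns its input unchanged, while `to_backend(b"caf\xe9", "latin-1", "utf-8")` returns `b"caf\xc3\xa9"`.
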
> Steps (5) and (6) are more long-term goals for Aspell and are not
> fundamental to the plan of making Enchant the system spell checker.
> Rather they will greatly simplify the Aspell library and make it easier
> to maintain in the future.
>
>
> THE WAY FORWARD IN GENERAL
> ==========================
>
> Having all applications use Enchant will make it easy for newer and
> better spell checkers to replace both Aspell and Hunspell.
> Unfortunately, the current interface provided by Enchant is inadequate
> for many advanced forms of spell checking.  For example, context
> sensitive spell checking is impossible since words are fed in one at a
> time, and sometimes in random order.  Furthermore, Enchant only gives
> a boolean response to the question "is the word correctly spelled?";
> when taking into account frequency information the answer might be
> "maybe", as in: yes, it is correctly spelled, but it is not that common
> a word and most likely you meant some other similarly spelled word.
> Therefore another long term goal is:
>
>  7. Enhance Enchant to support more advanced forms of spell checking
>     such as to be able to support:
>
>     * Context sensitive spell checking.
>
>     * Flagging uncommon words, but not outright marking them as
>       misspelled.
>
>     * Taking into account local frequency information in the
>       document.
>
>     * Words with spaces in them such as "de facto".
>
>     * Languages such as Thai, which do not have spaces between words [2].
>
> I have some ideas on more advanced interfaces, but they are beyond the
> scope of this post.
>
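The "maybe" answer discussed in this section (correctly spelled, but uncommon enough that another word was probably meant) can be modelled as a three-valued result instead of a boolean. The following Python sketch is only an illustration under stated assumptions: `Verdict`, `check`, the `rare_below` threshold, and the frequency table are all hypothetical and are not part of Enchant, Aspell, or Hunspell.

```python
from enum import Enum


class Verdict(Enum):
    CORRECT = "correct"
    UNCOMMON = "maybe"          # in the dictionary, but rarely used
    MISSPELLED = "misspelled"


def check(word, frequencies, rare_below=5):
    """Classify a word using a word -> occurrence-count table.

    Assumption: `frequencies` contains every dictionary word, so a
    missing entry means the word is not in the dictionary at all.
    """
    count = frequencies.get(word.lower())
    if count is None:
        return Verdict.MISSPELLED
    if count < rare_below:
        return Verdict.UNCOMMON
    return Verdict.CORRECT
```

An application could underline `MISSPELLED` words as it does today and give `UNCOMMON` words a softer hint, which a plain yes/no interface cannot express.
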
>
> MAKING IT HAPPEN
> ================
>
> I still have limited time to work on Aspell.  I am still motivated to
> move forward with Aspell, but if no one else seems to care I am
> unlikely to spend much effort on it.  That is, I would be a lot more
> motivated if I get a sense that others would like to see Aspell
> continue to be developed for its technical merits.  As far as what
> those technical merits are, that is something I will be happy to
> discuss in a follow-up post.
>
> In addition, I am unlikely to move forward unless I see some movement
> on the first step, that is, moving Hunspell-only applications towards
> Enchant.  Unfortunately, I do not have the time to push this goal
> myself.  Thus others will need to convince major projects such as
> Libre/OpenOffice, Firefox, and Chrome to move away from using Hunspell
> directly.
>
> So, basically, none of this is going to happen without some effort
> from others.
>
> Feedback welcome.
Received on Wed Sep 14 03:10:12 2011