Re: More Widespread Adoption of Enchant ...

From: F Wolff <friedel_at_translate.org.za>
Date: Wed Sep 14 2011 - 16:30:41 CEST

Hi Kevin

I'm mostly a bystander on this list (Abiword localiser), but do have
some interest in Enchant (I'm a developer for Virtaal which uses the
Enchant API), and maintain spell checking libraries for South African
languages.

Some comments inline...

Op Di, 2011-09-13 om 23:24 -0600 skryf Kevin Atkinson:
>
> On Tue, 13 Sep 2011, Dominic Lachowicz wrote:
>
> > Some years later, I think that a clear winner has emerged that
> > addresses all of my above concerns. And while it wasn't the horse I
> > would have picked, I must be pragmatic. Hunspell is by no means
> > perfect, but the perfect is, after all, the enemy of the "good
> > enough".
> >
> > While I can see some clear benefits to both Aspell and Enchant in your
> > proposal below, it's clear to me that neither you nor I will have
> > enough time or willpower to see that to fruition, and in any case, I
> > feel it would only prolong the deaths of our respective projects.
>
> I am truly discouraged to hear you say that. Do you truly believe that
> that everyone using Enchant should switch to Hunspell?

I would say there is still a definite role for Enchant to play. Building
a whole spell checking engine is hard, but some people obviously have a
reason to do it. Although we've gone for Hunspell recently for several
African languages, people did raise concerns about missing features or
difficulties, which I can't relay properly here. For Zulu, I was
definitely limited, but did the best I could.

Some people have looked at things like foma for constructing better
spell checkers, and if they have success on the NLP side, obviously they
wouldn't want to integrate it separately with each major consumer of
this type of technology. Enchant works wonderfully in this way as it
allows a new spell checking engine to be used by many applications. Not
having to write support for personal word lists, etc. is a big win.

Many languages of the world still don't have spell checkers, and I
wouldn't try to guess if current technology is good enough for all of
them.

> What about
> languages that Hunspell doesn't support well yet (for example, Finish)?

In the case of Finnish we have to realise that they already have support
for Mozilla products, as well as OpenOffice.org and LibreOffice. The new
release cycle of Firefox seems to have left them behind a bit, and from
a quick look at the code, it looks as if they use the XPCOM stuff in
Mozilla which is not the currently blessed way to keep addons compatible
with the new fast release process.

That said, if somebody wants to get Enchant/Aspell to work with any of
these projects, the code in the Voikko project is probably the best
start available.

About integration into Mozilla products and Chrome, we have to realise
that they place a humongous emphasis on performance these days, and
might not accept something if it represents a noticeable regression in
any of startup time, executable size, memory consumption or CPU time.

I'm not aware of many other languages supported significantly better by
non-Hunspell engines, except maybe for Turkish. I think in the case of
Turkish they build a Hunspell checker from the same data with the
downside being a really huge dictionary consuming more resources than it
would with the Zemberek backend to Enchant. In that way they are not
left behind when only Hunspell is available.

> Do you think it just a matter of time before Hunspell does.

I can't guess an answer to this. Hunspell has incorporated many features
in the last few years that made it attractive for many reasons, but I
guess it still misses some that people want/need, like unlimited affix
support. If Hunspell continues to grow, that might happen. I don't know
what the vitality of the Hunspell project is, but got the impression
that it is also (mostly) a one-man project.

> What about
> languages such as Thai, which doesn't even have spaces in words.

To the best of my knowledge this is to some extent a matter of using a proper
word boundary detection algorithm (at whatever level of the stack). As
far as I know some of these are dictionary based, which doesn't sound
like a useful way to feed misspelt words into a spell checker :-) but I
think there are other techniques as well.
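
To illustrate the concern with dictionary-based word boundary detection,
here is a minimal sketch of greedy longest-match segmentation. It is not
taken from any real Thai segmenter: the function name `segment`, the toy
English lexicon and the run-together sample strings are all assumptions
made for illustration only.

```python
def segment(text, lexicon):
    """Greedy longest-match segmentation of text written without spaces."""
    max_len = max(len(word) for word in lexicon)
    tokens = []
    i = 0
    while i < len(text):
        # Try the longest candidate first, shrinking until a lexicon word fits.
        for j in range(min(len(text), i + max_len), i, -1):
            if text[i:j] in lexicon:
                tokens.append(text[i:j])
                i = j
                break
        else:
            # No lexicon word starts here: emit a single unknown character.
            tokens.append(text[i])
            i += 1
    return tokens


# Hypothetical toy lexicon standing in for a real word list.
LEXICON = {"the", "cat", "sat", "on", "mat"}

print(segment("thecatsatonthemat", LEXICON))
print(segment("thecaatsatonthemat", LEXICON))
```

With correct input the sketch recovers the six words. With the misspelt
"caat" it emits the fragments "c", "a", "a", "t" instead, so a spell
checker working per token would never see the misspelt word as one unit.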

> I am no
> expert but I believe a more intelligent solution is needed than simple hash
> table lookups; for example, if a word is misspelled how do you find the
> misspelling? What about Chinese and Japanese, I don't even know what is
> involved in spell checking those languages, a dedicated spell checker will
> likely be needed.

These are mostly typed by input methods that already suggest correctly
typed words. Errors are still possible, but I think their nature is
different, and such spell checkers might fall more into the category of
what might be considered "grammar checkers" in our way of thinking. As I
understand it, word segmentation is ambiguous (not just hard) and
therefore complete paragraphs or segments might need to be handled by
the spell checkers, and might need to take semantics into consideration.
Of course I acknowledge that you dream of doing similar things with
Enchant/Aspell even for English, so the lines are definitely blurred.

Along those lines, extending Enchant to have an API for grammar/style
checking might be quite useful as a way of making it easier to provide
such functionality across the whole Free desktop. On OS X it is now
available in all applications (from what I understand) and it makes
great sense to have writing aids available in all the places that users
want to write.

> I'm not sure what the future of Aspell is, but I do not consider Enchant
> to be dead, far from it, I consider it to be the future.
>
> But if you (and anyone else on this list) just no longer care about
> Enchant, than there is not a whole lot I can do. I can not push it
> myself.

As with many good ideas on the Free desktop, it does depend on somebody
doing the work, and I have complete understanding if somebody says that
they don't have the time to pursue something.

From my side as an application developer, I can tell you that things
like solid releases, good documentation and good builds for Windows
make a difference, but of course that is not everything. (I'm not
saying anything about Aspell in the above, as I've only really worked
with Enchant.) I end up using what is available easily for Windows and
my application stack, since building for Windows is a pain and not
really where I want to spend my time. There is definitely also a role
for marketing/promotion of code to increase awareness and ramp up the
necessary support.

For me as somebody doing a lot of work in non-English languages, I can
assure you that either Aspell or Hunspell is great for English! In
that sense I struggle to see why better suggestions would be a major
selling point for Aspell (if that is indeed the major one), since it is
probably "good enough" for many people, and if you are used to spell
checkers fairly often missing words, the relative quality of suggestions
might not be appreciated. Of course I completely acknowledge that there
is still room for improvement. That is far more true for a lot of other
languages. To some extent we count on people leading the way in the
major languages with better features like context sensitive spelling.

Sorry that I didn't have time to write a shorter and better message.

Keep well
Friedel

--
Recently on my blog:
http://translate.org.za/blogs/friedel/en/content/virtaal-070-released
Received on Wed Sep 14 16:30:56 2011
