Re: Implementing support for barbarisms correction

From: Dom Lachowicz (doml@appligent.com)
Date: Sat Sep 21 2002 - 12:16:02 EDT

Next message: Dom Lachowicz: "Re: RTF idea"

Previous message: Jordi Mas: "Re: Implementing support for barbarisms correction"
In reply to: Jordi Mas: "Re: Implementing support for barbarisms correction"
Next in thread: Jordi Mas: "Re: Implementing support for barbarisms correction"
Reply: Jordi Mas: "Re: Implementing support for barbarisms correction"
Reply: Andrew Dunbar: "Re: Implementing support for barbarisms correction"
Messages sorted by: [ date ] [ thread ] [ subject ] [ author ] [ attachment ]

On Sat, 2002-09-21 at 11:31, Jordi Mas wrote:

> Well, I think that we need a solution that marks the misspelled word and
> offers a replacement, since that is what a user will expect from a spell checker.
> I do not think that we can do this using ispell since it just does not work
> that way.

One of my questions is "should these words be marked as misspelled at
all?" Words that are "incorrect" but are widely in use sound like good
candidates for addition into a language, regardless of what the French
and Spanish purists/governments think. Languages are fluid, evolving
things and borrow heavily from their times, surroundings, other
languages and cultures, technology/science, and the people who speak the
language. But let's say that these words should really be marked as
incorrect, for the sake of argument.

If they really aren't allowable words, does this fall under a spell
checking problem? I'd argue no. Spell checking solves the problem of
mapping:

possibly misspelled word->correctly spelled word(s)

and not

possibly suboptimal/illegal word->better/legal word(s)

In my opinion, this looks like a different but related problem, one
tied more closely to a language's constructs (i.e. something closer
to grammar) than to the spelling of its words. If you argue that a
misspelled word is "suboptimal" or "illegal" you would be correct in a
sense. But here the user's intent was to write a legal/optimal word. In
your case, it is the user's intent to write a correctly spelled
"suboptimal" word. Should some sort of warning pop up? Maybe, but that
could get annoying really fast if I honestly mean to use these words.
But I don't want to disable spell checking because I really do want the
rest of the words checked.
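
To make the two mappings concrete, here is a rough Python sketch. It is
not AbiWord or ispell code: the word list, the PREFERRED table and both
function names are made up for illustration, and difflib merely stands
in for whatever similarity measure a real spell checker would use.

```python
import difflib

# Hypothetical data, for illustration only.
DICTIONARY = {"ordinateur", "courriel", "computer", "email", "bonjour"}
PREFERRED = {"computer": ["ordinateur"], "email": ["courriel"]}


def spell_suggestions(word):
    """possibly misspelled word -> correctly spelled word(s)"""
    if word in DICTIONARY:
        return []  # spelled correctly, so nothing to suggest
    # Rank dictionary words by string similarity to the input.
    return difflib.get_close_matches(word, sorted(DICTIONARY), n=3, cutoff=0.7)


def barbarism_suggestions(word):
    """possibly suboptimal/illegal word -> better/legal word(s)"""
    # A plain table lookup: the input is already a correctly spelled word.
    return PREFERRED.get(word, [])
```

Under these assumptions, spell_suggestions("computer") returns nothing,
because the word is spelled correctly, while
barbarism_suggestions("computer") proposes a replacement. The two
functions answer different questions about the same word.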

So what all of this shows is that this is possibly a proofing
problem, and not a spell checking problem. Read on.

> I see this as an enhancement to the spell checking system and it most likely will
> take under 100 lines of code. Any particular reason that makes you think that
> this is not appropriate for our main tree?

Because I don't see this as a spelling issue and don't believe that it
will only take 100 lines to get it right. Consider just the following
cases. Please tell me how to fix them without creating a *huge*
barbarism file and how to properly identify and handle them in under 100
lines of code:

*) Mixed capitalization (ComPutEr)
*) Different verb tenses (compute, computed, has computed)
*) Pluralization (computes, computers)
*) Split infinitives
*) The "barbaric" word is misspelled. You'd need to do at least 2
mappings here to get the intended effect: misspelled barbaric->correct
barbaric->preferable word

Note that this is just what I could think of in 30 seconds, and isn't an
exhaustive study of the problem at hand.
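
As a rough illustration of why a small table does not cover the cases
listed above, consider the Python sketch below. The one-entry table and
the function names are hypothetical; nothing here reflects the proposed
patch or any existing AbiWord code.

```python
# Hypothetical one-entry barbarism table, for illustration only.
PREFERRED = {"computer": "ordinateur"}


def naive_lookup(word):
    """Exact-match lookup: only the literal table entry is found."""
    return PREFERRED.get(word)


def case_folded_lookup(word):
    """Lowercase the input first; this handles mixed capitalization only."""
    return PREFERRED.get(word.lower())


# Literal entry, mixed capitalization, plural form, misspelled barbarism.
CASES = ["computer", "ComPutEr", "computers", "computr"]
results = {w: (naive_lookup(w), case_folded_lookup(w)) for w in CASES}
```

Case folding rescues "ComPutEr", but "computers" and "computr" still
miss in both lookups. Catching them means either listing every inflected
and misspelled form in the table, or adding stemming and a spelling pass
in front of the lookup.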

I see this as a separate service that we could provide in addition to
spell checking, but it is certainly not spell checking.

> Alan has suggested that we can implement this as an enhanced custom.dic for
> every language. It makes sense to me. What do you think?

I don't think that you can achieve this through using a custom.dic for
every language, as the custom.dic only has a list of words you mark as
"allowable" or "correctly spelled" for a language. It doesn't offer a
mapping from wrong->correct word. It doesn't use any algorithm (e.g.
soundex, visual similarity) to suggest words. To go through this route,
in my estimation, would involve writing something nearly equivalent in
both size and scope to ispell.
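
A rough Python sketch of that gap follows. The custom dictionary is
modelled as a plain set of words, as described above; the set contents
and the helper names are hypothetical. The soundex function is a
simplified classic American Soundex, included only as an example of a
suggestion algorithm, and is not a claim about what ispell does
internally.

```python
# Hypothetical personal word list: membership is all it can answer.
CUSTOM_DIC = {"AbiWord", "hypermail"}


def is_allowed(word):
    return word in CUSTOM_DIC


def soundex(word):
    """Simplified classic American Soundex: one letter plus three digits."""
    groups = (("bfpv", "1"), ("cgjkqsxz", "2"), ("dt", "3"),
              ("l", "4"), ("mn", "5"), ("r", "6"))
    codes = {ch: digit for letters, digit in groups for ch in letters}
    word = "".join(ch for ch in word.lower() if ch.isalpha())
    if not word:
        return ""
    result = word[0].upper()
    prev = codes.get(word[0], "")
    for ch in word[1:]:
        digit = codes.get(ch, "")
        if digit and digit != prev:
            result += digit
        if ch not in "hw":  # h and w do not separate repeated codes
            prev = digit
    return (result + "000")[:4]


def phonetic_suggestions(word, known_words):
    """Suggest known words that share the input's Soundex key."""
    key = soundex(word)
    return sorted(w for w in known_words if soundex(w) == key)
```

The set can only say "allowed" or "not allowed". Getting from "computr"
to "computer" already needs the extra matching layer, and getting from
there to a preferred word needs a mapping on top of that.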

> Dom, I think that we should discuss this a bit more to see how good it can be
> for other languages, and finally if it is not useful we move it to a plugin, but
> I think that it is a bit early to say "I don't want this in the main tree.".

You asked if people had objections. I had one. It seems silly to
basically say "I'm looking for objections" and then tell me that "It's
too early to object, wait until we discuss more," especially since your
message didn't even mention the possibility of discussion. Your email
basically said "Here is perceived problem X. Does anyone want to stop me
because I'm about to implement something to fix perceived problem X."
The logic seems a bit flawed, at least to me...

Is this something useful, in my opinion? Maybe/probably. Would I object
to it being a plugin? Probably not. Do I still object to it being in the
main tree? Yup.

Cheers,
Dom


This archive was generated by hypermail 2.1.4 : Sat Sep 21 2002 - 12:20:32 EDT