Re: Implementing support for barbarisms correction

From: Dom Lachowicz (doml@appligent.com)
Date: Sat Sep 21 2002 - 12:16:02 EDT


    On Sat, 2002-09-21 at 11:31, Jordi Mas wrote:

    > Well, I think that we need a solution that marks the misspelled word and
    > offers a replacement, since that's what a user will expect from a spell checker.
    > I do not think that we can do this using ispell since it just does not work
    > that way.

    One of my questions is "should these words be marked as misspelled at
    all?" Words that are "incorrect" but are widely in use sound like good
    candidates for addition into a language, regardless of what the French
    and Spanish purists/governments think. Languages are fluid, evolving
    things and borrow heavily from their times, surroundings, other
    languages and cultures, technology/science, and the people who speak the
    language. But let's say that these words should really be marked as
    incorrect, for the sake of argument.

    If they really aren't allowable words, is this a spell checking
    problem? I'd argue no. Spell checking solves the problem of
    mapping:

    possibly misspelled word->correctly spelled word(s)

    and not

    possibly suboptimal/illegal word->better/legal word(s)
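    The two mappings can be contrasted in a short Python sketch. The word
    list, the barbarism table with its single entry, and the `proof`
    function are all hypothetical, made up for illustration. The point is
    only that a spell checker flags words *missing* from its dictionary,
    while a barbarism check flags words that *are* in the dictionary but
    have a preferred alternative.

```python
from __future__ import annotations

from typing import Iterator

# Hypothetical data for illustration only: a tiny word list and a
# barbarism table mapping a discouraged word to a preferred one.
DICTIONARY = {"the", "computer", "is", "fast"}
BARBARISMS = {"computer": "ordinador"}


def proof(words: list[str]) -> Iterator[tuple[str, str, str]]:
    """Yield (kind, word, suggestion) for each word that needs attention."""
    for word in words:
        key = word.lower()
        if key not in DICTIONARY:
            # Spell checking: the word is not a known, correctly spelled word.
            # (Finding suggestions is a separate step, omitted here.)
            yield ("misspelled", word, "")
        elif key in BARBARISMS:
            # Barbarism check: the word is spelled correctly, but a
            # preferred word exists for it.
            yield ("discouraged", word, BARBARISMS[key])
```

    With this data, "compter" is reported as misspelled, while "computer"
    passes the dictionary check and is reported only as discouraged, with
    "ordinador" as the suggestion.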

    In my opinion, this looks like a different but related problem, one
    tied more closely to a language's constructs (i.e., something closer
    to grammar) than to the spelling of its words. If you argue that a
    misspelled word is "suboptimal" or "illegal", you would be correct in a
    sense. But with a misspelling, the user's intent was to write a
    legal/optimal word. In your case, it is the user's intent to write a
    correctly spelled "suboptimal" word. Should some sort of warning pop up?
    Maybe, but that could get annoying really fast if I honestly mean to use
    these words. But I don't want to disable spell checking, because I
    really do want the rest of the words checked.

    So what all of this suggests is that this is possibly a proofing
    problem, and not a spell checking problem. Read on.

    > I see this as an enhancement to the spell checking system, and it will most
    > likely take under 100 lines of code. Is there any particular reason that
    > makes you think that this is not appropriate for our main tree?

    Because I don't see this as a spelling issue, and I don't believe that
    it will take only 100 lines to get it right. Consider just the
    following cases. Please tell me how to fix them without creating a
    *huge* barbarism file, and how to properly identify and handle them in
    under 100 lines of code:

    *) Mixed capitalization (ComPutEr)
    *) Different verb tenses (compute, computed, has computed)
    *) Pluralization (computes, computers)
    *) Split infinitives
    *) The "barbaric" word is misspelled. You'd need at least 2 mappings
    here to get the intended effect: misspelled barbaric word->correctly
    spelled barbaric word->preferable word

    Note that this is just what I could think of in 30 seconds, and isn't an
    exhaustive study of the problem at hand.
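    To make the list concrete, here is a rough Python sketch of the naive
    approach, assuming a hypothetical barbarism table with one made-up
    entry. It handles mixed capitalization by case folding, pluralization
    with a crude English-only "strip a trailing s" rule, and misspelled
    barbarisms by fuzzy matching with the standard library's
    `difflib.get_close_matches` before the second lookup. The function name
    `preferred_word` is invented for this sketch. Verb tenses and
    multi-word forms like "has computed" are not handled at all.

```python
from __future__ import annotations

import difflib
from typing import Optional

# Hypothetical barbarism table (discouraged word -> preferred word);
# the entry is made up for illustration only.
BARBARISMS = {"computer": "ordinador"}


def preferred_word(word: str, table: dict[str, str] = BARBARISMS) -> Optional[str]:
    """Return the preferred replacement for a barbarism, or None."""
    # Mixed capitalization: fold case so "ComPutEr" matches. Restoring the
    # user's casing on the replacement would need still more code.
    key = word.lower()
    candidates = [key]
    # Pluralization: a crude, English-only "drop a trailing s" rule.
    if key.endswith("s"):
        candidates.append(key[:-1])
    for candidate in candidates:
        if candidate in table:
            return table[candidate]
    # Misspelled barbarism: first map the misspelling to a known barbarism
    # by fuzzy matching, then map that barbarism to the preferred word.
    close = difflib.get_close_matches(key, list(table), n=1, cutoff=0.8)
    if close:
        return table[close[0]]
    return None
```

    Even at this size the fuzzy fallback misfires: `preferred_word("computed")`
    also returns "ordinador", because "computed" is close enough to
    "computer" to pass the 0.8 cutoff, even though it is a different word.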

    I see this as a separate service that we could provide in addition to
    spell checking, but it is certainly not spell checking.

    > Alan has suggested that we could implement this as an enhanced custom.dic for
    > every language. It makes sense to me. What do you think?

    I don't think that you can achieve this by using a custom.dic for
    every language, as the custom.dic only has a list of words you mark as
    "allowable" or "correctly spelled" for a language. It doesn't offer a
    mapping from wrong->correct word. It doesn't use any algorithm (e.g.,
    soundex, visual similarity) to suggest words. Going this route would,
    in my estimation, involve writing something nearly equivalent in both
    size and scope to ispell.
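    A minimal Python sketch of the difference, assuming custom.dic behaves
    as the plain word list described above; its entries and the helper
    names `is_allowed` and `suggest` are hypothetical. Membership is a
    one-line check, while even the simplest suggestion scheme needs its own
    algorithm. Soundex, one of the two examples named above, is shown here
    in its standard American form.

```python
from __future__ import annotations

# Hypothetical custom.dic contents, modeled as a plain set of words.
CUSTOM_DIC = {"abiword", "compute", "computer", "commuter"}


def is_allowed(word: str) -> bool:
    """All a word list can answer: is this word in the list or not?"""
    return word.lower() in CUSTOM_DIC


# American Soundex letter groups; vowels, y, h and w have no digit.
_CODES = {ch: digit
          for letters, digit in (("bfpv", "1"), ("cgjkqsxz", "2"), ("dt", "3"),
                                 ("l", "4"), ("mn", "5"), ("r", "6"))
          for ch in letters}


def soundex(word: str) -> str:
    """Return the four-character American Soundex code for word."""
    letters = [ch for ch in word.lower() if ch.isalpha()]
    if not letters:
        return ""
    result = [letters[0].upper()]
    prev = _CODES.get(letters[0], "")
    for ch in letters[1:]:
        if ch in "hw":
            # h and w do not separate letters that share a code.
            continue
        code = _CODES.get(ch, "")
        if code and code != prev:
            result.append(code)
        # A vowel resets prev, so a repeated code after a vowel is kept.
        prev = code
    return ("".join(result) + "000")[:4]


def suggest(word: str) -> list[str]:
    """Suggest list entries that share the word's Soundex code."""
    code = soundex(word)
    return sorted(w for w in CUSTOM_DIC if soundex(w) == code)
```

    Here `is_allowed("compiter")` can only answer False, while
    `suggest("compiter")` returns ["compute", "computer"]. Ranking those
    candidates, handling affixes, and adding other similarity measures is
    where the work starts to grow toward what ispell already does.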

    > Dom, I think that we should discuss this a bit more to see how useful it
    > can be for other languages, and if it turns out not to be useful, we can
    > move it to a plugin. But I think it is a bit early to say "I don't want
    > this in the main tree."

    You asked if people had objections. I had one. It seems silly to
    basically say "I'm looking for objections" and then tell me that "It's
    too early to object, wait until we discuss more," especially since your
    message didn't even mention the possibility of discussion. Your email
    basically said "Here is perceived problem X. Does anyone want to stop me
    because I'm about to implement something to fix perceived problem X."
    The logic seems a bit flawed, at least to me...

    Is this something useful, in my opinion? Maybe/probably. Would I object
    to it being a plugin? Probably not. Do I still object to it being in the
    main tree? Yup.

    Cheers,
    Dom



    This archive was generated by hypermail 2.1.4 : Sat Sep 21 2002 - 12:20:32 EDT