Re: Explicit spellchecker suggestions implementation proposal

From: Andrew Dunbar (hippietrail@yahoo.com)
Date: Thu Sep 26 2002 - 00:37:09 EDT

     --- Karl Ove Hufthammer <karl@huftis.org> wrote:
    > Dom Lachowicz <doml@appligent.com> wrote in
    > news:36C308EC-CFFE-11D6-80C8-0003934B5C22@appligent.com:
    >
    > > It should look
    > > something like this instead:
    > >
    > > <barbarism word="tamany">
    > > <suggestion word="mida" />
    > > <suggestion word="grandària" />
    > > <suggestion ... />
    > > </barbarism>
    >
    > I suggest moving the original word to elements too
    > (generally, in XML it's a good idea to never use
    > attributes for normal text -- only for codes and
    > similar things). Also, it should be possible to
    > have several original words (with the same meaning),
    > but with the same suggestion. Norwegian example:
    >
    > <barbarism>
    > <original>
    > <word>e-mail</word>
    > <word>email</word>
    > <word>mail</word>
    > </original>
    > <suggestions>
    > <word>e-post</word>
    > <word>e-brev</word>
    > </suggestions>
    > </barbarism>
    >
    > This is much more compact than having equal entries
    > for each original word ('e-mail', 'email', 'mail').

    I thought about this too but I wasn't convinced until
    reading your opinions just now. This is a good idea.

    > One more thing. This only works on individual words.
    > Would it be possible to support leading and trailing
    > words/articles? For example, in Norwegian,
    > 'ein e-mail' should be changed to either 'ein e-post'
    > or 'eit e-brev' (note 'eit', not 'ein' for 'e-brev').
    > Other languages use similar articles (e.g. 'la' and
    > 'le' in French).

    I really think article gender agreement is much more
    of a grammar issue than anything we've discussed so
    far. I think it would make things too complex, because
    in the data structure far stranger things can sit
    between two words than within a single word.
    But I do think it's worth discussing now, maybe
    mentioning it in code comments, and filing an RFE.
    When we start doing a grammar/style checker, things
    such as these will be the easiest to implement, but
    still more complex than anything a spellchecker should
    be concerned with.

    On another note, I think it might be a good idea to
    call this new feature something like "explicit
    spellchecker suggestions". Barbarisms are one use case
    for this, fixing Americanisms in other English dialects
    is another, and replacing slang and neologisms with
    more general words could be another.
    All of these would work in exactly the same way with
    the same XML format and the same engine.

    In the GUI we can call it whatever we like.

    Revised XML samples:
    <explicit-spellcheck>
      <class>barbarism</class>
      <original>
        <word>e-mail</word>
        <word>email</word>
        <word>mail</word>
      </original>
      <suggestions>
        <word>e-post</word>
        <word>e-brev</word>
      </suggestions>
    </explicit-spellcheck>

    <explicit-spellcheck>
      <class>Americanism</class>
      <original>
        <word>optimize</word>
      </original>
      <suggestions>
        <word>optimise</word>
      </suggestions>
    </explicit-spellcheck>
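
    A minimal sketch of how an engine might load entries
    in this format, written in Python for brevity and not
    as a proposal for the actual implementation. The
    <explicit-spellchecks> wrapper element and the
    load_entries name are assumptions made here; the
    samples above do not say what the root element is.

```python
import xml.etree.ElementTree as ET

# Hypothetical wrapper element: the samples have no single root,
# so one is assumed here to make the document well-formed XML.
SAMPLE = """\
<explicit-spellchecks>
  <explicit-spellcheck>
    <class>barbarism</class>
    <original>
      <word>e-mail</word>
      <word>email</word>
      <word>mail</word>
    </original>
    <suggestions>
      <word>e-post</word>
      <word>e-brev</word>
    </suggestions>
  </explicit-spellcheck>
  <explicit-spellcheck>
    <class>Americanism</class>
    <original>
      <word>optimize</word>
    </original>
    <suggestions>
      <word>optimise</word>
    </suggestions>
  </explicit-spellcheck>
</explicit-spellchecks>
"""


def load_entries(xml_text):
    """Map each original word to (class, [suggestions])."""
    table = {}
    root = ET.fromstring(xml_text)
    for entry in root.iter("explicit-spellcheck"):
        cls = entry.findtext("class", default="").strip()
        suggestions = [
            w.text.strip()
            for w in entry.findall("suggestions/word")
            if w.text
        ]
        # Every original word in an entry shares the same suggestions.
        for w in entry.findall("original/word"):
            if w.text:
                table[w.text.strip()] = (cls, suggestions)
    return table


table = load_entries(SAMPLE)
print(table["email"])
```

    Keying the table on each original word keeps the
    lookup a single dictionary access, while the XML
    itself stays compact with one entry per group.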

    Okay, my tag names suck and my Americanism example is
    weak, since simple spellchecking will find it. The
    bonus is that the dialog can show why a word is wrong.
    For non-explicit suggestions we would just say
    "spelling".
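
    A rough sketch of that reason lookup, again in Python
    and under stated assumptions: EXPLICIT is a
    hypothetical in-memory table of the kind the XML could
    be loaded into, and check_word is an illustrative
    name, not an existing function. Ordinary suggestion
    generation is left out, so plain misspellings return
    an empty list here.

```python
# Hypothetical table: original word -> (class, suggestions).
EXPLICIT = {
    "e-mail": ("barbarism", ["e-post", "e-brev"]),
    "email": ("barbarism", ["e-post", "e-brev"]),
    "mail": ("barbarism", ["e-post", "e-brev"]),
    "optimize": ("Americanism", ["optimise"]),
}


def check_word(word, dictionary, explicit=EXPLICIT):
    """Return None if the word is accepted, else (reason, suggestions)."""
    # Explicit entries are consulted first: a flagged word may well
    # be present in the ordinary dictionary.
    if word in explicit:
        return explicit[word]
    if word not in dictionary:
        # Non-explicit case: the reason shown is just "spelling".
        return ("spelling", [])
    return None


dictionary = {"email", "post", "optimise"}
for w in ("email", "optimize", "xyzzy", "post"):
    print(w, check_word(w, dictionary))
```

    Checking the explicit table before the dictionary is
    a design choice of this sketch; it lets an entry flag
    a word that is otherwise spelled correctly.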

    Andrew.

    > --
    > Karl Ove Hufthammer

    =====
    http://linguaphile.sourceforge.net/cgi-bin/translator.pl http://www.abisource.com

    This archive was generated by hypermail 2.1.4 : Thu Sep 26 2002 - 00:44:17 EDT