Re: Explicit spellchecker suggestions implementation proposal

From: Andrew Dunbar (hippietrail@yahoo.com)
Date: Thu Sep 26 2002 - 00:37:09 EDT

Next message: Andrew Dunbar: "Notes on first-time building"

Previous message: pclouds: "Vietnamese Translation Update"
In reply to: Karl Ove Hufthammer: "Re: Barbarism implementation proposal"
Next in thread: Andrew Dunbar: "Re: Barbarism implementation proposal"
Messages sorted by: [ date ] [ thread ] [ subject ] [ author ] [ attachment ]

--- Karl Ove Hufthammer <karl@huftis.org> wrote:
> Dom Lachowicz <doml@appligent.com> wrote in
> news:36C308EC-CFFE-11D6-80C8-0003934B5C22@appligent.com:
>
> > It should look
> > something like this instead:
> >
> > <barbarism word="tamany">
> > <suggestion word="mida" />
> > <suggestion word="grandària" />
> > <suggestion ... />
> > </barbarism>
>
> I suggest moving the original word to elements too
> (generally, in XML it's a good idea to never use
> attributes for normal text -- only for codes and
> similar things). Also, it should be possible to
> have several original words (with the same meaning),
> but with the same suggestion. Norwegian example:
>
> <barbarism>
> <original>
> <word>e-mail</word>
> <word>email</word>
> <word>mail</word>
> </original>
> <suggestions>
> <word>e-post</word>
> <word>e-brev</word>
> </suggestions>
> </barbarism>
>
> This is much more compact than having equal entries
> for each original word ('e-mail', 'email', 'mail').

I thought about this too but I wasn't convinced until
reading your opinions just now. This is a good idea.

> One more thing. This only works on individual words.
> Would it be possible to support leading and trailing
> words/articles? For example, in Norwegian, 'ein
> e-mail' should be changed to either 'ein e-post'
> or 'eit e-brev' (note 'eit', not 'ein' for 'e-brev').
> Other languages use similar articles (e.g. 'la' and
> 'le' in French).

I really think article gender agreement is much more
of a grammar issue than anything we've discussed so
far. I think it would make things too complex, because
far stranger things can sit between two words in the
data structure than within a single word.
But I do think it's worth discussing now, maybe
mentioning it in code comments, and filing an RFE.
When we start doing a grammar/style checker, things
like this will be among the easiest to implement, but
they are still more complex than anything a
spellchecker should be concerned with.

On another note, I think it might be a good idea to
call this new feature something like "explicit
spellchecker suggestions". Barbarisms are one case for
this, fixing Americanisms in other English dialects
is another, and replacing slang and neologisms with
more general words could be another.
All of these would work in exactly the same way, with
the same XML format and the same engine.

In the GUI we can call it whatever we like.

Revised XML samples:
<explicit-spellcheck>
  <class>barbarism</class>
  <original>
    <word>e-mail</word>
    <word>email</word>
    <word>mail</word>
  </original>
  <suggestions>
    <word>e-post</word>
    <word>e-brev</word>
  </suggestions>
</explicit-spellcheck>

<explicit-spellcheck>
  <class>Americanism</class>
  <original>
    <word>optimize</word>
  </original>
  <suggestions>
    <word>optimise</word>
  </suggestions>
</explicit-spellcheck>
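
As a rough sketch of how an engine might load entries like these
into a lookup table (Python only for brevity; the function name
load_explicit_suggestions and the wrapping <explicit-spellchecks>
root element are assumptions of mine, nothing above fixes them):

```python
import xml.etree.ElementTree as ET

# Sample data using the tag names proposed above. The outer
# <explicit-spellchecks> root is an assumption so that the
# file is a single well-formed XML document.
SAMPLE = """<explicit-spellchecks>
  <explicit-spellcheck>
    <class>barbarism</class>
    <original>
      <word>e-mail</word>
      <word>email</word>
      <word>mail</word>
    </original>
    <suggestions>
      <word>e-post</word>
      <word>e-brev</word>
    </suggestions>
  </explicit-spellcheck>
  <explicit-spellcheck>
    <class>Americanism</class>
    <original>
      <word>optimize</word>
    </original>
    <suggestions>
      <word>optimise</word>
    </suggestions>
  </explicit-spellcheck>
</explicit-spellchecks>"""


def load_explicit_suggestions(xml_text):
    """Map each original word to (class, list of suggestions)."""
    table = {}
    root = ET.fromstring(xml_text)
    for entry in root.iter("explicit-spellcheck"):
        reason = entry.findtext("class", default="").strip()
        suggestions = [
            w.text.strip()
            for w in entry.findall("suggestions/word")
            if w.text
        ]
        # Every original word in the entry shares the same suggestions.
        for w in entry.findall("original/word"):
            if w.text:
                table[w.text.strip()] = (reason, suggestions)
    return table


table = load_explicit_suggestions(SAMPLE)
print(table["email"])
```

Grouping several originals in one entry costs nothing at lookup
time, since the table is flattened to one key per original word.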

Okay, my tag names suck and my Americanism example is
weak, since simple spellchecking will find it. The
bonus is that we can display in the dialog why a word
is wrong. For non-explicit suggestions we would just
say "spelling".
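
To make the "why it's wrong" part concrete, here is a minimal
sketch of the lookup order I have in mind. Everything in it is
hypothetical: the names EXPLICIT, DICTIONARY and check_word, and
the hand-built data, are only for illustration and not existing
AbiWord code.

```python
# Hypothetical, hand-built data; a real engine would load the
# explicit table from the XML file and use the real dictionary.
EXPLICIT = {
    "email": ("barbarism", ["e-post", "e-brev"]),
    "optimize": ("Americanism", ["optimise"]),
}
DICTIONARY = {"e-post", "e-brev", "optimise", "brev"}


def check_word(word):
    """Return None if the word is fine, else (reason, suggestions)."""
    # Explicit entries are consulted first, so a word can be flagged
    # with its class even if the ordinary dictionary would accept it.
    if word in EXPLICIT:
        return EXPLICIT[word]
    # Ordinary misspelling: the reason shown is just "spelling".
    # Generating suggestions for this case is left out of the sketch.
    if word not in DICTIONARY:
        return ("spelling", [])
    return None


for w in ("email", "brve", "brev"):
    print(w, check_word(w))
```

The dialog would then only need the first element of the result
to label the problem.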

Andrew.

> --
> Karl Ove Hufthammer

=====
http://linguaphile.sourceforge.net/cgi-bin/translator.pl http://www.abisource.com

This archive was generated by hypermail 2.1.4 : Thu Sep 26 2002 - 00:44:17 EDT