Barbarism implementation proposal

From: Jordi Mas (jmas@softcatala.org)
Date: Tue Sep 24 2002 - 16:26:54 EDT

    Hello guys,

    After some discussion on the list and some IRC talk with Dom and others, I
    have put together an implementation proposal for the barbarism detection feature.

    * What is a barbarism

    Barbarisms are a problem that mainly concerns minority languages (for example
    Welsh, Catalan, and Occitan), i.e. languages that compete, in the same
    territory, with a more powerful one, called the "roof language".

    When two languages compete in the same territory, interferences arise, but
    they are not symmetric. The roof language is weakly affected, but the minority
    one can be strongly affected and can even disappear (glottophagy). One of these
    interferences is barbarism.

    * How we implement it

    We have a class called Barbarism that lives in 'src\other\spell\xp\'. We
    initialize the class when the ispellclass is created, and when we do CheckWord
    and suggestWord we also call the Barbarism class.
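
    A minimal sketch of that hook, written in Python for brevity rather than in
    AbiWord's own code. The class and method names below are hypothetical
    stand-ins, and checking the barbarism table before the dictionary is an
    assumption, not something the proposal specifies:

```python
class Barbarism:
    """Hypothetical stand-in for the Barbarism class described above."""

    def __init__(self, entries):
        # entries maps a wrong word to its list of suggested replacements.
        self._entries = {word: list(fixes) for word, fixes in entries.items()}

    def check_word(self, word):
        """Return True when the word is NOT a known barbarism."""
        return word not in self._entries

    def suggest_word(self, word):
        """Return the replacements for a barbarism, or an empty list."""
        return list(self._entries.get(word, []))


class SpellChecker:
    """Toy spell checker that consults the barbarism table (illustrative only)."""

    def __init__(self, dictionary, barbarisms):
        self._dictionary = set(dictionary)
        # The Barbarism object is created together with the checker.
        self._barbarism = Barbarism(barbarisms)

    def check_word(self, word):
        # Assumption: a barbarism is flagged even if the dictionary accepts it.
        if not self._barbarism.check_word(word):
            return False
        return word in self._dictionary

    def suggest_word(self, word):
        # Barbarism replacements only; a real checker would also add its
        # ordinary dictionary suggestions here.
        return self._barbarism.suggest_word(word)


checker = SpellChecker({"tret", "tiro"}, {"tiro": ["tret"]})
print(checker.check_word("tiro"), checker.suggest_word("tiro"))
```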

    * How we store them

    The file that contains the barbarisms is an XML file that lives in the same
    directory as the dictionaries. It has the same name as the dictionary file,
    but with a "barbarisms" extension.

    For example, for the American dictionary it will be "american.barbarisms".
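
    The naming rule can be sketched as follows (Python, illustrative only). The
    function name and the ".hash" dictionary extension in the example are
    assumptions:

```python
from pathlib import Path


def barbarism_path(dictionary_path):
    """Return the barbarism file that sits next to the given dictionary file.

    Same directory, same base name, extension replaced by ".barbarisms".
    """
    return Path(dictionary_path).with_suffix(".barbarisms")


# "american.hash" is an assumed dictionary file name, used only as an example.
print(barbarism_path("dictionary/american.hash").name)
```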

    This is an example file. The attribute "word" contains the wrong word; the
    numbered "suggestion" attributes ("suggestion1", "suggestion2") contain the
    right words to use:

    <?xml version="1.0" encoding="utf-8"?>
    <AbiBarbarism app="AbiWord" ver="1.0" language="ca-ES">

            <Barbarism
                word="boleto"
                suggestion1="billet"
            />

            <Barbarism
                word="tiro"
                suggestion1="tret"
            />

            <Barbarism
                word="tanteig"
                suggestion1="tempteig"
            />

            <Barbarism
                word="tamany"
                suggestion1="mida"
                suggestion2="grandària"
            />

    </AbiBarbarism>
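
    To illustrate how such a file could be read into a lookup table, here is a
    rough sketch using Python's standard xml.etree.ElementTree module. The
    function name and the choice to accept any number of numbered "suggestion"
    attributes are assumptions; the sample is a shortened copy of the file above:

```python
import xml.etree.ElementTree as ET

SAMPLE = """<?xml version="1.0" encoding="utf-8"?>
<AbiBarbarism app="AbiWord" ver="1.0" language="ca-ES">
    <Barbarism word="tiro" suggestion1="tret" />
    <Barbarism word="tamany" suggestion1="mida" suggestion2="grandària" />
</AbiBarbarism>
"""


def load_barbarisms(xml_bytes):
    """Parse a barbarism file into {wrong word: [suggestions in order]}."""
    root = ET.fromstring(xml_bytes)
    table = {}
    for entry in root.iter("Barbarism"):
        word = entry.get("word")
        if not word:
            continue  # skip entries without a wrong word
        numbered = []
        for name, value in entry.attrib.items():
            suffix = name[len("suggestion"):]
            # Keep only attributes named suggestion<N>, remembering N.
            if name.startswith("suggestion") and suffix.isdigit():
                numbered.append((int(suffix), value))
        # Order suggestions by their number: suggestion1, suggestion2, ...
        table[word] = [value for _, value in sorted(numbered)]
    return table


table = load_barbarisms(SAMPLE.encode("utf-8"))
print(table["tamany"])
```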

    * Known problems in the design

    - We work at word level, not sentence level. We are just hacking a spell
    checker.

    - Words that can be inflected have to be coded several times (plurals, verb
    conjugations, etc.). At least in Catalan, this is not very common.

    OK, that's basically it. I would love to hear your comments to see how we can
    define this better than it is right now.

    Thanks,

    -- 
    

    Jordi Mas http://www.softcatala.org


