Re: The AbiWord side of a grammar checker (was Re: Implementing support for barbarisms correction)

From: Andrew Dunbar (hippietrail@yahoo.com)
Date: Sun Sep 22 2002 - 21:27:12 EDT


     --- Martin Sevior <msevior@physics.unimelb.edu.au>
    wrote:
    >
    > On Sun, 22 Sep 2002, [iso-8859-1] Andrew Dunbar
    > wrote:
    > >
    > > What we probably need to do is start designing a
    > > grammar checker framework, complete with a plugin
    > > interface for extensions, and design the barbarism
    > > checker as a plugin for it.
    >
    > I've discovered that I personally definitely need a
    > grammar checker, so I'm happy to help out, though not
    > to take the lead on one.

    It's probably time to start building a list of what
    we want a grammar checker to actually do. It's a
    pretty vague thing really. I think there's already
    an RFE you can add to.

    > There are two components: the "squiggling"
    > implementation and the actual parsing of the text.
    >
    > Regarding the squiggling, we can borrow much of the
    > design from the spell-checker.

    In fact I'd like to refactor a little and make a
    generic squiggling class that both the spelling and
    the grammar checkers can use. There are lots of little
    oddities in the squiggles (especially on Windows) that
    will be much easier to maintain in a single place.
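    A minimal sketch of such a shared class, in C++ since that is what
    AbiWord is written in. The names Squiggles, Squiggle and SquiggleKind
    are hypothetical, not existing AbiWord classes; the point is only that
    one container, tagged by checker kind, can hold both the red and the
    green marks and handle invalidation in one place.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Hypothetical: which checker produced a squiggle.
enum class SquiggleKind { Spelling, Grammar };

// A half-open range [start, start + length) of text offsets within one block.
struct Squiggle {
    std::size_t start;
    std::size_t length;
    SquiggleKind kind;
};

// Hypothetical shared container: the spell checker and the grammar checker
// both store their marks here, so the bookkeeping lives in one place.
class Squiggles {
public:
    void add(std::size_t start, std::size_t length, SquiggleKind kind) {
        marks_.push_back({start, length, kind});
    }

    // Drop every mark of the given kind that overlaps [start, start + length),
    // e.g. after the user edits that text and it must be rechecked.
    void clear(std::size_t start, std::size_t length, SquiggleKind kind) {
        const std::size_t end = start + length;
        marks_.erase(std::remove_if(marks_.begin(), marks_.end(),
                                    [&](const Squiggle& s) {
                                        return s.kind == kind &&
                                               s.start < end &&
                                               start < s.start + s.length;
                                    }),
                     marks_.end());
    }

    // Number of marks of one kind (e.g. red spelling vs. green grammar).
    std::size_t count(SquiggleKind kind) const {
        return static_cast<std::size_t>(std::count_if(
            marks_.begin(), marks_.end(),
            [&](const Squiggle& s) { return s.kind == kind; }));
    }

private:
    std::vector<Squiggle> marks_;
};
```

    Clearing by kind means an edit can invalidate the grammar marks for a
    sentence without throwing away spelling marks that are still valid.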

    > To remind people, this works by building a vector of
    > pointers to fl_BlockLayout classes, then processing
    > these during idle time in the GUI main loop.
    >
    > The fl_BlockLayout classes contain pointers to text
    > in the piecetable, which is separated by white space
    > characters into words. These words are fed through
    > the spell checker.
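    The idle-time scheme described above can be sketched as a work queue.
    Block is a hypothetical stand-in for fl_BlockLayout, and
    BackgroundChecker and onIdle() are illustrative names, not AbiWord's
    actual API. Each idle callback checks at most one block, so typing is
    never held up by a long document.

```cpp
#include <algorithm>
#include <deque>
#include <functional>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for fl_BlockLayout: one paragraph of text.
struct Block {
    std::string text;
    int timesChecked = 0;
};

// Sketch of an idle-time work queue shared by all checkers.
class BackgroundChecker {
public:
    using Checker = std::function<void(Block&)>;

    // Register a checker (spelling, grammar, ...) to run on every block.
    void addChecker(Checker c) { checkers_.push_back(std::move(c)); }

    // Queue a block for checking unless it is already waiting.
    void enqueue(Block* b) {
        if (std::find(queue_.begin(), queue_.end(), b) == queue_.end())
            queue_.push_back(b);
    }

    // Called from the GUI idle handler: process one block, then report
    // whether more work remains.
    bool onIdle() {
        if (queue_.empty()) return false;
        Block* b = queue_.front();
        queue_.pop_front();
        for (auto& check : checkers_) check(*b);
        return !queue_.empty();
    }

private:
    std::deque<Block*> queue_;
    std::vector<Checker> checkers_;
};
```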

    White space is not good enough for word separation.
    Some of the spell-checking problems with quotes and
    apostrophes will be due to this. Then there are Asian
    languages that do not put spaces between words, though
    open source libraries are available to split them
    into words the right way.

    The first step is to use the Unicode character
    functions that are available now in Win32 and glibc
    to tell us whether a character is a letter or not.
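    A sketch of splitting by character class rather than by white space,
    assuming std::iswalnum from <cwctype> as the classification function
    (its answers for non-ASCII characters depend on the current C locale).
    The rule that an apostrophe joins a word only between two letters or
    digits is an illustrative choice, not existing AbiWord behaviour.

```cpp
#include <cstddef>
#include <cwctype>
#include <string>
#include <vector>

// A word as a half-open range [start, end) of offsets into the text.
struct WordRange {
    std::size_t start;
    std::size_t end;
};

// Letters and digits form words. For non-ASCII letters to be recognised,
// the program must select a UTF-8 locale, e.g. std::setlocale(LC_CTYPE, "").
// An apostrophe (ASCII or U+2019) stays inside a word only when a letter or
// digit follows it, so "don't" stays whole but the quotes in 'word' are dropped.
std::vector<WordRange> splitWords(const std::wstring& text) {
    std::vector<WordRange> words;
    const std::size_t n = text.size();
    std::size_t i = 0;
    while (i < n) {
        // Skip anything that cannot start a word.
        while (i < n && !std::iswalnum(text[i])) ++i;
        if (i == n) break;
        const std::size_t start = i;
        while (i < n) {
            if (std::iswalnum(text[i])) {
                ++i;
            } else if ((text[i] == L'\'' || text[i] == L'\u2019') &&
                       i + 1 < n && std::iswalnum(text[i + 1])) {
                ++i;  // internal apostrophe, e.g. don't, O'Brien
            } else {
                break;
            }
        }
        words.push_back({start, i});
    }
    return words;
}
```

    For the input 'quoted' don't stop this yields three words, with the
    surrounding quotes excluded and "don't" kept as one word.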

    > A grammar checker would do exactly the same, except
    > it would have to recognize sentences and pass these
    > through to the grammar checker.

    Perhaps. The existing grammar checkers don't really
    seem to do any parsing at all. They seem to have
    something like a regex engine for finding patterns.
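    To illustrate the pattern-matching approach, here is a hypothetical
    repeated-word rule written with std::regex. It finds "is is"-style
    doublings with a back-reference and reports offsets; there is no
    parse tree anywhere.

```cpp
#include <cstddef>
#include <regex>
#include <string>
#include <vector>

// Offset and length of a suspect span within the text.
struct Match {
    std::size_t start;
    std::size_t length;
};

// Hypothetical pattern rule: a word immediately followed by the same word.
std::vector<Match> findRepeatedWords(const std::string& text) {
    // \1 refers back to the word captured by (\w+).
    static const std::regex repeated(R"(\b(\w+)\s+\1\b)");
    std::vector<Match> hits;
    for (std::sregex_iterator it(text.begin(), text.end(), repeated), end;
         it != end; ++it) {
        hits.push_back({static_cast<std::size_t>(it->position(0)),
                        static_cast<std::size_t>(it->length(0))});
    }
    return hits;
}
```

    A real rule set would be many such patterns, which is also a natural
    shape for plugins.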

    Having some kind of "sentence iterator" and "word
    iterator" is a very good idea, though. ICU is open
    source and has both, but it's big and I'm not sure
    how practical it is to use just pieces of it...
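    Until something like ICU's sentence BreakIterator is available, a naive
    sentence iterator might look like this. splitSentences is a
    hypothetical helper: it splits after '.', '!' or '?' when followed by
    white space, so unlike ICU it will break wrongly after abbreviations
    such as "Dr.".

```cpp
#include <cctype>
#include <cstddef>
#include <string>
#include <vector>

std::vector<std::string> splitSentences(const std::string& text) {
    std::vector<std::string> sentences;
    const std::size_t n = text.size();
    std::size_t start = 0;

    // Emit text[start, end) with leading white space trimmed.
    auto push = [&](std::size_t end) {
        while (start < end &&
               std::isspace(static_cast<unsigned char>(text[start])))
            ++start;
        if (start < end) sentences.push_back(text.substr(start, end - start));
        start = end;
    };

    for (std::size_t i = 0; i < n; ++i) {
        const char c = text[i];
        const bool terminator = (c == '.' || c == '!' || c == '?');
        // "2.5" is not a boundary: the '.' must be followed by space or end.
        const bool boundary =
            (i + 1 == n) ||
            std::isspace(static_cast<unsigned char>(text[i + 1]));
        if (terminator && boundary) push(i + 1);
    }
    push(n);  // trailing text without a terminator is still a sentence
    return sentences;
}
```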

    > I think we can reuse much of the spell checker code
    > so that fl_BlockLayouts are parsed through to both
    > the spell checker and the grammar checker.
    >
    > If a region of the text is found to be suspect, the
    > text is marked with a green squiggle two pixels
    > below the red squiggle.
    >
    > Hmm, the more I think about this, the easier it
    > seems. We can reuse a lot of the existing classes
    > and methods and just add extra code to split
    > the text into sentences as well as words.
    >
    > The grammar checker would have to mark the start and
    > end points of the dodgy text and send this info
    > back. Then we reuse the squiggle code to draw
    > between the points.
    >
    > I don't think this would be hard to get working
    > fairly quickly.
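    One way to keep the checker separate from the squiggle drawing, and to
    leave room for plugins, is an interface whose rules only report start
    and end offsets. GrammarRule, GrammarIssue, GrammarChecker and the
    capital-letter rule are hypothetical names for illustration, not an
    existing AbiWord interface.

```cpp
#include <cstddef>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Half-open [start, end) offsets of suspect text within a sentence,
// plus a message the UI could show.
struct GrammarIssue {
    std::size_t start;
    std::size_t end;
    std::string message;
};

// Hypothetical plugin interface: a rule inspects one sentence and reports
// ranges; it knows nothing about drawing.
class GrammarRule {
public:
    virtual ~GrammarRule() = default;
    virtual std::vector<GrammarIssue> check(const std::string& sentence) const = 0;
};

// Example rule: a sentence should start with a capital letter (ASCII only).
class CapitalStartRule : public GrammarRule {
public:
    std::vector<GrammarIssue> check(const std::string& s) const override {
        if (!s.empty() && s[0] >= 'a' && s[0] <= 'z')
            return {{0, 1, "Sentence should start with a capital letter"}};
        return {};
    }
};

// The host runs every registered rule and collects the ranges, which the
// shared squiggle code would then draw in green.
class GrammarChecker {
public:
    void addRule(std::unique_ptr<GrammarRule> rule) {
        rules_.push_back(std::move(rule));
    }

    std::vector<GrammarIssue> check(const std::string& sentence) const {
        std::vector<GrammarIssue> all;
        for (const auto& rule : rules_) {
            auto found = rule->check(sentence);
            all.insert(all.end(), found.begin(), found.end());
        }
        return all;
    }

private:
    std::vector<std::unique_ptr<GrammarRule>> rules_;
};
```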

    This sounds very good and is one of the things I've
    always wanted to work on! I really want it to be
    easily extensible, especially via plugins.
    For instance, I'd love to see a German extension that
    can correct you when you've used the wrong article
    (der vs. die vs. das). MS Word doesn't even do this for
    German, but it would be great for second-language users.
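    A toy version of that German article check, assuming a hand-written
    lexicon of singular nouns and their genders (the six nouns are
    illustrative only). It only looks at der/die/das directly before a
    known noun, and it accepts "der" before feminine nouns because that
    form is correct in the dative and genitive. Plurals, adjectives between
    article and noun, and the other case forms are not handled.

```cpp
#include <cctype>
#include <cstddef>
#include <map>
#include <sstream>
#include <string>
#include <vector>

enum class Gender { Masculine, Feminine, Neuter };

// Tiny hypothetical lexicon; a real plugin would load a full dictionary.
const std::map<std::string, Gender>& lexicon() {
    static const std::map<std::string, Gender> nouns = {
        {"Mann", Gender::Masculine}, {"Hund", Gender::Masculine},
        {"Frau", Gender::Feminine},  {"Katze", Gender::Feminine},
        {"Haus", Gender::Neuter},    {"Auto", Gender::Neuter},
    };
    return nouns;
}

// Can this definite article precede a singular noun of this gender?
bool articleFits(const std::string& article, Gender g) {
    switch (g) {
        case Gender::Masculine: return article == "der";
        case Gender::Feminine:  return article == "die" || article == "der";
        case Gender::Neuter:    return article == "das";
    }
    return true;
}

// Word indices of articles that do not agree with the following noun.
std::vector<std::size_t> findArticleErrors(const std::string& sentence) {
    const std::string punct = ".,;:!?";
    std::vector<std::string> words;
    std::istringstream in(sentence);
    for (std::string w; in >> w;) {
        // Strip trailing punctuation such as "Haus." or "Auto,".
        while (!w.empty() && punct.find(w.back()) != std::string::npos)
            w.pop_back();
        words.push_back(w);
    }
    std::vector<std::size_t> errors;
    for (std::size_t i = 0; i + 1 < words.size(); ++i) {
        std::string article = words[i];
        if (!article.empty())  // "Der" at sentence start -> "der"
            article[0] = static_cast<char>(
                std::tolower(static_cast<unsigned char>(article[0])));
        if (article != "der" && article != "die" && article != "das") continue;
        auto it = lexicon().find(words[i + 1]);
        if (it != lexicon().end() && !articleFits(article, it->second))
            errors.push_back(i);
    }
    return errors;
}
```

    For "Das Mann liest." the article at index 0 is flagged; "Die Frau
    sieht das Haus." passes.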

    Andrew.

    > see the code in the file fl_BlockLayout.cpp
    >
    > Cheers!
    >
    > Martin
    >

    =====
    http://linguaphile.sourceforge.net/cgi-bin/translator.pl http://www.abisource.com




    This archive was generated by hypermail 2.1.4 : Sun Sep 22 2002 - 21:31:58 EDT