Re: The abiword side of grammar checker.

From: Martin Sevior (msevior@physics.unimelb.edu.au)
Date: Mon Sep 23 2002 - 12:11:37 EDT


    On Mon, 2002-09-23 at 11:45, Andrew Dunbar wrote:
    > --- Martin Sevior <msevior@physics.unimelb.edu.au> wrote:
    > >
    > >
    > > On Sun, 22 Sep 2002, Dom Lachowicz wrote:
    > >
    > > > Implementing a vector of incorrect phrases and
    > > > squiggling them is trivial. Determining whether a
    > > > given phrase is correct or incorrect, and why, is
    > > > the hard part in front of us :) Oh, that and our
    > > > code to separate text on phrase/sentence boundaries
    > > > is non-existent. That will be non-trivial too. Or
    > > > the grammar checking tool will have to determine
    > > > this for us. Either way, it's not fun.
    > >
    > > That is why we can break the problem into two parts.
    > > What abiword needs to do and what the grammar
    > > checker needs to do.
    >
    > Absolutely!
    >
    > > I'm definitely not volunteering for the latter, just
    > > the former. I don't think it would be too hard to
    > > split text into sentences; just look for full stops
    > > :-) If this is insufficient, we can send out the
    > > entire paragraph of text, which is self-contained by
    > > default.
    >
    > Well, how would we send it out? If we send it just as
    > a string of text, how will we pass the start and end
    > positions of squiggles?
    > If we pass it in internal format, it'll be yucky for
    > grammar types to hack on. We'll also have to watch
    > out for inline-level attributes such as bold and
    > italic, which we don't want to bother the grammar
    > checker with.
    >

    We just need to fill a growbuf with the block's UT_UCSChar text via

    UT_GrowBuf pgb;
    pBlock->getBlockBuf(&pgb);   // pBlock: the fl_BlockLayout to check (illustrative name)

    The properties, etc., are described by the PieceTable fragments, which
    track the length of each run of contiguous properties.

    The growbuf contains just the text, in signed 32-bit integer format, so
    it can be passed out to the grammar checker quite easily.
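    As an illustration only, here is a minimal C++ sketch of what such a
    hand-off could look like. None of it is AbiWord API: std::u32string
    stands in for the UT_GrowBuf contents, and GrammarSpan, GrammarChecker
    and the toy RepeatedWordChecker are hypothetical names. The point is
    that the checker sees only plain text and answers with offsets into it,
    so bold, italic and other formatting never reach it.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// One flagged region, as an offset and length into the block's plain text.
struct GrammarSpan {
    std::size_t offset;  // index of the first flagged character
    std::size_t length;  // number of flagged characters
};

// Hypothetical interface: the checker receives only the text of one block
// (std::u32string stands in for the UT_GrowBuf contents) and never sees
// bold, italic or other formatting.
class GrammarChecker {
public:
    virtual ~GrammarChecker() = default;
    virtual std::vector<GrammarSpan> check(const std::u32string& text) const = 0;
};

// Toy checker: flags a word that immediately repeats the previous one,
// e.g. the second "the" in "see the the cat". Words are split on spaces only.
class RepeatedWordChecker : public GrammarChecker {
public:
    std::vector<GrammarSpan> check(const std::u32string& text) const override {
        std::vector<GrammarSpan> spans;
        std::size_t prevStart = 0, prevLen = 0;
        std::size_t i = 0;
        while (i < text.size()) {
            while (i < text.size() && text[i] == U' ') ++i;  // skip spaces
            std::size_t start = i;
            while (i < text.size() && text[i] != U' ') ++i;  // scan one word
            std::size_t len = i - start;
            if (len == 0) break;                             // only trailing spaces left
            if (len == prevLen &&
                text.compare(prevStart, prevLen, text, start, len) == 0) {
                spans.push_back({start, len});
            }
            prevStart = start;
            prevLen = len;
        }
        return spans;
    }
};
```

    A real checker would replace RepeatedWordChecker; AbiWord's side would
    only need to call check() and squiggle the returned spans.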

    The grammar checker could do the work of splitting it into sentences,
    or we could do that ourselves via the fl_PartOfBlock classes.
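    A minimal sketch of the "just look for full stops" approach, assuming
    the block text has already been copied into a std::u32string (standing
    in for the growbuf). splitSentences and SentenceRange are hypothetical
    names, not existing AbiWord code. It only covers the simple Western
    European punctuation case and returns block-relative offsets, so a span
    a checker reports within one sentence can be mapped back to the block
    by adding that sentence's start.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

struct SentenceRange {
    std::size_t start;   // offset of the sentence's first character in the block
    std::size_t length;  // number of characters, including the terminator
};

// Naive "look for full stops" splitter over one block of text.
// A sentence ends at '.', '!' or '?' when followed by a space or by the end
// of the block. Abbreviations such as "e.g. " will be split incorrectly.
std::vector<SentenceRange> splitSentences(const std::u32string& text) {
    std::vector<SentenceRange> out;
    std::size_t start = 0;
    std::size_t i = 0;
    while (i < text.size()) {
        char32_t c = text[i];
        bool terminator = (c == U'.' || c == U'!' || c == U'?');
        bool atBoundary = (i + 1 == text.size() || text[i + 1] == U' ');
        if (terminator && atBoundary) {
            out.push_back({start, i + 1 - start});
            i += 1;
            while (i < text.size() && text[i] == U' ') ++i;  // skip inter-sentence spaces
            start = i;
        } else {
            ++i;
        }
    }
    if (start < text.size()) {
        out.push_back({start, text.size() - start});  // trailing text without a terminator
    }
    return out;
}
```

    A '.' inside "3.14" is not followed by a space, so it does not end a
    sentence; anything smarter (abbreviations, quotes, other scripts) is
    exactly what a dedicated grammar checker could take over.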

    The UT_GrowBuf also knows the length of the text buffer. The grammar
    checker should return a list of offsets marking the start and end of
    each span of incorrect text.
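    One way AbiWord could sanity-check the returned offsets before
    squiggling them, sketched in C++ under assumptions: Span and
    normalizeSpans are hypothetical names, and bufLen is the text length
    taken from the growbuf. Spans running past the end of the block are
    trimmed or dropped, and overlapping or touching spans are merged so
    each squiggle is drawn only once.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

struct Span {
    std::size_t offset;  // start of the incorrect text within the block
    std::size_t length;  // number of characters to squiggle
};

// Clamp spans to [0, bufLen), then sort and merge overlapping/touching ones.
std::vector<Span> normalizeSpans(std::vector<Span> spans, std::size_t bufLen) {
    std::vector<Span> kept;
    for (const Span& s : spans) {
        if (s.length == 0 || s.offset >= bufLen) continue;  // empty or out of range
        // offset < bufLen here, so bufLen - offset cannot underflow
        std::size_t end = (s.length > bufLen - s.offset) ? bufLen : s.offset + s.length;
        kept.push_back({s.offset, end - s.offset});
    }
    std::sort(kept.begin(), kept.end(),
              [](const Span& a, const Span& b) { return a.offset < b.offset; });

    std::vector<Span> merged;
    for (const Span& s : kept) {
        if (!merged.empty() && s.offset <= merged.back().offset + merged.back().length) {
            Span& last = merged.back();  // overlaps or touches the previous span
            std::size_t end = std::max(last.offset + last.length, s.offset + s.length);
            last.length = end - last.offset;
        } else {
            merged.push_back(s);
        }
    }
    return merged;
}
```

    Merging on the AbiWord side keeps the checker simple: it can report
    overlapping problems independently without worrying about how they are
    drawn.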

    It should not be hard to get this going. BTW, just handling Western
    European languages, where sentences are easily distinguished by
    punctuation rules, would already match what our stiffest competition
    can manage.

    Cheers

    Martin



    This archive was generated by hypermail 2.1.4 : Mon Sep 23 2002 - 12:20:26 EDT