Re: The abiword side of grammar checker.

From: Martin Sevior (msevior@physics.unimelb.edu.au)
Date: Mon Sep 23 2002 - 12:11:37 EDT

Next message: Petr Tomasek: "Will the new beta version be installable along with 1.0.3?"

Previous message: Jody Goldberg: "Re: tb_hyperlink_xpm causes an assert"
Messages sorted by: [ date ] [ thread ] [ subject ] [ author ] [ attachment ]

On Mon, 2002-09-23 at 11:45, Andrew Dunbar wrote:
> --- Martin Sevior <msevior@physics.unimelb.edu.au> wrote:
> >
> >
> > On Sun, 22 Sep 2002, Dom Lachowicz wrote:
> >
> > > Implementing a vector of incorrect phrases and
> > > squiggling them is trivial. Determining if a given
> > > phrase is correct/incorrect and why it is so is the
> > > hard part in front of us :) Oh, that and our code
> > > to separate things on phrase/sentence boundaries is
> > > non-existent. This'll be non trivial too. Or the
> > > grammar checking tool will have to determine this
> > > for us. Either way, it's not fun.
> >
> > That is why we can break the problem into two parts.
> > What abiword needs to do and what the grammar
> > checker needs to do.
>
> Absolutely!
>
> > I'm definitely not volunteering for the latter, just
> > the former. I don't think it would be too hard to
> > split text into sentences, just look for full stops
> > :-) If this is insufficient we can send out the
> > entire paragraph of text, which is self-contained by
> > default.
>
> Well how would we send it out? If we send it just as
> a string of text how will we pass the start and end
> positions of squiggles?
> If we pass it in internal format it'll be yucky for
> grammar types to hack on. We'll also have to watch
> out for inline level attributes such as bold and italic
> which we don't want to bother the grammar checker with.
>

We just need to fill a UT_GrowBuf with UT_UCSChar values via

UT_GrowBuf pgb;
fl_BlockLayout::getBlockBuf(&pgb);

The properties etc. are described by the PieceTable fragments, which track
the length of each segment of contiguous properties.

The growbuf contains just the text, as signed 32-bit integers, so it
can be passed out to the grammar checker quite easily.
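
To illustrate the separation described above, here is a rough, standalone C++ sketch. It is not AbiWord code: Run, flatten and runIndexAt are hypothetical names standing in for piece-table fragments and the flat text buffer. It only shows that the checker can be handed plain 32-bit values while the run lengths are enough to map a buffer offset back to its formatting.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical stand-in for one fragment: a run of text that shares the
// same properties. Only its length matters when mapping offsets.
struct Run {
    std::u32string text;
    bool bold;
};

// Concatenate the runs into one flat buffer of signed 32-bit values,
// dropping all formatting. This is what a grammar checker would see.
std::vector<std::int32_t> flatten(const std::vector<Run>& runs) {
    std::vector<std::int32_t> buf;
    for (const Run& r : runs)
        for (char32_t c : r.text)
            buf.push_back(static_cast<std::int32_t>(c));
    return buf;
}

// Map an offset in the flat buffer back to the index of the run that
// contains it. Returns runs.size() if the offset is past the end.
std::size_t runIndexAt(const std::vector<Run>& runs, std::size_t offset) {
    std::size_t start = 0;
    for (std::size_t i = 0; i < runs.size(); ++i) {
        if (offset < start + runs[i].text.size()) return i;
        start += runs[i].text.size();
    }
    return runs.size();
}
```

With runs such as "This is " (plain), "bold" (bold) and " text." (plain), the checker would receive 18 plain values, and offset 9 maps back to the second run.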

The grammar checker could do the work of splitting it into sentences or
we could do that via the fl_PartOfBlock classes.
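
A minimal sketch of the "just look for full stops" approach, in standalone C++. It does not use fl_PartOfBlock; Span, isTerminator and splitSentences are made-up names for illustration. It deliberately ignores abbreviations such as "e.g.", which a real splitter would have to handle.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical span type: a sentence as offset and length in the paragraph.
struct Span {
    std::size_t offset;
    std::size_t length;
};

// Naive rule: a sentence ends at '.', '!' or '?'.
static bool isTerminator(char32_t c) {
    return c == U'.' || c == U'!' || c == U'?';
}

// Split a paragraph into sentence spans. Spaces between sentences are
// skipped; trailing text without a terminator is kept as a last sentence.
std::vector<Span> splitSentences(const std::u32string& text) {
    std::vector<Span> sentences;
    const std::size_t n = text.size();
    std::size_t start = 0;
    while (start < n) {
        while (start < n && text[start] == U' ') ++start;
        if (start == n) break;
        std::size_t end = start;
        while (end < n && !isTerminator(text[end])) ++end;
        if (end < n) ++end;  // include the terminator itself
        sentences.push_back({start, end - start});
        start = end;
    }
    return sentences;
}
```

Because the spans are offsets into the original paragraph buffer, anything reported for a sentence can be shifted back to paragraph coordinates by adding the sentence offset.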

The UT_GrowBuf class contains the size of the text buffer. The grammar
checker should return a list of offsets that span incorrect text.
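
Something like the following standalone C++ sketch shows the shape of that exchange. The names ErrorSpan and checkGrammar are assumptions, not an existing interface, and the "grammar rule" is a toy one (an immediately repeated word) standing in for a real checker. The point is only that the result is a list of offset/length pairs into the buffer that was sent out, which is all that is needed to place squiggles.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical result type: one stretch of incorrect text, expressed as
// an offset and length in the buffer handed to the checker.
struct ErrorSpan {
    std::size_t offset;
    std::size_t length;
};

// Toy checker: flags a word that exactly repeats the previous word.
// Words are separated by single or multiple spaces.
std::vector<ErrorSpan> checkGrammar(const std::u32string& text) {
    std::vector<ErrorSpan> errors;
    std::u32string prev;
    const std::size_t n = text.size();
    std::size_t i = 0;
    while (i < n) {
        while (i < n && text[i] == U' ') ++i;   // skip spaces
        const std::size_t start = i;
        while (i < n && text[i] != U' ') ++i;   // scan one word
        if (i == start) break;                  // only spaces were left
        const std::u32string word = text.substr(start, i - start);
        if (word == prev) errors.push_back({start, i - start});
        prev = word;
    }
    return errors;
}
```

For "This is is a test" this reports a single span at offset 8 with length 2, which the caller could turn straight into a squiggle over the second "is".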

But it should not be hard to get this going. BTW, just handling Western
European languages, where sentences are easily distinguished by
punctuation rules, will be as good as our stiffest competition can
manage.

Cheers

Martin


This archive was generated by hypermail 2.1.4 : Mon Sep 23 2002 - 12:20:26 EDT