Re: Would you like to collaborate on grammar and style checking?

From: <msevior_at_physics.unimelb.edu.au>
Date: Mon Jan 01 2007 - 02:07:44 CET

> Yes, I am very aware of what has been done with link-grammar. I've more
> or less ported the abi plugin to Qt and played with it a bit. OO.o
> also has a few grammar/style checkers that are in the works. We can
> also tap the knowledge of tools like GNU Diction.
>
> http://lingucomponent.openoffice.org/grammar.html
>
> So, what should the next steps be? I can write up an outline of the
> api and post it to this list. Is more investigation needed? Are there
> considerations specific to Abiword that I should be made aware of?
>
> Finally, are my requests for changes in Enchant realistic? I'll speak
> to Dom when he returns, but does anybody see potential objections?
>

Hi Jacob,
         I'll defer the Enchant discussion until Dom gets back, but I'm
sure he will be very reasonable.

Regarding the API for grammar checking, we have a very simple one already
in place which needs very little expansion.

Basically we need grammar/style checkers to support a single method:

bool LinkGrammarWrap::parseSentence(PieceOfText * pT)

Where PieceOfText and AbiGrammarError are classes defined as:

class ABI_EXPORT AbiGrammarError
{
 public:
  AbiGrammarError(void);
  virtual ~AbiGrammarError(void);
  UT_sint32 m_iErrLow;
  UT_sint32 m_iErrHigh;
  UT_uint32 m_iWordNum;
  UT_UTF8String m_sErrorDesc;
};

class ABI_EXPORT PieceOfText
{
 public:
  PieceOfText(void);
  virtual ~PieceOfText(void);
  UT_sint32 iInLow;
  UT_sint32 iInHigh;
  UT_sint32 nWords;
  bool bHasStop;
  UT_UTF8String sText;
  bool m_bGrammarChecked;
  bool m_bGrammarOK;
  UT_GenericVector<AbiGrammarError *> m_vecGrammarErrors;
  UT_UTF8String m_sSuggestion;
  UT_sint32 countWords(void);
};

So the idea is that a PieceOfText is a sentence. The text is currently
contained in our UTF8 string class, however we could switch to a more
generic string container. After being parsed by the grammar checker it can
contain 0 or more grammar errors. Each grammar error is defined in the class
AbiGrammarError, which is just the index to the start and end of the error
as found by the checker together with an optional description of the error
found.

We could expand this class to contain a suggestion as to how to fix the
error.
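To make the shape of that interface concrete, here is a minimal sketch of a checker that fills in the error list. It is not the AbiWord code: GrammarError and Sentence are simplified stand-ins for AbiGrammarError and PieceOfText, std::string replaces our UTF8 string class, and the GrammarChecker base class, the per-error suggestion field and the toy RepeatedWordChecker are all hypothetical. The sketch also assumes that error offsets are byte indices into the sentence and that the high index is one past the end of the error.

```cpp
#include <cassert>
#include <cctype>
#include <cstdint>
#include <string>
#include <vector>

// Simplified stand-in for AbiGrammarError (hypothetical, not the AbiWord class).
struct GrammarError {
    std::int32_t errLow = 0;   // byte index of the first character of the error
    std::int32_t errHigh = 0;  // assumed here: one past the last character
    std::string description;   // optional explanation of the error
    std::string suggestion;    // hypothetical extension: how to fix the error
};

// Simplified stand-in for PieceOfText: one sentence plus the checker's results.
struct Sentence {
    std::string text;
    bool grammarChecked = false;
    bool grammarOK = false;
    std::vector<GrammarError> errors;
};

// Hypothetical backend interface: the single method a checker must support.
class GrammarChecker {
public:
    virtual ~GrammarChecker() = default;
    // Returns true when no errors were found in the sentence.
    virtual bool parseSentence(Sentence& s) = 0;
};

// Toy backend that flags an immediately repeated word ("is is").
class RepeatedWordChecker : public GrammarChecker {
public:
    bool parseSentence(Sentence& s) override {
        s.errors.clear();
        std::string prev;
        const std::size_t n = s.text.size();
        std::size_t i = 0;
        while (i < n) {
            // Skip anything that is not a letter.
            while (i < n && !std::isalpha(static_cast<unsigned char>(s.text[i]))) ++i;
            const std::size_t start = i;
            // Consume one word.
            while (i < n && std::isalpha(static_cast<unsigned char>(s.text[i]))) ++i;
            if (start == i) break;  // no further words
            const std::string word = s.text.substr(start, i - start);
            if (word == prev) {
                GrammarError e;
                e.errLow = static_cast<std::int32_t>(start);
                e.errHigh = static_cast<std::int32_t>(i);
                assert(e.errLow < e.errHigh);
                e.description = "Repeated word: " + word;
                e.suggestion = "Remove the second \"" + word + "\"";
                s.errors.push_back(e);
            }
            prev = word;
        }
        s.grammarChecked = true;
        s.grammarOK = s.errors.empty();
        return s.grammarOK;
    }
};
```

A caller would set the text of a Sentence, pass it to parseSentence, and then walk the errors to decide which ranges to underline. Keeping the suggestion on each error rather than on the sentence is only one possible design.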

All this is called asynchronously at intervals we've found to be as optimal
as possible given the slowness of link-grammar. But that is all in the
AbiWord code base and need not be the concern of the grammar checking
library.

At this point in time we have not tried to employ any contextual analysis;
we just supply sentences. We could supply paragraphs with the sentences
embedded if a particular grammar checker needs it.
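As a rough illustration of supplying a paragraph with the sentences embedded, the sketch below splits a paragraph into sentence spans that keep their offsets into the paragraph. It is an assumption-laden example rather than what AbiWord does: SentenceSpan and splitParagraph are hypothetical names, the fields only loosely mirror iInLow, iInHigh and bHasStop, offsets are taken to be byte indices with the high index one past the end, and a sentence is naively ended at the first '.', '?' or '!'.

```cpp
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical span type; loosely modelled on iInLow/iInHigh/bHasStop/sText.
struct SentenceSpan {
    std::int32_t inLow;   // byte offset of the sentence start in the paragraph
    std::int32_t inHigh;  // assumed here: one past the last byte of the sentence
    bool hasStop;         // true if the sentence ended with '.', '?' or '!'
    std::string text;     // the sentence text itself
};

// Naive splitter: a sentence ends at the first '.', '?' or '!'.
// Abbreviations such as "e.g." will be split incorrectly.
std::vector<SentenceSpan> splitParagraph(const std::string& para) {
    std::vector<SentenceSpan> out;
    const std::size_t n = para.size();
    std::size_t i = 0;
    while (i < n) {
        // Skip spaces between sentences.
        while (i < n && para[i] == ' ') ++i;
        if (i >= n) break;
        const std::size_t start = i;
        bool stop = false;
        // Advance to just past the terminating punctuation, or to the end.
        while (i < n) {
            const char c = para[i++];
            if (c == '.' || c == '?' || c == '!') {
                stop = true;
                break;
            }
        }
        out.push_back({static_cast<std::int32_t>(start),
                       static_cast<std::int32_t>(i),
                       stop,
                       para.substr(start, i - start)});
    }
    return out;
}
```

A checker that wants context could be handed the whole paragraph together with these spans, and could still report each error relative to the sentence it was found in.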

OK, that is where we are at with grammar checking. I have not devoted much
time to studying generic approaches to grammar checking, I've only looked
at what works for the combination of AbiWord and link-grammar.

If you're interested, go ahead and design an API that you feel would work
for various projects. I'm happy to provide comments and I'd love to be
able to test out new and better grammar checkers.

Cheers

Martin

> Thank you,
>
> Jacob
>
> On 12/30/06, msevior@physics.unimelb.edu.au
> <msevior@physics.unimelb.edu.au> wrote:
>> >
>> > Hello,
>> >
>>
>> Hi Jacob,
>> Dom is away on holiday right now. This is a very interesting
>> proposal and I certainly hope we can find ways to co-operate. We
>> have a framework for detecting grammar or style errors which works
>> alongside our on-the-fly spelling code.
>>
>> The Abiword side of things works pretty well, but the best free grammar
>> checker we've found is link-grammar. While it is of some use, it is very
>> slow and has no ability to offer suggestions for what is wrong with a
>> particular sentence, nor are other languages well supported, although in
>> principle they could be.
>>
>> Despite our best attempts to cooperate with the authors of link-grammar,
>> we've essentially maintained a friendly fork of the project. They appear
>> uninterested in merging our various fixes into their codebase.
>>
>> From my perspective, I'm very interested in the development of better
>> free grammar and style checkers and welcome the opportunity to promote
>> this.
>>
>> We have quite a different interface to the one grammar checker we
>> support but I believe it should be relatively easy to support others
>> in the future.
>>
>> Are you familiar with link-grammar and our interface to it?
>>
>> Cheers
>>
>> Martin
>>
>>
>> > I'm the new co-maintainer of Sonnet in KDE. In KDE4 we are going to
>> > call our linguistics framework Sonnet, it will include tools such as
>> > kspell.
>> >
>> > Currently I'm in the middle of porting kspell to use Enchant
>> > exclusively rather than as a plugin. We won't have separate spelling
>> > plugins in KDE4, just Enchant.
>> >
>> > There is one aspect of Enchant that needs to change before we can fully
>> > adopt it. The location of the "enchant.ordering" file needs to be in a
>> > standard location (appropriate for the platform) and we need some way
>> > to know where it is located. We also need the option to use an
>> > explicit file given with an absolute system path. This will enable us
>> > to have a GUI configure Enchant settings in a predictable manner.
>> >
>> > I'm willing to work with the Enchant team to extend and improve it.
>> >
>> > I've also come to the conclusion that we need another unified
>> > architecture, similar to Enchant for grammar and style/usage/diction
>> > checking. There seems to be little distinction between grammar errors,
>> > style errors or even potential semantic errors (e.g. catching common
>> > errors such as "bed attitude" and suggesting s/bed/bad/ ) in the
>> > various tools available to do such checking. The justification for
>> > such a framework is the same as Enchant, to provide a unified
>> > interface for the various tools available.
>> >
>> > The API should be nearly the same as Enchant's, but also provide
>> > explanations of the detected error.
>> >
>> > I propose to call this project Elixir. The name fits in with the Enchant
>> > name. It is a magic reference and refers to a concoction produced by
>> > some recipe that cures some or all ills, which is more or less what
>> > Elixir aspires to do for writing.
>> >
>> > Regards,
>> >
>> > Jacob R Rideout
>> >
>>
>>
>>
>
Received on Mon Jan 1 02:08:09 2007