Text Access Interface (was: Re: The AbiWord side of a grammar checker...)

From: Omer Zak (omerz@actcom.co.il)
Date: Mon Sep 23 2002 - 00:04:05 EDT


    On Mon, 23 Sep 2002, Andrew Dunbar wrote:

    > Absolutely!
    >
    > > I'm definitely not volunteering for the latter, just
    > > the former. I don't think it would be too hard to
    > > split text into sentences, just look for full stops
    > > :-) If this is insufficient we can send out the
    > > entire paragraph of text, which is self-contained by
    > > default.
    >
    > Well, how would we send it out? If we send it just as
    > a string of text, how will we pass the start and end
    > positions of squiggles?
    > If we pass it in internal format it'll be yucky for
    > grammar types to hack on. We'll also have to watch
    > out for inline-level attributes such as bold and
    > italic, which we don't want to bother the grammar
    > checker with.

    Let's specify a standard interface for accessing text.

    The interface will basically consist of callback functions, which the
    external text processor can invoke to access the Nth character of the
    text fragment it was asked to work with.

    Such an interface could also be used by an improved FriBidi (Tomas wrote
    in a personal communication that the FriBidi-AbiWord interface for
    passing text to be subjected to BiDi reordering is currently horribly
    inefficient).

    The interface shall consist of 3 pointers, which the client passes to the
    text processing server when it asks the server to do something (the
    server can be FriBidi, a spell checker, a grammar checker, or even
    another plugin which does something else):

    a) A pointer to a function which returns the number of characters in the
       text fragment being processed.
    b) A pointer to a function which returns the Nth character (counting
       from 0) of the text fragment, probably with a special case for
       accessing a simple character string?
    c) A handle which identifies the text fragment to the above functions.

    The prototypes of the functions will be as follows (FriBidiChar is the
    type of a single Unicode character; we can use whatever type AbiWord
    prefers):

    a) int fraglength(void *handle);
    b) FriBidiChar fragchar(void *handle, int char_index);

    The handle will typically point at AbiWord's internal data structure
    describing the string.
    A text processing server which also needs character attributes can
    additionally request:
    c) AbiCharAttributes fragcharattributes(void *handle, int char_index);
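
    To make the shape of this concrete, here is a minimal sketch of the
    two mandatory callbacks plus the handle, under some assumptions that
    are not part of the proposal itself: the TextAccess struct that bundles
    the three pointers, the SimpleFragment client-side structure, and the
    count_full_stops() server are all hypothetical, and FriBidiChar is
    declared here only as a stand-in for the real character type.

```c
#include <assert.h>
#include <stdint.h>

/* Stand-in for the Unicode character type; the real type would be
   whatever FriBidi or AbiWord defines. */
typedef uint32_t FriBidiChar;

/* The three pointers from the proposal, bundled for convenience.
   The struct itself is hypothetical. */
typedef struct {
    int (*fraglength)(void *handle);
    FriBidiChar (*fragchar)(void *handle, int char_index);
    void *handle;
} TextAccess;

/* Hypothetical client-side fragment: a plain array of characters.
   AbiWord's real internal structure would sit behind the handle. */
typedef struct {
    const FriBidiChar *chars;
    int length;
} SimpleFragment;

/* Client callback: number of characters in the fragment. */
static int simple_fraglength(void *handle)
{
    const SimpleFragment *frag = handle;
    return frag->length;
}

/* Client callback: Nth character, counting from 0. */
static FriBidiChar simple_fragchar(void *handle, int char_index)
{
    const SimpleFragment *frag = handle;
    assert(char_index >= 0 && char_index < frag->length);
    return frag->chars[char_index];
}

/* Server side: counts full stops using only the callbacks, so it
   never needs to know how the client stores its text. */
static int count_full_stops(const TextAccess *text)
{
    int n = text->fraglength(text->handle);
    int count = 0;
    for (int i = 0; i < n; i++) {
        if (text->fragchar(text->handle, i) == (FriBidiChar)'.')
            count++;
    }
    return count;
}
```

    The client would fill in a TextAccess with simple_fraglength,
    simple_fragchar and a pointer to its fragment, then hand it to the
    server. Inline attributes such as bold and italic stay invisible to
    the server unless it asks for them through a separate callback.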

    If the text processor needs to return information about parts of the
    text fragment (such as segments to be decorated with squiggles), it can
    either return an array of indices to the relevant characters, or invoke
    another callback to convert each char_index into an opaque pointer to be
    stored in the array.
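
    As an illustration of the first option (returning character indices),
    here is a rough sketch. Everything in it beyond the two callback
    prototypes is an assumption: the SquiggleSpan type, the
    find_space_runs() checker (which merely flags runs of two or more
    spaces, standing in for a real grammar check), and the cstr_*
    callbacks, which show the "simple character string" special case with
    the handle being a plain C string.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

typedef uint32_t FriBidiChar;   /* stand-in for the real character type */

/* Hypothetical result record: characters [start, end) get a squiggle. */
typedef struct {
    int start;
    int end;   /* exclusive */
} SquiggleSpan;

/* Callbacks for the simple case where the handle is a C string. */
static int cstr_fraglength(void *handle)
{
    return (int)strlen((const char *)handle);
}

static FriBidiChar cstr_fragchar(void *handle, int char_index)
{
    return (unsigned char)((const char *)handle)[char_index];
}

/* Toy checker: reports every run of two or more spaces as index
   ranges. Writes at most max_spans entries into out and returns the
   number written. */
static int find_space_runs(int (*fraglength)(void *handle),
                           FriBidiChar (*fragchar)(void *handle, int char_index),
                           void *handle,
                           SquiggleSpan *out, int max_spans)
{
    assert(max_spans >= 0);
    int n = fraglength(handle);
    int found = 0;
    int i = 0;
    while (i < n && found < max_spans) {
        if (fragchar(handle, i) != ' ') {
            i++;
            continue;
        }
        int start = i;
        while (i < n && fragchar(handle, i) == ' ')
            i++;
        if (i - start >= 2) {
            out[found].start = start;
            out[found].end = i;
            found++;
        }
    }
    return found;
}
```

    Because the spans are expressed as indices into the fragment, the
    client can map them back onto its own internal positions when drawing
    squiggles, and the checker never sees the internal format.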

    While the above would look better in C++, some text handling libraries
    (such as FriBidi) are limited to C.
                                                 --- Omer
    WARNING TO SPAMMERS: at http://www.zak.co.il/spamwarning.html


