Re: Proposal: using OTS and keeping the style info of the document;

From: Andrew Dunbar (hippietrail@yahoo.com)
Date: Tue Jun 03 2003 - 03:14:30 EDT


    Sorry, this will be a very quick reply.
    Why not ignore the styles in OTS and instead just
    have word iterator functions like "getNextWord" which
    return the plain word from the document to OTS, which
    would then process it, and then another function OTS can
    call into AbiWord to say whether this word is in or
    out of the summary? This way you leave all the
    formatting decisions up to AbiWord itself.

    It might take a couple of passes: the first one to
    analyze the text, the next to iterate through and
    "do something" for each word depending on whether it's
    in or out.
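
    A minimal sketch of that two-pass idea, in plain C. Everything
    here is hypothetical: the Doc type, get_next_word(),
    set_word_in_summary() and the toy "keep long words" rule are
    stand-ins for illustration, not AbiWord or OTS APIs.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for the AbiWord side: a document is an array of
   plain words plus one in/out flag per word. No style info is exposed. */
typedef struct {
    const char **words;   /* plain words handed to the summarizer */
    int *in_summary;      /* in/out decision reported per word */
    size_t count;         /* number of words */
    size_t cursor;        /* position of the word iterator */
} Doc;

/* Restart iteration from the first word. */
void rewind_words(Doc *doc)
{
    assert(doc != NULL);
    doc->cursor = 0;
}

/* Word iterator: returns the next plain word, or NULL at the end. */
const char *get_next_word(Doc *doc)
{
    assert(doc != NULL);
    if (doc->cursor >= doc->count)
        return NULL;
    return doc->words[doc->cursor++];
}

/* Callback into the host: records whether the word most recently
   returned by get_next_word() is in or out of the summary. */
void set_word_in_summary(Doc *doc, int in)
{
    assert(doc != NULL && doc->cursor > 0);
    doc->in_summary[doc->cursor - 1] = in;
}

/* Pass 1 (toy stand-in for the summarizer): keep words longer than
   four characters. A real summarizer would score sentences instead. */
void summarize_pass(Doc *doc)
{
    const char *w;
    rewind_words(doc);
    while ((w = get_next_word(doc)) != NULL)
        set_word_in_summary(doc, strlen(w) > 4);
}

/* Pass 2 (host side): walk the words again and "do something" with
   each one; here we only count the words that were kept. */
size_t count_kept(Doc *doc)
{
    size_t kept = 0;
    rewind_words(doc);
    while (get_next_word(doc) != NULL)
        if (doc->in_summary[doc->cursor - 1])
            kept++;
    return kept;
}
```

    The point of the split is that the summarizer only ever sees
    plain words, while the host decides what "in" or "out" means
    for its own formatted data.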

    Andrew.

     --- Nadav Rotem <nadavrotem@mail.ru> wrote:
    > Using OTS and keeping the style info of the document
    >
    > At the moment, when OTS is used, the text is parsed into the
    > internal OTS structure otsArticle, which contains otsSentences,
    > where each sentence is a list of words (char *). The result is
    > that fonts, sizes, titles and footnotes are lost, since the
    > words are stored as plain text.
    >
    > With the newly proposed structure, a pointer to the original
    > styled data structure will be kept within every sentence.
    >
    > INPUT
    >
    > The plug-in will have to re-implement "ots_parse_file()" with
    > a few minor changes.
    >
    > When loading a sentence, the parser will add to each sentence
    > a pointer to the original sentence structure that AbiWord
    > uses. OTS will ignore this pointer in its internal processing
    > algorithm and will not change the data.
    >
    > The input module may also set hints for OTS to use in
    > structure grading, such as "is it a title?". This may be
    > implemented with a pointer to a structure that holds
    > information such as "new paragraph?", "title?", "footnote?".
    > Armed with this info, OTS will make better decisions about how
    > to summarize the text. It's best that AbiWord detects this,
    > since OTS has no styling info.
    >
    >
    > OUTPUT
    >
    > The export module will have to be rewritten by AbiWord.
    > It's as easy as HTML.c or TEXT.c: just loop through the list
    > of sentences, and if OTS set the "selected" flag to "on", the
    > sentence should be returned to the program (or simply not
    > deleted).
    >
    >
    > typedef struct
    > {
    >   GList *words;      /* a GList of words (char *) */
    >   glong score;
    >   gboolean selected;
    >   gint wc;
    >
    >   void *style;       /* pointer to style information or to a
    >                         sentence structure */
    >
    >   void *structue;    /* pointer to info about this line, such
    >                         as "is it a title?" */
    > } OtsSentence;
    >
    >
    >
    > An updated version of this doc should be available here:
    > http://nadav.homelinux.org/data/ots_style.txt
    >
    >
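
    For comparison, the export loop described in the quoted
    proposal could look roughly like the sketch below. It is only
    an illustration under stated assumptions: to stay
    self-contained it uses a plain array instead of a GList, and
    Sentence, EmitFn, export_selected() and collect_style() are
    hypothetical names, not part of OTS or AbiWord.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified stand-in for the proposed OtsSentence. The summarizer
   sets `selected`; `style` is an opaque pointer owned by the host
   application and is never read or modified by the summarizer. */
typedef struct {
    const char **words;   /* plain words of the sentence */
    size_t wc;            /* word count */
    long score;           /* score assigned by the summarizer */
    int selected;         /* non-zero if the sentence is in the summary */
    void *style;          /* opaque host pointer to the styled original */
} Sentence;

/* Host-supplied callback, invoked once per selected sentence. */
typedef void (*EmitFn)(void *style, void *user_data);

/* Loop through the sentences and hand the style pointer of every
   selected one back to the host. Returns the number emitted. */
size_t export_selected(const Sentence *sentences, size_t n,
                       EmitFn emit, void *user_data)
{
    size_t emitted = 0;
    assert(sentences != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
        if (!sentences[i].selected)
            continue;                 /* not in the summary: skip */
        if (emit != NULL)
            emit(sentences[i].style, user_data);
        emitted++;
    }
    return emitted;
}

/* Example callback: collects up to 8 style pointers so the host can
   keep (or simply not delete) the matching styled sentences. */
typedef struct {
    void *kept[8];
    size_t n;
} Kept;

void collect_style(void *style, void *user_data)
{
    Kept *k = (Kept *)user_data;
    if (k->n < 8)
        k->kept[k->n++] = style;
}
```

    Because the summarizer never looks behind the style pointer,
    the host stays free to point it at whatever structure it
    already uses for a styled sentence.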

    =====
    http://linguaphile.sourceforge.net/cgi-bin/translator.pl http://www.abisource.com




    This archive was generated by hypermail 2.1.4 : Tue Jun 03 2003 - 03:29:54 EDT