Re: Proposal: using OTS and keeping the style info of the document;

From: Dom Lachowicz (domlachowicz@yahoo.com)
Date: Tue Jun 03 2003 - 09:06:04 EDT


    Hi Nadav,

    It's not entirely clear to me that we'll want to
    retain style information. In the case of
    summarization, style information is grossly secondary
    to the actual text content itself. Losing the style
    information isn't such a big deal (IMO of course). I
    think it's much more likely that our user will want to
    re-format the summary using yet another set of styles,
    such as Headings.

    Dom

    --- Nadav Rotem <nadavrotem@mail.ru> wrote:
    >
    > Using OTS and keeping the style info of the document;
    >
    > At the moment, when OTS is used, the text is parsed into OTS's
    > internal otsArticle structure, which contains a list of sentences,
    > each of which is a list of words (char *). As a result, fonts,
    > sizes, titles and footnotes are lost, since everything is stored
    > as plain text.
    >
    > With the proposed new structure, every sentence will also keep a
    > pointer to the original styled data structure.
    >
    > INPUT
    >
    > The plug-in will have to re-implement ots_parse_file() with a few
    > minor changes.
    >
    > When loading a sentence, the parser will attach to it a pointer to
    > the original sentence structure that AbiWord uses. OTS will ignore
    > this pointer in its internal processing algorithm and will not
    > change the data it points to.
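A minimal sketch of the input side, under stated assumptions: `HostSentence` is a hypothetical stand-in for AbiWord's own sentence data, `Sentence` is a cut-down OtsSentence that stores the text as one string instead of a GList of words (so the example builds without GLib), and `load_sentences()` is a hypothetical helper, not part of the OTS API. What it illustrates is that the back-pointer is stored alongside the plain text but never dereferenced on the summarizer side.

```c
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for one sentence as the host (AbiWord) stores it. */
typedef struct {
    const char *text;
    int bold;          /* example style attribute */
    int font_size;     /* example style attribute */
} HostSentence;

/* Simplified stand-in for OtsSentence: the words are kept as one string
   instead of a GList so the sketch compiles without GLib. */
typedef struct {
    char *text;        /* plain text that OTS grades */
    long score;
    int selected;
    void *style;       /* opaque back-pointer; OTS never dereferences it */
} Sentence;

/* Hypothetical loader in the spirit of a re-implemented ots_parse_file():
   copies the plain text for OTS and keeps a pointer to the original.
   Returns the number of sentences successfully loaded. */
size_t load_sentences(HostSentence *src, size_t n, Sentence *out)
{
    for (size_t i = 0; i < n; i++) {
        size_t len = strlen(src[i].text);
        out[i].text = malloc(len + 1);
        if (out[i].text == NULL)
            return i;                  /* stop on allocation failure */
        memcpy(out[i].text, src[i].text, len + 1);
        out[i].score = 0;
        out[i].selected = 0;
        out[i].style = &src[i];        /* pointer back into the host document */
    }
    return n;
}
```

Only the host casts `style` back to its own type; the summarizer treats it as opaque data, which is what lets OTS stay ignorant of formatting.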
    >
    > The input module may also set hints for OTS to use in structure
    > grading, such as "is it a title?". This may be implemented as a
    > pointer to a structure that holds flags such as "new paragraph?",
    > "title?" and "footnote?". Armed with this information, OTS will make
    > better decisions about how to summarize the text. It is best that
    > AbiWord detects these, since OTS has no styling information.
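One way the hint structure could look, sketched under assumptions: `SentenceHints` and `adjust_score()` are hypothetical names, and the multipliers are invented for illustration only; the proposal does not say how OTS would weight these hints.

```c
#include <stddef.h>

/* Hypothetical hint record the input module could fill in for each
   sentence (what the proposal's "structure" pointer would point to). */
typedef struct {
    int new_paragraph;   /* sentence starts a paragraph */
    int title;           /* sentence is a heading or title */
    int footnote;        /* sentence comes from a footnote */
} SentenceHints;

/* Illustrative grading tweak. The multipliers are made up for this
   example and are not taken from OTS. A NULL pointer means "no hints",
   in which case the base score is returned unchanged. */
long adjust_score(long base, const SentenceHints *h)
{
    long score = base;
    if (h == NULL)
        return score;
    if (h->title)
        score *= 3;      /* titles usually carry the topic */
    if (h->new_paragraph)
        score *= 2;      /* opening sentences often summarize a paragraph */
    if (h->footnote)
        score /= 4;      /* footnotes are rarely summary material */
    return score;
}
```

Passing NULL when the host has no hints keeps the current behaviour, so plain-text input would be graded exactly as before.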
    >
    >
    > OUTPUT
    >
    > The export module will have to be rewritten by AbiWord. It is as
    > easy as HTML.c or TEXT.c: just loop through the list of sentences,
    > and if OTS has set the "selected" flag, return that sentence to the
    > program (or simply do not delete it).
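The export loop described above might look like the following sketch, again using a simplified, hypothetical `Sentence` type in place of OtsSentence and a hypothetical `keep_selected()` helper. It keeps the selected sentences in document order and drops the rest, which corresponds to the "simply do not delete it" option.

```c
#include <stddef.h>

/* Simplified stand-in for OtsSentence (plain array, no GLib). */
typedef struct {
    const char *text;
    int selected;     /* set by OTS when the sentence belongs in the summary */
    void *style;      /* host's original formatting, passed through untouched */
} Sentence;

/* Hypothetical export step: compact the array in place so that only the
   sentences OTS selected remain, in their original order. Returns the
   number kept. The style pointer travels with each kept sentence, so the
   host can still re-apply the original formatting. */
size_t keep_selected(Sentence *s, size_t n)
{
    size_t kept = 0;
    for (size_t i = 0; i < n; i++) {
        if (s[i].selected)
            s[kept++] = s[i];
    }
    return kept;
}
```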
    >
    >
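    > #include <glib.h>   /* GList, glong, gboolean, gint */
    >
    > typedef struct
    > {
    >   GList    *words;     /* a GList of words (char *) */
    >   glong     score;     /* grade assigned by OTS */
    >   gboolean  selected;  /* TRUE if OTS keeps this sentence in the summary */
    >   gint      wc;        /* word count */
    >
    >   void *style;         /* pointer to style information or to the host's
    >                           sentence structure; OTS never touches it */
    >
    >   void *structure;     /* pointer to info about this line, such as
    >                           "is it a title?" */
    > } OtsSentence;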
    >
    >
    >
    > An updated version of this document can be found here:
    > http://nadav.homelinux.org/data/ots_style.txt
    >
    >




    This archive was generated by hypermail 2.1.4 : Tue Jun 03 2003 - 09:21:27 EDT