From: Andrew Dunbar (hippietrail@yahoo.com)
Date: Tue Jun 03 2003 - 03:14:30 EDT
Sorry, this will be a very quick reply.
Why not ignore the styles in OTS and instead just
have word iterator functions like "getNextWord", which
return the plain word from the document to OTS, which
would then process it, plus another function OTS can
call into AbiWord to say whether this word is in or
out of the summary? This way you leave all the
formatting decisions up to AbiWord itself.
It might take a couple of passes: the first to
analyze the text, the next to iterate through and
"do something" for each word depending on whether it's
in or out.
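Roughly what I have in mind, as a minimal sketch in C. All the
names here (Document, doc_get_next_word, doc_mark_last_word,
summarize) are hypothetical and are not existing AbiWord or OTS
functions, and the "longer than average" rule is only a toy
stand-in for whatever scoring OTS really does.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define MAX_WORDS 64

/* Hypothetical editor-side document: plain words plus a per-word
   verdict. The real AbiWord document model is far richer. */
typedef struct {
    const char *words[MAX_WORDS];
    int keep[MAX_WORDS];   /* 1 = in the summary, 0 = out */
    size_t count;
    size_t cursor;         /* position of the word iterator */
} Document;

/* Editor side: restart the iterator before each pass. */
void doc_rewind(Document *doc)
{
    assert(doc != NULL);
    doc->cursor = 0;
}

/* Editor side: hand the next plain word to the summarizer,
   or NULL once the document is exhausted. */
const char *doc_get_next_word(Document *doc)
{
    assert(doc != NULL);
    if (doc->cursor >= doc->count)
        return NULL;
    return doc->words[doc->cursor++];
}

/* Editor side: the summarizer reports its verdict on the word
   most recently returned by doc_get_next_word(). */
void doc_mark_last_word(Document *doc, int in_summary)
{
    assert(doc != NULL && doc->cursor > 0);
    doc->keep[doc->cursor - 1] = in_summary;
}

/* Summarizer side: pass 1 analyzes, pass 2 decides.
   Returns the number of words marked as "in". */
size_t summarize(Document *doc)
{
    size_t total = 0, n = 0, kept = 0;
    const char *w;

    /* Pass 1: gather statistics over the plain words. */
    doc_rewind(doc);
    while ((w = doc_get_next_word(doc)) != NULL) {
        total += strlen(w);
        n++;
    }
    if (n == 0)
        return 0;

    /* Pass 2: tell the editor which words are in or out. */
    doc_rewind(doc);
    while ((w = doc_get_next_word(doc)) != NULL) {
        int in = strlen(w) * n > total;   /* longer than the mean */
        doc_mark_last_word(doc, in);
        kept += (size_t)in;
    }
    return kept;
}
```

The summarizer only ever sees plain strings, so what "in" or "out"
means for the formatting stays entirely on the editor's side.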
Andrew.
--- Nadav Rotem <nadavrotem@mail.ru> wrote:
> Using OTS and keeping the style info of the document
>
> At the moment, when OTS is used, the text is parsed
> into the internal OTS structure otsArticle, which
> contains otsSentences, where each sentence is a list
> of words (char *). The result is that fonts, sizes,
> titles and footnotes are lost, since everything is
> stored as plain text.
>
> With the newly proposed structure, a pointer to the
> original styled data structure will be kept within
> every sentence.
>
> INPUT
>
> The plug-in will have to re-implement
> "ots_parse_file ()" with a few minor changes.
>
> When loading a sentence, the parser will add to each
> sentence a pointer to the original sentence structure
> that AbiWord uses. OTS will ignore this pointer in its
> internal processing algorithm and will not change the
> data.
>
> The input module may also set hints for OTS to use in
> structure grading, such as "is it a title?". This may
> be implemented with a pointer to a structure that holds
> information such as "new paragraph?", "title?" and
> "footnote?". Armed with this info, OTS will make better
> decisions about how to summarize the text. It's best
> that AbiWord detects this, since OTS has no styling
> info.
>
>
> OUTPUT
>
> The export module will have to be rewritten by
> AbiWord. It's as easy as HTML.c or TEXT.c: just loop
> through the list of sentences, and if OTS set the
> "selected" flag to "on", the sentence should be
> returned to the program (or simply not deleted).
>
>
> typedef struct
> {
>   GList *words;       /* a GList of words (char *) */
>   glong score;
>   gboolean selected;
>   gint wc;
>
>   void *style;        /* pointer to style information or
>                          to a sentence structure */
>
>   void *structue;     /* pointer to info about this line,
>                          such as "is it a title?" */
> } OtsSentence;
>
>
>
> An updated version of this doc should be found here:
> http://nadav.homelinux.org/data/ots_style.txt
>
>
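The export loop from the OUTPUT section above could be sketched
along these lines. This is only an illustration under stated
assumptions: a plain array stands in for the GList, Sentence is a
cut-down stand-in for OtsSentence, and EmitFn, HandleList and
collect_handle are hypothetical names, not part of OTS or AbiWord.

```c
#include <assert.h>
#include <stddef.h>

/* Cut-down stand-in for OtsSentence: only the fields the
   export loop needs. */
typedef struct {
    int selected;   /* set by the summarizer */
    void *style;    /* opaque handle owned by the editor;
                       never dereferenced by the summarizer */
} Sentence;

/* Hypothetical editor callback, called once per kept sentence. */
typedef void (*EmitFn)(void *style, void *user_data);

/* Walk the sentences and hand back the style handle of every
   sentence flagged as selected. Returns how many were kept. */
size_t export_selected(const Sentence *s, size_t n,
                       EmitFn emit, void *user_data)
{
    size_t i, emitted = 0;

    assert(s != NULL || n == 0);
    for (i = 0; i < n; i++) {
        if (!s[i].selected)
            continue;           /* left out of the summary */
        if (emit != NULL)
            emit(s[i].style, user_data);
        emitted++;
    }
    return emitted;
}

/* Example callback: remember each handle in a fixed-size list. */
typedef struct {
    void *handles[16];
    size_t count;
} HandleList;

void collect_handle(void *style, void *user_data)
{
    HandleList *list = user_data;
    if (list->count < 16)
        list->handles[list->count++] = style;
}
```

The editor would pass its own callback in place of collect_handle,
for example one that keeps the styled sentence in the document
instead of deleting it.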
=====
http://linguaphile.sourceforge.net/cgi-bin/translator.pl http://www.abisource.com
This archive was generated by hypermail 2.1.4 : Tue Jun 03 2003 - 03:29:54 EDT