From: Nadav Rotem (nadavrotem@mail.ru)
Date: Tue Jun 03 2003 - 01:46:39 EDT
Using OTS while keeping the style information of the document
At the moment, when OTS is used, the text is parsed into the internal
OTS structure, an OtsArticle that contains OtsSentences, where each
sentence is a list of words (char *). As a result, fonts, sizes,
titles and footnotes are lost, since everything is stored as plain text.
With the newly proposed structure, a pointer to the original styled data
structure will be kept within every sentence.
INPUT
The plug-in will have to re-implement ots_parse_file() with a
few minor changes.
When loading a sentence, the parser will add to it a
pointer to the original sentence structure that AbiWord uses. OTS will
ignore this pointer in its internal processing algorithm and
will not change the data it points to.
The input module may also set hints for OTS to use in structure grading,
such as "is it a title?". This may be implemented with a pointer to a
structure that holds information such as "new paragraph?",
"title?" and "footnote?". Armed with this information, OTS can make better
decisions about how to summarize the text. It is best that AbiWord detects
these properties, since OTS has no styling information.
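The input side described above could be sketched roughly as follows. This is only an illustration under stated assumptions: SentenceHints, HintedSentence, sketch_attach() and sketch_is_title() are hypothetical names that do not exist in OTS, and the struct is a simplified stand-in that shows only the two new pointers (field names spelled as in the proposal).

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical hint record filled in by the AbiWord-side input module.
   The field names are assumptions; the proposal only lists the questions. */
typedef struct {
    bool new_paragraph;
    bool title;
    bool footnote;
} SentenceHints;

/* Simplified stand-in for the proposed OtsSentence: only the two new
   pointers are shown, spelled as in the proposal. */
typedef struct {
    void *style;     /* opaque pointer to the host's styled sentence */
    void *structue;  /* points to a SentenceHints, or NULL if no hints */
} HintedSentence;

/* Called by the input module while loading a sentence. The summarizer
   never dereferences `style`; it only carries the pointer along. */
static void sketch_attach(HintedSentence *s, void *host_sentence,
                          SentenceHints *hints)
{
    s->style = host_sentence;
    s->structue = hints;
}

/* How a grader might read one hint without knowing anything about styling. */
static bool sketch_is_title(const HintedSentence *s)
{
    const SentenceHints *h = s->structue;
    return h != NULL && h->title;
}
```

Keeping the hints behind a pointer means a sentence loaded without any hints simply carries NULL, so the existing plain-text input path would not need to change.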
OUTPUT
The export module will have to be rewritten by AbiWord.
It is as easy as HTML.c or TEXT.c: just loop through the list of
sentences, and if OTS set the "selected" flag on a sentence, return it
to the program (or simply do not delete it).
typedef struct
{
  GList *words;       /* a GList of words (char *) */
  glong score;
  gboolean selected;
  gint wc;
  void *style;        /* pointer to style information or to a sentence structure */
  void *structue;     /* pointer to info about this line, such as "is it a title?" */
} OtsSentence;
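The export loop described under OUTPUT could look roughly like the sketch below. To stay self-contained it uses a plain array instead of a GList and a reduced stand-in struct. ExportSentence and sketch_export_selected() are hypothetical names, not part of OTS.

```c
#include <stdbool.h>
#include <stddef.h>

/* Reduced stand-in for the proposed OtsSentence (an assumption made for
   this sketch): only the fields the export loop needs. */
typedef struct {
    bool selected;   /* set by OTS when the sentence belongs in the summary */
    void *style;     /* opaque pointer back to the host's styled sentence */
} ExportSentence;

/* Walk the sentences in document order and copy the style pointer of each
   selected one into `out` (capacity `cap`). Returns the number stored. */
static size_t sketch_export_selected(const ExportSentence *sentences, size_t n,
                                     void **out, size_t cap)
{
    size_t kept = 0;
    for (size_t i = 0; i < n && kept < cap; i++) {
        if (sentences[i].selected)
            out[kept++] = sentences[i].style;
    }
    return kept;
}
```

The host would then keep exactly the styled sentences whose pointers come back and drop the rest, so fonts, titles and footnotes survive without OTS ever having to interpret them.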
An updated version of this document should be available here:
http://nadav.homelinux.org/data/ots_style.txt