Proposal: using OTS and keeping the style info of the document

From: Nadav Rotem (nadavrotem@mail.ru)
Date: Tue Jun 03 2003 - 01:46:39 EDT


    Using OTS and keeping the style info of the document

    At the moment, when OTS is used, the text is parsed into the internal
    OTS structure: an OtsArticle that contains OtsSentences, where each
    sentence is a list of words (char *). As a result, fonts, sizes,
    titles and footnotes are lost, since everything is stored as plain text.

    With the newly proposed structure, a pointer to the original styled data
    structure is kept within every sentence.

    INPUT

    The plug-in will have to re-implement "ots_parse_file()" with a
    few minor changes.

    When loading a sentence, the parser will add to it a pointer to the
    original sentence structure that AbiWord uses. OTS will ignore this
    pointer in its internal processing algorithm and will not change the
    data it points to.
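
    A minimal sketch of this pass-through pointer, assuming a simplified
    sentence type: SketchSentence, HostSentence and both functions are
    hypothetical names made up for illustration, and the word list is left
    out so the example builds with the C standard library alone. It is not
    the actual OTS or AbiWord code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Simplified stand-in for OtsSentence (assumption: word list omitted). */
typedef struct {
    long score;
    int selected;
    int wc;
    void *style;   /* owned by the host application; never dereferenced here */
} SketchSentence;

/* Hypothetical stand-in for whatever sentence object the host keeps. */
typedef struct {
    const char *font;
    int size_pt;
} HostSentence;

/* Called by the plug-in's parser for each sentence it loads. */
SketchSentence *sketch_sentence_new(void *host_sentence)
{
    SketchSentence *s = calloc(1, sizeof *s);
    assert(s != NULL);
    s->style = host_sentence;   /* the pointer is stored, not copied */
    return s;
}

/* Grading touches only the summarizer's own fields; style passes through. */
void sketch_grade(SketchSentence *s, long score)
{
    assert(s != NULL);
    s->score = score;
    s->selected = score > 0;
}
```

    Because the summarizer only stores the pointer, the host can cast it
    back to its own type after grading and find its data unchanged.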

    The input module may also set hints for OTS to use when grading the
    structure, such as "is it a title?". This may be implemented with a
    pointer to a structure that holds information such as "new paragraph?",
    "title?" and "footnote?". Armed with this info, OTS will make better
    decisions about how to summarize the text. It is best that AbiWord
    detects these properties, since OTS has no styling info.
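
    One possible shape for such a hint structure, as a sketch only. The
    names SketchHints and sketch_adjust_score are hypothetical, and the
    weights are invented purely to show hints feeding into a score; the
    proposal does not say how OTS would actually weigh them.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical hint record filled in by the input module. */
typedef struct {
    bool new_paragraph;
    bool title;
    bool footnote;
} SketchHints;

/* Illustrative only: the multipliers below are made up for this sketch. */
long sketch_adjust_score(long base, const SketchHints *hints)
{
    assert(base >= 0);          /* assumption: scores are non-negative */
    if (hints == NULL)
        return base;            /* no hints given: keep the score as-is */

    long score = base;
    if (hints->title)
        score *= 2;             /* favour titles */
    if (hints->new_paragraph)
        score += base / 2;      /* favour paragraph openers */
    if (hints->footnote)
        score /= 2;             /* demote footnotes */
    return score;
}
```

    Passing NULL keeps the current behaviour, so an input module that
    knows nothing about styling would not need to change.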
     

    OUTPUT

    AbiWord will have to write its own export module.
    It is as easy as HTML.c or TEXT.c: just loop through the list of
    sentences, and if OTS set the "selected" flag on a sentence, it should
    be returned to the program (or simply not deleted).
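
    The export loop described above might look roughly like this. It is
    a sketch under assumptions: a plain linked list stands in for the
    sentence list, and SketchSentence, SketchKeepFn, SketchKept,
    sketch_export and sketch_collect are hypothetical names, not part of
    OTS or AbiWord.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified stand-in for a list of sentences (assumption). */
typedef struct SketchSentence {
    int selected;                  /* set by the summarizer */
    void *style;                   /* host's pointer, passed back as-is */
    struct SketchSentence *next;
} SketchSentence;

/* Host callback invoked once per sentence that should be kept. */
typedef void (*SketchKeepFn)(void *style, void *user_data);

/* Walk the list and hand every selected sentence back to the host.
   Returns the number of sentences kept. */
size_t sketch_export(const SketchSentence *head, SketchKeepFn keep,
                     void *user_data)
{
    assert(keep != NULL);
    size_t kept = 0;
    for (const SketchSentence *s = head; s != NULL; s = s->next) {
        if (!s->selected)
            continue;              /* unselected sentences are skipped */
        keep(s->style, user_data);
        kept++;
    }
    return kept;
}

/* Example callback: records the kept style pointers in document order. */
typedef struct {
    void *items[8];
    size_t n;
} SketchKept;

void sketch_collect(void *style, void *user_data)
{
    SketchKept *k = user_data;
    if (k->n < 8)
        k->items[k->n++] = style;
}
```

    A callback keeps the summarizer ignorant of the host's types: the
    host decides whether "keep" means copying the sentence to a new
    document or just leaving it in place.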

    typedef struct
    {
      GList *words;      /* a GList of words (char *) */
      glong score;
      gboolean selected;
      gint wc;

      void *style;       /* pointer to style information or to a
                            sentence structure */

      void *structure;   /* pointer to info about this line, such as
                            "is it a title?" */

    } OtsSentence;

    An updated version of this document should be available here:
    http://nadav.homelinux.org/data/ots_style.txt


