Re: How to import text boxes from MS Word.

From: msevior_at_physics.unimelb.edu.au
Date: Sat Mar 20 2004 - 12:29:40 EST

    > Hi Martin,
    >
    > The basic issue with the footnotes is that they are not stored inside
    > the main document as we do in AW, but reside in a separate sub-
    > document. So, inside the _docProc() we load the positioning
    > information about all of the footnotes (i.e., the document position
    > at which the note references are located) -- this is done by the
    > _handleNotes() function.
    >
    > Later on, every time we read a bit of the document in, we check the
    > new document position (in the main document) against the positions of
    > the notes we retrieved earlier; if we find a note whose position
    > matches the current document position, we insert it -- this is
    > handled by the _handleNotesText() function.
    >
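
    The matching step described above might look roughly like the sketch
    below. It is not the actual importer code: NoteRef, notesInRange and
    the `next` cursor are hypothetical names, and it assumes the table of
    note references is sorted by document position and that the main
    document is read in increasing, non-overlapping position ranges.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical record: one entry per note reference, collected up front
// (the role the mail assigns to _handleNotes()).
struct NoteRef {
    std::uint32_t refPos;  // position of the reference in the main document
    std::uint32_t txtPos;  // offset of the note text in the note sub-document
    std::uint32_t txtLen;  // length of the note text
};

// Return the indices of all notes whose reference position lies in
// [from, to). `next` is a cursor into the sorted table; it only moves
// forward, so each note is reported at most once across successive calls.
std::vector<std::size_t> notesInRange(const std::vector<NoteRef>& notes,
                                      std::size_t& next,
                                      std::uint32_t from,
                                      std::uint32_t to)
{
    std::vector<std::size_t> hits;

    // Skip references that lie before the chunk we just read.
    while (next < notes.size() && notes[next].refPos < from)
        ++next;

    // Collect every reference that falls inside the chunk.
    while (next < notes.size() && notes[next].refPos < to) {
        hits.push_back(next);
        ++next;
    }
    return hits;
}
```

    The caller would invoke this once per chunk of main-document text and
    insert the text of each returned note at its refPos. A textbox table
    could presumably be walked the same way.
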
    > The code for the textboxes should be similar to that for the notes,
    > although the docs are, as often, very confusing. The text box is a
    > special case of office art object, so we will have to retrieve info
    > for all art objects and ignore those that are not boxes.
    >
    > We retrieve the FSPA structs from the plcspa as we do for the notes,
    > except there are two separate plcspa streams, one for the main data
    > (plcspaMom) and another for hdr/ftrs (plcspaHdr). I am not sure
    > whether we will need two separate tables (like m_pFootnotes) for
    > these, or whether we could hold all the data in the same table,
    > probably the former.
    >
    > Once we have the FSPAs, we have to translate the FSPA into the actual
    > art shape data using the dggInfo table, into which FSPA.spid is
    > some kind of index -- the problem is that the format of the drawing
    > data (stored in dggInfo) does not seem to be described in the docs I
    > have.
    >
    > Once we have the shape data, we should be able to get from it the TXTID,
    > from which we can isolate the index n into the plctxbxs: plctxbxs[n]
    > will give us the offset of the text for our text box in the textbox
    > subdocument; plctxbxs[n+1] holds the position immediately after the
    > last of our text, i.e., textlength = plctxbxs[n+1]-plctxbxs[n].
    >
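
    The offset/length arithmetic above can be sketched as follows. This is
    only an illustration: TextExtent and textboxExtent are hypothetical
    names, the plctxbxs is modelled as a plain vector of offsets, and the
    index n is taken as given, since how it is extracted from the TXTID is
    not spelled out here.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical result type: where one text box's text lives in the
// textbox sub-document.
struct TextExtent {
    std::uint32_t offset;  // plctxbxs[n]
    std::uint32_t length;  // plctxbxs[n+1] - plctxbxs[n]
};

// Compute the extent of text box n. Returns false (leaving `out`
// untouched) if n has no following entry or the offsets are not
// monotonically increasing.
bool textboxExtent(const std::vector<std::uint32_t>& plctxbxs,
                   std::size_t n,
                   TextExtent& out)
{
    // Both plctxbxs[n] and plctxbxs[n+1] must exist.
    if (n >= plctxbxs.size() || n + 1 >= plctxbxs.size())
        return false;

    const std::uint32_t start = plctxbxs[n];
    const std::uint32_t end = plctxbxs[n + 1];
    if (end < start)
        return false;  // corrupt table: avoid unsigned wrap-around

    out.offset = start;
    out.length = end - start;
    return true;
}
```

    With a table of {0, 12, 30}, box 0 covers 12 characters starting at
    offset 0 and box 1 covers 18 characters starting at offset 12; the
    last entry only marks the end of the final box.
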
    > The real snag is the translation of the FSPA into the drawing shape
    > data; I think you will need to examine the OO importer to find out
    > what the format of this data is. The rest is identical to the notes
    > handling.
    >

    *sigh* This is clearly going to be hard work. Ximian have very nicely put
    the source code of OOo in an easy-to-browse form. After some searching I
    believe I have found the relevant parts of the code for MS Word import.
    However, many of the helpful comments and debug prints are in German...

    See here:

    http://ooo.ximian.com/lxr/source/sw/sw/source/filter/w4w/

    It's going to be a while before I make sense of this...

    Martin

    > Tomas
    >>
    >> Hi Tomas,
    >> I would really like to get Text Boxes imported from MS Word.
    >> From reading the docs and scanning the wv code it appears that
    >> the process is very similar to that for importing
    >> footnotes/endnotes. There is a separate set of tables that hold
    >> the text outside the main stream of the document flow. It also
    >> appears that wv can recognize them and makes them available.
    >>
    >> However, I don't understand your code that does the footnote/endnote
    >> imports. I think that importing text boxes will be very similar,
    >> especially since the RTF import of text boxes is a pretty good match to
    >> our piecetable - much like that of footnotes/endnotes.
    >>
    >> Anyway, any help you can give me to get text boxes imported from MS Word
    >> would be most appreciated :-)
    >>
    >> Cheers
    >>
    >> Martin
    >
    >
    >


