Re: How to import text boxes from MS Word.

From: msevior_at_physics.unimelb.edu.au
Date: Sat Mar 20 2004 - 12:29:40 EST

Next message: Fábio Ranquetat: "new version for pt_BR.strings"

Previous message: Tomas Frydrych: "Re: How to import text boxes from MS Word."
In reply to: Tomas Frydrych: "Re: How to import text boxes from MS Word."
Messages sorted by: [ date ] [ thread ] [ subject ] [ author ] [ attachment ]

> Hi Martin,
>
> The basic issue with the footnotes is that they are not stored inside
> the main document as we do in AW, but reside in a separate sub-
> document. So, inside the _docProc() we load the positioning
> information about all of the footnotes (i.e., the document position
> at which the note references are located) -- this is done by the
> _handleNotes() function.
>
> Later on, every time we read a bit of the document in, we check the
> new document position (in the main document) against the positions of
> the notes we retrieved earlier; if we find a note whose position
> matches the current document position, we insert it -- this is
> handled by the _handleNotesText() function.
>
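The position check described above can be sketched roughly as follows.
This is only an illustration, not the importer's actual code: NoteRef,
notesInRange and the assumption that the table is sorted by reference
position are all hypothetical, and the real _handleNotesText() may work
quite differently.

```cpp
#include <algorithm>
#include <cstdint>
#include <vector>

// Hypothetical record: one entry per note reference, loaded up front
// (the role played by _handleNotes() in the description above).
struct NoteRef {
    std::uint32_t refPos;   // position of the reference in the main document
    std::uint32_t noteIdx;  // index of the note in the notes subdocument
};

// Return the indices of all notes whose reference position lies in the
// chunk [begin, end) of the main document that was just read.
// Assumption: `notes` is sorted by refPos in ascending order.
std::vector<std::uint32_t> notesInRange(const std::vector<NoteRef>& notes,
                                        std::uint32_t begin,
                                        std::uint32_t end)
{
    std::vector<std::uint32_t> hits;
    // Binary search for the first note at or after the start of the chunk.
    auto it = std::lower_bound(
        notes.begin(), notes.end(), begin,
        [](const NoteRef& n, std::uint32_t pos) { return n.refPos < pos; });
    // Collect every note up to, but not including, the end of the chunk.
    for (; it != notes.end() && it->refPos < end; ++it)
        hits.push_back(it->noteIdx);
    return hits;
}
```

A text box table could presumably be queried the same way, with the
anchor position of each box taking the place of refPos.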
> The code for the textboxes should be similar to that for the notes,
> although the docs are, as often, very confusing. The text box is a
> special case of office art object, so we will have to retrieve info
> for all art objects and ignore those that are not boxes.
>
> We retrieve the FSPA structs from the plcspa as we do for the notes,
> except there are two separate plcspa streams, one for the main data
> (plcspaMom) and another for hdr/ftrs (plcspaHdr). I am not sure
> whether we will need two separate tables (like m_pFootnotes) for
> these, or whether we could hold all the data in the same table,
> probably the former.
>
> Once we have the FSPAs, we have to translate the FSPA into the actual
> art shape data using the dggInfo table, into which FSPA.spid is
> some kind of index -- the problem is that the format of the drawing
> data (stored in dggInfo) does not seem to be described in the docs I
> have.
>
> Once we have the shape data, we should be able to get the TXTID from
> it, from which we can isolate the index n into the plctxbxs: plctxbxs[n]
> will give us the offset of the text for our text box in the textbox
> subdocument; plctxbxs[n+1] holds the position immediately after the
> last of our text, i.e., textlength = plctxbxs[n+1]-plctxbxs[n].
>
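The offset arithmetic above can be written down as a small helper. This
is a sketch under stated assumptions: the plctxbxs is modelled as a plain
vector of character positions, and TextRange and textboxRange are
hypothetical names, not part of wv or the importer.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical result type: where a text box's text lives in the
// textbox subdocument.
struct TextRange {
    std::uint32_t offset;  // plctxbxs[n]
    std::uint32_t length;  // plctxbxs[n+1] - plctxbxs[n]
};

// Compute the text range for text box n. Returns false when n has no
// following entry or when the positions are not in ascending order.
bool textboxRange(const std::vector<std::uint32_t>& plctxbxs,
                  std::size_t n,
                  TextRange& out)
{
    if (n + 1 >= plctxbxs.size())
        return false;  // no plctxbxs[n+1] to mark the end of the text
    const std::uint32_t start = plctxbxs[n];
    const std::uint32_t end = plctxbxs[n + 1];
    if (end < start)
        return false;  // corrupt table: end precedes start
    out.offset = start;
    out.length = end - start;
    return true;
}
```

The bounds check matters because the last entry in the table only marks
the end of the preceding text and has no text of its own.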
> The real snag is the translation of the FSPA into the drawing shape
> data; I think you will need to examine the OO importer to find out
> what the format of this data is. The rest is identical to the notes
> handling.
>

*sigh* This is clearly going to be hard work. Ximian have very nicely put
the source code of OOo in an easy-to-browse form. After some searching I
believe I have found the relevant parts of the code for MS Word import.
However, many of the helpful comments and debug prints are in German...

See here:

http://ooo.ximian.com/lxr/source/sw/sw/source/filter/w4w/

It's going to be a while before I make sense of this...

Martin

> Tomas
>>
>> Hi Tomas,
>> I would really like to get text boxes imported from MS Word.
>> From reading the docs and scanning the wv code it appears that
>> the process is very similar to that for importing
>> footnotes/endnotes. There is a separate set of tables that hold
>> the text outside the main stream of the document flow. It also
>> appears that wv can recognize them and makes them available.
>>
>> However, I don't understand your code that does the footnote/endnote
>> imports. I think that importing text boxes will be very similar,
>> especially since the RTF import of text boxes is a pretty good match to
>> our piecetable - much like that of footnotes/endnotes is.
>>
>> Anyway, any help you can give me to get text boxes imported from MS Word
>> would be most appreciated :-)
>>
>> Cheers
>>
>> Martin


This archive was generated by hypermail 2.1.4 : Sat Mar 20 2004 - 12:32:16 EST