Re: Abiword internal representations, paragraph boundaries and pf_frag / strux

From: Ben Martin <monkeyiq_at_users.sourceforge.net>
Date: Thu Jun 30 2011 - 13:15:41 CEST

I now have a dumpDoc() in my GitHub branch that gives a little insight
into which internal objects are used where in a loaded document.

See for example:
https://github.com/monkeyiq/odf-2011-track-changes-git-svn/blob/2fbefdb33b9b957302b7e23947ae0362a65bc8c7/src/text/ptbl/xp/pt_PT_DeleteSpan.cpp#L84

The dump at the end of this message is for a document with a few
paragraphs followed by a table and some trailing paragraphs. A <p> tag
in an .abw file becomes a PFT_Strux of type PTX_Block. A <c> span
becomes a PFT_Text.
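
To make the position numbers in the dump easier to read, here is a
minimal sketch of how they add up: a strux takes one position and a text
fragment takes one per character. Frag, FragType, StruxType,
fragPositions() and sampleFrags() are simplified stand-ins invented for
this example, not the real piece table classes.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Simplified stand-ins for the fragment kinds; not the real AbiWord classes.
enum class FragType { Text, Strux };
enum class StruxType { None, Section, Block };

struct Frag {
    FragType type;
    StruxType strux;   // StruxType::None for text fragments
    std::string text;  // empty for strux fragments (ASCII only in this sketch)

    // A strux occupies one document position; text occupies one per character.
    std::size_t length() const {
        return type == FragType::Strux ? 1 : text.size();
    }
};

// Return the starting document position of each fragment, counting from 0.
std::vector<std::size_t> fragPositions(const std::vector<Frag>& frags) {
    std::vector<std::size_t> positions;
    std::size_t pos = 0;
    for (const Frag& f : frags) {
        positions.push_back(pos);
        pos += f.length();
    }
    return positions;
}

// Start of the dumped document: <section>, <p>para1</p>, <p>para2</p>.
std::vector<Frag> sampleFrags() {
    return {
        {FragType::Strux, StruxType::Section, ""},
        {FragType::Strux, StruxType::Block, ""},
        {FragType::Text, StruxType::None, "para1"},
        {FragType::Strux, StruxType::Block, ""},
        {FragType::Text, StruxType::None, "para2"},
    };
}
```

Calling fragPositions(sampleFrags()) gives 0, 1, 2, 7, 8, which matches
the pos: values of the first five fragments in the dump.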

Later in the dump one can see that a cell becomes a normal series of
paragraphs wrapped in a PTX_SectionCell at the start and a PTX_EndCell
at the end.
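
The wrapping can be checked mechanically. The sketch below is only an
illustration: DumpStrux and tableStruxesBalanced() are names made up
here, and the enum values are simply the eStruxType numbers printed in
the dump, not a copy of the real enum.

```cpp
#include <vector>

// Strux type values as printed in the dump (eStruxType:N); only those seen there.
enum class DumpStrux {
    Section = 0,
    Block = 1,
    SectionTable = 4,
    SectionCell = 5,
    EndCell = 11,
    EndTable = 12
};

// Check that table and cell struxes open and close in properly nested order.
bool tableStruxesBalanced(const std::vector<DumpStrux>& struxes) {
    std::vector<DumpStrux> open;  // stack of currently open containers
    for (DumpStrux s : struxes) {
        switch (s) {
        case DumpStrux::SectionTable:
        case DumpStrux::SectionCell:
            open.push_back(s);
            break;
        case DumpStrux::EndCell:
            if (open.empty() || open.back() != DumpStrux::SectionCell) return false;
            open.pop_back();
            break;
        case DumpStrux::EndTable:
            if (open.empty() || open.back() != DumpStrux::SectionTable) return false;
            open.pop_back();
            break;
        default:
            break;  // Section and Block do not open a container in this sketch
        }
    }
    return open.empty();
}
```

Feeding it the strux sequence of the table in the dump (SectionTable,
then SectionCell / Block... / EndCell for each cell, then EndTable)
should return true; dropping an EndCell makes it return false.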

It should be noted that pd_document uses PL_StruxDocHandle instead of
passing back pf_Frag_Strux* objects. Depending on which one you have,
you call either a method on the fragment itself or a corresponding
method on pd_document. For example, the pf_Frag class has getPos(), but
for a PL_StruxDocHandle you have to use document->getStruxPosition( sdh );
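
As a toy illustration of the two access styles, and nothing more:
ToyFrag, ToyHandle and ToyDocument below are hypothetical classes that
only mimic the shape of getPos() and getStruxPosition( sdh ). That the
handle is an opaque pointer is an assumption of this sketch, not
something I have checked against the real headers.

```cpp
#include <cstddef>
#include <vector>

// Toy fragment: with a fragment pointer you ask the fragment directly.
class ToyFrag {
public:
    explicit ToyFrag(std::size_t pos) : m_pos(pos) {}
    std::size_t getPos() const { return m_pos; }
private:
    std::size_t m_pos;
};

// Toy opaque handle: callers can hold it but cannot call fragment methods on it.
using ToyHandle = const void*;

class ToyDocument {
public:
    void addStrux(std::size_t pos) { m_struxes.emplace_back(pos); }

    // Hand out an opaque handle for the strux at the given index.
    // In this toy the handle is invalidated by a later addStrux() call.
    ToyHandle handleFor(std::size_t index) const { return &m_struxes.at(index); }

    // With only a handle, the document has to resolve the position for you.
    std::size_t getStruxPosition(ToyHandle sdh) const {
        return static_cast<const ToyFrag*>(sdh)->getPos();
    }
private:
    std::vector<ToyFrag> m_struxes;
};
```

The point of the pattern is that code outside the document only ever
sees ToyHandle, so every question about a strux goes through the
document class.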

I'm still working out why the different abstractions are there for the
fragments between pd_doc and the piecetable classes.

Happy hacking folks...

DEBUG: dumpDoc(IE_Exp_OpenDocument()) showing range:0 to 139
DEBUG: dumpDoc() PFT_Strux pos:0 frag:0xe1d4a0 len:1 frag type:2
extra:
DEBUG: PTX_Section eStruxType:0
DEBUG: dumpDoc()
DEBUG: dumpDoc() PFT_Strux pos:1 frag:0x13389a0 len:1 frag type:2
extra:
DEBUG: PTX_Block eStruxType:1
DEBUG: dumpDoc() PFT_Text pos:2 frag:0x13368d0 len:5 frag type:0
extra:para1
DEBUG: dumpDoc()
DEBUG: dumpDoc() PFT_Strux pos:7 frag:0x13431d0 len:1 frag type:2
extra:
DEBUG: PTX_Block eStruxType:1
DEBUG: dumpDoc() PFT_Text pos:8 frag:0x13436d0 len:5 frag type:0
extra:para2
DEBUG: dumpDoc()
DEBUG: dumpDoc() PFT_Strux pos:13 frag:0x1343800 len:1 frag type:2
extra:
DEBUG: PTX_Block eStruxType:1
DEBUG: dumpDoc() PFT_Text pos:14 frag:0x1343ad0 len:5 frag type:0
extra:para3
DEBUG: dumpDoc()
DEBUG: dumpDoc() PFT_Strux pos:19 frag:0x13440c0 len:1 frag type:2
extra:
DEBUG: PTX_Block eStruxType:1
DEBUG: dumpDoc() PFT_Text pos:20 frag:0x13449a0 len:5 frag type:0
extra:para4
DEBUG: dumpDoc()
DEBUG: dumpDoc() PFT_Strux pos:25 frag:0x1344a40 len:1 frag type:2
extra:
DEBUG: PTX_Block eStruxType:1
DEBUG: dumpDoc() PFT_Text pos:26 frag:0x1338d80 len:5 frag type:0
extra:para5
DEBUG: dumpDoc()
DEBUG: dumpDoc() PFT_Strux pos:31 frag:0x1338ee0 len:1 frag type:2
extra:
DEBUG: PTX_Block eStruxType:1
DEBUG: dumpDoc() PFT_Text pos:32 frag:0x1345820 len:5 frag type:0
extra:para6
DEBUG: dumpDoc()
DEBUG: dumpDoc() PFT_Strux pos:37 frag:0x1345f10 len:1 frag type:2
extra:
DEBUG: PTX_Block eStruxType:1
DEBUG: dumpDoc() PFT_Text pos:38 frag:0x1345980 len:2 frag type:0
extra:pa
DEBUG: dumpDoc() PFT_Text pos:40 frag:0x1336c50 len:3 frag type:0
extra:ra7
DEBUG: dumpDoc()
DEBUG: dumpDoc() PFT_Strux pos:43 frag:0x11c3060 len:1 frag type:2
extra:
DEBUG: PTX_Block eStruxType:1
DEBUG: dumpDoc() PFT_Text pos:44 frag:0x134aa30 len:5 frag type:0
extra:para8
DEBUG: dumpDoc() PFT_Strux pos:49 frag:0x12fac00 len:1 frag type:2
extra:
DEBUG: PTX_SectionTable eStruxType:4
DEBUG: dumpDoc() PFT_Strux pos:50 frag:0x1345460 len:1 frag type:2
extra:
DEBUG: PTX_SectionCell eStruxType:5
DEBUG: dumpDoc()
DEBUG: dumpDoc() PFT_Strux pos:51 frag:0x134cc80 len:1 frag type:2
extra:
DEBUG: PTX_Block eStruxType:1
DEBUG: dumpDoc() PFT_Text pos:52 frag:0x1350060 len:2 frag type:0
extra:ce
DEBUG: dumpDoc() PFT_Text pos:54 frag:0x134ab10 len:3 frag type:0
extra:ll1
DEBUG: dumpDoc()
DEBUG: dumpDoc() PFT_Strux pos:57 frag:0x134fb40 len:1 frag type:2
extra:
DEBUG: PTX_Block eStruxType:1
DEBUG: dumpDoc() PFT_Text pos:58 frag:0x13509e0 len:7 frag type:0
extra:cell1p2
DEBUG: dumpDoc()
DEBUG: dumpDoc() PFT_Strux pos:65 frag:0x134abd0 len:1 frag type:2
extra:
DEBUG: PTX_Block eStruxType:1
DEBUG: dumpDoc() PFT_Text pos:66 frag:0x13470c0 len:4 frag type:0
extra:c1p3
DEBUG: dumpDoc() PFT_Strux pos:70 frag:0x1351560 len:1 frag type:2
extra:
DEBUG: PTX_EndCell eStruxType:11
DEBUG: dumpDoc() PFT_Strux pos:71 frag:0x1351820 len:1 frag type:2
extra:
DEBUG: PTX_SectionCell eStruxType:5
DEBUG: dumpDoc()
DEBUG: dumpDoc() PFT_Strux pos:72 frag:0x1350e00 len:1 frag type:2
extra:
DEBUG: PTX_Block eStruxType:1
DEBUG: dumpDoc() PFT_Text pos:73 frag:0x1347110 len:5 frag type:0
extra:cell2
DEBUG: dumpDoc()
DEBUG: dumpDoc() PFT_Strux pos:78 frag:0x1350300 len:1 frag type:2
extra:
DEBUG: PTX_Block eStruxType:1
DEBUG: dumpDoc() PFT_Text pos:79 frag:0x1352720 len:4 frag type:0
extra:c2p2
DEBUG: dumpDoc()
DEBUG: dumpDoc() PFT_Strux pos:83 frag:0x1350c30 len:1 frag type:2
extra:
DEBUG: PTX_Block eStruxType:1
DEBUG: dumpDoc() PFT_Text pos:84 frag:0x1352800 len:4 frag type:0
extra:c2pe
DEBUG: dumpDoc() PFT_Strux pos:88 frag:0x1087b10 len:1 frag type:2
extra:
DEBUG: PTX_EndCell eStruxType:11
DEBUG: dumpDoc() PFT_Strux pos:89 frag:0xffb050 len:1 frag type:2
extra:
DEBUG: PTX_SectionCell eStruxType:5
DEBUG: dumpDoc()
DEBUG: dumpDoc() PFT_Strux pos:90 frag:0xf89400 len:1 frag type:2
extra:
DEBUG: PTX_Block eStruxType:1
DEBUG: dumpDoc() PFT_Strux pos:91 frag:0xf6d0b0 len:1 frag type:2
extra:
DEBUG: PTX_EndCell eStruxType:11
DEBUG: dumpDoc() PFT_Strux pos:92 frag:0xf44620 len:1 frag type:2
extra:
DEBUG: PTX_SectionCell eStruxType:5
DEBUG: dumpDoc()
DEBUG: dumpDoc() PFT_Strux pos:93 frag:0xef3c00 len:1 frag type:2
extra:
DEBUG: PTX_Block eStruxType:1
DEBUG: dumpDoc() PFT_Strux pos:94 frag:0xeec3b0 len:1 frag type:2
extra:
DEBUG: PTX_EndCell eStruxType:11
DEBUG: dumpDoc() PFT_Strux pos:95 frag:0x13525f0 len:1 frag type:2
extra:
DEBUG: PTX_SectionCell eStruxType:5
DEBUG: dumpDoc()
DEBUG: dumpDoc() PFT_Strux pos:96 frag:0x135d750 len:1 frag type:2
extra:
DEBUG: PTX_Block eStruxType:1
DEBUG: dumpDoc() PFT_Strux pos:97 frag:0x135d8e0 len:1 frag type:2
extra:
DEBUG: PTX_EndCell eStruxType:11
DEBUG: dumpDoc() PFT_Strux pos:98 frag:0x135dbd0 len:1 frag type:2
extra:
DEBUG: PTX_SectionCell eStruxType:5
DEBUG: dumpDoc()
DEBUG: dumpDoc() PFT_Strux pos:99 frag:0x135e130 len:1 frag type:2
extra:
DEBUG: PTX_Block eStruxType:1
DEBUG: dumpDoc() PFT_Strux pos:100 frag:0x135e2c0 len:1 frag type:2
extra:
DEBUG: PTX_EndCell eStruxType:11
DEBUG: dumpDoc() PFT_Strux pos:101 frag:0x135e5f0 len:1 frag type:2
extra:
DEBUG: PTX_SectionCell eStruxType:5
DEBUG: dumpDoc()
DEBUG: dumpDoc() PFT_Strux pos:102 frag:0x13581b0 len:1 frag type:2
extra:
DEBUG: PTX_Block eStruxType:1
DEBUG: dumpDoc() PFT_Strux pos:103 frag:0x1358340 len:1 frag type:2
extra:
DEBUG: PTX_EndCell eStruxType:11
DEBUG: dumpDoc() PFT_Strux pos:104 frag:0x1358670 len:1 frag type:2
extra:
DEBUG: PTX_SectionCell eStruxType:5
DEBUG: dumpDoc()
DEBUG: dumpDoc() PFT_Strux pos:105 frag:0x135f5c0 len:1 frag type:2
extra:
DEBUG: PTX_Block eStruxType:1
DEBUG: dumpDoc() PFT_Strux pos:106 frag:0x135f750 len:1 frag type:2
extra:
DEBUG: PTX_EndCell eStruxType:11
DEBUG: dumpDoc() PFT_Strux pos:107 frag:0x135fa50 len:1 frag type:2
extra:
DEBUG: PTX_SectionCell eStruxType:5
DEBUG: dumpDoc()
DEBUG: dumpDoc() PFT_Strux pos:108 frag:0x135ffd0 len:1 frag type:2
extra:
DEBUG: PTX_Block eStruxType:1
DEBUG: dumpDoc() PFT_Strux pos:109 frag:0x1360160 len:1 frag type:2
extra:
DEBUG: PTX_EndCell eStruxType:11
DEBUG: dumpDoc() PFT_Strux pos:110 frag:0x1360210 len:1 frag type:2
extra:
DEBUG: PTX_EndTable eStruxType:12
DEBUG: dumpDoc()
DEBUG: dumpDoc() PFT_Strux pos:111 frag:0x1360280 len:1 frag type:2
extra:
DEBUG: PTX_Block eStruxType:1
DEBUG: dumpDoc() PFT_Text pos:112 frag:0x135f9b0 len:5 frag type:0
extra:para9
DEBUG: dumpDoc()
DEBUG: dumpDoc() PFT_Strux pos:117 frag:0x1360980 len:1 frag type:2
extra:
DEBUG: PTX_Block eStruxType:1
DEBUG: dumpDoc() PFT_Text pos:118 frag:0x135eb30 len:6 frag type:0
extra:para10
DEBUG: dumpDoc()
DEBUG: dumpDoc() PFT_Strux pos:124 frag:0x135ec90 len:1 frag type:2
extra:
DEBUG: PTX_Block eStruxType:1
DEBUG: dumpDoc() PFT_Text pos:125 frag:0x1361520 len:6 frag type:0
extra:para11
DEBUG: dumpDoc()
DEBUG: dumpDoc() PFT_Strux pos:131 frag:0x1361680 len:1 frag type:2
extra:
DEBUG: PTX_Block eStruxType:1
DEBUG: dumpDoc() PFT_Text pos:132 frag:0x1361b80 len:6 frag type:0
extra:para12
DEBUG: dumpDoc()
DEBUG: dumpDoc() PFT_Strux pos:138 frag:0x1361ce0 len:1 frag type:2
extra:
DEBUG: PTX_Block eStruxType:1

On Tue, 2011-05-31 at 11:07 +1000, Ben Martin wrote:
> Hi,
>
> While digging around looking at change tracking I was looking into the
> internal document model of abiword again and thought I'd send this
> somewhat terse message to the list which may be of interest to new abi
> hackers. Conversely, if there are any gross mistakes below, please
> feel free to point them out to me :)
>
> The below is skewed toward my goal at hand: tracking explicitly the
> start and end of a paragraph element. Which translates to knowing when
> a PTX_Block has its beginning and/or end deleted to merge with
> previous or subsequent content.
>
> Much of my understanding has been put together by RTFC. For
> pf_fragment and strux layout, one might consider
> ODe_AbiDocListener::populateStrux() which does a switch on the
> getStruxType() of the PX_ChangeRecord_Strux and simulates an end block
> using _closeBlock() when a PTX element other than the set Z is
> encountered.
>
> Z = {
> PTX_SectionFootnote,
> PTX_SectionEndnote,
> PTX_SectionAnnotation
> };
>
> In particular, when a PTX_Block is encountered, the old block, if open,
> is first closed. Concretely, it appears that a "paragraph" can be
> considered to be the document content from a PFT_Strux/PTX_Block
> marker to a PFT_Strux/{all PTX - set Z} marker.
>
> The following trace might be of interest. It was made during a
> deletion, so the same position is encountered many times: as content
> is deleted, later content moves up in the document. The abw file
> fragment follows it; the trace was made by deleting from the "r" in
> para1 through to the "r" in para3 inclusive.
>
> Pos 5 a pf_frag of length 4, a fragment type PFT_Text.
> Pos 7 a pf_frag of length 1, type PFT_Strux
> strux type PTX_Block
> Pos 8 a pf_frag of length 1, a fragment type PFT_Text.
> Pos 9 a pf_frag of length 3, a fragment type PFT_Text.
> Pos 12 a pf_frag of length 1, a fragment type PFT_Text.
> Pos 13 a pf_frag of length 1, type PFT_Strux
> strux type PTX_Block
> Pos 13 a pf_frag of length 1, type PFT_Strux
> strux type PTX_SectionTable
> Pos 13 a pf_frag of length 1, type PFT_Strux
> strux type PTX_Block
> Pos 13 a pf_frag of length 7, type PFT_Text
> Pos 13 a pf_frag of length 1, type PFT_Strux
> strux type PTX_Block
> Pos 14 a pf_frag of length 2, type PFT_Text
> Pos 16 a pf_frag of length 3, type PFT_Text
>
> The simplified abw fragment:
>
> <section xid="4">
> <p style="Normal" xid="5">
> <c revision="1">p</c>
> <c revision="1"/>
> <c revision="1,!2{font-weight:bold}{author:0}">ara1</c>
> </p>
> <p revision="1" style="Normal" xid="1">
> <c revision="1">p</c>
> <c revision="1"/>
> <c revision="1,!2{font-style:italic}{author:0}">ara</c>
> <c revision="1">2</c>
> </p>
> <p revision="4" style="Normal" xid="7">
> <c revision="1"/>
> </p>
> <table revision="4" xid="8">
> <cell revision="4" xid="9">
> <p style="Normal" revision="4" xid="10"><c revision="4">r1c1</c></p>
> </cell>
> <cell revision="4" xid="12">
> <p style="Normal" revision="4" xid="13"><c/></p>
> </cell>
> <cell revision="4" xid="15">
> <p style="Normal" revision="4" xid="16"><c/></p>
> </cell>
> <cell revision="4" xid="18">
> <p style="Normal" revision="4" xid="19"><c revision="4">r2c2</c></p>
> </cell>
> </table>
> <p revision="4" style="Normal" xid="6">
> <c revision="4">para2.B</c>
> </p>
> <p revision="1" style="Normal" xid="2">
> <c revision="1,!2{font-weight:bold}{author:0}">pa</c>
> <c revision="1,!2{font-weight:bold}{author:0}"/>
> <c revision="1">ra3</c>
> </p>
> </section>
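
The paragraph rule from the quoted message (a paragraph runs from a
PTX_Block to the next strux that is not in set Z) can be sketched as
below. Kind, Item, inSetZ() and paragraphTexts() are simplified names
invented for this example; it only groups text by that rule and ignores
everything else a real listener has to do.

```cpp
#include <string>
#include <vector>

// Simplified fragment kinds; just enough to express the rule.
enum class Kind {
    Text,
    Block,
    SectionTable,
    SectionFootnote,    // set Z
    SectionEndnote,     // set Z
    SectionAnnotation   // set Z
};

struct Item {
    Kind kind;
    std::string text;  // used only when kind == Kind::Text
};

// Set Z from the quoted message: struxes that do not close the open block.
bool inSetZ(Kind k) {
    return k == Kind::SectionFootnote || k == Kind::SectionEndnote ||
           k == Kind::SectionAnnotation;
}

// Collect the text of each paragraph: a Block opens one, and any strux
// outside set Z (including the next Block) closes it.
std::vector<std::string> paragraphTexts(const std::vector<Item>& items) {
    std::vector<std::string> out;
    bool open = false;
    for (const Item& it : items) {
        if (it.kind == Kind::Text) {
            if (open) out.back() += it.text;
            continue;
        }
        if (inSetZ(it.kind)) continue;  // set Z leaves the current block open
        open = false;                   // every other strux closes it
        if (it.kind == Kind::Block) {
            out.emplace_back();
            open = true;
        }
    }
    return out;
}
```

Run over the fragments around the start of the table in the dump
(Block, "pa", "ra7", Block, "para8", SectionTable, Block, "ce", "ll1"),
this groups the text into "para7", "para8" and "cell1".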

Received on Thu Jun 30 13:16:09 2011
