Re: Piece Tables


Subject: Re: Piece Tables
jeff@abisource.com
Date: Tue Apr 04 2000 - 22:29:19 CDT


On Mon, Mar 27, 2000 at 02:51:49PM +0100, raj.curwen@eutech.com wrote:
>
> Can anyone explain the piece table structures, especially use of spans (or
> fragments). The current document (pieceTable.html) is out of date.
>

[ the term 'span' is used in the formatter and the piece table
  to refer to a sequence of text within a paragraph (aka block).
  the term 'fragment' is used in the piece table to refer to a
  piece of the document. ]

a document is constructed as a doubly-linked list of fragments.
there is a fragment base-type and sub-classes for each type of
fragment; this includes "section-markers", "paragraph-markers",
"images", "blobs-of-text", and etc.

each fragment has an INDEX to its attributes/properties [pFrag->m_indexAP].
each text fragment also has an INDEX to its text [pTextFrag->m_bufIndex].
these INDEXes are are special and are interpreted by the pt_VarSet
member of the pt_PieceTable [pPT->m_varset].

any given piece of a document may contain arbitrary set of [XML]
attributes or [CSS-like] PROPERTIES. these are stored in a PP_AttrProp
[2 hash tables of name/values pairs, one for attrs and one for props]
and we do some crunching on them [to get to a cannonical form] and
compute a signature and then we store them in an pp_TableAttrProp
[glorified vector with uniqueness] in the pt_VarSet. the result of
this is an "attr-prop-index". any two frag may be tested for
format/style-equality by testing the numerical value of their
PT_AttrPropIndex members. as new, unique combinations of formatting
occur, new PP_AttrProp's are created and thus new PT_AttrPropIndex
values. These are never deleted, even as they become unreferenced
in the current contents of the document, since the unlimited undo/redo
may 'revive' it.

all document text is appended to the end of a large UT_GrowBuf. the
ordering of data in this buffer reflects the order it was entered --
either during import or during editing -- and not necessarily the
current order in the document. each text fragment contains an PT_BufIndex
and length pair which can be used to get the fragment's actual text
[via pt_VarSet].

all fragments (except text fragments) are of length 1 (ie one unit
in terms of keyboard motion of the caret). the length of a text
fragment is determined by the length of contiguous text in the UT_GrowBuf
[and must be all of the same formatting]. for example, upon import
a paragraph (with no inline format changes), can be stored in one text
fragment. if you type a character in the middle, you will get 3 text
frags -- the existing one will be split and a new one (of length 1)
will be inserted between them. if you delete that character, the middle
frag will be deleted and the other 2 will be coalesced into one.
symmetry is very important here [for undo/redo to work].
the value of the new character will be appended to the growbuf (and
never deleted).

<extreme_detail>the pt_VarSet actuall contains 2 growbuf's and 2
pt_TableAttrProp's. [0] is used during import and is considered
read-only. [1] is used during edits. one should be able to autosave
by blasting the [1]'s to disk [along with the change records], and [along
with the original document] be able to recover -- the [0]'s aren't
necessary (since they are constructed exclusively by the importer).
also, i wanted the generality in place incase we need to put a
limit on the size of the growbufs (and use more of them) for performance
reasons (or should we ever (gulp) attempt a 16bit port)).</extreme_detail>

in order to get undo/redo to work, we also have change history
[px_ChangeHistory]. this contains a vector of change records
[PX_ChangeRecord]. as change operations (edits) occur, a new
change record (of the appropriate type) is appended to this history.
each change record represents an atomic operation -- relative to
the current state of the document -- such that it is possible to
compute an inverse of it (for undo purposes). that is, a change
record of "Span[InsertSpan,docPosition,attrPropIndex,bufIndex,length]"
can be flipped into "Span[DeleteSpan,docPosition,...length]".

undo works, by flipping the last change record in the history
and eval'ing it. by keeping the history as a vector (along
with an undo-index), we can "undo" all the way back to the
beginning (the state as imported). redo works by walking the
history forward and replaying the change records (as i said
symmetry is important (also state consistency)).

unlike, the other attr/prop vectors and growbufs, the change
history vector can be truncated -- if you undo a bunch and then
start editing, we will truncate the history (at the undo index)
and start creating new change records.

in an attempt to conserve memory (and make the user experience
more pleasant) we try to coalesce change records if possible.
this allows a sequence of edits to be collapsed into one, such
as typing in a paragraph (without mistakes) or a series of
delete-chars -- and then undo-ing it in one operation rather
than forcing you to undo-it char-by-char. [personally, i think
we might be too aggressive here, but i don't want to change
anything at this point.]

the change records are also broadcast as they happen to all
"DocListeners". the formatter/view is an example of a document
listener. the change record contains enough information for
the receiver (listener) to update its representation of the
document based upon this atomic change.

<extreme_detail>one sub-class of change record is called a
GLOB [PX_ChangeRecord_Glob]. these are used to wrap
a group of one or more (non-glob) change records to allow them
to appear to the user as a single step for the purposes of
undo/redo. some globbing may occur because of the nature of
the editing operation (a global search/replace or a replace-
the-current-selection-with-text) or because of the fragmenting
of the underlying document (such as a delete-selection that
spans multiple paragraphs).</extreme_detail>

well, that pretty much sums up the major points of the piece
table. at this point, i would highly recommend that you look
at the piece table and formatter test dump routines that i put
in -- these are quite readable and should help to illustrate
what's going on. start with an empty document, dump it, type
some stuff, dump it, edit some, dump it, and etc. and you can
see how things react....

to get a dump, use ALT F12 (with a debug build). you should
see a dump acknowledgement on the xterm or msvc debug window
and a series of numbered files in your current directory.
set a breakpoint in abi/src/text/fmt/xp/fv_View_TestRoutines.cpp
inside FV_View::Test_Dump() if you want to step thru the
festivities.

hope this helps,
jeff



This archive was generated by hypermail 2b25 : Tue Apr 04 2000 - 22:29:20 CDT