Some thoughts on changing the underlying field code


Subject: Some thoughts on changing the underlying field code
From: Keith Stribley (keith@stribley.dabsol.co.uk)
Date: Sat Feb 12 2000 - 13:24:35 CST


I've been thinking about doing something on fields for a while, so here are my ideas.

It seems to have been agreed that a field container is going to be used, which means changes to the importer, which currently assumes that Objects don't have any content. What should fields be represented as at the PieceTable level? Should we just add additional code to an Object, is a new type of Strux more appropriate, or even a completely new PieceTable type? I'm struggling to get a grasp of all the issues involved, so I would welcome others looking at this.

Also, as I see it, a single field may well need to consist of several runs (or have I misunderstood the code?). Take, for example, a field with several words of text, such as the author of a document: we want to allow a line to wrap in the middle of such a field. Clearly, the contents of a field need to be represented by runs which are distinguishable from editable text. As I see it, at least two possibilities exist:

1. Continue having specific fp_FieldRuns but modify them such that any one field is represented by a linked list of such runs to represent its content. The line breaking code would need to be implemented specifically for these FieldRuns.

2. Add code to allow any of the existing runs to be a non-editable field run. All the functionality for the current types of run, including line breaking etc., could then be used by a field if desired.
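Possibility 2 could look roughly like the following C++ sketch. The Field and Run types and the makeFieldRuns function are hypothetical names invented for illustration; they are not the real fp_Run classes, and splitting at whitespace is only an assumption about where a line might be allowed to wrap.

```cpp
#include <sstream>
#include <string>
#include <vector>

// Hypothetical types for illustration; these are not the real fp_Run classes.
struct Field {
    std::string name;  // e.g. "author"
};

struct Run {
    std::string text;
    const Field* owner = nullptr;  // null means ordinary editable text

    bool isEditable() const { return owner == nullptr; }
};

// Split a field's value at whitespace into one run per word, each tagged
// with its owning field, so that a line could wrap between any two of them.
std::vector<Run> makeFieldRuns(const Field& field, const std::string& value) {
    std::vector<Run> runs;
    std::istringstream words(value);
    std::string word;
    while (words >> word) {
        Run run;
        run.text = word;
        run.owner = &field;
        runs.push_back(run);
    }
    return runs;
}
```

Because the runs are ordinary Run objects, any existing line-breaking code that works on a list of runs would apply unchanged; only the editing code would need to check isEditable().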

As for where to implement the field calculation code, I was thinking of a completely new class structure which would be linked in at the document and PieceTable fragment level. This is outlined in more detail below, but to start with I tried to split fields into categories from a functionality standpoint.

A. Fields independent of position within document
   - Although their value would probably change as the document is edited, the value displayed would be independent of where the field was inserted. It needs to be updated every time it changes.

B. Reference fields which are linked to a bookmark of some kind
   - These are also independent of their own position in the document, but potentially need to be updated every time their bookmark changes/moves.

C. Fields relating to position in document
   - This information is already available in the current field implementation, so there is probably no need for any helper classes. These fields need to be updated every time they move in the document.

D. Sequence fields, e.g. for Figure numbering
   - These are dependent on their position relative to other sequence fields with the same sequence name. These will change every time a field of the same type above them in the document is added or removed.

E. Tables of contents etc
   - These change continuously with updating of the document and may need manual updating to avoid performance penalties.

F. Other types
   - logical, database fields etc - I'm assuming these won't be implemented for a long time so haven't thought much about them.

It is clear that each of these types has different requirements for updating. I'm sure there are many possible implementations, but here is what I have come up with, which I hope might at least be a basis for discussion. The attached PDF file contains a UML class diagram (done in Dia) which hopefully clarifies the discussion. (The attributes and operations in the classes are not supposed to be exhaustive, just to give an idea of functionality. I also suspect their names need changing a bit.)

A. Document wide information/attributes could be contained in a list at a document level. Any field which used an attribute would link to that specific attribute. Whenever that attribute was changed it would call update on all the fields linked to it.
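A minimal C++ sketch of the linkage described in A, as an observer arrangement. All class and method names here (Field, TextField, DocumentAttribute, registerField, setValue) are hypothetical and are not taken from the AbiWord source or the attached diagram.

```cpp
#include <cassert>
#include <string>
#include <utility>
#include <vector>

// Hypothetical names throughout; none of these classes exist in AbiWord.
class Field {
public:
    virtual ~Field() = default;
    // Called by an attribute whenever its value changes.
    virtual void update(const std::string& newValue) = 0;
};

// A field that simply displays the attribute's current value.
class TextField : public Field {
public:
    void update(const std::string& newValue) override { m_display = newValue; }
    const std::string& display() const { return m_display; }
private:
    std::string m_display;
};

// One document-wide attribute (e.g. "author") plus the fields linked to it.
class DocumentAttribute {
public:
    explicit DocumentAttribute(std::string value) : m_value(std::move(value)) {}

    // Link a field to this attribute and bring it up to date immediately.
    void registerField(Field* field) {
        assert(field != nullptr);
        m_fields.push_back(field);
        field->update(m_value);
    }

    // Change the value and push it to every linked field.
    void setValue(std::string value) {
        m_value = std::move(value);
        for (Field* f : m_fields) f->update(m_value);
    }

private:
    std::string m_value;
    std::vector<Field*> m_fields;  // non-owning pointers to the linked fields
};
```

A real version would also need a way to unregister a field when it is deleted from the document, which this sketch leaves out.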

B. Tags at block level or lower (e.g. <p>, <c>, <field>; this looks consistent with Star Office, I'm not sure about MSWord) would be bookmarkable but would not be namespaced. When a bookmark was created, it would first check with the document-level BookmarkList that its name was unique. The forceNewBookmark operation would generate a new unique name when it was not possible to prompt the user, e.g. when pasting in from another document. As with DocumentAttributes, Fields would be able to register themselves with the bookmark and would be told to update whenever the bookmark changed.
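The uniqueness check and forceNewBookmark in B might be sketched in C++ as follows. Only the names BookmarkList and forceNewBookmark come from the text above; addBookmark, contains and the "_1", "_2" suffix scheme are assumptions made for illustration.

```cpp
#include <set>
#include <string>

// Hypothetical sketch; this BookmarkList is not an existing AbiWord class.
class BookmarkList {
public:
    // Try to register a name; returns false if it is already taken,
    // in which case the caller would normally prompt the user.
    bool addBookmark(const std::string& name) {
        return m_names.insert(name).second;
    }

    // Register a bookmark without user interaction (e.g. on paste):
    // keep the requested name if it is free, otherwise append a numeric
    // suffix until an unused name is found. The suffix format is an assumption.
    std::string forceNewBookmark(const std::string& requested) {
        if (addBookmark(requested)) return requested;
        for (unsigned n = 1;; ++n) {
            std::string candidate = requested + "_" + std::to_string(n);
            if (addBookmark(candidate)) return candidate;
        }
    }

    bool contains(const std::string& name) const {
        return m_names.count(name) != 0;
    }

private:
    std::set<std::string> m_names;  // every bookmark name in the document
};
```

Fields referring to a pasted bookmark would then have to be pointed at whatever name forceNewBookmark returned, since it may differ from the name in the source document.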

C. All fields would have access to their page number etc. in the document from suitable pointers, so no additional classes would be needed for positional fields.

D. The name of a sequence (e.g. Figure, Equation, Table) would need to be held at a document level, but need only contain a pointer to the first such field. When updating, a linked list should be sufficient. When inserting, it would be necessary to iterate through the sequence to locate the position, but this method seems to be used in other places in the code and so shouldn't be too much of a problem.
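A rough C++ sketch of D: the document maps each sequence name to the first field in a singly linked list, insertion walks the list to find the right place, and updating renumbers from the head. SequenceField, SequenceList and the docPosition member are hypothetical names, and using a plain integer as the document position is an assumption.

```cpp
#include <cassert>
#include <map>
#include <string>

// Hypothetical sketch; names are illustrative, not AbiWord classes.
struct SequenceField {
    long docPosition = 0;           // position of the field in the document
    int number = 0;                 // value displayed, e.g. the 2 in "Figure 2"
    SequenceField* next = nullptr;  // next field of the same sequence
};

class SequenceList {
public:
    // Insert a field into the named sequence, keeping document order,
    // then renumber the whole list.
    void insert(const std::string& name, SequenceField* field) {
        assert(field != nullptr);
        SequenceField** link = &m_first[name];
        while (*link && (*link)->docPosition <= field->docPosition)
            link = &(*link)->next;
        field->next = *link;
        *link = field;
        renumber(name);
    }

    // Walk the list from its head and assign 1, 2, 3, ...
    void renumber(const std::string& name) {
        int n = 0;
        for (SequenceField* f = first(name); f; f = f->next) f->number = ++n;
    }

    // First field of the named sequence, or nullptr if there is none.
    SequenceField* first(const std::string& name) const {
        auto it = m_first.find(name);
        return it == m_first.end() ? nullptr : it->second;
    }

private:
    std::map<std::string, SequenceField*> m_first;  // sequence name -> first field
};
```

Renumbering the whole list on every insert is the simplest choice; only the fields after the insertion point actually change, so a real implementation could start from there.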

E. Possibly TOCs etc. should be a new type of tag similar to a section, as they will probably contain blocks within them. This could be implemented by keeping blocks of a given style (and lower levels) in a linked list, though it will rather depend on how heading numbering is implemented. Some of the tables will be based on sequence fields, and I have already described how they could be linked together.

I'm sure there are some other fields which may not fit into this scheme, but possibly they are better left to a future scripting capability.

Finally, some initial thoughts on the new file format for some basic fields are also attached. They are loosely based on Justin Bradford's original posting.
I haven't included any "id" type attributes which would be invisible to the user, as some have suggested. This is because of the impact it would have if a document got its ids corrupted. If ids are in the file format, they can only be corrected by manual editing of the file once corrupted. However, if they are generated in code on opening the file, then there is a good chance they will be sorted out the next time the file is opened. I suspect this is the type of problem which causes Word to count Figures and Headings randomly whenever a document gets big and close to its deadline!

I hope the length of this isn't too much for my first message to the list!

I obviously don't have a full understanding of the document data and layout mechanism so I can well believe that there are issues I have completely failed to consider in this. It is clearly important to achieve a consensus on these design issues since I am suggesting quite major additions to the class structure of AbiWord.

cheers,

Keith



-- 
Keith Stribley		http://www.stribley.dabsol.co.uk/


