Re: word and lists

Caolan McNamara (Caolan.McNamara@ul.ie)
Wed, 21 Jul 1999 14:38:10 +0100 (IST)


On 21-Jul-99 Jeff Hostetler wrote:
>Caolan, could you elaborate on what you've learned of m$word lists
>and stuff. i guess i always considered my proposal to be a strawman
>until we had sufficient feedback from those with insight into word files....

Lists are different in Word than just about anything else I've encountered:
membership of a list has nothing to do with physical proximity in the document.
There is no explicit list start or end marker; you have to check each paragraph
as it comes in and see if it is a member of a list of some kind. So by the
nature of this, member elements of a list can be distributed throughout the
document with paragraphs in between that have nothing to do with the list (which
of course causes your eyes to boil when trying to figure them out). Word keeps
a table of list information elsewhere, and each paragraph that is tagged as
belonging to a list contains some member ids that identify
1) the list that it belongs to and
2) the level of the list that it is in.
(Each list can contain up to 9 sublevels)
Also due to this, each list element is always a paragraph (or at least that's
the only way I can imagine it).

So as each para starts you basically check to see if it's a member of a list and
if so what level it is at. Using these two ids you scan the list table structures
and get whatever list properties might apply to it. If there are any you apply
them to the para; this usually takes the form of what style to use for the number
at the beginning of the para, what distance from the number the text starts,
and what layout changes have to be applied to the paragraph.
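
A minimal sketch of that per-paragraph lookup, in Python. Every name here
(LevelProps, Paragraph, apply_list_props, the fields and the numbers) is made
up for illustration; none of it is wv's real API or Word's real structures,
which hold far more than this.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class LevelProps:
    # Hypothetical per-level formatting (units are arbitrary here).
    number_style: str   # style used for the number or bullet
    text_offset: int    # distance from the number to the start of the text
    indent: int         # a layout change applied to the whole paragraph

@dataclass
class Paragraph:
    text: str
    list_id: Optional[int] = None   # None means "not a member of any list"
    level: int = 1                  # 1-based, as in the pseudo layout
    applied: Optional[LevelProps] = None

def apply_list_props(para, list_table):
    """Look the paragraph's (list id, level) up in the separate list table."""
    if para.list_id is None:
        return para                               # ordinary paragraph
    props = list_table.get(para.list_id, {}).get(para.level)
    if props is not None:                         # only apply if something matches
        para.applied = props
    return para

# The list table lives apart from the paragraphs: list id -> level -> props.
list_table = {
    1: {1: LevelProps("decimal", 360, 720),
        2: LevelProps("decimal", 360, 1440)},
}
paras = [
    Paragraph("in list 1, level 2", list_id=1, level=2),
    Paragraph("plain body text"),
]
for p in paras:
    apply_list_props(p, list_table)
```

The point of the shape is that nothing about a paragraph's neighbours is
consulted; each paragraph is resolved on its own against the table.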

A sort of mad but very possible pseudo layout resulting from this would look
like this:

<p PROPS="listid:list1" level=1>member of one list</p>
<p PROPS="listid:list2" level=1>member of a different list *right in the middle
of this list*</p>
<p PROPS="listid:list1" level=2>member of one list, level2</p>

giving you an output maybe like this..
1. member of one list
A. member of a different list
1.1 member of one list, level2
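
One way to get numbering like the example above is to keep a separate set of
counters per list id, so an interleaved paragraph from another list never
disturbs them. This is only a sketch in Python: the FORMATS table, the label
layout and the function names are assumptions for illustration, not how Word
or wv actually store or compute list numbers.

```python
from collections import defaultdict

# Hypothetical number formats; list2 uses letters to match the example.
FORMATS = {"list1": "decimal", "list2": "upper-alpha"}

def render(n, fmt):
    # upper-alpha only copes with 1..26 in this sketch
    return chr(ord("A") + n - 1) if fmt == "upper-alpha" else str(n)

def number_paragraphs(paras):
    """paras: iterable of (list_id, level, text); level is 1-based."""
    counters = defaultdict(lambda: [0] * 9)   # up to 9 levels per list
    out = []
    for list_id, level, text in paras:
        c = counters[list_id]
        c[level - 1] += 1
        for deeper in range(level, 9):        # restart any deeper levels
            c[deeper] = 0
        label = ".".join(render(n, FORMATS[list_id]) for n in c[:level])
        if level == 1:
            label += "."
        out.append(f"{label} {text}")
    return out

lines = number_paragraphs([
    ("list1", 1, "member of one list"),
    ("list2", 1, "member of a different list"),
    ("list1", 2, "member of one list, level2"),
])
```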

There is a whole nightmarish amount of structs and overrides for lists which
just boggle the imagination as to their complexity, and exceed the difficulty
of Word's complex scheme for distributing para and character properties in the
file format. But as a user of wv, you'd receive the information for each
individual para as to what list it thinks it's in and where it is in the list.
I imagine that in practice, for AbiWord in a situation like the above, you'd
create a list entity that can contain paragraphs. Then in the importer you'd
create the list when you find the first para that is a member of it, close it
when you find a para that isn't, and create a completely new list that starts
at the next list number of the original list when you meet back up with the
next element of the original list.
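
The importer idea above could be sketched roughly like this in Python. It is
an illustration only: split_into_runs and the dict layout are invented here,
levels are ignored to keep it short, and nothing in it is AbiWord's or wv's
actual code.

```python
def split_into_runs(paras):
    """paras: list of (list_id or None, text).

    Returns contiguous runs. A list run records the number its first item
    should start at, so a list that was interrupted picks up where it left off.
    """
    next_number = {}   # list_id -> number the next item of that list gets
    runs = []
    current = None
    for list_id, text in paras:
        if current is None or current["list_id"] != list_id:
            # Membership changed: close the old run, open a new one.
            start = next_number.get(list_id, 1) if list_id is not None else None
            current = {"list_id": list_id, "start": start, "items": []}
            runs.append(current)
        current["items"].append(text)
        if list_id is not None:
            next_number[list_id] = next_number.get(list_id, 1) + 1
    return runs

runs = split_into_runs([
    ("list1", "first"),
    ("list1", "second"),
    (None, "a paragraph that has nothing to do with the list"),
    ("list1", "third"),
])
```

Here the third run is a brand new list entity as far as the importer is
concerned, but it starts numbering at 3 rather than 1.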

I often get mails criticizing mswordview for its list handling, as it doesn't
use any of the html list constructs but instead just outputs the paragraph
as a paragraph with the list number or bullet prepended to the beginning, which
is the way I implemented it at that stage.

They're a bit of a toughy to wedge into a structured layout.

Tables are nowhere near as bad, but they have the interesting feature that
a "table" doesn't exist. Only "rows" exist, and it just so happens that two
or three rows underneath each other match up into a table (this is how it's
stored on disk, not how it's manipulated by Word as a program). So you have
no idea when starting out into a table how many rows there are, or cells.
For my original case of converting to html, things quickly got nasty. E.g. if
the first row had only one cell and the next one had two cells, and you were
doing (as I was) a one pass filter, you didn't know in advance that you should
have made the first cell a colspan=2, so you ended up with rubbish. In
practice I just split the table into two, one under the other, if the number
of cells differed from one row to another. For my next run at it :-), I'm going
to try something a wee bit more sophisticated where wv parses a table twice so
as to be able to tell a user app the table layout before it starts
giving it the content. I haven't thought too deeply about that one yet though.
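
To make the two approaches concrete, here is a small Python sketch. Rows are
just lists of cell strings, and both function names are invented for this
example. The rule for stretching a short row (the last cell takes the
leftover columns) is purely an assumption; a real implementation would
presumably work from cell positions rather than cell counts.

```python
def split_by_cell_count(rows):
    """One pass: start a new table whenever the cell count changes."""
    tables = []
    for row in rows:
        if tables and len(tables[-1][-1]) == len(row):
            tables[-1].append(row)      # same shape as the previous row
        else:
            tables.append([row])        # shape changed, open a new table
    return tables

def describe_then_emit(rows):
    """Two passes: measure the table first, then hand out the content."""
    columns = max((len(row) for row in rows), default=0)      # pass 1
    layout = {"rows": len(rows), "columns": columns}
    emitted = []
    for row in rows:                                           # pass 2
        spans = [1] * len(row)
        if row:
            spans[-1] += columns - len(row)   # assumed: last cell stretches
        emitted.append(list(zip(row, spans)))
    return layout, emitted

rows = [["title"], ["a", "b"], ["c", "d"]]
tables = split_by_cell_count(rows)
layout, emitted = describe_then_emit(rows)
```

With the one pass version the "title" row ends up as its own table; with the
two pass version the caller learns there are 2 columns before seeing any
content, so the "title" cell can be given a colspan of 2.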

C.

(I found lists the hardest, so I might have the wrong idea about them here and
there, so don't rely on my analysis of them)

Real Life: Caolan McNamara * Doing: MSc in HCI
Work: Caolan.McNamara@ul.ie * Phone: +353-86-8790257
URL: http://www.csn.ul.ie/~caolan * Sig: an oblique strategy
Question the heroic approach


