Subject: Re: Commit: fonts part 2 & 3
From: Jesper Skov (jskov@cambridge.redhat.com)
Date: Sun Mar 18 2001 - 04:58:02 CST
>>>>> "Dom" == Dom Lachowicz <cinamod@hotmail.com> writes:
Dom> This actually enables a fair bit of simple caching up the ladder
Dom> in the Run classes, so that setFont() doesn't get called all the
Dom> time. Also cuts down execution time on the default case (i.e. the
Dom> previous font is the same as the last one). There is still *tons*
Dom> of room for optimization here, and I plan on working on that in
Dom> the upcoming week. Abi is too darn slow. Hopefully this has made
Dom> it a bit better.
I've been secretly plotting to change the way text properties are
handled in AbiWord. But I've only been thinking lightly about it and
there are some issues that need some proper solutions so I've not been
wanting to raise the discussion.
However, seeing as I'll probably not get around to thinking about it in
depth for the next week, I might as well post a rant here and have
y'all think about it.
Your current changes (which I haven't looked at) apparently reduce
changes when subsequent Runs have the same properties. That's the core
idea in what I want to achieve (and I've even posted a short thing
about it a long time ago). But here it is again.
Presently we have something like (. denoting a set property):
Property        Run_0  Run_1  Run_2  Run_3  ...  Run_N
---------------------------------------------------------------
font              .      .      .      .    ...    .
ascent            .      .      .      .    ...    .
descent           .      .      .      .    ...    .
superscript       .      .      .      .    ...    .
subscript         .      .      .      .    ...    .
underline         .      .      .      .    ...    .
overline          .      .      .      .    ...    .
strikethrough     .      .      .      .    ...    .
FG color          .      .      .      .    ...    .
(other properties...)
Where each (Text)Run contains all the information to render
it. Whenever we split or join Runs, we clone / check these properties
and basically spend a lot of cycles on moving stuff around that
doesn't change. Work that doesn't actually need to be done for the
majority of the Runs: most body text will have the same attributes,
with a few exceptions such as section headings and your superscript 2
in E=mc2 :)
The alternative implementation I'm after will not be less powerful -
it simply moves all the properties into special property objects which
are maintained on a separate list.
So we get something like:
Property        Att_0  Att_1  ...  Att_N
-------------------------------------------------
font              .      .    ...    .
ascent            .      .    ...    .
descent           .      .    ...    .
superscript       .      .    ...    .
subscript         .      .    ...    .
underline         .      .    ...    .
overline          .      .    ...    .
strikethrough     .      .    ...    .
FG color          .      .    ...    .
(other properties...)
and the matching
Property        Run_0  Run_1  Run_2  Run_3  ...  Run_N
---------------------------------------------------------------
attributes      Att_0  Att_0  Att_1  Att_0  ...  Att_x
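
To make the two tables above concrete, here is a minimal C++ sketch. All
names in it (RunAttributes, Run, splitRun, distinctAttributes) are
hypothetical and not taken from the AbiWord source. It only illustrates
the idea that a Run carries one pointer to a shared attribute object
rather than its own copy of every property.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <set>
#include <string>
#include <vector>

// Hypothetical attribute object: one per distinct property set (Att_0, Att_1, ...).
struct RunAttributes {
    std::string font;
    int ascent = 0;
    int descent = 0;
    bool superscript = false;
    bool subscript = false;
    bool underline = false;
    bool overline = false;
    bool strikethrough = false;
    unsigned fgColor = 0x000000;  // RGB, assumed encoding
};

// Hypothetical Run: its text extent plus one pointer to shared attributes.
struct Run {
    std::size_t offset;
    std::size_t length;
    std::shared_ptr<const RunAttributes> attrs;
};

// Splitting a Run copies a single pointer instead of cloning each property.
// `left` is shortened in place; the new right-hand Run is returned.
Run splitRun(Run& left, std::size_t at) {
    assert(at > 0 && at < left.length);
    Run right{left.offset + at, left.length - at, left.attrs};
    left.length = at;
    return right;
}

// Number of distinct attribute objects referenced by a list of Runs.
std::size_t distinctAttributes(const std::vector<Run>& runs) {
    std::set<const RunAttributes*> seen;
    for (const Run& r : runs) {
        seen.insert(r.attrs.get());
    }
    return seen.size();
}
```

With this layout, splitting a body-text Run N times still leaves one
attribute object, which is the "leaner representation" argued for below.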
What does that get us?
o Automatic caching: well, you only do the lookups for each distinct
attribute object, so you don't need clever caching schemes to hide
the fact that properties are all over the place.
o Leaner run-time representation of documents: today, the leanest
representation you have of a document is just after loading it
since the number of Runs is minimal for the given document
content. As soon as you're editing, you're bloating the
representation with repeated attributes.
o Faster code: both due to the "caching" effect, but also because
we're not forever cloning non-changing stuff for all newly created
Runs. Oh, and because the code has to deal with a leaner
representation (hard to quantify the gain though, but it will be a
saving).
o Simpler objects: I look forward to being able to print a Run object
in GDB and it not taking up half the screen with stuff that I don't
really care about :)
At what cost?
o Well, we need to rewrite some code obviously. It will take time and
it'll probably introduce bugs.
o Worst case you get a bloated representation: that's when every Run
has a different attribute set than the previous Run (or all other
Runs depending on how clever we get, see below). The bloat is in
the form of the pointer from the Run to the attribute object, and
whatever malloc overhead is associated with having separate
structures. Content wise, there'll obviously be no change (we
already have full attribute info for each Run, it cannot be worse
than this :)
o Probably other things I haven't thought about. I'm not trying to
sugar up this baby - I'm freely admitting to not having thought
this completely through :0)
Implementation wise there are some possibilities for handling the
attribute objects. Here are three implementation suggestions that vary
in complexity (and memory footprint):
Simple: You start out with a single attribute object, the first Run
using this. When Runs are cloned they simply inherit attribute
object from the parent Run. If attribute changes are made,
create a new attribute object and let the changed Runs link to
that. Use reference counting in the attribute objects, and delete
them when no longer referenced. Note that if a span is changed, the
attribute object can be changed in place iff the number of Runs in
the span matches the attribute object's usage count. Otherwise, the
changed Runs need a new object.
Medium: Same as the simple scheme, but add a hash table (hash is a
function of the attribute properties), allowing changes to
look for existing attribute objects and use those. Avoids a
certain amount of potential representation bloat. Not sure how
much though [the effect will be bigger the longer a document
gets edited. If we coalesce on saves, even the simple scheme
will only have as many attribute objects as there are
attribute changes in the document; the hashing version will
have as many attribute objects as there are distinct
attribute sets in the document]
Harder: [This is my favorite :] As with the medium scheme, but also add
persistent attributes. These can be named attributes which
Runs can be "assigned" to use. Let user add these named
attributes as she sees fit. But also ship with a set of
defaults:
heading 1
heading 2
heading 3
superscript
subscript
...
Geddit? If the user wants, oh say, heading 1 to be set with
1.5 line spacing, she can make it so :0) And all Runs set
to use that attribute will adjust to the new setting -
throughout the document in one fell swoop.
Does this sound good or what? :) I haven't followed the
styles discussion, but I suspect it's somewhat the same
thing.
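
The three schemes above can be sketched roughly as follows. This is an
illustration under stated assumptions, not a proposal for actual AbiWord
code: Attr, AttrHash, AttrPool, intern, define, named and
canChangeInPlace are all made-up names, std::shared_ptr stands in for
hand-written reference counting, and Attr carries only three properties
to keep the example short.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <memory>
#include <string>
#include <unordered_map>

// Hypothetical, cut-down attribute object.
struct Attr {
    std::string font;
    bool superscript = false;
    double lineSpacing = 1.0;
    bool operator==(const Attr& o) const {
        return font == o.font && superscript == o.superscript &&
               lineSpacing == o.lineSpacing;
    }
};

// Medium scheme: the hash is a function of the attribute properties.
struct AttrHash {
    std::size_t operator()(const Attr& a) const {
        std::size_t h = std::hash<std::string>()(a.font);
        h = h * 31 + std::hash<bool>()(a.superscript);
        h = h * 31 + std::hash<double>()(a.lineSpacing);
        return h;
    }
};

// Simple scheme: an object may be edited in place only if every Run that
// refers to it lies inside the changed span. use_count() plays the role of
// the usage count; the pool below holds weak references, so only Runs count.
bool canChangeInPlace(const std::shared_ptr<const Attr>& attr, long runsInSpan) {
    assert(attr != nullptr);
    return attr.use_count() == runsInSpan;
}

class AttrPool {
public:
    // Medium scheme: reuse an existing object with equal properties.
    // Expired table entries are overwritten lazily and never purged here.
    std::shared_ptr<const Attr> intern(const Attr& a) {
        std::weak_ptr<const Attr>& slot = byValue_[a];
        if (auto live = slot.lock()) {
            return live;
        }
        std::shared_ptr<const Attr> fresh = std::make_shared<Attr>(a);
        slot = fresh;
        return fresh;
    }

    // Harder scheme: persistent named attributes. Redefining a name updates
    // the shared object, so every Run assigned to it sees the new setting.
    void define(const std::string& name, const Attr& a) {
        auto it = byName_.find(name);
        if (it != byName_.end()) {
            *it->second = a;
        } else {
            byName_[name] = std::make_shared<Attr>(a);
        }
    }

    // Returns the named attribute, or an empty pointer if the name is unknown.
    std::shared_ptr<const Attr> named(const std::string& name) const {
        auto it = byName_.find(name);
        if (it == byName_.end()) {
            return std::shared_ptr<const Attr>();
        }
        return it->second;
    }

private:
    std::unordered_map<Attr, std::weak_ptr<const Attr>, AttrHash> byValue_;
    std::unordered_map<std::string, std::shared_ptr<Attr>> byName_;
};
```

Named attributes are kept out of the value-keyed table on purpose: they
are mutable, and mutating an object that is also a hash key would leave
it filed under a stale hash.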
Unresolved issues:
o Impacts on the Undo/Redo mechanism
[I suspect with the reference counting, we'll be home free. Undoing
an attribute change will ask the backend to set a span to a new
(the old) attribute, and since the attribute object spans only
become smaller (ah, thermodynamics, entropy :) these changes will
apply without splitting runs or creating new attribute objects.]
Implementation time:
o Hard to say for sure. But I know someone who has a week's vacation
in a week's time and might just waste it on something silly such as
AbiWord development... Assuming people like the idea, of course.
OK, now it's time for you to shoot this thing down so I can spend my
vacation on something else :) Gimme your best!
Jesper