Re: Commit: fonts part 2 & 3


Subject: Re: Commit: fonts part 2 & 3
From: Jesper Skov (jskov@cambridge.redhat.com)
Date: Sun Mar 18 2001 - 04:58:02 CST


>>>>> "Dom" == Dom Lachowicz <cinamod@hotmail.com> writes:

Dom> This actually enables a fair bit of simple caching up the ladder
Dom> in the Run classes, so that setFont() doesn't get called all the
Dom> time. Also cuts down execution time on the default case (i.e. the
Dom> previous font is the same as the last one). There is still *tons*
Dom> of room for optimization here, and I plan on working on that in
Dom> the upcoming week. Abi is too darn slow. Hopefully this has made
Dom> it a bit better.

I've been secretly plotting to change the way text properties are
handled in AbiWord. But I've only been thinking lightly about it and
there are some issues that need some proper solutions so I've not been
wanting to raise the discussion.

However, seeing as I'll probably not get around to thinking about it in
depth for the next week, I might as well post a rant here and have
y'all think about it.

Your current changes (which I haven't looked at) apparently cut down on
font changes when subsequent Runs have the same properties. That's the
core idea in what I want to achieve (and I've even posted a short thing
about it a long time ago). But here it is again.

Presently we have something like (. denoting a set property):

Property        Run_0   Run_1   Run_2   Run_3   ...   Run_N
---------------------------------------------------------------
font              .       .       .       .     ...     .
ascent            .       .       .       .     ...     .
descent           .       .       .       .     ...     .
superscript       .       .       .       .     ...     .
subscript         .       .       .       .     ...     .
underline         .       .       .       .     ...     .
overline          .       .       .       .     ...     .
strikethrough     .       .       .       .     ...     .
FG color          .       .       .       .     ...     .
(other properties...)

Where each (Text)Run contains all the information to render
it. Whenever we split or join Runs, we clone / check these properties
and basically spend a lot of cycles on moving stuff around that
doesn't change. Work that doesn't actually need to be done for the
majority of the Runs: most body text will have the same attributes,
with a few exceptions such as section headings and your superscript 2
in E=mc2 :)

The alternative implementation I'm after will not be less powerful -
it simply moves all the properties into special property objects which
are maintained on a separate list.

So we get something like:

Property        Att_0   Att_1   ...   Att_N
-------------------------------------------------
font              .       .     ...     .
ascent            .       .     ...     .
descent           .       .     ...     .
superscript       .       .     ...     .
subscript         .       .     ...     .
underline         .       .     ...     .
overline          .       .     ...     .
strikethrough     .       .     ...     .
FG color          .       .     ...     .
(other properties...)

and the matching

Property        Run_0   Run_1   Run_2   Run_3   ...   Run_N
---------------------------------------------------------------
attributes      Att_0   Att_0   Att_1   Att_0   ...   Att_x
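
To make the two tables above concrete, here is a minimal C++ sketch of
Runs that point at shared attribute objects. The names Attributes, Run
and splitRun are made up for illustration and are not AbiWord's actual
classes, and std::shared_ptr merely stands in for whatever reference
counting would really be used.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <string>

// Hypothetical property bundle (one column of the Att_x table).
// Field names are illustrative, not taken from the AbiWord source.
struct Attributes {
    std::string font;
    int ascent = 0;
    int descent = 0;
    bool superscript = false;
    bool subscript = false;
    bool underline = false;
    bool overline = false;
    bool strikethrough = false;
    unsigned fgColor = 0x000000;
};

// A Run keeps only its text extent plus a shared, read-only handle
// to the attribute object it uses.
struct Run {
    std::size_t offset;
    std::size_t length;
    std::shared_ptr<const Attributes> attrs;
};

// Splitting a Run copies one pointer instead of cloning every property.
// Returns the new tail Run; `r` is shortened to the head.
Run splitRun(Run& r, std::size_t at) {
    Run tail{r.offset + at, r.length - at, r.attrs};
    r.length = at;
    return tail;
}
```

With this layout, splitting or joining Runs never touches the
properties themselves; only the reference count of the shared object
changes.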

What does that get us?

 o Automatic caching: well, you only do the lookups for each distinct
   attribute object, so you don't need clever caching schemes to hide
   the fact that properties are all over the place.

 o Leaner run-time representation of documents: today, the leanest
   representation you have of a document is just after loading it,
   since the number of Runs is minimal for the given document
   content. As soon as you're editing, you're bloating the
   representation with repeated attributes.

 o Faster code: due to the "caching" effect, and also because
   we're not forever cloning non-changing stuff for all newly created
   Runs. Oh, and because the code has to deal with a leaner
   representation (hard to quantify the gain though, but it will be a
   saving).

 o Simpler objects: I look forward to being able to print a Run object
   in GDB and it not taking up half the screen with stuff that I don't
   really care about :)

At what cost?

 o Well, we need to rewrite some code obviously. It will take time and
   it'll probably introduce bugs.

 o Worst case you get a bloated representation: that's when every Run
   has a different attribute set than the previous Run (or all other
   Runs depending on how clever we get, see below). The bloat is in
   the form of the pointer from the Run to the attribute object, and
   whatever malloc overhead is associated with having separate
   structures. Content wise, there'll obviously be no change (we
   already have full attribute info for each Run, it cannot be worse
   than this :)

 o Probably other things I haven't thought about. I'm not trying to
   sugar up this baby - I'm freely admitting to not having thought
   this completely through :0)

Implementation wise there are some possibilities for handling the
attribute objects. Here are three implementation suggestions that vary
in complexity (and memory footprint):

Simple: You start out with a single attribute object, the first Run
        using this. When Runs are cloned they simply inherit attribute
        object from the parent Run. If attribute changes are made,
        create a new attribute object and let the changed Runs link to
        that. Use reference counting in the attribute objects, delete
        when no longer referenced. Note that if a span is changed, the
        attribute object can be changed in place iff the number of
        Runs in the span matches the object's usage count (i.e. the
        span covers every Run using it). Otherwise, the changed Runs
        need a new object.

Medium: Same as the simple scheme, but add a hash table (hash is a
        function of the attribute properties), allowing changes to
        look for existing attribute objects and use those. Avoids a
        certain amount of potential representation bloat. Not sure how
        much though [the effect will be bigger the longer a document
        gets edited. If we coalesce on saves, even the simple scheme
        will only have as many attribute objects as there are
        attribute changes in the document - the hashing version will
        have as many attribute objects as there are distinct
        attribute sets in the document]

Harder: [This is my favorite :] As with the medium scheme, but also add
        persistent attributes. These can be named attributes which
        Runs can be "assigned" to use. Let the user add these named
        attributes as she sees fit. But also ship with a set of
        defaults:
          heading 1
          heading 2
          heading 3
          superscript
          subscript
          ...

         Geddit? If the user wants, oh say, heading 1 to be set with
         1.5 line spacing, she can make it so :0) And all Runs set
         to use that attribute will adjust to the new setting -
         throughout the document in one fell swoop.

         Does this sound good or what? :) I haven't followed the
         styles discussion, but I suspect it's somewhat the same
         thing.
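
The three schemes above could be sketched roughly as follows. This is
only an illustration under assumptions: Attributes, Run, changeSimple,
AttributePool and NamedAttributes are hypothetical names, the attribute
set is cut down to three fields, and std::shared_ptr / std::weak_ptr
stand in for hand-written reference counting.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <map>
#include <memory>
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical, trimmed-down attribute set; field names are illustrative.
struct Attributes {
    std::string font = "Times";
    bool superscript = false;
    double lineSpacing = 1.0;
    bool operator==(const Attributes& o) const {
        return font == o.font && superscript == o.superscript &&
               lineSpacing == o.lineSpacing;
    }
};
using AttrPtr = std::shared_ptr<Attributes>;
struct Run { AttrPtr attrs; };

// Simple: edit in place iff the span holds every reference to the object,
// otherwise give the changed Runs a fresh copy. Assumes a non-empty span
// whose Runs all share one attribute object.
void changeSimple(const std::vector<Run*>& span,
                  const std::function<void(Attributes&)>& edit) {
    AttrPtr& shared = span.front()->attrs;
    if (static_cast<std::size_t>(shared.use_count()) == span.size()) {
        edit(*shared);
        return;
    }
    AttrPtr fresh = std::make_shared<Attributes>(*shared);
    edit(*fresh);
    for (Run* r : span) r->attrs = fresh;
}

// Medium: the hash is a function of the attribute properties.
struct AttrHash {
    std::size_t operator()(const Attributes& a) const {
        std::size_t h = std::hash<std::string>{}(a.font);
        h = h * 31 + std::hash<bool>{}(a.superscript);
        h = h * 31 + std::hash<double>{}(a.lineSpacing);
        return h;
    }
};

// Looks for an existing object with equal properties before creating one.
// weak_ptr keeps the table from pinning objects no Run refers to any more;
// expired slots are reused but never erased in this sketch. Objects handed
// out here must not be edited in place, or their table key goes stale.
class AttributePool {
public:
    AttrPtr intern(const Attributes& wanted) {
        std::weak_ptr<Attributes>& slot = table_[wanted];
        AttrPtr found = slot.lock();
        if (!found) {
            found = std::make_shared<Attributes>(wanted);
            slot = found;
        }
        return found;
    }
private:
    std::unordered_map<Attributes, std::weak_ptr<Attributes>, AttrHash> table_;
};

// Harder: named attribute objects that Runs are assigned to. Editing the
// named object in place restyles every Run that uses it.
using NamedAttributes = std::map<std::string, AttrPtr>;
```

Note the tension between the schemes: the simple one wants to mutate
shared objects in place, while the hashing one works best if objects
are never mutated once they are in the table. A real design would have
to pick one rule or re-key the table on every in-place change.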

Unresolved issues:

 o Impacts on the Undo/Redo mechanism
   [I suspect with the reference counting, we'll be home free. Undoing
   an attribute change will ask the backend to set a span to a new
   (the old) attribute, and since the attribute object spans only
   become smaller (ah, thermodynamics, entropy :) these changes will
   apply without splitting runs or creating new attribute objects.]
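
As a rough illustration of the undo idea, assuming an undo record can
simply hold references to the old attribute objects: AttrChange,
setAttributes, undo and redo below are hypothetical names, and the
sketch assumes the recorded Runs still exist when the change is undone,
which real split/join handling would have to guarantee.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <string>
#include <vector>

// Hypothetical, minimal attribute object and Run.
struct Attributes { std::string font; bool underline = false; };
using AttrPtr = std::shared_ptr<const Attributes>;
struct Run { AttrPtr attrs; };

// Hypothetical undo record: which Runs changed and what each pointed at.
struct AttrChange {
    std::vector<Run*> runs;
    std::vector<AttrPtr> before;  // these references keep the old objects alive
    AttrPtr after;
};

// Point every Run in the span at `next` and remember the previous handles.
AttrChange setAttributes(const std::vector<Run*>& span, AttrPtr next) {
    AttrChange rec{span, {}, next};
    for (Run* r : span) {
        rec.before.push_back(r->attrs);
        r->attrs = next;
    }
    return rec;
}

// Undo and redo only reassign pointers; no attribute object is rebuilt.
void undo(const AttrChange& rec) {
    for (std::size_t i = 0; i < rec.runs.size(); ++i)
        rec.runs[i]->attrs = rec.before[i];
}

void redo(const AttrChange& rec) {
    for (Run* r : rec.runs) r->attrs = rec.after;
}
```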

Implementation time:

 o Hard to say for sure. But I know someone who has a week's vacation
   in a week's time and might just waste it on something silly such as
   AbiWord development... Assuming people like the idea, of course.

OK, now it's time for you to shoot this thing down so I can spend my
vacation on something else :) Gimme your best!

Jesper


