Subject: Re: speed results
From: Paul Rohr (paul@abisource.com)
Date: Tue Apr 17 2001 - 14:57:58 CDT


At 09:04 AM 4/15/01 +1000, Martin Sevior wrote:
>With a header in place getEdittableBounds() searches for the end of the
>editable region by skipping through the SectionLayout classes until it
>bumps up against a HdrFtrSection. Then it finds the last block, then the
>last run in the block, and returns the docPosition of the run. That's the
>end of the editable region.

Cool!

>On the other hand, getBounds() skips through all
>the Frags in the document until it runs out of them, then finds the
>position of the last one.

Yep. Note, however, that in *both* cases you still walk the entire frag
list, which gets slower and slower as documents balloon.

Jeff's original code to calculate DocPositions just walks through all
Frags from the beginning of the document, adding lengths. It's not fast,
but it's guaranteed to be reliable, which is good, since any math errors
here would be devastating. (Think about it.)
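
For anyone who hasn't read that code, here's roughly the shape of it. (A
sketch only; the names Frag, getNext(), getLength(), and getFragPosition()
are invented for illustration here, not the actual piece table classes.)

  // The document position of a frag is the sum of the lengths of every
  // frag that precedes it in the list.
  struct Frag
  {
      Frag*        m_pNext;
      unsigned int m_length;

      Frag*        getNext() const   { return m_pNext; }
      unsigned int getLength() const { return m_length; }
  };

  // O(n) in the number of frags, but trivially correct.
  unsigned int getFragPosition(const Frag* pFragList, const Frag* pTarget)
  {
      unsigned int pos = 0;
      for (const Frag* pf = pFragList; pf && pf != pTarget; pf = pf->getNext())
          pos += pf->getLength();
      return pos;
  }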

At the time, the two of us discussed a few caching strategies which could
speed up this operation for common cases, but didn't have enough data on
actual use cases to choose the right one. For example:

  - cache the last lookup results
  - cache the overall doc size
  - cache aggregate lengths of a strux frag's contents (walk vs. skip)
  - etc.

The first is the simplest, because it only speeds up "repeat" lookups and
doesn't need to interact with the editing code. The second would speed up
getBounds, and would allow walking from whichever end of the document is
closer, but at the cost of a minimal interaction on each edit. The third
enables more speedups, but requires a lot more edit-time interaction, and
could theoretically slow down typing. I'm sure that clever folks with a good
profiler could come up with even more alternatives to these three.
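
To make the tradeoffs concrete, here's a rough sketch of the first strategy
layered on top of the full walk above. (Again, invented names. I've also
shown a defensive invalidate() hook; whether the editing code actually needs
to call it, and where, is exactly the sort of thing the profile results
should settle.)

  // Strategy 1: remember the last (frag, position) pair, so that repeating
  // the lookup for the same frag is O(1). Any edit that changes lengths
  // ahead of the cached frag should call invalidate(); one call site,
  // no per-frag bookkeeping.
  class FragPositionCache
  {
  public:
      FragPositionCache() : m_pLastFrag(0), m_lastPos(0) {}

      unsigned int getPosition(const Frag* pFragList, const Frag* pTarget)
      {
          if (pTarget && pTarget == m_pLastFrag)
              return m_lastPos;                        // cache hit: no walk

          unsigned int pos = getFragPosition(pFragList, pTarget);  // full walk
          m_pLastFrag = pTarget;
          m_lastPos   = pos;
          return pos;
      }

      void invalidate() { m_pLastFrag = 0; m_lastPos = 0; }

  private:
      const Frag*  m_pLastFrag;
      unsigned int m_lastPos;
  };

The second strategy would look much the same, except that the cached value
is the total document length, bumped up or down (or just invalidated)
whenever an edit changes a frag's length.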

Caching code tends to be tricky to debug and maintain, and at the time we
had other things to get working (like typing, for instance). Thus, we
agreed to file the TODO bug mentioned earlier in this thread to come back
and speed this up later. Sounds like now is the time, huh? :-)

Any volunteers? If so, I'd suggest that any patches or commits in this area
be accompanied by:

  - before and after profile results
  - a theoretical explanation of *why* it's faster
  - an explanation of why *this* cache isn't brittle

Thanks,
Paul

PS: For critical caches like this, it might also be worth stealing a trick
from the Excel development team and adding some DEBUG-only code which
compares the results of *both* approaches:

  - the cache, and
  - the full-walk it supersedes

thereby detecting any runtime errors in the caching code. It shouldn't make
debug builds any slower than they already are, and it's a *wonderful* way to
isolate any problems that might arise.
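
In code, the cross-check could be as small as this, using the invented names
from the sketches above and a plain assert (the real patch would presumably
use whatever DEBUG-only assert macro we already rely on):

  #include <cassert>

  // DEBUG-only sanity check: answer from the cache, but verify the answer
  // against the full walk it supersedes. Release builds skip the extra walk.
  unsigned int getPositionChecked(FragPositionCache& cache,
                                  const Frag* pFragList, const Frag* pTarget)
  {
      unsigned int pos = cache.getPosition(pFragList, pTarget);
  #ifdef DEBUG
      assert(pos == getFragPosition(pFragList, pTarget));
  #endif
      return pos;
  }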


