commit: Some of these. Re: More 5291 speedups

From: Martin Sevior (msevior@seviorpc.ph.unimelb.edu.au)
Date: Fri Oct 03 2003 - 04:28:41 EDT

  • Next message: Hubert Figuiere: "Re: Commit: lots of spell checking changes"

                                                                                    
    CVS:
    ----------------------------------------------------------------------
    CVS: Enter Log. Lines beginning with `CVS:' are removed automatically
    CVS:
    CVS: Committing in .
    CVS:
    CVS: Modified Files:
    CVS: text/fmt/xp/fl_SectionLayout.cpp
    CVS: text/fmt/xp/fp_ContainerObject.cpp
    CVS: text/fmt/xp/fp_ContainerObject.h
    CVS: text/fmt/xp/fp_TableContainer.cpp
    CVS:
    ----------------------------------------------------------------------
    Jonny Lee's latest speedups. Great work!
                                                                                    I committed the clearContainer and the binary search for tablecells. These not only improve the import performance of abiword, they greatly improve the performance of the application while editting these large documents.

    We're now roughly on par with Open Office and MS Word for editting the
    RTFSpec although we could certainly improve the look at feel while
    editting. (For example we could just do a break section until after the
    current page has been reformatted before redrawing the page and leave
    the rest of the break section until after the current page is redrawn)

    (Look in fb_ColumnBreaker.cpp)

    I also put in some checks in the getCellAtRowCol to return NULL if the
    requested position is not in the table.

    I didn't commit the rest of the fixes because of the numerous conflicts
    of the patch with the new background image stuff.

    Johnny could you please do a cvs update to merge in current cvs, then
    resubmit the rest of the patch?

    Thanks!

    Martin

    Great Work :-)

    On Fri, 2003-10-03 at 16:55, Martin Sevior wrote:
    > On Fri, 2003-10-03 at 15:25, Johnny Lee wrote:
    > > Summary:
    > >
    > > There are several speedups here and some size reduction with these changes.
    > >
    > > Import and building the RTF doc has been reduced from 23 seconds to ~15
    > > seconds total.
    > >
    > > Some of the changes may not be acceptable since I'm concentrating on
    > > speeding up loading the RTF spec and I haven't been able to determine if
    > > some of my assumptions are valid.
    > > I'm just proposing a solution. It may not be the best solution since I
    > > barely know the code and the coding style. I did increase the size of
    > > fp_Container objects by 4 bytes and the size of UT_Stringbuf objects by 4
    > > bytes. Feel free to code up a different solution.
    > >
    > > The code for change #3 is in the attached bug5291e.txt.
    > > The code for change #1, #2, and #4 is in bug5291f.txt.
    > >
    > >
    > > Change #1
    > > ==============
    > > After profiling Abiword with all the previous 5291 patches, the function
    > > listed at the top was fp_TableContainer::getCellAtRowColumn.
    > >
    > > Looking at the code for this function, there's a linear search thru the
    > > cells in a table to find the cell in the appropriate spot.
    > >
    > > When I dumped the position portion of the cell objects, they turned up in
    > > sorted order, from top to bottom, left to right.
    > >
    > > *** If this assumption is not correct, then the patch will probably fail to
    > > work correctly. ***
    > >
    > > Using that sorted array, I replaced the linear search with a binary search.
    > >
    > > This saved about 4 seconds on building the document.
    > >
    > > Change #2
    > > ==============
    > > Profiling the new version, the top function listed was
    > > fp_Container::clearBrokenContainers. This is a recursive function which is
    > > called millions of times while building the document.
    > >
    > > It took me a long time to figure out the function of 'broken containers'.
    > >
    > > Robert Wilhelm had a patch so that clearBrokenContainers would only write to
    > > the m_pMyBrokenContainer field if it was non-NULL.
    > >
    > > I added code to see how often the code NULLed out this field.
    > >
    > > Less then 20 times for the RTF spec! Millions of calls to NULL a variable
    > > less than 20 times.
    > >
    > > The reason it has to do this recursively is that the parent container
    > > doesn't know anything about broken containers in its children.
    > >
    > > And how many broken containers are there?
    > >
    > > fp_Container::setMyBrokenContainer is only called about 120 times.
    > >
    > > So I added a field to fp_Container which represents the number of broken
    > > containers in the current container and any children.
    > >
    > > When fp_Container::setMyBrokenContainer is called, the code goes from the
    > > current container up to its parent, grandparent, etc. and increments the
    > > count of broken containers in each of them.
    > >
    > > When fp_Container::clearBrokenContainers is called, if m_pMyBrokenContainer
    > > is cleared, the broken--container-count is decremented in the current
    > > container, and in its parent and grandparent, etc.
    > >
    > > When it loops through the children containers, the code checks their
    > > broken-container-count. If it is zero, then we don't bother calling
    > > clearBrokenContainers on them.
    > >
    > > This saves a couple of seconds building the RTF spec.
    > >
    >
    > Hi Johnny,
    > I'm totally responsible the above code. The binary search for
    > getCellAtRowColumn is a great idea. I'll check your code and commit it.
    > The assumption of left before right, top before bottom is always valid.
    >
    > The clearBrokenContainers patch is a good general purpose idea for when
    > we're editing documents too. Once again after checking I'll commit
    > these.
    >
    > The string class changes I'll leave to Dom or FJF who had a lot more to
    > do with the code than me.
    >
    > Cheers
    >
    > Martin
    >
    > > Change #3
    > > ==============
    > > Profiling again, showed the following functions near the top:
    > >
    > > operator delete
    > > UT_String::UT_String
    > > UT_StringPtrMap::find_slot
    > > UT_StringPtrMap::pick
    > >
    > > Looking at the code, UT_StringPtrMap::pick creates a UT_String object which
    > > allocates space for the string, copies the string, and gets passed to
    > > UT_StringPtrMap::find_slot. Afterwards, the UT_String is destroyed which
    > > includes freeing the allocated string.
    > >
    > > I modified UT_Stringbuf and UT_String so that you can pass in a
    > > null-terminated string and no memory is allocated for the string, the code
    > > just references the string passed in - ideal when used for a read-only
    > > string, such as checking if something is in a hash table.
    > >
    > > My code changes add a few checks to the base string classes so that you
    > > can't modify the external string. You may want to come up with another
    > > solution that doesn't involve such extensive changes in the class. The idea
    > > is to get rid of the memory allocation/deallocation, etc. when the string is
    > > only examined. There are other places in ut_hash.cpp where this occurs as
    > > well.
    > >
    > > Change #4
    > > ==============
    > > Finally, a patch that deals more with reducing size than speed. I noticed in
    > > fp_TableContainer.cpp that there's lotsa code similar to this:
    > >
    > > getNthRow(i)->requisition = 0;
    > > getNthRow(i)->allocation = 0;
    > > getNthRow(i)->spacing = m_iRowSpacing;
    > > getNthRow(i)->need_expand = 0;
    > > getNthRow(i)->need_shrink = 0;
    > > getNthRow(i)->expand = 0;
    > > getNthRow(i)->shrink = 0;
    > >
    > > I looked at the disassembly of the code. The compiler didn't realize that
    > > the result from getNthRow(i) was the same for every line in this block of
    > > code. So the compiler generated code which would calculate the result for
    > > each statement.
    > >
    > > I created a variable and assigned the result from getNthRow/getNthCol to the
    > > variable and used the variable instead.
    > >
    > > This saved about 16 bytes each time with VC6.
    > >
    > > _________________________________________________________________
    > > Instant message during games with MSN Messenger 6.0. Download it now FREE!
    > > http://msnmessenger-download.com
    >



    This archive was generated by hypermail 2.1.4 : Fri Oct 03 2003 - 03:42:53 EDT