Fwd: Improving AbiWord's PT

From: Martin Sevior <msevior_at_gmail.com>
Date: Thu Jun 04 2009 - 15:36:16 CEST

From Joaquin

---------- Forwarded message ----------
From: Joaquin Cuenca Abela <e98cuenc@gmail.com>
Date: Thu, Jun 4, 2009 at 7:10 PM
Subject: Re: Improving AbiWord's PT
To: Martin Sevior <msevior@gmail.com>
Cc: abiword-dev <abiword-dev@abisource.com>

Hi guys, it has been a long time!

There is not much that I can add beyond what I wrote about the PT back
in the days, the idea was to substitute the linked list of fragments
by what a later learn some people call a "counted tree", ie a balanced
tree that keep track on each node of how many elements are contained
in each of its descendants.

I think the choice of tree is going to be largely irrelevant for
Abiword, probably a b-tree one will be faster than a binary one with
today's memory architecture, but I don't think it will ever make a
measurable difference, so I will suggest to go for the simplest thing
that works.

I wrote a rb-tree that worked (as far as I know) in the latex importer
(I guess I was trying to get the price to the most over engineered
patch ever... :-) ), it can be a starting point for this. You can also
take the code in the web page that I wrote about this, if I remember
it was not working correctly.

bdb also has an implementation of this data structure. If you're
wondering why people don't add such to nodes to every b-tree, given
that it will allow to implement efficiently the LIMIT restrict in SQL
for certain queries, it's because its concurrency sucks, you have to
lock the root for each insert / delete. But given that Abiword is
single threaded this should not be a problem.

Cheers,

On Wed, Jun 3, 2009 at 3:26 AM, Martin Sevior <msevior@gmail.com> wrote:
> Hi everyone and especially Joaquin!
>
> (PS. congrats on your current gig with Google!)
>
>                  Our GSoC student,  Aditya Manthramurthy, has been
> making very impressive progress on his project to improve AbiWord's
> table layout algorithm. In fact he has identified and fixed the origin
> of our n^2 growth of the table layout code. However in his
> investigations he found that the layout code is actually a trivial
> component of the amount of time it takes to insert a large table.
>
> He then profiled our code discovered that the time is dominated by
> "insertStrux(...)
>
> ------------------
> Aditya Manthramurthy wrote:
>> Hi Martin,
>>
>> I have isolated that most of the time taken seems to be in calls to various insertStrux funcs in for loop starting at src/text/fmt/xp/fv_View_cmd.cpp - line 3305. All the work to layout the table happens after this for loop. This means that the PieceTable needs to sped up. Not knowing much about the piece table makes it harder to profile that part of the code more (using just valgrind), without setting up clock_gettime()s between bits of code. So I am not really sure about how to go on with this.
>>
>
> Digging in deeper using valgrind tells that
> pt_PieceTable::_realInsertStrux() takes up most of the insert table
> time.
>
>
> The screenshot below, shows the callees of the above function in the
> bottom part of the right side. The list is sorted in descending order
> of cost. [Here, cost seems to be proportional to time taken, afaik.]
>
> <screenshot supressed>
>
>
> The above dump is from a run of my branch of abiword, with an insert
> of a 400x4 table (it took 20-23 secs with valgrind!). The problem is
> very much out of the table layout code I guess. How do we go about
> improving the piece table?
>
> ------------------------------------
>
> Here is my response to Aditya:
>  --------------------------------------------------------------
> HI Aditya,
>             OK, the AbiWord PieceTable has a well known n dependence
> for insertions and deletions. I wrote this up on our wiki. You can
> find it here.
>
> http://www.abisource.com/wiki/AbiWordDevelopment#PieceTable_speedups.
>
> I think we're hitting the issues described by Joaquin.
>
> http://e98cuenc.free.fr/wordprocessor/piecetable.html
>
> Every insertStrux, means we recalculate the offset vector of the
> entire PiecetTable. Since the size of this vector increases with the
> size of the table,
> it means we get a n^2 dependence of the total insertion time.
>
> Look at the code in:
>
> src/text/ptbl/xp/pf_Fragments.cpp::cleanFrags()
>
> That is being called on every insertion. Note also that
> vecFrags grows with every insertion.
>
> So I can see two ways to go forward. We can work around not actually
> calling cleanFrags on every insertion or we can attempt to implement
> the
> more sophisticated PieceTable described by Joaquin.
>
> I would rather attempt to do the more sophisticated PieceTable. It
> would substantially improve AbiWord's performance for large documents.
>
> BTW I'm going to forward this discussion to the abiword ML.
>
> This is very interesting work Aditya!
>
> Cheers
>
> Martin
>
> ----------------------------------------------------------------------------------
>
> So there it is folks. Maybe the time has come to attempt Joaquin's PieceTable.
>
> Cheers!
>
> Martin
>

--
Joaquin Cuenca Abela
Received on Thu, 4 Jun 2009 23:36:16 +1000

This archive was generated by hypermail 2.1.8 : Thu Jun 04 2009 - 15:42:29 CEST