From: msevior@physics.unimelb.edu.au
Date: Wed Sep 24 2003 - 07:13:37 EDT
>
> The present system has two real weaknesses; one is the need to do
> string comparisons when comparing properties, and the other is the  fact
> that we have to clone the string values when making copies,
> passing properties through functions, etc. If the properties were stored
>  internally in a numerical format both of these would go away. I think
> Hub's proposal should be considered.
>
> Tomas
The problem is in pt_VarSet::addIfUniqueAP(...) in pt_VarSet.cpp.
If, instead of requiring each index to be unique, we just add every
property set, we get a dramatic increase in loading speed at the cost of
extra memory usage.
What is happening is that almost every cell in that huge 3000-cell table
gives a unique indexAP, since every one has different properties. But
we don't know that until we've finished scanning the entire Att/Prop set.
After that we add another.
So the load time grows quadratically with document size.
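To make the quadratic growth concrete, here is a minimal sketch. It is not
the AbiWord code: ToyTable, addIfUnique and comparisonsForUniqueCells are
hypothetical names, and a property set is modelled as a plain string. It
only counts how many comparisons a scan-then-append table performs when
every entry turns out to be different.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical stand-in for an attribute/property table; not AbiWord code.
struct ToyTable {
    std::vector<std::string> entries;
    std::size_t comparisons = 0;

    // Scan every stored entry for a match; append only if none is found.
    std::size_t addIfUnique(const std::string& props) {
        for (std::size_t i = 0; i < entries.size(); ++i) {
            ++comparisons;
            if (entries[i] == props)
                return i;               // reuse the existing index
        }
        entries.push_back(props);
        return entries.size() - 1;      // index of the new entry
    }
};

// Count the comparisons made when n entries, all different, are added.
std::size_t comparisonsForUniqueCells(std::size_t n) {
    ToyTable t;
    for (std::size_t i = 0; i < n; ++i)
        t.addIfUnique("cell-" + std::to_string(i));
    return t.comparisons;
}
```

In this model, adding n all-different entries costs n*(n-1)/2 comparisons,
so 3000 entries need 4,498,500 of them, and doubling the entry count
roughly quadruples the work.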
However, if we put in the following ifdef....
bool pt_VarSet::addIfUniqueAP(PP_AttrProp * pAP, PT_AttrPropIndex * papi)
{
        // Add the AP to our tables iff it is unique.
        // If not unique, delete it and return the index
        // of the one that matches.  If it is unique, add
        // it and return the index where we added it.
        // return false if we have any errors.
        UT_ASSERT(pAP && papi);
        UT_uint32 subscript = 0;
#if 0
        UT_uint32 table = 0;
        for (table=0; table<2; table++)
                if (m_tableAttrProp[table].findMatch(pAP,&subscript))
                {
                        // use the one that we already have in the table.
                        delete pAP;
                        *papi = _makeAPIndex(table,subscript);
                        return true;
                }
        // we did not find a match, so we store our new one.
#endif
        if (m_tableAttrProp[m_currentVarSet].addAP(pAP,&subscript))
        {
                *papi = _makeAPIndex(m_currentVarSet,subscript);
                return true;
        }
        // memory error of some kind.
        UT_ASSERT(UT_SHOULD_NOT_HAPPEN);
        delete pAP;
        return false;
}
The document in bug 5290 loads into the piecetable in 8 seconds on my 1
GHz laptop with the ifdef, as opposed to 40 seconds without it.
The full document load (layout stops filling) takes 24 seconds with the
ifdef 0 as opposed to 55 seconds without it.
By comparison, MS Word 2000 running under wine takes 4 seconds to load its
piecetable and a total of 11 seconds to fully load the document
(when its layout structures stop filling).
Sorry I don't have Open Office on this machine for another comparison.
I've done some quick tests on my machine and AbiWord still works fine with
the ifdef 0. I guess we should do extensive tests as well as memory usage
comparisons before and after the #if 0.
My guess is that the ifdef is worth it. The memory usage will grow
linearly with document size as it does now from the storing of text and
building layout structures.
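As a rough illustration of the memory side of the trade-off, the sketch
below counts how many property sets end up stored under each policy. It is
a toy model, not AbiWord code: storedWithDedup, storedWithoutDedup and
makeInput are hypothetical names, and property sets are again plain
strings.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical model: entries stored when duplicates are shared.
std::size_t storedWithDedup(const std::vector<std::string>& input) {
    std::vector<std::string> table;
    for (const std::string& p : input) {
        bool found = false;
        for (const std::string& q : table) {
            if (q == p) { found = true; break; }
        }
        if (!found)
            table.push_back(p);
    }
    return table.size();
}

// Hypothetical model: entries stored when every set is appended as-is.
std::size_t storedWithoutDedup(const std::vector<std::string>& input) {
    return input.size();
}

// Build n property strings drawn from `distinct` different values
// (distinct must be greater than zero).
std::vector<std::string> makeInput(std::size_t n, std::size_t distinct) {
    std::vector<std::string> v;
    for (std::size_t i = 0; i < n; ++i)
        v.push_back("props-" + std::to_string(i % distinct));
    return v;
}
```

In this model, when all 3000 inputs differ, both policies store 3000
entries, so skipping the check costs no extra memory. When the same 3000
inputs use only 10 distinct values, the unique-only policy stores 10
entries and the append-always policy stores 3000. How real documents fall
between those two cases is what the memory comparisons would need to show.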
If we wish to save on memory we could always store the strings gzipped.
So everyone, should we put in the #if 0 code?
Cheers
Martin