Re: 5291 - A Task Force

From: Dom Lachowicz (domlachowicz@yahoo.com)
Date: Thu Sep 25 2003 - 13:49:16 EDT

  • Next message: Dom Lachowicz: "Re: 5291 - A Task Force"

    Hi Mike,

    > I'd say: Yes. All two of them should internally be
    > represented in the same
    > way, and when written out to a (possibly new) file
    > they should be
    > normalized.

    Some pre-processing normalization might help here.
    Like stripping out extraneous spaces, ordering props
    alphabetically, etc... Then again, it might be
    expensive to do such a thing.

    > Indeed! It also displays a crucial flaw in my idea.
    > Yes, a single 32-bit
    > value would still be good to represent such
    > "combined" props, but how to
    > represent it?

    See above for some proposed "intelligence". However,
    you need to make sure that your ordering + integer
    assignment phase is faster than what we're doing now.
    This isn't clear to me to be the case (though it may
    very well be).

    Since we've got 27000 add/store cases, that will
    result in 27000 re-orderings, plus 47,000,000 integer
    comparisons. The cost of handing out a new integer
    will be constant-time (i++). It all hinges on how
    expensive the reordering phase is. A quick test could
    be to make use of FJF's new string map class which
    will take a CSS props block and then you sort its
    contents.

    > > We have
    > > quadratic behavior in our comparison routine. We
    > must
    > > perform 47,000,000 comparisons from something
    > that's
    > > only called 27,000 times.
    >
    > That can't be right, can it? 27K^2 = 729K, more than
    > an order of magnitude
    > worse (like it wasn't bad anough already ;-) ).

    O(n^2), not n^2. What we have is a situation like
    this:

    For the first item, the list is empty. Insert it. For
    the second item, you must do 1 comparison to see if
    it's a duplicate. For the second, you must do (on
    average) 1, but maybe 2. You'll see that you have an
    alogorithm that looks like this for an "average" case
    where you're as likely to find something in the first
    half as you are in the second half:

    27000/2 + 26999/2 + 26998/2 + ...
    or
    (n/2) + ((n-1)/2) + ((n-2)/2) + ...

    So you end up having things that are roughly (n *
    (n/2)) operations, or O(n^2) behavior. In any case,
    it's bad.

    > > Yes, strcmp is a little
    > > wasteful. Is the solution to replace strcmp with
    > an
    > > integer comparison, or to find a solution that
    > doesn't
    > > require calling strcmp (or an equivalent integer
    > > comparison) 47 million times?
    >
    > That's rethoric. :-) What is really want to ask is:
    > Do you know of an algorithm to take AW away from
    > this n^2-hell? If so, speak
    > up!

    Not so much rhetoric as me being genuinely confused as
    to a good course of action. If we can do better than
    O(n^2), that's great. I'm not sure we can. But I
    haven't given it enough thought.

    If that fials, we might turn that strcmp into a if()
    statement. That _might_ be a big boost too. That is,
    *assuming that normalization of properties and
    assignment of integer ids proves fast*. I mean fast
    here in terms of both BIG-O complexity, and in
    wall-clock times, with a deference given to wall-clock
    times since our users really are looking at their
    watches at this point.
     
    > I myself hasn't been in the loop for quite some
    > time, but I do recognize the
    > same problems popping up over and over - and this is
    > one of'em. I believe
    > trying to remove that n^2 property is a major
    > concern. Should that prove
    > (too) difficult, at least going from O=str(i)cmp to
    > O=integer_comparison
    > would probably buy a *lot*.

    It might. It's worth prototyping at least. If each by
    itself proves worthwhile, combining the two would
    probably be an even bigger win.

    Dom

    __________________________________
    Do you Yahoo!?
    The New Yahoo! Shopping - with improved product search
    http://shopping.yahoo.com



    This archive was generated by hypermail 2.1.4 : Thu Sep 25 2003 - 14:04:37 EDT