Re: 5291 RTF import slow - the solution :-)

From: Raphael Finkel (raphael@cs.uky.edu)
Date: Wed Sep 24 2003 - 12:15:55 EDT

  • Next message: Ted Lemon: "Re: 5291 RTF import slow - the solution :-)"

    It sounds like hashing is just the thing you want. Every string gets
    hashed (can be fast), placed in a hash table (resolving collisions
    somehow; I prefer external chaining). Checking for equality and
    insertion in the table each requires about O(1) time (the time depends
    on the string length, whether your table is big enough, and how unlucky
    you are about collisions). It scales linearly with document length, not
    quadratically. It's a well-known technique.

    Raphael



    This archive was generated by hypermail 2.1.4 : Wed Sep 24 2003 - 12:31:26 EDT