From: Raphael Finkel (raphael@cs.uky.edu)
Date: Wed Sep 24 2003 - 12:15:55 EDT
It sounds like hashing is just the thing you want. Every string gets
hashed (can be fast), placed in a hash table (resolving collisions
somehow; I prefer external chaining). Checking for equality and
insertion in the table each requires about O(1) time (the time depends
on the string length, whether your table is big enough, and how unlucky
you are about collisions). It scales linearly with document length, not
quadratically. It's a well-known technique.
Raphael
This archive was generated by hypermail 2.1.4 : Wed Sep 24 2003 - 12:31:26 EDT