100-string ispell hashes (was Re: Commit: more localization stuff)


Subject: 100-string ispell hashes (was Re: Commit: more localization stuff)
From: Paul Rohr (paul@abisource.com)
Date: Fri Jul 06 2001 - 19:01:33 CDT


At 01:13 PM 6/27/01 -0400, Dom Lachowicz wrote:
>Hi Hub,
>
>Quoting Hubert Figuiere <hfiguiere@teaser.fr>:
>
>> The problem is that Debian provides 100 string hash files. I find idiot
>> to
>> provide ispell hash files if there are already with the system (think
>> packager side). So it may be smarter to also handle 100 string hashes. I
>> don't know if it is doable and to which extent.
>
>It'd be great to support both.

Absolutely. It's unfortunate that different flavors of ispell hashes are
found "in the wild", but since they are, we ought to Just Work with more of
them.

>But the problem is that this value gets set at
>compile-time, so we can support one or the other *easily* (we may be able to
>support both types, but that would require lots more work).

Dom's right. It's currently a compile-time option.

As mentioned before on this list, the problem is that the current hashloader
is dumb, dumb, dumb. The differences between 100-string hashfiles and
128-string hashfiles are *extremely* trivial. Both get read from disk into
the SAME kind of struct in memory using the EXACT SAME CODE. Want to guess
what it is?

  1. fread() the entire 20KB+ struct in one swell foop
  2. check the magic number at the beginning of the struct
  3. check for the same magic number at the end of the struct
  4. spot check a few other things in-between
  5. if anything's not OK, puke
  6. otherwise, use that header struct to read in the bulk of the dictionary
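
For anyone who hasn't looked at the loader, here is roughly what those steps
amount to. This is only a sketch: struct demo_header, DEMO_MAGIC, DEMO_STRINGS
and demo_load_header() are made-up names for illustration, and the struct is a
tiny stand-in for the real hashheader, not a copy of it.

```c
#include <stdio.h>

/* Hypothetical, heavily simplified stand-in for the real hashheader. */
#define DEMO_MAGIC   0x1234  /* placeholder value, NOT the real magic number */
#define DEMO_STRINGS 100     /* fixed at compile time: the root of the problem */

struct demo_header {
    unsigned short magic;                 /* magic number at the start */
    int            stringsize;            /* stands in for the spot-checked fields */
    char           middle[DEMO_STRINGS];  /* the part whose width varies */
    unsigned short magic2;                /* same magic number at the end */
};

/* Returns 0 if the header looks sane, -1 otherwise. */
int demo_load_header(FILE *fp, struct demo_header *h)
{
    /* step 1: read the whole struct in one go */
    if (fread(h, sizeof *h, 1, fp) != 1)
        return -1;

    /* steps 2 and 3: the magic number must appear at both ends */
    if (h->magic != DEMO_MAGIC || h->magic2 != DEMO_MAGIC)
        return -1;

    /* step 4: spot check something in between */
    if (h->stringsize <= 0)
        return -1;

    /* step 6 happens in the caller, which uses *h to read the dictionary */
    return 0;
}
```

Because sizeof(struct demo_header) depends on DEMO_STRINGS, a file written
with a different width puts the trailing magic number at the wrong offset, and
the check in steps 2 and 3 fails.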

So what's the difference between 100-string and 128-string hashes?

In one case the struct is "wider" because it's storing more information in
the "middle" of the hashheader struct. (This difference is what the
compile-time switch controls.) Other than that, it's basically the same
struct, so in theory, it really shouldn't be too hard to write code to read
the first portion of the struct, see how wide the "middle" needs to be, and
then read the rest in.

The way you know you've done the math right is if you can locate both copies
of the magic number at the beginning and the end of the on-disk hashheader.
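
To make that concrete, here is a minimal sketch of the "try both widths" idea.
It assumes a toy on-disk layout (a 2-byte little-endian magic number, then a
middle of width * ENTRY_BYTES bytes, then the same magic number again). The
names DEMO_MAGIC, PREFIX_BYTES, ENTRY_BYTES and demo_detect_width() are
invented for this example, and the real hashheader layout is more involved.

```c
#include <stdio.h>

#define DEMO_MAGIC   0x1234u  /* placeholder value, NOT the real magic number */
#define PREFIX_BYTES 2L       /* toy layout: only the leading magic comes first */
#define ENTRY_BYTES  4L       /* assumed size of one entry in the "middle" */

/* Read a 2-byte little-endian value; returns 0 on success, -1 on short read. */
static int read_u16(FILE *fp, unsigned *out)
{
    unsigned char b[2];
    if (fread(b, 1, 2, fp) != 2)
        return -1;
    *out = b[0] | ((unsigned)b[1] << 8);
    return 0;
}

/* Returns 100 or 128 if the trailing magic is found at that width, else -1. */
int demo_detect_width(FILE *fp)
{
    static const int candidates[] = { 100, 128 };
    unsigned magic;
    size_t i;

    /* read the first portion and check the leading magic number */
    if (fseek(fp, 0L, SEEK_SET) != 0 || read_u16(fp, &magic) != 0
        || magic != DEMO_MAGIC)
        return -1;

    /* for each candidate width, look for the second copy where it should be */
    for (i = 0; i < sizeof candidates / sizeof candidates[0]; i++) {
        long end = PREFIX_BYTES + candidates[i] * ENTRY_BYTES;
        if (fseek(fp, end, SEEK_SET) == 0 && read_u16(fp, &magic) == 0
            && magic == DEMO_MAGIC)
            return candidates[i];
    }
    return -1;
}
```

Matching on the trailing magic number alone could be fooled by data in the
middle that happens to look like it, so a real loader would presumably keep
the existing spot checks on the other fields once the width is known.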

As one of the many sacrificial victims who's "done time" wading through our
hacked-up version of the ispell code, I can attest that the current code
ain't pretty to read, but that shouldn't scare anyone off.

>I'm not spending any
>more of my time making ispell work. Others may feel differently and are
>encouraged to work on the ispell code all they like.

Yep. It's time for someone else to volunteer for what ought to be a fairly
straightforward task. It might not be pretty, but it'd be very very useful.
(In fact, the combination makes it a pretty righteous hack.) In any event,
it certainly oughta be a heck of a lot easier than what Thomas just did to
get multiple instances of ispell running simultaneously.

Any takers?

Paul


