Re: problem seems to be solved: unreadable .hash files (dictionaries)


Subject: Re: problem seems to be solved: unreadable .hash files (dictionaries)
From: Paul Rohr (paul@abisource.com)
Date: Fri Mar 16 2001 - 16:13:08 CST


Vlad,

Thanks for the detective work!

At 02:02 PM 3/16/01 +0400, Vlad Harchev wrote:
> I remember a lot of people complained that AW can't use some hash files
>(i.e. dictionaries for ispell) - that ispell module spits out some message
>about incorrect header..
> While helping other people to select a russian dictionary, I discovered
>that 'file' utility knows ispell format (at least on my RH6.0) and that we
>can judge whether the hash file will be loadable by ispell module or not
>basing on the output of 'file' command. For example, here is an output for
>the russian.hash file that can be used by AW's ispell:
>
>[hvv@h dictionary]$ file russian.hash
>russian.hash: little endian ispell 3.1 hash file, 8-bit, capitalization, 26
>flags and 100 string characters
>[hvv@h dictionary]$
>
> It seems that hash files for which '7-bit' is mentioned in the output of
>'file' command can't be used by AW.

Bingo. That's it. If you grep the sources for NO8BIT, you'll see that one 
of the few things it affects is SET_SIZE, which in turn controls the size of 
various ispell structs, including the main hashtable. 

  http://www.abisource.com/lxr/source/abi/src/other/spell/ispell.h#495
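
To make the size effect concrete, here is a rough sketch. It assumes
SET_SIZE is 128 when NO8BIT is defined and 256 otherwise; the two structs
are hypothetical stand-ins with a couple of per-character tables, not the
real ispell header layout.

```c
#include <stddef.h>

/* Assumption: these mirror the two values SET_SIZE can take. */
#define DEMO_SET_SIZE_7BIT 128
#define DEMO_SET_SIZE_8BIT 256

/* Hypothetical header as a 7-bit (NO8BIT) build would lay it out. */
struct demo_header_7bit {
    int  magic;
    char upperchars[DEMO_SET_SIZE_7BIT];
    char lowerchars[DEMO_SET_SIZE_7BIT];
};

/* The same hypothetical header as an 8-bit build would lay it out. */
struct demo_header_8bit {
    int  magic;
    char upperchars[DEMO_SET_SIZE_8BIT];
    char lowerchars[DEMO_SET_SIZE_8BIT];
};

/* Number of bytes by which the two layouts disagree. */
size_t demo_header_size_gap(void)
{
    return sizeof(struct demo_header_8bit) - sizeof(struct demo_header_7bit);
}
```

Every table sized by SET_SIZE adds to the gap, so a loader built for one
layout reads everything after the first such table at the wrong offset in
a file written for the other.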

The error message we usually get is a sanity check to make sure that
ispell's not reading a hashtable of the wrong length. For example, see:

  http://bugzilla.abisource.com/show_bug.cgi?id=902
  http://bugzilla.abisource.com/show_bug.cgi?id=824

Note that the hashtable loader currently just reads the entire struct from
disk to memory here:

  http://www.abisource.com/lxr/source/abi/src/other/spell/lookup.c#159

Gag. Methinks it would be prudent to just rewrite the loader to do the math
to detect this situation and do the extra work needed to try and load 7-bit
content into the 8-bit structs we currently use.
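
As a minimal sketch of the "extra work" for a single table: copy the 128
entries a 7-bit file provides into the low half of a 256-entry table and
fill the high half with a default. The function name and the fill
parameter are hypothetical; the real loader would have to do this for
every SET_SIZE-dependent field, after deciding which layout the file uses.

```c
#include <string.h>

#define DEMO_SET_SIZE_7BIT 128  /* assumed table size in a 7-bit file  */
#define DEMO_SET_SIZE_8BIT 256  /* assumed table size in an 8-bit build */

/*
 * Hypothetical helper: widen one per-character table from the 7-bit
 * layout to the 8-bit layout. Characters 0..127 keep their values;
 * characters 128..255, which a 7-bit file knows nothing about, get
 * the caller-supplied default.
 */
void demo_widen_table(const unsigned char src[DEMO_SET_SIZE_7BIT],
                      unsigned char dst[DEMO_SET_SIZE_8BIT],
                      unsigned char fill)
{
    memcpy(dst, src, DEMO_SET_SIZE_7BIT);
    memset(dst + DEMO_SET_SIZE_7BIT, fill,
           DEMO_SET_SIZE_8BIT - DEMO_SET_SIZE_7BIT);
}
```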

>Also it turns out that (at least for
>russian dictionary) it's possible to specify whether to use 7-bit or 8-bit
>format of hash files by altering Makefile for dictionary (there are makefile
>variables that control that). So, it seems we have a hope of knowing the
>way of building ispell dictionaries that will be understood by our ispell.
>At least we may try to build .hash files for languages for which only
>unreadable by our iconv compiled dictionaries are available..

Exactly. Until someone's willing to write the code mentioned above to also
load 7-bit dictionaries, we now have a few simple workarounds:

  - update the FAQ to tell folks not to use 7-bit dictionaries
  - ideally, point them to 8-bit alternatives
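
For the FAQ, the check Vlad describes could be wrapped in something like
the sketch below. It only does substring matching on the description that
'file' prints, in the format shown in his russian.hash example; the enum
and function names are hypothetical, and other versions of 'file' may
word the description differently.

```c
#include <string.h>

/* Hypothetical result codes for the check. */
enum demo_hash_kind {
    DEMO_NOT_ISPELL,   /* description does not mention ispell      */
    DEMO_HASH_7BIT,    /* ispell hash reported as 7-bit            */
    DEMO_HASH_8BIT,    /* ispell hash reported as 8-bit            */
    DEMO_HASH_UNKNOWN  /* ispell hash, but no bit width recognized */
};

/*
 * Classify one line of 'file' output. Pass only the text after the
 * "name: " prefix if the file name itself might contain "7-bit",
 * "8-bit", or "ispell".
 */
enum demo_hash_kind demo_classify_file_output(const char *desc)
{
    if (desc == NULL || strstr(desc, "ispell") == NULL)
        return DEMO_NOT_ISPELL;
    if (strstr(desc, "7-bit") != NULL)
        return DEMO_HASH_7BIT;
    if (strstr(desc, "8-bit") != NULL)
        return DEMO_HASH_8BIT;
    return DEMO_HASH_UNKNOWN;
}
```

Fed the russian.hash line quoted above, this returns DEMO_HASH_8BIT; a
DEMO_HASH_7BIT result would mean the dictionary needs to be rebuilt or
replaced before AW can load it.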

Any volunteers? ;-)

Paul


