(Fwd) ispell hash files (for abiword).


From: Tomas Frydrych (tomas@frydrych.uklinux.net)
Date: Wed May 02 2001 - 13:48:33 CDT


I thought the message below merited the attention of the list;
Ramon makes a rather strong case for moving our ispell
configuration to the 8-bit/56-flag/128-character model, so we
should probably ditch the existing american.hash and compile a
new one (as the original posting quoted below with >> proposed;
I forget who it was from).

Tomas

------- Forwarded message follows -------
From: "Ramón Flores d'as Seixas" <fa2ramon@usc.es>
To: tomas@frydrych.uklinux.net
Subject: ispell hash files (for abiword).
Date sent: Mon, 19 Mar 2001 11:02:38 +0100

Dear Tomas:

   I found a message of yours in the abiword-dev mailing list that I would
like to discuss with you. If the message is not yours, please excuse my
mistake. I am writing to you directly because I am not subscribed to the
list; in any case, I would be grateful if you could forward this to the list.

   I am an AbiWord user, not a developer, and I am building a Galician
dictionary for ispell. While testing it I found that my dictionary does not
work correctly with AbiWord, even though it works correctly with ispell.
I have run several tests on Linux and Windows 95. On Linux I could not make
it work at all; on Windows 95 it works, but not correctly. After searching
the abiword-dev mailing list I think I understand why: I compile my
dictionary with 8-bit characters and MASKBITS 64.

This is the part of your message I mean (I assume it is yours):

>>Please note: I downloaded various language affix/hash med/xlg
>>files from the ispell home site and they also when built were 8-bit
>>56 flags 128 string characters.
>That's curious, because when you install the ispell package as is
>and build hash files with it, it will build 7bit/26flags/100 characters
>hashes.
 
>> It seems to me it is far more logical to dump the original
>> american.hash and make the 3 small changes that I suggested
>> in my emails to abi-dev.
>It would be possible to modify our ispell code to build 8/56/128
>hashes, but I am not entirely convinced that we should do that; the
>hashfiles built this way are quite a bit bigger than 8/26/100 hashes
>(the British xlg hash is about 2.1MB compared to 1.7MB), so I
>would not move to 8/56/128 hashes unless we really need to.

Well, I think that "7bit/26flags/100 characters hashes" can be a good
option for English, but not for the majority of other languages.
I made a small study of the 24 ispell dictionaries listed on Geoff
Kuenning's page (http://ficus-www.cs.ucla.edu/geoff/ispell-dictionaries.html)
and found that at least 10 of them need a different configuration; I think
"8/56/128 hashes" would be fine for them. Among them are Afrikaans, Danish,
Dutch, French, German, Italian, Portuguese, Norwegian, and Swedish. At the
end of this message I include some information about them.

I should stress that some of the others may also need an "8/56/128 hashes"
configuration, but I could not obtain information about them.

Moreover, I think the "8/56/128 hashes" configuration is becoming the norm
in new Linux distributions, because it allows all dictionaries to be used.
At least Red Hat, Mandrake and Conectiva use the "8/56/128 hashes"
configuration.
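As a rough illustration of what the "8/56/128" model means when building ispell yourself, the local.h fragment below sketches the relevant settings. It assumes ispell 3.1's convention that local.h overrides the defaults in config.X, and it assumes MAXSTRINGCHARS is the config.X parameter behind the "128 string characters" figure; both names should be checked against the config.X of the ispell version being built.

```c
/* local.h (fragment): sketch of an 8-bit build with room for lowercase
 * affix flags. Parameter names assumed from ispell 3.1's config.X;
 * verify them against your own copy before building. */

/* Do NOT define NO8BIT: leaving it undefined lets ispell handle ISO Latin-1.
 * If your local.h already contains "#define NO8BIT", delete or comment it out. */

#define MASKBITS       64   /* allows both upper- and lower-case affix flags */
#define MAXSTRINGCHARS 128  /* string characters allowed in an affix file */
```

Hash files built with these settings are not interchangeable with ones built under the 7-bit/26-flag defaults, which is why the american.hash question above matters.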

On Windows I know of two ways to obtain a binary ispell package. One is
the build compiled by Piet Tutelaers with EMX/gcc
(ftp://ftp.tue.nl/pub/tex/GB95/ispell-dutch96/ispellw32.zip),
and the other is the Cygwin port (by Humblet_Pierre_A?)
(http://www.hirmke.de/software/develop/gnuwin32/cygwin/porters/Humblet_Pierre_A/V1.1/ispell-3.1-cygwin-1.1-bin.README).
Both of them use 8 bits, and the Cygwin port uses 56 flags.

Sincerely yours

        Ramón

Why I think the dictionaries mentioned above do not work properly with a
"7bit/26flags" configuration:

========================AFRIKAANS=================================

#
# FILE: afrikaans.aff (in 8bit ISO Latin1 character encoding!!)
#
# PURPOSE: affix file for Afrikaans wordlist (according to spelling defined
# in Afrikaanse Woordelys en Spelreëls 1991) for 8bit ispell
# version. (`Ispell -vv' should report `!NO8BIT (8BIT)'.)
#
# Derived from ideas in dutch96.aff
# by Piet Tutelaers (rcpt@urc.tue.nl)
#
# AUTHOR: Reinier de Vos <devos@aqua.ccwr.ac.za>
# VERSION: 1.0 (Feb. 2000)

========================DANISH=================================

# dansk.aff, version 1.4.1
#
# Copyright (c) Henrik Chr. Grove 2000.
# Copyright (c) Göran Andersson 1997.
#
# Notice that this affix file will only work (correctly) with
# ispell compiled with MASKBITS >= 64.

========================DUTCH=================================

#
# FILE: dutch96.aff (in 8bit ISO Latin1 character encoding!!)
#
# PURPOSE: affix file for new Dutch wordlist (according to spelling defined
# in Groene Boekje 1996) for 8bit ispell version. (`Ispell -vv'
# should report `!NO8BIT (8BIT)'.)

========================FRENCH=================================

From the IREQ-Francais readme file (a French ispell dictionary)
(ftp://ftp.robot.ireq.ca/pub/ispell/francais-IREQ.LISEZMOI)

IREQ-Francais.dico -- A French dictionary for ispell 3.1

Make sure that ispell is not compiled with the NO8BIT option
(see config.X).

========================GERMAN====================================

# Affix table for German
#
#
# This affix file uses both upper- and lower-case affix flags, so you
# must #define MASKBITS 64 in your local.h file.
#
# Here's a record of flags used, in case you want to add new ones.
#
# ABCDEFGHIJKLMNOPQRSTUVWXYZ abcdefghijklmnopqrstuvwxyz
# Used: *** ****** ********** **** *** *** ** * ******
# ABCDEFGHIJKLMNOPQRSTUVWXYZ abcdefghijklmnopqrstuvwxyz
# Available: - - - ---- - --- ---
#
========================ITALIAN=====================================

Italian Ispell Installation Instructions

----------------------------------------------------------------------------

Requirements:
     the latest release of Ispell, compiled with the ISO character set enabled
     (you need to comment out the line "#define NO8BIT" in the file "local.h"
     in the distribution of ispell).

========================PORTUGUESE=================================

# Affix table for Portugues
# For JSpell and ISpell (3.1...)
#
# This affix file uses both upper- and lower-case affix flags, so you
# must #define MASKBITS 64 in your local.h file.
#
# 8 bits used (latin1) (please remove the #define NO8BIT line)

========================NORWEGIAN===================================
From the README file for the distribution of the Norwegian dictionaries
for ISPELL (http://www.uio.no/~runekl/README)

* CONFIGURE ISPELL The file Config.X in the ispell-3.1 distribution
  contains configuration information for ispell (no ./configure yet).
  The definitions are overridden by those in the file local.h, for
  which there is a local.h.samp. The following local.h works for me
  on my Redhat-6.0 system. You have to adapt the file to those
  languages you have dictionaries for.
-----------------------------------------------------------------------------
#define MINIMENU /* Display a mini-menu at the bottom of the screen */
#define USG /* Define this on System V */

#define BINDIR "/usr/bin"
#define LIBDIR "/usr/lib"
#define MAN1DIR "/usr/man/man1"
#define MAN4DIR "/usr/man/man4"

#define LANGUAGES "{american,MASTERDICTS=american.med+,HASHFILES=americanmed+.hash,EXTRADICT=/usr/dict/words} {norsk}"
#define MASKBITS 64 /* <-------------------------- */
#define LOOK "look"
#define CFLAGS "-O3" /* Mostly to speed up my batch operations */
#define LDFLAGS "-s"
#define COMPOUNDBABEL
-----------------------------------------------------------------------------

========================SWEDISH=================================
From the Swedish ispell homepage:
(http://www.sslug.dk/locale/ispell/iswedish/swedish.html)

Installation

First check that ispell is installed on your system. If it isn't, you can
get the source code from ispell's home page,
http://fmg-www.cs.ucla.edu/geoff/ispell.html.
Note that ispell must be compiled without the directive NO8BIT in the file
local.h.

------- End of forwarded message -------



This archive was generated by hypermail 2b25 : Sat May 26 2001 - 03:51:00 CDT