Re: ispell blues (was: localization formats proposal)


Subject: Re: ispell blues (was: localization formats proposal)
From: Andrew Dunbar (hippietrail@yahoo.com)
Date: Sat Jul 21 2001 - 06:25:37 CDT


--- Frodo Looijaard <frodol@dds.nl> wrote:
> Martin Sevior wrote:
>
> > > Remind me: How should I create ispell dictionaries so they'll work
> > > with AbiWord?
> > >
> >
> > Sorry I don't know. Frodo does though. Care to post a link to a HOWTO
> > Frodo?
>
> I was not really sure myself. One option is simply to use pspell, which
> will do the hard work for you - but I gather that is not an option on
> all platforms.

For non-Latin-1 dictionaries, you need to create a
plain ASCII text file in the same directory as the
dictionary, named after the hash file with the suffix
"-encoding" added:
  ru-RU.hash -> ru-RU.hash-encoding
This file contains the standard name of the encoding
in plain ASCII. For the above example:
  KOI8-R

This is handled by AbiWord's ispell interface code,
not by ispell itself. (I believe Vlad implemented
this technique.) I have no idea how pspell
handles encodings, sorry.

Please note this is *very* important for non-western
languages. Spellchecking will not work without it.
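
To make the lookup concrete, here is a minimal sketch of how a
program could find the encoding for a given hash. It is not
AbiWord's actual code: the function name read_hash_encoding and
the buffer sizes are made up for illustration, and what to do
when the file is missing is left to the caller.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: read the encoding name stored in
 * "<hash_path>-encoding" into out (capacity out_len).
 * Returns 0 on success, -1 if the side file is missing, empty,
 * or the name does not fit. */
int read_hash_encoding(const char *hash_path, char *out, size_t out_len)
{
    char enc_path[1024];
    char line[128];
    FILE *fp;
    size_t n;
    int written;

    assert(hash_path != NULL && out != NULL && out_len > 0);

    /* ru-RU.hash -> ru-RU.hash-encoding */
    written = snprintf(enc_path, sizeof enc_path, "%s-encoding", hash_path);
    if (written < 0 || (size_t)written >= sizeof enc_path)
        return -1;

    fp = fopen(enc_path, "r");
    if (fp == NULL)
        return -1;
    if (fgets(line, sizeof line, fp) == NULL) {
        fclose(fp);
        return -1;
    }
    fclose(fp);

    /* Keep everything up to the first whitespace or newline. */
    n = strcspn(line, " \t\r\n");
    line[n] = '\0';
    if (n == 0 || n >= out_len)
        return -1;
    memcpy(out, line, n + 1);
    return 0;
}
```

With a file ru-RU.hash-encoding containing the single line
KOI8-R, calling read_hash_encoding("ru-RU.hash", buf, sizeof buf)
would leave "KOI8-R" in buf.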

Andrew Dunbar.

> I still find it a bit wasteful to have AbiWord use its own ispell
> libraries, when the system-wide ones are perfectly fine. I would like
> to suggest the following solution:
>
>  * When we distribute binary packages, we include all AbiWord ispell
>    libraries
>  * When somebody compiles from source and selects ispell, not pspell,
>    we try to find any system-wide installed ispell dictionaries. If
>    none are found, we again install the AbiWord ispell libraries. If
>    we do find a system-wide ispell, we determine what options are used
>    in it (see below) and compile our own ispell support with the same
>    options. Perhaps we still offer the option to use AbiWord's own
>    dictionaries, though.
>
> I have attached a small C program that will detect for you what options
> an ispell hash is compiled with (rewriting it in C++ is left as an exercise
> for the reader). When run on the (old) AbiWord ispell hashes you get this:
>
> (bash-2.05.0) frodo@arda:~/projects/abiword/ispell-detect$ ./ispell-detect ../abidistfiles/dictionary/*.hash
> ../abidistfiles/dictionary/BigEndian32.american.hash: big endian (options: 0002)
>  #undef NO8BIT
>  #define NO_CAPITALIZATION_SUPPORT
>  #define MASKBITS 32
> ../abidistfiles/dictionary/LittleEndian32.american.hash: little endian (options: 0006)
>  #undef NO8BIT
>  #define NO_CAPITALIZATION_SUPPORT
>  #define MASKBITS 64
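
As a worked example of how the options values in that output map
to the three settings, here is a small sketch. It only restates
the bit tests used by the attached program (bit 0 for NO8BIT,
bit 1 for NO_CAPITALIZATION_SUPPORT, bits 2-3 for MASKBITS); the
struct and function names are hypothetical and are not taken
from ispell or AbiWord.

```c
#include <assert.h>

/* Hypothetical container for the decoded header options. */
struct ispell_opts {
    int no8bit;    /* 1 if NO8BIT was defined */
    int no_caps;   /* 1 if NO_CAPITALIZATION_SUPPORT was defined */
    int maskbits;  /* 32, 64, 128 or 256 */
};

struct ispell_opts decode_options(unsigned int options)
{
    struct ispell_opts o;

    /* The options field is two bytes wide in the hash header. */
    assert((options & ~0xffffu) == 0);

    o.no8bit  = (options & 0x01) != 0;
    o.no_caps = (options & 0x02) != 0;
    /* Bits 2-3 hold 0..3, which selects 32, 64, 128 or 256. */
    o.maskbits = 32 << ((options & 0x0c) >> 2);
    return o;
}
```

For 0x0006 this yields no8bit = 0, no_caps = 1 and
maskbits = 64, matching the second hash in the output above;
0x0002 gives the same flags with maskbits = 32.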
>
> Comments?
> Frodo
>
> --
> Frodo Looijaard <frodol@dds.nl>  PGP key and more: http://huizen.dds.nl/~frodol
> Defenestration n. (formal or joc.):
>   The act of removing Windows from your computer in disgust, usually followed
>   by the installation of Linux or some other Unix-like operating system.
> #include <stdio.h>
> #include <stdlib.h>
> #include <unistd.h>
> #include <fcntl.h>
>
> int main (int argc, char *argv[])
> {
>   int i,fd;
>   unsigned char buf[4];
>   unsigned int options;
>
>   for (i=1; i < argc; i++) {
>     printf("%s: ",argv[i]);
>     /* open() returns -1 on failure; 0 is a valid descriptor */
>     if ((fd = open(argv[i],O_RDONLY)) < 0) {
>       printf("can't open!\n");
>       continue;
>     }
>     /* Not quite safe, should be put in a for loop */
>     if (4 != read(fd,buf,4)) {
>       printf("can't read!\n");
>       goto NEXT1;
>     }
>     /* The order of the two magic bytes reveals the endianness */
>     if ((buf[0] == 0x96) && (buf[1] == 0x02)) {
>       options = (buf[2] << 8) + buf[3];
>       printf("big endian (options: %04x)\n",options);
>     } else if ((buf[0] == 0x02) && (buf[1] == 0x96)) {
>       options = (buf[3] << 8) + buf[2];
>       printf("little endian (options: %04x)\n",options);
>     } else {
>       printf("not an ispell dictionary!\n");
>       goto NEXT1;
>     }
>
>     if (options & 0x01)
>       printf(" #define NO8BIT\n");
>     else
>       printf(" #undef NO8BIT\n");
>     if (options & 0x02)
>       printf(" #define NO_CAPITALIZATION_SUPPORT\n");
>     else
>       printf(" #undef NO_CAPITALIZATION_SUPPORT\n");
>     /* Bits 2-3 select the MASKBITS size */
>     if ((options & 0x0c) == 0x00)
>       printf (" #define MASKBITS 32\n");
>     else if ((options & 0x0C) == 0x04)
>       printf (" #define MASKBITS 64\n");
>     else if ((options & 0x0C) == 0x08)
>       printf (" #define MASKBITS 128\n");
>     else if ((options & 0x0C) == 0x0c)
>       printf (" #define MASKBITS 256\n");
>
> NEXT1:
>     close(fd);
>   }
>   exit(0);
> }
>
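
The comment in the program above notes that the 4-byte read
should really sit in a loop, since read() may return fewer bytes
than requested. One way that could look is sketched below; the
helper name read_fully is made up for illustration, and it
assumes a POSIX system.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: read exactly len bytes from fd into buf.
 * Returns 0 on success, -1 on a read error or if end of file is
 * reached before len bytes have arrived. */
int read_fully(int fd, unsigned char *buf, size_t len)
{
    size_t got = 0;

    assert(buf != NULL);

    while (got < len) {
        ssize_t n = read(fd, buf + got, len - got);
        if (n > 0)
            got += (size_t)n;      /* partial read: keep going */
        else if (n == 0)
            return -1;             /* EOF before len bytes */
        else if (errno != EINTR)
            return -1;             /* real error; EINTR is retried */
    }
    return 0;
}
```

In the detection program, the test "4 != read(fd,buf,4)" could
then become "read_fully(fd,buf,4) != 0".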

=====
http://linguaphile.sourceforge.net



