Re: Bug 1408

Subject: Re: Bug 1408
From: Andrew Dunbar (hippietrail@yahoo.com)
Date: Wed Jul 04 2001 - 08:40:31 CDT

Next message: Tamas Decsi: "hungarian updates"
Previous message: Martin Sevior: "Re: Default Zoom Preference"

Dom Lachowicz wrote:
>
> Quoting Andrew Dunbar <hippietrail@yahoo.com>:
>
> > Hi Dom. I don't know what's changed but I've updated Abi and done
> > a full rebuild. Now 1408.abw doesn't spellcheck properly.
> > It's breaking the words at 'z with hacek' etc., and garbage
> > characters appear in the spelling dialog.
> > My guess is it's related to encodings and possibly locales.
> > There is too much code that still assumes 8-bit strings are
> > going to work for everything ):
> > I'll reboot and try with a Latvian system locale.
>
> One possibility is that my recent change from '#if 0' to '#if 1' in
> UT_isWordDelimiter (ut_misc.cpp) caused your spell-checking problem. If it did,
> please revert the change. This function sucks anyway, which is why I can't
> wait to move to Pango.

Aha! Well, Windows has iswalpha(), iswupper(), and iswlower(), so
I'm now calling those. Doesn't glib have some wide-character functions?
Other platforms must have them too. Pango isn't needed for this.
Either the platforms have the correct functions, or we can roll
our own ctype.h-style functions from the Unicode character tables
on unicode.org.
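For the roll-our-own route, a minimal sketch could be a sorted table of
letter ranges searched with a binary search, which doesn't depend on the
platform's C library or the current locale at all. The names isLetterUCS2
and isWordDelimiterUCS2 are hypothetical (not existing AbiWord functions),
and the table is only a tiny Latin subset; a real one would be generated
from the Unicode character database on unicode.org.

```cpp
#include <algorithm>
#include <cstdint>
#include <iterator>

// Hypothetical sketch: sorted, non-overlapping ranges of UCS-2 code units
// that are letters. Only a small Latin subset is listed; a real table would
// be generated from UnicodeData.txt (general categories Lu, Ll, Lt, Lm, Lo).
struct LetterRange {
    std::uint16_t lo;
    std::uint16_t hi;  // inclusive
};

static const LetterRange kLetterRanges[] = {
    {0x0041, 0x005A},  // A-Z
    {0x0061, 0x007A},  // a-z
    {0x00C0, 0x00D6},  // Latin-1 letters before U+00D7 MULTIPLICATION SIGN
    {0x00D8, 0x00F6},  // Latin-1 letters before U+00F7 DIVISION SIGN
    {0x00F8, 0x024F},  // rest of Latin-1, Latin Extended-A and -B
                       // (z with hacek is U+017E)
};

// Binary search for the first range whose upper bound is >= c,
// then check that c is not below that range's lower bound.
bool isLetterUCS2(std::uint16_t c) {
    const LetterRange* first = std::begin(kLetterRanges);
    const LetterRange* last = std::end(kLetterRanges);
    const LetterRange* it = std::lower_bound(
        first, last, c,
        [](const LetterRange& r, std::uint16_t value) { return r.hi < value; });
    return it != last && it->lo <= c;
}

// Hypothetical delimiter test: anything that is neither a letter
// (per the table above) nor an ASCII digit ends a word.
bool isWordDelimiterUCS2(std::uint16_t c) {
    const bool isDigit = c >= 0x0030 && c <= 0x0039;
    return !isLetterUCS2(c) && !isDigit;
}
```

With a table like this, 'z with hacek' (U+017E) is a letter no matter
which locale the user runs, so the spell checker wouldn't split words there.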

> If you have the time, please look at this bug, but don't go too crazy about it.

It's too late. I got too crazy about it. There are loads of places
in the spell checking code that convert between 16-bit and 8-bit
strings and make incorrect assumptions. Support for encodings is even
worse; I blame the Unix locale system for much of this, since it
leaves programmers unaware of what they really need to do. We also
have code that measures the length of a 16-bit string and allocates
a buffer of that length to write the converted 8-bit string into.
Some iconv implementations make the string longer in order to
preserve accented characters, so buffers were being overrun. I'm
trying to convert all the functions that take char * and
UT_UCSChar * to use UT_String and UT_UCS2String instead.
More and more, I think we should ditch everything but UT_UCS2String.
It's a mess ):
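To illustrate the buffer-size problem, here is a hypothetical helper,
ucs2ToUtf8, that sizes its output from the encoded length rather than
from the 16-bit length. It encodes UTF-8 by hand instead of calling iconv,
purely so the arithmetic is visible: each 16-bit unit can take up to three
bytes, so a buffer of n bytes for n characters overflows as soon as a
non-ASCII character shows up. It assumes plain UCS-2 (no surrogate pairs).

```cpp
#include <cstddef>
#include <cstdint>
#include <string>

// Bytes needed to encode one UCS-2 code unit as UTF-8.
// Sketch only: surrogate pairs are not combined, so each 16-bit unit
// is treated as a standalone BMP character.
std::size_t utf8LengthOf(std::uint16_t c) {
    if (c < 0x80) return 1;
    if (c < 0x800) return 2;
    return 3;
}

// Convert n UCS-2 code units to UTF-8, sizing the output from the
// encoded length rather than from n.
std::string ucs2ToUtf8(const std::uint16_t* s, std::size_t n) {
    std::size_t needed = 0;
    for (std::size_t i = 0; i < n; ++i) needed += utf8LengthOf(s[i]);

    std::string out;
    out.reserve(needed);  // n bytes would be too small for any non-ASCII input
    for (std::size_t i = 0; i < n; ++i) {
        const std::uint16_t c = s[i];
        if (c < 0x80) {
            out += static_cast<char>(c);
        } else if (c < 0x800) {
            out += static_cast<char>(0xC0 | (c >> 6));
            out += static_cast<char>(0x80 | (c & 0x3F));
        } else {
            out += static_cast<char>(0xE0 | (c >> 12));
            out += static_cast<char>(0x80 | ((c >> 6) & 0x3F));
            out += static_cast<char>(0x80 | (c & 0x3F));
        }
    }
    return out;
}
```

For example, converting {'a', U+017E} yields three bytes, not two. Because
std::string grows as needed, the up-front count only avoids reallocations;
with a fixed char buffer it is what prevents the overrun.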

Can somebody explain how the ispell code tries to guess the encoding
it uses? I'm finding this magic very difficult to follow. We
should pass encoding names around; we know them most of the time!

/me bangs head on keyboard... Andrew.

-- 
http://linguaphile.sourceforge.net

This archive was generated by hypermail 2b25 : Wed Jul 04 2001 - 08:38:18 CDT