Subject: i18n megapatch to AW
From: Vlad Harchev (hvv@hippo.ru)
Date: Wed Oct 04 2000 - 03:09:06 CDT
Hi,
Sorry for announcing this so late.
The patch itself is here: http://www.hippo.ru/~hvv/abiword/awrus-patch.gz
It's 200K uncompressed!
I've set up a "Russian AW" page and announced it on Sunday night on all the
Russian Linux news sites that matter. The main page has had 1200 visitors
so far. Sorry again for announcing the patch here so late.
Sorry, I haven't tried to compile it on Windows. There may also be problems
using the patched version of AW on non-x86 platforms (with a different byte
order or word length) and on non-glibc systems.
What's done:
In short, everything an international user (e.g. a Russian one) could dream
of: mandatory fixes, plus luxury fixes such as reworking dialogs so that they
no longer use Fixed containers. Nothing remains to be done from an
international user's POV. There are also some small fixes/improvements for
various things (only a couple, AFAIR), and translations to Russian.
List of i18n-specific changes:
1) Added the ability to input keys with keysyms > 256, converting keysym
values to Unicode.
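A minimal sketch of the keysym-to-Unicode step in 1), written in Python for
brevity. It is an illustration, not the code from the patch, and the function
name is hypothetical. It relies on three X11 conventions: Latin-1 keysyms are
equal to their code points, keysyms of the form 0x01000000 + N stand for code
point N, and the Cyrillic letter keysyms 0x6c0-0x6ff carry the KOI8-R byte in
their low 8 bits.

```python
def keysym_to_unicode(keysym):
    """Return the Unicode code point for an X11 keysym, or None if unknown."""
    # Latin-1 keysyms are numerically equal to their code points.
    if 0x20 <= keysym <= 0x7E or 0xA0 <= keysym <= 0xFF:
        return keysym
    # Keysyms of the form 0x01000000 + N encode code point N directly.
    if keysym & 0xFF000000 == 0x01000000:
        return keysym & 0x00FFFFFF
    # Cyrillic letter keysyms: the low byte is the KOI8-R byte of the letter.
    if 0x6C0 <= keysym <= 0x6FF:
        return ord(bytes([keysym & 0xFF]).decode("koi8_r"))
    # Everything else (function keys, other scripts) is not covered here.
    return None


print(hex(keysym_to_unicode(0x6C1)))  # Cyrillic_a
```

A full implementation needs tables for the other keysym ranges as well; only
the Cyrillic letters are handled here.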
2) Remapping of characters from Unicode to the X locale's charset in
remapGlyph, for drawing and printing them. At startup, AW looks into the
subdirectory named after the current locale's charset for locale-specific
fonts. Locale-specific fonts can override standard fonts (if they have the
same font name, e.g. "helvetica", but the XLFD's registry-encoding is equal
to the current locale's charset name). Fixed the 'makewrapper.sh' and
't[gb]z_install.sh' scripts: they now check for the existence of a
subdirectory with locale-specific fonts and, if it is found, it is also
added to the X server's font path.
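The override rule in 2) can be sketched as follows (Python, illustration
only). The function names and the two XLFD strings are made up for the
example; the only fact used is that a full XLFD has 14 dash-separated fields
after the leading dash, with the family name second and the charset registry
and encoding last.

```python
def xlfd_fields(xlfd):
    """Split a full XLFD into its fields (index 0 is the empty leading part)."""
    fields = xlfd.split("-")
    if len(fields) != 15:
        raise ValueError("not a full XLFD: %r" % xlfd)
    return fields


def xlfd_family(xlfd):
    return xlfd_fields(xlfd)[2].lower()


def xlfd_charset(xlfd):
    """Return 'registry-encoding', e.g. 'koi8-r' or 'iso8859-1'."""
    fields = xlfd_fields(xlfd)
    return (fields[13] + "-" + fields[14]).lower()


def pick_fonts(standard, locale_specific, locale_charset):
    """Map family name -> XLFD, letting matching locale fonts override."""
    chosen = {xlfd_family(x): x for x in standard}
    for x in locale_specific:
        if xlfd_charset(x) == locale_charset.lower():
            chosen[xlfd_family(x)] = x
    return chosen


# Hypothetical font names, for illustration only.
std = ["-adobe-helvetica-medium-r-normal--12-120-75-75-p-67-iso8859-1"]
loc = ["-cronyx-helvetica-medium-r-normal--12-120-75-75-p-67-koi8-r"]
print(pick_fonts(std, loc, "KOI8-R")["helvetica"])
```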
3) Fixed printing. Only single-byte characters are supported.
4) Fixed cutting and pasting - pasting to/from other apps works well now.
5) Nothing. This just eats number "5" - the text was moved to another item :)
6) Corrected importing of RTFs. The following constructs
{\f1\froman\fcharset2{\*\fname Symbol;}MT Symbol;}
in \fonttbl are now supported (i.e. the canonical name of the font inside
of {}). These constructs are produced by at least the Russian edition of
Win95. They were crashing AW (RTFstate stack underflow). Also "helvetica" is
substituted with "helvetic" to avoid problems.
7) RTF import: added recoding of characters of the form "\'e1" from the
Windows codepage to Unicode.
With 6) and 7) I was able to import every RTF I could find or produce with
WordPad from W95 and with Word2000 (with various output options).
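The recoding in 7) amounts to something like the sketch below (Python,
illustration only, not the patch's code). The function name is hypothetical,
and the codepage is simply assumed to be cp1251; a real importer would take
it from the RTF header or the font's \fcharset.

```python
import re

# Matches RTF hex escapes such as \'e1
_HEX_ESCAPE = re.compile(r"\\'([0-9a-fA-F]{2})")


def decode_hex_escapes(rtf_text, codepage="cp1251"):
    """Replace every \\'HH escape with the Unicode character it stands for."""
    def repl(match):
        byte = bytes([int(match.group(1), 16)])
        # Bytes undefined in the codepage become U+FFFD instead of failing.
        return byte.decode(codepage, errors="replace")
    return _HEX_ESCAPE.sub(repl, rtf_text)


print(decode_hex_escapes(r"\'cf\'f0\'e8\'e2\'e5\'f2"))
```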
8) On export to RTF, the font name "helvetica" is used unconditionally. Due
to 6) this doesn't introduce any problems. This allows Russian texts to be
read without flaws in WordPad (on Russian Windows, "helvetic" is a
non-russified font, while "helvetica" of course is russified).
Also, on export to the "default RTF format", all Unicode symbols with a value
>127 are exported in the form \uc1\uXXXX\'HH if the character has a code
0xHH in the Windows charset being exported to, falling back to \uc0\uXXXX if
it doesn't - thus allowing old apps to read these files without problems.
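A sketch of the export rule in 8), in Python for illustration (the function
name and the default codepage are assumptions, this is not the patch's code).
Note that \u takes a signed 16-bit decimal number, so code points above
0x7FFF are written as negative values; the space after the \uc0 form is the
usual RTF control-word delimiter and is my addition.

```python
def rtf_escape(text, codepage="cp1251"):
    """Escape text for RTF using \\uc1\\uN\\'HH with a \\uc0\\uN fallback."""
    out = []
    for ch in text:
        cp = ord(ch)
        if cp > 0xFFFF:
            raise ValueError("characters outside the BMP are not handled")
        if cp <= 127:
            # Backslash and braces are RTF syntax and must be escaped.
            out.append("\\" + ch if ch in "\\{}" else ch)
            continue
        # \uN is a signed 16-bit decimal value.
        n = cp - 0x10000 if cp > 0x7FFF else cp
        try:
            encoded = ch.encode(codepage)
        except UnicodeEncodeError:
            encoded = b""
        if len(encoded) == 1:
            # Unicode value followed by a one-byte fallback for old readers.
            out.append("\\uc1\\u%d\\'%02x" % (n, encoded[0]))
        else:
            # No single-byte equivalent: Unicode value only, no fallback.
            out.append("\\uc0\\u%d " % n)
    return "".join(out)


print(rtf_escape("\u044f"))  # Cyrillic small letter ya
```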
9) Added an "RTF for old apps" format. Some broken programs don't understand
the \uc1\uXXXX\'HH form (e.g. Ted, StarOffice 5.2), so the plain \'HH form is
used instead (if the character has a code 0xHH in the Windows charset being
exported to; nothing is exported if it doesn't). I understand that sed can be
used for converting files from "plain RTF" to "RTF for old apps" format, but
I added the format nevertheless.
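For illustration, the sed-style conversion mentioned in 9) could look like
this in Python. It is only a sketch with a hypothetical function name, and it
assumes the input uses exactly the \uc1\uN\'HH and \uc0\uN forms described
in 8).

```python
import re


def to_old_apps_rtf(rtf_text):
    """Reduce \\uc1\\uN\\'HH to \\'HH and drop \\uc0\\uN entirely."""
    # Keep only the one-byte fallback of each Unicode escape.
    text = re.sub(r"\\uc1\\u-?\d+(\\'[0-9a-fA-F]{2})", r"\1", rtf_text)
    # Escapes without a fallback (and their delimiter space) are removed.
    text = re.sub(r"\\uc0\\u-?\d+ ?", "", text)
    return text


print(to_old_apps_rtf("\\uc1\\u1103\\'ff"))
```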
10) When saving to .abw, a "charset=" attribute is added to the 1st tag of
the XML and all characters are saved in the native encoding if it is a
one-byte encoding (i.e. raw bytes are output instead of &#XXXX; or so) - as
long as the character exists in the native encoding, of course. Importing
files in this format is also supported (I've tested with expat only, but
AFAIK libxml does this out of the box).
11) Same for exporting to HTML: characters are output in the native encoding,
and the name of the native encoding is also saved properly in the HTML file.
So Netscape can display Russian in such HTML files now.
12) When exporting to .latex, characters are also converted to the native
encoding and output as raw bytes. The proper \usepackage[...]{inputenc} and
\usepackage[...]{babel} lines are inserted into the .latex file. Exported
LaTeX documents with Russian text now work out of the box.
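A sketch of how the two preamble lines in 12) could be chosen (Python,
illustration only). The lookup tables and the function name are hypothetical
and cover just a few cases; the option names themselves (koi8-r, cp1251,
latin1 for inputenc; russian, english, german for babel) are real.

```python
# Hypothetical lookup tables: charset name -> inputenc option,
# language code -> babel option.
INPUTENC_OPTION = {"koi8-r": "koi8-r", "cp1251": "cp1251", "iso-8859-1": "latin1"}
BABEL_OPTION = {"ru": "russian", "en": "english", "de": "german"}


def latex_preamble(charset, lang):
    """Return the inputenc/babel lines for a charset and language."""
    lines = []
    enc = INPUTENC_OPTION.get(charset.lower())
    if enc is not None:
        lines.append(r"\usepackage[%s]{inputenc}" % enc)
    babel = BABEL_OPTION.get(lang.lower())
    if babel is not None:
        lines.append(r"\usepackage[%s]{babel}" % babel)
    return "\n".join(lines)


print(latex_preamble("KOI8-R", "ru"))
```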
13) Added support for converting all translations of UI elements (menu items,
toolbar, stringset) from an arbitrary encoding to the native encoding. For
the menu item and toolbar item labelsets I added a macro BeginSetEnc that
takes the same parameters as BeginSet plus the encoding name as the last
parameter. This allows the same set of translations (supplied in any
encoding) to be used on all platforms and in any locale, even if they use
different charsets (as with Russian: cp1251 is used for it on Windows and
koi8-r is mostly used on Unix).
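The conversion in 13) is essentially a transcoding step, sketched here in
Python (illustration only; the function name is hypothetical and this is not
how BeginSetEnc is implemented).

```python
def to_native(label_bytes, source_encoding, native_encoding):
    """Convert a translated label from its source charset to the native one."""
    text = label_bytes.decode(source_encoding)
    # Characters missing from the native charset are replaced with '?'.
    return text.encode(native_encoding, errors="replace")


# A label supplied in cp1251 is shown on a koi8-r system.
label_cp1251 = "\u0424\u0430\u0439\u043b".encode("cp1251")  # "File" in Russian
label_koi8 = to_native(label_cp1251, "cp1251", "koi8_r")
print(label_koi8.decode("koi8_r"))
```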
14) Added support for spellchecking (by fixing the current ispell code). It
was trying to use the charset name "UCS-2-INTERNAL" or so, which is unknown
to Linux's glibc, so I added a workaround: "UCS2" is used with glibc. Also,
when converting between the dictionary's charset and UCS-2 (in either
direction), UCS2 symbols have to be byteswapped to get unsigned shorts (at
least on x86) - done that. [Note: we should check whether this is needed on
arches with another byte order, or on systems that don't use glibc's iconv
(and also whether "UCS2" is known to iconv on these systems).]
Also, slightly extended the way the dictionary's charset is guessed: if
there is a file with the dictionary's name with "-encoding" appended (e.g.
"russian.hash-encoding" for "russian.hash"), it is opened and its content is
treated as the name of the dictionary's charset (this is much more flexible
than hardcoding charset names for some known languages).
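Both parts of 14) can be sketched in a few lines of Python (illustration
only; the function names and the default charset are assumptions, not taken
from the patch). The first function swaps the two bytes of every 16-bit unit,
the second looks for the "-encoding" file next to the dictionary.

```python
import os


def swap_ucs2(data):
    """Swap the two bytes of every 16-bit unit in a UCS-2 byte string."""
    if len(data) % 2:
        raise ValueError("UCS-2 data must have an even length")
    swapped = bytearray(data)
    swapped[0::2], swapped[1::2] = data[1::2], data[0::2]
    return bytes(swapped)


def dictionary_charset(hash_path, default="iso-8859-1"):
    """Read the charset name from '<hash_path>-encoding' if that file exists."""
    sidecar = hash_path + "-encoding"
    if os.path.exists(sidecar):
        with open(sidecar, "r", encoding="ascii") as f:
            name = f.read().strip()
        if name:
            return name
    # No sidecar file: fall back to a default (an assumption of this sketch).
    return default
```

For example, a file "russian.hash-encoding" containing the single line
"koi8-r" makes dictionary_charset("russian.hash") return "koi8-r".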
15) Proper implementations for UT_is{lower,upper,alpha} and *_tolower.
Other changes:
1) Translations to Russian provided (including icons for the toolbar).
2) The "columns" and "font" dialogs were reworked so that they no longer use
hardcoded widget positions and dimensions. I gave up fixing the "Paragraph"
dialog - the only one that still needs fixing - since it looks reasonable in
Russian. The "Insert date/time" dialog was reworked to expand the list of all
formats to the width of the widest format. Fixed the "columns" dialog: a
non-translated string was used for "line between"; it is now acquired from
the StringSet.
3) My patch for automatically recoloring the "black color" of BW and
three-color toolbar icons to the color used by gtk for drawing text is also
included.
The most recent version of my patch can always be downloaded from the URL
I've given.
I will announce any changes I make to the patch.
I don't have any time to test the Windows version this week.
I think it's the right time to start committing this patch (after checking
it on other platforms). I don't know of any bugs with this patch on Unix (but
AW probably won't compile on Windows unless slightly modified). Most of the
changes needed for Win32 and other platforms will be replacing the names
"iconv_open", "iconv_close" and "iconv" with the ones available on that
platform. Other than that, nothing should prevent AW from compiling on other
platforms. No other changes are needed to use the patched AW with Latin
languages.
Feel free to contact me.
Best regards,
-Vlad