i18n megapatch to AW


Subject: i18n megapatch to AW
From: Vlad Harchev (hvv@hippo.ru)
Date: Wed Oct 04 2000 - 03:09:06 CDT


 Hi,

 Sorry for announcing this so late.
 The patch itself is here: http://www.hippo.ru/~hvv/abiword/awrus-patch.gz
 It's 200K uncompressed!

 I've set up a "Russian AW" page and announced it on Sunday night on all the
 Russian Linux news sites that matter. The main page has had 1200 visitors so
 far.

 Sorry, I didn't try to compile it on Windows. There may also be problems
 using the patched version of AW on non-x86 platforms (with a different byte
 order or word length) and on non-glibc systems.

         What's done:
 In short, everything an international user (e.g. a Russian one) can dream of:
 mandatory fixes, plus luxuries such as reworking dialogs so that they no
 longer use Fixed containers. Nothing remains to be done from an
 international user's POV. There are also a few small fixes/improvements for
 various other things (only a couple, AFAIR), and translations to Russian.

 List of i18n-specific changes:

1) Added the ability to input keys with keysyms > 256, converting keysym
  values to Unicode.

2) Remapping of characters from Unicode to the X locale's charset in
   remapGlyph, for drawing and printing them. At startup, AW looks for
   locale-specific fonts in the subdirectory named after the current
   locale's charset. Locale-specific fonts can override standard fonts (if
   they have the same font name, e.g. "helvetica", but the XLFD's
   registry-encoding equals the current locale's charset name). Fixed the
   'makewrapper.sh' and 't[gb]z_install.sh' scripts: they now check for the
   existence of the subdirectory with locale-specific fonts and, if it's
   found, add it to the X server's font path as well.

3) Fixed printing. Only single-byte characters are supported.

4) Fixed cutting and pasting - pasting to/from other apps works well now.

5) Nothing. Just to eat number "5" - text moved to other item :)

6) Corrected importing of RTFs. The following constructs
        {\f1\froman\fcharset2{\*\fname Symbol;}MT Symbol;}
  in \fonttbl are now supported (i.e. the canonical name of the font inside
  an inner {} group). These constructs are produced by at least the Russian
  edition of Win95. They used to crash AW (RTFstate stack underflow). Also,
  "helvetica" is substituted with "helvetic" to avoid problems.

7) RTF import: added recoding of characters of the form "\'e1" from the
  Windows codepage to Unicode.
  With 6) and 7) I was able to import every RTF I could find or produce with
  WordPad from W95 and with Word2000 (with various output options).

8) On export to RTF, the font name "helvetica" is used unconditionally. Due
  to 6) this doesn't introduce any problems. This allows Russian texts to be
  read without flaws in WordPad (on Russian Windows, "helvetic" is a
  non-russified font, while "helvetica" of course is russified).
  Also, on export to the "default RTF format", all Unicode symbols with value
  >127 are exported in the form \uc1\uXXXX\'HH if the character exists as
  0xHH in the Windows charset being exported to, falling back to \uc0\uXXXX
  if it doesn't - thus allowing old apps to read these files without
  problems.

9) Added an "RTF for old apps" format. Some broken programs don't understand
  the \uc1\uXXXX\'HH form (e.g. Ted, StarOffice 5.2), so the plain \'HH form
  is used (if the character exists as 0xHH in the Windows charset being
  exported to; nothing is exported if it doesn't). I understand that sed can
  be used to convert files from "plain RTF" to "RTF for old apps" format, but
  nevertheless.

10) When saving to .abw, a "charset=" attribute is added to the 1st tag of
  the XML and all characters are saved in the native encoding if it's a
  one-byte encoding (i.e. raw bytes are output instead of &#XXXX; or so),
  provided the character exists in the native encoding. Of course,
  importing files in this format is also supported (I've tested with expat
  only, but AFAIK libxml does this out of the box).

11) Same for exporting to HTML: characters are output in the native encoding,
  and the name of the native encoding is also saved properly in the HTML
  file. So Netscape can display Russian in such HTML files now.

12) When exporting to .latex, text is also converted to the native encoding
  and raw bytes are output. Proper \usepackage[...]{inputenc} and
  \usepackage[...]{babel} lines are inserted into the .latex file. Exported
  LaTeX documents with Russian now work out of the box.

13) Added support for converting all translations of UI elements (menu items,
  toolbar, stringset) from an arbitrary encoding to the native encoding. For
  the menu item and toolbar item labelsets, added a macro BeginSetEnc that
  takes the same parameters as BeginSet plus the encoding name as the last
  parameter. This allows the same set of translations (supplied in any
  encoding) to be used on all platforms and in any locale, even if they use
  different charsets (as with Russian: cp1251 is used for it on Windows and
  koi8-r is mostly used on Unix).

14) Added support for spellchecking (by fixing the current ispell code). It
  was trying to use the charset name "UCS-2-INTERNAL" or so, which is unknown
  to Linux's glibc, so I added a workaround: "UCS2" is used with glibc. Also,
  when converting between the dictionary's charset and UCS-2 (in either
  direction), UCS2 symbols have to be byteswapped to get unsigned shorts (at
  least on x86) - done that. [Note: we should check whether this is needed on
  arches with another byte order, or on systems that don't use glibc's iconv
  (and also whether "UCS2" is known to iconv on these systems).]
     Also, slightly extended the way the dictionary's charset is guessed:
  if there is a file named after the dictionary with "-encoding" appended
  (e.g. "russian.hash-encoding" for "russian.hash"), it's opened and its
  content is treated as the name of the dictionary's charset (this is much
  more flexible than hardcoding charset names for some known languages).

15) Proper implementations for UT_is{lower,upper,alpha} and *_tolower.
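
 To make the escape forms from items 7) to 9) concrete, here is a minimal
 sketch in Python. It is not the patch's code: the function names are made up
 for illustration, cp1251 is assumed as "the Windows charset being exported
 to", and only characters above 127 are handled (braces and backslashes in the
 text are left alone). The \u number is written in signed decimal, which is
 what RTF readers expect.

```python
import re

# Assumption: cp1251 stands in for "the Windows charset being exported to".
WIN_CHARSET = "cp1251"


def _single_byte(ch, charset):
    """Return the byte value of ch in charset, or None if it has none."""
    try:
        raw = ch.encode(charset)
    except UnicodeEncodeError:
        return None
    return raw[0] if len(raw) == 1 else None


def rtf_escape(text, old_apps=False, charset=WIN_CHARSET):
    """Escape characters above 127 as described in items 8) and 9).

    Hypothetical helper: old_apps=True selects the "RTF for old apps" form.
    """
    out = []
    for ch in text:
        code = ord(ch)
        if code <= 127:
            out.append(ch)
            continue
        if code > 0xFFFF:
            raise ValueError("only BMP characters are handled in this sketch")
        byte = _single_byte(ch, charset)
        # RTF's \uN parameter is a signed 16-bit decimal number.
        signed = code - 0x10000 if code > 0x7FFF else code
        if old_apps:
            # Plain \'HH, or nothing when the charset lacks the character.
            if byte is not None:
                out.append("\\'%02x" % byte)
        elif byte is not None:
            # Unicode value plus a one-byte fallback for old readers.
            out.append("\\uc1\\u%d\\'%02x" % (signed, byte))
        else:
            # No fallback byte; the trailing space only ends the control word.
            out.append("\\uc0\\u%d " % signed)
    return "".join(out)


def rtf_decode_hex(fragment, charset=WIN_CHARSET):
    """Recode \\'HH escapes from the Windows codepage to Unicode (item 7)."""
    return re.sub(
        r"\\'([0-9a-fA-F]{2})",
        lambda m: bytes([int(m.group(1), 16)]).decode(charset, errors="replace"),
        fragment,
    )
```

 With these assumptions, the Cyrillic letter U+0431 has the byte 0xE1 in
 cp1251, so it comes out as \uc1\u1073\'e1 in the default form and as \'e1 in
 the "old apps" form, and rtf_decode_hex turns \'e1 back into U+0431.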

 Other changes:
1) Translations to Russian provided (including icons for the toolbar).

2) "columns" and "font" dialogs reworked not to use hardcoded
   widget positions and dimensions. I gave up on fixing the "Paragraph"
   dialog - the only one that still needs fixing - since it looks reasonable
   in Russian. "Insert date/time" dialog reworked to expand the list of all
   formats to the width of the widest format. Fixed the "columns" dialog: a
   non-translated string for "line between" was used; it is now taken from
   the StringSet.

3) My patch for automatic recoloring of the "black color" of BW and
  three-color toolbar icons to the color used by gtk for drawing text is
  also included.

 The most recent version of my patch can always be downloaded from the URL
 given above.
 I will announce any changes I make to the patch.
 I won't have any time to test the Windows version this week.

   I think it's the right time to start committing this patch (after checking
 on other platforms). I don't know of any bugs with this patch on Unix (but AW
 probably won't compile on Windows unless slightly modified). Most of the
 changes needed for Win32 and other platforms will be replacing the names
 "iconv_open", "iconv_close" and "iconv" with the ones available on that
 platform. Other than that, nothing should prevent AW from compiling on other
 platforms. No other changes are needed to use the patched AW with Latin
 languages.
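
 The two portability questions here (which charset name the converter knows,
 and which byte order its UCS-2 output has) can be illustrated with a small
 Python sketch. It is only an analogy and not the iconv code from the patch:
 Python's codec registry stands in for iconv, both function names are made
 up, and the candidate names passed in are whatever the caller wants to try.

```python
import codecs
import struct


def first_known_charset(candidates):
    """Return the first name in candidates that the codec registry accepts.

    Hypothetical helper mirroring the "try another name" workaround.
    """
    for name in candidates:
        try:
            codecs.lookup(name)
            return name
        except LookupError:
            continue
    raise LookupError("none of %r is supported" % (candidates,))


def to_shorts(ucs2_bytes, source_order):
    """Unpack UCS-2 bytes with a known byte order into 16-bit integers."""
    prefix = {"big": ">", "little": "<"}[source_order]
    count = len(ucs2_bytes) // 2
    return list(struct.unpack("%s%dH" % (prefix, count), ucs2_bytes))
```

 For example, first_known_charset(["UCS-2-INTERNAL", "utf-16-be"]) falls
 through to "utf-16-be" in Python. The bytes 04 31 unpack to 0x0431 when read
 as big-endian but to 0x3104 when read as little-endian, which is the kind of
 mismatch the byteswapping in item 14) works around on x86.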
   Feel free to contact me.
 
 Best regards,
  -Vlad


