Re: announce of patch to support CJK in AW


Subject: Re: announce of patch to support CJK in AW
From: hj (huangj@citiz.net)
Date: Sun Oct 29 2000 - 01:13:30 CST


It would be best for all non-English locales to save Unicode in *.abw files
rather than multibyte strings (mbs). That way we will be able to display
characters from several languages in one document in the future.

----- Original Message -----
From: Vlad Harchev <hvv@hippo.ru>
To: <abiword-dev@abisource.com>
Cc: Belcon <rainfall@yeah.net>; hj <huangj@citiz.net>; Chih-Wei Huang <cwhuang@linux.org.tw>
Sent: October 28, 2000 3:33
Subject: announce of patch to support CJK in AW

> Here is the location of the patch:
> http://www.hippo.ru/~hvv/abiword/aw-cjk.diff.gz
>
> This patch can be cleanly applied over vanilla 0.7.11 patched with the
> following patches (I hope they are still there):
> ftp://seviorpc.ph.unimelb.edu.au/pub/abi-oct24-cvs.patch.gz
> ftp://seviorpc.ph.unimelb.edu.au/pub/wv-oct24-cvs.patch.gz
>
> What's there:
> 100% of HJ's patch logic is there. The code and logic were greatly
> cleaned up compared to the original patch and should compile (I didn't
> test) on any platform (HJ's patch was making use of glib in xp code).
> Also, HJ's logic is disabled if the current locale is not a CJK one. That
> was extensively tested with Dom's German document and (in full, all
> aspects) with Russian.
>
> The added functionality:
> * UT_Wctomb and UT_Mbtowc are now used for converting between various
> charsets. They use iconv internally (so they now work, are portable, and
> also allow choosing the input/output encoding). All usage of iconv
> everywhere (except wv) correctly swaps bytes of UCS (the correct order is
> detected at runtime).
>
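
The runtime byte-order detection mentioned above could look roughly like the
sketch below. It is not the patch's code: the patch works in C++ on top of
iconv, while this uses Python's codecs as a stand-in for the converter, and
the function names are made up for illustration. The idea is to convert one
known character, look at which byte comes first, and swap byte pairs only when
the converter's order differs from the host's.

```python
import sys


def detect_byte_order(convert):
    """Probe a text-to-UCS-2 converter with 'A' (U+0041) and report its byte order.

    `convert` is a hypothetical callable standing in for an iconv descriptor.
    """
    probe = convert("A")
    if probe == b"\x41\x00":
        return "little"
    if probe == b"\x00\x41":
        return "big"
    raise ValueError(f"unexpected probe output: {probe!r}")


def swap_ucs2_bytes(data):
    """Swap the two bytes of every 16-bit unit."""
    if len(data) % 2:
        raise ValueError("UCS-2 data must have an even length")
    swapped = bytearray(len(data))
    swapped[0::2] = data[1::2]
    swapped[1::2] = data[0::2]
    return bytes(swapped)


def to_host_order(data, converter_order):
    """Return data unchanged if it is already in host order, else byte-swapped."""
    if converter_order == sys.byteorder:
        return data
    return swap_ucs2_bytes(data)
```

Probing once and caching the result keeps the per-character conversion path
free of any endianness checks.
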
> * Thanks to HJ, AW emits only the necessary fonts to the .ps when
> printing. This reduced the size of the .ps file generated by AW by a
> factor of 5 for one font-rich document of mine (a title for some paper).
>
> * Fixed bug with spellchecker ("replace" button didn't work).
>
> * AW now looks for more files like
> ${prefix}/AbiSuite/AbiWord/system.profile: it also looks for
> ${prefix}/AbiSuite/AbiWord/system.profile-${SUFFIX}, for the following
> values of ${SUFFIX}:
> 'language', 'charset', 'language-Country', 'language-Country.charset'
> This allows shipping language-specific defaults (e.g. metric system or
> the name of the spellchecker dictionary).
>
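
To make the suffix scheme concrete, here is a small sketch that derives the
four suffixes from a POSIX-style locale name and lists the profile files that
would be looked for. The locale parsing, the function names, and the order of
the list are my assumptions; the announcement only gives the four suffix forms
and the path pattern.

```python
def split_locale(locale_name):
    """Split e.g. 'zh_CN.GB2312' into ('zh', 'CN', 'GB2312').

    Assumes the 'language_Country.charset' form; modifiers such as '@euro'
    are not handled in this sketch.
    """
    base, _, charset = locale_name.partition(".")
    language, _, country = base.partition("_")
    return language, country, charset


def locale_suffixes(locale_name):
    """Build the 'language', 'charset', 'language-Country' and
    'language-Country.charset' suffixes, skipping parts that are missing."""
    language, country, charset = split_locale(locale_name)
    suffixes = []
    if language:
        suffixes.append(language)
    if charset:
        suffixes.append(charset)
    if language and country:
        suffixes.append(f"{language}-{country}")
        if charset:
            suffixes.append(f"{language}-{country}.{charset}")
    return suffixes


def profile_candidates(prefix, locale_name):
    """List system.profile plus its locale-specific variants."""
    base = f"{prefix}/AbiSuite/AbiWord/system.profile"
    return [base] + [f"{base}-{suffix}" for suffix in locale_suffixes(locale_name)]
```

For example, `profile_candidates("/usr/local", "zh_CN.GB2312")` lists
`system.profile` followed by the `-zh`, `-GB2312`, `-zh-CN` and
`-zh-CN.GB2312` variants under `/usr/local/AbiSuite/AbiWord`.
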
> * As for fonts, AW now tries to load fonts from the following
> subdirectories of ${prefix}/AbiSuite/fonts:
> 'language', 'charset', 'language-Country', 'language-Country.charset'
> This should solve CJK people's problems (before this patch, AW was
> looking only in the 'charset' subdirectory).
> Fonts in 'fonts.dir' format should be placed in them. Under CJK locales,
> the file with the list of fonts is also named 'fonts.dir', but it has the
> same extended format as HJ's 'fonts.hj'. Consider this when trying.
>
> * If GNOME_XML2 is undefined, UnknownEncodingHandler is set on the XML
> parser in ie_imp_XML.cpp
>
> * Some translators' names added to CREDITS.TXT
>
> * Just to underline: support for "wrap-at-any-CJK-letter" logic of layout
> is already there too (thanks to HJ).
>
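
For readers unfamiliar with the "wrap-at-any-CJK-letter" idea, the sketch below
shows one simplified way to compute break opportunities. It is not HJ's layout
code: the Unicode ranges, the function names, and the rule itself (break after
a space, or on either side of any CJK character) are assumptions chosen for
illustration, and real layout engines also forbid breaks around certain
punctuation.

```python
# Assumed, deliberately incomplete set of code point ranges treated as CJK.
CJK_RANGES = (
    (0x3040, 0x30FF),  # Hiragana and Katakana
    (0x4E00, 0x9FFF),  # CJK Unified Ideographs
    (0xAC00, 0xD7A3),  # Hangul Syllables
)


def is_cjk(ch):
    """Return True if the character falls into one of the assumed CJK ranges."""
    code_point = ord(ch)
    return any(low <= code_point <= high for low, high in CJK_RANGES)


def break_opportunities(text):
    """Return every index i where a line may break between text[i-1] and text[i]."""
    breaks = []
    for i in range(1, len(text)):
        previous, current = text[i - 1], text[i]
        if previous == " " or is_cjk(previous) or is_cjk(current):
            breaks.append(i)
    return breaks
```

With this rule, Latin words are still kept whole, while a run of CJK characters
can be wrapped at any position.
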
> I think that this patch can be committed since it doesn't break anything
> under a non-CJK locale (at least if you did 'make clean' after applying).
>
> I ask CJK people to test the following, in order of priority:
> * Input of CJK letters in various charsets. It should work. Make doubly
> sure that gtk+ is installed correctly, that fonts are in the font path,
> etc. Currently you have to have all CJK fonts AW uses available in the
> font path before the AbiWord wrapper starts (it's not yet updated to look
> into all the subdirectories AW now looks in).
> * Cutting and pasting immediately. If this doesn't work, then test the RTF
> importers and exporters (AW uses them internally for cut/paste). Only very
> minor tweaks would be needed to make it work if it doesn't. If
> cutting/pasting works, try exporting/importing RTF and testing it with
> other apps (e.g. Word).
> * Cutting/pasting to/from other apps and saving/reading plain text files.
> It should just work.
> * Saving and loading of the native AW file format. Should work. If import
> of .abw doesn't work, then try removing "encoding=FOO" from its 1st line.
> * Printing. Since 100% of HJ's logic is there, it should just work.
> * Export to HTML. Should work. If it doesn't, say what changes are needed
> to make browsers understand it (keep in mind that the XHTML importer
> should be able to read the produced file).
> * Checking that export of CJK texts to LaTeX works if the correct prologue
> is added to the exported document. That prologue should be added to the
> tables in xap_EncodingManager.cpp
> * Import of CJK .doc files (most probably it won't work due to wv's
> single-byte encoding limitations). wv should be hacked to allow importing
> .doc files.
>
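
For the .abw workaround in the test list above (removing "encoding=FOO" from
the first line), a quick way to do it without a text editor is sketched below.
It assumes the first line is an XML declaration carrying a quoted encoding
attribute; the function name is made up, and the file is handled as raw bytes
so that no guess about its charset is needed.

```python
import re

# Matches an encoding="..." (or encoding='...') attribute and its leading space.
ENCODING_ATTR = re.compile(rb"""\s+encoding\s*=\s*(["'])[^"']*\1""")


def strip_encoding_declaration(data):
    """Remove the encoding attribute from the first line only; keep the rest."""
    first_line, newline, rest = data.partition(b"\n")
    return ENCODING_ATTR.sub(b"", first_line, count=1) + newline + rest
```

To apply it, read the file in binary mode, pass the bytes through
`strip_encoding_declaration`, and write the result to a copy rather than over
the original.
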
> What won't work with CJK text:
> * Export to WML, DocBook. I just don't know how to specify the charset
> name in these formats.
> * Import of XHTML (the HTML importer assumes UTF-8) and DocBook.
> * export to Word. It doesn't work for Latin1 yet, so forget it.
> * No code specific to platforms other than Unix is touched, so CJK support
> is in the same state as before on platforms other than Unix.
> * Spellchecking of CJK texts. Does it even make sense? English words can
> be spellchecked inside CJK text.
>
> Donations/fees are appreciated.
> If anybody needs it, I can provide commercial support and extension of
> this work.
>
> Enjoy.
>
> I'm going to bed, so I won't be able to read/post/hack in the next 11
> hours.
>
> Best regards,
> -Vlad
>
>


