Re: AbiWord Chinese version of Linux


Subject: Re: AbiWord Chinese version of Linux
From: hj (huangj@citiz.net)
Date: Fri Mar 31 2000 - 21:45:40 CST


----- Original Message -----
From: Paul Rohr <paul@abisource.com>
To: hj <huangj@citiz.net>; patches <patches@abisource.com>; abiword-dev
<abiword-dev@abisource.com>
Sent: March 31, 2000 9:46
Subject: Re: AbiWord Chinese version of Linux

> At 10:30 AM 3/27/00 +0800, hj wrote:
> > Top level window not support XIM. But s_ic and s_ic_attr must be static
> >members. It causes a segmentation fault if I change them to non-static. I
> >don't know why.
> > All Chinese and English characters are encoded in Unicode in abw.
> >European languages are not encoded in Unicode. In the future we will
> >display different languages in one document, so Unicode encoding is needed.
> >If you replace fonts.hj with European languages, characters are Unicode in
> >abw.
> > Chinese font files are too large to ship, so I don't distribute Chinese
> >fonts. I created a file "fonts.hj" with the AbiWord font files that includes
> >the Chinese printing font name, XLFD, printing font ascent, printing font
> >descent and printing font width.
> > All unixfonts are created as a fontset, not a font. A fontset can display
> >both English and Chinese characters. The printing program can print both
> >English and Chinese characters.
> > We must resolve the problem that keyval is 0xffffff when I input Chinese
> >with XIM. Chinese strings are stored in string, not in keyval.
>
> Thanks for the patch. I'm very very impressed at how you've tackled issues
> throughout the tree to get Chinese working for you on Linux. My goal now is
> to figure out how to integrate the work you've done with the work that will
> be needed to add true Unicode support for other languages and/or platforms.
>
> At this point, I'd like feedback from other developers in the following two
> areas:
>
> - people working on related i18n issues (Henrik Berg, Vadim Frolov)
> - a random GTK expert or two
>
> As soon as we've got some consensus that you all are heading in the same
> direction, we can start getting some or all of this code checked in.
>
> To get the discussion rolling, here are some observations (in no particular
> order):
>
> 0. do you have a screen shot?
> ------------------------------
> I'd totally love to *see* your version running.
>
> 1. UI translation
> ------------------
> It's really cool to see that you've already translated most of the UI. I'm
> presuming that the hex-encoded characters map directly to the appropriate
> Unicode characters, and not some other charset, right?

Chinese characters are multibyte (MB) in ap_Menu_LabelSet_ZhCN.h and
ap_TB_LabelSet_ZhCN.h.
Chinese characters are Unicode in ZhCN.strings.

>
> src/wp/ap/xp/ap_Menu_LabelSet_Languages.h
> src/wp/ap/xp/ap_Menu_LabelSet_ZhCN.h
> src/wp/ap/xp/ap_TB_LabelSet_Languages.h
> src/wp/ap/xp/ap_TB_LabelSet_ZhCN.h
> user/wp/strings/ZhCN.strings
>
> How bad was it to do all the editing to generate an 8859-1 encoding of the
> strings file? Would it have been easier for you to use one of expat's other
> supported encodings instead?
>
> http://www.jclark.com/xml/expatfaq.html
>
> For example, you can directly export UTF8 files from AbiWord. :-)
>
> 2. XIM on frame
> ----------------
> Thanks for digging out the GTK apis for XIM support. Is there anything we'd
> need to know to make these changes work for other languages besides Chinese?

XIM supports all other languages.

>
> src/af/xap/unix/xap_UnixFrame.cpp
> src/af/xap/unix/xap_UnixFrame.h
>
> Also, could you elaborate on what problems you were seeing with non-static
> ICs?

Just a segmentation fault. There is no difference between static and
non-static if the frame is invoked only once.

> Perhaps someone else on the list might be able to help.
>
> 3. coding style
> ----------------
> It looks like there are a number of places where you added files and/or
> functions, all of which had your initials as a prefix. Do you want your
> code to stand out like this, or was that just to make it easier to read the
> patch?
>
> (We generally tend to try to write code so it all blends in together. That
> way, you have to use Bonsai's cvsblame tool to see who was responsible for a
> given line of code.)
>
> 4. files to ignore
> -------------------
> I noticed that there were a bunch of files in your patch which included
> changes which probably shouldn't be checked in. For example,
>
> src/af/xap/Makefile
> src/af/xap/unix/xap_UnixDlg_About.cpp
>
> In addition, a bunch of spurious diffs were generated by RCS_ID variations.
> (Does anyone know of an option to suppress these?)
>
> 5. some languages don't ever get spell-checked
> -----------------------------------------------
> I also noticed that you've implemented quick hacks to avoid spell-checking
> Chinese content.
>
> src/text/fmt/xp/fl_BlockLayout.cpp
> src/wp/ap/xp/ap_Dialog_Spell.cpp
>
> Is there a more general way to do this check? Do we want to explicitly tag
> content by language (via the lang attribute), or will it be enough to just
> ignore certain Unicode ranges?

We should tag content by language.
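
For comparison, here is a minimal sketch of the other option Paul raises,
ignoring certain Unicode ranges. It is not taken from the patch: the function
names are hypothetical, and the choice of blocks (Hiragana/Katakana and the
main CJK ideograph blocks) is an assumption about which ranges one would want
to skip.

```cpp
#include <cstdint>
#include <vector>

// Hypothetical helper, not part of AbiWord: true when the code point lies in
// a block that is normally written without space-delimited words, so a
// dictionary-based spell-checker has nothing useful to say about it.
bool isNonSpellCheckedCodePoint(std::uint32_t cp)
{
    return (cp >= 0x3040 && cp <= 0x30FF)   // Hiragana and Katakana
        || (cp >= 0x3400 && cp <= 0x4DBF)   // CJK Unified Ideographs Extension A
        || (cp >= 0x4E00 && cp <= 0x9FFF)   // CJK Unified Ideographs
        || (cp >= 0xF900 && cp <= 0xFAFF);  // CJK Compatibility Ideographs
}

// Hypothetical helper: a word is skipped as soon as any one of its
// characters falls in one of the ranges above.
bool shouldSkipSpellCheck(const std::vector<std::uint32_t>& word)
{
    for (std::uint32_t cp : word) {
        if (isNonSpellCheckedCodePoint(cp)) {
            return true;
        }
    }
    return false;
}
```

A range test like this needs no change to the file format, but it cannot tell
apart languages that share a script, which is one argument for tagging content
by language instead.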

>
> 6. pairing unrelated fonts
> ---------------------------
> This one's going to sound pretty ignorant, so please forgive me.
>
> I'm not sure I completely understand why you've implemented the logic to
> pair up English and Chinese fonts as if they were the same font (as far as
> the UI is concerned).
>
> src/af/xap/unix/xap_UnixFont.cpp
> src/af/xap/unix/xap_UnixFont.h
> src/af/xap/unix/xap_UnixFontManager.cpp
> src/af/xap/unix/xap_UnixFontManager.h
> src/af/xap/unix/xap_UnixPSGraphics.cpp
> src/af/xap/unix/xap_UnixPSGraphics.h
>
> I'm used to using WYSIWYG editors, where users choose to use one font at a
> time, switching to others as needed. Any time you use a character which
> isn't provided in that font, you get a slug character.

I did it just to avoid displaying slug characters. In the future, I think
AbiWord should know which language each Unicode character belongs to, so that
it could select the font automatically. Selecting Times New Roman cannot
affect Chinese characters, because Times New Roman is an English font. The
same applies to printing.

>
> From what little I know of fontsets, the idea is that you explicitly
> assemble a collection of overlapping fonts and give that *set* of fonts a
> name. IIRC, GTK has mechanisms to do this, but I'm not sure whether that
> helps you much, since you have to generate PS output, too.
>
> (It's bad enough to do a 1-to-1 WYSIWYG mapping between screen fonts and
> printer fonts. Mapping collections of fontsets sounds like a nightmare.)
>
> Again, my goal here is to understand how to take what you've done and use it
> to solve similar problems for other languages.
>

> 7. multibyte / wide character conversions
> ------------------------------------------
> I suspect that this stuff is likely to be the most controversial. There are
> a number of places in the code where you've introduced locale-specific
> variants of UCS <--> char conversions via mbtowc() and wctomb().
>
> mbtowc
> ------
> src/af/ev/unix/ev_UnixKeyboard.cpp
>
> wctomb
> ------
> src/af/gr/unix/gr_UnixGraphics.cpp
>
> UCS <--> char (via wc/mb)
> -------------
> src/af/util/Makefile
> src/af/util/xp/Makefile
> src/af/util/xp/hj.cpp
> src/af/util/xp/hj.h
> src/af/util/xp/hj_mbtowc.cpp
> src/af/util/xp/hj_mbtowc.h
> src/af/util/xp/hj_wctomb.cpp
> src/af/util/xp/hj_wctomb.h
>
> src/text/fmt/xp/fp_TextRun.cpp
> src/wp/ap/unix/ap_UnixDialog_Replace.cpp
> src/wp/ap/xp/ap_EditMethods.cpp
>
> To be honest, I'm not sure how this approach compares to the iconv-oriented
> stuff which Henrik and Vadim have been working on. I'm sure you're each
> working on real problems, but I frankly don't understand enough about what
> any of you are doing to be able to judge the merits of each approach.
>
> Could the three of you start a discussion to help get ignorant Americans
> like me up to speed? ;-)

mbtowc() and wctomb() do the same job as iconv. But mbtowc() and wctomb() only
support native-language (current locale) MB character <--> Unicode. iconv can
also do other-language MB character <--> Unicode.
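
To illustrate the locale dependence, here is a minimal sketch of an
mbtowc()-based conversion. It is not the code from hj_mbtowc.cpp; the function
name localeToWide is hypothetical. It also assumes that wchar_t holds Unicode
code points, which is true on Linux with glibc but is not guaranteed by the C
standard.

```cpp
#include <cstddef>
#include <cstdlib>
#include <string>

// Hypothetical helper: converts a multibyte string in the encoding of the
// *current locale* to wide characters. Returns false if the input contains
// a sequence that is invalid or incomplete in that encoding.
bool localeToWide(const std::string& in, std::wstring& out)
{
    out.clear();
    std::mbtowc(nullptr, nullptr, 0);      // reset mbtowc's internal shift state
    const char* p = in.data();
    std::size_t left = in.size();
    while (left > 0) {
        wchar_t wc = 0;
        int n = std::mbtowc(&wc, p, left); // examines at most `left` bytes
        if (n < 0) {
            return false;                  // invalid or incomplete sequence
        }
        if (n == 0) {
            n = 1;                         // an embedded NUL byte was converted
        }
        out.push_back(wc);
        p += n;
        left -= static_cast<std::size_t>(n);
    }
    return true;
}
```

The same bytes give different results depending on what setlocale() selected
before the call, so a file read this way is only interpreted correctly on a
machine whose locale matches the one that wrote it. With iconv the source
encoding is named explicitly instead of being taken from the locale.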

>
> 8. should plain text be anything other than ASCII?
> ---------------------------------------------------
> On a similar note, it looks like you've extended a bunch of logic which
> currently reads Latin-1 files to also handle other encodings, albeit in a
> locale-specific way.
>
> src/af/xap/xp/xap_Strings.cpp
> src/wp/ap/xp/ap_Strings.cpp
> src/wp/impexp/xp/ie_exp_Text.cpp
> src/wp/impexp/xp/ie_imp_MsWord_97.cpp
> src/wp/impexp/xp/ie_imp_Text.cpp
>

Text files contain MB characters, so a text file depends on the locale. All
other files are Unicode-encoded, so they are portable.

> This makes me kind of nervous, because it means that the actual contents of
> the files being read and written are interpreted as being in different
> charsets, depending on your locale settings at runtime.
>
> Up until now, we've been striving to create totally-portable files, which
> are always in the same encoding no matter where you read or write them.
> (Thus, for example, note how we've differentiated 7-bit text files from UTF8
> text files.)
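
As an illustration of the locale-independent alternative, here is a minimal
sketch of encoding one Unicode code point as UTF-8. The function name
appendUtf8 is hypothetical and this is not AbiWord's exporter code; it only
shows that the output bytes are fixed by the code point and never by the
runtime locale.

```cpp
#include <cstdint>
#include <string>

// Hypothetical helper: appends the UTF-8 encoding of one Unicode code point
// to `out`. Returns false (and appends nothing) for surrogates and for
// values above U+10FFFF, which are not valid scalar values.
bool appendUtf8(std::uint32_t cp, std::string& out)
{
    if ((cp >= 0xD800 && cp <= 0xDFFF) || cp > 0x10FFFF) {
        return false;
    }
    if (cp < 0x80) {                       // 1 byte: 0xxxxxxx
        out.push_back(static_cast<char>(cp));
    } else if (cp < 0x800) {               // 2 bytes: 110xxxxx 10xxxxxx
        out.push_back(static_cast<char>(0xC0 | (cp >> 6)));
        out.push_back(static_cast<char>(0x80 | (cp & 0x3F)));
    } else if (cp < 0x10000) {             // 3 bytes: 1110xxxx 10xxxxxx 10xxxxxx
        out.push_back(static_cast<char>(0xE0 | (cp >> 12)));
        out.push_back(static_cast<char>(0x80 | ((cp >> 6) & 0x3F)));
        out.push_back(static_cast<char>(0x80 | (cp & 0x3F)));
    } else {                               // 4 bytes: 11110xxx then three 10xxxxxx
        out.push_back(static_cast<char>(0xF0 | (cp >> 18)));
        out.push_back(static_cast<char>(0x80 | ((cp >> 12) & 0x3F)));
        out.push_back(static_cast<char>(0x80 | ((cp >> 6) & 0x3F)));
        out.push_back(static_cast<char>(0x80 | (cp & 0x3F)));
    }
    return true;
}
```

For example, U+4E2D always becomes the three bytes E4 B8 AD, whatever locale
the program runs in, which is the property that makes such a file portable.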
>
> bottom line
> -----------
> You've obviously put a lot of hard work into this patch, and I really really
> want to be able to start bragging about the fact that we support Chinese on
> at least one platform. That's *so* cool!
>
> To be honest, I'm not sure that all of the issues I've mentioned above are
> actually real. However, at the moment, I don't know enough to be able to
> decide how much of this patch to integrate into the tree.

XIM support can be added to the tree. The rest will have to wait.

>
> Could the various folks working on i18n issues help clear up some of my
> confusion here?
>
> Thanks,
> Paul


