Major problem with localization design


Subject: Major problem with localization design
From: Jakub Travnik (J.Travnik@sh.cvut.cz)
Date: Wed Feb 06 2002 - 15:22:31 CST


Hello,

I recently got involved in localizing AbiWord to the cs-CZ
(Czech) locale. (BTW, it turned out to be a duplicated effort;
Radek Vybiral has done this too.)

The problem:

I did a full translation of the strings in the .po file, yet
some text still came out untranslated in the result, so I
investigated.

First, I found that ui-backport.pl does not convert the last
line of the .po file (a patch should already be in CVS). But
this did not explain why many more strings were missing from
the .strings file.

One example of a missing entry was:
DLG_UP_All

At first I thought the problem was in xgettext (called from
update.pl), because when the two files it is called with
(they are created in abi/po/tmp) are passed in reverse order,
the DLG_UP_All entry does appear in the output.

Now I understand the problem: it is not in xgettext, it is
an AbiWord problem.

Explanation:

The .strings file contains entries like this:
IDENTIFIER="localized text"

The .po file contains entries like this:
#. IDENTIFIER
# a translator comment follows '# '
# by convention, the identifier follows '#.'
msgid "original text"
msgstr "localized text"
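To make the mapping between the two formats concrete, here is a rough C sketch that turns one .po entry back into a .strings line. It is not the code of ui-backport.pl; the function name po_entry_to_strings is hypothetical, and it assumes single-line msgstr values and the '#.' identifier convention described above. Escape sequences inside msgstr are copied through unchanged.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: turn one .po entry (given as an array of lines)
 * into a .strings line of the form IDENT="text".
 * Assumes the identifier is on a "#. " line and msgstr fits on one line.
 * Returns 0 on success, -1 if the entry cannot be converted. */
static int po_entry_to_strings(const char *const lines[], size_t n,
                               char *out, size_t cap)
{
    const char *ident = NULL;
    const char *msgstr = NULL;

    assert(out != NULL && cap > 0);
    for (size_t i = 0; i < n; i++) {
        if (strncmp(lines[i], "#. ", 3) == 0)
            ident = lines[i] + 3;          /* identifier, by convention */
        else if (strncmp(lines[i], "msgstr \"", 8) == 0)
            msgstr = lines[i] + 8;         /* text after the opening quote */
    }
    if (ident == NULL || msgstr == NULL)
        return -1;

    size_t len = strlen(msgstr);
    if (len == 0 || msgstr[len - 1] != '"')
        return -1;                         /* expect a closing quote */

    /* Copy everything except the closing quote. */
    int w = snprintf(out, cap, "%s=\"%.*s\"", ident, (int)(len - 1), msgstr);
    return (w >= 0 && (size_t)w < cap) ? 0 : -1;
}
```

For the entry above this yields IDENTIFIER="localized text"; an untranslated entry (msgstr "") yields an empty value, IDENTIFIER="".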

The problem occurs when two entries in the original header
files have the same value (in a .strings file they would look
like this):
IDENT1="bla"
IDENT2="bla"

xgettext extracts only the first one and ignores the second,
so the .po file contains only IDENT1 and not IDENT2.
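The collision can be modelled in a few lines of C. This is a simplified model of an extractor that keys entries by msgid, not xgettext's actual code; struct entry and dedup_by_msgid are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* One dcl(...) entry: identifier plus original (English) text. */
struct entry {
    const char *id;
    const char *text;
};

/* Simplified model of msgid-keyed extraction: for each distinct text keep
 * only the first identifier seen. Writes survivors to out (which must have
 * room for n entries) and returns how many were kept. */
static size_t dedup_by_msgid(const struct entry *in, size_t n,
                             struct entry *out)
{
    size_t kept = 0;

    assert(in != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
        int seen = 0;
        for (size_t j = 0; j < kept; j++) {
            if (strcmp(out[j].text, in[i].text) == 0) {
                seen = 1;               /* same msgid already present */
                break;
            }
        }
        if (!seen)
            out[kept++] = in[i];        /* later duplicates never get here */
    }
    return kept;
}
```

Given IDENT1 and IDENT2 with the same text, only IDENT1 survives. Likewise, fed DLG_Styles_LBL_All and DLG_UP_All (both "All"), it keeps whichever comes first, which matches the order dependence seen when the two files in abi/po/tmp are swapped.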

IMHO a .po file cannot contain two identical msgids. What
would happen if it did? The open question is how other
programs that handle .po files (other tools from the gettext
package, KBabel, Emacs PO mode) would behave.

Examples of such entries:
src/wp/ap/xp/ap_String_Id.h:
  dcl(DLG_Styles_LBL_All, "All")
src/af/xap/xp/xap_String_Id.h:
  dcl(DLG_UP_All, "All")

Note that it may be useful to be able to translate the same
original string into different localized strings. For
example, "All" can be translated into Czech as "Vsechny",
"Vsechna" or "Vsichni" (written here without diacritics),
depending on context. Only the .strings format, which is
keyed by identifier, can make this distinction.
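A small C sketch shows why an identifier-keyed table can hold context-dependent translations while a table keyed by the English text cannot. The table, the function names, and especially the pairing of Czech forms with dialogs are made up for illustration; which form fits which dialog is the translator's call.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* One translated entry (illustrative data only). */
struct translation {
    const char *id;       /* identifier from the header, e.g. DLG_UP_All */
    const char *english;  /* original text, i.e. what becomes the msgid */
    const char *local;    /* localized text */
};

/* Hypothetical pairing of Czech forms with dialogs. */
static const struct translation table[] = {
    { "DLG_Styles_LBL_All", "All", "Vsechny" },
    { "DLG_UP_All",         "All", "Vsechna" },
};

#define TABLE_LEN (sizeof table / sizeof table[0])

/* .strings-style lookup: keyed by identifier, so each context can differ. */
static const char *lookup_by_id(const char *id)
{
    for (size_t i = 0; i < TABLE_LEN; i++)
        if (strcmp(table[i].id, id) == 0)
            return table[i].local;
    return NULL;
}

/* msgid-style lookup: keyed by the English text, so one answer only. */
static const char *lookup_by_english(const char *english)
{
    for (size_t i = 0; i < TABLE_LEN; i++)
        if (strcmp(table[i].english, english) == 0)
            return table[i].local;    /* first match wins */
    return NULL;
}
```

Looking up "All" by its English text always returns the same form, whereas each identifier can get its own.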

But translating in .strings is inconvenient because there
are no tools for it (except plain text editors, which always
work :-)

Another problem I found is in how strings are extracted from
the header files. For example, the file
abi/src/af/xap/xp/xap_String_Id.h contains:
/* Default name for new, untitled document */
dcl(MSG_ShowUnixFontWarning, "AbiWord was not able to add its fonts to the X "
"font path. Please see \"Unix Font Path Problem\" in the FAQ section of "
"Abiword help file.")

but the current conversion to .po keeps only the first
fragment of the string (file abi/po/abiword.pot):
#. MSG_ShowUnixFontWarning
#: po/tmp/xap_String_Id.h.h:27
msgid "AbiWord was not able to add its fonts to the X "
msgstr ""

This can be worked around with a C program that includes
the header files with the dcl macro redefined. Such a
compiled program would produce correct results, because the
C compiler itself joins adjacent string literals into one.
An example of such a macro:
#define dcl(x,y) printf(#x "=%s\n", escapestring(y));
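A self-contained sketch of this workaround follows. Because the real headers are not at hand, two dcl lines from above are pasted in where the #include would go. escapestring is not defined in the original, so the version here is an assumption: it escapes text as XML attribute content (&amp;, &quot;, &lt;, &gt;), on the further assumption that .strings values are XML attributes; adjust it to whatever quoting the .strings files actually use. It also writes to a FILE * instead of stdout so the output can be captured.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical escapestring(): escape text for a .strings value, assuming
 * values are XML attribute content. Returns a static buffer that is
 * overwritten by the next call; overly long input is truncated. */
static const char *escapestring(const char *s)
{
    static char buf[4096];
    size_t n = 0;

    assert(s != NULL);
    for (; *s != '\0'; s++) {
        char one[2] = { *s, '\0' };
        const char *rep = one;          /* default: copy the character */
        switch (*s) {
        case '&': rep = "&amp;";  break;
        case '"': rep = "&quot;"; break;
        case '<': rep = "&lt;";   break;
        case '>': rep = "&gt;";   break;
        default:                  break;
        }
        size_t len = strlen(rep);
        if (n + len >= sizeof buf)
            break;                      /* stop rather than overflow */
        memcpy(buf + n, rep, len);
        n += len;
    }
    buf[n] = '\0';
    return buf;
}

/* Emit IDENT="text" lines. In a real tool the dcl lines would come from
 * #include "xap_String_Id.h" (and ap_String_Id.h); two are pasted here.
 * The compiler joins the adjacent literals, so no fragment is lost. */
static void write_strings(FILE *out)
{
#define dcl(x, y) fprintf(out, #x "=\"%s\"\n", escapestring(y));
    dcl(DLG_UP_All, "All")
    dcl(MSG_ShowUnixFontWarning, "AbiWord was not able to add its fonts to the X "
        "font path. Please see \"Unix Font Path Problem\" in the FAQ section of "
        "Abiword help file.")
#undef dcl
}
```

Calling write_strings(stdout) prints DLG_UP_All="All" followed by the complete MSG_ShowUnixFontWarning text on one line. Since entries are emitted by identifier, two entries with the same English text both appear.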

The AbiWord developer community first needs to agree on a
solution (one that is acceptable for all affected
translations) before further translation can proceed.

Jakub Travnik
jabber://jtra@jabber.com
sometimes irc: irc.gnome.org, #abiword, nick: jtra



This archive was generated by hypermail 2b25 : Wed Feb 06 2002 - 15:19:16 CST