a thought -- fonts get us a long, long way

From: Paul Rohr (paul@abisource.com)
Date: Mon Apr 29 2002 - 20:04:40 EDT

    One offshoot of the recent i18n/Pango discussion is that it finally
    dawned on me just how powerful our *existing* Unicode support in 1.0 already
    is -- without BiDi or Pango.

    Provided that users can locate appropriate fonts, that is.

    It might be helpful to segregate the languages we support into the following
    broad categories:

      1. easy
      2. easy, with the right font
      3. bidi
      4. complex shaping required (including combining characters)

    As the World.abw test document demonstrates, there are a *lot* of languages
    which fall into the first two categories.
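
    As a rough illustration of the four categories above, here is a minimal
    Python sketch that guesses which category a piece of text falls into from
    its Unicode character properties. It is not AbiWord code. The names
    char_category and text_category are made up for this example, and the rules
    (any combining mark means "complex", any right-to-left letter means "bidi",
    anything beyond Latin-1 means "needs the right font") are simplifying
    assumptions, not a real classification of languages.

```python
import unicodedata

# Category numbers follow the numbered list above.
EASY, EASY_WITH_FONT, BIDI, COMPLEX = 1, 2, 3, 4


def char_category(ch):
    """Guess the hardest category a single character needs (heuristic only)."""
    if unicodedata.category(ch).startswith("M"):
        # Combining mark (Mn, Mc or Me): assumed to need shaping support.
        return COMPLEX
    if unicodedata.bidirectional(ch) in ("R", "AL"):
        # Strong right-to-left letter, e.g. Hebrew or Arabic.
        return BIDI
    if ord(ch) > 0xFF:
        # Outside Latin-1: assumed to need a font with the right glyphs.
        return EASY_WITH_FONT
    return EASY


def text_category(text):
    """A text is treated as being as hard as its hardest character."""
    return max((char_category(ch) for ch in text), default=EASY)


print(text_category("hello world"))
print(text_category("\u05e9\u05dc\u05d5\u05dd"))  # a Hebrew word
```

    A per-character test like this knows nothing about language-level rules
    (contextual letter forms, for instance), so at best it is a first pass for
    sorting sample text such as the entries in World.abw.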

    the "just fonts" languages
    --------------------------
    Not only are there thirty-some Latin-1 languages which definitely fall into
    the first category (most fonts support them), but some of the small,
    general-purpose Unicode fonts being deployed add "just enough" glyphs to
    support an even broader range of languages.

      http://www.abisource.com/mailinglists/abiword-dev/02/Apr/1036.html

    Indeed, after doing some more digging, I found that we can support content
    in many more languages just by locating a font that includes enough glyphs
    in the appropriate Unicode range.

      http://www.alanwood.net/unicode/fonts.html
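
    The "enough glyphs in the appropriate Unicode range" test can be sketched
    as a set comparison. In the Python below, the font's coverage is simply
    passed in as a collection of codepoints; how that collection is read out of
    a real font file is left out, and the helper names (missing_codepoints,
    font_supports) and the "Latin-1 only" coverage are hypothetical.

```python
def missing_codepoints(text, covered):
    """Return the sorted codepoints used in `text` that `covered` lacks."""
    # Whitespace is skipped, since it needs no visible glyph.
    needed = {ord(ch) for ch in text if not ch.isspace()}
    return sorted(needed - set(covered))


def font_supports(text, covered):
    """True when every non-whitespace character in `text` is covered."""
    return not missing_codepoints(text, covered)


# Hypothetical coverage: a font that only has the printable Latin-1 range.
latin1_only = range(0x20, 0x100)

# Latin-1 text followed by one Canadian syllabics character.
sample = "na\u00efve \u1403"

for cp in missing_codepoints(sample, latin1_only):
    print(f"missing U+{cp:04X}")
```

    Running the same check with the sample text for each language would give a
    quick list of which fonts are candidates for that language.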

    For example, the government of Nunavut has recently created Unicode fonts
    for Inuktitut:

      http://www.assembly.nu.ca/unicode/fonts/
      http://www.assembly.nu.ca/unicode/fonts/beginner.html

    I can't read them, of course, but they sure look pretty. :-)

    the "harder" languages
    ----------------------
    Of course, there *are* languages for which we'll need more than just fonts.
    For example, Tomas has hand-coded a lot of support for bidi languages, a
    category which includes:

      ar, fa, he, ur, yi

    Now we're investigating Pango since, in addition to BiDi support, it should
    (eventually) encapsulate knowledge about the more complex typographic needs
    of languages which don't have discrete Unicode codepoints for all of the
    glyphs needed. Andrew keeps mentioning Vietnamese (vi-VN), and I know that
    a number of South Asian languages need this too, but how extensive is the
    rest of this category?

    the question
    ------------
    OK, i18n experts ... is this a useful, clean distinction? If not, please
    let me know what I've garbled here.

    bottom line
    -----------
    I'm thrilled that we've got dedicated folks working on solving the "harder"
    language problems. However, I'd love to see some folks do more research on
    improving our support for "just fonts" languages as follows:

      - come up with a complete list of such languages
      - come up with a list of the fonts needed to support each of them

    Note that this is essentially a web research task, not a coding task. The
    ultimate goal would be to learn enough so that we could write a quick
    website entry for each language, telling users:

      - who's responsible for the translation
      - where to find dictionaries (if any)
      - where to find fonts
      - etc.

    For example, two sample entries might be

      Indonesian (id-ID)
      ------------------
      translators: Tim Allen, ...
      dictionary: (n/a)
      fonts: ...
      sample: (the UTF-8 gobbledygook from World.abw)
      picture: (screenshot of the same)

      Inuktitut (iu-CA)
      -----------------
      translators: (n/a)
      dictionary: (n/a)
      fonts: http://www.assembly.nu.ca/unicode/fonts/
      sample: (the UTF-8 gobbledygook from World.abw)
      picture: (screenshot of the same)
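
    If the entries were ever generated rather than written by hand, each one
    could be kept as a small record and rendered into the layout above. The
    Python below is only a sketch of that idea: the LanguageEntry class and its
    field names are invented for this example, and the sample and picture
    fields are left out for brevity.

```python
from dataclasses import dataclass, field


@dataclass
class LanguageEntry:
    """One per-language website entry (hypothetical structure)."""
    name: str
    code: str
    translators: list = field(default_factory=list)
    dictionary: str = ""
    fonts: list = field(default_factory=list)

    def render(self):
        """Format the entry as plain text with a dashed underline."""
        title = f"{self.name} ({self.code})"
        lines = [title, "-" * len(title)]
        # Empty fields are shown as "(n/a)", as in the sample entries.
        lines.append("translators: " + (", ".join(self.translators) or "(n/a)"))
        lines.append("dictionary: " + (self.dictionary or "(n/a)"))
        lines.append("fonts: " + (", ".join(self.fonts) or "(n/a)"))
        return "\n".join(lines)


inuktitut = LanguageEntry(
    name="Inuktitut",
    code="iu-CA",
    fonts=["http://www.assembly.nu.ca/unicode/fonts/"],
)
print(inuktitut.render())
```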

    Best of all, this could increase our language support for the 1.0.* series
    of products, while waiting for all the hard coding work to get done for the
    set of other languages which actually *do* need BiDi and/or Pango.

    Does this sound interesting? Is anyone interested in coordinating such an
    effort? It seems like a large task to write up as a uPOW.

    Paul


