Re: commit (HEAD): win32 glyph presence detection

From: tomasfrydrych@yahoo.co.uk
Date: Wed Jun 25 2003 - 03:26:31 EDT

    Hi Raphael,

    > 0. I think we are confusing ligatures, precomposed characters, and
    > compound characters. All are important, but they are different.
    They are not really that different; from the point of view of a display
    engine, they work pretty much the same.

    > 1. A ligature is a way to show several characters with a single glyph.
    > ... the Unicode name typically has the word LIGATURE in it. The
    > author of a text typically needs to choose whether or not to use a
    > ligature,
    That is actually not the case; most of the LIGATURE codepoints are in
    the various Arabic pages, and with these it is not the responsibility of
    the user to select them manually. This makes the distinction between
    a ligature and a precomposed character somewhat arbitrary. My
    understanding of a ligature is that it is an alternative, _preferable_ way
    of displaying a combination of characters, provided by the font designer.
    However, I am aware that not all codepoints that Unicode calls LIGATURE
    fall into this category.

    > 2. A precomposed character is a way to show a character and a combining
    > accent in a single glyph. ... If the author (or the input method)
    > chooses not to use the precomposition, then the display should use the
    > precomposition (if one exists), because it looks better, but the stored
    > text should record the separate characters
    We never modify the stored text; the ligature handling is purely on the
    visual plane.
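    A minimal sketch of display-only ligation, written in Python purely for illustration. It assumes a hypothetical pair table (LIGATURES), a set of glyphs the current font is known to contain (font_glyphs), and a hypothetical PLACEHOLDER value that fills the slot of the second character, so the display sequence stays the same length as the stored text. The stored string is only read, never rewritten, and only two-character ligatures are considered.

```python
from typing import Optional

# Hypothetical pair table; not AbiWord's actual data structure.
LIGATURES = {
    ("f", "f"): "\ufb00",  # LATIN SMALL LIGATURE FF
    ("f", "i"): "\ufb01",  # LATIN SMALL LIGATURE FI
    ("f", "l"): "\ufb02",  # LATIN SMALL LIGATURE FL
}

# Hypothetical marker for a display slot absorbed into the preceding ligature.
PLACEHOLDER = None


def display_glyphs(stored: str, font_glyphs: set) -> list[Optional[str]]:
    """Return one display entry per stored character.

    A pair is replaced by its ligature only when the font contains the
    ligature glyph; otherwise both characters are displayed unchanged.
    The input string itself is never modified.
    """
    out: list[Optional[str]] = list(stored)
    i = 0
    while i < len(stored) - 1:
        lig = LIGATURES.get((stored[i], stored[i + 1]))
        if lig is not None and lig in font_glyphs:
            out[i] = lig
            out[i + 1] = PLACEHOLDER
            i += 2  # pairs never overlap; no 3-character ligatures
        else:
            i += 1
    return out


# The font has the fi ligature but not ff, so only "fi" is ligated.
print(display_glyphs("office", {"\ufb01"}))
```

    With both the ff and fi glyphs available, left-to-right matching in this sketch ligates "ff" and leaves the "i" alone, which is exactly the kind of case only a 3-character ffi ligature would cover.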

    > It is possible to simultaneously ligate and precompose, as in the glyph
    > ײַ (י ligated to י with accent ַ: HEBREW LIGATURE YIDDISH YOD YOD PATAH).
    That is equivalent to a 3-character ligature; we do not support those.

    > 5. I think AbiWord might follow this example. (a) Characters should be
    > stored exactly as they are input (or presented by an input method).
    They are.

    > (b) The default input method should include a way to explicitly
    > request ligatures and arbitrary Unicode code points.
    Allowing the insertion of an arbitrary character is really the job of the
    OS, not of AbiWord (we do need to make sure, though, that we fully support
    the OS input methods, and there is certainly scope for improvement, at
    least on win32). The only issue here is that if you insert a ligature
    glyph as a single codepoint, it will behave as a single codepoint. The
    only way around this would be to devise special document markup and a
    special run type for such characters, and that is not something I plan
    to venture into in the foreseeable future.
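    To illustrate the single-codepoint behaviour with Python's standard unicodedata module: U+FB01 is one codepoint whose Unicode compatibility decomposition is "f" + "i", yet a string containing it is still a different, one-character string. The describe() helper is hypothetical and simply lists codepoints, which is what an editor that moves its cursor codepoint by codepoint would step through.

```python
import unicodedata


def describe(text: str) -> list[str]:
    """List the codepoints in text, one entry per codepoint."""
    return [f"U+{ord(c):04X} {unicodedata.name(c)}" for c in text]


typed = "fi"         # two separate codepoints
inserted = "\ufb01"  # the ligature inserted as a single codepoint

print(describe(typed))     # ['U+0066 LATIN SMALL LETTER F', 'U+0069 LATIN SMALL LETTER I']
print(describe(inserted))  # ['U+FB01 LATIN SMALL LIGATURE FI']

# Unicode records the ligature as a compatibility form of "fi",
# but the stored text is still not equal to "fi".
print(unicodedata.decomposition(inserted))  # <compat> 0066 0069
print(inserted == typed)                    # False
```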

    > (c) Precomposed characters
    > should be used for the display in all cases where the current font
    > allows, but the precomposition should never be stored unless it was
    > input.
    We do this, since that's how we treat ligatures.

    > (d) If a precomposition is input that is not supported by the
    > font, then it should be displayed like any other missing character.
    We do this as well.

    > I personally like the method that Yudit uses for displaying missing
    > characters: Yudit shows a box containing the Unicode code point (in 4
    > hex characters, 2 in each of 2 rows).
    We will not do that, as the remapping currently has to be 1-to-1: each
    character maps to exactly one glyph.
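    A minimal sketch of the 1-to-1 constraint, again in Python for illustration. substitute_missing() and yudit_box_rows() are hypothetical helpers, font_glyphs stands for whatever glyph-presence information the platform provides, and U+FFFD is only an assumed stand-in for the font's missing-glyph marker. Each unsupported character gets exactly one replacement, whereas a Yudit-style box needs four hex digits for one character.

```python
# Assumed stand-in for the font's "missing character" glyph.
MISSING_GLYPH = "\ufffd"


def substitute_missing(stored: str, font_glyphs: set) -> str:
    """Replace every character the font lacks with exactly one fallback
    glyph, so display position i always matches stored character i."""
    return "".join(c if c in font_glyphs else MISSING_GLYPH for c in stored)


def yudit_box_rows(c: str) -> list[str]:
    """Hex digits a Yudit-style box would show for a BMP character:
    four digits in two rows of two, i.e. several glyphs for one character."""
    digits = f"{ord(c):04X}"
    return [digits[:2], digits[2:]]


# Hebrew alef (U+05D0) is missing from this hypothetical font.
shown = substitute_missing("a\u05d0b", {"a", "b"})
print(len(shown))                # 3
print(yudit_box_rows("\u05d0"))  # ['05', 'D0']
```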

    > 6. This solution is not perfect. The ligature ﬁ (fi) should most likely
    > never be stored in a document, because spell checkers will not accept it
    > as equivalent to fi. But the document will look better (in many fonts)
    > if this ligature is applied. Perhaps we need the concept of a
    > display-only ligature, selectable by the user, that automatically
    > ligates certain combinations for display purposes only.
    As I said, we only ligate for display; we never modify the contents of
    the document.

    Because the Unicode naming terminology does not map unambiguously onto
    how people actually use ligatures, precomposed characters, etc., the only
    way to achieve reasonable behaviour is heuristic. We only support 7
    ligatures for the Latin alphabet: 0xfb00-0xfb02, 0xfb05 and the
    various ! + ? combinations. We can adjust that by adding and removing
    entries as needed. The same goes for other alphabets; for instance, we
    probably do not want to use the aleph-lamed ligature for Hebrew.
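    One way to keep such a heuristic adjustable is a small per-script table whose entries can be switched on or off, so tuning it is a data change rather than a code change. The sketch below is hypothetical (LIGATURE_TABLE, lookup_ligature() and set_enabled() are not AbiWord names). It lists only 0xfb00-0xfb02, because the message does not say which input sequences the 0xfb05 and ! + ? entries are mapped from, and it includes HEBREW LIGATURE ALEF LAMED (U+FB4F) as a disabled entry.

```python
from typing import Optional

# Hypothetical per-script table: character pair -> (ligature, enabled).
LIGATURE_TABLE = {
    "Latin": {
        "ff": ("\ufb00", True),  # LATIN SMALL LIGATURE FF
        "fi": ("\ufb01", True),  # LATIN SMALL LIGATURE FI
        "fl": ("\ufb02", True),  # LATIN SMALL LIGATURE FL
    },
    "Hebrew": {
        # HEBREW LIGATURE ALEF LAMED exists in Unicode but is kept disabled.
        "\u05d0\u05dc": ("\ufb4f", False),
    },
}


def lookup_ligature(script: str, pair: str) -> Optional[str]:
    """Return the ligature for pair, or None if there is no entry
    or the entry is disabled."""
    entry = LIGATURE_TABLE.get(script, {}).get(pair)
    if entry is None:
        return None
    glyph, enabled = entry
    return glyph if enabled else None


def set_enabled(script: str, pair: str, enabled: bool) -> None:
    """Switch an existing entry on or off without touching the glyph."""
    glyph, _ = LIGATURE_TABLE[script][pair]
    LIGATURE_TABLE[script][pair] = (glyph, enabled)


print(lookup_ligature("Latin", "fi"))            # the U+FB01 ligature
print(lookup_ligature("Hebrew", "\u05d0\u05dc"))  # None (disabled)
```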

    Tomas



    This archive was generated by hypermail 2.1.4 : Wed Jun 25 2003 - 03:37:47 EDT