notes on handling combining characters

From: Tomas Frydrych (tomas@frydrych.uklinux.net)
Date: Thu Aug 08 2002 - 09:55:12 EDT

    A 100% correct placing of combining marks can only be achieved
    with font technologies and rasterizers designed to do so, and sadly
    at the moment we are not in a position to handle combining marks
    100% satisfactorily. In particular, I am disappointed with the
    results from the standard win32 API (e.g., on simple English
    win98), which is not up to the job (for instance, Hebrew vowel
    points are handled as ordinary characters, placed NEXT to the base
    glyph).

    After experimenting a bit, I have come up with an approach that
    will allow us to handle combining characters in a way that will not
    be perfect but will be acceptable most of the time. I have made
    changes to the UT_isOverstrikingChar() function, so that we can
    distinguish between three types of combining marks: those centered
    over the base character (most), those appended to the right edge of
    the base character, and those appended to the left edge of the base
    character.

    The platform-specific ::measureUnremappedChar() function has to
    handle these three types differently. For the centered marks, it
    should return the width of the glyph, but as a negative number. For
    those appended to the right edge it should return 0, and for those
    appended to the left edge, it should return 'width |
    GR_OC_LEFT_FLUSHED' (width as a positive number).

    The graphics function can determine which type of character it is
    dealing with by calling UT_isOverstrikingChar(). If the return
    value != UT_NOT_OVERSTRIKING, it then needs to mask the value with
    UT_OVERSTRIKING_TYPE (&=) and test the result against the three
    types as defined in ut_OverstrikingChars.h, e.g.,

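    UT_UCS4Char c;
    ...
    UT_uint32 iOver = UT_isOverstrikingChar(c);

    if(iOver != UT_NOT_OVERSTRIKING)
    {
     // keep only the bits that encode the type of the mark
     iOver &= UT_OVERSTRIKING_TYPE;

     switch(iOver)
     {
      case UT_OVERSTRIKING_LEFT:
       // appended to the left edge: positive width plus the flag
       width = glyph_width | GR_OC_LEFT_FLUSHED;
       break;

      case UT_OVERSTRIKING_RIGHT:
       // appended to the right edge: reported as zero width
       width = 0;
       break;

      case UT_OVERSTRIKING_CENTRE:
       // centered over the base character: width as a negative number
       width = -glyph_width;
       break;
     }
    }
    else
    {
     // normal char processing
    }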
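
    The fragment above only produces the encoded value. As a rough
    sketch of how a caller might take that value apart again, assuming
    GR_OC_LEFT_FLUSHED is the single bit just above GR_OC_MAX_WIDTH
    (0x40000000, a guess, not taken from the headers); MarkKind,
    DecodedWidth and decodeWidth are hypothetical names used only for
    illustration:

```cpp
#include <cassert>
#include <cstdint>

// GR_OC_MAX_WIDTH is the value quoted in the text; the flag value is
// an assumption (the first bit above the width bits).
const int32_t GR_OC_MAX_WIDTH    = 0x3FFFFFFF;
const int32_t GR_OC_LEFT_FLUSHED = 0x40000000;

// Hypothetical names, for illustration only.
enum class MarkKind { Normal, Centred, RightAppended, LeftAppended };

struct DecodedWidth
{
    MarkKind kind;
    int32_t  width;   // glyph width, never negative
};

DecodedWidth decodeWidth(int32_t encoded)
{
    // Test the sign first: in two's complement a negated width in the
    // allowed range also has bit 30 set, so checking the flag first
    // would misread a centred mark as a left-flushed one.
    if (encoded < 0)
        return { MarkKind::Centred, -encoded };

    if (encoded == 0)
        return { MarkKind::RightAppended, 0 };

    if (encoded & GR_OC_LEFT_FLUSHED)
        return { MarkKind::LeftAppended, encoded & GR_OC_MAX_WIDTH };

    // No flag, positive value: an ordinary character.
    return { MarkKind::Normal, encoded };
}
```

    With these assumptions, decodeWidth(-7) gives a centred mark of
    width 7, and decodeWidth(7 | GR_OC_LEFT_FLUSHED) gives a
    left-appended mark of width 7. An ordinary character whose width
    happens to be 0 cannot be told apart from a right-appended mark by
    the value alone.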

    For the whole thing to work, no character can be wider than
    GR_OC_MAX_WIDTH (0x3FFFFFFF); that is not unreasonable, but an
    assert to that effect should be included in the
    ::measureUnremappedChar() functions.
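
    A minimal sketch of such a check, under the same assumption that
    GR_OC_LEFT_FLUSHED is 0x40000000; widthIsEncodable and
    leftFlushedWidth are hypothetical helpers, and plain assert() stands
    in for whatever assert macro the platform code actually uses:

```cpp
#include <cassert>
#include <cstdint>

const int32_t GR_OC_MAX_WIDTH    = 0x3FFFFFFF;  // value quoted in the text
const int32_t GR_OC_LEFT_FLUSHED = 0x40000000;  // assumed value

// The flag must not share any bit with the width, otherwise the width
// could not be recovered by masking.
static_assert((GR_OC_LEFT_FLUSHED & GR_OC_MAX_WIDTH) == 0,
              "flag bit overlaps the width bits");

// Hypothetical helper: true if the width survives the encoding, i.e.
// it is non-negative and leaves the flag bit and the sign bit free.
bool widthIsEncodable(int32_t glyphWidth)
{
    return glyphWidth >= 0 && glyphWidth <= GR_OC_MAX_WIDTH;
}

// Hypothetical helper showing where the assert would sit for the
// left-flushed case; the centred case would assert in the same way
// before negating the width.
int32_t leftFlushedWidth(int32_t glyphWidth)
{
    assert(widthIsEncodable(glyphWidth));
    return glyphWidth | GR_OC_LEFT_FLUSHED;
}
```

    The limit follows from the encoding itself: bit 31 carries the sign
    used for centred marks and, under the assumption above, bit 30
    carries the left-flushed flag, so only the low 30 bits are left for
    the width.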

    Tomas


