Re: commit: abi: UTF8String class

From: Karl Ove Hufthammer (huftis@bigfoot.com)
Date: Sun Apr 21 2002 - 11:19:23 EDT


    Andrew Dunbar <hippietrail@yahoo.com> wrote in
    news:20020421150105.97083.qmail@web9608.mail.yahoo.com:

    > *May* map to a different glyph - but glyph is not the
    > correct term, I believe. You could have a c with an
    > acute accent and a cedilla, for instance, which would
    > need three codepoints but appear on the screen to be
    > one character. I don't have the proper definition for
    > glyph handy sorry.

    Neither do I, but I can try: A glyph is a graphical presentation
    form. In Unicode, there is no one-to-one mapping from characters
    to glyphs, or the other way around. One character can be displayed
    as several glyphs, and several characters can be displayed as one
    glyph. E.g. the Greek letter pi and the mathematical symbol
    (usually) use the same glyph (graphical presentation), but they're
    different characters. Sometimes a character is displayed in
    different ways depending on which language it is used in (e.g.
    Japanese vs. Chinese).
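
    To illustrate "same glyph, different characters", here is a small
    sketch using Python's standard unicodedata module. The three
    letters chosen are my own illustration, not the pi example above;
    most fonts draw them identically, yet Unicode treats them as three
    distinct characters.

```python
import unicodedata

# Three characters that most fonts render with the same glyph shape.
# (Illustrative choice; not taken from the discussion above.)
lookalikes = ["\u0041", "\u0391", "\u0410"]

# Each one has its own code point and its own character name.
names = [unicodedata.name(ch) for ch in lookalikes]
for ch, name in zip(lookalikes, names):
    print(f"U+{ord(ch):04X}  {name}")

# They never compare equal, however similar they look on screen.
all_distinct = len(set(lookalikes)) == len(lookalikes)
```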

    But we also have combining characters in Unicode. For example, to
    write an é, you write an e, followed by a combining ´. This may be
    rendered as an e with ´ superimposed (usually looks bad), but
    usually a separate é glyph is used. Note that all three, é, e and
    the combining ´, are defined as characters in Unicode. The
    pre-composed é is there mainly for backwards compatibility with
    older character sets (e.g. ISO-8859-1). Future versions of Unicode
    will likely not add any new pre-composed characters.
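
    A minimal sketch of the two spellings of é, again assuming
    Python's standard unicodedata module (the variable names are just
    for illustration). It shows that the pre-composed and combining
    forms are different code point sequences, and that normalization
    converts between them.

```python
import unicodedata

# Pre-composed form: a single code point, U+00E9.
precomposed = "\u00e9"
# Combining form: 'e' followed by U+0301 COMBINING ACUTE ACCENT.
decomposed = "e\u0301"

# Both normally display as one character, but the strings differ.
same_string = precomposed == decomposed      # False
lengths = (len(precomposed), len(decomposed))  # (1, 2)

# NFC composes to the pre-composed form; NFD decomposes it again.
composed_again = unicodedata.normalize("NFC", decomposed)
decomposed_again = unicodedata.normalize("NFD", precomposed)

# The names of the individual code points in the combining form.
part_names = [unicodedata.name(ch) for ch in decomposed]
```

    Comparing text reliably therefore means normalizing both sides to
    the same form first.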

    Lastly, none of this has anything to do with surrogate characters,
    which complicate matters even more! :)
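
    For comparison, surrogates are purely an encoding matter: in
    UTF-16, a code point above U+FFFF is stored as two 16-bit code
    units. A rough sketch in Python follows; the helper name
    to_surrogates is hypothetical, and U+10400 is just an arbitrary
    example code point outside the 16-bit range.

```python
def to_surrogates(code_point):
    """Split a code point above U+FFFF into a UTF-16 surrogate pair."""
    if not 0x10000 <= code_point <= 0x10FFFF:
        raise ValueError("only code points above U+FFFF need surrogates")
    offset = code_point - 0x10000
    high = 0xD800 + (offset >> 10)    # top 10 bits
    low = 0xDC00 + (offset & 0x3FF)   # bottom 10 bits
    return high, low

# One character (one code point) ...
ch = "\U00010400"
pair = to_surrogates(ord(ch))

# ... which Python's own UTF-16 encoder stores as two 16-bit units.
raw = ch.encode("utf-16-be")
units = [int.from_bytes(raw[i:i + 2], "big") for i in range(0, len(raw), 2)]

print([f"{u:04X}" for u in units])
```

    So a single character can already be two code units before
    combining characters or glyphs enter the picture at all.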

    -- 
    Karl Ove Hufthammer
    


    This archive was generated by hypermail 2.1.4 : Sun Apr 21 2002 - 11:20:49 EDT