Re: commit: abi: UTF8String class

From: Andrew Dunbar (hippietrail@yahoo.com)
Date: Sun Apr 21 2002 - 21:37:49 EDT

     --- Karl Ove Hufthammer <huftis@bigfoot.com> wrote:
    > Andrew Dunbar <hippietrail@yahoo.com> wrote in
    > news:20020421150105.97083.qmail@web9608.mail.yahoo.com:
    >
    > > *May* map to a different glyph - but glyph is not the
    > > correct term, I believe. You could have a c with an
    > > acute accent and a cedilla, for instance, which would
    > > need three codepoints but appear on the screen to be
    > > one character. I don't have the proper definition for
    > > glyph handy sorry.
    >
    > Neither do I, but I can try: A glyph is a graphical
    > presentation form. In Unicode, there is no one-to-one
    > mapping from characters to glyphs, or the other way. One
    > character can be displayed as several glyphs, and one
    > glyph can represent several characters. E.g. the Greek
    > letter pi and the mathematical symbol (usually) use the
    > same glyph (graphical presentation), but they're
    > different characters. Sometimes a character is displayed
    > in different ways depending on which language it is used
    > in (e.g. Japanese vs. Chinese).
    >
    > But we also have combining characters in Unicode. For
    > example, to write an é, you write an e, followed by a
    > combining ´. This may be rendered as an e with ´
    > superimposed (usually looks bad), but usually a separate
    > é glyph is used. Note that all three, é, e and the
    > combining ´, are defined as characters in Unicode. The
    > pre-composed é exists mainly for backwards compatibility
    > with older character sets (e.g. ISO-8859-1). Future
    > versions of Unicode will likely not add any new
    > pre-composed characters.

    This is a major point, and why we *have to* worry
    about combining characters now.
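
    To make the point concrete, here is a small sketch. It is in
    Python purely for brevity and is not AbiWord code; the helper
    name count_clusters is made up for this example, and it only
    looks at the canonical combining class, which is much cruder
    than real grapheme segmentation.

```python
import unicodedata

# Two spellings of the same user-visible character.
precomposed = "\u00e9"   # LATIN SMALL LETTER E WITH ACUTE
decomposed = "e\u0301"   # 'e' followed by COMBINING ACUTE ACCENT

# c + COMBINING CEDILLA + COMBINING ACUTE ACCENT: three code points.
c_marks = "c\u0327\u0301"


def count_clusters(text):
    """Count code points that are not combining marks.

    Hypothetical helper: treats every run of combining marks as part
    of the preceding base character. This is a simplification.
    """
    return sum(1 for ch in text if unicodedata.combining(ch) == 0)


# The two spellings differ as code point sequences...
print(len(precomposed), len(decomposed))   # 1 2
print(precomposed == decomposed)           # False

# ...but compare equal once both are normalized to the composed form.
print(unicodedata.normalize("NFC", decomposed) == precomposed)  # True

# Three code points, but one character as far as the user is concerned.
print(len(c_marks), count_clusters(c_marks))   # 3 1
```

    So anything that counts, compares or moves a cursor through
    text one code point at a time will get these cases wrong.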

    > Lastly, none of this has anything to do with surrogate
    > characters, which complicate matters even more! :)

    Well surrogate characters are only an issue when using
    UTF-16 (some UCS-2 implementations are really UTF-16).
    So as long as our iconv can handle it and we always
    convert to/from UTF-16 using iconv, there's nothing to
    worry about here.

    If we continue to use UCS-2 like we do now, then we
    really have to worry about it.
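
    A rough illustration of the difference, again in Python rather
    than anything from the AbiWord tree. The helper names
    (utf16_units, is_surrogate, fits_in_ucs2) are invented for this
    sketch, and U+10400 is just an arbitrary character from outside
    the Basic Multilingual Plane.

```python
def utf16_units(text):
    """Return the UTF-16 code units of text as a list of integers."""
    data = text.encode("utf-16-be")
    return [int.from_bytes(data[i:i + 2], "big") for i in range(0, len(data), 2)]


def is_surrogate(unit):
    """True if a 16-bit code unit lies in the surrogate range."""
    return 0xD800 <= unit <= 0xDFFF


def fits_in_ucs2(text):
    """True if every character fits in a single 16-bit unit."""
    return all(ord(ch) <= 0xFFFF for ch in text)


bmp = "\u00e9"          # inside the BMP: one 16-bit unit
astral = "\U00010400"   # outside the BMP: needs a surrogate pair

print([hex(u) for u in utf16_units(bmp)])      # ['0xe9']
print([hex(u) for u in utf16_units(astral)])   # ['0xd801', '0xdc00']
print(fits_in_ucs2(bmp), fits_in_ucs2(astral)) # True False
```

    Code that assumes one 16-bit unit per character will see the
    second string as two separate "characters", neither of which
    is valid on its own.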

    Andrew Dunbar.

    > --
    > Karl Ove Hufthammer

    =====
    http://linguaphile.sourceforge.net http://www.abisource.com




    This archive was generated by hypermail 2.1.4 : Sun Apr 21 2002 - 21:39:03 EDT