Re: commit: abi: UTF8String class

From: Andrew Dunbar (hippietrail@yahoo.com)
Date: Sun Apr 21 2002 - 21:37:49 EDT

Next message: Andrew Dunbar: "Re: commit: abi: UTF8String class"

Previous message: Martin Sevior: "IMPORTANT - PLEASE HELP! Re: plugins, build fails with unresolved external symbols"
In reply to: Karl Ove Hufthammer: "Re: commit: abi: UTF8String class"
Next in thread: Joaquin Cuenca Abela: "Re: commit: abi: UTF8String class"

--- Karl Ove Hufthammer <huftis@bigfoot.com> wrote:
> Andrew Dunbar <hippietrail@yahoo.com> wrote in
> news:20020421150105.97083.qmail@web9608.mail.yahoo.com:
>
> > *May* map to a different glyph - but glyph is not the
> > correct term, I believe. You could have a c with an
> > acute accent and a cedilla, for instance, which would
> > need three codepoints but appear on the screen to be
> > one character. I don't have the proper definition for
> > glyph handy, sorry.
>
> Neither do I, but I can try: a glyph is a graphical presentation
> form. In Unicode, there is no one-to-one mapping from characters
> to glyphs, or the other way round. One character can be displayed
> as several glyphs, and one glyph can represent several
> characters. E.g. the Greek letter pi and the mathematical symbol
> (usually) use the same glyph (graphical presentation), but they're
> different characters. Sometimes a character is displayed in
> different ways depending on which language it is used in (e.g.
> Japanese vs. Chinese).
>
> But we also have combining characters in Unicode. For example, to
> write an é, you write an e, followed by a combining ´. This may be
> rendered as an e with ´ superimposed (usually looks bad), but
> usually a separate é glyph is used. Note that é, e and the
> combining ´ are all defined as characters in Unicode. The
> pre-composed é is there mainly for backwards compatibility with
> older character sets (e.g. ISO-8859-1). Future versions of Unicode
> will likely not add any new pre-composed characters.

This is a major point, and why we *have to* worry
about combining characters now.
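
To make the point concrete, here is a small sketch in Python (used
only for illustration, it is not AbiWord code). It shows that the
pre-composed é and the sequence e + combining acute are different
code point sequences that only compare equal after normalization.
The helper name same_text is made up for this example.

```python
import unicodedata

precomposed = "\u00e9"    # é as one code point (LATIN SMALL LETTER E WITH ACUTE)
decomposed = "e\u0301"    # e followed by COMBINING ACUTE ACCENT
stacked = "c\u0327\u0301" # c + combining cedilla + combining acute

def same_text(a, b):
    """Hypothetical helper: compare two strings after NFC normalization."""
    return unicodedata.normalize("NFC", a) == unicodedata.normalize("NFC", b)

# Code point counts differ even though both render as one character.
print(len(precomposed), len(decomposed))   # 1 2

# A naive comparison treats them as different strings.
print(precomposed == decomposed)           # False

# After normalization they are the same text.
print(same_text(precomposed, decomposed))  # True

# Three code points that should appear on screen as one character.
for ch in stacked:
    print(f"U+{ord(ch):04X}", unicodedata.name(ch))
```

Anything that counts characters, moves a cursor or deletes "one
character" has to treat such a sequence as a unit, whatever encoding
the string class uses internally.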

> Lastly, none of this has anything to do with surrogate characters,
> which complicate matters even more! :)

Well, surrogate characters are only an issue when using
UTF-16 (some UCS-2 implementations are really UTF-16).
So as long as our iconv can handle them and we always
convert to/from UTF-16 using iconv, there's nothing to
worry about here.

If we continue to use UCS-2 like we do now, then we
really do have to worry about surrogates.
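
For anyone who hasn't looked at surrogates yet, a rough sketch in
Python (again only for illustration, the function names are made up
and this is not how AbiWord or iconv does it). A code point above
U+FFFF becomes two 16-bit units in UTF-16, which is exactly what code
assuming "one 16-bit unit = one character" gets wrong.

```python
def surrogate_pair(cp):
    """Split a code point above U+FFFF into (high, low) UTF-16 surrogates."""
    if not 0x10000 <= cp <= 0x10FFFF:
        raise ValueError("code point is not outside the BMP")
    offset = cp - 0x10000
    return 0xD800 + (offset >> 10), 0xDC00 + (offset & 0x3FF)

def utf16_units(text):
    """Return the 16-bit code units of text as encoded by Python's UTF-16 codec."""
    data = text.encode("utf-16-be")
    return [int.from_bytes(data[i:i + 2], "big") for i in range(0, len(data), 2)]

ch = "\U00010400"  # a character outside the Basic Multilingual Plane

# The manual calculation and the codec agree on the pair.
print([hex(u) for u in surrogate_pair(ord(ch))])  # ['0xd801', '0xdc00']
print([hex(u) for u in utf16_units(ch)])          # ['0xd801', '0xdc00']

# One character, but two 16-bit units.
print(len(ch), len(utf16_units(ch)))              # 1 2
```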

Andrew Dunbar.

> --
> Karl Ove Hufthammer

=====
http://linguaphile.sourceforge.net http://www.abisource.com



This archive was generated by hypermail 2.1.4 : Sun Apr 21 2002 - 21:39:03 EDT