Re: commit: abi: UTF8String class

From: F J Franklin (F.J.Franklin@sheffield.ac.uk)
Date: Sat Apr 20 2002 - 06:20:06 EDT


    > wrote:
    > > o new UTF8String class (untested)
    >
    > If this is part of the new unicodization to support
    > full-unicode, there's some stuff we need to discuss.

    Wasn't intended as such. phearbear says QNX wants to use UTF-8, whereas
    Abi uses UCS-2, so I decided to write the UTF8String class to facilitate
    the conversion. Strings are stored internally as UTF-8 byte sequences,
    and there is a home-made iterator for stepping through the string one
    sequence at a time, plus a function for converting the current sequence
    to UCS-4.
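
    To illustrate the sequence-by-sequence access, here is a minimal sketch.
    It is not the actual UTF8String code: the names utf8_sequence_length and
    utf8_next are hypothetical, and the sketch assumes well-formed input
    apart from the basic checks shown (it does not reject overlong forms).

```cpp
#include <cstddef>
#include <cstdint>
#include <string>

// Hypothetical helper, not part of the real UTF8String class.
// Returns the byte length of the UTF-8 sequence that starts with `lead`,
// or 0 if `lead` is a continuation byte or not a valid lead byte.
inline std::size_t utf8_sequence_length(unsigned char lead) {
    if (lead < 0x80) return 1;            // 0xxxxxxx
    if ((lead & 0xE0) == 0xC0) return 2;  // 110xxxxx
    if ((lead & 0xF0) == 0xE0) return 3;  // 1110xxxx
    if ((lead & 0xF8) == 0xF0) return 4;  // 11110xxx
    return 0;
}

// Decodes the sequence starting at s[pos] into a UCS-4 value and advances
// pos past it. Returns false at end of string or on truncated/malformed
// input, leaving pos and out unchanged.
inline bool utf8_next(const std::string& s, std::size_t& pos, std::uint32_t& out) {
    if (pos >= s.size()) return false;
    const unsigned char lead = static_cast<unsigned char>(s[pos]);
    const std::size_t len = utf8_sequence_length(lead);
    if (len == 0 || pos + len > s.size()) return false;

    // Payload bits carried by the lead byte, indexed by sequence length.
    static const unsigned char lead_mask[5] = {0x00, 0x7F, 0x1F, 0x0F, 0x07};
    std::uint32_t cp = lead & lead_mask[len];

    for (std::size_t i = 1; i < len; ++i) {
        const unsigned char c = static_cast<unsigned char>(s[pos + i]);
        if ((c & 0xC0) != 0x80) return false;  // must be 10xxxxxx
        cp = (cp << 6) | (c & 0x3F);
    }
    out = cp;
    pos += len;
    return true;
}
```

    Calling utf8_next in a loop until it returns false walks the string one
    sequence at a time, which is roughly what the iterator does.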

    Currently conversion to UTF-8 is only from UCS-2, but conversion from
    UCS-4 would be a trivial change. (I'm assuming that UCS-2 is the first
    65536 codes of UCS-4 - is this correct?)
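
    A rough sketch of that conversion, under the assumption stated above that
    a UCS-2 value can simply be widened to the UCS-4 code with the same
    number. The names append_utf8 and ucs2_to_utf8 are hypothetical, and
    substituting U+FFFD for surrogate values is a choice made for this
    sketch, not something the class is known to do.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>

// Hypothetical helper: appends the UTF-8 encoding of one UCS-4 value.
// Returns false (and appends nothing) for surrogates and values above
// U+10FFFF, which have no valid UTF-8 encoding.
inline bool append_utf8(std::string& out, std::uint32_t cp) {
    if (cp >= 0xD800 && cp <= 0xDFFF) return false;
    if (cp < 0x80) {
        out += static_cast<char>(cp);
    } else if (cp < 0x800) {
        out += static_cast<char>(0xC0 | (cp >> 6));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else if (cp < 0x10000) {
        out += static_cast<char>(0xE0 | (cp >> 12));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else if (cp < 0x110000) {
        out += static_cast<char>(0xF0 | (cp >> 18));
        out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else {
        return false;
    }
    return true;
}

// Converts n UCS-2 code units to UTF-8. Each unit is widened to 32 bits
// and encoded; unencodable values are replaced by U+FFFD (sketch choice).
inline std::string ucs2_to_utf8(const std::uint16_t* src, std::size_t n) {
    std::string out;
    for (std::size_t i = 0; i < n; ++i) {
        if (!append_utf8(out, static_cast<std::uint32_t>(src[i]))) {
            append_utf8(out, 0xFFFD);
        }
    }
    return out;
}
```

    With the encoder taking a 32-bit value, accepting UCS-4 input directly
    only needs a second loop over 32-bit units.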

    As a string class it's not nearly as functional as the others, but it's
    not really intended as a replacement.

    > We need to design the system so that a string is not
    > built from a series of UTF-8 (or UTF-32) characters
    > directly, but a series of "composed character" which
    > in turn are a series of UTF-8 characters, the first
    > being the main character, the remainder being zero-
    > width modifiers. We need this to support proper
    > internationalization. We probably need much
    > discussion first actually.

    Not sure I understand this. Can you explain how to use zero-width
    modifiers?
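
    For reference, one reading of the "composed character" idea quoted
    above can be sketched as follows. This is only an illustration: the
    names are hypothetical, and treating just U+0300..U+036F (the Combining
    Diacritical Marks block) as zero-width modifiers is a simplifying
    assumption, since a real implementation would consult the Unicode
    character database.

```cpp
#include <cstdint>
#include <vector>

// Simplifying assumption: only the Combining Diacritical Marks block
// counts as a zero-width modifier here.
inline bool is_combining_mark(std::uint32_t cp) {
    return cp >= 0x0300 && cp <= 0x036F;
}

// One composed character: a main character followed by its modifiers.
using ComposedChar = std::vector<std::uint32_t>;

// Groups a run of UCS-4 values into composed characters. A modifier is
// attached to the preceding group; a modifier with nothing before it
// starts a group of its own.
inline std::vector<ComposedChar> group_composed(const std::vector<std::uint32_t>& cps) {
    std::vector<ComposedChar> out;
    for (std::uint32_t cp : cps) {
        if (out.empty() || !is_combining_mark(cp)) {
            out.push_back(ComposedChar{cp});
        } else {
            out.back().push_back(cp);
        }
    }
    return out;
}
```

    For example, 'e' (U+0065) followed by a combining acute accent (U+0301)
    and then 'a' (U+0061) yields two groups: {U+0065, U+0301} and {U+0061}.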

    Frank

    Francis James Franklin
    F.J.Franklin@shef.ac.uk

    "No, she really likes me. She told me I look like Britney Spears, and why
    would you say that to somebody you don't like?"
                                                               --- Elle Woods


