Re: commit: abi: UTF8String class

From: Martin Sevior (msevior@mccubbin.ph.unimelb.edu.au)
Date: Sun Apr 21 2002 - 10:51:54 EDT

    > >
    > > UTF-8 is great for communicating between the
    > > piecetable and the widgets. I
    > > think we should definitely do this. What I don't
    > > want is for us to store
    > > our text as UTF-8 in the piecetable. We have a *LOT*
    > > of code that expects
    > > that every position in the piecetable corresponds to
    > > an extra letter of text.
    >
    > How is this going to work for languages that need
    > combining characters? Isn't it going to need to be
    > changed anyway? Isn't now the time to do this
    > re-design?

    I don't understand this. Doesn't every glyph have a unique Unicode code
    point? If so, we still have a one-to-one mapping of glyph to text location.
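
To make the question concrete, here is a minimal sketch using Python's standard unicodedata module. The sample strings are assumptions picked for illustration, not text from AbiWord. It shows that one displayed letter can be stored either as a single code point or as a base letter followed by a combining mark:

```python
import unicodedata

# Two ways to store what a reader sees as the single letter "e with acute".
precomposed = "\u00e9"   # LATIN SMALL LETTER E WITH ACUTE: one code point
decomposed = "e\u0301"   # 'e' + COMBINING ACUTE ACCENT: two code points

print(len(precomposed), len(decomposed))

# A non-zero canonical combining class means the code point attaches to
# the preceding base character instead of standing on its own.
print(unicodedata.combining("\u0301"))

# NFC normalization composes this particular pair into the single code point.
recomposed = unicodedata.normalize("NFC", decomposed)
print(recomposed == precomposed)
```

So for this pair a one-to-one mapping can be restored by normalizing on input, but that is a property of this pair, not of every sequence.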

    >
    > > What I think we should do is store our unicode as
    > > UT_uint32 in the
    > > piecetable which can then be randomly accessed the
    > > same way we do things now.
    >
    > To randomly access what the user sees as a character
    > or to randomly access what is internally one codepoint?

    OK, I don't understand. Are you saying that two code points in a row map to
    a different glyph? If so, why not just insert the code point for this glyph?
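
A short sketch of why "just insert the code point for this glyph" does not always work, again with Python's standard unicodedata module. The sample sequence is an assumption chosen for illustration: Unicode defines no precomposed code point for 'q' with a tilde, so even after NFC normalization the sequence remains two code points.

```python
import unicodedata

# Hypothetical sample: 'q' followed by COMBINING TILDE (U+0303).
sequence = "q\u0303"

# NFC composes wherever a precomposed code point exists; here none does,
# so the normalized string still holds the base letter plus the mark.
composed = unicodedata.normalize("NFC", sequence)

print([hex(ord(c)) for c in composed])
print(len(composed))
```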

    > These are not the same. But I don't know the
    > piecetable either so maybe it is the right thing to
    > do.
    > As long as we are thinking about it.

    Certainly the structure of the code makes a lot of assumptions that one
    PT_DocPosition is one glyph. If Unicode were at all sane this should not be a
    problem. Are you telling me that Unicode is not sane and that certain
    glyphs can only be generated if two 32-bit numbers are presented
    consecutively?
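
The distinction between indexing by code point and indexing by what the user sees can be sketched as follows. This is not AbiWord code. The list of integers stands in for a buffer of 32-bit values such as the UT_uint32 storage proposed above, and cluster_starts is a hypothetical helper. It uses a deliberately simplified rule (a code point with a non-zero canonical combining class attaches to the previous one), which is much cruder than full Unicode grapheme segmentation.

```python
import unicodedata

def cluster_starts(code_points):
    """Return the indices at which a user-visible character begins.

    Simplified assumption: any code point whose canonical combining class
    is non-zero belongs to the preceding base character.
    """
    return [
        i
        for i, cp in enumerate(code_points)
        if i == 0 or unicodedata.combining(chr(cp)) == 0
    ]

# Stand-in for a buffer of 32-bit code points: "cafe" + COMBINING ACUTE + "s".
buffer = [ord(c) for c in "cafe\u0301s"]
starts = cluster_starts(buffer)

print(len(buffer))   # number of storage positions
print(len(starts))   # number of user-visible characters
print(starts)        # positions where a cursor could safely land
```

Random access by position still works on such a buffer, but position 4 here is the accent alone, so cursor movement and deletion would have to step between the indices in starts and not between every position.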

    Cheers

    Martin


