Re: commit: abi: UTF8String class

From: Joaquin Cuenca Abela (cuenca@pacaterie.u-psud.fr)
Date: Sun Apr 21 2002 - 11:39:19 EDT

    ----- Original Message -----
    From: "Martin Sevior" <msevior@mccubbin.ph.unimelb.edu.au>
    To: "Andrew Dunbar" <hippietrail@yahoo.com>
    Cc: <abiword-dev@abisource.com>
    Sent: Sunday, April 21, 2002 4:51 PM
    Subject: Re: commit: abi: UTF8String class

    > > >
    > > > UTF-8 is great for communicating between the
    > > > piecetable and the widgets. I
    > > > think we should definitely do this. What I don't
    > > > want is for us to store
    > > > our text as UTF-8 in the piecetable. We have a *LOT*
    > > > of code that expects
    > > > that every position in the piecetable corresponds to
    > > > an extra letter of text.
    > >
    > > How is this going to work for languages that need
    > > combining characters? Isn't it going to need to be
    > > changed anyway? Isn't now the time to do this
    > > re-design?
    >
    > I don't understand this. Doesn't every glyph have a unique unicode code
    > point? If so we still have a one-to-one mapping of glyph to text location.
    >
    > >
    > > > What I think we should do is store our unicode as
    > > > UT_uint32 in the
    > > > piecetable which can then be randomly accessed the
    > > > same way we do things now.
    > >
    > > To randomly access what the user sees as a character
    > > or to randomly access what is internally one codepoint?
    >
    > OK I don't understand. Are you saying that two code points in a row map to
    > a different glyph? If so why not just insert the code point for this glyph?
    >
    > > These are not the same. But I don't know the
    > > piecetable either so maybe it is the right thing to
    > > do.
    > > As long as we are thinking about it.
    >
    > Certainly the structure of the code makes lots of assumptions of one
    > PT_DocPosition, one glyph. If unicode was at all sane this should not be a
    > problem. Are you telling me that unicode is not sane and that certain
    > glyphs can only be generated if two 32 bit numbers are presented
    > consecutively?

    Martin, the problem here is that the "English/European/..." languages have
    a very small set of glyphs to show, so you can do the 1 codepoint -> 1 glyph
    mapping for these languages and nobody will complain (for instance, fonts
    usually have separate glyphs for the accented characters of these
    languages).

    But if you try to do the same thing with other languages, the number of
    glyphs that you need will literally explode.

    If AbiWord is to become "The Word Processor" for everybody, whichever OS or
    language they use, this assumption should be removed from the code.
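
    As a rough illustration of why one position per code point is not the
    same as one position per visible character, here is a minimal sketch. It
    assumes the text is already held as 32-bit code points, and it only looks
    at the Combining Diacritical Marks block (U+0300..U+036F). The names
    isCombiningMark, countClusters and selfCheck are made up for this example
    and are not AbiWord functions; real segmentation needs the full Unicode
    character properties.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper: true only for the Combining Diacritical Marks block.
// This is a deliberate simplification; many other combining ranges exist.
static bool isCombiningMark(std::uint32_t cp) {
    return cp >= 0x0300 && cp <= 0x036F;
}

// Counts what the user would see as characters: a base code point plus any
// combining marks that follow it count once. A mark at the very start of the
// text has no base to attach to, so it is counted on its own.
static std::size_t countClusters(const std::vector<std::uint32_t>& text) {
    std::size_t clusters = 0;
    for (std::size_t i = 0; i < text.size(); ++i) {
        if (i == 0 || !isCombiningMark(text[i])) {
            ++clusters;
        }
    }
    return clusters;
}

// Small self-check: both spellings of "e with acute accent" are one visible
// character, but they occupy a different number of code points.
static void selfCheck() {
    const std::vector<std::uint32_t> precomposed = {0x00E9};         // U+00E9
    const std::vector<std::uint32_t> decomposed  = {0x0065, 0x0301}; // e + U+0301

    assert(precomposed.size() == 1);
    assert(decomposed.size() == 2);
    assert(countClusters(precomposed) == 1);
    assert(countClusters(decomposed) == 1);
}
```

    The decomposed form takes two positions in a UT_uint32 array but shows up
    as a single character, so code that equates one position with one glyph
    would miscount it.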

    Cheers,

    --
    Joaquin Cuenca Abela
    cuenca@pacaterie.u-psud.fr
    

