Re: commit: abi: UTF8String class

From: Andrew Dunbar (hippietrail@yahoo.com)
Date: Sun Apr 21 2002 - 22:29:09 EDT


     --- Scott Rushfeldt <sirushfe@unity.ncsu.edu> wrote:
    >
    > ----- Original Message -----
    > From: "Karl Ove Hufthammer" <huftis@bigfoot.com>
    > To: <abiword-dev@abisource.com>
    > Sent: Sunday, April 21, 2002 11:19 AM
    > Subject: Re: commit: abi: UTF8String class
    >
    >
    > > Andrew Dunbar <hippietrail@yahoo.com> wrote in
    > > news:20020421150105.97083.qmail@web9608.mail.yahoo.com:
    > >
    > > > *May* map to a different glyph - but glyph is not the
    > > > correct term, I believe. You could have a c with an
    > > > acute accent and a cedilla, for instance, which would
    > > > need three codepoints but appear on the screen to be
    > > > one character. I don't have the proper definition for
    > > > glyph handy, sorry.
    > >
    > > Neither do I, but I can try: A glyph is a graphical presentation
    > > form. In Unicode, there is no one-to-one mapping from
    > > characters to glyphs, or the other way round. One character can be
    > > displayed as several glyphs, and one glyph can represent several
    > > characters. E.g. the Greek letter pi and the mathematical symbol
    > > (usually) use the same glyph (graphical presentation), but they're
    > > different characters. Sometimes a character is displayed in
    > > different ways depending on which language it is used in (e.g.
    > > Japanese vs. Chinese).
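The pi example above can be checked with a minimal Python sketch. The message does not say which code points are meant, so the two used here (the Greek capital letter and the n-ary product sign) are an assumption chosen for illustration; `describe` is a hypothetical helper.

```python
import unicodedata

# Assumed code points: the message does not name the exact "pi" characters.
GREEK_CAPITAL_PI = "\u03a0"
N_ARY_PRODUCT = "\u220f"

def describe(ch):
    """Return (code point label, Unicode name) for a single character."""
    return f"U+{ord(ch):04X}", unicodedata.name(ch)

for ch in (GREEK_CAPITAL_PI, N_ARY_PRODUCT):
    print(describe(ch))

# Two distinct characters, even if a font draws them with similar shapes.
print(GREEK_CAPITAL_PI == N_ARY_PRODUCT)
```

Whether the two are drawn alike is up to the font; the character identity is what the text stores.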
    > >
    > > But we also have combining characters in Unicode. For example, to
    > > write an é, you can write an e followed by a combining ´. This may
    > > be rendered as an e with ´ superimposed (usually looks bad), but
    > > usually a separate é glyph is used. Note that é, e and the
    > > combining ´ are all defined as characters in Unicode. The
    > > pre-composed é exists mainly for backwards compatibility with older
    > > character sets (e.g. ISO-8859-1). Future versions will likely not
    > > add any new pre-composed characters.
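As a small illustration of the two spellings of é described above, here is a Python sketch using the standard `unicodedata` module. It assumes the "combining ´" in the message is U+0301 COMBINING ACUTE ACCENT.

```python
import unicodedata

precomposed = "\u00e9"   # LATIN SMALL LETTER E WITH ACUTE
decomposed = "e\u0301"   # e followed by COMBINING ACUTE ACCENT (assumed U+0301)

# One code point versus two, and different UTF-8 byte lengths.
print(len(precomposed), len(decomposed))
print(len(precomposed.encode("utf-8")), len(decomposed.encode("utf-8")))

# A plain comparison treats them as different strings ...
print(precomposed == decomposed)

# ... but normalization maps one form onto the other.
print(unicodedata.normalize("NFC", decomposed) == precomposed)
print(unicodedata.normalize("NFD", precomposed) == decomposed)
```

Code that compares or searches text would need to pick one normalization form first, otherwise the two spellings do not match.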
    > >
    > > Lastly, none of this has anything to do with surrogate characters,
    > > which complicate matters even more! :)
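The message does not elaborate on surrogates, so as background only: in UTF-16, a code point above U+FFFF is stored as a pair of 16-bit surrogate values, while UTF-8 encodes the same code point directly in four bytes. A minimal sketch of the arithmetic, with `to_surrogate_pair` as a hypothetical helper name:

```python
def to_surrogate_pair(code_point):
    """Split a code point above U+FFFF into UTF-16 high and low surrogates."""
    if not 0x10000 <= code_point <= 0x10FFFF:
        raise ValueError("only supplementary-plane code points need surrogates")
    offset = code_point - 0x10000
    high = 0xD800 + (offset >> 10)    # top 10 bits of the offset
    low = 0xDC00 + (offset & 0x3FF)   # bottom 10 bits of the offset
    return high, low

clef = 0x1D11E  # MUSICAL SYMBOL G CLEF, used here only as an example
high, low = to_surrogate_pair(clef)
print(f"{high:04X} {low:04X}")

# The same character in UTF-8 takes four bytes and involves no surrogates.
print(chr(clef).encode("utf-8").hex())
```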
    > >
    > > --
    > > Karl Ove Hufthammer
    > >
    >
    >
    > I don't claim to be an expert on Unicode, but from what I've read,
    > might it work to store strings in the piece table as arrays of glyph
    > objects, with each glyph object containing all the UTF-8 characters
    > necessary to define the glyph to be displayed? This (from what I
    > understand of Unicode) would allow the piece table to maintain its
    > random access of glyphs, and still allow UTF-8 character combination.
    > I don't know how this would affect processing speed, but it should be
    > possible to eliminate calculations within the glyph class at all times
    > except when the glyph is being edited. This could be done by storing
    > what glyph should be displayed for each glyph object (based on the
    > UTF-8 characters contained in the object). This would allow the glyphs
    > to be accessed even faster by the piece table, and the glyph would
    > only be changed if a particular glyph object is edited (I'm not

    This is pretty much what I was thinking and I think
    it's what is recommended by the IBM ICU (International
    Components for Unicode). But I'm slightly worried that we
    would have to connect these small arrays with a myriad of
    pointers, which would eat memory, and dereferencing them
    when scanning through parts of the piecetable might add
    too much extra cost. Again, my disclaimer about not
    knowing enough about the piecetable though.
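One possible way to keep random access without a pointer per glyph object is a flat buffer plus a table of integer start offsets. The Python sketch below is only an illustration of that idea, not AbiWord's piece table: `cluster_offsets` and `cluster_at` are hypothetical names, the grouping rule (attach every combining mark to the preceding base character) is a simplification of real segmentation rules, and it works on code points rather than UTF-8 bytes.

```python
import unicodedata

def cluster_offsets(text):
    """Return the start index of each base character plus its combining marks.

    Simplification: a cluster starts at any code point whose canonical
    combining class is 0. Full text segmentation has more rules than this.
    """
    starts = []
    for i, ch in enumerate(text):
        if i == 0 or unicodedata.combining(ch) == 0:
            starts.append(i)
    return starts

def cluster_at(text, starts, n):
    """Return the n-th cluster via the offset table, without rescanning."""
    end = starts[n + 1] if n + 1 < len(starts) else len(text)
    return text[starts[n]:end]

# c + COMBINING CEDILLA + COMBINING ACUTE ACCENT: three code points, one cluster
text = "fac\u0327\u0301ade"
starts = cluster_offsets(text)
print(starts)
print([len(cluster_at(text, starts, n)) for n in range(len(starts))])
```

The offset table only needs rebuilding for the part of the text that is edited, which matches the idea of doing the work at edit time rather than at display time.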

    > sure how often this would happen). Hope this makes some sense, since
    > I am still very new to abiword and am not yet comfortable enough with
    > the code IMO to do debugging. If this does seem like an interesting
    > idea, but my thinking on some point seems confusing, please email me
    > or the list (I am subscribed) and I would be happy to discuss my
    > thinking.

    Makes sense to me, thanks (:

    Andrew Dunbar.

    > Happy Coding,
    >
    > Scott Rushfeldt
    > sirushfe@unity.ncsu.edu
    >

    =====
    http://linguaphile.sourceforge.net http://www.abisource.com




    This archive was generated by hypermail 2.1.4 : Sun Apr 21 2002 - 22:30:16 EDT