Re: commit: abi: UTF8String class

From: Joaquin Cuenca Abela (cuenca@pacaterie.u-psud.fr)
Date: Sun Apr 21 2002 - 11:39:19 EDT

Next message: Tomas Frydrych: "Re: commit: abi: UTF8String class"

Previous message: Karl Ove Hufthammer: "Re: abiword dtd"
In reply to: Martin Sevior: "Re: commit: abi: UTF8String class"
Next in thread: Leonard Rosenthol: "Pango? (was Re: commit: abi: UTF8String class)"
Next in thread: Tomas Frydrych: "Re: commit: abi: UTF8String class"
Next in thread: Joaquin Cuenca Abela: "Re: commit: abi: UTF8String class"

----- Original Message -----
From: "Martin Sevior" <msevior@mccubbin.ph.unimelb.edu.au>
To: "Andrew Dunbar" <hippietrail@yahoo.com>
Cc: <abiword-dev@abisource.com>
Sent: Sunday, April 21, 2002 4:51 PM
Subject: Re: commit: abi: UTF8String class

> > >
> > > UTF-8 is great for communicating between the piecetable and the
> > > widgets. I think we should definitely do this. What I don't want is for
> > > us to store our text as UTF-8 in the piecetable. We have a *LOT* of code
> > > that expects that every position in the piecetable corresponds to an
> > > extra letter of text.
> >
> > How is this going to work for languages that need combining characters?
> > Isn't it going to need to be changed anyway? Isn't now the time to do
> > this re-design?
>
> I don't understand this. Doesn't every glyph have a unique unicode code
> point? If so we still have a one-to one mapping of glyph to text location.
>
> >
> > > What I think we should do is store our unicode as UT_uint32 in the
> > > piecetable which can then be randomly accessed the same way we do things
> > > now.
> >
> > To randomly access what the user sees as a character or to randomly
> > access what is internally one codepoint?
>
> OK I don't understand. Are you saying that two code points in a row map to
> a different glyph? If so why not just insert the code point for this glyph?
>
> > These are not the same. But I don't know the piecetable either so maybe it
> > is the right thing to do. As long as we are thinking about it.
>
> Certainly the structure of the code makes lots of assumptions of one
> PT_DocPosition, one glyph. If unicode was at all sane this should not be a
> problem. Are you telling me that unicode is not sane and that certain
> glyphs can only be generated if two 32 bit numbers are presented
> consecutively?
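To make the distinction in the quoted exchange concrete, here is a small Python sketch (Python is used only for brevity; AbiWord itself is C++). It counts the same word three ways: UTF-8 bytes, code points (one slot each if the piecetable stored UT_uint32 values, as proposed above), and user-perceived characters. The helper `user_chars` is hypothetical and only a rough approximation: it attaches combining marks to the preceding base character, whereas real grapheme segmentation (Unicode UAX #29) covers many more cases.

```python
import unicodedata

def user_chars(text):
    """Hypothetical helper: group each code point with following combining marks.

    Simplified approximation of grapheme clusters; real segmentation
    (Unicode UAX #29) also handles Hangul jamo, ZWJ sequences, etc.
    """
    clusters = []
    for ch in text:
        if clusters and unicodedata.combining(ch):
            clusters[-1] += ch  # attach the mark to its base character
        else:
            clusters.append(ch)
    return clusters

# "café" typed as 'e' followed by U+0301 COMBINING ACUTE ACCENT
word = "cafe\u0301"

print(len(word.encode("utf-8")))  # UTF-8 bytes: prints 6
print(len(word))                  # code points (one UT_uint32 each): prints 5
print(len(user_chars(word)))      # what the user sees as characters: prints 4
```

With one position per code point, a cursor that steps one position at a time would stop between the "e" and its accent, which is exactly the case the question about combining characters is pointing at.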

Martin, the problem here is that "English/European/..." languages have a
very small set of glyphs to display, so you can use a 1 code point -> 1 glyph
mapping for these languages and nobody will complain (for instance, fonts
usually provide a separate precomposed glyph for each accented character in
these languages).

But if you try to do the same thing with other languages, the number of
glyphs that you need will literally explode.
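The two cases contrasted above can be checked with Python's standard `unicodedata` module (again a sketch, not AbiWord code). NFC normalization folds an accented Latin letter into a single precomposed code point, but a Devanagari conjunct such as KSSA, which fonts that support it typically draw as one shape, has no precomposed code point and stays three code points long.

```python
import unicodedata

# European case: 'e' + COMBINING ACUTE ACCENT has a precomposed form (U+00E9).
decomposed = "e\u0301"
composed = unicodedata.normalize("NFC", decomposed)
print(len(decomposed), len(composed), hex(ord(composed)))  # prints: 2 1 0xe9

# Devanagari conjunct KSSA: KA + VIRAMA + SSA. There is no single code
# point for it, so NFC leaves all three code points in place.
kssa = "\u0915\u094d\u0937"
print(len(unicodedata.normalize("NFC", kssa)))  # prints 3
```

This counts code points, not font glyphs; which glyphs are actually drawn is decided by the font and shaping engine at render time.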

If AbiWord is to become "The Word Processor" for everybody, whatever OS or
language they use, this assumption should be removed from the code.

Cheers,

--
Joaquin Cuenca Abela
cuenca@pacaterie.u-psud.fr

This archive was generated by hypermail 2.1.4 : Sun Apr 21 2002 - 11:36:49 EDT