Re: Problem with xkb


Subject: Re: Problem with xkb
From: Vlad Harchev (hvv@hippo.ru)
Date: Mon Sep 25 2000 - 03:05:33 CDT


On Mon, 25 Sep 2000, Henrik Berg wrote:

> > From: "Vlad Harchev" <hvv@hippo.ru>
> >
> > My quick analysis of abiword and already existing hackish solution to allow
> > input of cyrillic in it brings the following notes and questions:
> >
> > 1) AW holds all characters in unicode. But currently the code works only for
> > latin1 symbols since there is no explicit recoding from/to encoding of
> > current locale, that is needed in following cases:
>
> The XP code supports at least some other 8-bit codepages besides latin1. So to
> make it work, look into each platform. On Windows (at least previously) there
> is locale-aware keyboard encoding and output to display and printer. This does
> not help the Unix platform, since most of the Windows code only uses the
> Win32 API.

 Seems this should be done properly, regardless of how much effort it
requires.

> > 1.1) when text is typed in - the characters should be converted from
> > current locale encoding to unicode. Same when pasting from
> > clipboard.
> > 1.2) when text is drawn out - the unicode text (from internal
> > representation) should be converted to text in current locale
> > so that XLib will choose correct glyphs from the font. Same for
> > cutting to clipboard.
> > 1.3) The text that is input into dialogs like "Find" - it should also
> > be converted to unicode (so that unicode text will be searched
> > for, not in current locale's encoding).
> > 1.4) When printing - unicode text should be converted to font's
> > encoding.
> > 1.5) When exporting/importing - but that's another story.
>
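
A minimal sketch of the round trip described in 1.1 and 1.2 above, written in
Python rather than AbiWord's own C++. It assumes a KOI8-R locale purely for
illustration; the function names are hypothetical and are not AbiWord code.

```python
# Hypothetical helpers; KOI8-R is an assumed example of a locale encoding.
LOCALE_ENCODING = "koi8_r"


def locale_to_unicode(raw: bytes, encoding: str = LOCALE_ENCODING) -> str:
    """Convert typed or pasted locale-encoded bytes to Unicode text (1.1)."""
    return raw.decode(encoding)


def unicode_to_locale(text: str, encoding: str = LOCALE_ENCODING) -> bytes:
    """Convert internal Unicode text to locale bytes for drawing or cutting (1.2)."""
    # Characters missing from the locale encoding become '?' instead of raising.
    return text.encode(encoding, errors="replace")


# Bytes as a KOI8-R keyboard or clipboard would deliver them.
typed = "Привет".encode(LOCALE_ENCODING)
internal = locale_to_unicode(typed)      # Unicode, as held internally
roundtrip = unicode_to_locale(internal)  # back to locale bytes for output
```

The same pair of conversions would apply to dialog input (1.3); printing (1.4)
would use the font's encoding in place of the locale's.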
> When Abi files are saved, characters above 255 are encoded as &#x????;
> instead of UTF-8 or something else. This is not good for a document containing
> 100% Cyrillic characters. The size would be EIGHT times the number of chars,
> instead of TWO times using UTF-8.
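
The arithmetic above can be checked with a short Python sketch. It assumes the
references are written with exactly four hex digits, as the &#x????; pattern
suggests; the helper name is hypothetical.

```python
def as_char_refs(text: str) -> str:
    """Write every character above 255 as a four-digit hex reference."""
    return "".join(ch if ord(ch) <= 255 else "&#x%04x;" % ord(ch) for ch in text)


sample = "Привет"  # six Cyrillic characters

refs_size = len(as_char_refs(sample).encode("ascii"))  # 8 bytes per character
utf8_size = len(sample.encode("utf-8"))                # 2 bytes per character

print(refs_size, utf8_size)
```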

 Yes, it's a pity - but the only solution to reduce size is to add an
attribute like "charset" to AW's file format. In this case, files could be
saved in the locale encoding - so each cyrillic character would take exactly 1
byte when saved. All Unicode characters that can't be represented as a
character in the current locale would be written as &#x????; IMO this is
preferable to UTF8 since it allows grep and sed to be used on native text.
  Any chance to extend AW's file format this way (say, if somebody provided a
patch for this)?
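
A rough sketch of the saving rule proposed above, in Python. The "charset"
attribute does not exist in AW's format today; KOI8-R, the error handler name
and save_body() are all assumptions made for illustration. Escaping of markup
characters such as & is left out.

```python
import codecs


def hex_charref(err):
    """Replace each unencodable character with a &#x????; reference."""
    if not isinstance(err, UnicodeEncodeError):
        raise err
    bad = err.object[err.start:err.end]
    return "".join("&#x%04x;" % ord(ch) for ch in bad), err.end


# Register the handler under a hypothetical name.
codecs.register_error("hexcharref", hex_charref)


def save_body(text: str, charset: str = "koi8_r") -> bytes:
    """Encode text in the declared charset, falling back to hex references."""
    return text.encode(charset, errors="hexcharref")


# Cyrillic takes one byte per character; the euro sign is not in KOI8-R
# and is written as a reference instead.
saved = save_body("Цена: 5 \u20ac")
```

Because the Cyrillic part is stored as plain KOI8-R bytes, a byte search for
"Цена".encode("koi8_r") finds it in the saved data, which is the grep and sed
benefit mentioned above.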

> --hb
>

 Best regards,
  -Vlad


