single or multiple internal encodings

From: Tomas Frydrych (tomas@frydrych.uklinux.net)
Date: Tue Apr 23 2002 - 10:28:03 EDT

    > GUI: input comes in in whatever format the toolkit defines (GTK+ -
    > utf8, Win32: ucs2?). iconv() to convert that back to the backend. For
    > the times when we need strings/data from the backend, convert it to
    > whatever format the front-end needs.

    This seems to be the most logical approach. Most of the GUI strings
    are static; they need only one translation, at load time. The
    interaction between the GUI and the backend is virtually nil, save
    for keyboard input, and at human typing speeds conversion to a
    different encoding is not a real performance issue.
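
    A minimal sketch of that boundary conversion, in Python rather than
    with iconv() so that it is easy to try out. The function names and
    the choice of UTF-32 for the backend are assumptions made for the
    illustration only; nothing here is actual AbiWord code.

```python
# Hypothetical encodings: the thread has not settled on either of them.
TOOLKIT_ENCODING = "utf-8"      # e.g. what GTK+ hands to the application
BACKEND_ENCODING = "utf-32-le"  # assumed fixed-width backend encoding


def to_backend(toolkit_bytes: bytes) -> bytes:
    """Convert bytes received from the GUI toolkit to the backend encoding."""
    return toolkit_bytes.decode(TOOLKIT_ENCODING).encode(BACKEND_ENCODING)


def to_toolkit(backend_bytes: bytes) -> bytes:
    """Convert backend bytes to the encoding the GUI toolkit expects."""
    return backend_bytes.decode(BACKEND_ENCODING).encode(TOOLKIT_ENCODING)


# One keystroke arriving from the toolkit (U+010D, two bytes in UTF-8).
keystroke = "\u010d".encode(TOOLKIT_ENCODING)
stored = to_backend(keystroke)      # four bytes in the assumed backend
round_trip = to_toolkit(stored)     # back to the toolkit's two bytes
```

    The conversion happens once per keystroke or once per static string,
    which is why its cost does not matter much at the GUI boundary.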

    > Disk/file: Do we want this to be the same encoding as in the PT? Do we
    > want to store this in a user's native locale? Do we want to store this
    > in UTF-8 everywhere and be done with it? There are darn good reasons
    > for all of these choices. Let's argue out their merits.

    Currently we are not consistent. The Win32 build uses UTF-8, Linux
    the encoding of the current locale. One advantage of using the
    locale encoding is the size of the file: unless you use only
    characters from basic ASCII, UTF-8 needs at least two bytes for each
    non-ASCII character. The other advantage of using the locale
    encoding is that the user can view, search, etc. the raw files. This
    is quite important to a number of users, and I think we should
    retain it. What I would like to see, however, is for the user to be
    able to change any default to an encoding of his or her choice.
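
    To make the size argument concrete, here is a rough sketch in
    Python. The sample text and the choice of ISO-8859-2 as the locale
    encoding are assumptions picked for the illustration; a multi-byte
    locale encoding would give different numbers.

```python
import unicodedata

# Sample Czech text; normalised so every accented letter is one code point.
text = unicodedata.normalize("NFC", "Příliš žluťoučký kůň")

# Assumed single-byte locale encoding for Czech.
locale_bytes = text.encode("iso-8859-2")
utf8_bytes = text.encode("utf-8")

# Characters outside basic ASCII are the ones that cost extra in UTF-8.
non_ascii = sum(1 for ch in text if ord(ch) > 127)

print(len(text), "characters,", non_ascii, "outside ASCII")
print("locale encoding:", len(locale_bytes), "bytes")
print("UTF-8:          ", len(utf8_bytes), "bytes")
```

    In a single-byte locale encoding every character takes one byte; in
    UTF-8 each of these accented letters takes two, so the difference
    grows with the share of non-ASCII text in the document.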

    > Piece Table: Do we use fixed or variable width encodings? I'd probably
    > be in favor of fixed-width encodings (UCS-2 or 32). Is processing
    > UTF-8 computationally intensive? Probably a little, but nothing
    > outrageous. Will picking a fixed-width encoding just screw us over
    > down the road? Beats me.

    I strongly favour a fixed-width encoding, because it is so much
    easier to handle. A variable-width encoding would save some memory
    for some users and waste it for others. Picking a fixed-width
    encoding will not limit us in any way in the future, nor will a
    variable-width encoding -- they are just encodings, that's all.
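
    A small Python sketch of why fixed width is easier to handle. Both
    helper functions are hypothetical, written only for this comparison,
    and UTF-32 stands in for whichever fixed-width encoding is chosen.

```python
def nth_char_fixed(buf: bytes, n: int) -> str:
    """Fixed width (UTF-32 here): character n starts at byte offset 4 * n."""
    return buf[4 * n:4 * n + 4].decode("utf-32-le")


def nth_char_utf8(buf: bytes, n: int) -> str:
    """Variable width: scan the buffer to find where character n starts."""
    # A byte of the form 10xxxxxx continues a character; any other byte
    # starts a new one.
    starts = [i for i, b in enumerate(buf) if b & 0xC0 != 0x80]
    start = starts[n]
    end = starts[n + 1] if n + 1 < len(starts) else len(buf)
    return buf[start:end].decode("utf-8")


text = "k\u016f\u0148"  # three characters, two of them outside ASCII
fixed = text.encode("utf-32-le")
variable = text.encode("utf-8")

second_fixed = nth_char_fixed(fixed, 1)
third_variable = nth_char_utf8(variable, 2)
```

    Both functions return the same characters; the difference is that
    the fixed-width lookup is a single slice, while the UTF-8 lookup has
    to walk the buffer (or keep an index) to find character boundaries.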

    Tomas
