Re: BiDi and script support


Subject: Re: BiDi and script support
From: Leonard Rosenthol (leonardr@lazerware.com)
Date: Mon Apr 03 2000 - 20:31:31 CDT


At 9:35 PM +0100 4/3/00, Tomas Frydrych wrote:
>Why Bi-directional support and support for scripted languages?
>--------------------------------------------------------------
>Currently there is no wordprocessing application that would offer
>decent and flexible support for creation of documents that require these
>capabilities, commercial or free, on any platform, or at least I am
>not really aware of any and I have been looking for one for a while :-(.

        On Windows, Dagesh is considered to be the best multi-lingual
WP, and handles both Hebrew and Arabic quite nicely - as the company
is located in Israel, you'd expect that ;). For MacOS, NisusWriter
is the best game in town, and is considered quite powerful. On
Unix, check out Yudit or Omega.

>Flexibility is the key term here, it must be possible to add support for new
>languages by the user in a simple and non-technical manner, so that even
>small ethnic groups or academics with unusual fields of work could use it (the
>approach suggested below would ensure that).

        Does this imply that you wish to support languages which are
not "installed" in the OS? Or can we assume that only installed
languages are supported?

>Bi-Directional support (BiDiS) - basic concepts
>-----------------------------------------------
>(2) On the view level a decision is made for each chunk of text
>whether it is to be displayed from left to right or from right to left. This
>can be done either (a) using an explicit text-formatting attribute, or (b) each
>font can be internally associated with direction of writing; b is the
>better option, making it more transparent to the user and thus easier to use.

        Yes, but it's not always accurate since most/all non-Roman
fonts also have the Roman set in the bottom 128 characters - and it's
common to use those for small Roman words and numbers. There are
also good reasons (related to import/export) to use a formatting
attribute, even if it's done transparently to the user when switching
fonts/keyboards/script systems.
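        To illustrate why the characters, not the font, should drive this decision: the Python sketch below (an illustration, not code from the thread) reads each character's Unicode bidi class with `unicodedata.bidirectional` and splits text into directional runs. The helper names `strong_direction` and `split_runs` are hypothetical, and attaching weak and neutral characters (spaces, digits, punctuation) to the preceding run is a crude simplification of the real Unicode Bidirectional Algorithm (UAX #9).

```python
import unicodedata


def strong_direction(ch):
    """Return 'rtl', 'ltr', or None (weak/neutral) from the Unicode bidi class."""
    cls = unicodedata.bidirectional(ch)
    if cls in ("R", "AL"):  # Hebrew-type and Arabic-type letters
        return "rtl"
    if cls == "L":  # Latin and other left-to-right letters
        return "ltr"
    return None  # digits, spaces, punctuation, etc.


def split_runs(text):
    """Split text into (direction, substring) runs.

    Simplification: characters without a strong direction join the
    preceding run (or start a run with direction None at the very start).
    """
    runs = []
    for ch in text:
        d = strong_direction(ch) or (runs[-1][0] if runs else None)
        if runs and runs[-1][0] == d:
            runs[-1] = (d, runs[-1][1] + ch)
        else:
            runs.append((d, ch))
    return runs
```

Text typed "in a Hebrew font", such as `"\u05e9\u05dc\u05d5\u05dd ABI 2000"`, still yields a separate LTR run for the Latin letters and digits, which is exactly the case a font-based rule gets wrong. The resulting runs could then be stored as the explicit formatting attribute mentioned above.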

>The messy part is soft line breaking in mixed direction paragraphs.

        That's only one issue, and actually it's not the hardest...
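        A minimal sketch of why soft line breaking interacts with BiDi, assuming the order of operations from the Unicode Bidirectional Algorithm (UAX #9): embedding levels are resolved once for the whole logical paragraph, lines are broken in logical order, and only then is each line reordered for display (rule L2). Following the convention of the UAX #9 examples, uppercase Latin letters stand in for RTL letters. `toy_levels`, `wrap_spans` and `display_lines` are hypothetical helpers, and `toy_levels` is a heavily simplified level resolver, not the real algorithm.

```python
def toy_levels(text):
    """Hypothetical level resolution for an LTR paragraph: uppercase = RTL
    (level 1), lowercase = LTR (level 0); a neutral gets level 1 only when
    the nearest strong characters on both sides are RTL."""
    def strong(ch):
        return "R" if ch.isupper() else "L" if ch.islower() else None

    levels = []
    for i, ch in enumerate(text):
        s = strong(ch)
        if s is not None:
            levels.append(1 if s == "R" else 0)
            continue
        before = next((strong(c) for c in reversed(text[:i]) if strong(c)), "L")
        after = next((strong(c) for c in text[i + 1:] if strong(c)), "L")
        levels.append(1 if before == after == "R" else 0)
    return levels


def reorder_line(chars, levels):
    """UAX #9 rule L2: from the highest level down to the lowest odd level,
    reverse every maximal run of characters at that level or higher."""
    chars, levels = list(chars), list(levels)
    if not levels:
        return ""
    odd = [lv for lv in levels if lv % 2]
    lowest_odd = min(odd) if odd else max(levels) + 1
    for level in range(max(levels), lowest_odd - 1, -1):
        i = 0
        while i < len(levels):
            if levels[i] < level:
                i += 1
                continue
            j = i
            while j < len(levels) and levels[j] >= level:
                j += 1
            chars[i:j] = chars[i:j][::-1]
            levels[i:j] = levels[i:j][::-1]  # keep levels attached to their characters
            i = j
    return "".join(chars)


def wrap_spans(text, width):
    """Greedy word wrap on the logical string; returns (start, end) slices."""
    spans, start = [], 0
    while start < len(text):
        end = min(start + width, len(text))
        if end < len(text):
            brk = text.rfind(" ", start, end + 1)
            if brk > start:
                end = brk
        spans.append((start, end))
        start = end + 1 if end < len(text) and text[end] == " " else end
    return spans


def display_lines(text, width):
    levels = toy_levels(text)  # resolve levels once, for the whole paragraph
    return [reorder_line(text[s:e], levels[s:e]) for s, e in wrap_spans(text, width)]
```

With a width of 9, `display_lines("car MEANS CAR.", 9)` gives `["car SNAEM", "RAC."]`. Reordering the whole paragraph first and then wrapping the visual string `"car RAC SNAEM."` would instead put `RAC` on the first line, even though it is logically the last word.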

        Assuming that the OS will deal with "ligaturization" for us
(for languages like Arabic), then you also need to deal with
selection of mixed direction text and full justification issues.

        Text selection always happens in logical (storage) order,
which means that as you select across a line of mixed-direction text
the selection "area" is no longer a contiguous rectangle but a
"region" made up of several pieces. This affects not just drag
selection (though it's the most complex), but also key-based
selection. There's also the issue of the "split cursor" that many
applications implement when the insertion point sits between two
runs of opposite direction.
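        The non-contiguous selection can be sketched in a few lines. Given the resolved embedding levels of one displayed line, the Python code below reorders character indices with UAX #9 rule L2 and maps a logical selection to the visual segments that must be highlighted; each segment becomes one rectangle of the selection region. `visual_order` and `selection_segments` are hypothetical names, and the levels in the example are hard-coded for the line `car MEANS CAR.` with uppercase standing in for RTL letters.

```python
def visual_order(levels):
    """Return logical indices in left-to-right display order (UAX #9 rule L2)."""
    order, lv = list(range(len(levels))), list(levels)
    if not lv:
        return order
    odd = [x for x in lv if x % 2]
    lowest_odd = min(odd) if odd else max(lv) + 1
    for level in range(max(lv), lowest_odd - 1, -1):
        i = 0
        while i < len(lv):
            if lv[i] < level:
                i += 1
                continue
            j = i
            while j < len(lv) and lv[j] >= level:
                j += 1
            order[i:j] = order[i:j][::-1]
            lv[i:j] = lv[i:j][::-1]
            i = j
    return order


def selection_segments(levels, sel_start, sel_end):
    """Map a logical selection [sel_start, sel_end) to visual [start, end) segments."""
    selected = [sel_start <= logical < sel_end for logical in visual_order(levels)]
    segments, pos = [], 0
    while pos < len(selected):
        if not selected[pos]:
            pos += 1
            continue
        end = pos
        while end < len(selected) and selected[end]:
            end += 1
        segments.append((pos, end))  # one highlight rectangle per segment
        pos = end
    return segments


# Levels for "car MEANS CAR." (displayed as "car RAC SNAEM.").
LINE_LEVELS = [0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0]
```

Selecting logical characters 2 to 5 (`r ME`) with `selection_segments(LINE_LEVELS, 2, 6)` produces two separate highlighted segments, `[(2, 4), (11, 13)]`, even though the selection is a single range in the stored text.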

        Concerning full justification, you need to be aware that
Arabic (and some other languages) does not use the "add word/character
space" techniques when fully justifying text. Instead, justification
involves changing the forms of final letters and stretching them out
to fill the remaining space.
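        A commonly used approximation of this stretching in plain text is to insert the Arabic tatweel (kashida) character, U+0640, between connected letters. The Python sketch below uses that technique as a stand-in: it does not reproduce the alternate final letter forms described above, and its list of letters that never connect to the following letter is a simplified assumption. `kashida_slots` and `stretch` are hypothetical names.

```python
TATWEEL = "\u0640"  # ARABIC TATWEEL, the kashida elongation character

# Simplified assumption: letters that never connect to the following letter.
NON_LEFT_JOINING = set("\u0621\u0622\u0623\u0624\u0625\u0627\u0629\u062f\u0630\u0631\u0632\u0648")
HAMZA = "\u0621"  # connects to neither neighbour


def is_arabic_letter(ch):
    return "\u0621" <= ch <= "\u064a" and ch != TATWEEL


def kashida_slots(word):
    """Indices i such that a tatweel may go between word[i] and word[i + 1]."""
    return [
        i for i in range(len(word) - 1)
        if is_arabic_letter(word[i]) and is_arabic_letter(word[i + 1])
        and word[i] not in NON_LEFT_JOINING and word[i + 1] != HAMZA
    ]


def stretch(word, n):
    """Insert n tatweels, spread round-robin over the allowed slots."""
    slots = kashida_slots(word)
    if not slots or n <= 0:
        return word  # nothing can be stretched
    counts = [0] * len(word)
    for k in range(n):
        counts[slots[k % len(slots)]] += 1
    return "".join(ch + TATWEEL * counts[i] for i, ch in enumerate(word))
```

`stretch("\u0643\u062a\u0628", 3)` inserts two tatweels after the first letter and one after the second, while a word made only of non-connecting letters, such as `"\u062f\u0627\u0631"`, is returned unchanged. A real justifier would spread the extra width over the whole line and would likely choose stretch points by more elaborate rules.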

>(Use could be made of the FreeBiDi library, but it might
>be more practical to write the necessary algorithms from scratch and to tailor
>them to the Abi framework)

        FreeBiDi has been pretty well tested in a number of open
source projects and I see no reason that we shouldn't try to use it
"as is" as much as possible.

>In addition to the BiDi support in the view, exporters into formats
>that do not support RTL text (most, if not all) should ideally be modified,

        Actually, many of the important exporters already support
RTL. HTML does, RTF does, Word does and TeX does. But yes, they
will need to be modified to make sure that they output the correct
info as well.
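        For HTML the usual mechanism is the `dir` attribute. A minimal Python sketch of such an exporter, assuming a hypothetical document model in which each text run carries a direction flag (the formatting-attribute approach discussed earlier): the paragraph gets `dir="rtl"` or `dir="ltr"`, and runs of the opposite direction are wrapped in a `<span>` with their own `dir`. `Run` and `paragraph_to_html` are illustrative names, not AbiWord APIs.

```python
from dataclasses import dataclass
from html import escape


@dataclass
class Run:
    """Hypothetical text run: a piece of text plus its direction attribute."""
    text: str
    rtl: bool


def paragraph_to_html(runs, rtl_paragraph):
    """Emit one <p> with a dir attribute; wrap opposite-direction runs in <span dir=...>."""
    para_dir = "rtl" if rtl_paragraph else "ltr"
    parts = []
    for run in runs:
        text = escape(run.text)
        if run.rtl == rtl_paragraph:
            parts.append(text)  # same direction as the paragraph: no extra markup
        else:
            run_dir = "rtl" if run.rtl else "ltr"
            parts.append(f'<span dir="{run_dir}">{text}</span>')
    return f'<p dir="{para_dir}">{"".join(parts)}</p>'
```

Marking embedded runs explicitly, rather than relying on the browser to infer direction, keeps the exported file faithful to what the user saw in the editor.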

>Scripted alphabets - suggested approach
>---------------------------------------
>
>(1) each letter is represented in the doc by a single character irrespective
>of the context. I shall call this representation the 'underlying string'. The
>appearance is handled by a rendering engine built into the view, which will
>translate the character to the glyph in the font which is contextually
>correct. I will call this representation of the text the 'surface string'. The
>underlying string is handled by the doc, the surface string by the view.

        Why not let the OS handle this?
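        Whether the OS or the view does it, turning the underlying string into the surface string for Arabic largely means picking one of up to four positional forms (isolated, initial, medial, final) for each letter. The Python sketch below builds that table from the compatibility decompositions of the Arabic Presentation Forms-B block via `unicodedata.decomposition`, and assumes that a letter joins on both sides exactly when it has an initial form there. It ignores the mandatory lam-alef ligatures, diacritics and letters outside Forms-B, so it shows the idea rather than a usable shaper. `FORMS` and `shape` are hypothetical names.

```python
import unicodedata

# (base letter, form) -> presentation-form character, built from the
# compatibility decompositions in Arabic Presentation Forms-B (U+FE70..U+FEFF).
FORMS = {}
for cp in range(0xFE70, 0xFF00):
    parts = unicodedata.decomposition(chr(cp)).split()
    if len(parts) == 2 and parts[0] in ("<isolated>", "<final>", "<initial>", "<medial>"):
        FORMS.setdefault((chr(int(parts[1], 16)), parts[0].strip("<>")), chr(cp))


def joins_both_sides(ch):
    # Assumption: a letter is dual-joining iff Forms-B has an initial form for it.
    return (ch, "initial") in FORMS


def joins_to_previous(ch):
    return (ch, "final") in FORMS


def shape(underlying):
    """Map one code point per letter (underlying) to contextual glyphs (surface)."""
    surface = []
    for i, ch in enumerate(underlying):
        prev = underlying[i - 1] if i > 0 else ""
        nxt = underlying[i + 1] if i + 1 < len(underlying) else ""
        join_prev = joins_to_previous(ch) and joins_both_sides(prev)
        join_next = joins_both_sides(ch) and joins_to_previous(nxt)
        form = {(True, True): "medial", (True, False): "final",
                (False, True): "initial", (False, False): "isolated"}[(join_prev, join_next)]
        surface.append(FORMS.get((ch, form), ch))  # fall back to the base letter
    return "".join(surface)
```

`shape("\u0643\u062a\u0628")` returns the initial, medial and final forms `"\ufedb\ufe98\ufe90"`, while the underlying string keeps exactly one code point per letter, as the proposal requires.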

Leonard

-- 
----------------------------------------------------------------------------
                   You've got a SmartFriend in Pennsylvania
----------------------------------------------------------------------------
Leonard Rosenthol      			Internet:       leonardr@lazerware.com
					America Online: MACgician
Web Site: <http://www.lazerware.com/>
FTP Site: <ftp://ftp.lazerware.com/>
PGP Fingerprint: C76E 0497 C459 182D 0C6B  AB6B CA10 B4DF 8067 5E65


