Re: i18n of abiword -- line break algorithms


Subject: Re: i18n of abiword -- line break algorithms
From: Leonard Rosenthol (leonardr@lazerware.com)
Date: Fri Jan 14 2000 - 17:36:48 CST


At 1:13 PM -0800 1/14/00, Paul Rohr wrote:
>As pointed out in Pruet's original message, Thai can't use the existing
>whitespace-oriented fb_LineBreaker. I don't know whether any other
>languages share this problem.

        Chinese, Japanese and Korean are all non-whitespace breakers too.
Like Thai, they rely on contextual information to decide where a line
may break.
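
        To make the problem concrete, here is a minimal sketch in Python
(not AbiWord code; the function name is made up for illustration) of
what a purely whitespace-oriented breaker sees. It assumes a break is
allowed only at the start of a word that follows whitespace.

```python
def whitespace_break_opportunities(text):
    """Return offsets where a whitespace-oriented breaker may break a line.

    A break is allowed at offset i when text[i - 1] is whitespace and
    text[i] is not, i.e. at the start of each word after the first.
    """
    return [
        i
        for i in range(1, len(text))
        if text[i - 1].isspace() and not text[i].isspace()
    ]


english = "line break algorithms"
thai = "ภาษาไทย"  # "Thai language", written with no spaces

print(whitespace_break_opportunities(english))  # [5, 11]
print(whitespace_break_opportunities(thai))     # []
```

        With no whitespace in the run, the breaker finds no break
opportunities at all, so the whole run would have to stay on one line.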

>As Justin points out, our current implementation lacks a hyphenation API.
>If we had one, then in addition to our current word-oriented LineBreaker,
>we'd probably also have some WordBreaker classes which knew how and when to
>do hyphenation.

        You also need a WordBreaker class for many other things, such
as spell checking and "whole word" searches.
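
        As a rough illustration of why a shared WordBreaker helps, here
is a sketch in Python of a "whole word" search built on one. The
interface, the class names and find_whole_word are all hypothetical, not
anything that exists in AbiWord, and the whitespace-based breaker is only
a stand-in for a language-aware one (it does not treat punctuation as a
boundary).

```python
from abc import ABC, abstractmethod


class WordBreaker(ABC):
    """Hypothetical interface: reports word boundaries in a run of text."""

    @abstractmethod
    def boundaries(self, text):
        """Return the set of offsets at which a word starts or ends."""


class WhitespaceWordBreaker(WordBreaker):
    """Stand-in breaker: a boundary is any edge between space and non-space."""

    def boundaries(self, text):
        found = set()
        for i in range(len(text) + 1):
            before = i > 0 and not text[i - 1].isspace()
            after = i < len(text) and not text[i].isspace()
            if before != after:
                found.add(i)
        return found


def find_whole_word(text, word, breaker):
    """Return start offsets where `word` occurs as a whole word in `text`."""
    bounds = breaker.boundaries(text)
    hits = []
    start = text.find(word)
    while start != -1:
        end = start + len(word)
        # A match counts only if it both starts and ends on a word boundary.
        if start in bounds and end in bounds:
            hits.append(start)
        start = text.find(word, start + 1)
    return hits


text = "break the linebreak at a break"
print(find_whole_word(text, "break", WhitespaceWordBreaker()))  # [0, 25]
```

        The search code never looks at whitespace itself, so swapping in
a Thai or Japanese breaker would not require touching it. A spell checker
could walk the same boundaries.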

>However, I think we will want to have it be a LineBreaker (which detects
>word boundaries). Otherwise, how would we implement the word-level
>selection and motion primitives?
>
        Exactly.

        One thing you might want to think about here is taking
advantage of the services each platform already provides for this kind
of thing. Windows and MacOS (and BeOS, I believe) all provide API
routines for determining word and line breaks for a given text run in a
particular language/code page. I don't know if Unix offers the same...
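
        One way this could be structured, sketched in Python: try a
native breaker registered for the current platform, and fall back to the
whitespace rule when none is available. Everything here is hypothetical
(the registry, the function names, the idea of keying on sys.platform),
and no real Windows, MacOS or BeOS call is shown.

```python
import sys


def whitespace_breaks(text, language=None):
    """Fallback: allow a break at the start of each word after whitespace."""
    return [
        i
        for i in range(1, len(text))
        if text[i - 1].isspace() and not text[i].isspace()
    ]


# Hypothetical registry: platform name -> function(text, language) that
# would wrap that platform's own word/line breaking service.
_NATIVE_BREAKERS = {}


def register_native_breaker(platform, func):
    """Record a wrapper around a platform-provided breaking routine."""
    _NATIVE_BREAKERS[platform] = func


def line_breaks(text, language, platform=sys.platform):
    """Use the platform's breaker if one is registered, else the fallback."""
    native = _NATIVE_BREAKERS.get(platform)
    if native is None:
        return whitespace_breaks(text, language)
    return native(text, language)
```

        Each port would register its own wrapper at startup, and the
layout code would only ever call line_breaks().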

Leonard
----------------------------------------------------------------------------
                   You've got a SmartFriend in Pennsylvania
----------------------------------------------------------------------------
Leonard Rosenthol Internet: leonardr@lazerware.com
                                        America Online: MACgician
Web Site: <http://www.lazerware.com/>
FTP Site: <ftp://ftp.lazerware.com/>
PGP Fingerprint: C76E 0497 C459 182D 0C6B AB6B CA10 B4DF 8067 5E65


