From: Jordi Mas (jmas@softcatala.org)
Date: Mon Oct 27 2003 - 16:31:06 EST
Hello,
I may not be entirely accurate here; I just want to draw attention to a
problem that we have had for some time.
We have had an unsolved issue for a long time: proper word breaking. Right
now, we simply assume that words break the way they do in English. However,
this is not true for many languages.
This causes two problems:
- Word counting is not accurate for these languages
- Spell checkers do not work properly (since the words sent to them are not
really words)
For example, in Catalan you can write "il·lusion" (for ilLusion). The '·'
character is currently not treated as part of the word. Unfortunately, this
means the word is counted as two words and is always treated as two separate
words by the spell checker.
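
To make the problem concrete, here is a rough Python sketch (not our actual
code; the function name split_words and its extra_word_chars parameter are
made up for illustration). It splits text on anything that is not a letter,
which is roughly the English-style assumption, and optionally lets some
characters join letters inside a word.

```python
import re

def split_words(text, extra_word_chars=""):
    """Return the words in text.

    A word is a run of letters. Characters listed in extra_word_chars
    (a hypothetical parameter) may appear between letters without
    ending the word.
    """
    letters = r"[^\W\d_]+"  # one or more Unicode letters
    if extra_word_chars:
        joiner = "[" + re.escape(extra_word_chars) + "]"
        pattern = letters + "(?:" + joiner + letters + ")*"
    else:
        pattern = letters
    return re.findall(pattern, text)

# English-style breaking splits the Catalan word at the middle dot.
print(split_words("una il·lusion"))
# Allowing the middle dot inside a word keeps it in one piece.
print(split_words("una il·lusion", extra_word_chars="·"))
```

With the English-style rule the spell checker would receive "il" and "lusion"
separately; with '·' allowed inside a word it would receive "il·lusion".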
One simple way to fix this problem would be to extend the system.profiles
files. We already store some language-related parameters there, such as
'DefaultDirectionRtl'. We could add entries that indicate which characters
can be part of a word.
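
As a rough illustration of the idea, the Python sketch below models the
per-language settings as a plain dictionary. Everything here is an
assumption made for the example: the key name 'WordInternalChars', the
language tags, the placeholder values, and the helper functions. It does not
reflect the real format of the system.profiles files.

```python
import re

# Hypothetical per-language settings. 'DefaultDirectionRtl' is the existing
# parameter; 'WordInternalChars' is an invented name for the proposed entry.
# The values are placeholders.
PROFILES = {
    "ca-ES": {"DefaultDirectionRtl": "0", "WordInternalChars": "·"},
    "en-US": {"DefaultDirectionRtl": "0"},
}

def word_internal_chars(lang):
    """Look up the extra word characters for lang (empty if none are set)."""
    return PROFILES.get(lang, {}).get("WordInternalChars", "")

def count_words(text, lang):
    """Count words, letting the language's extra characters join letters."""
    letters = r"[^\W\d_]+"  # one or more Unicode letters
    extra = word_internal_chars(lang)
    if extra:
        pattern = letters + "(?:[" + re.escape(extra) + "]" + letters + ")*"
    else:
        pattern = letters
    return len(re.findall(pattern, text))

print(count_words("una il·lusion", "ca-ES"))
print(count_words("una il·lusion", "en-US"))
```

Languages without the entry fall back to the current behaviour, so existing
profiles would not need to change.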
What do you guys think?
Jordi,
--Jordi Mas i Hernàndez (homepage http://www.softcatala.org/~jmas) http://www.softcatala.org