Re: Hebrew spellcheck

From: Andrew Dunbar (hippietrail@yahoo.com)
Date: Sun Sep 08 2002 - 22:24:16 EDT

Next message: Andrew Dunbar: "Re: Hebrew in AbiWord"

Previous message: Andrew Dunbar: "Re: Commit: export tables, links, images to xsl-fo"
Next in thread: Tomas Frydrych: "Re: Hebrew spellcheck"
Reply: Tomas Frydrych: "Re: Hebrew spellcheck"

--- Uri David Akavia <uridavid@netvision.net.il>
wrote:
> Shalom.

Shalom.

> I sent your message to the ivrix list. I'm sorry it
> took so long.

No problem. I'm CC'ing the AbiWord dev list with this
answer since a couple of people there may have
something to add.

> Since the problem came up with spell checking, why
> do you need these characters for it at all?

I'm sorry I don't have the original thread of the
conversation around since I'm out of inbox space ):

> Hebrew has two forms of writing (I think you probably
> know this)
> KTIV MALEH (full writing) in which letters replace
> the marks
> KTIV HASER (lacking writing) in which there are no
> special letters

Yes I'm aware of this. In fact I think there are more
than two ways. I've read about this in a history of
Hebrew at my library. Originally there were no vowels
at all, then later yod, vav, he, and aleph began to
be used to represent vowels, following special rules.
They're usually called matres lectionis in English.
Later still, the vowel points were invented, and
I believe these are used in combination with the
matres lectionis.

> It is possible just to choose one (I prefer KTIV
> MALEH), which I believe is correct if you don't
> write the marks, which most people don't.
> Whatever is decided should just be written in the
> documentation somewhere. Besides, these marks have
> absolutely no value in spellchecking - no one
> actually checks them for correctness (it is much
> harder than checking spelling, since it has rules).
> So it is not a problem when you don't check them in
> the spellchecker.

Well I wish it was that simple. The problem is that
people do use them and will continue to use them.
Religious texts always seem to use them. This is an
important case for Hebrew and we do already have users
doing Biblical work in Hebrew with AbiWord.
Now if some text is marked as being Hebrew and it does
have vowel marks, they simply won't match the entries
in the dictionary at all. So they'll all be marked as
errors! The next step would be for us to filter out
all the vowel points before passing words to the
spell-checker. But now imagine we are editing a
section of Genesis which has full vowel points and
also some spelling errors. The spellchecker will tell
you the word has errors and offer some suggestions.
But all the suggestions will have no vowel points!
Perhaps the user will be able to fix it, perhaps not.
But the computer is a machine and ought to be able to
do exactly this type of work for us.
Also, the very reason the points are used in religious
works is to remove any ambiguity raised when two words
have the same consonants but different vowels.
A user would expect that if we support Hebrew we
support this. But when she has words with correct
consonants in the correct order but with incorrect
vowel points, no error will be shown and the user will
be led to believe she has made no errors.
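
A minimal sketch of the filtering step described above,
in Python rather than AbiWord's own code. The function
name strip_points is hypothetical. It assumes the vowel
points are stored as Unicode combining marks, and it
removes every combining mark, which also drops
cantillation marks. The two sample words share the
consonants dalet-bet-resh but carry different points, so
once filtered they can no longer be told apart, which is
exactly the ambiguity the points exist to remove.

```python
import unicodedata


def strip_points(word):
    """Return the word with all combining marks removed.

    Hebrew vowel points, dagesh and cantillation marks all have
    a non-zero canonical combining class, so they are dropped;
    base letters (class 0) are kept.
    """
    decomposed = unicodedata.normalize("NFD", word)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


# Two pointed words built on the same consonants (dalet, bet, resh).
DAVAR = "\u05D3\u05BC\u05B8\u05D1\u05B8\u05E8"  # pointed with qamats
DEVER = "\u05D3\u05BC\u05B6\u05D1\u05B6\u05E8"  # pointed with segol

if __name__ == "__main__":
    # Both words collapse to the same unpointed key.
    print(strip_points(DAVAR) == strip_points(DEVER))
```

A dictionary keyed only on the filtered form would accept
either pointing, and also any wrong pointing on the same
consonants.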

So the next step is to think, well I guess the Bible
is pretty important so maybe we should just have two
dictionaries, one without vowels that we can do now
and start using right away for most things, then
Biblical Hebrew can be treated as a separate language
with its own dictionary made by whoever needs to use
such a thing. The problem is that we're trying to stick
to standards, so we use ISO 639 language codes to mark
which language each section of our documents belongs
to. ISO 639 can be a bit vague. It does now have
separate codes for Modern Greek (ell or gre) and
Ancient Greek (grc), but it still has only one code
for Hebrew (heb).
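
To make the limitation concrete, here is a rough Python
sketch of one way a dictionary could be chosen when the
language code alone is not enough. Everything in it is
an assumption for illustration: the DICTIONARIES table,
the file names, the "pointed" variant label and the
pick_dictionary function are all hypothetical, and none
of it is AbiWord's actual mechanism. Only the codes heb,
ell and grc come from ISO 639.

```python
# Hypothetical registry: each key is (ISO 639 code, variant).
# A variant of None means "the plain language code, no qualifier".
DICTIONARIES = {
    ("heb", None): "hebrew-unpointed.txt",
    ("heb", "pointed"): "hebrew-pointed.txt",
    ("ell", None): "modern-greek.txt",
    ("grc", None): "ancient-greek.txt",
}


def pick_dictionary(code, variant=None):
    """Return the wordlist for (code, variant).

    Falls back to the plain language code when the variant is
    unknown, and returns None when the code itself is unknown.
    """
    if (code, variant) in DICTIONARIES:
        return DICTIONARIES[(code, variant)]
    return DICTIONARIES.get((code, None))
```

With only "heb" available as a tag, the variant has to be
carried somewhere outside the language code, which is the
kind of flexibility the current scheme lacks.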

Also, I collect foreign novels and the only one I have
in Hebrew, Memoirs of a Geisha by Arthur Golden, seems
to have about one word every 5 pages or so which uses
vowel points. If AbiWord is to be a professional
quality word processor, it needs to be good enough
for the translators to have used it to create this
book.

I created a bug report for AbiWord some time ago
suggesting that we need more flexibility in our use
of language codes, so you might care to look into that:
http://bugzilla.abisource.com/show_bug.cgi?id=3227

It might seem like I'm fighting against Hebrew
spell-checking but I'm really not. I just want to do it
right - and it is doable. I'd love to implement it
myself.

What we can do is start building up a high quality
Hebrew wordlist. Probably as a plain text UTF-8
encoded file. We can start with just the vowelless
words and add the vowelled versions later.
But we really shouldn't lock in place a system which
isn't going to be flexible enough in the long term.
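
As a rough illustration of how such a wordlist could
serve both kinds of text, here is a Python sketch. The
file format (one word per line, '#' for comments) and the
names skeleton, build_index and check are assumptions made
for this example, not an existing format or API. Words
are indexed by their consonants only, so vowelless entries
can go in now and vowelled ones can be added later without
changing the format.

```python
import unicodedata
from collections import defaultdict


def skeleton(word):
    """Return the word without combining marks (vowel points etc.)."""
    decomposed = unicodedata.normalize("NFD", word)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def build_index(lines):
    """Map each consonant skeleton to the set of listed spellings.

    `lines` is any iterable of strings, for example a file opened
    with open(path, encoding="utf-8").
    """
    index = defaultdict(set)
    for line in lines:
        word = line.strip()
        if not word or word.startswith("#"):
            continue  # skip blank lines and comments
        index[skeleton(word)].add(unicodedata.normalize("NFC", word))
    return index


def check(word, index):
    """Return (status, suggestions).

    "unknown": the consonants are not in the wordlist at all.
    "ok":      the word is unpointed, or its pointing is listed, or
               no pointed form is known yet to compare against.
    "points":  the consonants are known but the pointing is not;
               the known pointed spellings are offered instead.
    """
    key = skeleton(word)
    forms = index.get(key)
    if not forms:
        return "unknown", []
    normalized = unicodedata.normalize("NFC", word)
    pointed = sorted(form for form in forms if form != key)
    if normalized == key or normalized in forms or not pointed:
        return "ok", []
    return "points", pointed
```

The point of the design is that suggestions for a pointed
word keep their points, and a wrongly pointed word is
flagged rather than silently accepted.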

If you think building a word-list is a good idea we
might be able to give it a place in AbiWord's CVS
somewhere. Perhaps creating a special project just
for this on SourceForge is a better idea.

Hope this helps.
Andrew Dunbar.

> Yours,
>
> Uri David
>

=====
http://linguaphile.sourceforge.net/cgi-bin/translator.pl http://www.abisource.com



This archive was generated by hypermail 2.1.4 : Sun Sep 08 2002 - 22:27:21 EDT