From: Dom Lachowicz (doml@appligent.com)
Date: Tue Sep 24 2002 - 16:43:00 EDT
On Tuesday, September 24, 2002, at 04:26 PM, Jordi Mas wrote:
> <Barbarism
> word="tamany"
> suggestion1="mida"
> suggestion2="grandària"
> />
For what it's worth, Jordi and I have reached a good consensus on this.
I've made a suggestion to him regarding the XML grammar. It should look
something like this instead:
<barbarism word="tamany">
  <suggestion word="mida" />
  <suggestion word="grandària" />
  <suggestion ... />
</barbarism>
Doing this allows for an easily growable suggestion list (any number of
<suggestion> children per word) and simpler import logic.
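To show why the nested grammar keeps import logic simple, here is a toy importer that collects every <suggestion> under the preceding <barbarism> into a list of any length. It is a sketch under loud assumptions: the function name importBarbarisms is hypothetical, the input is assumed well-formed with double-quoted attributes and no comments or entities, and a regex stands in for a real XML parser (AbiWord would use a proper one).

```cpp
#include <map>
#include <regex>
#include <string>
#include <vector>

// Hypothetical toy importer for the proposed grammar:
//   <barbarism word="..."> <suggestion word="..." /> ... </barbarism>
// Assumption: well-formed input, double-quoted attributes, no comments or
// entities. A real importer should use a proper XML parser instead of a regex.
std::map<std::string, std::vector<std::string>>
importBarbarisms(const std::string& xml)
{
    std::map<std::string, std::vector<std::string>> table;

    // Matches opening <barbarism word="..."> and <suggestion word="..." /> tags;
    // closing </barbarism> tags do not match because of the '/'.
    const std::regex tag(R"re(<(barbarism|suggestion)\s+word="([^"]*)")re");

    std::string current;  // barbarism whose suggestions are being collected
    for (std::sregex_iterator it(xml.begin(), xml.end(), tag), end; it != end; ++it) {
        const std::string kind = (*it)[1];
        const std::string word = (*it)[2];
        if (kind == "barbarism") {
            current = word;
            table.try_emplace(current);  // entry exists even with no suggestions
        } else if (!current.empty()) {
            table[current].push_back(word);  // list grows with each <suggestion>
        }
    }
    return table;
}
```

Because each suggestion is its own element, adding a third or fourth alternative needs no change to the importer, unlike the suggestion1/suggestion2 attribute scheme.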
> * Known problems in the design
>
> - We work at word level, not sentence level. We are just hacking a
> spell checker
I'll work on an interface and implementation for this, which we can use
later. It will necessarily resemble:
Iterator Document::getParagraphIterator()
Iterator ParagraphIterator::getSentenceIterator()
string ParagraphIterator::getTarget()
Iterator SentenceIterator::getWordIterator()
string WordIterator::getTarget()
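A rough, self-contained sketch of how the Document -> paragraph -> sentence -> word chain above could fit together. Everything here is hypothetical: the class names mirror the listing, but isValid()/next(), the shared TextIterator base and the naive splitting rules (paragraphs at '\n', sentences at '.', '!' or '?', words at spaces/tabs) are assumptions for illustration, not AbiWord code.

```cpp
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Hypothetical helper: split at any char in 'delims', trim spaces/tabs,
// and drop empty pieces. Deliberately naive segmentation rules.
static std::vector<std::string> splitOn(const std::string& text, const std::string& delims)
{
    std::vector<std::string> pieces;
    std::size_t start = 0;
    while (start <= text.size()) {
        std::size_t end = text.find_first_of(delims, start);
        if (end == std::string::npos) end = text.size();
        const std::string piece = text.substr(start, end - start);
        const auto first = piece.find_first_not_of(" \t");
        if (first != std::string::npos) {
            const auto last = piece.find_last_not_of(" \t");
            pieces.push_back(piece.substr(first, last - first + 1));
        }
        start = end + 1;
    }
    return pieces;
}

// Hypothetical common base: walks a list of text pieces.
// Callers must check isValid() before calling getTarget().
class TextIterator {
public:
    explicit TextIterator(std::vector<std::string> pieces) : m_pieces(std::move(pieces)) {}
    bool isValid() const { return m_pos < m_pieces.size(); }
    void next() { ++m_pos; }
    const std::string& getTarget() const { return m_pieces[m_pos]; }
private:
    std::vector<std::string> m_pieces;
    std::size_t m_pos = 0;
};

class WordIterator : public TextIterator {
public:
    explicit WordIterator(const std::string& sentence) : TextIterator(splitOn(sentence, " \t")) {}
};

class SentenceIterator : public TextIterator {
public:
    explicit SentenceIterator(const std::string& paragraph) : TextIterator(splitOn(paragraph, ".!?")) {}
    WordIterator getWordIterator() const { return WordIterator(getTarget()); }
};

class ParagraphIterator : public TextIterator {
public:
    explicit ParagraphIterator(const std::string& text) : TextIterator(splitOn(text, "\n")) {}
    SentenceIterator getSentenceIterator() const { return SentenceIterator(getTarget()); }
};

class Document {
public:
    explicit Document(std::string text) : m_text(std::move(text)) {}
    ParagraphIterator getParagraphIterator() const { return ParagraphIterator(m_text); }
private:
    std::string m_text;
};
```

A grammar checker would consume SentenceIterator::getTarget(), while a spell checker would only need the innermost WordIterator loop. Real sentence segmentation (abbreviations, quotes, decimals) would need far better rules than these.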
Once we have a reasonably working sentence iterator, we can start
hooking up grammar checkers. The sentence iterator will also give us a
word iterator, which might help clean up the massive amount of garbage
in our current spell-check queuing code.
> - Words that can be declined have to be coded several times (plurals,
> verbs declinations, etc). At least in Catalan, this is not very
> common.
On a related note, we may want to implement a one-to-many multimap
structure here for efficiency. We could probably get away with using a
UT_Map or UT_StringPtrMap. The value stored for each barbarism would be
a UT_Vector containing UT_UTF8String pointers:
string barbarism -> string suggestion1, string suggestion2, ...
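The one-to-many mapping above can be sketched with the C++ standard library. Note the swap: this uses std::multimap in place of AbiWord's UT_Map/UT_StringPtrMap plus UT_Vector of UT_UTF8String pointers, and the BarbarismTable class with its method names is hypothetical, so treat it as an illustration of the data structure rather than the AbiWord API.

```cpp
#include <map>
#include <string>
#include <vector>

// Hypothetical barbarism -> suggestions table. std::multimap stands in for
// AbiWord's UT_Map/UT_StringPtrMap + UT_Vector (illustrative swap-in only).
class BarbarismTable {
public:
    // Adds one more suggestion for 'barbarism'; the same key may repeat.
    // Equal keys keep their insertion order in std::multimap.
    void addSuggestion(const std::string& barbarism, const std::string& suggestion)
    {
        m_table.emplace(barbarism, suggestion);
    }

    // Returns every suggestion stored for 'word' (empty if not a barbarism).
    // Cost: O(log n) to find the range, plus one step per suggestion.
    std::vector<std::string> suggestionsFor(const std::string& word) const
    {
        std::vector<std::string> out;
        const auto range = m_table.equal_range(word);
        for (auto it = range.first; it != range.second; ++it)
            out.push_back(it->second);
        return out;
    }

    bool isBarbarism(const std::string& word) const { return m_table.count(word) != 0; }

private:
    std::multimap<std::string, std::string> m_table;
};
```

A map from each barbarism to a vector of suggestions (closer to the UT_Vector idea) stores each key only once; the multimap trades that for simpler insertion.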
Cheers,
Dom
This archive was generated by hypermail 2.1.4 : Tue Sep 24 2002 - 16:47:56 EDT