abiword-dev Mailing List Archive: Re: Updated: Summary of what I have done in GSoc2011

From: Kathiravelu Pradeeban <kk.pradeeban_at_gmail.com>
Date: Mon Aug 15 2011 - 19:13:29 CEST

Hi Chen,
Had a look. Looks fine.

On Mon, Aug 15, 2011 at 9:49 PM, Chen Xiajian <chenxiajian1985@gmail.com> wrote:
> Hi,
> The attachment is the summary of what I have done in GSoC 2011. Please
> check it. We can discuss it in more detail tomorrow at the usual time.

Sure. Let's discuss it further at the usual time tomorrow.

Thank you.
Regards,
Pradeeban.
>
> I have built the user interface to manage hyphenation: the user can
> enable or disable the hyphenation function in the GUI. Tomorrow I
> will focus on the Linux GTK version.
>
> Best regards,
>
> Chen Xiajian
>
>
>
> ======================================================
> Summary of What I Have Done in GSoC 2011
>
> So far, my GSoC 2011 work consists of four parts:
> 1. Hyphenation module in Enchant
> - Read and thoroughly understood the source code of Enchant
> - Reused Enchant's abstraction layer to add a hyphenation function to
> Enchant, so that more languages can be added easily
> - Dealt with more languages
> - Added five backend implementations: ispell, myspell, zemberek,
> voikko, and uspell
> - Dealt with the spell-checking module
>
> 2. Calling the hyphenation function in AbiWord
> - Find the split information using enchant_dict_hyphenate
> - Split a text run (Text_Run) so that a word extending past the line
> width is broken while keeping its formatting
> - Handle user operations (select, delete, cut, paste)
> - Let the user choose whether to enable the hyphenation function
>
> 3. Simple implementation of Chinese spell-checking in Enchant
> - Added a simple spell-check framework for Chinese to Enchant
> - Added supporting libraries
> - Surveyed approaches to Chinese spell-checking
>
> 4. Code refactoring and debugging
> - Refactored the code, including keeping it flexible
> - Debugged coding problems
>
>
> The details:
> 1 Hyphenation module in Enchant
> 1.1 Add a hyphenation function to Enchant
> First, I added a hyphenation method to Enchant:
> ================the code===========
> I think we can combine hyphenation with spell-checking, which makes
> the code more flexible. In my opinion, the hyphenation API should be
> defined as follows:
> EnchantDict *enchant_broker_request_dict (EnchantBroker *broker,
>                                           const char *const lang); // same as for spell-checking
> char *enchant_dict_hyphenate (EnchantDict *dict, const char *const word,
>                               size_t len);
>
> To implement this in the abstraction layer, we need to add a
> hyphenation function pointer to EnchantDict, something like:
> char *(*hyphenate) (struct str_enchant_dict *me,
>                     const char *const word, size_t len,
>                     size_t *out_n_suggs);
>
> The function is then implemented by each backend. Taking ispell as an
> example:
> static char *
> ispell_dict_hyphenate (EnchantDict *me, const char *const word,
>                        size_t len, size_t *out_n_suggs)
> {
>     ISpellChecker *checker;
>     checker = (ISpellChecker *) me->user_data;
>     return checker->hyphenate (word, len, out_n_suggs);
> }
>
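> Finally, each backend connects its function to the dictionary when it
> creates the dictionary:
> dict->hyphenate = ispell_dict_hyphenate;   /* ispell backend */
> dict->hyphenate = hspell_dict_hyphenate;   /* hspell backend */
> dict->hyphenate = zemberek_dict_hyphenate; /* zemberek backend */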
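To illustrate how the abstraction layer could dispatch through the function pointer, here is a minimal, self-contained sketch. `MiniDict`, `mini_dict_hyphenate` and `demo_backend_hyphenate` are hypothetical stand-ins, not Enchant's real types or functions: the real `EnchantDict` has more members (`check`, `suggest`, ...), and the final signature of `enchant_dict_hyphenate` may differ. The demo backend just inserts one `-` so the dispatch path has something to return.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for EnchantDict: only the hyphenate slot is modelled. */
typedef struct MiniDict {
    void *user_data;
    char *(*hyphenate)(struct MiniDict *me, const char *word, size_t len);
} MiniDict;

/* Hypothetical demo backend: copies the word and puts a '-' after the first
 * two characters. Real backends would call ispell, hunspell, zemberek, ... */
static char *demo_backend_hyphenate(MiniDict *me, const char *word, size_t len)
{
    (void)me;
    char *out = malloc(len + 2);
    if (out == NULL)
        return NULL;
    if (len <= 2) {
        memcpy(out, word, len);
        out[len] = '\0';
        return out;
    }
    memcpy(out, word, 2);
    out[2] = '-';
    memcpy(out + 3, word + 2, len - 2);
    out[len + 1] = '\0';
    return out;
}

/* Abstraction layer: validate arguments, then forward to whatever the backend
 * installed. A backend without hyphenation leaves the pointer NULL.
 * A negative len means "use strlen(word)". */
char *mini_dict_hyphenate(MiniDict *dict, const char *word, long len)
{
    if (dict == NULL || word == NULL || dict->hyphenate == NULL)
        return NULL;
    size_t n = (len < 0) ? strlen(word) : (size_t)len;
    if (n == 0)
        return NULL;
    return dict->hyphenate(dict, word, n);
}
```

Leaving the pointer NULL in a backend such as HSpell lets the wrapper report "no hyphenation" uniformly instead of each caller checking the backend type.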
>
> 1.2 Add five backends to support hyphenation
> These are ispell, myspell, zemberek, voikko, and uspell:
> - Hunspell: uses a separate hyphenation dictionary, such as
> hyph_en_US.dic, which can be downloaded from the Internet
> - Libhyphenation: the dictionaries are provided by its author, and
> coverage is sometimes limited
> - Zemberek: for Turkish
> - Voikko: for Finnish
>
> The changes:
> 1. Deleted unneeded connections, such as HSpell
> 2. Added hunspell (myspell) hyphenation code
> 3. Implemented hyphenation using hunspell
> 4. Implemented hyphenation using Zemberek
>
> ====== 1. Deleted unneeded connections, such as HSpell ======
> Hebrew and Yiddish do not need hyphenation.
> ====== 2. Implement hyphenation using hunspell ======
> To use libhyphenation, we need to add the following files:
> hyphen/hnjalloc.h
> hyphen/hnjalloc.c
> hyphen/hyph_en_US.dic
> hyphen/hyphen.c
> hyphen/hyphen.gyp
> hyphen/hyphen.h
> hyphen/hyphen.patch
> hyphen/hyphen.tex
>
> ====== 3. Implement hyphenation using Zemberek ======
> This simply uses dbus_g_proxy_call, the same way as spell-checking in
> Zemberek; the code is shown in section 1.5.
>
> 1.3 ISpell
> I used libhyphenate in ISpell. The simplified code looks like this:
> static char *
> ispell_dict_hyphenate (EnchantDict *me, const char *const word)
> {
>     ISpellChecker *checker;
>
>     checker = (ISpellChecker *) me->user_data;
>     // compare the string contents, not the pointer; fall back to en_us
>     if (me->tag != NULL && me->tag[0] != '\0')
>         return checker->hyphenate (word, me->tag);
>     return checker->hyphenate (word, "en_us");
> }
> The concrete code in ISpellChecker is:
> char *
> ISpellChecker::hyphenate (const char *const utf8Word, const char *const tag)
> {
>     // we must choose the right language tag
>     char *param_value = enchant_broker_get_param (m_broker,
>             "enchant.ispell.hyphenation.dictionary.path");
>     if (languageMap[tag] != "")
>     {
>         string result = Hyphenator (RFC_3066::Language (languageMap[tag]),
>                                     param_value).hyphenate (utf8Word);
>         // +1 for the terminating NUL byte copied by strcpy
>         char *temp = new char[result.length () + 1];
>         strcpy (temp, result.c_str ());
>         return temp;
>     }
>     return NULL;
> }
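The ISpell code relies on `languageMap` to turn Enchant's tag into the RFC 3066-style tag passed to `RFC_3066::Language`, and falls back to `en_us` when the tag is empty. A minimal sketch of both steps, assuming the mapping is simply "lowercase and replace `_` with `-`"; the real `languageMap` may well list its entries explicitly, and `to_rfc3066` is a hypothetical helper name.

```c
#include <ctype.h>
#include <stddef.h>

/* Hypothetical helper: turn an Enchant-style tag such as "en_US" into an
 * RFC 3066-style tag such as "en-us" (lowercase, '_' replaced by '-').
 * A NULL or empty tag falls back to "en_us", mirroring the fallback in
 * ispell_dict_hyphenate. Returns 0 on success, -1 if `out` is too small. */
int to_rfc3066(const char *tag, char *out, size_t out_size)
{
    if (out == NULL || out_size == 0)
        return -1;
    if (tag == NULL || tag[0] == '\0')
        tag = "en_us";

    size_t i = 0;
    for (; tag[i] != '\0'; i++) {
        if (i + 1 >= out_size)          /* keep room for the NUL byte */
            return -1;
        unsigned char c = (unsigned char)tag[i];
        out[i] = (c == '_') ? '-' : (char)tolower(c);
    }
    out[i] = '\0';
    return 0;
}
```

Writing into a caller-supplied buffer keeps the helper allocation-free; an explicit table would still be needed for tags whose mapping is not purely mechanical.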
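> 1.4 MySpell
> For MySpell, I used the hyphen library (the hnj_hyphen_* functions in
> the hyphen/ files listed above). The simplified code looks like this:
> char *
> MySpellChecker::hyphenate (const char *const word, size_t len, char *tag)
> {
>     if (len == (size_t) -1)
>         len = strlen (word);
>     if (len > MAXWORDLEN
>         || !g_iconv_is_valid (m_translate_in)
>         || !g_iconv_is_valid (m_translate_out))
>         return NULL;
>     // Hunspell::hyphenate returns a newly allocated buffer, or NULL on error
>     return myspell->hyphenate (word, tag);
> }
> The concrete code, in the Hunspell class used by MySpellChecker, is:
> char *
> Hunspell::hyphenate (const char *const word, const char *tag)
> {
>     HyphenDict *dict;
>     char **rep = NULL;  // must start as NULL; filled in by the hyphen library
>     int *pos = NULL;
>     int *cut = NULL;
>     /* load the hyphenation dictionary, e.g. hyph_en_US.dic */
>     string filePath = "hyph_";
>     filePath += tag;
>     filePath += ".dic";
>     if ((dict = hnj_hyphen_load (filePath.c_str ())) == NULL) {
>         fprintf (stderr, "Couldn't find file %s\n", filePath.c_str ());
>         return NULL;    // report failure instead of calling exit() in a library
>     }
>     int len = strlen (word);
>     // the hyphens buffer needs at least len + 5 bytes; malloc matches free()
>     char *hyphens = (char *) malloc (len + 5);
>     // word_size is the byte length of the word
>     if (hyphens != NULL
>         && hnj_hyphen_hyphenate2 (dict, word, len, hyphens, NULL,
>                                   &rep, &pos, &cut)) {
>         free (hyphens);
>         hyphens = NULL;
>         fprintf (stderr, "hyphenation error\n");
>     }
>     if (rep != NULL) {  // allocated only when non-standard hyphenation applies
>         for (int i = 0; i < len; i++)
>             free (rep[i]);
>         free (rep);
>         free (pos);
>         free (cut);
>     }
>     hnj_hyphen_free (dict);
>     return hyphens;     // odd digits mark allowed break points
> }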
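`hnj_hyphen_hyphenate2` fills `hyphens` with one digit per character; as I understand the hyphen library's convention, an odd digit at position i means a break is allowed after character i. The AbiWord side later searches for `-` characters, so something has to turn the mask into a visible form such as `hy-phen-ation`. A minimal, dependency-free sketch; `apply_hyphen_mask` is a hypothetical helper and handles single-byte text only.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: insert '-' after every character whose mask digit is
 * odd. `mask` must be at least as long as `word`. A break after the last
 * character is ignored. Single-byte text only. The caller frees the result. */
char *apply_hyphen_mask(const char *word, const char *mask)
{
    size_t len = strlen(word);
    /* worst case: a hyphen after every character but the last */
    char *out = malloc(2 * len + 1);
    if (out == NULL)
        return NULL;

    size_t j = 0;
    for (size_t i = 0; i < len; i++) {
        out[j++] = word[i];
        int digit = mask[i] - '0';
        if (i + 1 < len && digit >= 0 && digit <= 9 && (digit & 1))
            out[j++] = '-';
    }
    out[j] = '\0';
    return out;
}
```

For UTF-8 words the mask is indexed by byte, so a real version would have to skip continuation bytes before inserting the hyphen.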
>
> 1.5 Zemberek
> The Zemberek wrapper follows the same pattern as the two above:
> static char *
> zemberek_dict_hyphenate (EnchantDict *me, const char *const word)
> {
>     Zemberek *checker;
>     checker = (Zemberek *) me->user_data;
>     return checker->hyphenate (word);
> }
> The concrete implementation, however, differs from the other two: it
> calls zemberek_service over D-Bus.
> char *Zemberek::hyphenate (const char *word)
> {
>     char **syllables = NULL;  // G_TYPE_STRV returns a string array
>     GError *Error = NULL;
>     if (!dbus_g_proxy_call (proxy, "hecele", &Error,
>                             G_TYPE_STRING, word, G_TYPE_INVALID,
>                             G_TYPE_STRV, &syllables, G_TYPE_INVALID)) {
>         g_error_free (Error);
>         return NULL;
>     }
>     if (syllables == NULL)
>         return NULL;
>     // join the syllables into one string; the '-' separator is chosen
>     // to match what the AbiWord code searches for
>     char *result = g_strjoinv ("-", syllables);
>     g_strfreev (syllables);
>     return result;
> }
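Zemberek's `hecele` call returns the syllables as a string array (hence `G_TYPE_STRV`), while the hyphenate interface returns a single string. With GLib, `g_strjoinv("-", syllables)` joins them; a dependency-free sketch of the same step follows. `join_syllables` is a hypothetical helper, and `-` as the separator is an assumption chosen to match what the AbiWord code looks for.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: join a NULL-terminated array of syllables with '-'.
 * Returns a newly allocated string (caller frees), or NULL on allocation
 * failure or when the array is NULL/empty. */
char *join_syllables(char **syllables)
{
    if (syllables == NULL || syllables[0] == NULL)
        return NULL;

    size_t total = 0, count = 0;
    for (; syllables[count] != NULL; count++)
        total += strlen(syllables[count]);
    total += count - 1;                 /* one separator between syllables */

    char *out = malloc(total + 1);
    if (out == NULL)
        return NULL;

    char *p = out;
    for (size_t i = 0; i < count; i++) {
        if (i > 0)
            *p++ = '-';
        size_t n = strlen(syllables[i]);
        memcpy(p, syllables[i], n);
        p += n;
    }
    *p = '\0';
    return out;
}
```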
> 1.6 Voikko
> Hyphenation in Voikko is easy because Voikko provides its own
> hyphenation API. The existing suggest function, which a hyphenate
> function can mirror, looks like this:
> static char **
> voikko_dict_suggest (EnchantDict *me, const char *const word,
>                      size_t len, size_t *out_n_suggs)
> {
>     char **sugg_arr;
>     int voikko_handle;
>
>     voikko_handle = (long) me->user_data;
>     sugg_arr = voikko_suggest_cstr (voikko_handle, word);
>     if (sugg_arr == NULL)
>         return NULL;
>     // count the NULL-terminated suggestions
>     for (*out_n_suggs = 0; sugg_arr[*out_n_suggs] != NULL; (*out_n_suggs)++)
>         ;
>     return sugg_arr;
> }
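As far as I know, libvoikko's legacy API offers `voikko_hyphenate_cstr(handle, word)`, which returns a pattern string as long as the word, where `' '` means no break, `'-'` a break before that character, and `'='` a break where that character is replaced by the hyphen; please check this against the installed `voikko.h`. A minimal sketch of turning such a pattern into a hyphenated word; `apply_voikko_pattern` is hypothetical and assumes single-byte text (Finnish letters such as ä and ö would need UTF-8 handling).

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper. Assumed pattern semantics, one entry per character:
 *   ' '  no hyphenation point
 *   '-'  hyphen inserted before this character
 *   '='  this character is replaced by the hyphen
 * Returns NULL if the pattern is shorter than the word. Caller frees. */
char *apply_voikko_pattern(const char *word, const char *pattern)
{
    size_t len = strlen(word);
    if (strlen(pattern) < len)
        return NULL;

    char *out = malloc(2 * len + 1);
    if (out == NULL)
        return NULL;

    size_t j = 0;
    for (size_t i = 0; i < len; i++) {
        if (pattern[i] == '=') {
            out[j++] = '-';             /* replace the character itself */
            continue;
        }
        if (pattern[i] == '-' && i > 0)
            out[j++] = '-';             /* break before this character */
        out[j++] = word[i];
    }
    out[j] = '\0';
    return out;
}
```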
>
> 1.7 Deploying Enchant in AbiWord
> I simply copy the build output of Enchant to the right places in AbiWord:
> enchant\bin\Debug\libenchant_myspell.dll
> ----> abiword\msvc2008\Debug\lib\enchant\libenchant_myspell.dll
> enchant\bin\Debug\libenchant_ispell.dll
> ----> abiword\msvc2008\Debug\lib\enchant\libenchant_ispell.dll
> enchant\bin\Debug\libenchant.dll
> ----> abiword\msvc2008\Debug\bin\libenchant.dll
>
> 1.8 Testing on Linux
> I have tested the Enchant module on Red Hat. It works fine for me.
>
> 2 Calling the hyphenation function in AbiWord
> - Split a run to split a word while keeping its formatting
> - Find the split information
> - Handle user operations (select, delete, cut, paste)
>
> Main goal: call the hyphenation module of Enchant to display the
> hyphenation result in AbiWord. After a user operation, such as adding,
> deleting, copying, or cutting a word, refresh the hyphenation result
> accordingly.
>
> The main code is added to the format function in fb_LineBreaker.h (.cpp):
> // find the split point
> while (pRunToBump && pLine->getNumRunsInLine()
>        && (pLine->getLastRun() != m_pLastRunToKeep))
> {
>     UT_ASSERT(pRunToBump->getLine() == pLine);
>     if (!pLine->removeRun(pRunToBump))
>     {
>         pRunToBump->setLine(NULL);
>     }
>     UT_ASSERT(pLine->getLastRun()->getType() != FPRUN_ENDOFPARAGRAPH);
>     if (pLine->getLastRun()->getType() == FPRUN_ENDOFPARAGRAPH)
>     {
>         fp_Run * pNuke = pLine->getLastRun();
>         pLine->removeRun(pNuke);
>     }
>     pRunToBump->printText();          // debug trace (reached twice)
>     pNextLine->insertRun(pRunToBump); // called when a new line is created
>     // get the word to split
>     if (!(pRunToBump->getPrevRun() && pLine->getNumRunsInLine()
>           && (pLine->getLastRun() != m_pLastRunToKeep)))
>     {
>         pRunToSplit = pRunToBump;
>         PD_StruxIterator text(pRunToBump->getBlock()->getStruxDocHandle(),
>                               pRunToBump->getBlockOffset() + fl_BLOCK_STRUX_OFFSET);
>
>         text.setUpperLimit(text.getPosition() + pRunToBump->getLength() - 1);
>         UT_ASSERT_HARMLESS(text.getStatus() == UTIter_OK);
>         UT_UTF8String sTmp;
>         while (text.getStatus() == UTIter_OK)
>         {
>             UT_UCS4Char c = text.getChar();
>             UT_DEBUGMSG(("| %d |", c));
>             // note: only printable ASCII characters are kept
>             if (c >= ' ' && c < 128)
>                 sTmp += static_cast<char>(c);
>             ++text;
>         }
>         UT_DEBUGMSG(("The Split Text |%s| \n", sTmp.utf8_str()));
>         if (sTmp.length() > 0)
>         {
>             pWordToSplit = sTmp;
>             UT_DEBUGMSG(("wordToSplit |%s| \n", pWordToSplit.utf8_str()));
>         }
>     }
>     pRunToBump = pRunToBump->getPrevRun();
>     UT_DEBUGMSG(("Next runToBump %p \n", pRunToBump));
> }
> }
> // modified src/text/fmt/xp/fb_LineBreaker.cpp to place hyphenation points
> // split the word
> if (pWordToSplit.length() > 0)
> {
>     pWordHyphenationResult = pBlock->_hyphenateWord(pWordToSplit.ucs4_str().ucs4_str(), 0, 0);
>     int tickLeft = pLine->getAvailableWidth();
>     if (pWordHyphenationResult && *pWordHyphenationResult)
>     {
>         gchar *c = g_ucs4_to_utf8(pWordHyphenationResult, -1, NULL, NULL, NULL);
>         // scan from the end so the rightmost break point is tried first
>         for (int index = g_utf8_strlen(c, NULL); index >= 0; --index)
>         {
>             // NOTE: tickLeft is a width in layout units, while index is a
>             // character position that also counts the inserted '-' signs
>             if (pWordHyphenationResult[index] == '-' && index < tickLeft)
>             {
>                 pBreakPoint = index;
>                 fp_TextRun* textout = static_cast<fp_TextRun*>(pRunToSplit);
>                 textout->split(pBreakPoint);
>                 break; // split only once, at the rightmost candidate
>             }
>         }
>         g_free(c);
>     }
> }
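The loop above compares a character index in the hyphenated string with `tickLeft`, which is a width, and passes that index (which also counts the inserted `-` signs) to `split()`. A sketch of the selection step it aims at, assuming a hypothetical `char_width` callback in place of AbiWord's real text measurement: walk the hyphenated word, track the width of the prefix plus one visible hyphen, and keep the last hyphen that still fits, expressed as an offset in the unhyphenated word. `choose_break_offset`, `char_width_fn` and `monospace_width` are hypothetical, and a literal `-` in the word is treated as a break mark.

```c
#include <stddef.h>

/* Hypothetical width callback: width of one character in layout units. */
typedef int (*char_width_fn)(char c);

/* Demo callback for a monospaced font: every character is 10 units wide. */
int monospace_width(char c)
{
    (void)c;
    return 10;
}

/* Given a hyphenated word such as "hy-phen-ation", return the offset in the
 * unhyphenated word ("hyphenation") of the rightmost hyphenation point whose
 * prefix, plus a visible hyphen, fits into `available` layout units.
 * Returns -1 if no hyphenation point fits. Widths must be non-negative. */
int choose_break_offset(const char *hyphenated, int available,
                        char_width_fn char_width)
{
    int width = 0;   /* width of the letters consumed so far */
    int offset = 0;  /* letters consumed so far, not counting '-' marks */
    int best = -1;

    for (const char *p = hyphenated; *p != '\0'; p++) {
        if (*p == '-') {
            /* prefixes only grow, so once one does not fit, none will */
            if (width + char_width('-') > available)
                break;
            if (offset > 0)
                best = offset;
            continue;
        }
        width += char_width(*p);
        offset++;
    }
    return best;
}
```

The returned offset counts letters of the word only, so it would still need to be added to the word's starting position inside `pRunToSplit` before calling `split()`.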
>
>
> 3 Simple implementation of Chinese spell-checking in Enchant
> After GSoC 2011, I would like to add Chinese spell-checking to Enchant.
> Chinese spell-checking is also a very important issue for word
> processors. I found some libraries to support it; since time was
> limited, I only built a simple framework.
> The main function:
>
>
> 4 Code refactoring and debugging
> 5 Still to improve
> - Code refactoring
> - Deal with more languages
> - Include more user operations (for example, operations on pictures
> may influence the hyphenation result)
>
> More:
> - Fully support hyphenation in AbiWord
> - Support more languages
> - More tests on Linux (Unix)
> - Finish the implementation of Chinese spell-checking in Enchant
> - User interface for hyphenation
>

-- 
Kathiravelu Pradeeban.
Software Engineer.
WSO2 Inc.
Blog: [Llovizna] http://kkpradeeban.blogspot.com/

Received on Mon Aug 15 19:14:02 2011

