abiword-dev Mailing List Archive: Re: Updated: Summary of what I

From: Chen Xiajian <chenxiajian1985_at_gmail.com>
Date: Tue Aug 16 2011 - 11:06:43 CEST

OK.

I am now focusing on the last part of my project: the user interface on both
Windows and Linux. It is very important, and I am sorry that I neglected it
before. The user can enable or disable the hyphenation function in the user
interface (GUI).

Best Regards
chen xiajian

2011/8/16 Kathiravelu Pradeeban <kk.pradeeban@gmail.com>
>
> Hi Chen,
> Had a look. Looks fine.
>
> On Mon, Aug 15, 2011 at 9:49 PM, Chen Xiajian <chenxiajian1985@gmail.com> wrote:
> > Hi
> > The attachment is a summary of what I have done in GSoC 2011. Please
> > check it. We can discuss it in more detail tomorrow at the usual time.
>
> Sure. Let's discuss further at the usual time tomorrow.
>
> Thank you.
> Regards,
> Pradeeban.
> >
> > I have built the user interface to manage hyphenation. The user can enable
> > or disable the hyphenation function in the user interface (GUI). Tomorrow I
> > will focus on the Linux GTK version.
> >
> > Best regards!
> >
> > Chen Xiajian
> >
> >
> >
> > ======================================================
> > Summary of What I Have Done in GSoC 2011
> >
> > So far, my work in GSoC 2011 consists of four parts:
> > 1. Hyphenation module in Enchant
> >  * Read and fully understand the source code of Enchant
> >  * Reuse the abstraction layer of Enchant and add a hyphenation function to
> > Enchant, so that more languages can be added easily
> >  * Deal with more languages
> >  * Add five backend implementations: ispell, myspell,
> > zemberek, voikko, uspell
> >  * Deal with the spell-checking module
> >
> > 2. Call the hyphenation function in AbiWord
> >  * Find split info using enchant_dict_hyphenate
> >  * Split the Text_Run so that a word that passes the line width is split
> > and keeps its format
> >  * Deal with user operations (select, delete, cut, paste)
> >  * The user can select whether to enable the hyphenation function
> >
> > 3. Simple implementation of Chinese spell-checking in Enchant
> >  * Add a simple spell-check framework for Chinese in Enchant
> >  * Add a library to support it
> >  * A survey of Chinese spell-checking
> >
> > 4. Code refactoring and debugging
> >  * Refactor the code to keep it flexible
> >  * Debug coding problems
> >
> >
> > The details:
> > 1 Hyphenation module in Enchant
> > 1.1 Add hyphenation function in Enchant
> > First, I added a hyphenation method to Enchant.
> > ================the code===========
> > I think we can combine hyphenation with spell-checking, so that the
> > code is more flexible. In my opinion, the hyphenation functions should be
> > defined as follows:
> > EnchantDict *enchant_broker_request_dict (EnchantBroker *broker,
> >                                           const char *const lang); // same as spell-checking
> > char *enchant_dict_hyphenate (EnchantDict *dict, const char *const word,
> >                               size_t len);
> >
> > To implement this in the abstraction layer, we need to add a hyphenation
> > function to EnchantDict as a function pointer, something like:
> > char* (*hyphenate) (struct str_enchant_dict * me,
> >                     const char *const word, size_t len,
> >                     size_t * out_n_suggs);
> >
> > The function is implemented by the backend. Take "ispell" as an example:
> > static char * ispell_dict_hyphenate (EnchantDict * me, const char *const word,
> >                                      size_t len, size_t * out_n_suggs)
> > {
> >     ISpellChecker * checker;
> >     checker = (ISpellChecker *) me->user_data;
> >     return checker->hyphenate (word, len, out_n_suggs);
> > }
> >
> > Finally, each backend sets the connection on its own dict:
> > dict->hyphenate = ispell_dict_hyphenate;
> > dict->hyphenate = hspell_dict_hyphenate;
> > dict->hyphenate = zemberek_dict_hyphenate;
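
A minimal sketch of the function-pointer dispatch described above. It does not use the real Enchant headers: ToyDict, ToyBackend, toy_backend_hyphenate and toy_dict_hyphenate are hypothetical names invented for this illustration, and the signature is simplified to (dict, word, len).

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <cstring>
#include <string>

// Hypothetical stand-in for EnchantDict: only the fields this sketch needs.
struct ToyDict {
    void *user_data;  // backend-specific state
    char *(*hyphenate)(ToyDict *me, const char *word, std::size_t len);
};

// Hypothetical backend that knows the break points of exactly one word.
struct ToyBackend {
    std::string word;
    std::string hyphenated;
};

// Backend implementation: recovers its state from user_data, like the
// ispell example does, and returns a malloc'ed copy owned by the caller.
static char *toy_backend_hyphenate(ToyDict *me, const char *word, std::size_t len) {
    ToyBackend *backend = static_cast<ToyBackend *>(me->user_data);
    std::string key(word, len);
    const std::string &out = (key == backend->word) ? backend->hyphenated : key;
    char *copy = static_cast<char *>(std::malloc(out.size() + 1));
    if (copy) std::memcpy(copy, out.c_str(), out.size() + 1);
    return copy;
}

// Front-end entry point: checks its arguments, then dispatches through the
// pointer. A backend without hyphenation support leaves the pointer null.
char *toy_dict_hyphenate(ToyDict *dict, const char *word, std::size_t len) {
    assert(dict != nullptr && word != nullptr);
    if (dict->hyphenate == nullptr) return nullptr;
    return dict->hyphenate(dict, word, len);
}
```

The null check is the point of the design: a backend that has no hyphenation (HSpell, for example) simply does not set the pointer, and the front end reports "no result" instead of crashing.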
> >
> > 1.2 Add five backends to support hyphenation
> > The backends are ispell, myspell, zemberek, voikko, and uspell.
> >  * Hunspell: uses a separate dictionary, such as hyph_en_us.dic. We
> > can download the dictionaries from the internet
> >  * Libhyphenation: the dictionary is provided by the author, sometimes limited
> >  * Zemberek: for Turkish
> >  * Voikko: for Finnish
> >
> > The changes:
> > 1 delete the unneeded connections, such as HSpell
> > 2 add hunspell (myspell) hyphenation code
> > 3 implement hyphenation using hunspell
> > 4 implement hyphenation using Zemberek
> >
> > ======1 Delete the unneeded connections, such as HSpell===========
> > Hebrew does not need any hyphenation
> > Yiddish does not need any hyphenation
> > =======2 Implement hyphenation using hunspell
> > In order to use Libhyphenation, we need to add these files:
> > hyphen/hnjalloc.h
> > hyphen/hnjalloc.c
> > hyphen/hyph_en_US.dic
> > hyphen/hyphen.c
> > hyphen/hyphen.gyp
> > hyphen/hyphen.h
> > hyphen/hyphen.patch
> > hyphen/hyphen.tex
> >
> > ========3 Implement hyphenation using Zemberek
> > This just uses dbus_g_proxy_call, the same as spell-checking in Zemberek.
> > The hyphenation is as follows:
> > char* Zemberek::hyphenate(const char* word)
> > {
> >     char** syllables = NULL; /* G_TYPE_STRV is received as a char** */
> >     GError *Error = NULL;
> >     if (!dbus_g_proxy_call (proxy, "hecele", &Error,
> >                             G_TYPE_STRING, word, G_TYPE_INVALID,
> >                             G_TYPE_STRV, &syllables, G_TYPE_INVALID)) {
> >         g_error_free (Error);
> >         return NULL;
> >     }
> >     /* join the returned parts with '-' (assumes "hecele" returns syllables) */
> >     char* result = g_strjoinv ("-", syllables);
> >     g_strfreev (syllables);
> >     return result;
> > }
> >
> > 1.3 ISpell
> > I used Libhyphenation in ISpell. The simple code looks like this:
> > static char *
> > ispell_dict_hyphenate (EnchantDict * me, const char *const word)
> > {
> >     ISpellChecker * checker;
> >
> >     checker = (ISpellChecker *) me->user_data;
> >     /* compare the contents, not the pointer; assumes tag is a C string */
> >     if (me->tag && *me->tag)
> >         return checker->hyphenate (word, me->tag);
> >     return checker->hyphenate (word, "en_us"); /* default language */
> > }
> > The concrete code in ISpellChecker is:
> > char *
> > ISpellChecker::hyphenate(const char * const utf8Word, const char *const tag)
> > { // we must choose the right language tag
> >     char* param_value = enchant_broker_get_param (m_broker,
> >         "enchant.ispell.hyphenation.dictionary.path");
> >     if(languageMap[tag]!="")
> >     {
> >         string result=Hyphenator(RFC_3066::Language(languageMap[tag]),param_value).hyphenate(utf8Word).c_str();
> >
> >         // +1 leaves room for the terminating '\0' that strcpy writes
> >         char* temp=new char[result.length()+1];
> >         strcpy(temp,result.c_str());
> >         return temp;
> >     }
> >     return NULL;
> > }
> > 1.4 MySpell
> > I used the hyphen library (the hnj_hyphen_* functions) in MySpell. The
> > simple code looks like this:
> > char*
> > MySpellChecker::hyphenate (const char* const word, size_t len,char* tag)
> > {
> > if(len==-1) len=strlen(word);
> > if (len > MAXWORDLEN
> > || !g_iconv_is_valid(m_translate_in)
> > || !g_iconv_is_valid(m_translate_out))
> > return 0;
> > char* result=0;
> > myspell->hyphenate(word,result,tag);
> > return result;
> > }
> > The concrete code in Hunspell is:
> > // result is passed by reference so that the caller sees the buffer
> > void Hunspell::hyphenate( const char* const word, char*& result, char* tag )
> > {
> >     HyphenDict *dict;
> >     char ** rep = NULL; /* must start as NULL; the library allocates them */
> >     int * pos = NULL;
> >     int * cut = NULL;
> >     result = NULL;
> >     /* load the hyphenation dictionary */
> >     string filePath="hyph_";
> >     filePath+=tag;
> >     filePath+=".dic";
> >     if ((dict = hnj_hyphen_load(filePath.c_str())) == NULL) {
> >         fprintf(stderr, "Couldn't find file %s\n",filePath.c_str());
> >         return; /* report failure to the caller instead of exiting */
> >     }
> >     int len=strlen(word);
> >     /* malloc, so that free() below and in the caller matches */
> >     char *hyphens=(char *) malloc(BUFSIZE + 1);
> >     if (hnj_hyphen_hyphenate2(dict, word, len, hyphens, NULL, &rep,
> >                               &pos, &cut)) {
> >         free(hyphens);
> >         hnj_hyphen_free(dict);
> >         fprintf(stderr, "hyphenation error\n");
> >         return;
> >     }
> >
> >     hnj_hyphen_free(dict);
> >     result=hyphens; /* digit array; an odd value marks a break point */
> > }
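
A minimal sketch of how the hyphens buffer filled by hnj_hyphen_hyphenate2 can be turned into a visible hyphenated word. It relies on the hyphen library convention that an odd value at position i allows a break after character i. The function name apply_hyphens is hypothetical, the digit strings in the usage note are hand-written illustrations rather than real dictionary output, and the sketch assumes single-byte characters and no non-standard (rep/pos/cut) hyphenation.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Insert '-' after every character whose entry in hyphens is odd.
// hyphens holds one digit character per character of word.
std::string apply_hyphens(const std::string &word, const std::string &hyphens) {
    assert(hyphens.size() >= word.size());
    std::string out;
    for (std::size_t i = 0; i < word.size(); ++i) {
        out += word[i];
        // Never emit a trailing hyphen after the last character.
        if (i + 1 < word.size() && (hyphens[i] & 1)) {
            out += '-';
        }
    }
    return out;
}
```

For example, apply_hyphens("example", "0100000") gives "ex-ample", because only position 1 carries an odd digit.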
> >
> > 1.5 zemberek
> > The wrapper for Zemberek is the same as the two above:
> > static char*
> > zemberek_dict_hyphenate (EnchantDict * me, const char *const word)
> > {
> > Zemberek *checker;
> > checker = (Zemberek *) me->user_data;
> > return checker->hyphenate (word);
> > }
> > But the concrete implementation is different from the other two.
> > We use zemberek_service:
> > char* Zemberek::hyphenate(const char* word)
> > {
> >     char** syllables = NULL; /* G_TYPE_STRV is received as a char** */
> >     GError *Error = NULL;
> >     if (!dbus_g_proxy_call (proxy, "hecele", &Error,
> >                             G_TYPE_STRING, word, G_TYPE_INVALID,
> >                             G_TYPE_STRV, &syllables, G_TYPE_INVALID)) {
> >         g_error_free (Error);
> >         return NULL;
> >     }
> >
> >     /* join the returned parts with '-' (assumes "hecele" returns syllables) */
> >     char* result = g_strjoinv ("-", syllables);
> >     g_strfreev (syllables);
> >     return result;
> > }
> > 1.6 voikko
> > The hyphenation implementation in Voikko is easy, since Voikko has
> > a hyphenation API.
> > static char **
> > voikko_dict_suggest (EnchantDict * me, const char *const word,
> > size_t len, size_t * out_n_suggs)
> > {
> > char **sugg_arr;
> > int voikko_handle;
> >
> > voikko_handle = (long) me->user_data;
> > sugg_arr = voikko_suggest_cstr(voikko_handle, word);
> > if (sugg_arr == NULL)
> > return NULL;
> > for (*out_n_suggs = 0; sugg_arr[*out_n_suggs] != NULL; (*out_n_suggs)++);
> > return sugg_arr;
> > }
> >
> > 1.7 Deployment of Enchant in AbiWord
> > I just copy the build output of Enchant to the right place in AbiWord:
> > enchant\bin\Debug\libenchant_myspell.dll
> > ---->abiword\msvc2008\Debug\lib\enchant\libenchant_myspell.dll
> > enchant\bin\Debug\libenchant_ispell.dll
> > ---->abiword\msvc2008\Debug\lib\enchant\libenchant_ispell.dll
> > enchant\bin\Debug\libenchant.dll
> > ---->abiword\msvc2008\Debug\bin\libenchant.dll
> >
> > 1.8 Test on Linux
> > I have tested the Enchant module on Red Hat. It works fine for me.
> >
> > 2 Call the hyphenation function in AbiWord
> >  * Split the run to split the word and keep the format
> >  * Find split info
> >  * Deal with user operations (select, delete, cut, paste)
> >
> > Main goal: call the hyphenation module of Enchant to display the
> > hyphenation result in AbiWord. After a user operation (adding a new word,
> > or deleting, copying, or cutting a word), refresh the hyphenation result
> > accordingly.
> >
> > The main code is added to the format function in LineBreaker.h (.cpp):
> > // find the split point
> > while (pRunToBump && pLine->getNumRunsInLine() && (pLine->getLastRun()
> > != m_pLastRunToKeep))
> > {
> > UT_ASSERT(pRunToBump->getLine() == pLine);
> > if(!pLine->removeRun(pRunToBump))
> > {
> > pRunToBump->setLine(NULL);
> > }
> > UT_ASSERT(pLine->getLastRun()->getType() != FPRUN_ENDOFPARAGRAPH);
> > if(pLine->getLastRun()->getType() == FPRUN_ENDOFPARAGRAPH)
> > {
> > fp_Run * pNuke = pLine->getLastRun();
> > pLine->removeRun(pNuke);
> > }
> > pRunToBump->printText(); //trace out debug message & run two time
> > pNextLine->insertRun(pRunToBump); //called when create new line
> > // to get the split word
> > if (!(pRunToBump->getPrevRun() && pLine->getNumRunsInLine() &&
> > (pLine->getLastRun() != m_pLastRunToKeep)))
> > {
> > pRunToSplit=pRunToBump;
> > PD_StruxIterator text(pRunToBump->getBlock()->getStruxDocHandle(),
> > pRunToBump->getBlockOffset() + fl_BLOCK_STRUX_OFFSET);
> >
> > text.setUpperLimit(text.getPosition() + pRunToBump->getLength() - 1);
> > UT_ASSERT_HARMLESS( text.getStatus() == UTIter_OK );
> > UT_UTF8String sTmp;
> > while(text.getStatus() == UTIter_OK)
> > {
> > UT_UCS4Char c = text.getChar();
> > UT_DEBUGMSG(("| %d |",c));
> > if(c >= ' ' && c <128)
> > sTmp += static_cast<char>(c);
> > ++text;
> > }
> > UT_DEBUGMSG(("The Split Text |%s| \n",sTmp.utf8_str()));
> > if(sTmp.utf8_str()!=0)
> > {
> > pWordToSplit=sTmp;
> > UT_DEBUGMSG(("wordToSplit |%s| \n",pWordToSplit.utf8_str()));
> > }
> > }
> > pRunToBump = pRunToBump->getPrevRun();
> > UT_DEBUGMSG(("Next runToBump %x \n",pRunToBump));
> > }
> > }
> > // modify src/text/fmt/xp/fb_LineBreaker.cpp to place hyphenation points
> > // split the word
> > if(pWordToSplit.length()!=0)
> > {
> >     pWordHyphenationResult=pBlock->_hyphenateWord(pWordToSplit.ucs4_str().ucs4_str(),0,0);
> >     int tickLeft=pLine->getAvailableWidth();
> >     if (pWordHyphenationResult && *pWordHyphenationResult){
> >         gchar *c = g_ucs4_to_utf8(pWordHyphenationResult, -1, NULL, NULL, NULL);
> >         // max = -1 makes g_utf8_strlen count up to the terminating nul
> >         for(int index=g_utf8_strlen(c,-1);index>=0;--index)
> >         {
> >             if(pWordHyphenationResult[index]=='-'&&index<tickLeft)
> >             {
> >                 pBreakPoint=index;
> >                 fp_TextRun* textout=static_cast<fp_TextRun*>(pRunToSplit);
> >                 textout->split(pBreakPoint);
> >                 break; // split once, at the last hyphen that fits
> >             }
> >         }
> >         g_free(c);
> >     }
> > }
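
A minimal sketch of the break-point choice that the loop above is aiming for: given a hyphenated word, keep as much of it on the current line as fits. The function name choose_break is hypothetical, and the width is counted in characters as a simplifying assumption; AbiWord measures widths in layout units, so this is not the fb_LineBreaker code itself.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Returns how many word characters (hyphens not counted) stay on the current
// line, or 0 if no break point fits. available is the free width in
// characters and must also hold the visible '-' at the end of the line.
std::size_t choose_break(const std::string &hyphenated, std::size_t available) {
    std::size_t best = 0;
    std::size_t chars = 0;  // word characters seen so far
    for (char c : hyphenated) {
        if (c == '-') {
            // The prefix plus the trailing hyphen must fit on the line.
            if (chars + 1 <= available) best = chars;
        } else {
            ++chars;
        }
    }
    assert(best <= available);  // sanity check: the kept prefix always fits
    return best;
}
```

With "hy-phen-ation" and 8 free characters the function returns 6, so "hyphen-" stays on the line and "ation" moves to the next one. The sketch treats every '-' as a break marker, so a word that already contains a literal hyphen would need separate handling.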
> >
> >
> > 3 Simple implementation of Chinese spell-checking in Enchant
> > After GSoC 2011, I would like to add Chinese spell-checking to Enchant.
> > Chinese spell-checking is also a very important issue in word processors.
> > I found some libraries to support it; I have only built a simple framework
> > since time is limited.
> > The main function:
> >
> >
> > 4 Code refactoring and debugging
> > 5. Still to improve
> >  * Code refactoring
> >  * Deal with more languages
> >  * Include more user operations (for example, operations on pictures may
> > influence the hyphenation result)
> >
> > More:
> >  * Fully support hyphenation in AbiWord
> >  * Support more languages
> >  * More tests on Linux (Unix)
> >  * Finish the implementation of Chinese spell-checking in Enchant
> >  * User interface for hyphenation
> >
>
>
>
> --
> Kathiravelu Pradeeban.
> Software Engineer.
> WSO2 Inc.
>
> Blog: [Llovizna] http://kkpradeeban.blogspot.com/
Received on Tue Aug 16 11:06:50 2011
