Re: Updated: Summary of what I have done in GSoc2011_chenxiajian

From: Chen Xiajian <chenxiajian1985_at_gmail.com>
Date: Tue Aug 16 2011 - 11:06:43 CEST

OK.

I am now focusing on the last part of my project: the user interface on
both Windows and Linux. This part is very important, and I am sorry that
I ignored it before.
The user can enable or disable the hyphenation function in the user interface (GUI).

Best Regards
chen xiajian

2011/8/16 Kathiravelu Pradeeban <kk.pradeeban@gmail.com>
>
> Hi Chen,
> Had a look. Looks fine.
>
> On Mon, Aug 15, 2011 at 9:49 PM, Chen Xiajian <chenxiajian1985@gmail.com> wrote:
> > Hi,
> > The attachment is a summary of what I have done in GSoc2011. Please
> > check it. We can discuss it in more detail tomorrow, at the same time as usual.
>
> Sure. Let's discuss further at the usual time tomorrow.
>
> Thank you.
> Regards,
> Pradeeban.
> >
> > I have built the user interface to manage hyphenation. The user can enable
> > or disable the hyphenation function in the user interface (GUI). Tomorrow I
> > will focus on the Linux GTK version.
> >
> > Best regards,
> >
> > Chen Xiajian
> >
> >
> >
> > ======================================================
> > Summary of What I have done in GSoc2011 1
> >
> > So far, my work in GSoc2011 consists of the following four parts:
> > 1. Hyphenation module in Enchant
> >        Read and fully understood the source code of Enchant
> >        Reused the abstract layer of Enchant and added a hyphenation function
> > to Enchant, so that more languages can be added easily
> >        Dealt with more languages
> >        Added five backend implementations: ispell, myspell,
> > zemberek, voikko, uspell
> >        Dealt with the spell-checking module
> > 2. Call the hyphenation function in Abiword
> >        Find the split information using enchant_dict_hyphenate
> >        Split the Text_Run so that a word running past the line width is
> > split while keeping its format
> >        Deal with the user's operations (select, delete, cut, paste)
> >        The user can select whether to enable the hyphenation function
> >
> > 3. Simple implementation of Chinese spell-checking in Enchant
> >        Add a simple spell-check framework for Chinese in Enchant
> >        Add a library to support it
> >        A survey of Chinese spell-checking
> >
> > 4. Code refactoring and debugging
> >        Refactor the code, including keeping it flexible
> >        Debug coding problems
> >
> >
> > The details:
> > 1 Hyphenation module in Enchant
> > 1.1 Add hyphenation function in Enchant
> > First, I added a hyphenation method to Enchant:
> > ================the code===========
> > I think we can combine hyphenation with spell-checking, so that the
> > code is more flexible. In my opinion, the hyphenation function should
> > be defined as follows:
> > EnchantDict* enchant_broker_request_dict (EnchantBroker* broker, const
> > char *const lang); //same as spell-checking
> > char *enchant_dict_hyphenate(EnchantDict *dict, const char *const
> > word,size_t len);
> >
> > To achieve this and implement it in the abstract layer, we need to
> > add a hyphenation function to EnchantDict, as a function pointer,
> > something like:
> > char* (*hyphenate) (struct str_enchant_dict * me,
> >                          const char *const word, size_t len,
> >                          size_t * out_n_suggs);
> >
> > The function is implemented by the backend. Take “ispell” as an example:
> > static char * ispell_dict_hyphenate (EnchantDict * me, const char *const word,
> >                    size_t len, size_t * out_n_suggs)
> > {
> >       ISpellChecker * checker;
> >       checker = (ISpellChecker *) me->user_data;
> >       return checker->hyphenate (word, len, out_n_suggs);
> > }
> >
> > Finally, we set the connection in each backend:
> >  dict->hyphenate = ispell_dict_hyphenate;
> >  dict->hyphenate = hspell_dict_hyphenate;
> >  dict->hyphenate = zemberek_dict_hyphenate;
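
A minimal, self-contained sketch of the function-pointer dispatch described above. SketchDict, ToyChecker, toy_dict_hyphenate and sketch_dict_hyphenate are hypothetical names made up for illustration; they are not Enchant's real types or functions, and the toy backend only inserts a '-' in the middle of the word.

```cpp
#include <cstddef>
#include <cstdlib>
#include <cstring>
#include <string>

// Hypothetical stand-in for EnchantDict: only the fields this sketch needs.
struct SketchDict {
    void *user_data;  // backend-specific checker object
    char *(*hyphenate)(SketchDict *me, const char *word, std::size_t len);
};

// Hypothetical backend: "hyphenates" by inserting '-' in the middle of the word.
struct ToyChecker {
    char *hyphenate(const char *word, std::size_t len) {
        std::string w(word, len);
        if (w.size() >= 4) w.insert(w.size() / 2, "-");
        char *out = static_cast<char *>(std::malloc(w.size() + 1));
        if (out) std::memcpy(out, w.c_str(), w.size() + 1);
        return out;  // the caller releases the buffer with std::free
    }
};

// Backend glue: recover the checker from user_data and forward the call.
static char *toy_dict_hyphenate(SketchDict *me, const char *word, std::size_t len) {
    ToyChecker *checker = static_cast<ToyChecker *>(me->user_data);
    return checker->hyphenate(word, len);
}

// Abstract-layer entry point: dispatch only if the backend set the pointer.
char *sketch_dict_hyphenate(SketchDict *dict, const char *word, std::size_t len) {
    if (!dict || !word || !dict->hyphenate) return nullptr;
    return dict->hyphenate(dict, word, len);
}
```

A backend that has no hyphenation support simply leaves the pointer unset, and the entry point then returns a null pointer instead of crashing.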
> >
> > 1.2 Add five backends to support hyphenation
> >  including ispell, myspell, zemberek, voikko, uspell
> >        Hunspell: uses a separate dictionary, such as hyph_en_us.dic. We
> > can download the dictionaries from the internet
> >        Libhyphenation: the dictionary is provided by the author, sometimes limited
> >        Zemberek: for Turkish
> >        Voikko: for Finnish
> >
> > The changes:
> > 1 deleted the unneeded connections, such as HSpell
> > 2 added hunspell (myspell) hyphenation code
> > 3 implemented hyphenation using hunspell
> > 4 implemented hyphenation using Zemberek
> >
> > ======1 Deleted the unneeded connections, such as HSpell===========
> > Hebrew doesn’t need any hyphenation
> > Yiddish doesn’t need any hyphenation
> > =======2 Implement hyphenation using hunspell
> > In order to use libhyphenation, we need to add these files:
> > hyphen/hnjalloc.h
> > hyphen/hnjalloc.c
> > hyphen/hyph_en_US.dic
> > hyphen/hyphen.c
> > hyphen/hyphen.gyp
> > hyphen/hyphen.h
> > hyphen/hyphen.patch
> > hyphen/hyphen.tex
> >
> > ========3 Implement hyphenation using Zemberek
> >  This just uses dbus_g_proxy_call, the same as spell-checking in Zemberek.
> > The hyphenation is as follows:
> >  char* Zemberek::hyphenate(const char* word)
> > {
> >       /* the reply is requested as G_TYPE_STRV, so it is a string array */
> >       char** syllables = NULL;
> >       GError *Error = NULL;
> >       if (!dbus_g_proxy_call (proxy, "hecele", &Error,
> >               G_TYPE_STRING,word,G_TYPE_INVALID,
> >               G_TYPE_STRV, &syllables,G_TYPE_INVALID)) {
> >                       g_error_free (Error);
> >                       return NULL;
> >       }
> >       /* assumption: join the syllables with '-' to form the result */
> >       char* result = g_strjoinv ("-", syllables);
> >       g_strfreev (syllables);
> >       return result;
> > }
> >
> > 1.3 ISpell
> > I used Libhyphenation in ISpell. The simple code looks like this:
> > static char *
> > ispell_dict_hyphenate (EnchantDict * me, const char *const word)
> > {
> >        ISpellChecker * checker;
> >
> >        checker = (ISpellChecker *) me->user_data;
> >        /* test the tag's contents; comparing the pointer with "" is wrong */
> >        if(me->tag && *me->tag)
> >          return checker->hyphenate (word,me->tag);
> >        return checker->hyphenate (word,"en_us");
> > }
> > The concrete code in ISpellChecker is:
> > char *
> > ISpellChecker::hyphenate(const char * const utf8Word, const char *const tag)
> > {  //we must choose the right language tag
> >        char* param_value = enchant_broker_get_param (m_broker,
> > "enchant.ispell.hyphenation.dictionary.path");
> >        if(languageMap[tag]!="")
> >        {
> >                string result=Hyphenator(RFC_3066::Language(languageMap[tag]),param_value).hyphenate(utf8Word).c_str();
> >
> >                //one extra byte for the terminating '\0' written by strcpy
> >                char* temp=new char[result.length()+1];
> >                strcpy(temp,result.c_str());
> >                return temp;
> >        }
> >        return NULL;
> > }
> > 1.4 MySpell
> > I used Libhyphenate in MySpell. The simple code looks like this:
> > char*
> > MySpellChecker::hyphenate (const char* const word, size_t len,char* tag)
> > {
> >        /* len is unsigned, so spell out the "unknown length" sentinel */
> >        if(len==(size_t)-1) len=strlen(word);
> >        if (len > MAXWORDLEN
> >                || !g_iconv_is_valid(m_translate_in)
> >                || !g_iconv_is_valid(m_translate_out))
> >                return 0;
> >        char* result=0;
> >        myspell->hyphenate(word,result,tag);
> >        return result;
> > }
> > The concrete code in MySpellChecker is:
> > void Hunspell::hyphenate( const char* const word, char*& result, char* tag )
> > {
> >        /* result is a reference so that the caller really receives the buffer */
> >        result = NULL;
> >        HyphenDict *dict;
> >        char ** rep = NULL;
> >        int * pos = NULL;
> >        int * cut = NULL;
> >        int len=strlen(word);
> >        /* load the hyphenation dictionary */
> >        string filePath="hyph_";
> >        filePath+=tag;
> >        filePath+=".dic";
> >        if ((dict = hnj_hyphen_load(filePath.c_str())) == NULL) {
> >                fprintf(stderr, "Couldn't find file %s\n",filePath.c_str());
> >                return;   /* report failure to the caller instead of exit(1) */
> >        }
> >        /* the hyphens buffer needs a few spare bytes beyond the word length */
> >        char *hyphens=(char *) malloc(len + 5);
> >        /* pass the full word length; there is no trailing newline to skip */
> >        if (hyphens == NULL
> >                || hnj_hyphen_hyphenate2(dict, word, len, hyphens, NULL, &rep,
> > &pos, &cut)) {
> >                free(hyphens);
> >                fprintf(stderr, "hyphenation error\n");
> >                hnj_hyphen_free(dict);
> >                return;
> >        }
> >
> >        hnj_hyphen_free(dict);
> >        result=hyphens;   /* the caller releases it with free() */
> > }
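
A short sketch of how the marker buffer filled by hnj_hyphen_hyphenate2 could be turned into a visibly hyphenated word. It assumes the libhyphen convention that an odd digit at index i means a break is allowed after character i, and it works on single-byte characters only. The helper name apply_hyphen_marks is hypothetical and is not part of Enchant or libhyphen.

```cpp
#include <cstddef>
#include <string>

// Build "ex-amp-le" style output from a word and a marker string in which
// an odd digit at index i means "a break is allowed after character i".
std::string apply_hyphen_marks(const std::string &word, const std::string &marks) {
    std::string out;
    for (std::size_t i = 0; i < word.size(); ++i) {
        out += word[i];
        const bool last = (i + 1 == word.size());
        // Never emit a hyphen after the final character of the word.
        if (!last && i < marks.size() && (marks[i] & 1)) out += '-';
    }
    return out;
}
```

Returning a string with explicit '-' characters matches what the Abiword side of this summary looks for when it searches the hyphenation result for break points.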
> >
> > 1.5 zemberek
> > The approach in Zemberek is the same as in the two above:
> > static char*
> > zemberek_dict_hyphenate (EnchantDict * me, const char *const word)
> > {
> >        Zemberek *checker;
> >        checker = (Zemberek *) me->user_data;
> >        return checker->hyphenate (word);
> > }
> > But the concrete implementation is different from the other two.
> > We use zemberek_service:
> > char* Zemberek::hyphenate(const char* word)
> > {
> >        /* the reply is requested as G_TYPE_STRV, so it is a string array */
> >        char** syllables = NULL;
> >        GError *Error = NULL;
> >        if (!dbus_g_proxy_call (proxy, "hecele", &Error,
> >                G_TYPE_STRING,word,G_TYPE_INVALID,
> >                G_TYPE_STRV, &syllables,G_TYPE_INVALID)) {
> >                        g_error_free (Error);
> >                        return NULL;
> >        }
> >
> >        /* assumption: join the syllables with '-' to form the result */
> >        char* result = g_strjoinv ("-", syllables);
> >        g_strfreev (syllables);
> >        return result;
> > }
> > 1.6 voikko
> > The hyphenation implementation in Voikko is easy since Voikko has
> > a hyphenation API.
> > static char **
> > voikko_dict_suggest (EnchantDict * me, const char *const word,
> >                     size_t len, size_t * out_n_suggs)
> > {
> >        char **sugg_arr;
> >        int voikko_handle;
> >
> >        voikko_handle = (long) me->user_data;
> >        sugg_arr = voikko_suggest_cstr(voikko_handle, word);
> >        if (sugg_arr == NULL)
> >                return NULL;
> >        for (*out_n_suggs = 0; sugg_arr[*out_n_suggs] != NULL; (*out_n_suggs)++);
> >        return sugg_arr;
> > }
> >
> > 1.7 Deployment of Enchant in Abiword
> > I just copy the build results of Enchant to the right places in Abiword:
> > enchant\bin\Debug\libenchant_myspell.dll
> > ---->abiword\msvc2008\Debug\lib\enchant\libenchant_myspell.dll
> > enchant\bin\Debug\libenchant_ispell.dll
> > ---->abiword\msvc2008\Debug\lib\enchant\libenchant_ispell.dll
> > enchant\bin\Debug\libenchant.dll
> > ---->abiword\msvc2008\Debug\bin\libenchant.dll
> >
> > 1.8 Test on Linux
> > I have tested the Enchant module on RedHat. It works fine for me.
> >
> > 2 Call the hyphenation function in Abiword
> >        Split the run to split the word and keep the format
> >        Find the split information
> >        Deal with the user's operations (select, delete, cut, paste)
> >
> > Main goal: call the hyphenation module of Enchant to display the
> > hyphenation result in Abiword. After a user operation (adding a new
> > word, deleting a word, copying a word, cutting a word), refresh the
> > hyphenation result accordingly.
> >
> > The main code is added to the format function in LineBreaker.h(cpp):
> > // find the split point
> > while (pRunToBump && pLine->getNumRunsInLine() && (pLine->getLastRun()
> > != m_pLastRunToKeep))
> >                {
> >                        UT_ASSERT(pRunToBump->getLine() == pLine);
> >                        if(!pLine->removeRun(pRunToBump))
> >                        {
> >                                pRunToBump->setLine(NULL);
> >                        }
> >                        UT_ASSERT(pLine->getLastRun()->getType() != FPRUN_ENDOFPARAGRAPH);
> >                        if(pLine->getLastRun()->getType() == FPRUN_ENDOFPARAGRAPH)
> >                        {
> >                                fp_Run * pNuke = pLine->getLastRun();
> >                                pLine->removeRun(pNuke);
> >                        }
> >                pRunToBump->printText();  //trace out debug message & run two time
> >                pNextLine->insertRun(pRunToBump);  //called when create new line
> >                        // to get the split word
> >                        if (!(pRunToBump->getPrevRun() && pLine->getNumRunsInLine() &&
> > (pLine->getLastRun() != m_pLastRunToKeep)))
> >                        {
> >                                pRunToSplit=pRunToBump;
> >                                PD_StruxIterator text(pRunToBump->getBlock()->getStruxDocHandle(),
> >                                        pRunToBump->getBlockOffset() + fl_BLOCK_STRUX_OFFSET);
> >
> >                                text.setUpperLimit(text.getPosition() + pRunToBump->getLength() - 1);
> >                                UT_ASSERT_HARMLESS( text.getStatus() == UTIter_OK );
> >                                UT_UTF8String sTmp;
> >                                while(text.getStatus() == UTIter_OK)
> >                                {
> >                                        UT_UCS4Char c = text.getChar();
> >                                        UT_DEBUGMSG(("| %d |",c));
> >                                        if(c >= ' ' && c <128)
> >                                                sTmp +=  static_cast<char>(c);
> >                                        ++text;
> >                                }
> >                                UT_DEBUGMSG(("The Split Text |%s| \n",sTmp.utf8_str()));
> >                                if(sTmp.utf8_str()!=0)
> >                                {
> >                    pWordToSplit=sTmp;
> >                                        UT_DEBUGMSG(("wordToSplit |%s| \n",pWordToSplit.utf8_str()));
> >                                }
> >                        }
> >                        pRunToBump = pRunToBump->getPrevRun();
> >                        UT_DEBUGMSG(("Next runToBump %x \n",pRunToBump));
> >                }
> >        }
> >        //modify src/text/fmt/xp/fb_LineBreaker.cpp to place hyphenation points
> >        //split the word
> >        if(pWordToSplit.length()!=0)
> >        {
> >        pWordHyphenationResult=pBlock->_hyphenateWord(pWordToSplit.ucs4_str().ucs4_str(),0,0);
> >                int tickLeft=pLine->getAvailableWidth();
> >                if (pWordHyphenationResult && *pWordHyphenationResult){
> >                        gchar *c = g_ucs4_to_utf8(pWordHyphenationResult, -1, NULL, NULL, NULL);
> >                        //pass -1 so that the whole nul-terminated string is counted
> >                        for(int index=g_utf8_strlen(c,-1);index>=0;--index)
> >                        {
> >                                if(pWordHyphenationResult[index]=='-'&&index<tickLeft)
> >                                {
> >                                        pBreakPoint=index;
> >                                        fp_TextRun* textout=static_cast<fp_TextRun*>(pRunToSplit);
> >                                        textout->split(pBreakPoint);
> >                                        break;  //split once, at the last hyphen that fits
> >                                }
> >                        }
> >                        g_free(c);
> >                }
> >        }
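
A self-contained sketch of the break-point choice made in the loop above: pick the rightmost hyphenation point whose prefix, plus the visible hyphen, still fits in the space left on the line. The function choose_break and the width_of callback are hypothetical names; in Abiword the width would come from the run's real text measurement, not from a character count.

```cpp
#include <cstddef>
#include <functional>
#include <string>

// Return how many characters of the original word stay on the current line,
// or 0 if no hyphenation point fits.
// `hyphenated` is the word with '-' at each allowed break, e.g. "hy-phen-ate".
// `width_of` is a hypothetical callback that measures a piece of text.
std::size_t choose_break(const std::string &hyphenated, int available,
                         const std::function<int(const std::string &)> &width_of) {
    std::size_t best = 0;
    std::string prefix;  // original characters seen so far, without hyphens
    for (char ch : hyphenated) {
        if (ch != '-') {
            prefix += ch;
            continue;
        }
        // The visible hyphen has to fit on the line as well.
        if (width_of(prefix + "-") > available) break;  // assumes widths only grow
        best = prefix.size();
    }
    return best;
}
```

Measuring the prefix instead of comparing a character index with a pixel width keeps the two units apart, which matters as soon as the font is not fixed-width.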
> >
> >
> > 3 Simple Implementation of Chinese Spell-Check in Enchant
> > After GSoc2011, I would like to add Chinese spell-checking to Enchant.
> > Chinese spell-checking is also a very important issue for a word processor.
> > I found some libraries to support it; I have only built a simple
> > framework since time is limited.
> > The main function:
> >
> >
> > 4 Code refactoring and debugging
> > 5. Still to improve
> >        Code refactoring
> >        Deal with more languages
> >        Include more user operations (for example, operating on a picture may
> > influence the hyphenation result)
> >
> > More:
> >        Fully support hyphenation in Abiword
> >        Support more languages
> >        More tests on Linux (Unix)
> >        Finish the implementation of Chinese spell-checking in Enchant
> >        User interface for hyphenation
> >
>
>
>
> --
> Kathiravelu Pradeeban.
> Software Engineer.
> WSO2 Inc.
>
> Blog: [Llovizna] http://kkpradeeban.blogspot.com/
Received on Tue Aug 16 11:06:50 2011
