Re: Integrating OpenMedSpel word list with Abiword's spell checker

From: Vidh <vidhu2366_at_gmail.com>
Date: Mon Apr 29 2013 - 07:31:32 CEST

Hi Martin & Dom,

Are you guys available sometime on IRC to discuss this?
thanks!

On Tue, Apr 9, 2013 at 8:10 PM, Dominic Lachowicz
<domlachowicz@gmail.com> wrote:
> Hi,
>
> My thought was to create a new enchant method called "Dict
> enchant_dict_compose(Dict 1, Dict 2, ...)". What it would do is
> combine the words in a medical dictionary with those in an (e.g.)
> English or Dutch dictionary. We'd need to create an internal
> CompositeDict object which would perform some boolean operation on the
> words, e.g. OR or AND. You could use this as a building block to
> support mixed-language texts (e.g. English + Spanish). "Medical" here
> is really just a domain-specific language.
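
A rough sketch of what the OR and AND operations would mean at the word-list level, assuming plain one-word-per-line files stand in for the dictionaries. It does not use enchant: enchant_dict_compose and CompositeDict are only proposed in this thread, so nothing below reflects an existing API. The file names, sample words and the in_list helper are hypothetical.

```shell
#!/bin/sh
# Illustration only: plain word lists stand in for dictionaries.
# Nothing here is enchant API; all names are hypothetical.
set -eu
LC_ALL=C; export LC_ALL            # same collation for sort and comm
work=$(mktemp -d)
trap 'rm -rf "$work"' EXIT         # clean up the scratch directory on exit

# Two tiny sample "dictionaries", one word per line.
printf '%s\n' aspirin fever patient stent > "$work/medical.txt"
printf '%s\n' fever house patient tree > "$work/english.txt"

# comm needs sorted input, so sort (and de-duplicate) each list first.
sort -u "$work/medical.txt" > "$work/a.sorted"
sort -u "$work/english.txt" > "$work/b.sorted"

# OR: a word is accepted if either list contains it.
sort -u "$work/a.sorted" "$work/b.sorted" > "$work/or.txt"

# AND: a word is accepted only if both lists contain it.
comm -12 "$work/a.sorted" "$work/b.sorted" > "$work/and.txt"

# Look up a word in a composed list (exact, whole-line match).
in_list() { grep -qxF -- "$1" "$2"; }

in_list stent "$work/or.txt" && echo "stent: accepted by OR"
in_list stent "$work/and.txt" || echo "stent: rejected by AND"
```

For the OpenMedSpel case, OR is the relevant operation: a word should pass if either the medical list or the language dictionary contains it.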
>
> Thanks,
> Dom
>
> On Tue, Apr 9, 2013 at 9:59 AM, Vidh <vidhu2366@gmail.com> wrote:
>> Martin & Dom,
>>
>> Any thoughts about this?
>> I would like to take this to completion as a feature for abiword.
>>
>> thanks!
>>
>> On Tue, Mar 26, 2013 at 11:53 PM, Vidh <vidhu2366@gmail.com> wrote:
>>> +screenshot
>>>
>>> On Tue, Mar 26, 2013 at 11:52 PM, Vidh <vidhu2366@gmail.com> wrote:
>>>> Martin,
>>>>
>>>> Thanks!
>>>>
>>>> Both of the features you listed work.
>>>> Since Aspell is already built for spell checking and suggestions, once the
>>>> word list is integrated, these are taken care of automatically.
>>>>
>>>> Please check the screen shot attached showing the functional state.
>>>>
>>>> I want to focus on taking this work to closure and possibly extending
>>>> it as required.
>>>> I would love to know what Dom thinks about this and hear his suggestions.
>>>>
>>>> thanks & regards,
>>>>
>>>> On Tue, Mar 26, 2013 at 6:10 PM, Martin Sevior <msevior@gmail.com> wrote:
>>>>> Hi Vidh,
>>>>>
>>>>> Dom is more of an expert on OpenMedSpel and spell checking than I am.
>>>>> Would you like to comment, Dom?
>>>>>
>>>>> At first glance this looks really impressive. I guess the next step
>>>>> would be to type in some words that almost match those in OpenMedSpel
>>>>> and see if they are
>>>>> 1. underlined, and
>>>>> 2. offered the correct OpenMedSpel word as a suggestion in the
>>>>> drop-down list.
>>>>>
>>>>> Congrats on making such fast progress!
>>>>>
>>>>> Cheers
>>>>>
>>>>> Martin
>>>>>
>>>>> On Mon, Mar 25, 2013 at 8:11 PM, Vidh <vidhu2366@gmail.com> wrote:
>>>>>> Hi Martin,
>>>>>> Any thoughts on this?
>>>>>>
>>>>>> On Sat, Mar 23, 2013 at 12:16 AM, Vidh <vidhu2366@gmail.com> wrote:
>>>>>>> Integrating OpenMedSpel word list with Abiword's spell checker
>>>>>>> ====================================================
>>>>>>>
>>>>>>> I tried out various things in Abiword in order to get familiar with
>>>>>>> the source code.
>>>>>>> I got things moving on this GSoC idea:
>>>>>>> http://www.abisource.com/wiki/Google_Summer_of_Code_2013#OpenMedSpel_plugin_for_AbiWord
>>>>>>>
>>>>>>> I am writing this mail to report what I have done related to this idea
>>>>>>> and to present some results.
>>>>>>>
>>>>>>> The goal is to make Abiword spell check against OpenMedSpel and the
>>>>>>> default language dictionary simultaneously.
>>>>>>> I have achieved this result by making a few changes to the "enchant"
>>>>>>> and "aspell" configuration files.
>>>>>>>
>>>>>>> Here are the steps I followed:
>>>>>>> ========================
>>>>>>> 1) Downloaded the OpenMedSpel word list from:
>>>>>>> http://www.e-medtools.com/openmedspel100.zip
>>>>>>>
>>>>>>> This had the following files:
>>>>>>>
>>>>>>> vidhoon@vidhoonv:/usr/lib/aspell$ ls -l ~/Downloads/openmedspel100
>>>>>>> total 2968
>>>>>>> -rw-rw-r-- 1 vidhoon vidhoon 607041 Feb 14 2007 OpenMedSpel 100.csv
>>>>>>> -rw-rw-r-- 1 vidhoon vidhoon 558312 Mar 14 01:38 OpenMedSpel 100.txt
>>>>>>> -rw-rw-r-- 1 vidhoon vidhoon 1169 Feb 14 2007 README_OpenMedSpel.txt
>>>>>>>
>>>>>>> 2) The txt file in the download had DOS (CRLF) line endings in it.
>>>>>>> Hence, I had to do this:
>>>>>>>
>>>>>>> $dos2unix OpenMedSpel\ 100.txt OpenMedSpelunix.txt
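
A minimal sketch of the same line-ending cleanup using only tr, assuming the "DOS characters" are CRLF line endings. The sample file and its contents are made up for illustration. As far as I know, the commonly packaged dos2unix converts each named file in place unless -n is given, so a separate output file would need "dos2unix -n infile outfile"; tr avoids that question by always writing a new file.

```shell
#!/bin/sh
# Illustration on a made-up sample file, not the real OpenMedSpel list.
set -eu
work=$(mktemp -d)
trap 'rm -rf "$work"' EXIT         # clean up the scratch directory on exit

# Sample word list with DOS (CRLF) line endings.
printf 'aspirin\r\nstent\r\n' > "$work/dos.txt"

# A literal carriage return, used as the grep pattern below.
cr=$(printf '\r')

# Count lines that contain a carriage return before converting.
before=$(grep -c "$cr" "$work/dos.txt" || true)

# Delete every carriage return and write the result to a new file.
tr -d '\r' < "$work/dos.txt" > "$work/unix.txt"

# Count again; grep exits non-zero on no match, hence "|| true".
after=$(grep -c "$cr" "$work/unix.txt" || true)

echo "lines with CR before: $before, after: $after"
```

The converted file, not the original, is then the one to feed to the word list build step.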
>>>>>>>
>>>>>>> 3) Now I created a wordlist for aspell using the command below:
>>>>>>>
>>>>>>> $aspell --lang=en create master ./openmedspel.rws <
>>>>>>> ~/Downloads/openmedspel100/OpenMedSpel\ 100.txt
>>>>>>>
>>>>>>> (Why did I choose Aspell? I could easily find the relevant
>>>>>>> documentation to solve this problem.)
>>>>>>> This link was really helpful:
>>>>>>> http://aspell.net/0.50-doc/man-html/5_Working.html#SECTION00640000000000000000
>>>>>>>
>>>>>>> 4) After this, I located Aspell on my local system and found it at:
>>>>>>> /usr/lib/aspell
>>>>>>>
>>>>>>> In this location I could find all the "multi" dictionary files and
>>>>>>> the "rws" word lists they include.
>>>>>>> I copied the openmedspel.rws list to this location.
>>>>>>>
>>>>>>> 5) I took en_US.multi (since the OpenMedSpel word list is also US
>>>>>>> English) and found that it includes "en_US-wo_accents.multi".
>>>>>>> Then I opened en_US-wo_accents.multi and added the newly created
>>>>>>> word list as shown below:
>>>>>>>
>>>>>>> vidhoon@vidhoonv:/usr/lib/aspell$ cat en_US-wo_accents.multi
>>>>>>> # Generated with Aspell Dicts "proc" script version 0.60.2
>>>>>>> add en-common.rws
>>>>>>> add en_US-wo_accents-only.rws
>>>>>>> add openmedspel.rws
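
The manual edit above could be scripted along these lines. This is only a sketch that works on a scratch copy built from the listing, not on the real file under /usr/lib/aspell; the add_wordlist helper is hypothetical and simply appends an "add" line if it is not already present.

```shell
#!/bin/sh
# Works on a scratch copy; the real file would need root access.
set -eu
work=$(mktemp -d)
trap 'rm -rf "$work"' EXIT         # clean up the scratch directory on exit
multi="$work/en_US-wo_accents.multi"

# Recreate the stock entries shown in the listing.
printf '%s\n' \
  '# Generated with Aspell Dicts "proc" script version 0.60.2' \
  'add en-common.rws' \
  'add en_US-wo_accents-only.rws' > "$multi"

# Hypothetical helper: append "add <list>" unless that exact line exists.
# $1 = .multi file, $2 = name of the .rws word list
add_wordlist() {
  grep -qxF -- "add $2" "$1" || printf 'add %s\n' "$2" >> "$1"
}

add_wordlist "$multi" openmedspel.rws
add_wordlist "$multi" openmedspel.rws   # second call changes nothing

cat "$multi"
```

Keeping the step idempotent matters if it is ever run from an installer or plugin, since a repeated run should not duplicate the entry.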
>>>>>>>
>>>>>>> 6) I understood that Abiword uses Aspell through "enchant", and
>>>>>>> located enchant on my local system:
>>>>>>> /usr/share/enchant
>>>>>>>
>>>>>>> 7) I found the "private" enchant.ordering file that determines how
>>>>>>> the spell checker and dictionaries are chosen for each language.
>>>>>>> I learned more about this file from this link:
>>>>>>> https://listman.redhat.com/archives/xdg-list/2003-July/msg00188.html
>>>>>>>
>>>>>>> For the time being, I hacked this file to place "aspell" ahead of
>>>>>>> "myspell" in the ordering so that aspell gets picked for en_US, and
>>>>>>> its dictionary now includes the OpenMedSpel word list.
>>>>>>>
>>>>>>> vidhoon@vidhoonv:/usr/lib/aspell$ cat /usr/share/enchant/enchant.ordering
>>>>>>> *:aspell,myspell,ispell //-> order changed in this line
>>>>>>> fi:voikko,ispell,myspell,aspell
>>>>>>> fi_FI:voikko,ispell,myspell,aspell
>>>>>>> he:hspell,myspell
>>>>>>> he_IL:hspell,myspell
>>>>>>> yi:uspell
>>>>>>> tr:zemberek
>>>>>>> tr_TR:zemberek
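
A sketch of the ordering change as a scripted edit, run against a scratch copy instead of /usr/share/enchant/enchant.ordering. The original wildcard line is not shown in this mail, so the starting value below is an assumption. The trailing "//-> order changed in this line" in the listing is presumably a note for this mail and not file content, so it is left out here.

```shell
#!/bin/sh
# Works on a scratch copy with a shortened sample of the file.
set -eu
work=$(mktemp -d)
trap 'rm -rf "$work"' EXIT         # clean up the scratch directory on exit
ordering="$work/enchant.ordering"

# ASSUMPTION: the original "*:" line is a guess; the other lines are
# taken from the listing.
printf '%s\n' \
  '*:myspell,aspell,ispell' \
  'fi:voikko,ispell,myspell,aspell' \
  'he:hspell,myspell' > "$ordering"

# Rewrite only the wildcard line so that aspell is tried first.
sed 's/^\*:.*$/*:aspell,myspell,ispell/' "$ordering" > "$ordering.new"
mv "$ordering.new" "$ordering"

cat "$ordering"
```

Only the "*" line is touched, so the language-specific entries (fi, he and so on) keep their own provider order.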
>>>>>>>
>>>>>>> Now I can see that the Abiword spell checker does not underline words
>>>>>>> from the OpenMedSpel list, which indicates that the goal is achieved.
>>>>>>> That is, Abiword spell checks both normal US English words and
>>>>>>> OpenMedSpel words.
>>>>>>>
>>>>>>> I have attached screenshot of abiword illustrating this.
>>>>>>>
>>>>>>> So coming to the next steps,
>>>>>>> =======================
>>>>>>> 1) I would like to know how these steps, which involve only enchant and
>>>>>>> aspell (Abiword need not even be compiled), can be transformed into an
>>>>>>> Abiword plugin.
>>>>>>> 2) Have I achieved the required result in the right way?
>>>>>>>
>>>>>>> Or is a UI-based option in Abiword needed, so that users can add their
>>>>>>> own word lists to a language dictionary?
>>>>>>> I think this might fit the "plugin" definition!
>>>>>>>
>>>>>>> Any help, suggestions and thoughts appreciated.
>>>>>>> thanks & regards!
>>>>>>>
>>>>>>>
>>>>>>> --
>>>>>>> Vidhoon Viswanathan
>>>>>>> +91 7760711773
>
>
>
> --
> "I like to pay taxes. With them, I buy civilization." -- Oliver Wendell Holmes

-- 
Vidhoon Viswanathan
+91 7760711773
Received on Mon Apr 29 07:31:43 2013
