Re: TSearch2: Problems with compound words and stop words - Mailing list pgsql-general
From | Timo Haberkern |
---|---|
Subject | Re: TSearch2: Problems with compound words and stop words |
Date | |
Msg-id | 419B5B76.5080300@emedia-office.de |
In response to | Re: TSearch2: Problems with compound words and stop words (Oleg Bartunov <oleg@sai.msu.su>) |
Responses | Re: TSearch2: Problems with compound words and stop words |
List | pgsql-general |
Sorry for the late answer, I was on holiday. See my remarks below.

Oleg Bartunov wrote:

> On Fri, 5 Nov 2004, Timo Haberkern wrote:
>
>> Oleg,
>>
>> I use TSearch2 with PostgreSQL 7.4.6 and I applied the compound word
>> patch yesterday. The configuration changed a little bit but the
>> result is the same. I get no compound words. I'm using the locale
>> de_DE with encoding ISO8859-1 for the database.
>>
>> I think ispell is working correctly except for the compound words. If I try
>>
>> SELECT lexize('de_ispell', 'springt')
>>
>> I get
>>
>> lexize
>> {springen,springen}
>>
>> which seems correct.
>>
>> But a SELECT lexize('de_ispell', 'Autobahn')
>>
>> results in
>>
>> lexize
>> {autobahn}
>>
>> I would expect {auto,bahn,autobahn}
>
> Hmm, have you checked 'Autobahn' in the ispell dictionary? Does the
> dictionary you used support the 'Z' flag for compound words?

Autobahn is in the ispell dictionary. What does an ispell dictionary need in order to support the Z flag?

Timo

>> The new configuration after the compound word patch:
>
> Seems you overestimate my capabilities :)
>
>> pg_ts_dict:
>>
>> dict_name | dict_init | dict_initoption | dict_lexize | dict_comment
>> simple | dex_init(text) | NULL | dex_lexize(internal,internal,integer) | Simple example of dictionary.
>> en_stem | snb_en_init(text) | /usr/local/pgsql/share/contrib/english.stop | snb_lexize(internal,internal,integer) | English Stemmer. Snowball.
>> ru_stem | snb_ru_init(text) | /usr/local/pgsql/share/contrib/russian.stop | snb_lexize(internal,internal,integer) | Russian Stemmer. Snowball.
>> ispell_template | spell_init(text) | NULL | spell_lexize(internal,internal,integer) | ISpell interface. Must have .dict and .aff files
>> synonym | syn_init(text) | NULL | syn_lexize(internal,internal,integer) | Example of synonym dictionary
>> de_ispell | spell_init(text) |
>>     DictFile="/usr/local/pgsql/share/contrib/dictonary/german_comb.dict",
>>     AffFile="/usr/local/pgsql/share/contrib/dictonary/german_comb.aff",
>>     StopFile="/usr/local/pgsql/share/contrib/dictonary/german.stop"
>>     | spell_lexize(internal,internal,integer) | NULL
>>
>> Timo
>>
>> Oleg Bartunov wrote:
>>
>>> Timo,
>>>
>>> please check that you applied the patch for compound word support.
>>> What is the version of PostgreSQL?
>>> Does the ispell dictionary work for non-compound words?
>>>
>>> Oleg
>>>
>>> On Fri, 5 Nov 2004, Timo Haberkern wrote:
>>>
>>>> Hi there,
>>>>
>>>> I have some trouble with my TSearch2 installation. I have done this
>>>> installation as described in
>>>> http://www.sai.msu.su/~megera/oddmuse/index.cgi/Tsearch_V2_compound_words
>>>>
>>>> I used the German myspell dictionary from
>>>> http://lingucomponent.openoffice.org/spell_dic.html and converted
>>>> it with my2ispell.
>>>>
>>>> Nearly everything is working fine so far, except for two problems:
>>>>
>>>> 1.) The stop word file seems to be ignored: if I try it with
>>>> SELECT to_tsvector('default_german', 'ein Haus')
>>>> I get 'ein':1 'haus':2
>>>>
>>>> "ein" should be a stop word for German (and is defined in the
>>>> german.stop file as well).
>>>>
>>>> 2.) The compound words feature doesn't work either. I have tried a lot
>>>> of words, e.g. 'Fehlermeldung'. With
>>>> SELECT to_tsvector('default_german', 'Fehlermeldung')
>>>> I only get 'fehlermeldung':1, but I would expect 'fehler' and
>>>> 'meldung' as separate entries.
>>>>
>>>> Is there anything wrong with the dictionary or my configuration?
>>>>
>>>> My current configuration:
>>>>
>>>> pg_ts_cfg:
>>>>
>>>> default         | default | C
>>>> default_russian | default | ru_RU.KOI8-R
>>>> simple          | default | NULL
>>>> default_german  | default | de_DE.ISO8859-1
>>>>
>>>> pg_ts_cfgmap:
>>>>
>>>> default_german | host         | {simple}
>>>> default_german | hword        | {simple}
>>>> default_german | int          | {simple}
>>>> default_german | nlhword      | {simple}
>>>> default_german | nlpart_hword | {simple}
>>>> default_german | nlword       | {simple}
>>>> default_german | part_hword   | {simple}
>>>> default_german | sfloat       | {simple}
>>>> default_german | uint         | {simple}
>>>> default_german | uri          | {simple}
>>>> default_german | url          | {simple}
>>>> default_german | version      | {simple}
>>>> default_german | word         | {simple}
>>>> default_german | lpart_hword  | {de_ispell,german_snowball}
>>>> default_german | lword        | {de_ispell,german_snowball}
>>>> default_german | lhword       | {de_ispell,german_snowball}
>>>>
>>>> pg_ts_dict:
>>>>
>>>> de_ispell | 17166 |
>>>>     DictFile="/usr/local/pgsql/share/contrib/dictonary/german.dict",
>>>>     AffFile="/usr/local/pgsql/share/contrib/dictonary/german.aff",
>>>>     StopFile="/usr/local/pgsql/share/contrib/dictonary/german.stop"
>>>>     | 17167 | NULL
>>>> german_snowball | 17357 | NULL | 17162 | Snowball stemmer for german
>>>>
>>>> Can anyone help me?
>>>>
>>>> regards
>>>>
>>>> Timo
>>>>
>>>> ---------------------------(end of broadcast)---------------------------
>>>> TIP 4: Don't 'kill -9' the postmaster
>>>
>>> Regards,
>>> Oleg
>>> _____________________________________________________________
>>> Oleg Bartunov, sci.researcher, hostmaster of AstroNet,
>>> Sternberg Astronomical Institute, Moscow University (Russia)
>>> Internet: oleg@sai.msu.su, http://www.sai.msu.su/~megera/
>>> phone: +007(095)939-16-83, +007(095)939-23-83
>>>
>>> ---------------------------(end of broadcast)---------------------------
>>> TIP 2: you can get off all lists at once with the unregister command
>>> (send "unregister YourEmailAddressHere" to majordomo@postgresql.org)
>
> Regards,
> Oleg
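
The two symptoms in this thread (stop words kept, compound words not split) can be narrowed down by querying the dictionary directly instead of going through to_tsvector. What follows is only a minimal sketch, not a confirmed fix. It assumes the contrib TSearch2 schema with the dictionary and configuration names quoted above (de_ispell, default_german), and it assumes the pg_ts_cfgmap columns are named ts_name, tok_alias and dict_name, which the thread itself does not show.

```sql
-- Which dictionaries handle ordinary Latin words in this configuration?
-- (column names ts_name / tok_alias / dict_name are an assumption)
SELECT tok_alias, dict_name
FROM pg_ts_cfgmap
WHERE ts_name = 'default_german'
  AND tok_alias IN ('lword', 'lhword', 'lpart_hword');

-- Ask the ispell dictionary about the suspected stop word on its own,
-- bypassing the parser and the configuration mapping.
SELECT lexize('de_ispell', 'ein');

-- Ask it about a compound word from the original report.
SELECT lexize('de_ispell', 'Fehlermeldung');

-- Confirm which .dict, .aff and stop files the dictionary was set up with.
SELECT dict_name, dict_initoption
FROM pg_ts_dict
WHERE dict_name = 'de_ispell';
```

If lexize already returns the stop word as a lexeme, the problem probably lies in the dictionary setup (for example the StopFile path) rather than in pg_ts_cfgmap. The same reasoning applies to the compound word case and the affix file named in AffFile.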