Re: TSearch2: Problems with compound words and stop words - Mailing list pgsql-general
From | Oleg Bartunov |
---|---|
Subject | Re: TSearch2: Problems with compound words and stop words |
Date | |
Msg-id | Pine.GSO.4.61.0411051415250.29410@ra.sai.msu.su |
In response to | Re: TSearch2: Problems with compound words and stop words (Timo Haberkern <thaberkern@emedia-office.de>) |
Responses | Re: TSearch2: Problems with compound words and stop words |
List | pgsql-general |
On Fri, 5 Nov 2004, Timo Haberkern wrote:

> Oleg,
>
> I use TSearch2 with PostgreSQL 7.4.6 and I applied the compound word patch
> yesterday. The configuration changed a little bit, but the result is the same:
> I get no compound words. I'm using the locale de_DE with encoding ISO8859-1
> for the database.
>
> I think ispell is working correctly except for the compound words. If I try
>
> SELECT lexize('de_ispell', 'springt')
>
> I get
>
> lexize
> {springen,springen}
>
> which seems correct.
>
> But a SELECT lexize('de_ispell', 'Autobahn')
>
> results in
>
> lexize
> {autobahn}
>
> I would expect {auto,bahn,autobahn}

Hmm, have you checked 'Autobahn' in the ispell dictionary? Does the dictionary
you used support the 'Z' flag for compound words?

> The new configuration after the compound word patch:
>

Seems you overestimate my capabilities :)

> dict_name | dict_init | dict_initoption | dict_lexize | dict_comment
>
> simple | dex_init(text) | NULL | dex_lexize(internal,internal,integer) |
> Simple example of dictionary.
>
> en_stem | snb_en_init(text) | /usr/local/pgsql/share/contrib/english.stop |
> snb_lexize(internal,internal,integer) | English Stemmer. Snowball.
>
> ru_stem | snb_ru_init(text) | /usr/local/pgsql/share/contrib/russian.stop |
> snb_lexize(internal,internal,integer) | Russian Stemmer. Snowball.
>
> ispell_template | spell_init(text) | NULL |
> spell_lexize(internal,internal,integer) | ISpell interface. Must have
> .dict and .aff files
>
> synonym | syn_init(text) | NULL | syn_lexize(internal,internal,integer) |
> Example of synonym dictionary
>
> de_ispell | spell_init(text) |
> DictFile="/usr/local/pgsql/share/contrib/dictonary/german_comb.dict",
> AffFile="/usr/local/pgsql/share/contrib/dictonary/german_comb.aff",
> StopFile="/usr/local/pgsql/share/contrib/dictonary/german.stop" |
> spell_lexize(internal,internal,integer) | NULL
>
> Timo
>
> Oleg Bartunov wrote:
>
>> Timo,
>>
>> please check that you applied the patch for compound word support.
>> What is the version of PostgreSQL?
>> Does the ispell dict work for non-compound words?
>>
>> Oleg
>>
>> On Fri, 5 Nov 2004, Timo Haberkern wrote:
>>
>>> Hi there,
>>>
>>> I have some trouble with my TSearch2 installation. I have done this
>>> installation as described in
>>> http://www.sai.msu.su/~megera/oddmuse/index.cgi/Tsearch_V2_compound_words
>>>
>>> I used the German myspell dictionary from
>>> http://lingucomponent.openoffice.org/spell_dic.html and converted it with
>>> my2ispell
>>>
>>> Nearly everything is working fine so far, except for two problems:
>>>
>>> 1.) The stopword file seems to be ignored: if I try it with SELECT
>>> to_tsvector('default_german', 'ein Haus') I get 'ein':1 'haus':2
>>>
>>> "ein" should be a stopword for German (and is defined in the german.stop
>>> file as well)
>>>
>>> 2.) The compound words feature doesn't work either. I have tried a lot of
>>> words, e.g. 'Fehlermeldung': with SELECT to_tsvector('default_german',
>>> 'Fehlermeldung')
>>> I only get
>>> 'fehlermeldung':1 but I would expect 'fehler' and 'meldung' as separate
>>> entries. Is there anything wrong with the dictionary or my configuration?
>>>
>>>
>>> My current configuration:
>>>
>>> pg_ts_cfg:
>>>
>>> default          default  C
>>> default_russian  default  ru_RU.KOI8-R
>>> simple           default  NULL
>>> default_german   default  de_DE.ISO8859-1
>>>
>>> pg_ts_cfgmap:
>>>
>>> default_german  host          {simple}
>>> default_german  hword         {simple}
>>> default_german  int           {simple}
>>> default_german  nlhword       {simple}
>>> default_german  nlpart_hword  {simple}
>>> default_german  nlword        {simple}
>>> default_german  part_hword    {simple}
>>> default_german  sfloat        {simple}
>>> default_german  uint          {simple}
>>> default_german  uri           {simple}
>>> default_german  url           {simple}
>>> default_german  version       {simple}
>>> default_german  word          {simple}
>>> default_german  lpart_hword   {de_ispell,german_snowball}
>>> default_german  lword         {de_ispell,german_snowball}
>>> default_german  lhword        {de_ispell,german_snowball}
>>>
>>>
>>> pg_ts_dict:
>>>
>>> de_ispell | 17166 |
>>> DictFile="/usr/local/pgsql/share/contrib/dictonary/german.dict",
>>> AffFile="/usr/local/pgsql/share/contrib/dictonary/german.aff",
>>> StopFile="/usr/local/pgsql/share/contrib/dictonary/german.stop" | 17167
>>> | NULL
>>> german_snowball | 17357 | NULL | 17162 | Snowball stemmer for german
>>>
>>>
>>> Can anyone help me?
>>>
>>> regards
>>>
>>> Timo
>>>
>>>
>>> ---------------------------(end of broadcast)---------------------------
>>> TIP 4: Don't 'kill -9' the postmaster
>>>
>>
>> Regards,
>> Oleg
>> _____________________________________________________________
>> Oleg Bartunov, sci.researcher, hostmaster of AstroNet,
>> Sternberg Astronomical Institute, Moscow University (Russia)
>> Internet: oleg@sai.msu.su, http://www.sai.msu.su/~megera/
>> phone: +007(095)939-16-83, +007(095)939-23-83
>>
>> ---------------------------(end of broadcast)---------------------------
>> TIP 2: you can get off all lists at once with the unregister command
>> (send "unregister YourEmailAddressHere" to majordomo@postgresql.org)
>>
>

Regards,
Oleg
_____________________________________________________________
Oleg Bartunov, sci.researcher, hostmaster of AstroNet,
Sternberg Astronomical Institute, Moscow University (Russia)
Internet: oleg@sai.msu.su, http://www.sai.msu.su/~megera/
phone: +007(095)939-16-83, +007(095)939-23-83
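
For anyone following the thread, the two dictionary questions asked in this message (is the word in the ispell dictionary, and does it carry the compound flag?) can be checked outside the database. The following is only a rough sketch, not part of tsearch2. It assumes that the .aff file declares the flag with a line of the form "compoundwords controlled <flag>" and that .dict entries look like "word/FLAGS". The function names compound_flag, word_flags and can_compound are made up for this illustration.

```python
import re


def compound_flag(aff_text):
    """Return the flag from a 'compoundwords controlled <flag>' line, or None.

    Assumption: the affix file declares the compound flag on such a line.
    """
    for line in aff_text.splitlines():
        m = re.match(r"\s*compoundwords\s+controlled\s+(\S)", line, re.IGNORECASE)
        if m:
            return m.group(1)
    return None


def word_flags(dict_text, word):
    """Return the flag string of the first matching 'word/FLAGS' entry.

    Returns '' for an entry without flags and None if the word is absent.
    The word comparison ignores case; the flags are returned unchanged.
    """
    target = word.lower()
    for line in dict_text.splitlines():
        entry, _, flags = line.strip().partition("/")
        if entry.lower() == target:
            return flags
    return None


def can_compound(aff_text, dict_text, word):
    """True only if a compound flag is declared and the word carries it."""
    flag = compound_flag(aff_text)
    flags = word_flags(dict_text, word)
    return flag is not None and flags is not None and flag in flags
```

To try it on real files, read german_comb.aff and german_comb.dict with open(path, encoding="iso-8859-1").read() (the file encoding is an assumption here) and call can_compound(aff, dic, "auto") and can_compound(aff, dic, "bahn"). If compound_flag returns None, or the parts of the compound do not carry the flag, a result like {autobahn} without the parts would not be surprising.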