Thread: Tsearch2 crashes my backend, ouch !
Hello,

I have just ditched Gentoo and installed a brand new kubuntu system (was tired of the endless compiles). I have a problem with crashing tsearch2. This appeared both on Gentoo and the brand new kubuntu.

I will describe all my install procedure, maybe I'm doing something wrong.

Cluster is newly created and empty.

initdb was done with UNICODE encoding & locales.

# from postgresql.conf

# These settings are initialized by initdb -- they might be changed
lc_messages = 'fr_FR.UTF-8'    # locale for system error message strings
lc_monetary = 'fr_FR.UTF-8'    # locale for monetary formatting
lc_numeric = 'fr_FR.UTF-8'     # locale for number formatting
lc_time = 'fr_FR.UTF-8'        # locale for time formatting

peufeu@apollo13:~$ locale
LANG=fr_FR.UTF-8
LC_CTYPE="fr_FR.UTF-8"
LC_NUMERIC="fr_FR.UTF-8"
etc...

First import needed .sql files from contrib and check that the default tsearch2 config works for English

$ createdb -U postgres test
$ psql -U postgres test <tsearch2.sql and other contribs I use
$ psql -U postgres test

test=# select lexize( 'en_stem', 'flying' );
 lexize
--------
 {fli}

test=# select to_tsvector('default', 'flying ducks');
   to_tsvector
------------------
 'fli':1 'duck':2

OK, seems to work very nicely, now install French.
Since this is Kubuntu there is no source, so download source, then :

- apply patch_tsearch_snowball_82 from tsearch2 website

./configure --prefix=/usr/lib/postgresql/8.2/ --datadir=/usr/share/postgresql/8.2 --enable-nls=fr --with-python
cd contrib/tsearch2
make
cd gendict
(copy french stem.c and stem.h from the snowball website)
./config.sh -n fr -s -p french_UTF_8 -i -v -c stem.c -h stem.h -C'Snowball stemmer for French'
cd ../../dict_fr
make clean && make
sudo make install

Now we have :

/bin/sh ../../config/install-sh -c -m 644 dict_fr.sql '/usr/share/postgresql/8.2/contrib'
/bin/sh ../../config/install-sh -c -m 755 libdict_fr.so.0.0 '/usr/lib/postgresql/8.2/lib/dict_fr.so'

Okay...

- download and install UTF8 french dictionaries from http://www.davidgis.fr/download/tsearch2_french_files.zip and put them in contrib directory
  (the files delivered by debian package ifrench are ISO8859, bleh)

- import french shared libs
psql -U postgres test < /usr/share/postgresql/8.2/contrib/dict_fr.sql

Then :

test=# select lexize( 'en_stem', 'flying' );
 lexize
--------
 {fli}

And :

test=# select * from pg_ts_dict where dict_name ~ '^(fr|en)';
 dict_name |       dict_init       |   dict_initoption    |              dict_lexize              |        dict_comment
-----------+-----------------------+----------------------+---------------------------------------+-----------------------------
 en_stem   | snb_en_init(internal) | contrib/english.stop | snb_lexize(internal,internal,integer) | English Stemmer. Snowball.
 fr        | dinit_fr(internal)    |                      | snb_lexize(internal,internal,integer) | Snowball stemmer for French

test=# select lexize( 'fr', 'voyageur' );
server closed the connection unexpectedly

BLAM ! Try something else :

test=# UPDATE pg_ts_dict SET dict_initoption='/usr/share/postgresql/8.2/contrib/french.stop' WHERE dict_name = 'fr';
UPDATE 1
test=# select lexize( 'fr', 'voyageur' );
server closed the connection unexpectedly

Try other options :

dict_name       | fr_ispell
dict_init       | spell_init(internal)
dict_initoption | DictFile="/usr/share/postgresql/8.2/contrib/french.dict",AffFile="/usr/share/postgresql/8.2/contrib/french.aff",StopFile="/usr/share/postgresql/8.2/contrib/french.stop"
dict_lexize     | spell_lexize(internal,internal,integer)
dict_comment    |

test=# select lexize( 'en_stem', 'traveler' ), lexize( 'fr_ispell', 'voyageur' );
-[ RECORD 1 ]-------
lexize | {travel}
lexize | {voyageuse}

Now it works (kinda) but stemming doesn't stem for French (since snowball is out). It should return 'voyage' (=travel) instead of 'voyageuse' (=female traveler).
That's not what I want; I want to use snowball to stem French words.

I'm going to make a debug build and try to debug it, but if anyone can help, you're really, really welcome.
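
The ISO8859 ispell files mentioned above could also be re-encoded locally rather than replaced by a download. A minimal sketch, assuming the source files really are ISO-8859-1 and that `iconv` is installed; the function name `to_utf8` and the file names in the usage comment are hypothetical, not something from this thread:

```shell
#!/bin/sh
# Re-encode one dictionary file from ISO-8859-1 to UTF-8.
# Assumption: the input really is ISO-8859-1; iconv cannot detect this itself.
to_utf8() {
    src=$1
    dst=$2
    # Write to a temporary file first so a failed conversion
    # never leaves a truncated dictionary behind.
    tmp="$dst.tmp.$$"
    if iconv -f ISO-8859-1 -t UTF-8 "$src" > "$tmp"; then
        mv "$tmp" "$dst"
    else
        rm -f "$tmp"
        return 1
    fi
}

# Hypothetical usage (source file names are placeholders):
#   to_utf8 french-latin1.dict /usr/share/postgresql/8.2/contrib/french.dict
#   to_utf8 french-latin1.aff  /usr/share/postgresql/8.2/contrib/french.aff
```

The temporary file is a design choice: a dictionary that is half-written would be harder to diagnose than one that is simply missing.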
Which version of pgsql exactly?

--
Teodor Sigaev E-mail: teodor@sigaev.ru WWW: http://www.sigaev.ru/
> (copy french stem.c and stem.h from the snowball website)

Take the French stemmer from http://www.sai.msu.su/~megera/postgres/gist/tsearch/V2/dicts/stemmer/stemmer_utf8_french.tar.gz

At least, it works for me.

Sorry, but Snowball's interfaces change quickly and unpredictably, and Snowball doesn't use a version mark or anything similar. So a Snowball core and stemmers downloaded at different times may be incompatible :(. Our tsearch_core patch (moving tsearch into core of pgsql) solves that problem - it contains all possible snowball stemmers.

--
Teodor Sigaev E-mail: teodor@sigaev.ru WWW: http://www.sigaev.ru/
OK, I've solved my problem... thanks for the hint !

Anyway, just to signal that tsearch2 crashes if SELECT is not granted to pg_ts_dict (other tables give a proper error message when not GRANTed).

On Fri, 30 Mar 2007 13:20:30 +0200, Listmail <lists@peufeu.com> wrote:

> I have a problem with crashing tsearch2. This appeared both on Gentoo
> and the brand new kubuntu.
On Fri, 30 Mar 2007, Listmail wrote:

> OK, I've solved my problem... thanks for the hint !
>
> Anyway, just to signal that tsearch2 crashes if SELECT is not granted
> to pg_ts_dict (other tables give a proper error message when not GRANTed).

I don't understand this. Are you sure about this? From the prompt in your select examples I see you have superuser rights and you have successfully selected from the pg_ts_dict table.

Regards,
Oleg
_____________________________________________________________
Oleg Bartunov, Research Scientist, Head of AstroNet (www.astronet.ru),
Sternberg Astronomical Institute, Moscow University, Russia
Internet: oleg@sai.msu.su, http://www.sai.msu.su/~megera/
phone: +007(495)939-16-83, +007(495)939-23-83
>> Anyway, just to signal that tsearch2 crashes if SELECT is not granted
>> to pg_ts_dict (other tables give a proper error message when not
>> GRANTed).
>
> I don't understand this. Are sure on this ? From prompt in your select
> examples I see you have superuser's rights and you have successfully
> select from pg_ts_dict column.

It was tricky to reproduce... I thought I had hallucinations but here it is :

- open two psql windows (one postgres user, one normal unprivileged user), see > or # in prompt for which window I use to type commands.

- first let's lock ourselves up :

caillaudangers=# REVOKE select ON pg_ts_dict FROM caillaudangers ;
REVOKE
caillaudangers=# REVOKE select ON pg_ts_cfg FROM caillaudangers ;
REVOKE
caillaudangers=# REVOKE select ON pg_ts_cfgmap FROM caillaudangers ;
REVOKE
caillaudangers=# REVOKE select ON pg_ts_parser FROM caillaudangers ;
REVOKE

- then try to access :

caillaudangers=> SELECT to_tsvector( 'bill gates is watching us' );
ERROR: permission denied for relation pg_ts_dict
CONTEXT: SQL statement "select dict_init, dict_initoption, dict_lexize from public.pg_ts_dict where oid = $1"

caillaudangers=# GRANT select ON pg_ts_dict TO caillaudangers ;
GRANT

caillaudangers=> SELECT to_tsvector( 'bill gates is watching us' );
ERROR: No dictionary with id 138493128

Strange error message ??

caillaudangers=> SELECT to_tsvector( 'bill gates is watching us' );
ERROR: permission denied for relation pg_ts_cfg
CONTEXT: SQL statement "select prs_name from public.pg_ts_cfg where oid = $1"

Proper error message now. Let's go back.

caillaudangers=# REVOKE select ON pg_ts_dict FROM caillaudangers ;
REVOKE

Now try to select to_tsvector and each time a permission is denied, grant the needed table.

caillaudangers=> SELECT to_tsvector( 'bill gates is watching us' );
ERROR: permission denied for relation pg_ts_cfg
CONTEXT: SQL statement "select prs_name from public.pg_ts_cfg where oid = $1"

caillaudangers=# GRANT select ON pg_ts_cfg TO caillaudangers ;
GRANT

caillaudangers=> SELECT to_tsvector( 'bill gates is watching us' );
ERROR: permission denied for relation pg_ts_cfgmap
CONTEXT: SQL statement "select lt.tokid, map.dict_name from public.pg_ts_cfgmap as map, public.pg_ts_cfg as cfg, public.token_type( $1 ) as lt where lt.alias = map.tok_alias and map.ts_name = cfg.ts_name and cfg.oid= $2 order by lt.tokid desc;"

caillaudangers=# GRANT select ON pg_ts_cfgmap TO caillaudangers ;
GRANT

caillaudangers=> SELECT to_tsvector( 'bill gates is watching us' );
ERROR: permission denied for relation pg_ts_parser
CONTEXT: SQL statement "select prs_start, prs_nexttoken, prs_end, prs_lextype, prs_headline from public.pg_ts_parser where oid = $1"

caillaudangers=# GRANT select ON pg_ts_parser TO caillaudangers ;
GRANT

caillaudangers=> SELECT to_tsvector( 'bill gates is watching us' );
server closed the connection unexpectedly
This probably means the server terminated abnormally before or while processing the request.
The connection to the server was lost. Attempting reset: Failed.

There it crashes. It's bizarre.
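
The session above touches four tsearch2 tables: pg_ts_dict, pg_ts_cfg, pg_ts_cfgmap and pg_ts_parser, and the crash shows up while pg_ts_dict is the one still revoked. A minimal sketch that prints the GRANT statements for all four in one go, so none is left out; the function name `tsearch2_grants` is hypothetical, and the role is assumed to be a plain lowercase identifier that needs no quoting:

```shell
#!/bin/sh
# Print one GRANT SELECT statement per tsearch2 configuration table.
tsearch2_grants() {
    role=$1
    # Reject anything that is not a simple identifier rather than
    # emit SQL that would need quoting.
    case $role in
        ''|*[!a-z0-9_]*) echo "unsupported role name: $role" >&2; return 1 ;;
    esac
    for t in pg_ts_dict pg_ts_cfg pg_ts_cfgmap pg_ts_parser; do
        printf 'GRANT SELECT ON %s TO %s;\n' "$t" "$role"
    done
}

# Hypothetical usage, run as a superuser:
#   tsearch2_grants caillaudangers | psql -U postgres caillaudangers
```

Printing the statements instead of executing them keeps the sketch reviewable before anything reaches the database. It only sidesteps the partially granted state and says nothing about the underlying crash.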
Fixed. Thanks for the report.

>>> Anyway, just to signal that tsearch2 crashes if SELECT is not
>>> granted to pg_ts_dict (other tables give a proper error message when
>>> not GRANTed).

--
Teodor Sigaev E-mail: teodor@sigaev.ru WWW: http://www.sigaev.ru/
Hello Teodor,

On 2007-03-30 16:49:19, Teodor Sigaev wrote:
> Our tsearch_core patch (moving tsearch into core of pgsql) solves that
> problem - it contains all possible snowball stemmers.

I have problems migrating my 7.4 to 8.2 since Debian contains only 8.1. Applying tsearch2 is strange too.

Where can I get the "tsearch_core" patch?

Greetings
Michelle Konzack
Systemadministrator
Tamay Dogan Network
Debian GNU/Linux Consultant

--
Linux-User #280138 with the Linux Counter, http://counter.li.org/
##################### Debian GNU/Linux Consultant #####################
Michelle Konzack   Apt. 917                  ICQ #328449886
                   50, rue de Soultz         MSN LinuxMichi
0033/6/61925193    67100 Strasbourg/France   IRC #Debian (irc.icq.com)