Thread: Stemming not working with tsearch2() function

Stemming not working with tsearch2() function

From
"psql psql"
Date:
Anyone know why to_tsvector('sausages') might return "sausages" while to_tsvector('default','sausages') correctly returns "sausag"?

This is causing me a fairly major headache. I am guessing that the tsearch2() function used in my trigger is not specifying "default" when creating the tsvector, since the words being put into the vector are not correctly stemmed (if that is the correct term).

I figure this may have something to do with locale settings. Other info:

PostgreSQL version 8.2.4 (upgraded from 8.2.0 by RPM on Fedora Core 6, and prior to that from a 7.x version, although I reinstalled tsearch2)

SELECT * from pg_ts_cfg;
     ts_name     | prs_name |    locale
-----------------+----------+--------------
 default_russian | default  | ru_RU.KOI8-R
 utf8_russian    | default  | ru_RU.UTF-8
 simple          | default  | en_US.UTF-8
 default         | default  | en_US.UTF-8

 lc_collate  | en_US.UTF-8
 lc_ctype    | en_US.UTF-8
 lc_messages | en_US.UTF-8
 lc_monetary | en_US.UTF-8
 lc_numeric  | en_US.UTF-8
 lc_time     | en_US.UTF-8

Re: Stemming not working with tsearch2() function

From
Oleg Bartunov
Date:
On Mon, 30 Apr 2007, psql psql wrote:

> Anyone know why to_tsvector('sausages') might return "sausages" while
> to_tsvector('default','sausages') correctly returns "sausag"?
>
> This is causing me a fairly major headache. I am guessing that the
> tsearch2() function used in my trigger is not specifying "default" when
> creating the tsvector since the words being put into the vector are not
> correctly stemmed (if that is the correct term).
>
> I figure this may be something to do with locale settings, other info:

It is. Read http://www.sai.msu.su/~megera/wiki/Tsearch_V2_Notes


     Regards,
         Oleg
_____________________________________________________________
Oleg Bartunov, Research Scientist, Head of AstroNet (www.astronet.ru),
Sternberg Astronomical Institute, Moscow University, Russia
Internet: oleg@sai.msu.su, http://www.sai.msu.su/~megera/
phone: +007(495)939-16-83, +007(495)939-23-83

Re: Stemming not working with tsearch2() function

From
"psql psql"
Date:
On 4/30/07, Oleg Bartunov <oleg@sai.msu.su> wrote:
> On Mon, 30 Apr 2007, psql psql wrote:
>
> > Anyone know why to_tsvector('sausages') might return "sausages" while
> > to_tsvector('default','sausages') correctly returns "sausag"?
> >
> > This is causing me a fairly major headache. I am guessing that the
> > tsearch2() function used in my trigger is not specifying "default" when
> > creating the tsvector since the words being put into the vector are not
> > correctly stemmed (if that is the correct term).
> >
> > I figure this may be something to do with locale settings, other info:
>
> It is. Read http://www.sai.msu.su/~megera/wiki/Tsearch_V2_Notes

Thanks for the link.

select * from pg_ts_cfg where oid=show_curcfg();
 ts_name | prs_name |   locale
---------+----------+-------------
 simple  | default  | en_US.UTF-8

That helped me understand that the configuration the tsearch2() function uses by default is not 'default' but 'simple'. I still don't understand why 'simple' is not working when both 'default' and 'simple' have the same locale set in pg_ts_cfg (en_US.UTF-8). Am I missing something?


Re: Stemming not working with tsearch2() function

From
Oleg Bartunov
Date:
On Mon, 30 Apr 2007, psql psql wrote:

> Thanks for the link.
>
> select * from pg_ts_cfg where oid=show_curcfg();
>  ts_name | prs_name |   locale
> ---------+----------+-------------
>  simple  | default  | en_US.UTF-8
>
> That's helped me understand that the default config used by the
> tsearch2() function is not 'default' but 'simple' but I still don't
> understand why 'simple' is not working when both default and simple
> have the same locale set in pg_ts_cfg (en_US.UTF-8). Am I missing
> something?

At present, having several configurations that match the same locale leads
to unpredictable results. Leave only one.
In 8.3 we have a special flag to mark the fts config
that is selectable as the default.
http://www.sai.msu.su/~megera/postgres/fts/doc/fts-cfg.html
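
To see which locales are claimed by more than one configuration, a GROUP BY over the locale column is enough. The sketch below is only an illustration: it uses an in-memory SQLite table as a stand-in for pg_ts_cfg (an assumption, not the real tsearch2 catalog), loaded with the four rows posted earlier in the thread. The helper names `make_stand_in_table` and `ambiguous_locales` are hypothetical. On a live server the same SELECT could presumably be run against pg_ts_cfg directly.

```python
import sqlite3

# Rows copied from the pg_ts_cfg listing posted earlier in this thread.
PG_TS_CFG_ROWS = [
    ("default_russian", "default", "ru_RU.KOI8-R"),
    ("utf8_russian", "default", "ru_RU.UTF-8"),
    ("simple", "default", "en_US.UTF-8"),
    ("default", "default", "en_US.UTF-8"),
]


def make_stand_in_table():
    """Build an in-memory SQLite copy of pg_ts_cfg (illustration only)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE pg_ts_cfg (ts_name TEXT, prs_name TEXT, locale TEXT)"
    )
    conn.executemany("INSERT INTO pg_ts_cfg VALUES (?, ?, ?)", PG_TS_CFG_ROWS)
    return conn


def ambiguous_locales(conn):
    """Return {locale: [ts_name, ...]} for locales shared by several configs."""
    clashing = conn.execute(
        "SELECT locale FROM pg_ts_cfg GROUP BY locale HAVING count(*) > 1"
    ).fetchall()
    result = {}
    for (locale,) in clashing:
        names = conn.execute(
            "SELECT ts_name FROM pg_ts_cfg WHERE locale = ? ORDER BY ts_name",
            (locale,),
        ).fetchall()
        result[locale] = [name for (name,) in names]
    return result


if __name__ == "__main__":
    # With the rows above this reports that 'default' and 'simple'
    # both claim en_US.UTF-8.
    print(ambiguous_locales(make_stand_in_table()))
```

Any locale that shows up in the result has more than one candidate configuration, which is the situation described above.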


     Regards,
         Oleg

Re: Stemming not working with tsearch2() function

From
"psql psql"
Date:


On 4/30/07, Oleg Bartunov <oleg@sai.msu.su> wrote:
> At present, having several configurations that match the same locale leads
> to unpredictable results. Leave only one.
> In 8.3 we have a special flag to mark the fts config
> that is selectable as the default.
> http://www.sai.msu.su/~megera/postgres/fts/doc/fts-cfg.html

Ah, thanks.
Is tsearch2() hard-coded to use 'simple', or could I delete 'simple' and just use 'default' somehow?
It's not a big issue if I have to use 'simple'; I will just have to redeploy some code that is currently using 'default'.
Matt.

Re: Stemming not working with tsearch2() function

From
Oleg Bartunov
Date:
On Mon, 30 Apr 2007, psql psql wrote:

> Ah, thanks.
> Is tsearch2() hard-coded to use 'simple', or could I delete 'simple'
> and just use 'default' somehow?
> It's not a big issue if I have to use 'simple'; I will just have to
> redeploy some code that is currently using 'default'.
> Matt.

Matt, just update the table so that the simple cfg is saved for future use:

  update pg_ts_cfg set locale='some_en_US.UTF-8' where ts_name='simple';
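
The effect of that UPDATE can be checked on a throwaway copy before touching the real catalog. This is a minimal sketch under stated assumptions: an in-memory SQLite table stands in for pg_ts_cfg, filled with the rows posted earlier in the thread, and the helpers `build_stand_in`, `configs_for_locale` and `park_simple_config` are hypothetical names. It shows only that the 'simple' row is kept under a locale string that no longer clashes, leaving 'default' as the sole match for en_US.UTF-8. It says nothing about how a running tsearch2 session picks up the change.

```python
import sqlite3


def build_stand_in():
    """In-memory SQLite copy of the pg_ts_cfg rows from the thread."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE pg_ts_cfg (ts_name TEXT, prs_name TEXT, locale TEXT)"
    )
    conn.executemany(
        "INSERT INTO pg_ts_cfg VALUES (?, ?, ?)",
        [
            ("default_russian", "default", "ru_RU.KOI8-R"),
            ("utf8_russian", "default", "ru_RU.UTF-8"),
            ("simple", "default", "en_US.UTF-8"),
            ("default", "default", "en_US.UTF-8"),
        ],
    )
    return conn


def configs_for_locale(conn, locale):
    """List the configuration names registered for one locale string."""
    rows = conn.execute(
        "SELECT ts_name FROM pg_ts_cfg WHERE locale = ? ORDER BY ts_name",
        (locale,),
    ).fetchall()
    return [name for (name,) in rows]


def park_simple_config(conn):
    """Apply the UPDATE from the message above to the stand-in table."""
    conn.execute(
        "UPDATE pg_ts_cfg SET locale='some_en_US.UTF-8' WHERE ts_name='simple'"
    )


if __name__ == "__main__":
    conn = build_stand_in()
    print("before:", configs_for_locale(conn, "en_US.UTF-8"))
    park_simple_config(conn)
    print("after: ", configs_for_locale(conn, "en_US.UTF-8"))
```

On the real server, rerunning `select * from pg_ts_cfg where oid=show_curcfg();` afterwards, possibly in a fresh session, should confirm which configuration is now picked.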

     Regards,
         Oleg