Thread: TSEARCH2: disable stemming in indexes and triggers

TSEARCH2: disable stemming in indexes and triggers

From
Erwin Moller
Date:
Hi all,

I installed TSEARCH2 on Postgres8.1 (Debian).
It runs all fine.
I have 2 tables indexed, and created triggers to keep the vectorcolumns
up
to date.


However, the text I indexed is a mix of Dutch and English and German.
The default stemmingprocess is an annoyance for me.
I would like to disable it.
I found out that using 'simple' instead of 'default' when using
to_tsvector() does excactly that, but I don't know how to change my
triggers and indexes to keep doing the same (using 'simple').


My TSEARCH2 and trigger-skills are developing, but I lack the confidence
to
change thing on the site (which is live), so I thought I check in here.


I have a table named tblvacature, that contains a bunch of columns that
need
to be indexed by TSEARCH.


This is what I did (and it works, except for the fact I want to disable
stemming)


1) ALTER TABLE tblvacature ADD COLUMN idxFTI tsvector;
2) UPDATE tblvacature SET idxFTI=to_tsvector('simple',
coalesce(title,'') ||' '||
coalesce(shortintro,'') ||' '||
coalesce(werkgever,'') ||' '||
coalesce(vacaturesteller,'') ||' '||
coalesce(standplaats,'') ||' '||
coalesce(divafdelingwerkgever,'') );


3) VACUUM FULL ANALYZE;


4) CREATE INDEX idxFTIvacture_idx ON tblvacature USING gist(idxFTI);


5) VACUUM FULL ANALYZE;


6) CREATE TRIGGER tsvectorupdate_vacature BEFORE UPDATE OR INSERT ON
tblvacature FOR EACH ROW EXECUTE
PROCEDURE tsearch2(idxFTI, title, shortintro, werkgever,
vacaturesteller,
standplaats, divafdelingwerkgever);


Step 2 worked fine: no stemming.
But how do I cange my step6 (trigger) so it keeps using 'simple'?
Or do I have to change the index itself (step4) too? Or both?


Please advise.


Thanks for your time.


Regards,
Erwin Moller

-----------------------
PS: This is my first posting to pgsql-general: if I screw something up,
please let me know.



Re: TSEARCH2: disable stemming in indexes and triggers

From
Oleg Bartunov
Date:
On Thu, 31 May 2007, Erwin Moller wrote:

> Hi all,
>
> I installed TSEARCH2 on Postgres8.1 (Debian).
> It runs all fine.
> I have 2 tables indexed, and created triggers to keep the vectorcolumns
> up
> to date.
>
>
> However, the text I indexed is a mix of Dutch and English and German.
> The default stemmingprocess is an annoyance for me.
> I would like to disable it.
> I found out that using 'simple' instead of 'default' when using
> to_tsvector() does excactly that, but I don't know how to change my
> triggers and indexes to keep doing the same (using 'simple').
>
>
> My TSEARCH2 and trigger-skills are developing, but I lack the confidence
> to
> change thing on the site (which is live), so I thought I check in here.
>
>
> I have a table named tblvacature, that contains a bunch of columns that
> need
> to be indexed by TSEARCH.
>
>
> This is what I did (and it works, except for the fact I want to disable
> stemming)
>
>
> 1) ALTER TABLE tblvacature ADD COLUMN idxFTI tsvector;
> 2) UPDATE tblvacature SET idxFTI=to_tsvector('simple',
> coalesce(title,'') ||' '||
> coalesce(shortintro,'') ||' '||
> coalesce(werkgever,'') ||' '||
> coalesce(vacaturesteller,'') ||' '||
> coalesce(standplaats,'') ||' '||
> coalesce(divafdelingwerkgever,'') );
>
>
> 3) VACUUM FULL ANALYZE;
>
>
> 4) CREATE INDEX idxFTIvacture_idx ON tblvacature USING gist(idxFTI);
>
>
> 5) VACUUM FULL ANALYZE;
>
>
> 6) CREATE TRIGGER tsvectorupdate_vacature BEFORE UPDATE OR INSERT ON
> tblvacature FOR EACH ROW EXECUTE
> PROCEDURE tsearch2(idxFTI, title, shortintro, werkgever,
> vacaturesteller,
> standplaats, divafdelingwerkgever);
>
>
> Step 2 worked fine: no stemming.
> But how do I cange my step6 (trigger) so it keeps using 'simple'?
> Or do I have to change the index itself (step4) too? Or both?
>
>
> Please advise.


tsearch trigger is just an example, you could write your own function !
for example,

create function my_update() returns trigger as
$$
BEGIN
   NEW.idxFTI=to_tsvector('simple',
   coalesce(title,'') ||' '||
   coalesce(shortintro,'') ||' '||
   coalesce(werkgever,'') ||' '||
   coalesce(vacaturesteller,'') ||' '||
   coalesce(standplaats,'') ||' '||
   coalesce(divafdelingwerkgever,'') );
   RETURN NEW;
END;
$$
language plpgsql;


btw, why do you don't use setweight() function to assign different weights to
the different attributes ?


>
>
> Thanks for your time.
>
>
> Regards,
> Erwin Moller
>
> -----------------------
> PS: This is my first posting to pgsql-general: if I screw something up,
> please let me know.
>
>
>
> ---------------------------(end of broadcast)---------------------------
> TIP 2: Don't 'kill -9' the postmaster
>

     Regards,
         Oleg
_____________________________________________________________
Oleg Bartunov, Research Scientist, Head of AstroNet (www.astronet.ru),
Sternberg Astronomical Institute, Moscow University, Russia
Internet: oleg@sai.msu.su, http://www.sai.msu.su/~megera/
phone: +007(495)939-16-83, +007(495)939-23-83

Re: TSEARCH2: disable stemming in indexes and triggers

From
Teodor Sigaev
Date:
> I found out that using 'simple' instead of 'default' when using
> to_tsvector() does excactly that, but I don't know how to change my
> triggers and indexes to keep doing the same (using 'simple').

Suppose, your database is initialized with C locale. So, just mark
simple configuration as default:

# update pg_ts_cfg set locale=null where ts_name='default';
# update pg_ts_cfg set locale='C' where ts_name='simple';

If your locale setting is not C then mark needed configuration with your
  locale.

Re: TSEARCH2: disable stemming in indexes and triggers

From
Erwin Moller
Date:
On Thu, 2007-05-31 at 20:27, Teodor Sigaev wrote:
> > I found out that using 'simple' instead of 'default' when using
> > to_tsvector() does excactly that, but I don't know how to change my
> > triggers and indexes to keep doing the same (using 'simple').
>
> Suppose, your database is initialized with C locale. So, just mark
> simple configuration as default:
>
> # update pg_ts_cfg set locale=null where ts_name='default';
> # update pg_ts_cfg set locale='C' where ts_name='simple';

Thanks Teodor.

That did the trick. :-)
At first I didn't see a change, but after reconnecting to the database
it worked for some reason beyound my meager knowledge.

Thanks to Oleg Bartunov too for his suggestion about writing my own
procedure.
This solution seemed simpler and worked the first time right away.

Thanks for your help!

Regards,
Erwin Moller

>
> If your locale setting is not C then mark needed configuration with your
>   locale.
>
> ---------------------------(end of broadcast)---------------------------
> TIP 4: Have you searched our list archives?
>
>                http://archives.postgresql.org/
>