Re: Tsearch vector not stored by update/set - Mailing list pgsql-general

From Andrew J. Kopciuch
Subject Re: Tsearch vector not stored by update/set
Msg-id 200503211647.35493.akopciuch@bddf.ca
In response to Re: Tsearch vector not stored by update/set  ("Justin L. Kennedy" <jk289@mail.gatech.edu>)
Responses Re: Tsearch vector not stored by update/set
List pgsql-general
> It seems to be selective of only numbers, words with numbers in them,
> words with '.' or '/' characters.  It completely ignores any other words
> or text in any of the 3 fields.
>

This is a very big hint to your problem.

> You requested the pg_ts_* tables:
> On the Linux-redhat, pg7.3.2
>
> pg_ts_cfgmap(73 rows)
> ts_name  tok_alias dict_name
> "default" "lword" "{en_stem}"
> "default" "nlword" "{simple}"
> "default" "word" "{simple}"
> "default" "email" "{simple}"
> "default" "url" "{simple}"
> "default" "host" "{simple}"
> "default" "sfloat" "{simple}"
> "default" "version" "{simple}"
> "default" "part_hword" "{simple}"
> "default" "nlpart_hword" "{simple}"
> "default" "lpart_hword" "{en_stem}"
> "default" "hword" "{simple}"
> "default" "lhword" "{en_stem}"
> "default" "nlhword" "{simple}"
> "default" "uri" "{simple}"
> "default" "file" "{simple}"
> "default" "float" "{simple}"
> "default" "int" "{simple}"
> "default" "uint" "{simple}"
> "default_russian" "lword"  "{en_stem}"
> "default_russian" "nlword" "{ru_stem}"
> "default_russian" "word" "{ru_stem}"
> "default_russian" "email" "{simple}"
> "default_russian" "url" "{simple}"
> "default_russian" "host" "{simple}"
> "default_russian" "sfloat" "{simple}"
> "default_russian" "version" "{simple}"
> "default_russian" "part_hword" "{simple}"
> "default_russian" "nlpart_hword" "{ru_stem}"
> "default_russian" "lpart_hword" "{en_stem}"
> "default_russian" "hword" "{ru_stem}"
> "default_russian" "lhword" "{en_stem}"
> "default_russian" "nlhword" "{ru_stem}"
> "default_russian" "uri" "{simple}"
> "default_russian" "file" "{simple}"
> "default_russian" "float" "{simple}"
> "default_russian" "int" "{simple}"
> "default_russian" "uint" "{simple}"
> "simple" "lword" "{simple}"
> "simple" "nlword" "{simple}"
> "simple" "word" "{simple}"
> "simple" "email" "{simple}"
> "simple" "url" "{simple}"
> "simple" "host" "{simple}"
> "simple" "sfloat" "{simple}"
> "simple" "version" "{simple}"
> "simple" "part_hword" "{simple}"
> "simple" "nlpart_hword" "{simple}"
> "simple" "lpart_hword" "{simple}"
> "simple" "hword" "{simple}"
> "simple" "lhword" "{simple}"
> "simple" "nlhword" "{simple}"
> "simple" "uri" "{simple}"
> "simple" "file" "{simple}"
> "simple" "float" "{simple}"
> "simple" "int" "{simple}"
> "simple" "uint" "{simple}"
> "default_english" "url" "{simple}"
> "default_english" "host" "{simple}"
> "default_english" "sfloat" "{simple}"
> "default_english" "uri" "{simple}"
> "default_english" "int" "{simple}"
> "default_english" "float" "{simple}"
> "default_english" "email" "{simple}"
> "default_english" "word" "{simple}"
> "default_english" "hword" "{simple}"
> "default_english" "nlword" "{simple}"
> "default_english" "nlpart_hword" "{simple}"
> "default_english" "part_hword" "{simple}"
> "default_english" "nlhword" "{simple}"
> "default_english" "file" "{simple}"
> "default_english" "uint" "{simple}"
> "default_english" "version" "{simple}"
>

I am assuming that your cluster was created with en_US as the locale, and
that you have set the matching tsearch2 configuration as your default (or
set curcfg for each running process).

If you look at your config mappings for "default_english" you will notice
that you have 16 records, as opposed to 19 records for every other
configuration.  Looking more closely, you are missing the entries for
'lword', 'lhword' and 'lpart_hword'.  That means that tokens of the types
'Latin Words', 'Latin Hyphenated Words' and 'Latin Part Hyphenated Words'
are simply dropped, because you do not have a configuration mapping set up
for them.
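
Something along these lines should show the gap and fill it.  It is only a
sketch: it assumes the column names from your listing (ts_name, tok_alias,
dict_name) and copies the dictionaries from your 'default' configuration
({en_stem} for all three token types), which may or may not be what you
want for 'default_english'.

```sql
-- Token types mapped in 'default' but missing from 'default_english'.
SELECT tok_alias FROM pg_ts_cfgmap WHERE ts_name = 'default'
EXCEPT
SELECT tok_alias FROM pg_ts_cfgmap WHERE ts_name = 'default_english';

-- Copy the three missing mappings from 'default' (assumption: the
-- dictionaries used by 'default' are also the ones you want here).
INSERT INTO pg_ts_cfgmap (ts_name, tok_alias, dict_name)
SELECT 'default_english', tok_alias, dict_name
  FROM pg_ts_cfgmap
 WHERE ts_name = 'default'
   AND tok_alias IN ('lword', 'lhword', 'lpart_hword');
```

After the insert, the first query should return no rows.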

This is why only numbers (or other token types) show up: they are returned
as token types such as int, uint, float, url, etc., for which you do have
mappings.  Most regular words are simply discarded because of the missing
entries.  If you fix your configuration, the triggers should work properly.

Your examples worked before simply because you specified the 'default'
configuration on the insert statement.  That is not the same as the
'default_english' configuration, which the trigger uses based on your
server locale (en_US).
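
To see the difference for yourself, you could compare the two
configurations directly.  This is a rough sketch: the sample sentence is
made up, and it assumes the stock tsearch2 functions to_tsvector() and
set_curcfg() and the locale column of pg_ts_cfg are present in your
install.

```sql
-- Which configuration is tied to which locale?
SELECT ts_name, locale FROM pg_ts_cfg;

-- Configuration named explicitly, as in your insert statements.
SELECT to_tsvector('default', 'The quick brown fox ran 42 times');

-- The configuration the trigger ends up using.  With the latin word
-- mappings missing, expect mostly the number to survive.
SELECT to_tsvector('default_english', 'The quick brown fox ran 42 times');

-- Alternatively, pick the configuration for the current session only.
SELECT set_curcfg('default');
SELECT to_tsvector('The quick brown fox ran 42 times');
```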

> I have made a single change to it from its default installation.  When I
> was working with the rank_cd() function on the 8.0.0 machine, it had
> errors due to a non-existant english stop file, so I changed
> pg_ts_dict.dict_initoption = '' where dict_name = 'en_stem'.  The indexing
> system was working fine both before and after the change to the pg_ts_dict
> table.  I also propagated the change to the 7.3.2 machine even though it
> didn't have the error message (the stop file didn't exist on that computer
> either, but it never gave an error message about it).

I would not recommend this.  The stop file is most likely on the system
somewhere; its location depends on your installation.  Look for
english.stop on the computer(s).  If it is not there, you can grab the one
out of the source distribution and put it wherever you want.  Then just
update the settings to point to the location you used.
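
As a sketch, the reverse of the change you made would look like the
following.  The path is only a placeholder, so substitute wherever
english.stop actually lives on each machine.

```sql
-- '/path/to/english.stop' is a placeholder, not a real location.
UPDATE pg_ts_dict
   SET dict_initoption = '/path/to/english.stop'
 WHERE dict_name = 'en_stem';

-- Quick check, ideally from a fresh session in case the dictionary is
-- cached: a stop word such as 'the' should come back as an empty array.
SELECT lexize('en_stem', 'the');
```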


good luck,


Andy
