Re: Tsearch vector not stored by update/set - Mailing list pgsql-general
From | Andrew J. Kopciuch |
---|---|
Subject | Re: Tsearch vector not stored by update/set |
Date | |
Msg-id | 200503211647.35493.akopciuch@bddf.ca |
In response to | Re: Tsearch vector not stored by update/set ("Justin L. Kennedy" <jk289@mail.gatech.edu>) |
Responses | Re: Tsearch vector not stored by update/set |
List | pgsql-general |
> It seems to be selective of only numbers, words with numbers in them,
> words with '.' or '/' characters. It completely ignores any other words
> or text in any of the 3 fields.

This is a very big hint to your problem.

> You requested the pg_ts_* tables:
> On the Linux-redhat, pg7.3.2
>
> pg_ts_cfgmap(73 rows)
> ts_name tok_alias dict_name
> "default" "lword" "{en_stem}"
> "default" "nlword" "{simple}"
> "default" "word" "{simple}"
> "default" "email" "{simple}"
> "default" "url" "{simple}"
> "default" "host" "{simple}"
> "default" "sfloat" "{simple}"
> "default" "version" "{simple}"
> "default" "part_hword" "{simple}"
> "default" "nlpart_hword" "{simple}"
> "default" "lpart_hword" "{en_stem}"
> "default" "hword" "{simple}"
> "default" "lhword" "{en_stem}"
> "default" "nlhword" "{simple}"
> "default" "uri" "{simple}"
> "default" "file" "{simple}"
> "default" "float" "{simple}"
> "default" "int" "{simple}"
> "default" "uint" "{simple}"
> "default_russian" "lword" "{en_stem}"
> "default_russian" "nlword" "{ru_stem}"
> "default_russian" "word" "{ru_stem}"
> "default_russian" "email" "{simple}"
> "default_russian" "url" "{simple}"
> "default_russian" "host" "{simple}"
> "default_russian" "sfloat" "{simple}"
> "default_russian" "version" "{simple}"
> "default_russian" "part_hword" "{simple}"
> "default_russian" "nlpart_hword" "{ru_stem}"
> "default_russian" "lpart_hword" "{en_stem}"
> "default_russian" "hword" "{ru_stem}"
> "default_russian" "lhword" "{en_stem}"
> "default_russian" "nlhword" "{ru_stem}"
> "default_russian" "uri" "{simple}"
> "default_russian" "file" "{simple}"
> "default_russian" "float" "{simple}"
> "default_russian" "int" "{simple}"
> "default_russian" "uint" "{simple}"
> "simple" "lword" "{simple}"
> "simple" "nlword" "{simple}"
> "simple" "word" "{simple}"
> "simple" "email" "{simple}"
> "simple" "url" "{simple}"
> "simple" "host" "{simple}"
> "simple" "sfloat" "{simple}"
> "simple" "version" "{simple}"
> "simple" "part_hword" "{simple}"
> "simple" "nlpart_hword" "{simple}"
> "simple" "lpart_hword" "{simple}"
> "simple" "hword" "{simple}"
> "simple" "lhword" "{simple}"
> "simple" "nlhword" "{simple}"
> "simple" "uri" "{simple}"
> "simple" "file" "{simple}"
> "simple" "float" "{simple}"
> "simple" "int" "{simple}"
> "simple" "uint" "{simple}"
> "default_english" "url" "{simple}"
> "default_english" "host" "{simple}"
> "default_english" "sfloat" "{simple}"
> "default_english" "uri" "{simple}"
> "default_english" "int" "{simple}"
> "default_english" "float" "{simple}"
> "default_english" "email" "{simple}"
> "default_english" "word" "{simple}"
> "default_english" "hword" "{simple}"
> "default_english" "nlword" "{simple}"
> "default_english" "nlpart_hword" "{simple}"
> "default_english" "part_hword" "{simple}"
> "default_english" "nlhword" "{simple}"
> "default_english" "file" "{simple}"
> "default_english" "uint" "{simple}"
> "default_english" "version" "{simple}"

I am assuming that your cluster was created with en_US as the locale, and
that you have set the matching tsearch2 configuration to be your default
(or set curcfg for each running process).

If you look at your config mappings for "default_english", you will notice
that you have 16 records, as opposed to 19 records for every other
configuration. Looking more closely, you are missing entries for 'lword',
'lhword' and 'lpart_hword'. That means that tokens of the types 'Latin
Words', 'Latin Hyphenated Words' and 'Latin Part Hyphenated Words' are
simply dropped, because you have no configuration mapping set up for them.

This is why only numbers (and other lexeme types) show up: they are
returned as token types such as int, uint, float and url, for which you do
have mappings. Most regular words are discarded because of the missing
entries.

If you fix your configuration, the triggers should work properly.
Your examples worked before simply because you specified the 'default'
configuration in the insert statement. That is not the same as the
'default_english' configuration, which is what the trigger uses based on
your server locale (en_US).

> I have made a single change to it from its default installation. When I
> was working with the rank_cd() function on the 8.0.0 machine, it had
> errors due to a non-existant english stop file, so I changed
> pg_ts_dict.dict_initoption = '' where dict_name = 'en_stem'. The indexing
> system was working fine both before and after the change to the pg_ts_dict
> table. I also propagated the change to the 7.3.2 machine even though it
> didn't have the error message (the stop file didn't exist on that computer
> either, but it never gave an error message about it).

I would not recommend this. The stop file is most likely on the system
somewhere; its location depends on your installation. Look for
english.stop on the computer(s). If it is not there, you can grab the one
from the source distribution and put it wherever you want. Then just
update the setting to point at the location you used.

good luck,

Andy
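P.S. A rough sketch of restoring the stop file setting, assuming the standard tsearch2 pg_ts_dict table. The path below is a hypothetical placeholder, not a real location; use wherever english.stop actually lives on your machine.

```sql
-- Inspect the current setting for the English stemmer dictionary.
SELECT dict_name, dict_initoption
FROM pg_ts_dict
WHERE dict_name = 'en_stem';

-- Point the dictionary back at the stop file.
-- '/path/to/english.stop' is a placeholder (assumption), not a known path.
UPDATE pg_ts_dict
SET dict_initoption = '/path/to/english.stop'
WHERE dict_name = 'en_stem';
```

The same statement would need to be run on each machine where dict_initoption was blanked out, with the path adjusted per installation.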