Re: Tsearch vector not stored by update/set - Mailing list pgsql-general

From Justin L. Kennedy
Subject Re: Tsearch vector not stored by update/set
Date
Msg-id Pine.SOL.4.33.0503211626030.16947-100000@acmez.gatech.edu
Whole thread Raw
In response to Tsearch vector not stored by update/set  ("Justin L. Kennedy" <jk289@prism.gatech.edu>)
Responses Re: Tsearch vector not stored by update/set  ("Andrew J. Kopciuch" <akopciuch@bddf.ca>)
List pgsql-general
On Sun, 20 Mar 2005, Andrew J. Kopciuch wrote:

> On Thursday 17 March 2005 17:55, you wrote:
> > The short question is why does this:
> >
> > select to_tsvector('default', coalesce(name, '') ||' '||
> > coalesce(description, '') ||' '|| coalesce(keywords,'')) from link_items;
> >
> > give different results than this:
> >
> > update link_items set linksfti=to_tsvector('default', coalesce(name, '')
> >
> > ||' '|| coalesce(description, '') ||' '|| coalesce(keywords,''));
> >
> > select linksfti from link_items;
> >
>
>
> It shouldn't.  I just tested with some of your data on my machine, and the
> results were fine.  (PG 7.4.6).  I can not see why they would be any
> different for 7.3.2.
>
> Your data looks rather strange. (re "'60':1 '000':2").  Is that really all
> that was inserted? Or have you just left some out for your email?

About 95% of the 1900 insertions ended up with empty strings (not NULLs),
the other 5% looked like that above.  Either just numbers, or occasionally
words (defined by consecutive non-whitespace characters separated by
whitespace) that had numbers or symbols in them.  Like: "U.S. Senate" was
transformed in such a way that "Senate" was dropped completely and "U.S."
became lowercased "u.s.".  Another example was a URL that happened to be
in the description column of one was captured, but the rest of the text
was not.  Another had a name of "World T.E.A.M. Sports" and all that was
stored in the vector was "t.e.a.m."

It seems to be selective of only numbers, words with numbers in them,
words with '.' or '/' characters.  It completely ignores any other words
or text in any of the 3 fields.

> I could see this being a configuration issue possibly.  What do your pg_ts
> tables look like?  Have you made modifications there?

This morning, I decided to remove the following trigger from the
link_items table:
CREATE TRIGGER updateprodtbl
  BEFORE INSERT OR UPDATE
  ON link_items
  FOR EACH ROW
  EXECUTE PROCEDURE tsearch2('linksfti', 'name', 'description',
'keywords');

Now the UPDATE command I listed above works, so apparently there is
something about this trigger that is blocking the search vector from being
stored.  This trigger was copied and pasted (with only changes to the
column names) from
http://www.sai.msu.su/~megera/postgres/gist/tsearch/V2/docs/tsearch-V2-intro.html
in the INDEXING FIELDS IN A TABLE section.

This does fix the immediate problem of getting the search results for the
live website we are running, but now we have the problem of keeping it up
to date.  I could run a nightly cronjob to update all the rows, but it
seems inefficient, and I would really like to know why the trigger would
keep it from working on 7.3, but not 8.0.

You requested the pg_ts_* tables:
On the Linux-redhat, pg7.3.2

pg_ts_cfg: (4 rows)
oid    ts_name            prs_name    locale
106407    "default"        "default"    "C"
988004    "default_english"    "default"    "en_US"
106408    "default_russian"    "default"    "ru_RU.KOI8-R"
106409    "simple"        "default"    NULL

pg_ts_dict: (5 rows)
oid    dict_name    dict_initoption    dict_init    dict_lexize
106356    en_stem        ""        snb_en_init    snb_lexize
106361    ispell_template    NULL        spell_init    spell_lexize
106358    ru_stem        "/usr/local/pgsql/share/contrib/russion.stop"
snb_ru_init    snb_lexize
106353    simple        NULL        dex_init    dex_lexize
106364    synonym        NULL        syn_init    syn_lexize

pg_ts_parser: (1 row)
oid    prs_name    prs_start    prs_nexttoken    prs_end
prs_headline    prs_lextype
106389    "default"    prsd_start    prsd_getlexeme    prsd_end
prsd_headline    prsd_lextype

pg_ts_cfgmap(73 rows)
ts_name        tok_alias    dict_name
"default"    "lword"    "{en_stem}"
"default"    "nlword"    "{simple}"
"default"    "word"    "{simple}"
"default"    "email"    "{simple}"
"default"    "url"    "{simple}"
"default"    "host"    "{simple}"
"default"    "sfloat"    "{simple}"
"default"    "version"    "{simple}"
"default"    "part_hword"    "{simple}"
"default"    "nlpart_hword"    "{simple}"
"default"    "lpart_hword"    "{en_stem}"
"default"    "hword"    "{simple}"
"default"    "lhword"    "{en_stem}"
"default"    "nlhword"    "{simple}"
"default"    "uri"    "{simple}"
"default"    "file"    "{simple}"
"default"    "float"    "{simple}"
"default"    "int"    "{simple}"
"default"    "uint"    "{simple}"
"default_russian"    "lword"        "{en_stem}"
"default_russian"    "nlword"    "{ru_stem}"
"default_russian"    "word"    "{ru_stem}"
"default_russian"    "email"    "{simple}"
"default_russian"    "url"    "{simple}"
"default_russian"    "host"    "{simple}"
"default_russian"    "sfloat"    "{simple}"
"default_russian"    "version"    "{simple}"
"default_russian"    "part_hword"    "{simple}"
"default_russian"    "nlpart_hword"    "{ru_stem}"
"default_russian"    "lpart_hword"    "{en_stem}"
"default_russian"    "hword"    "{ru_stem}"
"default_russian"    "lhword"    "{en_stem}"
"default_russian"    "nlhword"    "{ru_stem}"
"default_russian"    "uri"    "{simple}"
"default_russian"    "file"    "{simple}"
"default_russian"    "float"    "{simple}"
"default_russian"    "int"    "{simple}"
"default_russian"    "uint"    "{simple}"
"simple"    "lword"    "{simple}"
"simple"    "nlword"    "{simple}"
"simple"    "word"    "{simple}"
"simple"    "email"    "{simple}"
"simple"    "url"    "{simple}"
"simple"    "host"    "{simple}"
"simple"    "sfloat"    "{simple}"
"simple"    "version"    "{simple}"
"simple"    "part_hword"    "{simple}"
"simple"    "nlpart_hword"    "{simple}"
"simple"    "lpart_hword"    "{simple}"
"simple"    "hword"    "{simple}"
"simple"    "lhword"    "{simple}"
"simple"    "nlhword"    "{simple}"
"simple"    "uri"    "{simple}"
"simple"    "file"    "{simple}"
"simple"    "float"    "{simple}"
"simple"    "int"    "{simple}"
"simple"    "uint"    "{simple}"
"default_english"    "url"    "{simple}"
"default_english"    "host"    "{simple}"
"default_english"    "sfloat"    "{simple}"
"default_english"    "uri"    "{simple}"
"default_english"    "int"    "{simple}"
"default_english"    "float"    "{simple}"
"default_english"    "email"    "{simple}"
"default_english"    "word"    "{simple}"
"default_english"    "hword"    "{simple}"
"default_english"    "nlword"    "{simple}"
"default_english"    "nlpart_hword"    "{simple}"
"default_english"    "part_hword"    "{simple}"
"default_english"    "nlhword"    "{simple}"
"default_english"    "file"    "{simple}"
"default_english"    "uint"    "{simple}"
"default_english"    "version"    "{simple}"


On the 8.0.0-beta5 Windows 2000 machine:
pg_ts_cfg: (4 rows, identical to 7.3.2)
oid    ts_name            prs_name    locale
370162    "default"        "default"    "C"
370165    "default_english"    "default"    "en_US"
370163    "default_russian"    "default"    "ru_RU.KOI8-R"
370164    "simple"        "default"    NULL

pg_ts_cfgmap (73 rows, identical to 7.3.2)
Not listed again, all identical except for OIDs

pg_ts_dict (5 rows, identical to 7.3.2 except for OIDS)

pg_ts_parser (1 row, identical to 7.3.2 except for OIDs)

I have made a single change to it from its default installation.  When I
was working with the rank_cd() function on the 8.0.0 machine, it had
errors due to a non-existant english stop file, so I changed
pg_ts_dict.dict_initoption = '' where dict_name = 'en_stem'.  The indexing
system was working fine both before and after the change to the pg_ts_dict
table.  I also propagated the change to the 7.3.2 machine even though it
didn't have the error message (the stop file didn't exist on that computer
either, but it never gave an error message about it).


pgsql-general by date:

Previous
From: Paul Moore
Date:
Subject: Re: New user: Windows, Postgresql, Python
Next
From: Harald Fuchs
Date:
Subject: Re: Tracking row updates - race condition