Re: default_text_search_config and expression indexes - Mailing list pgsql-hackers

From Bruce Momjian
Subject Re: default_text_search_config and expression indexes
Date
Msg-id 200707310125.l6V1P3M11437@momjian.us
Whole thread Raw
In response to default_text_search_config and expression indexes  (Bruce Momjian <bruce@momjian.us>)
Responses Re: default_text_search_config and expression indexes
Re: default_text_search_config and expression indexes
List pgsql-hackers
Bruce Momjian wrote:
> We have to decide if we want a GUC default_text_search_config, and if so
> when can it be changed.
> 
> Right now there are three ways to create a tsvector (or tsquery)
> 
>     ::tsvector
>     to_tsvector(value)
>     to_tsvector(config, value)
> 
> (ignoring plainto_tsvector)
> 
> Only the last one specifies the configuration. The others use the
> configuration specified by default_text_search_config.  (We had an
> previous discussion on what the default value of
> default_text_search_config should be, and it was decided it should be
> set via initdb based on a flag or the locale.)
> 
> Now, because most people use a single configuration, they can just set
> default_text_search_config and there is no need to specify the
> configuration name.
> 
> However, expression indexes cause a problem here:
> 
>     http://momjian.us/expire/fulltext/HTML/textsearch-tables.html#TEXTSEARCH-TABLES-INDEX
> 
> We recommend that users create an expression index on the column they
> want to do a full text search on, e.g.
> 
>     CREATE INDEX pgweb_idx ON pgweb USING gin(to_tsvector(body));
> 
> However, the big problem is that the expressions used in expression
> indexes should not change their output based on the value of a GUC
> variable (because it would corrupt the index), but in the case above,
> default_text_search_config controls what configuration is used, and
> hence the output of to_tsvector is changed if default_text_search_config
> changes.
> 
> We have a few possible options:
> 
>     1) Document the problem and do nothing else.
>     2) Make default_text_search_config a postgresql.conf-only
>        setting, thereby making it impossible to change by non-super
>        users, or make it a super-user-only setting.
>     3) Remove default_text_search_config and require the
>        configuration to be specified in each function call.
> 
> If we remove default_text_search_config, it would also make ::tsvector
> casting useless as well.

OK, I just found a case that I think is going to make #3 a requirement
(remove default_text_search_config).

How is a CREATE INDEX ... to_tsvector(col) going to restore from a
pg_dump?  I see no way of guaranteeing that the
default_text_search_config is correct on the restore, and in fact I
don't think we have any way of knowing the default_text_search_config
used for the index.

And if we have to require the configuration name in CREATE INDEX, it has
to be used in WHERE, so we might as well just remove the default
capability and always require the configuration name.

--  Bruce Momjian  <bruce@momjian.us>          http://momjian.us EnterpriseDB
http://www.enterprisedb.com
 + If your life is a hard drive, Christ can be your backup. +


pgsql-hackers by date:

Previous
From: twraney@comcast.net
Date:
Subject: Building Hash Index by Presorting Tuples
Next
From: "Joshua D. Drake"
Date:
Subject: Re: Machine available for community use