Re: default_text_search_config and expression indexes - Mailing list pgsql-hackers

From Magnus Hagander
Subject Re: default_text_search_config and expression indexes
Msg-id 20070727084208.GH2908@svr2.hagander.net
In response to default_text_search_config and expression indexes  (Bruce Momjian <bruce@momjian.us>)
Responses Re: default_text_search_config and expression indexes
List pgsql-hackers
On Thu, Jul 26, 2007 at 06:23:51PM -0400, Bruce Momjian wrote:
> Oleg Bartunov wrote:
> > >> Second, I can't figure out how to reference a non-default
> > >> configuration.
> > >
> > > See the multi-argument versions of to_tsvector etc.
> > >
> > > I do see a problem with having to_tsvector(config, text) plus
> > > to_tsvector(text) where the latter implicitly references a config
> > > selected by a GUC variable: how can you tell whether a query using the
> > > latter matches a particular index using the former?  There isn't
> > > anything in the current planner mechanisms that would make that work.
> > 
> > Probably, having default text search configuration is not a good idea
> > and we could just require it as a mandatory parameter, which could
> > eliminate much confusion about selecting a text search configuration.
> 
> We have to decide if we want a GUC default_text_search_config, and if so
> when can it be changed.
> 
> Right now there are three ways to create a tsvector (or tsquery)
> 
>     ::tsvector
>     to_tsvector(value)
>     to_tsvector(config, value)
> 
> (ignoring plainto_tsvector)
> 
> Only the last one specifies the configuration. The others use the
> configuration specified by default_text_search_config.  (We had a
> previous discussion on what the default value of
> default_text_search_config should be, and it was decided it should be
> set via initdb based on a flag or the locale.)
> 
> Now, because most people use a single configuration, they can just set
> default_text_search_config and there is no need to specify the
> configuration name.
> 
> However, expression indexes cause a problem here:
> 
>     http://momjian.us/expire/fulltext/HTML/textsearch-tables.html#TEXTSEARCH-TABLES-INDEX
> 
> We recommend that users create an expression index on the column they
> want to do a full text search on, e.g.
> 
>     CREATE INDEX pgweb_idx ON pgweb USING gin(to_tsvector(body));
> 
> However, the big problem is that the expressions used in expression
> indexes should not change their output based on the value of a GUC
> variable (because it would corrupt the index), but in the case above,
> default_text_search_config controls what configuration is used, and
> hence the output of to_tsvector is changed if default_text_search_config
> changes.

It wouldn't actually *corrupt* the index, right? You could get wrong
results, which could be regarded as a kind of corruption, but as long as
you change the value back, the index still works, no?
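
To make that concrete, here is a rough sketch of how the one-argument
form follows the GUC while the two-argument form does not. It assumes
the "english" and "simple" configurations exist in pg_catalog; the sample
sentence is made up for illustration.

```sql
-- One-argument form: the result depends on the current GUC value.
SET default_text_search_config = 'pg_catalog.english';
SELECT to_tsvector('The cats were running');  -- stemmed, stop words dropped

SET default_text_search_config = 'pg_catalog.simple';
SELECT to_tsvector('The cats were running');  -- no stemming, all words kept

-- Two-argument form: the configuration is pinned, so the GUC is ignored.
SELECT to_tsvector('pg_catalog.english', 'The cats were running');

-- Put the setting back to the server default for this session.
RESET default_text_search_config;
```

An index built while one value was in effect holds lexemes that a query
run under the other value will not produce, hence the wrong results. The
stored entries themselves are untouched.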


> We have a few possible options:
> 
>     1) Document the problem and do nothing else.
>     2) Make default_text_search_config a postgresql.conf-only
>        setting, thereby making it impossible to change by non-super
>        users, or make it a super-user-only setting.
>     3) Remove default_text_search_config and require the
>        configuration to be specified in each function call.
> 
> If we remove default_text_search_config, it would also make ::tsvector
> casting useless as well.

I think 3 is a really bad solution.
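
For reference, this is roughly what option 3 would force on everybody,
including the majority who only ever use one configuration. It is a
sketch only: the pgweb table definition and the search terms are
assumptions, with just the table and column names taken from the example
quoted above.

```sql
-- Hypothetical table matching the pgweb example in the quoted text.
CREATE TABLE pgweb (id serial PRIMARY KEY, body text);

-- The configuration has to be spelled out in the index expression...
CREATE INDEX pgweb_idx ON pgweb
    USING gin(to_tsvector('english', body));

-- ...and repeated, identically, in every query that should use the index.
SELECT id
FROM pgweb
WHERE to_tsvector('english', body) @@ to_tsquery('english', 'index & search');
```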

2 is a half-bad solution. Do we have a way to say that it can be set at
the database level, for example, but not per user session? Making it
superuser-only rather than postgresql.conf-only could accomplish that,
along with warnings in the docs telling the superuser how changing it
affects existing indexes.
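
What I have in mind is something along the lines of the existing
per-database settings mechanism. A minimal sketch, assuming a database
called "mydb" (a made-up name) and the "english" configuration:

```sql
-- Set the default for one database; it applies to sessions started
-- after this command, not to the current one.
ALTER DATABASE mydb SET default_text_search_config = 'pg_catalog.english';

-- In a new session connected to mydb, check what is in effect.
SHOW default_text_search_config;
```

This only sets the per-database default. Whether an ordinary user can
still override it with SET in a session depends on how the GUC is
classified, which is exactly the open question.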

//Magnus

