Re: default_text_search_config and expression indexes - Mailing list pgsql-hackers

From Oleg Bartunov
Subject Re: default_text_search_config and expression indexes
Date
Msg-id Pine.LNX.4.64.0707312219070.18739@sn.sai.msu.ru
Whole thread Raw
In response to Re: default_text_search_config and expression indexes  (Bruce Momjian <bruce@momjian.us>)
Responses Re: default_text_search_config and expression indexes
List pgsql-hackers
On Mon, 30 Jul 2007, Bruce Momjian wrote:

> Bruce Momjian wrote:
>> We have to decide if we want a GUC default_text_search_config, and if so
>> when can it be changed.
>>
>> Right now there are three ways to create a tsvector (or tsquery)
>>
>>     ::tsvector
>>     to_tsvector(value)
>>     to_tsvector(config, value)
>>
>> (ignoring plainto_tsvector)
>>
>> Only the last one specifies the configuration. The others use the
>> configuration specified by default_text_search_config.  (We had an
>> previous discussion on what the default value of
>> default_text_search_config should be, and it was decided it should be
>> set via initdb based on a flag or the locale.)
>>
>> Now, because most people use a single configuration, they can just set
>> default_text_search_config and there is no need to specify the
>> configuration name.
>>
>> However, expression indexes cause a problem here:
>>
>>     http://momjian.us/expire/fulltext/HTML/textsearch-tables.html#TEXTSEARCH-TABLES-INDEX
>>
>> We recommend that users create an expression index on the column they
>> want to do a full text search on, e.g.
>>
>>     CREATE INDEX pgweb_idx ON pgweb USING gin(to_tsvector(body));
>>
>> However, the big problem is that the expressions used in expression
>> indexes should not change their output based on the value of a GUC
>> variable (because it would corrupt the index), but in the case above,
>> default_text_search_config controls what configuration is used, and
>> hence the output of to_tsvector is changed if default_text_search_config
>> changes.
>>
>> We have a few possible options:
>>
>>     1) Document the problem and do nothing else.
>>     2) Make default_text_search_config a postgresql.conf-only
>>        setting, thereby making it impossible to change by non-super
>>        users, or make it a super-user-only setting.
>>     3) Remove default_text_search_config and require the
>>        configuration to be specified in each function call.
>>
>> If we remove default_text_search_config, it would also make ::tsvector
>> casting useless as well.
>
> OK, I just found a case that I think is going to make #3 a requirement
> (remove default_text_search_config).
>
> How is a CREATE INDEX ... to_tsvector(col) going to restore from a
> pg_dump?  I see no way of guaranteeing that the
> default_text_search_config is correct on the restore, and in fact I
> don't think we have any way of knowing the default_text_search_config
> used for the index.
>
> And if we have to require the configuration name in CREATE INDEX, it has
> to be used in WHERE, so we might as well just remove the default
> capability and always require the configuration name.

this is very rare use case for text searching 
1. expression index without configuration name
2. default_text_search_config can be changed by somebody

If somebody really need it, then he should be adviced to use configuration 
name, else we don't guarantee that somebody could change 
default_text_search_config  variable and this could lead to 
incorrect dump/restore.

I don't think we should remove default_text_search_config because of 
this rare case.
    Regards,        Oleg

PS.

Bruce, I'm in the mountains the Northern Caucasia and internet is
a bit  unreliable :(

_____________________________________________________________
Oleg Bartunov, Research Scientist, Head of AstroNet (www.astronet.ru),
Sternberg Astronomical Institute, Moscow University, Russia
Internet: oleg@sai.msu.su, http://www.sai.msu.su/~megera/
phone: +007(495)939-16-83, +007(495)939-23-83


pgsql-hackers by date:

Previous
From: Alvaro Herrera
Date:
Subject: Re: stats_block_level
Next
From: Tom Lane
Date:
Subject: Re: stats_block_level