Re: default_text_search_config and expression indexes - Mailing list pgsql-hackers

From Oleg Bartunov
Subject Re: default_text_search_config and expression indexes
Date
Msg-id Pine.LNX.4.64.0708090935260.18739@sn.sai.msu.ru
In response to Re: default_text_search_config and expression indexes  (Bruce Momjian <bruce@momjian.us>)
Responses Re: default_text_search_config and expression indexes  (Bruce Momjian <bruce@momjian.us>)
Re: default_text_search_config and expression indexes  (Heikki Linnakangas <heikki@enterprisedb.com>)
List pgsql-hackers
On Wed, 8 Aug 2007, Bruce Momjian wrote:

> Heikki Linnakangas wrote:
>>>>> Sure, but you have to make sure you use the right configuration in the
>>>>> trigger, no?  Does the tsquery have to use the same configuration?
>>>> I wish I knew this myself. :-)   Whatever I had done happened to work
>>>> but that was largely through people on IRC walking me through it.
>>>
>>> This illustrates the major issue --- that this has to be simple for
>>> people to get started, while keeping the capabilities for experienced
>>> users.
>>>
>>> I am now thinking that making users always specify the configuration
>>> name and not allowing :: casting is going to be the best approach.  We
>>> can always add more in 8.4 after it is in wide use.
>>
>> I just read the docs and I'm trying to get a grip of the problem here.
>>
>> If I understood correctly, the basic issue is that a tsvector datum
>> created using configuration A is incompatible with a tsquery datum
>> created using configuration B, in the sense that you won't get
>> reasonable results if you use the tsquery to search the tsvector, or do
>> ranking or highlighting. If the configurations happen to be similar
>> enough, it can work, but not in general.
>
> Right.

That is not quite fair. There are many cases where one intentionally uses
different configurations. But I agree, this is not for beginners.

>
>> That underlying issue manifests itself in many ways, including:
>> - if you create a table with a field of type tsvector, typically kept
>> up-to-date by triggers, and do a search on it using a different
>> configuration, you get incorrect results.
>
> Right.

Again, you might want to use a different configuration.

>
>> - using an expression index instead of a tsvector-field, and always
>> explicitly specifying the configuration, you can avoid that problem (a
>> query with a different configuration won't use the index). But an
>> expression index, without explicitly specifying the configuration, will
>> get corrupted if you change the default configuration.
>
> Right.

The same problem arises if you (accidentally) drop a constraint from a table
and then get surprised by the SELECT results.

>
>> Removing the default configuration setting altogether removes the 2nd
>> problem, but that's not good from a usability point of view. And it
>> doesn't solve the general issue, you can still do things like:
>> SELECT * FROM foo WHERE to_tsvector('confA', textcol) @@
>> to_tsquery('confB', 'query');
>
> True, but in that case you are specifically naming different
> configurations, so it is hopefully obvious you have a mismatch.
>
>> ISTM we should have a separate tsvector and tsquery data type for each
>> configuration, and throw an error if you try to mix and match them in a
>> query. to_tsquery and to_tsvector would be a new kind of polymorphic
>> functions that work with the types. Or we could automatically create a
>> copy of them when you create a new configuration. We could have a
>> default configuration setting and rewrite queries that don't explicitly
>> specify a configuration to use the default.
>
> That is going to make multiple configurations quite complex in the
> backend, and I think for little value.
>
>> You could still get into trouble if you alter the configuration after
>> starting to use it. We could solve that by not allowing you to ALTER
>> CONFIGURATION, at least not if it's used in tables or indexes. Forcing
>> people to create a new configuration, and to recreate all indexes and
>> tsvector columns every time you add a word to a stop-list, for example,
>> seems too onerous, though. Not sure what to do about that.
>
> Yea, seems more work than is necessary.  If we require the configuration
> to be always supplied, and document that mismatches are a problem, I
> think we are in good shape.

We should agree that everything you describe applies only to novice users.
From the authors' point of view, I dislike your approach of treating text
searching as a very limited tool. But I understand that we should protect
people from stupid errors.

For beginners I want easy setup and error-proof functionality, while
leaving experienced users free to develop complex search engines.
Can we have a separate, safe interface for text searching and explicitly
recommend it for beginners?
    Regards,        Oleg
_____________________________________________________________
Oleg Bartunov, Research Scientist, Head of AstroNet (www.astronet.ru),
Sternberg Astronomical Institute, Moscow University, Russia
Internet: oleg@sai.msu.su, http://www.sai.msu.su/~megera/
phone: +007(495)939-16-83, +007(495)939-23-83
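
To make the mismatch discussed in this message concrete, here is a minimal sketch that is not taken from the thread. It assumes a PostgreSQL release with built-in text search, and uses the stock 'english' and 'simple' configurations as stand-ins for the placeholders confA and confB. The table foo follows the quoted query; its column layout and the index name foo_fts_idx are hypothetical.

```sql
-- Assumption: 'english' and 'simple' stand in for confA and confB.
-- 'english' stems "running" to "run"; 'simple' keeps the word unchanged.
SELECT to_tsvector('english', 'running') @@ to_tsquery('english', 'running') AS same_config,
       to_tsvector('english', 'running') @@ to_tsquery('simple',  'running') AS mixed_config;
-- same_config is true, mixed_config is false

-- Hypothetical table for the expression-index case.
CREATE TABLE foo (id serial PRIMARY KEY, textcol text);

-- Naming the configuration inside the index expression pins it, so a later
-- change of default_text_search_config does not affect what the index holds.
CREATE INDEX foo_fts_idx ON foo USING gin (to_tsvector('english', textcol));

-- Can use foo_fts_idx: the expression matches the index definition.
SELECT * FROM foo
WHERE to_tsvector('english', textcol) @@ to_tsquery('english', 'query');

-- Cannot use foo_fts_idx: the expression names a different configuration.
SELECT * FROM foo
WHERE to_tsvector('simple', textcol) @@ to_tsquery('simple', 'query');
```

Whether mixing configurations as in the mixed_config column is a mistake or a deliberate choice depends on the application, which is the distinction between beginners and experienced users drawn above.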