Re: default_text_search_config and expression indexes - Mailing list pgsql-hackers
From | Bruce Momjian |
---|---|
Subject | Re: default_text_search_config and expression indexes |
Date | |
Msg-id | 200708090107.l7917F224853@momjian.us Whole thread Raw |
In response to | Re: default_text_search_config and expression indexes (Heikki Linnakangas <heikki@enterprisedb.com>) |
Responses |
Re: default_text_search_config and expression indexes
Re: default_text_search_config and expression indexes |
List | pgsql-hackers |
Heikki Linnakangas wrote: > >>> Sure, but you have make sure you use the right configuration in the > >>> trigger, no? Does the tsquery have to use the same configuration? > >> I wish I knew this myself. :-) Whatever I had done happened to work > >> but that was largely through people on IRC walking me through it. > > > > This illustrates the major issue --- that this has to be simple for > > people to get started, while keeping the capabilities for experienced > > users. > > > > I am now thinking that making users always specify the configuration > > name and not allowing :: casting is going to be the best approach. We > > can always add more in 8.4 after it is in wide use. > > I just read the docs and I'm trying to get a grip of the problem here. > > If I understood correctly, the basic issue is that a tsvector datum > created using configuration A is incompatible with a tsquery datum > created using configuration B, in the sense that you won't get > reasonable results if you use the tsquery to search the tsvector, or do > ranking or highlighting. If the configurations happen to be similar > enough, it can work, but not in general. Right. > That underlying issue manifests itself in many ways, including: > - if you create table with a field of type tsvector, typically kept > up-to-date by triggers, and do a search on it using a different > configuration, you get incorrect results. Right. > - using an expression index instead of a tsvector-field, and always > explicitly specifying the configuration, you can avoid that problem (a > query with a different configuration won't use the index). But an > expression index, without explicitly specifying the configuration, will > get corrupted if you change the default configuration. Right. > Removing the default configuration setting altogether removes the 2nd > problem, but that's not good from a usability point of view. And it > doesn't solve the general issue, you can still do things like: > SELECT * FROM foo WHERE to_tsvector('confA', textcol) @@ > to_tsquery('confB', 'query'); True, but in that case you are specifically naming different configurations, so it is hopefully obvious you have a mismatch. > ISTM we should have a separate tsvector and tsquery data type for each > configuration, and throw an error if you try to mix and match them in a > query. to_tsquery and to_tsvector would be new kind of polymorphic > functions that work with the types. Or we could automatically create a > copy of them when you create a new configuration. We could have a > default configuration setting and rewrite queries that don't explicitly > specify a configuration to use the default. That is going to make multiple configurations quite complex in the backend, and I think for little value. > You could still get into trouble if you alter the configuration after > starting to use it. We could solve that by not allowing you to ALTER > CONFIGURATION, at least not if it's used in tables or indexes. Forcing > people to create a new configuration, and to recreate all indexes and > tsvector columns every time you add a word to a stop-list, for example, > seems too onerous, though. Not sure what to do about that. Yea, seems more work than is necessary. If we require the configuration to be always supplied, and document that mismatches are a problem, I think we are in good shape. -- Bruce Momjian <bruce@momjian.us> http://momjian.us EnterpriseDB http://www.enterprisedb.com + If your life is a hard drive, Christ can be your backup. +
pgsql-hackers by date: