Re: default_text_search_config and expression indexes - Mailing list pgsql-hackers
From | Heikki Linnakangas |
---|---|
Subject | Re: default_text_search_config and expression indexes |
Date | |
Msg-id | 46B62169.6050204@enterprisedb.com Whole thread Raw |
In response to | Re: default_text_search_config and expression indexes (Bruce Momjian <bruce@momjian.us>) |
Responses |
Re: default_text_search_config and expression
indexes
|
List | pgsql-hackers |
Bruce Momjian wrote: > Ron Mayer wrote: >>>>>> We need more feedback from users. >>>>> Well, I am waiting for other hackers to get involved, but if they don't, >>>>> I have to evaluate it myself on the email lists. >>>> Personally, I think documentation changes would be an OK way to >>>> to handle it. Something that makes it extremely clear to the >>>> user the advantages of having the extra column and the risks >>>> of avoiding them. >>> Sure, but you have make sure you use the right configuration in the >>> trigger, no? Does the tsquery have to use the same configuration? >> I wish I knew this myself. :-) Whatever I had done happened to work >> but that was largely through people on IRC walking me through it. > > This illustrates the major issue --- that this has to be simple for > people to get started, while keeping the capabilities for experienced > users. > > I am now thinking that making users always specify the configuration > name and not allowing :: casting is going to be the best approach. We > can always add more in 8.4 after it is in wide use. I just read the docs and I'm trying to get a grip of the problem here. If I understood correctly, the basic issue is that a tsvector datum created using configuration A is incompatible with a tsquery datum created using configuration B, in the sense that you won't get reasonable results if you use the tsquery to search the tsvector, or do ranking or highlighting. If the configurations happen to be similar enough, it can work, but not in general. That underlying issue manifests itself in many ways, including: - if you create table with a field of type tsvector, typically kept up-to-date by triggers, and do a search on it using a different configuration, you get incorrect results. - using an expression index instead of a tsvector-field, and always explicitly specifying the configuration, you can avoid that problem (a query with a different configuration won't use the index). But an expression index, without explicitly specifying the configuration, will get corrupted if you change the default configuration. Removing the default configuration setting altogether removes the 2nd problem, but that's not good from a usability point of view. And it doesn't solve the general issue, you can still do things like: SELECT * FROM foo WHERE to_tsvector('confA', textcol) @@ to_tsquery('confB', 'query'); ISTM we should have a separate tsvector and tsquery data type for each configuration, and throw an error if you try to mix and match them in a query. to_tsquery and to_tsvector would be new kind of polymorphic functions that work with the types. Or we could automatically create a copy of them when you create a new configuration. We could have a default configuration setting and rewrite queries that don't explicitly specify a configuration to use the default. You could still get into trouble if you alter the configuration after starting to use it. We could solve that by not allowing you to ALTER CONFIGURATION, at least not if it's used in tables or indexes. Forcing people to create a new configuration, and to recreate all indexes and tsvector columns every time you add a word to a stop-list, for example, seems too onerous, though. Not sure what to do about that. -- Heikki Linnakangas EnterpriseDB http://www.enterprisedb.com
pgsql-hackers by date: