Re: default_text_search_config and expression indexes - Mailing list pgsql-hackers
From | Oleg Bartunov |
---|---|
Subject | Re: default_text_search_config and expression indexes |
Date | |
Msg-id | Pine.LNX.4.64.0708090935260.18739@sn.sai.msu.ru |
In response to | Re: default_text_search_config and expression indexes (Bruce Momjian <bruce@momjian.us>) |
Responses | Re: default_text_search_config and expression indexes; Re: default_text_search_config and expression indexes |
List | pgsql-hackers |
On Wed, 8 Aug 2007, Bruce Momjian wrote:

> Heikki Linnakangas wrote:
>>>>> Sure, but you have make sure you use the right configuration in the
>>>>> trigger, no? Does the tsquery have to use the same configuration?
>>>> I wish I knew this myself. :-) Whatever I had done happened to work
>>>> but that was largely through people on IRC walking me through it.
>>>
>>> This illustrates the major issue --- that this has to be simple for
>>> people to get started, while keeping the capabilities for experienced
>>> users.
>>>
>>> I am now thinking that making users always specify the configuration
>>> name and not allowing :: casting is going to be the best approach. We
>>> can always add more in 8.4 after it is in wide use.
>>
>> I just read the docs and I'm trying to get a grip of the problem here.
>>
>> If I understood correctly, the basic issue is that a tsvector datum
>> created using configuration A is incompatible with a tsquery datum
>> created using configuration B, in the sense that you won't get
>> reasonable results if you use the tsquery to search the tsvector, or do
>> ranking or highlighting. If the configurations happen to be similar
>> enough, it can work, but not in general.
>
> Right.

Not fair. There are many cases when one can intentionally use different
configurations. But I agree, this is not for beginners.

>
>> That underlying issue manifests itself in many ways, including:
>> - if you create table with a field of type tsvector, typically kept
>> up-to-date by triggers, and do a search on it using a different
>> configuration, you get incorrect results.
>
> Right.

Again, you might want to use a different configuration.

>
>> - using an expression index instead of a tsvector-field, and always
>> explicitly specifying the configuration, you can avoid that problem (a
>> query with a different configuration won't use the index). But an
>> expression index, without explicitly specifying the configuration, will
>> get corrupted if you change the default configuration.
>
> Right.

The same problem occurs if you drop a constraint from a table (accidentally)
and then get surprised by the SELECT results.

>
>> Removing the default configuration setting altogether removes the 2nd
>> problem, but that's not good from a usability point of view. And it
>> doesn't solve the general issue, you can still do things like:
>> SELECT * FROM foo WHERE to_tsvector('confA', textcol) @@
>> to_tsquery('confB', 'query');
>
> True, but in that case you are specifically naming different
> configurations, so it is hopefully obvious you have a mismatch.
>
>> ISTM we should have a separate tsvector and tsquery data type for each
>> configuration, and throw an error if you try to mix and match them in a
>> query. to_tsquery and to_tsvector would be new kind of polymorphic
>> functions that work with the types. Or we could automatically create a
>> copy of them when you create a new configuration. We could have a
>> default configuration setting and rewrite queries that don't explicitly
>> specify a configuration to use the default.
>
> That is going to make multiple configurations quite complex in the
> backend, and I think for little value.
>
>> You could still get into trouble if you alter the configuration after
>> starting to use it. We could solve that by not allowing you to ALTER
>> CONFIGURATION, at least not if it's used in tables or indexes. Forcing
>> people to create a new configuration, and to recreate all indexes and
>> tsvector columns every time you add a word to a stop-list, for example,
>> seems too onerous, though. Not sure what to do about that.
>
> Yea, seems more work than is necessary. If we require the configuration
> to be always supplied, and document that mismatches are a problem, I
> think we are in good shape.

We should agree that everything you describe applies only to DUMMY users.
From the authors' point of view, I dislike your approach of treating text
searching as a very limited tool. But I understand that we should protect
people from stupid errors. I want an easy setup and error-proof
functionality for beginners, while leaving experienced users free to develop
complex search engines.

Can we have a separate, safe interface for text searching and explicitly
recommend it for beginners?

Regards,
Oleg
_____________________________________________________________
Oleg Bartunov, Research Scientist, Head of AstroNet (www.astronet.ru),
Sternberg Astronomical Institute, Moscow University, Russia
Internet: oleg@sai.msu.su, http://www.sai.msu.su/~megera/
phone: +007(495)939-16-83, +007(495)939-23-83