Re: default_text_search_config and expression indexes - Mailing list pgsql-hackers
From | Bruce Momjian |
---|---|
Subject | Re: default_text_search_config and expression indexes |
Date | |
Msg-id | 200708012133.l71LXqS03436@momjian.us |
In response to | Re: default_text_search_config and expression indexes (Oleg Bartunov <oleg@sai.msu.su>) |
List | pgsql-hackers |

Oleg Bartunov wrote:
> On Tue, 31 Jul 2007, Bruce Momjian wrote:
>
> > Oleg Bartunov wrote:
> >> On Tue, 31 Jul 2007, Bruce Momjian wrote:
> >>
> >>>>> And if we have to require the configuration name in CREATE INDEX, it has
> >>>>> to be used in WHERE, so we might as well just remove the default
> >>>>> capability and always require the configuration name.
> >>>>
> >>>> this is very rare use case for text searching
> >>>> 1. expression index without configuration name
> >>>> 2. default_text_search_config can be changed by somebody
> >>>
> >>> If you are going to be using the configuration name with the create
> >>> expression index, you have to use it in the WHERE clause (or the index
> >>> doesn't work), and I assume that is 90% of the text search uses. I
> >>> don't see it as rare at all.
> >>
> >> What is a basis of your assumption ? In my opinion, it's very limited
> >> use of text search, because it doesn't supports ranking. For 4-5 years
> >> of tsearch2 usage I never used it and I never seem in mailing lists.
> >> This is very user-oriented feature and we could probably ask
> >> -general people for their opinion.
> >
> > I doubt 'general' is going to understand the details of merging this
> > into the backend. I assume we have enough people on hackers to decide
> > this.
>
> I mean not technical details, but use case. Does they need expressional
> index without ranking but sacrifice ability to use default configuration
> in other cases too ? My prediction is that people doesn't ever thought about
> this possibility until we said them about.

In a choice between expression indexes and default_text_search_config,
there is no question in my mind that expression indexes are more useful.
Lack of default_text_search_config only means you have to specify the
configuration name every time, and can't do casting to a text search
data type.

> > Are you saying the majority of users have a separate column with a
> > trigger? Does the trigger specify the configuration? I don't see that
> > as a parameter argument to tsvector_update_trigger(). If you reload a
> > pg_dump, what does it use for the configuration?
> >
>
> yes, separate column with custom trigger works fine. It's up to you how
> to keep your data actual and it's up to you how to write trigger.
> Our tsvector_update_trigger() is a tsvector_update_trigger_example() !

Well, that is the major problem --- that this is very error-prone,
especially considering that the tsvector_update_trigger() doesn't get it
right either.

> > Why is a separate column better than the index? Just ranking?
>
> ranking + composite documents. I already mentioned, that this could be
> rather expensive. Also, having separate column allow people various
> ways to say what is a document and even change it.

OK, I am confused why an expression index can't use those features if a
separate column can. I realize the index can't store that information,
but why can the code pick it out of a heap column but not run the
function on the heap row to get that information? I assume it is
something that is just hard to implement.

> > The reason the expression index is nice is this feature has to be easy
> > to use for people who are new to full text and even PostgreSQL. Right
> > now /contrib is fine for experts to use, but we want a larger user base
> > for this feature.
>
> I agree here. This was one of the main reason of our work for 8.3.
> Probably, we should think in another direction - not to curtail tsearch2
> and confuse rather big existing users, but to add an ability to save somehow
> configuration used for creating of *document*
> either implicitly (in expression index, or just gin(text_column)), or
> explicitly (separate column). There is no problem with index itself !

Agreed. We need to find a way to save the configuration when the output
of a text search function is stored, either in an expression index or
via a trigger into a separate column, but only if we allow the default
configuration to be changed by non-super-users.

> >
> > Should we hold the patch for 8.4?
>
> If we're not agree to say in docs, that implicit usage of text search
> configuration in CREATE INDEX command doesn't supported. Could we leave
> default_text_search_config for super-users, at least ?
>
> Anyway, let's wait what other people say.

The big problem is that not many people have taken the time to fully
understand how full text search works. I hoped that putting the updated
documentation online would help:

	http://momjian.us/expire/fulltext/HTML/textsearch.html

but it seems it hasn't.

What we could do is make default_text_search_config super-user-only and
tell users at the start that if default_text_search_config doesn't match
the language they want to use, then they have to read a documentation
section that explains the problem of configuration mismatches.

The problem with that is that we should be setting
default_text_search_config in the pg_dump output, like we do for
client_encoding, but because it is super-user-only, it will fail for
non-super-user restores.

So, I am back to thinking default_text_search_config isn't going to work
reliably for novice users.

-- 
  Bruce Momjian  <bruce@momjian.us>          http://momjian.us
  EnterpriseDB                               http://www.enterprisedb.com

  + If your life is a hard drive, Christ can be your backup. +