Re: default_text_search_config and expression indexes - Mailing list pgsql-hackers

From Bruce Momjian
Subject Re: default_text_search_config and expression indexes
Date
Msg-id 200708012133.l71LXqS03436@momjian.us
In response to Re: default_text_search_config and expression indexes  (Oleg Bartunov <oleg@sai.msu.su>)
List pgsql-hackers
Oleg Bartunov wrote:
> On Tue, 31 Jul 2007, Bruce Momjian wrote:
> 
> > Oleg Bartunov wrote:
> >> On Tue, 31 Jul 2007, Bruce Momjian wrote:
> >>
> >>>>> And if we have to require the configuration name in CREATE INDEX, it has
> >>>>> to be used in WHERE, so we might as well just remove the default
> >>>>> capability and always require the configuration name.
> >>>>
> >>>> this is a very rare use case for text searching
> >>>> 1. expression index without configuration name
> >>>> 2. default_text_search_config can be changed by somebody
> >>>
> >>> If you are going to be using the configuration name with the create
> >>> expression index, you have to use it in the WHERE clause (or the index
> >>> doesn't work), and I assume that is 90% of the text search uses.  I
> >>> don't see it as rare at all.
> >>
> >> What is the basis of your assumption? In my opinion, it's a very limited
> >> use of text search, because it doesn't support ranking. In 4-5 years
> >> of tsearch2 usage I never used it and never saw it on the mailing lists.
> >> This is a very user-oriented feature and we could probably ask
> >> -general people for their opinion.
> >
> > I doubt 'general' is going to understand the details of merging this
> > into the backend.  I assume we have enough people on hackers to decide
> > this.
> 
> I don't mean the technical details, but the use case. Do they need an
> expression index without ranking, at the cost of the ability to use the
> default configuration in other cases too? My prediction is that people
> never thought about this possibility until we told them about it.

In a choice between expression indexes and default_text_search_config,
there is no question in my mind that expression indexes are more useful.
Lack of default_text_search_config only means you have to specify the
configuration name every time, and can't do casting to a text search
data type.

> > Are you saying the majority of users have a separate column with a
> > trigger?  Does the trigger specify the configuration?  I don't see that
> > as a parameter argument to tsvector_update_trigger().  If you reload a
> > pg_dump, what does it use for the configuration?
> >
> 
> Yes, a separate column with a custom trigger works fine. It's up to you
> how to keep your data up to date and how to write the trigger.
> Our tsvector_update_trigger() is really a tsvector_update_trigger_example()!

Well, that is the major problem --- that this is very error-prone,
especially considering that the tsvector_update_trigger() doesn't get it
right either.

> > Why is a separate column better than the index?  Just ranking?
> 
> Ranking + composite documents. I already mentioned that this could be
> rather expensive. Also, having a separate column allows people various
> ways to say what a document is, and even to change it.

OK, I am confused about why an expression index can't use those features
if a separate column can.  I realize the index can't store that
information, but why can the code pick it out of a heap column but not
run the function on the heap row to get that information?  I assume it is
something that is just hard to implement.

> > The reason the expression index is nice is this feature has to be easy
> > to use for people who are new to full text and even PostgreSQL.  Right
> > now /contrib is fine for experts to use, but we want a larger user base
> > for this feature.
> 
> I agree here. This was one of the main reasons for our work for 8.3.
> Probably we should think in another direction: not to curtail tsearch2
> and confuse its rather large existing user base, but to add the ability
> to save the configuration used to create the *document*,
> either implicitly (in an expression index, or just gin(text_column)), or
> explicitly (in a separate column). There is no problem with the index itself!

Agreed.  We need to find a way to save the configuration when the output
of a text search function is stored, either in an expression index or
via a trigger into a separate column, but only if we allow the default
configuration to be changed by non-super-users.
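
To make the mismatch concrete, here is a toy sketch in Python.  It is not
PostgreSQL code: the two "configurations" and the helper names (to_vector,
matches) are invented for illustration, and the stemming rule is
deliberately crude.  It only shows why a stored result has to be queried
with the same configuration that produced it.

```python
# Toy model only: these "configurations" are hypothetical stand-ins, not
# PostgreSQL's real text search parsers or dictionaries.

def normalize_english(word):
    """Crude English-like normalization: lowercase, strip one trailing 's'."""
    word = word.lower()
    return word[:-1] if word.endswith("s") and len(word) > 3 else word


def normalize_simple(word):
    """'simple'-like normalization: lowercase only, no stemming."""
    return word.lower()


CONFIGS = {"english": normalize_english, "simple": normalize_simple}


def to_vector(config, text):
    """Return the set of normalized terms for text under the named config."""
    norm = CONFIGS[config]
    return {norm(w) for w in text.split()}


def matches(vector, config, query_word):
    """Normalize the query word under config and look it up in the vector."""
    return CONFIGS[config](query_word) in vector


# Stored (in an index or a separate column) while the default was 'english'.
stored = to_vector("english", "Expression indexes")

# Query under the same configuration, then after the default has changed.
same = matches(stored, "english", "indexes")
different = matches(stored, "simple", "indexes")
print(same, different)
```

In this toy model the second lookup misses the stored row even though the
word is in the document, because the stored terms and the query term were
normalized by different rules.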

> >
> > Should we hold the patch for 8.4?
> 
> If we can't agree to say in the docs that implicit use of a text search
> configuration in the CREATE INDEX command isn't supported. Could we at
> least leave default_text_search_config for super-users?
> 
> Anyway, let's wait what other people say.

The big problem is that not many people have taken the time to fully
understand how full text search works. I hoped that putting the updated
documentation online would help:
http://momjian.us/expire/fulltext/HTML/textsearch.html

but it seems it hasn't.

What we could do is make default_text_search_config super-user-only and
tell users at the start that if default_text_search_config doesn't match
the language they want to use, they have to read a documentation section
that explains the problem of configuration mismatches.

The problem with that is that we should be setting
default_text_search_config in the pg_dump output, like we do for
client_encoding, but because it is super-user-only, it will fail for
non-super-user restores.

So, I am back to thinking default_text_search_config isn't going to
work reliably for novice users.

--
  Bruce Momjian  <bruce@momjian.us>          http://momjian.us
  EnterpriseDB                               http://www.enterprisedb.com

 + If your life is a hard drive, Christ can be your backup. +

