Re: default_text_search_config and expression indexes - Mailing list pgsql-advocacy
From | Ron Mayer |
---|---|
Subject | Re: default_text_search_config and expression indexes |
Date | |
Msg-id | 46C497E5.7010108@cheapcomplexdevices.com |
List | pgsql-advocacy |
From over on the hackers list, Mike Rylander wrote:
> My application (http://open-ils.org, which run >80% of the public
> libraries in Georgia, USA, http://gapines.org and
> http://georgialibraries.org/lib/pines.html) requires that I be able to
> search a corpus of bibliographic records in a mix of languages, and
> potentially with mixed stop-word rules, with one query.

Whoa, cool. Seems it'd make for a pretty awesome case study.
My apologies if it's already there, but I can't find it on the web site.
http://search.postgresql.org/search?q=georgia&a=1&submit=Search

Is this also related to the project? http://open-ils.org/

More context from the thread on hackers.

> On 8/13/07, Bruce Momjian <bruce@momjian.us> wrote:
>> Heikki Linnakangas wrote:
>>> Bruce Momjian wrote:
>>>> Heikki Linnakangas wrote:
>>>>> Removing the default configuration setting altogether removes the 2nd
>>>>> problem, but that's not good from a usability point of view. And it
>>>>> doesn't solve the general issue, you can still do things like:
>>>>> SELECT * FROM foo WHERE to_tsvector('confA', textcol) @@
>>>>> to_tsquery('confB', 'query');
>>>> True, but in that case you are specifically naming different
>>>> configurations, so it is hopefully obvious you have a mismatch.
>>> There's many more subtle ways to do that. For example, filling a
>>> tsvector column using a DEFAULT clause. But then you sometimes fill it
>>> in the application instead, with a different configuration. Or if one of
>>> the function calls is buried in another user defined function.
>>>
>>> I don't think explicitly naming the configuration gives enough protection.
>> Oh, wow, OK, well in that case the text search API isn't ready and we
>> will have to hold this for 8.4.
>>
>
> I've been watching this thread with a mixture of dread and hope,
> waiting to see where the developers' inclination will end up; whether
> leaving a useful foot gun available will be allowed.
>
> This is just my $0.02 as a fairly heavy user of the current tsearch2
> code, but I sincerely hope you do not cripple the system by removing
> the ability to store tsvectors built using arbitrary configurations in
> a single column. Yes, it can lead to unexpected results if you do not
> know what you are doing, but if you have gone beyond building a single
> tsearch2 configuration then you are required to know what you are
> doing. What's more, IMO the default configuration mechanism feels
> very much like a CONSTRAINT, as Oleg suggests. That point is one of
> cognizance, where if one has gone to the trouble of setting up
> multiple configurations and has learned enough to do so correctly,
> then one necessarily understands the importance of the setting and can
> use it (or not, and use explicit configurations) correctly. The
> default config lowers the bar to an acceptable level for beginners
> that have no need of multiple configurations, and while I don't feel
> too strongly, personally, about having a default, I think it is both
> useful and helpful for new users -- it was for me.
>
> Now, so this email isn't entirely complaining, and as a data point for
> the discussion, I'll explain why I do not want to see tsearch2
> crippled in the way suggested by Heikki and Bruce.
>
> My application (http://open-ils.org, which run >80% of the public
> libraries in Georgia, USA, http://gapines.org and
> http://georgialibraries.org/lib/pines.html) requires that I be able to
> search a corpus of bibliographic records in a mix of languages, and
> potentially with mixed stop-word rules, with one query. I cannot know
> ahead of time what languages will be used in the corpus and I cannot
> restrict any one query to one language. To accomplish this, the
> record itself will be inspected inside an INSERT/UPDATE trigger to
> determine the language and type, and use the correct configuration for
> creating the tsvector. This will obviously result in a "mixed"
> tsvector column, but that's exactly what I need. I can filter on
> record language if the user happens to specify a query language (and
> thus configuration), or simply rank the assumed (IP based, perhaps, or
> browser preference based) preferred language higher, or one of a
> hundred other things. But I won't be able to do any of that if
> tsvectors are required to have one and only one configuration per
> column.
>
> Anyway, I felt I needed to provide some outside perspective to this,
> as a user, since it seems that the external viewpoint (my particular
> viewpoint, at least) was missing from the discussion.
>
> Thanks, folks, for all the work on this so far!
>
> --miker
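
For readers following along, the trigger approach Mike describes might look roughly like the sketch below. This is not code from open-ils.org. The table, column, function, and trigger names and the language codes are all hypothetical, and it assumes the integrated text search as it shipped in PostgreSQL 8.3 and later, with the stock `english`, `french`, `spanish`, and `simple` configurations.

```sql
-- Hypothetical schema: every name here is illustrative only.
CREATE TABLE bib_record (
    id       serial PRIMARY KEY,
    lang     text NOT NULL,      -- assumed language code, e.g. 'eng', 'fre', 'spa'
    body     text NOT NULL,
    body_tsv tsvector            -- "mixed" column: one configuration per row
);

-- Choose a text search configuration from the record's language and
-- build the tsvector with it on every INSERT or UPDATE.
CREATE FUNCTION bib_record_tsv_update() RETURNS trigger AS $$
DECLARE
    cfg regconfig;
BEGIN
    cfg := (CASE NEW.lang
                WHEN 'eng' THEN 'english'
                WHEN 'fre' THEN 'french'
                WHEN 'spa' THEN 'spanish'
                ELSE 'simple'    -- fallback when the language is not recognized
            END)::regconfig;
    NEW.body_tsv := to_tsvector(cfg, NEW.body);
    RETURN NEW;
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER bib_record_tsv
    BEFORE INSERT OR UPDATE ON bib_record
    FOR EACH ROW EXECUTE PROCEDURE bib_record_tsv_update();

-- When the user specifies a query language, filter on the record
-- language and use the matching configuration for the query.
SELECT id
FROM bib_record
WHERE lang = 'eng'
  AND body_tsv @@ to_tsquery('english', 'libraries');
```

The explicit configuration argument on both sides is the point. The mismatch Heikki warns about arises when the configuration used to build a row's tsvector differs from the one used to parse the query, for example `to_tsvector('english', ...)` matched against `to_tsquery('simple', ...)`, since the two may normalize the same word differently and the match can silently fail.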