Re: default_text_search_config and expression indexes - Mailing list pgsql-hackers
From | Mike Rylander |
---|---|
Subject | Re: default_text_search_config and expression indexes |
Date | |
Msg-id | b918cf3d0708141335k362cf9acpa400bc52f5bb427f@mail.gmail.com Whole thread Raw |
In response to | Re: default_text_search_config and expression indexes (Heikki Linnakangas <heikki@enterprisedb.com>) |
List | pgsql-hackers |
On 8/14/07, Heikki Linnakangas <heikki@enterprisedb.com> wrote: > Mike Rylander wrote: [snip] > > Don't you need to use the right configuration to parse the query into a > tsquery as well? > Only if the user (or user agent) can supply enough information to move away from the configured default of, say, en-US. And even then, other tsvector configurations need to be searched. Configurations are not, of course, just about language/stemming, but also stop-words and thesauri. If it were just languages it wouldn't be too big of a deal (other than in-app DDL management...). > What you have is basically the classic problem problem of representing > inheritance in a relational model. You have a superclass, bibliographic > record, and multiple subclasses, one per language with one extra field, > the corpus in the right language. You've solved it by storing them all > in one table, with an extra column (language) to denote which subclass > the record is. Alternatively, you could solve it by using inherited > tables, or having one table per language with a foreign key referencing > the master table containing the other fields, or having a single table > with one nullable field per configuration, and a check constraint to > check that exactly one of them is not null. > Sorry, communication problem here ... I provided an oversimplified example meant more to show the issues than to find alternate solutions, though I appreciate you taking the time to consider the problem. If I only had to consider one delimiting facet per record then it would be much simpler. But with the complication that some fields have stop-word lists (and some not), some use thesauri (and different ones, at that), and these fields (as extracted from the records) and their configurations (stem? use a stop-word list? use thesaurus x, y or z?) are user-defined ... > As a thought experiment, let me present another, not text search related > example that's isomorphic to your problem: > Unfortunately, the problem you describe is not quite the same as the problem I need to solve. Fortunately, the current incarnation of tsearch2 does a pretty good job of solving the problem if you store everything in one place and use the query-time environment to apply some adjustments to the ranking of items. I could still work around this problem by creating inherited tables, one for each configuration on each index-providing table but I /really/ hope to avoid that. Creating new configurations for local requirements doesn't require creating new tables (and the associated management overhead in the app) today, something I'd really like to avoid. In fact, I'm starting to sweat just thinking about what the planner would go through with the number tables needed for the potential configurations in an installation that makes use of multiple thesauri and a mix of stop-word lists across, say, 30 languages. Such a dataset is not uncommon. In any case, thanks again for taking the time to think about the problem. I still think having the ability to store any old tsvector I happen to have hanging around in any column of the correct type is a GoodThing(tm). I see from Oleg's message down-thread that that's the way things will be (the tsvector type doesn't know about columns, just lexem "tuples"). --miker
pgsql-hackers by date: