Folks,
Here's something not to forget in this whole business: the present TSearch2
implementation permits you to have a different tsvector configuration for
each *row*, not just each column. That is, applications can be built with
"per-cell" configs.
I know of at least one such application out there: Ubuntu's Rosetta. I'm
sure there are others.
Therefore there are two cases we're trying to solve:
(1) The simple case: someone wants to build a database with text search
entirely in one UTF8 language. All vectors are in that language, and so are
all queries. The user wants the simplest syntax possible.
(2) The Rosetta case: different configs are used for each cell and all
searches have to be language-qualified.
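
To make the difference between the two cases concrete, here is a toy sketch in Python. It is not TSearch2 code: the CONFIGS table, the crude "english" normalizer, and the to_vector/matches helpers are all made up for illustration. It only shows why a single default config is enough in case (1) but silently misses rows in case (2).

```python
# Toy model only: these "configs" are stand-ins, not real TSearch2 dictionaries.
CONFIGS = {
    "simple": lambda word: word.lower(),
    # Crude English "stemmer": lowercases and strips trailing "s" characters.
    "english": lambda word: word.lower().rstrip("s"),
}

def to_vector(config, text):
    """Normalize each word of text with the named (hypothetical) config."""
    normalize = CONFIGS[config]
    return {normalize(w) for w in text.split()}

def matches(vector, config, term):
    """Normalize the query term with `config` and look it up in the vector."""
    return CONFIGS[config](term) in vector

# Case (2): each row carries its own config, as in a per-cell setup.
rows = [
    {"config": "english", "body": "translated strings"},
    {"config": "simple", "body": "translated strings"},
]
for row in rows:
    row["vector"] = to_vector(row["config"], row["body"])

# Language-qualified search: each row is queried with the config that built it.
qualified = [matches(r["vector"], r["config"], "strings") for r in rows]

# One session-wide default applied to every row, as in the simple case.
defaulted = [matches(r["vector"], "english", "strings") for r in rows]

print(qualified)
print(defaulted)
```

In this toy, the qualified search finds the term in both rows, while the defaulted search misses the row built with the other config, without raising any error.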
In both cases, the databases need to back up and restore cleanly.
From this, I'd first of all say that I don't see the point of making
default_tsvector_search_config superuser-only. There are too many failure
conditions with the default once you get away from the simplest case, so I
don't see how restricting it to superusers protects anything. We might as
well make it USERSET, which would make it more useful.
Unfortunately, the way I see it, the only permanent solution for this is to
alter the tsvector structure to include a config OID at the beginning of it.
That doesn't sound like it's doable in time for 8.3, though; is there a way
we could work around that until 8.4?
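
As a rough picture of the "config OID at the beginning" idea, here is another toy sketch in Python. Nothing here reflects the real tsvector layout: TaggedVector, the integer "OIDs", and the normalizers are hypothetical stand-ins. The point is only that if the vector records which config built it, a query can be normalized to match without the caller naming a config.

```python
from dataclasses import dataclass

# Hypothetical stand-ins for configurations, keyed by a made-up integer "OID".
CONFIGS = {
    1: lambda word: word.lower(),              # "simple"-like
    2: lambda word: word.lower().rstrip("s"),  # crude "english"-like
}

@dataclass(frozen=True)
class TaggedVector:
    config_oid: int      # header: which config built this vector
    lexemes: frozenset   # the normalized words

def to_tagged_vector(config_oid, text):
    """Build a vector that remembers the config used to normalize it."""
    normalize = CONFIGS[config_oid]
    return TaggedVector(config_oid, frozenset(normalize(w) for w in text.split()))

def matches(vector, term):
    # The query term is normalized with the config recorded in the vector,
    # so the caller does not have to supply (or guess) one.
    return CONFIGS[vector.config_oid](term) in vector.lexemes

v_simple = to_tagged_vector(1, "translated strings")
v_english = to_tagged_vector(2, "translated strings")

print(matches(v_simple, "Strings"), matches(v_english, "strings"))
```

With the tag in place, the same matches() call works for both vectors, which is the property the per-cell case needs.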
And why does this sound exactly like the issues we've had with per-column
encodings and the currency type?
--
Josh Berkus
PostgreSQL @ Sun
San Francisco