Re: How does the tsearch configuration get selected? - Mailing list pgsql-hackers

From Oleg Bartunov
Subject Re: How does the tsearch configuration get selected?
Date
Msg-id Pine.LNX.4.64.0706150745090.1881@sn.sai.msu.ru
In response to Re: How does the tsearch configuration get selected?  (Tom Lane <tgl@sss.pgh.pa.us>)
Responses Re: How does the tsearch configuration get selected?
default_text_search_config and expression indexes
List pgsql-hackers
On Thu, 14 Jun 2007, Tom Lane wrote:

> Bruce Momjian <bruce@momjian.us> writes:
>> First, why are we specifying the server locale here since it never
>> changes:

The server's locale is used for just one purpose: to select which text
search configuration to use by default. Any text search function can
accept a text search configuration as an optional parameter.
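
For example, a minimal sketch of the two calling forms. It assumes the
function signatures as they appear in released PostgreSQL (8.3 and later),
which may differ from the patch under discussion; the sample text is made up.

```sql
-- Explicit configuration: the result does not depend on any server setting.
SELECT to_tsvector('english', 'The quick brown foxes jumped');

-- Implicit configuration: uses whatever default_text_search_config names.
SHOW default_text_search_config;
SELECT to_tsvector('The quick brown foxes jumped');
```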

>
> It's poorly described.  What it should really say is the language
> that the text-to-be-searched is in.  We can actually support multiple
> languages here today, the restriction being that there have to be
> stemmer instances for the languages with the database encoding you're
> using.  With UTF8 encoding this isn't much of a restriction.  We do need
> to put code into the dictionary stuff to enforce that you can't use a
> stemmer when the database encoding isn't compatible with it.
>
> I would prefer that we not drive any of this stuff off the server's
> LC_xxx settings, since as you say that restricts things to just one
> locale.

Something like
CREATE TEXT SEARCH DICTIONARY dictname [LOCALE=ru_RU.UTF-8]
and raise a warning or error if the database encoding doesn't match the
dictionary's encoding, when one is specified (not all dictionaries depend
on encoding, so it should be an optional parameter).

>
>> Second, I can't figure out how to reference a non-default
>> configuration.
>
> See the multi-argument versions of to_tsvector etc.
>
> I do see a problem with having to_tsvector(config, text) plus
> to_tsvector(text) where the latter implicitly references a config
> selected by a GUC variable: how can you tell whether a query using the
> latter matches a particular index using the former?  There isn't
> anything in the current planner mechanisms that would make that work.

Probably, having a default text search configuration is not a good idea,
and we could just make the configuration a mandatory parameter, which would
eliminate much of the confusion about selecting a text search configuration.
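
To illustrate the index-matching problem Tom describes, here is a sketch.
The table docs, its columns, and the index name are hypothetical, and the
syntax is that of released PostgreSQL (8.3 and later), which may differ
from the patch under discussion.

```sql
-- Hypothetical table, for illustration only.
CREATE TABLE docs (id serial PRIMARY KEY, body text);

-- Expression index built with an explicit configuration.
CREATE INDEX docs_body_fts ON docs
    USING gin (to_tsvector('english', body));

-- Same expression as the index, so the planner can match it to the index.
SELECT id FROM docs
WHERE to_tsvector('english', body) @@ to_tsquery('english', 'search & config');

-- One-argument form: the configuration comes from a GUC at run time, so
-- this is not the indexed expression and the index is not matched.
SELECT id FROM docs
WHERE to_tsvector(body) @@ to_tsquery('search & config');
```

With the configuration as a mandatory parameter, only the first query form
would exist, and the ambiguity would go away.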

    Regards,        Oleg
_____________________________________________________________
Oleg Bartunov, Research Scientist, Head of AstroNet (www.astronet.ru),
Sternberg Astronomical Institute, Moscow University, Russia
Internet: oleg@sai.msu.su, http://www.sai.msu.su/~megera/
phone: +007(495)939-16-83, +007(495)939-23-83

