Re: default_text_search_config and expression indexes - Mailing list pgsql-hackers

From Gregory Stark
Subject Re: default_text_search_config and expression indexes
Date
Msg-id 87fy2loj68.fsf@oxford.xeocode.com
In response to Re: default_text_search_config and expression indexes  ("Mike Rylander" <mrylander@gmail.com>)
Responses Re: default_text_search_config and expression indexes
List pgsql-hackers
"Mike Rylander" <mrylander@gmail.com> writes:

> My application (http://open-ils.org, which runs >80% of the public
> libraries in Georgia, USA, http://gapines.org and
> http://georgialibraries.org/lib/pines.html) requires that I be able to
> search a corpus of bibliographic records in a mix of languages, and
> potentially with mixed stop-word rules, with one query.  I cannot know
> ahead of time what languages will be used in the corpus and I cannot
> restrict any one query to one language.  To accomplish this, the
> record itself will be inspected inside an INSERT/UPDATE trigger to
> determine the language and type, and use the correct configuration for
> creating the tsvector.  This will obviously result in a "mixed"
> tsvector column, but that's exactly what I need.  I can filter on
> record language if the user happens to specify a query language (and
> thus configuration), or simply rank the assumed (IP based, perhaps, or
> browser preference based) preferred language higher, or one of a
> hundred other things.  But I won't be able to do any of that if
> tsvectors are required to have one and only one configuration per
> column.
>
> Anyway, I felt I needed to provide some outside perspective to this,
> as a user, since it seems that the external viewpoint (my particular
> viewpoint, at least) was missing from the discussion.

This is *extremely* useful. I think it's precisely what we've been missing so
far. At least, what I've been missing.

So the question is what exactly happens in this case? If I search for "the"
does that mean it will ignore matches in English where that's a stop-word but
find me books on tea in French? Is that what I should expect to happen? What
if I search for "earl and the"? Does that find me French books on Earl Grey
tea but English books on all earls?
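
To make the question concrete, here is a toy Python model of a "mixed" index. It is only a sketch and does not claim to describe what tsearch actually does. The stop-word lists, the whitespace tokenizer, the accent stripping (so that "thé" is stored as "the"), and the names `lexemes`, `build_index` and `search` are all assumptions made for illustration. The "and" in the query is treated as the boolean operator, so the query is written as "earl the".

```python
import unicodedata

# Hypothetical, heavily abbreviated stop-word lists; real configurations
# use far larger dictionaries and also apply stemming.
STOP_WORDS = {
    "english": {"the", "and", "a", "of", "on"},
    "french": {"le", "la", "les", "et", "de", "sur"},
}


def normalize(word):
    """Lower-case and strip accents (an assumption: 'thé' becomes 'the')."""
    decomposed = unicodedata.normalize("NFD", word.lower())
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def lexemes(text, config):
    """Split on whitespace and drop the stop words of one configuration."""
    words = (normalize(w) for w in text.split())
    return {w for w in words if w not in STOP_WORDS[config]}


def build_index(records):
    """Index each (title, language) record with its own configuration,
    as the per-record trigger in the quoted message would."""
    return [(title, lexemes(title, lang)) for title, lang in records]


def search(index, query, config):
    """AND together the query terms that survive the query configuration."""
    terms = lexemes(query, config)
    if not terms:
        return []  # every term was a stop word, nothing left to match
    return [title for title, lex in index if terms <= lex]


RECORDS = [
    ("The Earl of Essex", "english"),
    ("Le thé Earl Grey", "french"),
]
INDEX = build_index(RECORDS)

for config in ("english", "french"):
    print(config, search(INDEX, "the", config))
    print(config, search(INDEX, "earl the", config))
```

In this model the answer depends entirely on which configuration parses the query: under the English list "the" vanishes from the query, while under the French list it survives and can only match the French record, because the English record dropped it at index time.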

What happens if I use the same operator directly on the text column? Or
perhaps it's not even possible to specify stop-words when operating on a text
column? Should it be?

--  Gregory Stark EnterpriseDB          http://www.enterprisedb.com

