Re: PostgreSQL GIN index not used when ts_query language is fetched from a column - Mailing list pgsql-general

From Jeff Janes
Subject Re: PostgreSQL GIN index not used when ts_query language is fetched from a column
Date
Msg-id CAMkU=1w_s+2KsK5GqPXMMEog+WjG86HtqGe=_txQXvvC_ShfjQ@mail.gmail.com
Whole thread Raw
In response to PostgreSQL GIN index not used when ts_query language is fetched from a column  (Jaap Roes <jroes@leukeleu.nl>)
List pgsql-general
On Thu, Sep 15, 2016 at 5:46 AM, Jaap Roes <jroes@leukeleu.nl> wrote:
I've got a table that stores some multilingual content:

    CREATE TABLE search (
      content text NOT NULL,
      language regconfig NOT NULL,
      fulltext tsvector
    );
    CREATE INDEX search_fulltext ON search USING GIN(fulltext);

    INSERT INTO search (language, content) VALUES
      ('dutch', 'Als achter vliegen vliegen vliegen vliegen vliegen vliegen achterna'),
      ('dutch', 'Langs de koele kali liep een kale koeli met een kilo kali op zijn kale koeli-kop.'),
      ('dutch', 'Moeder sneed zeven scheve sneden brood'),
      ('english', 'I saw Susie sitting in a shoe shine shop. Where she sits she shines, and where she shines she sits.'),
      ('english', 'How can a clam cram in a clean cream can?'),
      ('english', 'Can you can a can as a canner can can a can?');

    UPDATE search SET fulltext = to_tsvector(language, content);

To make sure I always search in the correct language I use these queries:

    SELECT FROM search WHERE fulltext @@ to_tsquery(language, 'shine');
    (1 row)

I don't think that what you want is possible to do in a clean way.  It can't know what the language for a row is until it finds the row, but it can't find the row using an index until it runs to_tsquery, and it can't run to_tsquery until it knows the language.

You would probably need to run the query once for each language, and filter out the results which it found under the 'wrong' language.

SELECT * FROM search WHERE fulltext @@ to_tsquery('dutch', 'shine') and language = 'dutch'::regconfig
union all
SELECT * FROM search WHERE fulltext @@ to_tsquery('english', 'shine') and language = 'english'::regconfig;

If you have more than two languages, this will become tedious, and perhaps non-performant.  I am not aware of a more general solution, though.

It might help to build partial indexes on each language.

Cheers,

Jeff

pgsql-general by date:

Previous
From: Arun Rangarajan
Date:
Subject: Unable to create oracle_fdw (foreign data wrapper) extension
Next
From: John R Pierce
Date:
Subject: Re: PgSQL versions supported on ubuntu 16 and debian 8