Re: tsearch2, large data and indexes - Mailing list pgsql-performance

From Heikki Linnakangas
Subject Re: tsearch2, large data and indexes
Date
Msg-id 5357B95B.2050008@vmware.com
In response to Re: tsearch2, large data and indexes  (Ivan Voras <ivoras@freebsd.org>)
List pgsql-performance
On 04/22/2014 10:57 AM, Ivan Voras wrote:
> On 22 April 2014 08:40, Heikki Linnakangas <hlinnakangas@vmware.com> wrote:
>> On 04/20/2014 02:15 AM, Ivan Voras wrote:
>>> More details: after thinking about it some more, it might have
>>> something to do with tsearch2 and indexes: the large data in this case
>>> is a tsvector, indexed with GIN, and the query plan involves a
>>> re-check condition.
>>>
>>> The query is of the form:
>>> SELECT simple_fields FROM table WHERE fts @@ to_tsquery('...').
>>>
>>> Does the "re-check condition" mean that the original tsvector data is
>>> always read from the table in addition to the index?
>>
>> Yes, if the re-check condition involves the fts column. I don't see why you
>> would have a re-check condition with a query like that, though. Are there
>> some other WHERE-conditions that you didn't show us?
>
> Yes, I've read about tsearch2 and GIN indexes and there shouldn't be a
> recheck condition - but there is.
> This is the query:
>
> SELECT documents.id, title, raw_data, q, ts_rank(fts_data, q, 4) AS
> rank, html_filename
>              FROM documents, to_tsquery('document') AS q
>              WHERE fts_data @@ q
>           ORDER BY rank DESC  LIMIT 25;

It's the ranking that's causing the detoasting. "ts_rank(fts_data, q,
4)" has to fetch the contents of the fts_data column.

Sorry, I was confused earlier: the "Recheck Cond:" line is always there
in the EXPLAIN output of bitmap index scans, even if the recheck
condition is never executed at runtime. The executor has to be prepared
to run the recheck condition in case the bitmap grows large enough to
become "lossy", at which point it stores only the numbers of the pages
that contain matching tuples, not the individual tuples.

- Heikki
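
The exact-versus-lossy distinction described above can be illustrated with a
small toy model. This is only a sketch and not PostgreSQL's implementation:
the names build_bitmap, heap_scan and max_entries are hypothetical, the "heap"
is a plain dictionary, and the model degrades the whole bitmap at once,
whereas the real executor is more selective about which pages it degrades.

```python
from collections import defaultdict


def build_bitmap(matches, max_entries):
    """Toy bitmap builder (hypothetical, not PostgreSQL code).

    matches: iterable of (page, offset) pairs reported by the index.
    Returns (exact, lossy): exact maps page -> set of offsets,
    lossy is a set of page numbers with no per-tuple information.
    """
    exact = defaultdict(set)
    for page, offset in matches:
        exact[page].add(offset)
    total = sum(len(offsets) for offsets in exact.values())
    if total > max_entries:
        # Over budget: keep only the page numbers, drop the offsets.
        return {}, set(exact)
    return dict(exact), set()


def heap_scan(heap, exact, lossy, cond):
    """Fetch rows for the bitmap; recheck cond only on lossy pages.

    heap: dict mapping page -> {offset: row}.
    Returns (rows, rechecks), where rechecks counts cond evaluations.
    """
    rows, rechecks = [], 0
    for page in sorted(set(exact) | lossy):
        if page in lossy:
            # Only the page is known, so every tuple on it must be tested.
            for _, row in sorted(heap[page].items()):
                rechecks += 1
                if cond(row):
                    rows.append(row)
        else:
            # Exact entries point straight at the matching tuples.
            for offset in sorted(exact[page]):
                rows.append(heap[page][offset])
    return rows, rechecks


# Made-up data: two pages with two tuples each.
heap = {
    0: {1: "document one", 2: "unrelated"},
    1: {1: "another document", 2: "other"},
}


def cond(row):
    return "document" in row


matches = [(p, o) for p, tuples in heap.items()
           for o, row in tuples.items() if cond(row)]

rows_exact, n_exact = heap_scan(heap, *build_bitmap(matches, 10), cond)
rows_lossy, n_lossy = heap_scan(heap, *build_bitmap(matches, 1), cond)
print(n_exact, n_lossy)
```

Both scans return the same rows. Only the lossy one evaluates the condition
against the heap tuples, which is why the plan always has to carry a recheck
condition even when it ends up never being run.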

