Thread: How to avoid a GIN recheck condition

How to avoid a GIN recheck condition

From
Alexis
Date:
Hi,

I have a GIN index but the planner rechecks the condition. When many rows
are foud in the index that is very slow:

EXPLAIN SELECT  "tracks".* FROM "tracks"  WHERE (to_tsvector('simple',
normalized_artist || ' ' || normalized_title) @@ to_tsquery('love'));

QUERY PLAN

------------------------------------------------------------------------------------------------------------------------------------------------------------
 Bitmap Heap Scan on tracks  (cost=471.80..33026.91 rows=13908 width=129)
   Recheck Cond: (to_tsvector('simple'::regconfig,
(((normalized_artist)::text || ' '::text) || (normalized_title)::text)) @@
to_tsquery('love'::text))
   ->  Bitmap Index Scan on tracks_gin_keywords  (cost=0.00..468.32
rows=13908 width=0)
         Index Cond: (to_tsvector('simple'::regconfig,
(((normalized_artist)::text || ' '::text) || (normalized_title)::text)) @@
to_tsquery('love'::text))


The documentation says that recheck are only required for weight:
http://www.postgresql.org/docs/current/static/textsearch-indexes.html

    "GIN indexes store only the words (lexemes) of tsvector values, and not
their weight labels. Thus a table row recheck is needed when using a query
that involves weights.)"

I don't need weight because I order results on another column. How can I get
read of that recheck condition?

Cheers,
Alexis.


PS, table definition:

                                         Table "public.tracks"
      Column       |            Type             |
Modifiers
-------------------+-----------------------------+-----------------------------------------------------
 id                | integer                     | not null default
nextval('tracks_id_seq'::regclass)
 created_at        | timestamp without time zone |
 updated_at        | timestamp without time zone |
 title             | character varying(255)      | not null
 artist            | character varying(255)      | not null
 normalized_title  | character varying(255)      |
 normalized_artist | character varying(255)      |
 released_on       | date                        |
 resolved_at       | timestamp without time zone |
 permalink         | character varying(255)      |
 featuring         | character varying(255)      |
Indexes:
    "tracks_pkey" PRIMARY KEY, btree (id)
    "index_tracks_on_permalink" UNIQUE, btree (permalink)
    "index_tracks_on_normalized_artist_and_normalized_title" btree
(normalized_artist, normalized_title)
    "tracks_gin_keywords" gin (to_tsvector('simple'::regconfig,
(normalized_artist::text || ' '::text) || normalized_title::text))




--
View this message in context: http://postgresql.nabble.com/How-to-avoid-a-GIN-recheck-condition-tp5827995.html
Sent from the PostgreSQL - general mailing list archive at Nabble.com.


Re: How to avoid a GIN recheck condition

From
Tom Lane
Date:
Alexis <alexis@bernard.io> writes:
> I have a GIN index but the planner rechecks the condition. When many rows
> are foud in the index that is very slow:

You're assuming a fact not in evidence, namely that the recheck condition
is being used at runtime.  The planner will *always* attach a recheck
condition to a bitmap heap scan, but it only gets used if
(1) the index opclass reports that its result is lossy, or
(2) the bitmap bloats to the point where it has to be stored lossily.

I see no weights in your query so (1) isn't happening, and it seems
unlikely that (2) is happening if the rowcount estimate is on target.
Lacking EXPLAIN ANALYZE results, it's hard to speculate further.

            regards, tom lane


Re: How to avoid a GIN recheck condition

From
Alexis
Date:
Thanks for your detailed answer !

I miss understood the query plan. Indeed the slowing down comes from my
order clause, which I didn't include in my previous email.

Still loving PostgreSQL from the first day <3



--
View this message in context: http://postgresql.nabble.com/How-to-avoid-a-GIN-recheck-condition-tp5827995p5828501.html
Sent from the PostgreSQL - general mailing list archive at Nabble.com.