Re: not using my GIN index in JOIN expression - Mailing list pgsql-performance

From Jean-Max Reymond
Subject Re: not using my GIN index in JOIN expression
Date
Msg-id 530F4FC5.9040100@free.fr
Whole thread Raw
In response to Re: not using my GIN index in JOIN expression  (Heikki Linnakangas <hlinnakangas@vmware.com>)
List pgsql-performance
Le 27/02/2014 15:19, Heikki Linnakangas a écrit :
> On 02/27/2014 04:06 PM, Jean-Max Reymond wrote:
>> I am running the last version of PostgreSQL 9.3.3
>> I have two tables detm and corm and a lot of datas in the column
>> cormdata of corm table (1.4 GB).
>>
>> I have a GIN index on cormdata:
>> CREATE INDEX ix_corm_fulltext_cormdata  ON corm
>>     USING gin (to_tsvector('french'::regconfig, cormdata))
>>     WHERE cormishtml IS FALSE AND length(cormdata) < 20000;
>>
>> select distinct b.detmmailid from corm b where
>> (to_tsvector('french',b.cormdata) @@ to_tsquery('mauritanie') and
>> b.cormishtml is false and length(b.cormdata) < 20000)
>> is very fast and use the GIN index.
>>
>> "HashAggregate  (cost=2027.72..2031.00 rows=328 width=52)"
>> "  ->  Bitmap Heap Scan on corm b  (cost=24.25..2026.35 rows=548
>> width=52)"
>> "        Recheck Cond: ((to_tsvector('french'::regconfig, cormdata) @@
>> to_tsquery('mauritanie'::text)) AND (cormishtml IS FALSE) AND
>> (length(cormdata) < 20000))"
>> "        ->  Bitmap Index Scan on ix_corm_fulltext_cormdata
>> (cost=0.00..24.11 rows=548 width=0)"
>> "              Index Cond: (to_tsvector('french'::regconfig, cormdata)
>> @@ to_tsquery('mauritanie'::text))"
>>
>>
>> With a join an another table detm, GIN index is not used
>>
>>
>>    explain select distinct a.detmmailid from detm a  JOIN corm b on
>> a.detmmailid = b.detmmailid  where ((to_tsvector('french',b.cormdata) @@
>> to_tsquery('mauritanie') and b.cormishtml is false and
>> length(b.cormdata) < 20000)  OR ( detmobjet ~* 'mauritanie' ))
>>
>> "HashAggregate  (cost=172418.27..172423.98 rows=571 width=52)"
>> "  ->  Hash Join  (cost=28514.92..172416.85 rows=571 width=52)"
>> "        Hash Cond: (b.detmmailid = a.detmmailid)"
>> "        Join Filter: (((to_tsvector('french'::regconfig, b.cormdata) @@
>> to_tsquery('mauritanie'::text)) AND (b.cormishtml IS FALSE) AND
>> (length(b.cormdata) < 20000)) OR (a.detmobjet ~* 'mauritanie'::text))"
>> "        ->  Seq Scan on corm b  (cost=0.00..44755.07 rows=449507
>> width=689)"
>> "        ->  Hash  (cost=19322.74..19322.74 rows=338574 width=94)"
>> "              ->  Seq Scan on detm a  (cost=0.00..19322.74 rows=338574
>> width=94)"
>>
>>
>> If I remove   OR ( detmobjet ~* 'mauritanie' ) in the select, the GIN
>> index is used
>>    explain select distinct a.detmmailid from detm a  JOIN corm b on
>> a.detmmailid = b.detmmailid  where ((to_tsvector('french',b.cormdata) @@
>> to_tsquery('mauritanie') and b.cormishtml is false and
>> length(b.cormdata) < 20000))
>>
>> "HashAggregate  (cost=4295.69..4301.17 rows=548 width=52)"
>> "  ->  Nested Loop  (cost=24.67..4294.32 rows=548 width=52)"
>> "        ->  Bitmap Heap Scan on corm b  (cost=24.25..2026.35 rows=548
>> width=52)"
>> "              Recheck Cond: ((to_tsvector('french'::regconfig,
>> cormdata) @@ to_tsquery('mauritanie'::text)) AND (cormishtml IS FALSE)
>> AND (length(cormdata) < 20000))"
>> "              ->  Bitmap Index Scan on ix_corm_fulltext_cormdata
>> (cost=0.00..24.11 rows=548 width=0)"
>> "                    Index Cond: (to_tsvector('french'::regconfig,
>> cormdata) @@ to_tsquery('mauritanie'::text))"
>> "        ->  Index Only Scan using pkey_detm on detm a  (cost=0.42..4.13
>> rows=1 width=52)"
>> "              Index Cond: (detmmailid = b.detmmailid)"
>>
>> How can i force the use of the GIN index ?
>> thanks for your tips,
>
> The problem with the OR detmobject ~* 'mauritanie' restriction is that
> the rows that match that condition cannot be found using the GIN index.
> I think you'd want the system to fetch all the rows that match the other
> condition using the GIN index, and do something else to find the other
> rows. The planner should be able to do that if you rewrite the query as
> a UNION:
>
> select a.detmmailid from detm a JOIN corm b on
> a.detmmailid = b.detmmailid
> where (to_tsvector('french',b.cormdata) @@ to_tsquery('mauritanie') and
> b.cormishtml is false and length(b.cormdata) < 20000)
> union
> select a.detmmailid from detm a  JOIN corm b on
> a.detmmailid = b.detmmailid
> where detmobjet ~* 'mauritanie'
>
> Note that that will not return rows in 'detm' that have no matching rows
> in 'corm' table, even if they match the "detmobjet ~* 'mauritanie"
> condition. That's what your original query also did, but if that's not
> what you want, leave out the JOIN from the second part of the union.
>
> - Heikki

It works great: thanks a lot :-)

--
Jean-Max Reymond
CKR Solutions Open Source http://www.ckr-solutions.com


pgsql-performance by date:

Previous
From: Heikki Linnakangas
Date:
Subject: Re: not using my GIN index in JOIN expression
Next
From: Tom Coogan
Date:
Subject: Inefficient filter order in query plan