Re: not using my GIN index in JOIN expression - Mailing list pgsql-performance
From | Jean-Max Reymond |
---|---|
Subject | Re: not using my GIN index in JOIN expression |
Date | |
Msg-id | 530F4FC5.9040100@free.fr Whole thread Raw |
In response to | Re: not using my GIN index in JOIN expression (Heikki Linnakangas <hlinnakangas@vmware.com>) |
List | pgsql-performance |
Le 27/02/2014 15:19, Heikki Linnakangas a écrit : > On 02/27/2014 04:06 PM, Jean-Max Reymond wrote: >> I am running the last version of PostgreSQL 9.3.3 >> I have two tables detm and corm and a lot of datas in the column >> cormdata of corm table (1.4 GB). >> >> I have a GIN index on cormdata: >> CREATE INDEX ix_corm_fulltext_cormdata ON corm >> USING gin (to_tsvector('french'::regconfig, cormdata)) >> WHERE cormishtml IS FALSE AND length(cormdata) < 20000; >> >> select distinct b.detmmailid from corm b where >> (to_tsvector('french',b.cormdata) @@ to_tsquery('mauritanie') and >> b.cormishtml is false and length(b.cormdata) < 20000) >> is very fast and use the GIN index. >> >> "HashAggregate (cost=2027.72..2031.00 rows=328 width=52)" >> " -> Bitmap Heap Scan on corm b (cost=24.25..2026.35 rows=548 >> width=52)" >> " Recheck Cond: ((to_tsvector('french'::regconfig, cormdata) @@ >> to_tsquery('mauritanie'::text)) AND (cormishtml IS FALSE) AND >> (length(cormdata) < 20000))" >> " -> Bitmap Index Scan on ix_corm_fulltext_cormdata >> (cost=0.00..24.11 rows=548 width=0)" >> " Index Cond: (to_tsvector('french'::regconfig, cormdata) >> @@ to_tsquery('mauritanie'::text))" >> >> >> With a join an another table detm, GIN index is not used >> >> >> explain select distinct a.detmmailid from detm a JOIN corm b on >> a.detmmailid = b.detmmailid where ((to_tsvector('french',b.cormdata) @@ >> to_tsquery('mauritanie') and b.cormishtml is false and >> length(b.cormdata) < 20000) OR ( detmobjet ~* 'mauritanie' )) >> >> "HashAggregate (cost=172418.27..172423.98 rows=571 width=52)" >> " -> Hash Join (cost=28514.92..172416.85 rows=571 width=52)" >> " Hash Cond: (b.detmmailid = a.detmmailid)" >> " Join Filter: (((to_tsvector('french'::regconfig, b.cormdata) @@ >> to_tsquery('mauritanie'::text)) AND (b.cormishtml IS FALSE) AND >> (length(b.cormdata) < 20000)) OR (a.detmobjet ~* 'mauritanie'::text))" >> " -> Seq Scan on corm b (cost=0.00..44755.07 rows=449507 >> width=689)" >> " -> Hash (cost=19322.74..19322.74 rows=338574 width=94)" >> " -> Seq Scan on detm a (cost=0.00..19322.74 rows=338574 >> width=94)" >> >> >> If I remove OR ( detmobjet ~* 'mauritanie' ) in the select, the GIN >> index is used >> explain select distinct a.detmmailid from detm a JOIN corm b on >> a.detmmailid = b.detmmailid where ((to_tsvector('french',b.cormdata) @@ >> to_tsquery('mauritanie') and b.cormishtml is false and >> length(b.cormdata) < 20000)) >> >> "HashAggregate (cost=4295.69..4301.17 rows=548 width=52)" >> " -> Nested Loop (cost=24.67..4294.32 rows=548 width=52)" >> " -> Bitmap Heap Scan on corm b (cost=24.25..2026.35 rows=548 >> width=52)" >> " Recheck Cond: ((to_tsvector('french'::regconfig, >> cormdata) @@ to_tsquery('mauritanie'::text)) AND (cormishtml IS FALSE) >> AND (length(cormdata) < 20000))" >> " -> Bitmap Index Scan on ix_corm_fulltext_cormdata >> (cost=0.00..24.11 rows=548 width=0)" >> " Index Cond: (to_tsvector('french'::regconfig, >> cormdata) @@ to_tsquery('mauritanie'::text))" >> " -> Index Only Scan using pkey_detm on detm a (cost=0.42..4.13 >> rows=1 width=52)" >> " Index Cond: (detmmailid = b.detmmailid)" >> >> How can i force the use of the GIN index ? >> thanks for your tips, > > The problem with the OR detmobject ~* 'mauritanie' restriction is that > the rows that match that condition cannot be found using the GIN index. > I think you'd want the system to fetch all the rows that match the other > condition using the GIN index, and do something else to find the other > rows. The planner should be able to do that if you rewrite the query as > a UNION: > > select a.detmmailid from detm a JOIN corm b on > a.detmmailid = b.detmmailid > where (to_tsvector('french',b.cormdata) @@ to_tsquery('mauritanie') and > b.cormishtml is false and length(b.cormdata) < 20000) > union > select a.detmmailid from detm a JOIN corm b on > a.detmmailid = b.detmmailid > where detmobjet ~* 'mauritanie' > > Note that that will not return rows in 'detm' that have no matching rows > in 'corm' table, even if they match the "detmobjet ~* 'mauritanie" > condition. That's what your original query also did, but if that's not > what you want, leave out the JOIN from the second part of the union. > > - Heikki It works great: thanks a lot :-) -- Jean-Max Reymond CKR Solutions Open Source http://www.ckr-solutions.com
pgsql-performance by date: