Thread: GIN INdex is not used with && operator for a text array index
HI Team
db=# EXPLAIN (costs, buffers, verbose, analyze) select similarity('jattan kangra riyali 144 hp 176058', complete_address) from address18 where pentgram @> show_trgm('jattan kangra riyali 144 hp 176058');
Bitmap Heap Scan on public.address18 (cost=261.25..262.52 rows=1 width=4) (actual time=58.992..58.994 rows=1 loops=1)
Output: similarity('*****'::text, complete_address)
Recheck Cond: (address18.pentgram @> '{******}'::text[])
Heap Blocks: exact=1
Buffers: shared hit=1483
-> Bitmap Index Scan on pentgram_idx (cost=0.00..261.25 rows=1 width=0) (actual time=58.960..58.960 rows=1 loops=1)
Index Cond: (address18.pentgram @> '{***}'::text[])
Buffers: shared hit=1482
Query Identifier: -126591413296272164
Planning:
Buffers: shared hit=1
Planning Time: 0.679 ms
Execution Time: 60.373 ms
(13 rows)
when Operator '&&' is used the index is used and the execution time is 60ms.
db=# EXPLAIN (costs, buffers, verbose, analyze) select similarity('jattan kangra riyali 144 hp 176058', complete_address) from address18 where pentgram && show_trgm('jattan kangra riyali 144 hp 176058');
Seq Scan on public.address18 (cost=0.00..77215.11 rows=247741 width=4) (actual time=0.063..1880.467 rows=247741 loops=1)
Output: similarity('****'::text, complete_address)
Filter: (address18.pentgram && '{"****}'::text[])
Buffers: shared hit=3592 read=69907
I/O Timings: shared/local read=267.274
Query Identifier: 2367846469053211383
Planning:
Buffers: shared hit=1
Planning Time: 0.386 ms
Execution Time: 1886.125 ms
(10 rows)
Sorry for the spam.
We have Postgres DB where the list of addresses are stored and for a given complete address
trigram of addresses are stored in a column which is a text array.
trigram of addresses are stored in a column which is a text array.
after looking at the list of operators that are available for gin index I decided to use array_ops operator.
select amop.amopopr::regoperator, amop.amopstrategy, opc.opcname from pg_opclass opc, pg_opfamily opf, pg_am am, pg_amop amop where opf.oid = opc.opcfamily and am.oid = opf.opfmethod and amop.amopfamily = opc.opcfamily and am.amname = 'gin' and amop.amoplefttype = opc.opcintype;
amopopr | amopstrategy | opcname
-------------------------+--------------+----------------
&&(anyarray,anyarray) | 1 | array_ops
@>(anyarray,anyarray) | 2 | array_ops
<@(anyarray,anyarray) | 3 | array_ops
=(anyarray,anyarray) | 4 | array_ops
CREATE INDEX pentgram_idx ON address18 USING GIN(pentgram array_ops);
amopopr | amopstrategy | opcname
-------------------------+--------------+----------------
&&(anyarray,anyarray) | 1 | array_ops
@>(anyarray,anyarray) | 2 | array_ops
<@(anyarray,anyarray) | 3 | array_ops
=(anyarray,anyarray) | 4 | array_ops
CREATE INDEX pentgram_idx ON address18 USING GIN(pentgram array_ops);
when Operator '@>' is used the index is used and the execution time is 60ms.
db=# EXPLAIN (costs, buffers, verbose, analyze) select similarity('jattan kangra riyali 144 hp 176058', complete_address) from address18 where pentgram @> show_trgm('jattan kangra riyali 144 hp 176058');
Bitmap Heap Scan on public.address18 (cost=261.25..262.52 rows=1 width=4) (actual time=58.992..58.994 rows=1 loops=1)
Output: similarity('*****'::text, complete_address)
Recheck Cond: (address18.pentgram @> '{******}'::text[])
Heap Blocks: exact=1
Buffers: shared hit=1483
-> Bitmap Index Scan on pentgram_idx (cost=0.00..261.25 rows=1 width=0) (actual time=58.960..58.960 rows=1 loops=1)
Index Cond: (address18.pentgram @> '{***}'::text[])
Buffers: shared hit=1482
Query Identifier: -126591413296272164
Planning:
Buffers: shared hit=1
Planning Time: 0.679 ms
Execution Time: 60.373 ms
(13 rows)
when Operator '&&' is used the index is used and the execution time is 60ms.
It is performing sequential scan which is not expected.
db=# EXPLAIN (costs, buffers, verbose, analyze) select similarity('jattan kangra riyali 144 hp 176058', complete_address) from address18 where pentgram && show_trgm('jattan kangra riyali 144 hp 176058');
Seq Scan on public.address18 (cost=0.00..77215.11 rows=247741 width=4) (actual time=0.063..1880.467 rows=247741 loops=1)
Output: similarity('****'::text, complete_address)
Filter: (address18.pentgram && '{"****}'::text[])
Buffers: shared hit=3592 read=69907
I/O Timings: shared/local read=267.274
Query Identifier: 2367846469053211383
Planning:
Buffers: shared hit=1
Planning Time: 0.386 ms
Execution Time: 1886.125 ms
(10 rows)
When i disable the sequential scan the execution time is increased significantly.
db=# EXPLAIN (costs, buffers, verbose, analyze) select similarity('jattan kangra riyali 144 hp 176058', complete_address) from address18 where pentgram && show_trgm('jattan kangra riyali 144 hp 176058');
QUERY PLAN
---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Bitmap Heap Scan on public.address18 (cost=2266.24..79481.36 rows=247741 width=4) (actual time=186.402..3285.090 rows=247741 loops=1)
Output: similarity('****'::text, complete_address)
Recheck Cond: (address18.pentgram && '{***********}'::text[])
Heap Blocks: exact=39632
Buffers: shared hit=1361 read=39155 written=6260
I/O Timings: shared/local read=981.127 write=544.491
-> Bitmap Index Scan on pentgram_idx (cost=0.00..2204.31 rows=247741 width=0) (actual time=182.462..182.463 rows=247741 loops=1)
Index Cond: (address18.pentgram && '{*****}'::text[])
Buffers: shared hit=884
Query Identifier: 2367846469053211383
Planning:
Buffers: shared hit=1
Planning Time: 6.707 ms
Execution Time: 3292.339 ms
(14 rows)
QUERY PLAN
---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Bitmap Heap Scan on public.address18 (cost=2266.24..79481.36 rows=247741 width=4) (actual time=186.402..3285.090 rows=247741 loops=1)
Output: similarity('****'::text, complete_address)
Recheck Cond: (address18.pentgram && '{***********}'::text[])
Heap Blocks: exact=39632
Buffers: shared hit=1361 read=39155 written=6260
I/O Timings: shared/local read=981.127 write=544.491
-> Bitmap Index Scan on pentgram_idx (cost=0.00..2204.31 rows=247741 width=0) (actual time=182.462..182.463 rows=247741 loops=1)
Index Cond: (address18.pentgram && '{*****}'::text[])
Buffers: shared hit=884
Query Identifier: 2367846469053211383
Planning:
Buffers: shared hit=1
Planning Time: 6.707 ms
Execution Time: 3292.339 ms
(14 rows)
Not expecting this behavior currently.
Few parameters in my configuration
postgresql version is 15
OS is RHEL 8.8
effective_cache_size | 131072
huge_pages | off
maintenance_work_mem | 327680
max_parallel_maintenance_workers | 1
max_parallel_workers | 2
max_parallel_workers_per_gather | 1 shared_buffers | 32768
max_parallel_workers | 2
max_parallel_workers_per_gather | 1 shared_buffers | 32768
Thanks
C.R.Bala
balasubramanian c r <crbs.siebel@gmail.com> writes: > when Operator '@>' is used the index is used and the execution time is 60ms. Yeah ... note that it's predicted to return just one row, and that guess is correct: > Bitmap Heap Scan on public.address18 (cost=261.25..262.52 rows=1 width=4) > (actual time=58.992..58.994 rows=1 loops=1) > when Operator '&&' is used the index is used and the execution time is 60ms. > It is performing sequential scan which is not expected. I do not see why you expect that. This case retrieves many more rows than the other one, both in the planner's estimate and in reality: > Seq Scan on public.address18 (cost=0.00..77215.11 rows=247741 width=4) > (actual time=0.063..1880.467 rows=247741 loops=1) An indexscan is not necessarily better than a seqscan for such cases... > When i disable the sequential scan the execution time is increased > significantly. ... so, indeed, the planner was correct to use a seqscan. > Not expecting this behavior currently. Your expectation is faulty. regards, tom lane