BUG #18968: GiST Index Cost Estimation - Mailing list pgsql-bugs
From | PG Bug reporting form |
---|---|
Subject | BUG #18968: GiST Index Cost Estimation |
Date | |
Msg-id | 18968-eb7187b33887f562@postgresql.org Whole thread Raw |
Responses |
Re: BUG #18968: GiST Index Cost Estimation
|
List | pgsql-bugs |
The following bug has been logged on the website: Bug reference: 18968 Logged by: Victor Quartucio Email address: victorq@extrahop.com PostgreSQL version: 17.5 Operating system: Ubuntu 22.04 Description: When a text column has both a B-tree unique index and a GiST gist_trgm_ops index, sometimes the planner estimates a slightly lower cost to use the GiST index for an IN/ANY query with a constant list, but the actual cost to use the GiST index is orders of magnitude higher than the B-tree index, e.g. 2 min versus 6 ms in the example below. Is the GiST index cost estimate too low for gist_trgm_ops? This seems similar to a relatively recent discussion (2022) regarding GiN, GiST index cost (versus B-Tree) that led to a patch https://www.postgresql.org/message-id/flat/3188617.44csPzL39Z%40aivenronan . Further observations: I have experimented with various settings as suggested with some success. The parameters I experimented with include seq_page_cost, random_page_cost, effective_cache_size, work_mem, effective_io_concurrency, cpu_index_tuple_cost, default_statistics_target. The only change that produced a B-tree plan was increasing seq_page_cost to 4.0 while leaving other settings at defaults. But a high seq_page_cost relative to default random_page_cost/cpu costs doesn’t really make sense and is not recommended for heavily-cached I/O performant systems (per the docs). As the number of elements in the IN increases, the performance delta gets worse. Interestingly, the B-Tree estimated cost gets progressively more expensive when compared to GiST from 100 -> ~600 items. But, after 1000 items, the B-Tree is chosen. IN elts GiST cost btree cost cost delta GiST actual ms btree actual ms 100 814.5 828.5 14.0 20632.2 1.2 300 2366.6 2392.6 26.0 60608.7 3.2 600 4574.9 4606.9 32.0 121598.3 6.2 800 5992.7 6008.7 16.0 161267.2 7.8 950 7023.8 7024.8 1.0 190366.5 9.1 1000 7368.6 7360.3 -8.3 201993.7 8.2 Also interestingly, decreasing both random_page_cost, and seq_page_cost to 0.9, the B-tree plan is not chosen until ~3600 items. Thank you. CREATE EXTENSION IF NOT EXISTS pg_trgm; DROP TABLE IF EXISTS t1; CREATE TABLE t1 ( id SERIAL PRIMARY KEY, name TEXT NOT NULL ); -- Insert 600k random strings (~40-160 chars long) into t1 INSERT INTO t1 (name) SELECT ( SELECT string_agg( substr('abcdefghijklmnopqrstuvwxyz', floor(random() * 26 + 1)::int, 1), '' ) FROM generate_series(1, floor(random() * (160 - 40 + 1) + 40)::int + (s.i * 0)) ) || '-' || s.i::text FROM generate_series(1, 600000) AS s(i); -- gist CREATE INDEX t1_name_gist ON t1 USING gist (name gist_trgm_ops); -- b-tree ALTER TABLE t1 ADD CONSTRAINT t1_name_key UNIQUE (name); ANALYZE t1; -- hide output \o /dev/null -- get 600 names to query as a list of text literals SELECT string_agg(format('%L', name), ', ') AS names_to_find_literal FROM (SELECT name FROM t1 ORDER BY random() LIMIT 600); \gset \o \echo 'Plan when both GiST and B-Tree indexes are available' EXPLAIN (ANALYZE, BUFFERS, SETTINGS) SELECT * FROM t1 WHERE name IN (:names_to_find_literal); QUERY PLAN ------------------------------------------------------------------------------------------------------------------------------- Bitmap Heap Scan on t1 (cost=2640.15..4574.92 rows=600 width=112) (actual time=121595.408..121598.273 rows=600 loops=1) Recheck Cond: (name = ANY ('{omitted_for_brevity}'::text[])) Heap Blocks: exact=588 Buffers: shared hit=1782905 read=27182961 written=22079 -> Bitmap Index Scan on t1_name_gist (cost=0.00..2638.50 rows=600 width=0) (actual time=121595.266..121595.267 rows=600 loops=1) Index Cond: (name = ANY ('{omitted_for_brevity}'::text[])) Buffers: shared hit=1782905 read=27182373 written=22079 Planning: Buffers: shared hit=10 read=1 Planning Time: 0.370 ms Execution Time: 121598.341 ms (11 rows) \echo 'Plan when only B-Tree is available' DROP INDEX t1_name_gist; EXPLAIN (ANALYZE, BUFFERS, SETTINGS) SELECT * FROM t1 WHERE name IN (:names_to_find_literal); QUERY PLAN ------------------------------------------------------------------------------------------------------------------------------- Bitmap Heap Scan on t1 (cost=2672.15..4606.92 rows=600 width=112) (actual time=5.722..6.192 rows=600 loops=1) Recheck Cond: (name = ANY ('{omitted_for_brevity}'::text[])) Heap Blocks: exact=588 Buffers: shared hit=2109 read=807 -> Bitmap Index Scan on t1_name_key (cost=0.00..2670.50 rows=600 width=0) (actual time=5.651..5.651 rows=600 loops=1) Index Cond: (name = ANY ('{omitted_for_brevity}'::text[])) Buffers: shared hit=1521 read=807 Planning: Buffers: shared hit=3 read=2 dirtied=1 Planning Time: 0.396 ms Execution Time: 6.229 ms (11 rows)
pgsql-bugs by date: