BUG #18968: GiST Index Cost Estimation - Mailing list pgsql-bugs

From PG Bug reporting form
Subject BUG #18968: GiST Index Cost Estimation
Date
Msg-id 18968-eb7187b33887f562@postgresql.org
Whole thread Raw
Responses Re: BUG #18968: GiST Index Cost Estimation
List pgsql-bugs
The following bug has been logged on the website:

Bug reference:      18968
Logged by:          Victor Quartucio
Email address:      victorq@extrahop.com
PostgreSQL version: 17.5
Operating system:   Ubuntu 22.04
Description:

When a text column has both a B-tree unique index and a GiST gist_trgm_ops
index, sometimes the planner estimates a slightly lower cost to use the GiST
index for an IN/ANY query with a constant list, but the actual cost to use
the GiST index is orders of magnitude higher than the B-tree index, e.g. 2
min versus 6 ms in the example below. Is the GiST index cost estimate too
low for gist_trgm_ops? This seems similar to a relatively recent discussion
(2022) regarding GiN, GiST index cost (versus B-Tree) that led to a patch
https://www.postgresql.org/message-id/flat/3188617.44csPzL39Z%40aivenronan .
Further observations:
I have experimented with various settings as suggested with some success.
The parameters I experimented with include seq_page_cost, random_page_cost,
effective_cache_size, work_mem, effective_io_concurrency,
cpu_index_tuple_cost, default_statistics_target.  The only change that
produced a B-tree plan was increasing seq_page_cost to 4.0 while leaving
other settings at defaults. But a high seq_page_cost relative to default
random_page_cost/cpu costs doesn’t really make sense and is not recommended
for heavily-cached I/O performant systems (per the docs).
As the number of elements in the IN increases, the performance delta gets
worse. Interestingly, the B-Tree estimated cost gets progressively more
expensive when compared to GiST from 100 -> ~600 items. But, after 1000
items, the B-Tree is chosen.
 IN elts  GiST cost  btree cost  cost delta  GiST actual ms  btree actual ms
     100      814.5       828.5        14.0         20632.2              1.2
     300     2366.6      2392.6        26.0         60608.7              3.2
     600     4574.9      4606.9        32.0        121598.3              6.2
     800     5992.7      6008.7        16.0        161267.2              7.8
     950     7023.8      7024.8         1.0        190366.5              9.1
    1000     7368.6      7360.3        -8.3        201993.7              8.2
Also interestingly, decreasing both random_page_cost, and seq_page_cost to
0.9, the B-tree plan is not chosen until ~3600 items.
Thank you.
CREATE EXTENSION IF NOT EXISTS pg_trgm;
DROP TABLE IF EXISTS t1;
CREATE TABLE t1 (
    id SERIAL PRIMARY KEY,
    name TEXT NOT NULL
);
 -- Insert 600k random strings (~40-160 chars long) into t1
INSERT INTO t1 (name)
SELECT
    (
        SELECT string_agg(
            substr('abcdefghijklmnopqrstuvwxyz', floor(random() * 26 +
1)::int, 1),
            ''
        )
        FROM generate_series(1, floor(random() * (160 - 40 + 1) + 40)::int +
(s.i * 0))
    ) || '-' || s.i::text
FROM
    generate_series(1, 600000) AS s(i);
-- gist
CREATE INDEX t1_name_gist ON t1 USING gist (name gist_trgm_ops);
-- b-tree
ALTER TABLE t1 ADD CONSTRAINT t1_name_key UNIQUE (name);
ANALYZE t1;
-- hide output
\o /dev/null
-- get 600 names to query as a list of text literals
SELECT string_agg(format('%L', name), ', ') AS names_to_find_literal
FROM (SELECT name FROM t1 ORDER BY random() LIMIT 600);
\gset
\o
\echo 'Plan when both GiST and B-Tree indexes are available'
EXPLAIN (ANALYZE, BUFFERS, SETTINGS)
SELECT * FROM t1
WHERE name IN (:names_to_find_literal);
                                                             QUERY PLAN

-------------------------------------------------------------------------------------------------------------------------------
Bitmap Heap Scan on t1  (cost=2640.15..4574.92 rows=600 width=112) (actual
time=121595.408..121598.273 rows=600 loops=1)
   Recheck Cond: (name = ANY ('{omitted_for_brevity}'::text[]))
   Heap Blocks: exact=588
   Buffers: shared hit=1782905 read=27182961 written=22079
   ->  Bitmap Index Scan on t1_name_gist  (cost=0.00..2638.50 rows=600
width=0) (actual time=121595.266..121595.267 rows=600 loops=1)
         Index Cond: (name = ANY ('{omitted_for_brevity}'::text[]))
         Buffers: shared hit=1782905 read=27182373 written=22079
 Planning:
   Buffers: shared hit=10 read=1
 Planning Time: 0.370 ms
 Execution Time: 121598.341 ms
(11 rows)
\echo 'Plan when only B-Tree is available'
DROP INDEX t1_name_gist;
EXPLAIN (ANALYZE, BUFFERS, SETTINGS)
SELECT * FROM t1
WHERE name IN (:names_to_find_literal);
                                                             QUERY PLAN

-------------------------------------------------------------------------------------------------------------------------------
Bitmap Heap Scan on t1  (cost=2672.15..4606.92 rows=600 width=112) (actual
time=5.722..6.192 rows=600 loops=1)
   Recheck Cond: (name = ANY ('{omitted_for_brevity}'::text[]))
   Heap Blocks: exact=588
   Buffers: shared hit=2109 read=807
   ->  Bitmap Index Scan on t1_name_key  (cost=0.00..2670.50 rows=600
width=0) (actual time=5.651..5.651 rows=600 loops=1)
         Index Cond: (name = ANY ('{omitted_for_brevity}'::text[]))
         Buffers: shared hit=1521 read=807
 Planning:
   Buffers: shared hit=3 read=2 dirtied=1
 Planning Time: 0.396 ms
 Execution Time: 6.229 ms
(11 rows)


pgsql-bugs by date:

Previous
From: Tom Lane
Date:
Subject: Re: Logical replication 'invalid memory alloc request size 1585837200' after upgrading to 17.5
Next
From: Daniil Davydov
Date:
Subject: Re: BUG #16961: Could not access status of transaction