Re: - Mailing list pgsql-bugs
From | Ronan Dunklau |
---|---|
Subject | Re: |
Date | |
Msg-id | 2834175.e9J7NaK4W3@aivenronan Whole thread Raw |
In response to | Re: (Tom Lane <tgl@sss.pgh.pa.us>) |
Responses |
Re: index cost estimation
|
List | pgsql-bugs |
I've taken a look at this, and can expose a minimal test case reproducing what I believe is the same problem, see the attached test.sql file for this. The test case is a bit extreme, by setting random_page_cost so low, but the same thing can happen with default values of random_page_cost given a significantly high number of loops. Running the attached test case, I get the following: QUERY PLAN ------------------------------------------------------------------------------------------------------------------------------- Nested Loop (cost=0.01..711.84 rows=20000 width=13) (actual time=0.240..6521.129 rows=20000 loops=1) Buffers: shared hit=445110 -> Seq Scan on t2 (cost=0.00..307.00 rows=20000 width=9) (actual time=0.007..2.297 rows=20000 loops=1) Buffers: shared hit=107 -> Bitmap Heap Scan on t1 (cost=0.01..0.02 rows=1 width=4) (actual time=0.325..0.325 rows=1 loops=20000) Recheck Cond: (id = t2.t1_id) Rows Removed by Index Recheck: 0 Heap Blocks: exact=20013 Buffers: shared hit=445003 -> Bitmap Index Scan on t1_gin_index (cost=0.00..0.01 rows=1 width=0) (actual time=0.324..0.324 rows=1 loops=20000) Index Cond: (id = t2.t1_id) Buffers: shared hit=424990 Planning: Buffers: shared hit=79 Planning Time: 0.325 ms Execution Time: 6522.759 ms If I drop the gin index, the btree index is used: QUERY PLAN -------------------------------------------------------------------------------------------------------------------------------- Nested Loop (cost=0.29..1249.56 rows=20000 width=13) (actual time=0.018..16.570 rows=20000 loops=1) Buffers: shared hit=4607 -> Seq Scan on t2 (cost=0.00..307.00 rows=20000 width=9) (actual time=0.007..1.717 rows=20000 loops=1) Buffers: shared hit=107 -> Memoize (cost=0.29..0.31 rows=1 width=4) (actual time=0.000..0.000 rows=1 loops=20000) Cache Key: t2.t1_id Cache Mode: logical Hits: 18500 Misses: 1500 Evictions: 0 Overflows: 0 Memory Usage: 154kB Buffers: shared hit=4500 -> Index Only Scan using t1_pkey on t1 (cost=0.28..0.30 rows=1 width=4) (actual time=0.002..0.002 rows=1 loops=1500) Index Cond: (id = t2.t1_id) Heap Fetches: 1500 Buffers: shared hit=4500 Planning: Buffers: shared hit=10 Planning Time: 0.198 ms Execution Time: 17.683 ms Looking into it, it looks like we are not charging a cpu "descent" cost for the entry tree of the gin index, which we do for the btree index. In general, it does not pose a problem since IO costs are far greater than cpu costs. But when the index scan is inside a nestloop, we account for cache effect and amortize the cost of IO over the number of outer scans, which reduces its relative importance significantly. In that case, the index scan on the gin index appears much cheaper, as the constant cpu cost is not taken into account. I'm not sure how we should account for the cost of the descent, but I believe it should be more than free. Perhaps we could devise a similar strategy using the estimated numEntryPages ? -- Ronan Dunklau
Attachment
pgsql-bugs by date: