Re: - Mailing list pgsql-bugs

From Ronan Dunklau
Subject Re:
Date
Msg-id 2834175.e9J7NaK4W3@aivenronan
Whole thread Raw
In response to Re:  (Tom Lane <tgl@sss.pgh.pa.us>)
Responses Re: index cost estimation
List pgsql-bugs
I've taken a look at this, and can expose a minimal test case reproducing what 
I believe is the same problem, see the attached test.sql file for this.

The test case is a bit extreme, by setting random_page_cost so low, but the 
same thing can happen with default values of random_page_cost given a 
significantly high number of loops.

Running the attached test case, I get the following:

                                                          QUERY PLAN
      
 

-------------------------------------------------------------------------------------------------------------------------------
 Nested Loop  (cost=0.01..711.84 rows=20000 width=13) (actual 
time=0.240..6521.129 rows=20000 loops=1)
   Buffers: shared hit=445110
   ->  Seq Scan on t2  (cost=0.00..307.00 rows=20000 width=9) (actual 
time=0.007..2.297 rows=20000 loops=1)
         Buffers: shared hit=107
   ->  Bitmap Heap Scan on t1  (cost=0.01..0.02 rows=1 width=4) (actual 
time=0.325..0.325 rows=1 loops=20000)
         Recheck Cond: (id = t2.t1_id)
         Rows Removed by Index Recheck: 0
         Heap Blocks: exact=20013
         Buffers: shared hit=445003
         ->  Bitmap Index Scan on t1_gin_index  (cost=0.00..0.01 rows=1 
width=0) (actual time=0.324..0.324 rows=1 loops=20000)
               Index Cond: (id = t2.t1_id)
               Buffers: shared hit=424990
 Planning:
   Buffers: shared hit=79
 Planning Time: 0.325 ms
 Execution Time: 6522.759 ms

If I drop the gin index, the btree index is used:

                                                           QUERY PLAN
       
 

--------------------------------------------------------------------------------------------------------------------------------
 Nested Loop  (cost=0.29..1249.56 rows=20000 width=13) (actual 
time=0.018..16.570 rows=20000 loops=1)
   Buffers: shared hit=4607
   ->  Seq Scan on t2  (cost=0.00..307.00 rows=20000 width=9) (actual 
time=0.007..1.717 rows=20000 loops=1)
         Buffers: shared hit=107
   ->  Memoize  (cost=0.29..0.31 rows=1 width=4) (actual time=0.000..0.000 
rows=1 loops=20000)
         Cache Key: t2.t1_id
         Cache Mode: logical
         Hits: 18500  Misses: 1500  Evictions: 0  Overflows: 0  Memory Usage: 
154kB
         Buffers: shared hit=4500
         ->  Index Only Scan using t1_pkey on t1  (cost=0.28..0.30 rows=1 
width=4) (actual time=0.002..0.002 rows=1 loops=1500)
               Index Cond: (id = t2.t1_id)
               Heap Fetches: 1500
               Buffers: shared hit=4500
 Planning:
   Buffers: shared hit=10
 Planning Time: 0.198 ms
 Execution Time: 17.683 ms

Looking into it, it looks like we are not charging a cpu "descent" cost for 
the entry tree of the gin index, which we do for the btree index. In general, 
it does not pose a problem since IO costs are far greater than cpu costs. But 
when the index scan is inside a nestloop, we account for cache effect and 
amortize the cost of IO over the number of outer scans, which reduces its 
relative importance significantly. In that case, the index scan on the gin 
index appears much cheaper, as the constant cpu cost is not taken into 
account. 

I'm not sure how we should account for the cost of the descent, but  I believe 
it should be more than free. Perhaps we could devise a similar strategy using 
the estimated numEntryPages ?

-- 
Ronan Dunklau
Attachment

pgsql-bugs by date:

Previous
From: Tom Lane
Date:
Subject: Re: BUG #17539: Assert after CREATE OPERATOR CLASS command
Next
From: Tom Lane
Date:
Subject: Re: index cost estimation