Bloom index cost model seems to be wrong - Mailing list pgsql-performance

From Thomas Kellerer
Subject Bloom index cost model seems to be wrong
Date
Msg-id dc54f6e8-c6bf-6869-08c2-de855de470ae@gmx.net
Whole thread Raw
Responses Re: Bloom index cost model seems to be wrong  (Tom Lane <tgl@sss.pgh.pa.us>)
List pgsql-performance
I stumbled upon this question:

    https://dba.stackexchange.com/questions/229413

in a nutshell: the bloom index is not used with the example from the manual. 

The bloom index is only used if either Seq Scan is disabled or if the random_page_cost is set to 1 (anything about 1
triggersa Seq Scan on my Windows laptop). 
 

If parallel execution is disabled, then the bloom index is only used if the random_page_cost is lower than 4. 

This does not use the index:

  set random_page_cost = 4; 
  set max_parallel_workers_per_gather=0;
  explain (analyze, buffers) 
  select * 
  from tbloom 
  where i2 = 898732 
    and i5 = 123451;

This uses the bloom index:

  set random_page_cost = 3.5; 
  set max_parallel_workers_per_gather=0;
  explain (analyze, buffers) 
  select * 
  from tbloom 
  where i2 = 898732 
    and i5 = 123451;

And this uses the index also: 

  set random_page_cost = 1; 
  explain (analyze, buffers) 
  select * 
  from tbloom 
  where i2 = 898732 
    and i5 = 123451;

This is the plan with when the index is used (either through "enable_seqscan = off" or "random_page_cost = 1")

Bitmap Heap Scan on tbloom  (cost=138436.69..138440.70 rows=1 width=24) (actual time=42.444..42.444 rows=0 loops=1)

 
  Recheck Cond: ((i2 = 898732) AND (i5 = 123451))

 
  Rows Removed by Index Recheck: 2400

 
  Heap Blocks: exact=2365

 
  Buffers: shared hit=21973

 
  ->  Bitmap Index Scan on bloomidx  (cost=0.00..138436.69 rows=1 width=0) (actual time=40.756..40.756 rows=2400
loops=1)
        Index Cond: ((i2 = 898732) AND (i5 = 123451))

 
        Buffers: shared hit=19608

 
Planning Time: 0.075 ms

 
Execution Time: 42.531 ms

 

And this is the plan when everything left at default settings:

Seq Scan on tbloom  (cost=0.00..133695.80 rows=1 width=24) (actual time=1220.116..1220.116 rows=0 loops=1)
  Filter: ((i2 = 898732) AND (i5 = 123451))                                                               
  Rows Removed by Filter: 10000000                                                                        
  Buffers: shared hit=4697 read=58998                                                                     
  I/O Timings: read=354.670                                                                               
Planning Time: 0.075 ms                                                                                   
Execution Time: 1220.144 ms                                                                               

Can this be considered a bug in the cost model of the bloom index implementation? 
Or is it expected that this is only used if random access is really cheap? 

Thomas




pgsql-performance by date:

Previous
From: Christoph Berg
Date:
Subject: Re: Performance regressions found using sqlfuzz
Next
From: Tom Lane
Date:
Subject: Re: Bloom index cost model seems to be wrong