bitmap scan much slower than index scan, hash_search_with_hash_value - Mailing list pgsql-hackers

From Sergey Koposov
Subject bitmap scan much slower than index scan, hash_search_with_hash_value
Date
Msg-id alpine.LRH.2.02.1209020540470.25232@calx046.ast.cam.ac.uk
Whole thread Raw
Responses Re: bitmap scan much slower than index scan, hash_search_with_hash_value  (Pavel Stehule <pavel.stehule@gmail.com>)
Re: bitmap scan much slower than index scan, hash_search_with_hash_value  (Peter Geoghegan <peter@2ndquadrant.com>)
List pgsql-hackers
Hi,

I'm experiencing the case when bitmap scan is ~ 70 times slower than 
index scan which seems to be caused by 1) very big table 2) some hash 
search logic (hash_search_with_hash_value )

Here is the explain analyze of the query with bitmap scans allowed:

wsdb=> explain analyze select * from test as t, crts.data as d1                where d1.objid=t.objid and d1.mjd=t.mjd
limit10000;                                                                      QUERY PLAN
 

-------------------------------------------------------------------------------------------------------------------------------------------------------
Limit (cost=11514.04..115493165.44 rows=10000 width=68) (actual time=27.512..66620.231 rows=10000 loops=1)   ->  Nested
Loop (cost=11514.04..1799585184.18 rows=155832 width=68) (actual time=27.511..66616.807 rows=10000 loops=1)         ->
SeqScan on test t  (cost=0.00..2678.40 rows=156240 width=28) (actual time=0.010..4.685 rows=11456 loops=1)         ->
BitmapHeap Scan on data d1  (cost=11514.04..11518.05 rows=1 width=40) (actual time=5.807..5.807 rows=1 loops=11456)
         Recheck Cond: ((mjd = t.mjd) AND (objid = t.objid))               ->  BitmapAnd  (cost=11514.04..11514.04
rows=1width=0) (actual time=5.777..5.777 rows=0 loops=11456)                     ->  Bitmap Index Scan on data_mjd_idx
(cost=0.00..2501.40rows=42872 width=0) (actual time=3.920..3.920 rows=22241 loops=11456)
IndexCond: (mjd = t.mjd)                     ->  Bitmap Index Scan on data_objid_idx  (cost=0.00..8897.90 rows=415080
width=0)(actual time=0.025..0.025 rows=248 loops=11456)                           Index Cond: (objid = t.objid) Total
runtime:66622.026 ms
 
(11 rows)

Here is the output when bitmap scans are disabled:
             QUERY PLAN                                                                     QUERY PLAN 
 

----------------------------------------------------------------------------------------------------------------------------------------------------
Limit (cost=0.00..329631941.65 rows=10000 width=68) (actual time=0.082..906.876 rows=10000 loops=1)   ->  Nested Loop
(cost=0.00..4979486036.95rows=151062 width=68) (actual time=0.081..905.683 rows=10000 loops=1)         Join Filter:
(t.mjd= d1.mjd)         ->  Seq Scan on test t  (cost=0.00..2632.77 rows=151677 width=28) (actual time=0.009..1.679
rows=11456loops=1)         ->  Index Scan using data_objid_idx on data d1  (cost=0.00..26603.32 rows=415080 width=40)
(actualtime=0.010..0.050 rows=248 loops=11456)               Index Cond: (objid = t.objid) Total runtime: 907.462 ms
 

When the bitmap scans are enabled the "prof" of postgres shows    47.10%  postmaster  postgres           [.]
hash_search_with_hash_value           |            --- hash_search_with_hash_value
 
    11.06%  postmaster  postgres           [.] hash_seq_search            |            --- hash_seq_search
     6.95%  postmaster  postgres           [.] hash_any            |            --- hash_any
     5.17%  postmaster  postgres           [.] _bt_checkkeys            |            --- _bt_checkkeys
     4.07%  postmaster  postgres           [.] tbm_add_tuples            |            --- tbm_add_tuples
     3.41%  postmaster  postgres           [.] hash_search            |            --- hash_search


And the last note is that the crts.data table which is being bitmap scanned is 
a 1.1Tb table with ~ 20e9 rows. My feeling is that the bitmap index scan code
is somehow unprepared to combine two bitmaps for such a big table, and this
leads to the terrible performance.

Regards,    Sergey

PS Here are the schemas of the tables, just in case:
wsdb=> \d test          Table "koposov.test" Column  |       Type       | Modifiers
---------+------------------+----------- mjd     | double precision | fieldid | bigint           | intmag  | integer
     | objid   | bigint           |
 

wsdb=> \d crts.data           Table "crts.data" Column |       Type       | Modifiers
--------+------------------+----------- objid  | bigint           | mjd    | double precision | mag    | real
 | emag   | real             | ra     | double precision | dec    | double precision |
 
Indexes:    "data_mjd_idx" btree (mjd) WITH (fillfactor=100)    "data_objid_idx" btree (objid) WITH (fillfactor=100)
"data_q3c_ang2ipix_idx"btree (q3c_ang2ipix(ra, "dec")) WITH (fillfactor=100)
 

PPS shared_buffers=10GB, work_mem=1GB
All the test shown here were don in fully cached regime.

PPS I can believe that what I'm seeing is a feature, not a bug of bitmap scans,
and I can live with disabling them, but I still thought it's worth reporting.


*****************************************************
Sergey E. Koposov, PhD, Research Associate
Institute of Astronomy, University of Cambridge
Madingley road, CB3 0HA, Cambridge, UK
Tel: +44-1223-337-551



pgsql-hackers by date:

Previous
From: Pavel Stehule
Date:
Subject: Fwd: PATCH: psql boolean display
Next
From: Pavel Stehule
Date:
Subject: Re: bitmap scan much slower than index scan, hash_search_with_hash_value