Re: bitmap scan much slower than index scan, hash_search_with_hash_value - Mailing list pgsql-hackers

From Pavel Stehule
Subject Re: bitmap scan much slower than index scan, hash_search_with_hash_value
Date
Msg-id CAFj8pRA3f19NsDd7rNRgJw9mLv=FEFXNQRJivR+Te-X1jOCY7A@mail.gmail.com
Whole thread Raw
In response to bitmap scan much slower than index scan, hash_search_with_hash_value  (Sergey Koposov <koposov@ast.cam.ac.uk>)
Responses Re: bitmap scan much slower than index scan, hash_search_with_hash_value
List pgsql-hackers
Hello

2012/9/2 Sergey Koposov <koposov@ast.cam.ac.uk>:
> Hi,
>
> I'm experiencing the case when bitmap scan is ~ 70 times slower than index
> scan which seems to be caused by 1) very big table 2) some hash search logic
> (hash_search_with_hash_value )
>
> Here is the explain analyze of the query with bitmap scans allowed:
>
> wsdb=> explain analyze select * from test as t, crts.data as d1
>                 where d1.objid=t.objid and d1.mjd=t.mjd limit 10000;
>                                                                       QUERY
> PLAN
>
-------------------------------------------------------------------------------------------------------------------------------------------------------
>  Limit  (cost=11514.04..115493165.44 rows=10000 width=68) (actual
> time=27.512..66620.231 rows=10000 loops=1)
>    ->  Nested Loop  (cost=11514.04..1799585184.18 rows=155832 width=68)
> (actual time=27.511..66616.807 rows=10000 loops=1)
>          ->  Seq Scan on test t  (cost=0.00..2678.40 rows=156240 width=28)
> (actual time=0.010..4.685 rows=11456 loops=1)
>          ->  Bitmap Heap Scan on data d1  (cost=11514.04..11518.05 rows=1
> width=40) (actual time=5.807..5.807 rows=1 loops=11456)
>                Recheck Cond: ((mjd = t.mjd) AND (objid = t.objid))
>                ->  BitmapAnd  (cost=11514.04..11514.04 rows=1 width=0)
> (actual time=5.777..5.777 rows=0 loops=11456)
>                      ->  Bitmap Index Scan on data_mjd_idx
> (cost=0.00..2501.40 rows=42872 width=0) (actual time=3.920..3.920 rows=22241
> loops=11456)
>                            Index Cond: (mjd = t.mjd)



>                      ->  Bitmap Index Scan on data_objid_idx
> (cost=0.00..8897.90 rows=415080 width=0) (actual time=0.025..0.025 rows=248
> loops=11456)

statistics on data_objid_idx  table are absolutly out - so planner
cannot find optimal plan

Regard

Pavel Stehule

>                            Index Cond: (objid = t.objid)
>  Total runtime: 66622.026 ms
> (11 rows)
>
> Here is the output when bitmap scans are disabled:
> QUERY PLAN
>                                                                      QUERY
> PLAN
>
----------------------------------------------------------------------------------------------------------------------------------------------------
>  Limit  (cost=0.00..329631941.65 rows=10000 width=68) (actual
> time=0.082..906.876 rows=10000 loops=1)
>    ->  Nested Loop  (cost=0.00..4979486036.95 rows=151062 width=68) (actual
> time=0.081..905.683 rows=10000 loops=1)
>          Join Filter: (t.mjd = d1.mjd)
>          ->  Seq Scan on test t  (cost=0.00..2632.77 rows=151677 width=28)
> (actual time=0.009..1.679 rows=11456 loops=1)
>          ->  Index Scan using data_objid_idx on data d1
> (cost=0.00..26603.32 rows=415080 width=40) (actual time=0.010..0.050
> rows=248 loops=11456)
>                Index Cond: (objid = t.objid)
>  Total runtime: 907.462 ms
>
> When the bitmap scans are enabled the "prof" of postgres shows
>     47.10%  postmaster  postgres           [.] hash_search_with_hash_value
>             |
>             --- hash_search_with_hash_value
>
>     11.06%  postmaster  postgres           [.] hash_seq_search
>             |
>             --- hash_seq_search
>
>      6.95%  postmaster  postgres           [.] hash_any
>             |
>             --- hash_any
>
>      5.17%  postmaster  postgres           [.] _bt_checkkeys
>             |
>             --- _bt_checkkeys
>
>      4.07%  postmaster  postgres           [.] tbm_add_tuples
>             |
>             --- tbm_add_tuples
>
>      3.41%  postmaster  postgres           [.] hash_search
>             |
>             --- hash_search
>
>
> And the last note is that the crts.data table which is being bitmap scanned
> is a 1.1Tb table with ~ 20e9 rows. My feeling is that the bitmap index scan
> code
> is somehow unprepared to combine two bitmaps for such a big table, and this
> leads to the terrible performance.
>
> Regards,
>         Sergey
>
> PS Here are the schemas of the tables, just in case:
> wsdb=> \d test
>           Table "koposov.test"
>  Column  |       Type       | Modifiers
> ---------+------------------+-----------
>  mjd     | double precision |
>  fieldid | bigint           |
>  intmag  | integer          |
>  objid   | bigint           |
>
> wsdb=> \d crts.data
>            Table "crts.data"
>  Column |       Type       | Modifiers
> --------+------------------+-----------
>  objid  | bigint           |
>  mjd    | double precision |
>  mag    | real             |
>  emag   | real             |
>  ra     | double precision |
>  dec    | double precision |
> Indexes:
>     "data_mjd_idx" btree (mjd) WITH (fillfactor=100)
>     "data_objid_idx" btree (objid) WITH (fillfactor=100)
>     "data_q3c_ang2ipix_idx" btree (q3c_ang2ipix(ra, "dec")) WITH
> (fillfactor=100)
>
> PPS shared_buffers=10GB, work_mem=1GB
> All the test shown here were don in fully cached regime.
>
> PPS I can believe that what I'm seeing is a feature, not a bug of bitmap
> scans,
> and I can live with disabling them, but I still thought it's worth
> reporting.
>
>
> *****************************************************
> Sergey E. Koposov, PhD, Research Associate
> Institute of Astronomy, University of Cambridge
> Madingley road, CB3 0HA, Cambridge, UK
> Tel: +44-1223-337-551
>
>
> --
> Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-hackers



pgsql-hackers by date:

Previous
From: Sergey Koposov
Date:
Subject: bitmap scan much slower than index scan, hash_search_with_hash_value
Next
From: Peter Geoghegan
Date:
Subject: Re: bitmap scan much slower than index scan, hash_search_with_hash_value