Re: bitmap scan much slower than index scan, hash_search_with_hash_value - Mailing list pgsql-hackers

From Peter Geoghegan
Subject Re: bitmap scan much slower than index scan, hash_search_with_hash_value
Date
Msg-id CAEYLb_UX0_5fw3-jW4neJa=bs0KmGpgH_9FWqVhKXh5AHrpAEQ@mail.gmail.com
Whole thread Raw
In response to bitmap scan much slower than index scan, hash_search_with_hash_value  (Sergey Koposov <koposov@ast.cam.ac.uk>)
Responses Re: bitmap scan much slower than index scan, hash_search_with_hash_value  (Sergey Koposov <koposov@ast.cam.ac.uk>)
List pgsql-hackers
On 2 September 2012 06:21, Sergey Koposov <koposov@ast.cam.ac.uk> wrote:
> Hi,
>
> I'm experiencing the case when bitmap scan is ~ 70 times slower than index
> scan which seems to be caused by 1) very big table 2) some hash search logic
> (hash_search_with_hash_value )
>
> Here is the explain analyze of the query with bitmap scans allowed:
>
> wsdb=> explain analyze select * from test as t, crts.data as d1
>                 where d1.objid=t.objid and d1.mjd=t.mjd limit 10000;
>                                                                       QUERY
> PLAN
>
-------------------------------------------------------------------------------------------------------------------------------------------------------
>  Limit  (cost=11514.04..115493165.44 rows=10000 width=68) (actual
> time=27.512..66620.231 rows=10000 loops=1)
>    ->  Nested Loop  (cost=11514.04..1799585184.18 rows=155832 width=68)
> (actual time=27.511..66616.807 rows=10000 loops=1)
>          ->  Seq Scan on test t  (cost=0.00..2678.40 rows=156240 width=28)
> (actual time=0.010..4.685 rows=11456 loops=1)
>          ->  Bitmap Heap Scan on data d1  (cost=11514.04..11518.05 rows=1
> width=40) (actual time=5.807..5.807 rows=1 loops=11456)
>                Recheck Cond: ((mjd = t.mjd) AND (objid = t.objid))
>                ->  BitmapAnd  (cost=11514.04..11514.04 rows=1 width=0)
> (actual time=5.777..5.777 rows=0 loops=11456)
>                      ->  Bitmap Index Scan on data_mjd_idx
> (cost=0.00..2501.40 rows=42872 width=0) (actual time=3.920..3.920 rows=22241
> loops=11456)
>                            Index Cond: (mjd = t.mjd)
>                      ->  Bitmap Index Scan on data_objid_idx
> (cost=0.00..8897.90 rows=415080 width=0) (actual time=0.025..0.025 rows=248
> loops=11456)
>                            Index Cond: (objid = t.objid)
>  Total runtime: 66622.026 ms
> (11 rows)
>
> Here is the output when bitmap scans are disabled:
> QUERY PLAN
>                                                                      QUERY
> PLAN
>
----------------------------------------------------------------------------------------------------------------------------------------------------
>  Limit  (cost=0.00..329631941.65 rows=10000 width=68) (actual
> time=0.082..906.876 rows=10000 loops=1)
>    ->  Nested Loop  (cost=0.00..4979486036.95 rows=151062 width=68) (actual
> time=0.081..905.683 rows=10000 loops=1)
>          Join Filter: (t.mjd = d1.mjd)
>          ->  Seq Scan on test t  (cost=0.00..2632.77 rows=151677 width=28)
> (actual time=0.009..1.679 rows=11456 loops=1)
>          ->  Index Scan using data_objid_idx on data d1
> (cost=0.00..26603.32 rows=415080 width=40) (actual time=0.010..0.050
> rows=248 loops=11456)
>                Index Cond: (objid = t.objid)
>  Total runtime: 907.462 ms
>
> When the bitmap scans are enabled the "prof" of postgres shows
>     47.10%  postmaster  postgres           [.] hash_search_with_hash_value
>             |
>             --- hash_search_with_hash_value
>
>     11.06%  postmaster  postgres           [.] hash_seq_search
>             |
>             --- hash_seq_search
>
>      6.95%  postmaster  postgres           [.] hash_any
>             |
>             --- hash_any
>
>      5.17%  postmaster  postgres           [.] _bt_checkkeys
>             |
>             --- _bt_checkkeys
>
>      4.07%  postmaster  postgres           [.] tbm_add_tuples
>             |
>             --- tbm_add_tuples
>
>      3.41%  postmaster  postgres           [.] hash_search
>             |
>             --- hash_search
>
>
> And the last note is that the crts.data table which is being bitmap scanned
> is a 1.1Tb table with ~ 20e9 rows. My feeling is that the bitmap index scan
> code
> is somehow unprepared to combine two bitmaps for such a big table, and this
> leads to the terrible performance.

I think that this kind of question is better suited to the
pgsql-performance list. Granted, it was presented as a bug report
(though they're generally sent to -bugs rather than -hackers), but I
don't think that this is a valid bug.

One obvious red-flag from your query plans is that there is a
misestimation of the row return count of a few orders of magnitude in
the Bitmap Index Scan node. Did you trying performing an ANALYZE to
see if that helped? It may also be helpful to show pg_stats entries
for both the data.mjd and test.mjd columns. You may find, prior to
doing an ANALYZE, that there is no entries for one of those tables.

Turning off the enable_* planner options in production is generally
discouraged. Certainly, you'd be crazy to do that on a server-wide
basis.

-- 
Peter Geoghegan       http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training and Services



pgsql-hackers by date:

Previous
From: Pavel Stehule
Date:
Subject: Re: bitmap scan much slower than index scan, hash_search_with_hash_value
Next
From: Bruce Momjian
Date:
Subject: Re: Yet another failure mode in pg_upgrade