Re: Any way to favor index scans, but not bitmap index scans? - Mailing list pgsql-general

From Francisco Reyes
Subject Re: Any way to favor index scans, but not bitmap index scans?
Date
Msg-id 0ae6b4fd279686f9a57adb020983134c@stringsutils.com
Whole thread Raw
In response to Re: Any way to favor index scans, but not bitmap index scans?  (Tom Lane <tgl@sss.pgh.pa.us>)
Responses Re: Any way to favor index scans, but not bitmap index scans?
Re: Any way to favor index scans, but not bitmap index scans?
List pgsql-general
On 12:40 pm 07/23/08 Tom Lane <tgl@sss.pgh.pa.us> wrote:
> That's fairly hard to believe.  Care to offer some details?


I will dig that actual project and run explain analyze. Will likely not
have it till middle of next week though because of a monthly process
starting out Friday.

However, I do have a current example where bitmap index scan was 3 times
worse. On the extremely bad case the data set was 5 times larger than the
sample below (1+ billion vs 215 million).

Sequential scan: 12 minutes
Seq scan dissabled -> bitmap index scan: 29.98 minutes
Seq scan disabled and bitmap index disabled: 3 seconds!

I have to mask the names so any discrepancy in names likely just my mistake.

2 tables involved.
historical has 215 million rows.
join_ids has 2.5 million rows.

A join from join_ids to historical will only touch about 40% of the records
in historical.

The queries below only returned 0.2% (less than 1%) of records from the
historical table.



default_statistics_target = 1000
random_page_cost = 4.0

Default query before changing settings.
Aggregate  (cost=7656776.19..7656776.20 rows=1 width=12) (actual
time=719661.082..719661.085 rows=1 loops=1)
   ->  Hash Join  (cost=9260.90..7653867.79 rows=387785 width=12)
         (actual time=2249.423..719109.201 rows=194734 loops=1)
         Hash Cond: (historical.join_id = join_ids.join_id)
         ->  Seq Scan on historical
                (cost=0.00..5825538.00 rows=207450405 width=16) (actual
time=7.966..410078.540 rows=207589225 loops=1)
               Filter: ((f5 > 0::numeric) AND (date > '2007-04-01'::date)
AND (date < '2008-05-01'::date))
         ->  Hash  (cost=9198.15..9198.15 rows=5020 width=4) (actual
time=2210.953..2210.953 rows=4437 loops=1)
               ->  Bitmap Heap Scan on join_ids join_ids
(cost=163.00..9198.15 rows=5020 width=4)
                   (actual time=247.903..2201.789 rows=4437 loops=1)
                     Recheck Cond: (customer_id = ANY ('{1014,2485,4636,4635,1255,547,374,580}'::integer[]))
                     ->  Bitmap Index Scan on join_ids_customer_id_join_id
(cost=0.00..161.74 rows=5020 width=0)
                         (actual time=241.111..241.111 rows=4437 loops=1)
                           Index Cond: (customer_id = ANY
('{1014,2485,4636,4635,1255,547,374,580}'::integer[]))
Total runtime: 719816.542 ms --> 12 minutes


SET ENABLE_SEQSCAN TO OFF;
Aggregate  (cost=11867354.72..11867354.73 rows=1 width=12) (actual
time=1798829.579..1798829.581 rows=1 loops=1)
   ->  Hash Join  (cost=4645436.35..11864446.33 rows=387785 width=12)
         (actual time=1086218.285..1798250.004 rows=194734 loops=1)
         Hash Cond: (historical.join_id = join_ids.join_id)
         ->  Bitmap Heap Scan on historical
             (cost=4636175.45..10036116.53 rows=207450405 width=16)
               (actual time=1086158.692..1487577.412 rows=207589225 loops=1)
               Recheck Cond: ((date > '2007-04-01'::date) AND (date <
'2008-05-01'::date))
               Filter: (f5 > 0::numeric)
               ->  Bitmap Index Scan on historical_join_id_date
                  (cost=0.00..4584312.85 rows=210080576 width=0)
                   (actual time=1085395.070..1085395.070 rows=210233223
loops=1)
                     Index Cond: ((date > '2007-04-01'::date) AND (date <
'2008-05-01'::date))
         ->  Hash  (cost=9198.15..9198.15 rows=5020 width=4) (actual
time=18.712..18.712 rows=4437 loops=1)
               ->  Bitmap Heap Scan on join_ids (cost=163.00..9198.15
rows=5020 width=4)
                    (actual time=1.541..11.654 rows=4437 loops=1)
                     Recheck Cond: (customer_id = ANY ('{1014,2485,4636,4635,1255,547,374,580}'::integer[]))
                     ->  Bitmap Index Scan on join_ids_customer_id_join_id
                         (cost=0.00..161.74 rows=5020 width=0) (actual
time=0.984..0.984 rows=4437 loops=1)
                           Index Cond: (customer_id = ANY
('{1014,2485,4636,4635,1255,547,374,580}'::integer[]))
 Total runtime: 1798847.930 ms --> 29.98 minutes


SET ENABLE_SEQSCAN TO OFF;
SET ENABLE_BITMAPSCAN TO OFF;
Aggregate  (cost=25665216.10..25665216.11 rows=1 width=12) (actual
time=3088.894..3088.896 rows=1 loops=1)
   ->  Nested Loop  (cost=0.00..25662307.70 rows=387785 width=12)
             (actual time=0.264..2624.680 rows=194734 loops=1)
         ->  Index Scan using join_ids_join_id on join_ids
             (cost=0.00..2867051.21 rows=5020 width=4) (actual
time=0.237..1236.019 rows=4437 loops=1)
               Filter: (customer_id = ANY ('{1014,2485,4636,4635,1255,547,374,580}'::integer[]))
         ->  Index Scan using historical_join_id_date on historical
             (cost=0.00..4522.43 rows=1477 width=16) (actual
time=0.010..0.153 rows=44 loops=4437)
               Index Cond: ((historical.join_id = join_ids.join_id) AND
(historical.date > '2007-04-01'::date)
               AND (historical.date < '2008-05-01'::date))
               Filter: (trans.f5 > 0::numeric)
 Total runtime: 3091.227 ms --> 3 seconds


pgsql-general by date:

Previous
From: "Merlin Moncure"
Date:
Subject: Re: Substitute a variable in PL/PGSQL.
Next
From: "Francisco Reyes"
Date:
Subject: Re: A couple of newbie questions ...