BUG #6075: mergejoin early termination estimates bug or defficiency - Mailing list pgsql-bugs

From Roman
Subject BUG #6075: mergejoin early termination estimates bug or defficiency
Date
Msg-id 201106222235.p5MMZS45031751@wwwmaster.postgresql.org
Whole thread Raw
List pgsql-bugs
The following bug has been logged online:

Bug reference:      6075
Logged by:          Roman
Email address:      roman_mir@hotmail.com
PostgreSQL version: 9.0.3
Operating system:   Red Hat 4.4.1-2, 64-bit
Description:        mergejoin early termination estimates bug or defficiency
Details:

Entered PG IRC, spoke with RhodiumToad, who advised that this must be
reported as a bug or deficiency.

http://pgsql.privatepaste.com/09af338ded   - the SQL and EXPLAIN ANALYZE
output, and http://explain.depesz.com/s/RDN8  - explanation.

SQL took 30 seconds to perform, HashAggregate was 8239.96..8240.28

One missing index was added, took runtime to 27 seconds. Eventually executed
"set enable_mergejoin=false;" command and runtime of the SQL went down to 38
MILLISECONDS.

From 27 seconds down to 38ms with mergejoin off.

Tried setting cpu tuple to 0.1, this did not improve the situation, ended up
setting mergejoin to off in the postgresql.conf

If more details are needed, please contact me at roman_mir@hotmail.com

--
In case the paste bin is emptied, here is the SQL:

select  r.store_id, sum(ri.amount_bought) as count,  CASE WHEN
sum(least(ri.total_price_before_discount, ri.total_price_after_discount))<0
AND  sum(greatest(ri.total_price_before_discount,
ri.total_price_after_discount))<0  THEN
sum(greatest(ri.total_price_before_discount, ri.total_price_after_discount))
 ELSE sum(least(ri.total_price_before_discount,
ri.total_price_after_discount)) END as total_sale,  CASE WHEN
sum(ri.total_discount_amount)<>0 THEN sum(ri.total_discount_amount) ELSE 0
END as total_discount,  ri.receipt_id,  ri.barcode as barcode,  ri.item_id,
ri.master_product_id,  null as name,  null as price from  receipt r,
receipt_item ri where  r.store_id in (0,1,2,3,5,6,7,8,9,10,12,13,14) and
r.receipt_id=ri.receipt_id  and  r.receipt_date>'2011-06-22'  and
r.receipt_date<'2011-06-24'  and ri.receipt_item_id in  ( select distinct
ri.receipt_item_id from receipt r, receipt_item ri, master_product mp,
master_product_label mpl where  r.receipt_id=ri.receipt_id and
ri.master_product_id=mp.product_id and mp.product_id=mpl.product_id and  (
(mpl.label_id= 530))  and  r.receipt_date>'2011-06-22'  and
r.receipt_date<'2011-06-24'  )  group by  r.store_id,  ri.receipt_id,
ri.barcode,  ri.item_id,  ri.master_product_id;




Here is the EXPLAIN ANALYZE with mergejoin on:

 HashAggregate  (cost=8579.64..8579.68 rows=1 width=72) (actual
time=30901.150..30901.391 rows=352 loops=1)
   ->  Nested Loop  (cost=8255.64..8579.61 rows=1 width=72) (actual
time=30897.164..30900.399 rows=356 loops=1)
         ->  Nested Loop  (cost=8255.64..8547.23 rows=32 width=68) (actual
time=30897.138..30898.823 rows=356 loops=1)
               ->  HashAggregate  (cost=8255.64..8255.96 rows=32 width=8)
(actual time=30897.127..30897.252 rows=356 loops=1)
                     ->  Merge Join  (cost=1165.04..8255.56 rows=32 width=8)
(actual time=62.582..30896.810 rows=356 loops=1)
                           Merge Cond: (ri.master_product_id =
mpl.product_id)
                           ->  Merge Join  (cost=3.19..243053.40 rows=1099
width=24) (actual time=0.038..30890.508 rows=13204 loops=1)
                                 Merge Cond: (ri.master_product_id =
mp.product_id)
                                 ->  Nested Loop  (cost=0.00..8099569.68
rows=37762 width=16) (actual time=0.032..30852.168 rows=13204 loops=1)
                                       ->  Index Scan using
receipt_item_master_product_idx on receipt_item ri  (cost=0.00..729693.91
rows=7400642 width=24) (actual time=0.014..4165.121 rows=7398781 loops=1)
                                       ->  Index Scan using receipt_pkey on
receipt r  (cost=0.00..0.98 rows=1 width=8) (actual time=0.003..0.003 rows=0
loops=7398781)
                                             Index Cond: (r.receipt_id =
ri.receipt_id)
                                             Filter: ((r.receipt_date >
'2011-06-22 00:00:00'::timestamp without time zone) AND (r.receipt_date <
'2011-06-24 00:00:00'::timestamp without time zone))
                                 ->  Index Scan using master_product_pkey on
master_product mp  (cost=0.00..7229.15 rows=34421 width=8) (actual
time=0.004..24.787 rows=41681 loops=1)
                           ->  Sort  (cost=1161.84..1168.42 rows=2629
width=8) (actual time=3.429..3.698 rows=2884 loops=1)
                                 Sort Key: mpl.product_id
                                 Sort Method:  quicksort  Memory: 224kB
                                 ->  Bitmap Heap Scan on
master_product_label mpl  (cost=60.65..1012.51 rows=2629 width=8) (actual
time=0.475..1.615 rows=2715 loops=1)
                                       Recheck Cond: (label_id = 530)
                                       ->  Bitmap Index Scan on
master_product_label_labelid_idx  (cost=0.00..59.99 rows=2629 width=0)
(actual time=0.398..0.398 rows=2715 loops=1)
                                             Index Cond: (label_id = 530)
               ->  Index Scan using receipt_item_pkey on receipt_item ri
(cost=0.00..9.08 rows=1 width=76) (actual time=0.004..0.004 rows=1
loops=356)
                     Index Cond: (ri.receipt_item_id = ri.receipt_item_id)
         ->  Index Scan using receipt_pkey on receipt r  (cost=0.00..1.00
rows=1 width=12) (actual time=0.003..0.004 rows=1 loops=356)
               Index Cond: (r.receipt_id = ri.receipt_id)
               Filter: ((r.receipt_date > '2011-06-22 00:00:00'::timestamp
without time zone) AND (r.receipt_date < '2011-06-24 00:00:00'::timestamp
without time zone) AND (r.store_id = ANY
('{0,1,2,3,5,6,7,8,9,10,12,13,14}'::integer[])))
 Total runtime: 30901.690 ms



Here is the EXPLAIN ANALYZE with mergejoin off:

 HashAggregate  (cost=156852.07..156852.11 rows=1 width=72) (actual
time=38.404..38.647 rows=352 loops=1)
   ->  Nested Loop  (cost=156536.82..156852.04 rows=1 width=72) (actual
time=34.483..37.646 rows=356 loops=1)
         ->  Nested Loop  (cost=156536.82..156828.40 rows=32 width=68)
(actual time=34.464..36.153 rows=356 loops=1)
               ->  HashAggregate  (cost=156536.82..156537.14 rows=32
width=8) (actual time=34.454..34.550 rows=356 loops=1)
                     ->  Nested Loop  (cost=1189.44..156536.74 rows=32
width=8) (actual time=3.530..34.244 rows=356 loops=1)
                           ->  Hash Join  (cost=1189.44..156185.25 rows=1097
width=24) (actual time=3.522..32.878 rows=356 loops=1)
                                 Hash Cond: (ri.master_product_id =
mpl.product_id)
                                 ->  Nested Loop  (cost=144.07..153714.67
rows=37713 width=16) (actual time=0.766..27.303 rows=13204 loops=1)
                                       ->  Bitmap Heap Scan on receipt r
(cost=144.07..12325.61 rows=3819 width=8) (actual time=0.753..2.032
rows=4435 loops=1)
                                             Recheck Cond: ((receipt_date >
'2011-06-22 00:00:00'::timestamp without time zone) AND (receipt_date <
'2011-06-24 00:00:00'::timestamp without time zone))
                                             ->  Bitmap Index Scan on
receipt_receipt_date_idx  (cost=0.00..143.12 rows=3819 width=0) (actual
time=0.691..0.691 rows=4435 loops=1)
                                                   Index Cond:
((receipt_date > '2011-06-22 00:00:00'::timestamp without time zone) AND
(receipt_date < '2011-06-24 00:00:00'::timestamp without time zone))
                                       ->  Index Scan using
receipt_item_receiptid_idx on receipt_item ri  (cost=0.00..36.90 rows=10
width=24) (actual time=0.003..0.005 rows=3 loops=4435)
                                             Index Cond: (ri.receipt_id =
r.receipt_id)
                                 ->  Hash  (cost=1012.51..1012.51 rows=2629
width=8) (actual time=2.333..2.333 rows=2715 loops=1)
                                       Buckets: 1024  Batches: 1  Memory
Usage: 107kB
                                       ->  Bitmap Heap Scan on
master_product_label mpl  (cost=60.65..1012.51 rows=2629 width=8) (actual
time=0.478..1.689 rows=2715 loops=1)
                                             Recheck Cond: (label_id = 530)
                                             ->  Bitmap Index Scan on
master_product_label_labelid_idx  (cost=0.00..59.99 rows=2629 width=0)
(actual time=0.407..0.407 rows=2715 loops=1)
                                                   Index Cond: (label_id =
530)
                           ->  Index Scan using master_product_pkey on
master_product mp  (cost=0.00..0.31 rows=1 width=8) (actual
time=0.003..0.003 rows=1 loops=356)
                                 Index Cond: (mp.product_id =
ri.master_product_id)
               ->  Index Scan using receipt_item_pkey on receipt_item ri
(cost=0.00..9.08 rows=1 width=76) (actual time=0.004..0.004 rows=1
loops=356)
                     Index Cond: (ri.receipt_item_id = ri.receipt_item_id)
         ->  Index Scan using receipt_receiptid_receiptdate_idx on receipt r
 (cost=0.00..0.73 rows=1 width=12) (actual time=0.003..0.004 rows=1
loops=356)
               Index Cond: ((r.receipt_id = ri.receipt_id) AND
(r.receipt_date > '2011-06-22 00:00:00'::timestamp without time zone) AND
(r.receipt_date < '2011-06-24 00:00:00'::timestamp without time zone))
               Filter: (r.store_id = ANY
('{0,1,2,3,5,6,7,8,9,10,12,13,14}'::integer[]))
 Total runtime: 38.820 ms


Thank you,
Roman

pgsql-bugs by date:

Previous
From: "Kruus, Robert ENV"
Date:
Subject: Re: BUG #6069: C function returns NaN when part of an SQL function
Next
From: Dave Page
Date:
Subject: Re: BUG #6074: postgresql service script