Home > mailing lists

Re: nested loop semijoin estimates - Mailing list pgsql-hackers

From	Tom Lane
Subject	Re: nested loop semijoin estimates
Date	June 2, 2015 18:50:31
Msg-id	30441.1433260205@sss.pgh.pa.us Whole thread Raw
In response to	Re: nested loop semijoin estimates (Tomas Vondra <tomas.vondra@2ndquadrant.com>)
Responses	Re: nested loop semijoin estimates
List	pgsql-hackers

Tree view

Tomas Vondra <tomas.vondra@2ndquadrant.com> writes:
> On 06/02/15 16:37, Tom Lane wrote:
>> It's possible that the change was due to random variation in ANALYZE
>> statistics, in which case it was just luck.

> I don't think so. I simply loaded the data, ran ANALYZE, and then simply 
> started either master or patched master. There should be no difference 
> in statistics, I believe. Also, the plans contain pretty much the same 
> row counts, but the costs differ.

> For example look at the 'cs_ui' CTE, right at the beginning of the 
> analyze logs. The row counts are exactly the same, but the costs are 
> different. And it's not using semijoins or not nested loops ...

The cost estimates in that CTE all look exactly the same to me, and the
actual runtime's not all that different either.  The key runtime
difference in the first query seems to be in the first couple of joins
to the cs_ui CTE:
       ->  Nested Loop  (cost=4.63..724.34 rows=16 width=67) (actual time=8346.904..14024.947 rows=117 loops=1)
   Join Filter: (item.i_item_sk = store_returns.sr_item_sk)             ->  Nested Loop  (cost=0.29..662.07 rows=1
width=59)(actual time=8324.352..13618.106 rows=8 loops=1)                   ->  CTE Scan on cs_ui  (cost=0.00..2.16
rows=108width=4) (actual time=7264.670..7424.096 rows=17169 loops=1)                   ->  Index Scan using item_pkey
onitem  (cost=0.29..6.10 rows=1 width=55) (actual time=0.356..0.356 rows=0 loops=17169)                         Index
Cond:(i_item_sk = cs_ui.cs_item_sk)                         Filter: ((i_current_price >= '79'::numeric) AND
(i_current_price<= '89'::numeric) AND (i_current_price >= '80'::numeric) AND (i_current_price <= '94'::numeric) AND
(i_color= ANY ('{navajo,burlywood,cornflower,olive,turquoise,linen}'::bpchar[])))                         Rows Removed
byFilter: 1             ->  Bitmap Heap Scan on store_returns  (cost=4.34..62.05 rows=18 width=8) (actual
time=32.525..50.662rows=15 loops=8)                   Recheck Cond: (sr_item_sk = cs_ui.cs_item_sk)
HeapBlocks: exact=117                   ->  Bitmap Index Scan on idx_sr_item_sk  (cost=0.00..4.34 rows=18 width=0)
(actualtime=19.747..19.747 rows=15 loops=8)                         Index Cond: (sr_item_sk = cs_ui.cs_item_sk)
 

vs patched
       ->  Nested Loop  (cost=4.63..724.34 rows=16 width=67) (actual time=6867.921..7001.417 rows=117 loops=1)
  Join Filter: (item.i_item_sk = store_returns.sr_item_sk)             ->  Nested Loop  (cost=0.29..662.07 rows=1
width=59)(actual time=6867.874..7000.211 rows=8 loops=1)                   ->  CTE Scan on cs_ui  (cost=0.00..2.16
rows=108width=4) (actual time=6865.792..6924.816 rows=17169 loops=1)                   ->  Index Scan using item_pkey
onitem  (cost=0.29..6.10 rows=1 width=55) (actual time=0.003..0.003 rows=0 loops=17169)                         Index
Cond:(i_item_sk = cs_ui.cs_item_sk)                         Filter: ((i_current_price >= '79'::numeric) AND
(i_current_price<= '89'::numeric) AND (i_current_price >= '80'::numeric) AND (i_current_price <= '94'::numeric) AND
(i_color= ANY ('{navajo,burlywood,cornflower,olive,turquoise,linen}'::bpchar[])))                         Rows Removed
byFilter: 1             ->  Bitmap Heap Scan on store_returns  (cost=4.34..62.05 rows=18 width=8) (actual
time=0.025..0.116rows=15 loops=8)                   Recheck Cond: (sr_item_sk = cs_ui.cs_item_sk)
HeapBlocks: exact=117                   ->  Bitmap Index Scan on idx_sr_item_sk  (cost=0.00..4.34 rows=18 width=0)
(actualtime=0.017..0.017 rows=15 loops=8)                         Index Cond: (sr_item_sk = cs_ui.cs_item_sk)
 

Those are the exact same plans, and same cost estimates, but for some
reason the master run is 2X slower.  The only explanation I can think of
is disk caching effects, or maybe hint-bit setting during the first run.
It's certainly not the planner that deserves either credit or blame for
that.

The part of this plan that's actually different is a different join order
sequence for a lot of follow-on joins that are expected to get single rows
from their other table.  I think that's basically irrelevant really, but
again, this patch should not have changed anything there, since those were
plain joins not semijoins.
        regards, tom lane

pgsql-hackers by date:

From: Robert Haas
Date: 02 June 2015, 18:46:49
Subject: Re: Re: [GENERAL] 9.4.1 -> 9.4.2 problem: could not access status of transaction 1

From: Robert Haas
Date: 02 June 2015, 18:50:56
Subject: Re: Re: [GENERAL] 9.4.1 -> 9.4.2 problem: could not access status of transaction 1

Re: nested loop semijoin estimates - Mailing list pgsql-hackers

Previous

Next