Re: nested loop semijoin estimates - Mailing list pgsql-hackers
From | Tom Lane |
---|---|
Subject | Re: nested loop semijoin estimates |
Date | |
Msg-id | 30441.1433260205@sss.pgh.pa.us Whole thread Raw |
In response to | Re: nested loop semijoin estimates (Tomas Vondra <tomas.vondra@2ndquadrant.com>) |
Responses |
Re: nested loop semijoin estimates
|
List | pgsql-hackers |
Tomas Vondra <tomas.vondra@2ndquadrant.com> writes: > On 06/02/15 16:37, Tom Lane wrote: >> It's possible that the change was due to random variation in ANALYZE >> statistics, in which case it was just luck. > I don't think so. I simply loaded the data, ran ANALYZE, and then simply > started either master or patched master. There should be no difference > in statistics, I believe. Also, the plans contain pretty much the same > row counts, but the costs differ. > For example look at the 'cs_ui' CTE, right at the beginning of the > analyze logs. The row counts are exactly the same, but the costs are > different. And it's not using semijoins or not nested loops ... The cost estimates in that CTE all look exactly the same to me, and the actual runtime's not all that different either. The key runtime difference in the first query seems to be in the first couple of joins to the cs_ui CTE: -> Nested Loop (cost=4.63..724.34 rows=16 width=67) (actual time=8346.904..14024.947 rows=117 loops=1) Join Filter: (item.i_item_sk = store_returns.sr_item_sk) -> Nested Loop (cost=0.29..662.07 rows=1 width=59)(actual time=8324.352..13618.106 rows=8 loops=1) -> CTE Scan on cs_ui (cost=0.00..2.16 rows=108width=4) (actual time=7264.670..7424.096 rows=17169 loops=1) -> Index Scan using item_pkey onitem (cost=0.29..6.10 rows=1 width=55) (actual time=0.356..0.356 rows=0 loops=17169) Index Cond:(i_item_sk = cs_ui.cs_item_sk) Filter: ((i_current_price >= '79'::numeric) AND (i_current_price<= '89'::numeric) AND (i_current_price >= '80'::numeric) AND (i_current_price <= '94'::numeric) AND (i_color= ANY ('{navajo,burlywood,cornflower,olive,turquoise,linen}'::bpchar[]))) Rows Removed byFilter: 1 -> Bitmap Heap Scan on store_returns (cost=4.34..62.05 rows=18 width=8) (actual time=32.525..50.662rows=15 loops=8) Recheck Cond: (sr_item_sk = cs_ui.cs_item_sk) HeapBlocks: exact=117 -> Bitmap Index Scan on idx_sr_item_sk (cost=0.00..4.34 rows=18 width=0) (actualtime=19.747..19.747 rows=15 loops=8) Index Cond: (sr_item_sk = cs_ui.cs_item_sk) vs patched -> Nested Loop (cost=4.63..724.34 rows=16 width=67) (actual time=6867.921..7001.417 rows=117 loops=1) Join Filter: (item.i_item_sk = store_returns.sr_item_sk) -> Nested Loop (cost=0.29..662.07 rows=1 width=59)(actual time=6867.874..7000.211 rows=8 loops=1) -> CTE Scan on cs_ui (cost=0.00..2.16 rows=108width=4) (actual time=6865.792..6924.816 rows=17169 loops=1) -> Index Scan using item_pkey onitem (cost=0.29..6.10 rows=1 width=55) (actual time=0.003..0.003 rows=0 loops=17169) Index Cond:(i_item_sk = cs_ui.cs_item_sk) Filter: ((i_current_price >= '79'::numeric) AND (i_current_price<= '89'::numeric) AND (i_current_price >= '80'::numeric) AND (i_current_price <= '94'::numeric) AND (i_color= ANY ('{navajo,burlywood,cornflower,olive,turquoise,linen}'::bpchar[]))) Rows Removed byFilter: 1 -> Bitmap Heap Scan on store_returns (cost=4.34..62.05 rows=18 width=8) (actual time=0.025..0.116rows=15 loops=8) Recheck Cond: (sr_item_sk = cs_ui.cs_item_sk) HeapBlocks: exact=117 -> Bitmap Index Scan on idx_sr_item_sk (cost=0.00..4.34 rows=18 width=0) (actualtime=0.017..0.017 rows=15 loops=8) Index Cond: (sr_item_sk = cs_ui.cs_item_sk) Those are the exact same plans, and same cost estimates, but for some reason the master run is 2X slower. The only explanation I can think of is disk caching effects, or maybe hint-bit setting during the first run. It's certainly not the planner that deserves either credit or blame for that. The part of this plan that's actually different is a different join order sequence for a lot of follow-on joins that are expected to get single rows from their other table. I think that's basically irrelevant really, but again, this patch should not have changed anything there, since those were plain joins not semijoins. regards, tom lane
pgsql-hackers by date: