Thread: Slow 3 Table Join with v bad row estimate
David Osborne <david@qcode.co.uk> writes:
> We have 3 different ways we have to do the final X join condition (we use 3
> subqueries UNIONed together), but the one causing the issues is:
> (o.branch_code || o.po_number = replace(ss.order_no,' ',''))
> ...
> So we can see straight away that the outer Nested loop expects 1 row, and
> gets 53595. This isn't going to help the planner pick the most efficient
> plan I suspect.
> I've tried increasing default_statistics_target to the max and re analysing
> all the tables involved but this does not help the estimate.
> I suspect it's due to the join being based on functional result meaning any
> stats are ignored?
Yeah, the planner is not nearly smart enough to draw any useful
conclusions about the selectivity of that clause from standard statistics.
What you might try doing is creating functional indexes on the two
subexpressions:
create index on branch_purchase_order ((branch_code || po_number));
create index on stocksales_ib (replace(order_no,' ',''));
(actually it looks like you've already got the latter one) and then
re-ANALYZING. I'm not necessarily expecting that the planner will
actually choose to use these indexes in its plan; but their existence
will prompt ANALYZE to gather stats about the expression results,
and that should at least let the planner draw more-accurate conclusions
about the selectivity of the equality constraint.
regards, tom lane
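To try the expression-index approach on scratch data first, here is a minimal sketch. The temp tables demo_po and demo_sales, their generated data, and the index names demo_po_key_idx and demo_sales_key_idx are hypothetical stand-ins for branch_purchase_order and stocksales_ib, modelling only the columns used in the join clause. After ANALYZE, the statistics for each expression should appear in pg_stats under the index's name rather than the table's.

```sql
-- Hypothetical stand-ins for branch_purchase_order (o) and stocksales_ib (ss);
-- only the columns used by the join clause are modelled.
CREATE TEMP TABLE demo_po (branch_code text, po_number text);
CREATE TEMP TABLE demo_sales (order_no text);

-- Illustrative data: order_no carries a space that the join clause strips.
INSERT INTO demo_po
SELECT 'B' || (g % 50), lpad(g::text, 6, '0')
FROM generate_series(1, 10000) AS g;

INSERT INTO demo_sales
SELECT 'B' || (g % 50) || ' ' || lpad(g::text, 6, '0')
FROM generate_series(1, 10000) AS g;

-- One expression index per side of the equality. The planner may never scan
-- them; their purpose here is to make ANALYZE sample the expression results.
CREATE INDEX demo_po_key_idx ON demo_po ((branch_code || po_number));
CREATE INDEX demo_sales_key_idx ON demo_sales (replace(order_no, ' ', ''));

ANALYZE demo_po;
ANALYZE demo_sales;

-- Expression statistics are filed under the index name, not the table name.
SELECT tablename, attname, n_distinct
FROM pg_stats
WHERE tablename IN ('demo_po_key_idx', 'demo_sales_key_idx');

-- Compare estimated and actual rows for the expression join.
EXPLAIN ANALYZE
SELECT count(*)
FROM demo_po AS o
JOIN demo_sales AS ss
  ON o.branch_code || o.po_number = replace(ss.order_no, ' ', '');
```

As Tom notes, the indexes are there for their statistics; whether the plan actually scans them is a separate question.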
There is also no change if I rewrite the join as ss.order_no = o.branch_code || ' ' || o.po_number and create an index on branch_purchase_order ((branch_code || ' ' || po_number)).
David Osborne <david@qcode.co.uk> writes:
> Doesn't seem to quite do the trick. I created both those indexes (or the
> missing one at least)
> Then I ran analyse on stocksales_ib and branch_purchase_order.
> I checked there were stats held in pg_stats for both indexes, which there
> were.
> But the query plan still predicts 1 row and comes up with the same plan.
Meh. In that case, likely the explanation is that the various conditions
in your query are highly correlated, and the planner is underestimating
the number of rows that will satisfy them because it doesn't know about
the correlation.
But taking a step back, it seems like the core problem in your explain
output is here:
>> -> Nested Loop (cost=1.29..83263.71 rows=1 width=24) (actual time=0.196..23799.930 rows=53595 loops=1)
>> Join Filter: (o.po_id = p.po_id)
>> Rows Removed by Join Filter: 23006061
>> Buffers: shared hit=23217993 dirtied=1
That's an awful lot of rows being formed by the join only to be rejected.
You should try creating an index on
branch_purchase_order_products(po_id, product_code)
so that the po_id condition could be enforced at the inner indexscan
instead of the join.
regards, tom lane
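A self-contained sketch of the two-column index, under stated assumptions: the thread does not show the full query, so the temp tables demo_order_products (standing in for branch_purchase_order_products) and demo_lines (standing in for whatever feeds the outer side of the nested loop), the join condition, and the data are all hypothetical. Hash and merge joins are disabled only to force a nested loop for the demonstration.

```sql
-- Hypothetical stand-ins: demo_order_products plays
-- branch_purchase_order_products (p); demo_lines plays the rows arriving
-- from the outer side of the nested loop, each carrying po_id and product_code.
CREATE TEMP TABLE demo_order_products (po_id int, product_code text, qty int);
CREATE TEMP TABLE demo_lines (po_id int, product_code text);

-- 2000 purchase orders, 50 order lines each.
INSERT INTO demo_order_products
SELECT g % 2000 + 1, 'P' || (g % 50), 1
FROM generate_series(1, 100000) AS g;

-- 100 outer rows, each matching the lines of one purchase order.
INSERT INTO demo_lines
SELECT DISTINCT po_id, product_code
FROM demo_order_products
WHERE po_id <= 100;

-- The two-column index suggested above: po_id first, then product_code.
CREATE INDEX demo_order_products_po_prod_idx
    ON demo_order_products (po_id, product_code);
ANALYZE demo_order_products;
ANALYZE demo_lines;

-- Demo only: rule out hash and merge joins so a nested loop is chosen.
SET enable_hashjoin = off;
SET enable_mergejoin = off;

-- The inner scan's Index Cond should now cover both columns, leaving no
-- Join Filter on the Nested Loop node.
EXPLAIN (ANALYZE, BUFFERS)
SELECT count(*)
FROM demo_lines AS l
JOIN demo_order_products AS p
  ON p.po_id = l.po_id
 AND p.product_code = l.product_code;

RESET enable_hashjoin;
RESET enable_mergejoin;
```

The design point is that each index probe already checks both comparisons, so non-matching rows are never fetched and then discarded at the join.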
From: pgsql-performance-owner@postgresql.org [mailto:pgsql-performance-owner@postgresql.org] On Behalf Of David Osborne
Sent: Tuesday, November 10, 2015 12:32 PM
To: Tom Lane <tgl@sss.pgh.pa.us>
Cc: pgsql-performance@postgresql.org
Subject: Re: [PERFORM] Slow 3 Table Join with v bad row estimate
Ok - wow.
Adding that index, I get the same estimate of 1 row, but a runtime of ~450ms.
A 23000ms improvement.
This is great. So as a general rule of thumb, if I see a Join Filter removing an excessive number of rows, I can check if that condition can be added to an index from the same table which is already being scanned.
Thanks for this!
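David's rule of thumb can be applied mechanically. The helper below, pg_temp.join_filter_lines, is hypothetical: it runs EXPLAIN (ANALYZE, BUFFERS) on a query passed in as text and returns only the plan lines that mention a Join Filter, so a large "Rows Removed by Join Filter" figure stands out. It is a sketch assuming PL/pgSQL is available (it is installed by default), and since EXPLAIN ANALYZE really runs the statement, it should only be given queries that are safe to execute.

```sql
-- Hypothetical helper, created in the session's temporary schema so it goes
-- away at disconnect. Functions in pg_temp must be called schema-qualified.
CREATE OR REPLACE FUNCTION pg_temp.join_filter_lines(sql_text text)
RETURNS SETOF text
LANGUAGE plpgsql AS $$
DECLARE
  plan_line text;
BEGIN
  -- EXPLAIN ANALYZE executes the statement for real.
  FOR plan_line IN EXECUTE 'EXPLAIN (ANALYZE, BUFFERS) ' || sql_text LOOP
    -- Matches both "Join Filter: ..." and "Rows Removed by Join Filter: ...".
    IF plan_line LIKE '%Join Filter%' THEN
      RETURN NEXT plan_line;
    END IF;
  END LOOP;
END;
$$;

-- Toy query: the join clause mixes columns of both inputs on one side, so it
-- can be neither hashed nor merged and is applied as a Join Filter.
SELECT *
FROM pg_temp.join_filter_lines(
  'SELECT count(*)
     FROM generate_series(1, 10) AS a(x)
     JOIN generate_series(1, 10) AS b(y) ON a.x + b.y = 11');
```

Once a filter line like that is found, the next step is the one from the thread: check whether the filtered column can be added to an index on the table the inner side is already scanning.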
David,
I believe the plan you are posting is the old plan.
Could you please post explain analyze with the index that Tom suggested?
Regards,
Igor Neyman