Re: Wrong rows estimations with joins of CTEs slows queries by more than factor 500 - Mailing list pgsql-hackers
From | Tomas Vondra |
---|---|
Subject | Re: Wrong rows estimations with joins of CTEs slows queries by more than factor 500 |
Date | |
Msg-id | e44da652-2ecd-af34-a087-54fe327d5308@enterprisedb.com Whole thread Raw |
In response to | Wrong rows estimations with joins of CTEs slows queries by more than factor 500 (Hans Buschmann <buschmann@nidsa.net>) |
Responses |
AW: Wrong rows estimations with joins of CTEs slows queries by more than factor 500
|
List | pgsql-hackers |
On 2/8/23 14:55, Hans Buschmann wrote: > During data refactoring of our Application I encountered $subject when > joining 4 CTEs with left join or inner join. > > > 1. Background > > PG 15.1 on Windows x64 (OS seems no to have no meening here) > > > I try to collect data from 4 (analyzed) tables (up,li,in,ou) by grouping > certain data (4 CTEs qup,qli,qin,qou) > > The grouping of the data in the CTEs gives estimated row counts of about > 1000 (1 tenth of the real value) This is OK for estimation. > > > These 4 CTEs are then used to combine the data by joining them. > > > 2. Problem > > The 4 CTEs are joined by left joins as shown below: > ... > > This case really brought me to detect the problem! > > The original query and data are not shown here, but the principle should > be clear from the execution plans. > > I think the planner shouldn't change the row estimations on further > steps after left joins at all, and be a bit more conservative on inner > joins. But the code should alredy do exactly that, see: https://github.com/postgres/postgres/blob/dbe8a1726cfd5a09cf1ef99e76f5f89e2efada71/src/backend/optimizer/path/costsize.c#L5212 And in fact, the second part of the plains shows it's doing the trick: -> Merge Left Join (cost=1293.25..1388.21 rows=5733 width=104) (actual time=2.321..2.556 rows=1415 loops=1) Merge Cond: ((qup_1.curr_season = qli_1.curr_season) AND ((qup_1.curr_code)::text = (qli_1.curr_code)::text)) -> Sort (cost=641.68..656.02 rows=5733 width=72) -> Sort (cost=651.57..666.11 rows=5816 width=72) But notice the first join (with rows=33) doesn't say "Left". And I see there's Append on top, so presumably the query is much more complex, and there's a regular join of these CTEs in some other part. We'll need to se the whole query, not just one chunk of it. FWIW it seems you're using materialized CTEs - that's likely pretty bad for the estimates, because we don't propagate statistics from the CTE. So a join on CTEs can't see statistics from the underlying tables, and that can easily produce really bad estimates. I'm assuming you're not using AS MATERIALIZED explicitly, so I'd bet this happens because the "cardinality" function is marked as volatile. Perhaps it can be redefined as stable/immutable. > This may be related to the fact that this case has 2 join-conditions > (xx_season an xx_code). That shouldn't affect outer join estimates this way (but as I explained above, the join does not seem to be "left" per the explain). Multi-column joins can cause issues, no doubt about it - but CTEs make it worse because we can't e.g. see foreign keys. regards -- Tomas Vondra EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
pgsql-hackers by date: