Re: Why am I getting great/terrible estimates with these CTE queries? - Mailing list pgsql-performance

From Tom Lane
Subject Re: Why am I getting great/terrible estimates with these CTE queries?
Date
Msg-id 9075.1349824161@sss.pgh.pa.us
Whole thread Raw
In response to Why am I getting great/terrible estimates with these CTE queries?  (Tomas Vondra <tv@fuzzy.cz>)
Responses Re: Why am I getting great/terrible estimates with these CTE queries?
List pgsql-performance
Tomas Vondra <tv@fuzzy.cz> writes:
> I've been fighting with some CTE queries recently, and in the end I've
> ended up with two basic cases. In one case the CTEs work absolutely
> great, making the estimates much more precise, while in the other the
> results are pretty terrible. And I'm not sure why both of these behave
> the way they do.

You're assuming the case where the estimate is better is better for a
reason ... but it's only better as a result of blind dumb luck.  The
outer-level query planner doesn't know anything about the CTE's output
except the estimated number of rows --- in particular, it doesn't drill
down to find any statistics about the join column.  So what you're
getting there is a default selectivity estimate that just happens to
match reality in this case.  (If you work through the math in
eqjoinsel_inner for the case where the relation sizes are grossly
different and we don't have MCV stats, you'll find that it comes out to
be assuming that each row in the larger relation has one join partner in
the smaller one, which indeed is your situation here.)  In the other
example, you're likewise getting a default selectivity estimate, only it
doesn't match so well, because the default does not include assuming
that the join keys are unique on both sides.  Without the CTEs, the
optimizer can see the keys are unique so it makes the right selectivity
estimate.

In principle we could make the optimizer try to drill down for stats,
which would make these examples work the same with or without the CTE
layers.  I'm not sure it's worth the trouble though --- I'm dubious that
people would use a CTE for cases that are simple enough for the stats
estimates to be worth anything.

            regards, tom lane


pgsql-performance by date:

Previous
From: Tomas Vondra
Date:
Subject: Why am I getting great/terrible estimates with these CTE queries?
Next
From: Craig James
Date:
Subject: Re: Hyperthreading (was: Two identical systems, radically different performance)