Re: Wrong rows estimations with joins of CTEs slows queries by more than factor 500 - Mailing list pgsql-hackers
From | Tom Lane |
---|---|
Subject | Re: Wrong rows estimations with joins of CTEs slows queries by more than factor 500 |
Date | |
Msg-id | 482957.1700192299@sss.pgh.pa.us Whole thread Raw |
In response to | Re: Wrong rows estimations with joins of CTEs slows queries by more than factor 500 (Richard Guo <guofenglinux@gmail.com>) |
Responses |
Re: Wrong rows estimations with joins of CTEs slows queries by more than factor 500
Re: Wrong rows estimations with joins of CTEs slows queries by more than factor 500 Re: Wrong rows estimations with joins of CTEs slows queries by more than factor 500 Re: Wrong rows estimations with joins of CTEs slows queries by more than factor 500 |
List | pgsql-hackers |
Richard Guo <guofenglinux@gmail.com> writes: > On Fri, Nov 17, 2023 at 2:16 AM Tom Lane <tgl@sss.pgh.pa.us> wrote: >> So you could argue that there's more to do here, but I'm hesitant >> to go further. Part of the point of MATERIALIZED is to be an >> optimization fence, so breaking down that fence is something to be >> wary of. Maybe we shouldn't even take this patch --- but on >> balance I think it's an OK compromise. > Agreed. I think the patch is still valuable on its own, although it > does not go down into MATERIALIZED case for further optimization. Right. My earlier response was rather rushed, so let me explain my thinking a bit more. When I realized that the discrepancy between MATERIALIZED-and-not plans was due to the upper planner not seeing the pathkeys for the CTE scan, my first thought was to try to export those pathkeys. And my second thought was that the CTE should return multiple potential paths, much as we do for sub-SELECT-in-FROM subqueries, with the upper planner eventually choosing one of those paths. But that second idea would break down the optimization fence almost completely, because the opinions of the upper planner would influence which plan we choose for the CTE query. I think we shouldn't go there, at least not for a CTE explicitly marked MATERIALIZED. (Maybe if it's not marked MATERIALIZED, but we chose not to flatten it for some other reason, we could think about that differently? Not sure.) I think that when we say that MATERIALIZED is meant as an optimization fence, what we mostly mean is that the upper query shouldn't influence the choice of plan for the sub-query. However, we surely allow our statistics or guesses for the sub-query to subsequently influence what the upper planner does. If that weren't true, we shouldn't even expose any non-default rowcount guess to the upper planner --- but that would lead to really horrid results, so we allow that information to percolate up from the sub-query. It seems like exposing column statistics to the upper planner, as the proposed patch does, isn't fundamentally different from exposing rowcount estimates. That line of argument also leads to the conclusion that it'd be okay to expose info about the ordering of the CTE result to the upper planner. This patch doesn't do that, and I'm not sufficiently excited about the issue to go write some code. But if someone else does, I think we shouldn't exclude doing it on the grounds of wanting to preserve an optimization fence. The fence is sort of one-way in this line of thinking: information can propagate up to the outer planner level, but not down into the CTE plan. Thoughts? regards, tom lane
pgsql-hackers by date: