Re: On disable_cost - Mailing list pgsql-hackers
From | Robert Haas |
---|---|
Subject | Re: On disable_cost |
Date | |
Msg-id | CA+TgmoaubpbXamuWfJTtYh_+jQ_upJxL40AaydiHxCHbXwdCSQ@mail.gmail.com Whole thread Raw |
In response to | Re: On disable_cost (Tom Lane <tgl@sss.pgh.pa.us>) |
Responses |
Re: On disable_cost
(Tom Lane <tgl@sss.pgh.pa.us>)
|
List | pgsql-hackers |
On Tue, Mar 12, 2024 at 1:32 PM Tom Lane <tgl@sss.pgh.pa.us> wrote: > > 1. You stated that it changes lots of plans in the regression tests, > > but you haven't provided any sort of analysis of why those plans > > changed. I'm kind of surprised that there would be "tons" of plan > > changes. You (or someone) should look into why that's happening. > > I've not read the patch, but given this description I would expect > there to be *zero* regression changes --- I don't think we have any > test cases that depend on disable_cost being finite. If there's more > than zero changes, that very likely indicates a bug in the patch. > Even if there are places where the output legitimately changes, you > need to justify each one and make sure that you didn't invalidate the > intent of that test case. I spent some more time poking at this patch. It's missing a ton of important stuff and is wrong in a whole bunch of really serious ways, and I'm not going to try to mention all of them in this email. But I do want to talk about some of the more interesting realizations that came to me as I was working my way through this. One of the things I realized relatively early is that the patch does nothing to propagate disable_cost upward through the plan tree. That means that if you have a choice between, say, Sort-over-Append-over-SeqScan and MergeAppend-over-IndexScan, as we do in the regression tests, disabling IndexScan doesn't change the plan with the patch applied, as it does in master. That's because only the IndexScan node ends up flagged as disabled. Once we start stacking other plan nodes on top of disabled plan nodes, the resultant plans are at no disadvantage compared to plans containing no disabled nodes. The IndexScan plan survives initially, despite being disabled, because it's got a sort order. That lets us use it to build a MergeAppend path, and that MergeAppend path is not disabled, and compares favorably on cost. After straining my brain over various plan changes for a long time, and hacking on the code somewhat, I realized that just propagating the Boolean upward is insufficient to set things right. That's basically because I was being dumb when I said this: > I don't think we should care how MANY disabled nodes appear in a > plan, particularly. Suppose we try to plan a Nested Loop with SeqScan disabled, but there's no alternative to a SeqScan for the outer side of the join. If we suppose an upward-propagating Boolean, every path for the join is disabled because every path for the outer side is disabled. That means that we have no reason to avoid paths where the inner side also uses a disabled path. When we loop over the inner rel's pathlist looking for ways to build a path for the join, we may find some disabled paths there, and the join paths we build from those paths are disabled, but so are the join paths where we use a non-disabled path on the inner side. So those paths are just competing with each other on cost, and the path type that is supposedly disabled on the outer side of the join ends up not really being very disabled at all. More precisely, if disabling a plan type causes paths to be discarded completely before the join paths are constructed, then they actually do get removed from consideration. But if those paths make it into inner rel's path list, even way out towards the end, then paths derived from them can jump to the front of the joinrel's path list. The same kind of problem happens with Append or MergeAppend nodes. The regression tests expect that we'll avoid disabled plan types whenever possible even if we can't avoid them completely; for instance, the matest0 table intentionally omits an index on one child table. Disabling sequential scans is expected to disable them for all of the other child tables even though for that particular child table there is no other option. But that behavior is hard to achieve if every path for the parent rel is "equally disabled". You either want the path that uses only the one required SeqScan to be not-disabled even though one of its children is disabled ... or you want the disabled flag to be more than a Boolean. And while there's probably more than one way to make it work, the easiest thing seems to be to just have a disabled-counter in every node that gets initialized to the total disabled-counter values of all of its children, and then you add 1 if that node is itself doing something that is disabled, i.e. the exact opposite of what I said in the quote above. I haven't done enough work to know whether that would match the current behavior, let alone whether it would have acceptable performance, and I'm not at all convinced that's the right direction anyway. Actually, at the moment, I don't have a very good idea at all what the right direction is. I do have a feeling that this patch is probably not going in the right direction, but I might be wrong about that, too. -- Robert Haas EDB: http://www.enterprisedb.com
pgsql-hackers by date: