Re: plan shape work - Mailing list pgsql-hackers
From | Robert Haas |
---|---|
Subject | Re: plan shape work |
Date | |
Msg-id | CA+TgmobF3V0h_7o-4_LrSFLMXaETS_cAq4PyW+0Fxc6EWOKgLA@mail.gmail.com Whole thread Raw |
In response to | Re: plan shape work (Tom Lane <tgl@sss.pgh.pa.us>) |
Responses |
Re: plan shape work
|
List | pgsql-hackers |
On Wed, Sep 24, 2025 at 6:03 PM Tom Lane <tgl@sss.pgh.pa.us> wrote: > I don't think so. We do not have a nice story on marking Node fields > const: it's very unclear for example what consequences that ought to > have for copyObject(). Maybe somebody will tackle that issue someday, > but it's not something to touch casually in a patch with other > objectives. So I don't think we can make the plan_name fields const. > The best solution I think is to make choose_plan_name() take a const > string and return a non-const one. The attached v10-0001 is just like > your v9-0001 except for doing the const stuff this way. I chose to > fix the impedance mismatch within choose_plan_name() by having it > pstrdup when it wants to just return the "name" string, but you could > make a case for holding your nose and just casting away const there. Yeah, these are the kinds of trade-offs I never know how to make. I have a little difficulty believing that it's worth a pstrdup() to have the benefit of marking things const, but maybe it is, and the difference is probably microscopic either way. It's not like we're going to be faced with many planning problems involving gigantic numbers of subqueries. If this were a hot code path we'd want to think harder. > > - You (Tom) also asked why not print InitPlan/SubPlan wherever we > > refer to subplans, so this version restores that behavior. > > Thanks. I'm good with the output now (modulo the bug described > below). Someone could potentially argue that this exposes more > of the internals than we really ought to, such as the difference > between expr and multiexpr SubLinks, but I'm okay with that. I actually thought it looked kind of nice exposing some of that detail, but it's certainly a judgement call. > Aside from the const issue, something I don't really like at the > coding level is the use of an "allroots" list. One reason is that > it's partially redundant with the adjacent "subroots" list, but > a bigger one is that we have transient roots that shouldn't be > in there. An example here is pull_up_simple_subquery: it builds > a clone of the query's PlannerInfo to help it use various > infrastructure along the way to flattening the subquery, but > that clone is not referenced anymore after the function exits. > You were putting that into allroots, which seems to me to be > a fundamental error, even more so because it went in with the > same plan_name as the root it was cloned from. > > I think a better idea is to keep a list of just the subplan > names that we've assigned so far. That has a far clearer > charter, plus it can be updated immediately by choose_plan_name() > instead of relying on the caller to do the right thing later. > I coded this up, and was rather surprised to find that it changed > some regression outputs. On investigation, that's because > build_minmax_path() was actually doing the wrong thing later: > it was putting the wrong root into allroots, so that "minmax_1" > never became assigned and could be re-used later. Ooph, that's embarrassing. I think the reason that I ended up making a list of the roots themselves rather than the strings is that I was thinking that everything in this data structure would need to be a node, and I didn't want to cons up String nodes for every list element. Then later I marked that structure member read_write_ignore and never stopped to think that maybe then we didn't need nodes at all. So, in short, I think this is a great solution and thanks a ton for putting in the legwork to figure it out. > I also observed that SS_process_ctes() was not using > choose_plan_name() but simply assigning the user-written CTE > name. I believe it's possible to use the same CTE name in > different parts of a query tree, so this fails to achieve > the stated purpose of making the names unique. Ouch, that's another good catch. Also, even if the CTEs had to be unique among themselves, there could be a collision between a CTE name and a generated name. > I'm still a little bit uncomfortable about whether > it's okay for pull_up_simple_subquery() to just do > > + subroot->plan_name = root->plan_name; > > rather than giving some other name to the transient subroot. > I think it's okay because we are not making any meaningful planning > decisions during the life of the subroot, just seeing if we can > transform the subquery into a form that allows it to be pulled up. > But you might think differently. Perhaps a potential compromise > is to set the transient subroot's plan_name to NULL instead? I don't like NULL. Imagine that some meaningful planning decision does get made during the life of the subroot, and imagine further that some hook is called by means of which some extension can influence that decision. What plan_name do we want that extension to see? I think there's some argument for letting it see the plan name of the parent plan into which we're thinking about inlining the subquery, which I believe is the effect of the current coding, and there's perhaps also an argument for having a wholly new plan name there just to really identify that particular decision clearly, but if we put NULL, that's the name we use for the topmost query level. If we changed things so that the top query level gets called "main" or some other constant string, then it would make sense to use NULL as a sentinel value here to mean "undefined," but I kind of think we probably don't want to go there. > Anyway, v10-0002 is a delta patch to use a list of subplan > names instead of "allroots", and there are a couple of trivial > cosmetic changes too. OK, I'll try out these versions and let you know what I find out. Thanks much. -- Robert Haas EDB: http://www.enterprisedb.com
pgsql-hackers by date: