Re: pg_plan_advice - Mailing list pgsql-hackers
| From | Robert Haas |
|---|---|
| Subject | Re: pg_plan_advice |
| Date | |
| Msg-id | CA+TgmoY0J9==y93stO7QaAU+TyA-zm5wqsc4GWUa0hgzPAi_sw@mail.gmail.com Whole thread Raw |
| In response to | Re: pg_plan_advice (Amit Langote <amitlangote09@gmail.com>) |
| Responses |
Re: pg_plan_advice
|
| List | pgsql-hackers |
On Wed, Dec 10, 2025 at 6:20 AM Amit Langote <amitlangote09@gmail.com> wrote: > These are just high-level comments after browsing the patches and > reading some bits like pgpa_identifier to get myself familiarized with > the project. I like that the key concept here is plan stability > rather than plan control, because that framing makes it easier to > treat this as infrastructure instead of policy. Thanks, I agree. I'm sure people will use this for plan control, but if you start with that, then it's really unclear what things you should allow to be controlled and what things not. Defining the focus as plan stability makes round-trip safety a priority and the scope of what you can request is what the planner could have generated had the costing come out just so. There's still some definitional questions at the margin, but IMHO it's much less fuzzy. > On the relation identifier system: IMHO this part doesn't seem as > opinionated as the advice mini-language. The requirements pretty much > dictate the design -- you need alias names and occurrence counters to > handle self-joins, partition fields for partitioned tables, and a > string representation to survive dump/restore. There doesn't seem to > be much flexibility in that. Right. There's some flexibility. For instance, you could handle partitions using occurrence numbers, which would actually save a bunch of code, but that seems obviously worse in terms of user experience. Also, you could if you wanted key it off of the name of the table rather than the relation alias used for the table. I think that's also worse but possibly it's debatable. You could change the order of the pieces in the representation; e.g. maybe plan_name should come first rather than last; or you could change the separator characters. But, honestly, none of that strikes me as sufficient grounds to want multiple implementations. If the choices I've made don't seem good to other people, then we should just change them and hopefully find something everybody can live with. It's a bit like the way that extension SQL scripts use "--" as a separator: maybe not everybody agrees that this is the absolutely most elegant choice, but nobody's proposing a a second version of the extension mechanism just to do something different. > Given that, it seems more practical to put this in core from the > start. Extensions that might want to build plan-advice-like > functionality shouldn’t have to clone this logic and wait another > release for something that’s already well-defined and deterministic. > The mini-language is opinionated and belongs in contrib, but the > identifier infrastructure just solves a fundamental problem cleanly. It's not quite as easy to make a sharp distinction between these things as someone might hope. Note that the lexer and parser handle the whole mini-language, which includes parsing the relation identifiers. That doesn't of course mean that the code to *generate* relation identifiers couldn't be in core, and I actually had it that way at one point, but it's not very much code and I wasn't too impressed with how that turned out. It seemed to couple the core code to the extension more tightly than necessary for not much real benefit. But that's not to say I disagree with you categorically. Suppose we decided (and I'm not saying we should) to start showing relation identifiers in EXPLAIN output instead of identifying things in EXPLAIN output as we do today. Maybe we even decide to show elided subqueries and similar as first-class parts of the EXPLAIN output, also using relation identifier syntax. That would be a pretty significant change, and would destabilize a WHOLE LOT of regression test outputs, but then relation identifiers become a first-class PostgreSQL concept that everyone who looks at EXPLAIN output will encounter and, probably, come to understand. Then obviously the relation identifier generation code needs to be in core, and that makes total sense because we're actually using it for something, and arguably we've made life easier for everyone who wants to use pg_plan_advice in the future because they're already familiar with the identifier convention. The downside is everyone has to get used to the new EXPLAIN output even if they don't care about pg_plan_advice or hate it with a fiery passion. So my point here is that there are things we can decide to do to make some or all of this "core," but IMHO it's not just as simple as saying "this is in, that's out". It's more about deciding what the end state ought to look like, and how integrated this stuff ought to be into the fabric of PostgreSQL. I started with the minimal level of integration: little pieces of core infrastructure, all used by a giant extension. Now we need to either decide that's where we want to settle, or decide to push to some greater or lesser degree toward more integration. > On the infrastructure patches (0001-0005): these look sensible. The > range table flattening info, elided node tracking, and append node > consolidation preserve information that's currently lost -- there's > some additional overhead to track this, but it's fixed per-relation > per-subquery, which seems reasonable. The path generation hooks > (0005) are a clear improvement: moving from global enable_* GUCs to > per-RelOptInfo pgs_mask gives extensions the granularity they need for > relation-specific and join-specific decisions. Yes, you need C code to > use them, but you'd need to write C code to do something of value in > this area anyway, and the hooks give you control that GUCs can't > provide. > > Overall, I'm supportive of getting these committed once they're ready. > contrib/pg_plan_advice is a compelling proof-of-concept for why these > hooks are needed. Great. I don't think there's anything terribly controversial in 0001-0004. I think the comments and so on might need improving and there could be little mini-bugs or whatever, but basically I think they work and I don't anticipate any major problems. However, I'd want at least one other person to do a detailed review before committing anything. 0005 might be a little more controversial. There's some design choices to dislike (though I believe I've made them for good reason) and there's a question of whether it's as complete as we want. It might be fine to commit it the way it is and just adjust it later if we find that something ought to be different, but it's also possible that we should think harder about some of the choices or hold off for a bit while other parts of this effort move forward. I'm happy to hear opinions on the best strategy here. > I'll try to post more specific comments once I've read this some more. Thanks for the review so far, and that sounds great! -- Robert Haas EDB: http://www.enterprisedb.com
pgsql-hackers by date: