Re: pg_plan_advice - Mailing list pgsql-hackers
| From | Lukas Fittl |
|---|---|
| Subject | Re: pg_plan_advice |
| Date | |
| Msg-id | CAP53Pkz3DSFaaowYvbO5LULf3NhydD_UhHkighfWf6_pwxiqUw@mail.gmail.com Whole thread Raw |
| In response to | Re: pg_plan_advice (Robert Haas <robertmhaas@gmail.com>) |
| Responses |
Re: pg_plan_advice
|
| List | pgsql-hackers |
Hi Robert,
On Tue, Mar 24, 2026 at 2:10 PM Robert Haas <robertmhaas@gmail.com> wrote:
>
> On Sat, Mar 21, 2026 at 9:13 AM Robert Haas <robertmhaas@gmail.com> wrote:
> > So I'm left with the idea that to get test_plan_advice to be fully
> > stable on these slower machines, it will probably be necessary to make
> > it control which AlternativeSubPlan is chosen and whether a
> > MinMaxAggPath is chosen or not. I have some ideas about how to
> > accomplish that in a reasonably elegant fashion without adding too
> > much new machinery, but I need to spend some more time validating
> > those ideas before committing to a precise course of action. More
> > soon.
>
> Here is v22. There are four new patches.
>
> 0001 adds a disabled_nodes fields to SubPlan, to fix the bug that I
> identified in the email to which this is a reply.
This looks good, and is consistent with how it would have worked
before the introduction of disabled nodes (since costs would have just
been very high and thus discourage a subplan with many disabled
nodes).
Only nit is that the commit hash reference in the commit message
doesn't seem right, I think you probably meant
e22253467942fdb100087787c3e1e3a8620c54b2
> 0002-0004 are an attempt to fix the remaining buildfarm failures not
> already addressed (or attempted to be addressed, anyway) by other
> commits. The basic idea, implemented by 0004, is to add a
> DO_NOT_SCAN() advice tag. This advice is generated when we consider a
> MinMaxAggPath or a hashed SubPlan. In either case, all relations which
> are part of the non-selected alternative are marked DO_NOT_SCAN(),
> which works like scan type advice but disables every possible scan
> type rather than still allowing exactly one of them. Unless I've
> missed something, this should be sufficient to make pg_plan_advice
> stabilize which of two alternative SubPlans we pick and whether or not
> a min/max aggregate is chosen. 0002 does some preliminary refactoring
> to provide a more centralized way of tracking per-PlannerInfo details
> within pg_plan_advice. 0003 makes the necessary change to
> src/backend/optimizer, which consists of adding an alternative_root
> field to each PlannerInfo and setting it appropriately. 0004 then
> updates pg_plan_advice to implement DO_NOT_SCAN().
For 0002:
I think that overall looks like a good refactoring, with two minor notes:
> diff --git a/contrib/pg_plan_advice/pgpa_planner.c b/contrib/pg_plan_advice/pgpa_planner.c
> index fee88904760..70139ff42be 100644
> --- a/contrib/pg_plan_advice/pgpa_planner.c
> +++ b/contrib/pg_plan_advice/pgpa_planner.c
> ...
> @@ -2017,34 +1949,64 @@ pgpa_planner_feedback_warning(List *feedback)
> errdetail("%s", detailbuf.data));
> }
>
> -#ifdef USE_ASSERT_CHECKING
> -
> /*
> - * Fast hash function for a key consisting of an RTI and plan name.
> + * Get or create the pgpa_planner_info for the subroot with the given
> + * plan_name.
> */
> -static uint32
> -pgpa_ri_checker_hash_key(pgpa_ri_checker_key key)
> +static pgpa_planner_info *
> +pgpa_planner_get_proot(pgpa_planner_state *pps, PlannerInfo *root)
> {
I'd word that as "Get or create the pgpa_planner_info for the given
PlannerInfo and its associated plan_name", since you're not passing a
plan_name as the argument.
> @@ -2053,19 +2015,34 @@ pgpa_ri_checker_save(pgpa_planner_state *pps, PlannerInfo *root,
> RelOptInfo *rel)
> {
> ...
> + if (proot->rid_array_size <= rel->relid)
> + {
> + int new_size = Max(proot->rid_array_size, 8);
> +
> + while (new_size < rel->relid)
> + new_size *= 2;
This could use pg_nextpower2_32 on the rel->relid instead of the
manual while loop.
---
For 0003:
I wonder if "original_root" wouldn't be more correct here as a name
(instead of "alternative_root"), since if I follow the implementation
correctly, you are adding a pointer on each alternative root, back to
the original root that the alternative was copied from.
I also wonder if maybe we should be more narrow in what we keep here.
It seems 0004 mainly needs the original plan name, so maybe its better
if we just keep that for targeting purposes, vs a full pointer to the
PlannerInfo. The planner makes an effort to zap unused subplans at the
end of set_plan_references, and I think this new field would then be
the only pointer to those unused subplans. If we decided to add an
early free there at some point (instead of just making them a NULL
entry in the list), that'd break pg_plan_advice.
---
For 0004:
> + /*
> + * If the corresponding PlannerInfo has an alternative_root, then this is
> + * the plan name from that PlannerInfo; otherwise, it is the same as
> + * plan_name.
> + *
> + * is_alternative_plan is set to true for every pgpa_planner_info that
> + * shares an alternative_plan_name with at least one other, and to false
> + * otherwise.
> + */
> + char *alternative_plan_name;
> + bool is_alternative_plan;
Per the earlier note, I think using "original_plan_name" would make
more sense here, because it'll be the name the alternatives are based
on. I also think "has_alternative_plan" is more clear for the boolean,
since it'll be set on the info for the original info as well, if I
understand correctly.
Otherwise 0004 looks like a reasonable compromise for now. I feel like
we can find better ways of doing this over time, and there are parts
I'm not excited about (e.g. the targeting feels a bit brittle when it
comes to anything that'd cause generated alternative subplan names to
change), but I think it works for now.
> 0005 is the pg_collect_advice module from previous versions of the
> patch set. The main change here is that I completely rewrote the TAP
> test, which previously was running the entire regression test suite
> yet another time. That's been replaced with something that is much
> faster and much better targeted at properly testing the shared advice
> collector. Aside from that, I added one more check for
> InvalidDsaPointer where the code was previously lacking one.
From doing an initial code review, I feel like the interface.c vs
collector.c split in this module is confusing - e.g. I would have
expected SQL callable functions like "pg_get_collected_shared_advice"
to be part of the interface. It also makes reading the code confusing,
e.g. I just looked for where pg_collect_advice_get_mcxt is defined,
and I had to jump from collector.c to interface.c - maybe its better
if that's just in one file, since it'd still be slightly under 1000
lines anyway?
From a design perspective, I'm -1 on storing of full query text
strings in shared memory when the shared collector mode. With large
query texts and without an aggregate MB size limit that's an
expressway into OOM land, even if you used a low value like 100
entries max, because ORMs are just really good at creating large query
texts unexpectedly. I'm also skeptical whether that's a good idea for
the local collection mode, but it'd be less problematic there.
Overall, I think this needs to rely on queryid instead and not store
query texts. I would not make queryid optional, but instead enable it
automatically - which fits together with pg_stash_advice taking it as
input.
I realize not having query texts reduces its effectiveness (since you
don't see which parameters produced which plan advice), but it still
helps surface which different advice strings where seen for which
query IDs, letting you identify if you're getting a mix of bad and
good plans. And I'm just really worried people will enable this on
production in shared collection mode and take down their system.
> 0006 is the pg_stash_advice module from previous versions of the patch
> set. I have adjusted this to be much safer against permanent DSA
> leaks. It now uses dshash_find_or_insert_extended instead of relying
> on the ability to dshash_find a just-inserted entry without error. It
> now also holds an LWLock while inserting or updating an entry in the
> dshash table, for reasons explained in the comments. On the other
> hand, it no longer unnecessarily holds the LWLock in exclusive mode
> when looking up advice strings for automatic application, which was a
> rather silly mistake in the previous version. A few additional tests
> have been added. Alphabetization in contrib/Makefile has been fixed.
From a design perspective, I'm worried about the fact that we lose the
stashed advice on a restart. e.g. imagine a DBA using pg_stash_advice
to pin a query that sometimes picks a bad plan to the good plan, but
then their cloud provider applies a security update overnight.
Suddenly the database is slow because the bad plans are being picked
again.
pg_hint_plan's solution to this (the "hints" table [0]) uses an actual
table managed by the extension - but I suspect that doesn't fit the
picture, since it'd be per database, etc. It does have the benefit of
being restart safe though, and being copied to replicas.
I wonder if we could find a way to dump and restore the advice stash
information via a file, so its at least crash and restart safe?
Thanks,
Lukas
[0]: https://github.com/ossc-db/pg_hint_plan/blob/master/docs/hint_table.md
--
Lukas Fittl
pgsql-hackers by date: