Re: generic plans and "initial" pruning - Mailing list pgsql-hackers

From Tom Lane
Subject Re: generic plans and "initial" pruning
Date
Msg-id 605328.1747710381@sss.pgh.pa.us
Whole thread Raw
In response to Re: generic plans and "initial" pruning  (Amit Langote <amitlangote09@gmail.com>)
List pgsql-hackers
Amit Langote <amitlangote09@gmail.com> writes:
> Pushed after some tweaks to comments and the test case.

My attention was drawn to commit 525392d57 after observing that
Valgrind complained about a memory leak in some code that commit added
to BuildCachedPlan().  I tried to make sense of said code so I could
remove the leak, and eventually arrived at the attached patch, which
is part of a series of leak-fixing things hence the high sequence
number.

Unfortunately, the bad things I speculated about in the added comments
seem to be reality.  The second attached file is a test case that
triggers

TRAP: failed Assert("list_length(plan_list) == list_length(plan->stmt_list)"), File: "plancache.c", Line: 1259, PID:
602087

because it adds a DO ALSO rule that causes the rewriter to generate
more PlannedStmts than it did before.

This is quite awful, because it does more than simply break the klugy
(and undocumented) business about keeping the top-level List in a
different context.  What it means is that any outside code that is
busy iterating that List is very fundamentally broken: it's not clear
what List index it ought to resume at, except that "the one it was at"
is demonstrably incorrect.

I also don't really believe the (also undocumented) assumption that
such outside code is in between executions of PlannedStmts of the
List and hence can tolerate those being ripped out and replaced.
I have not attempted to build an example, because the one I have
seems sufficiently damning.  But I bet that a recursive function
could be constructed in such a way that an outer execution is
still in progress when an inner call triggers UpdateCachedPlan.

Another small problem (much more easily fixable than the above,
probably) is that summarily setting "plan->is_valid = true"
at the end is not okay.  We could already have received an
invalidation that should result in marking the plan stale.
(Holding locks on the tables involved is not sufficient to
prevent that, as there are other sources of inval events.)

It's possible that this code can be fixed, but I fear it's
going to involve some really fundamental redesign, which
probably shouldn't be happening after beta1.  I think there
is no alternative but to revert for v18.

            regards, tom lane

From a680e6b6885378beb0164e465b50afd81558ebc5 Mon Sep 17 00:00:00 2001
From: Tom Lane <tgl@sss.pgh.pa.us>
Date: Mon, 19 May 2025 00:02:20 -0400
Subject: [PATCH v2 10/20] Partially fix some extremely broken code from
 525392d57.

Avoid leaking memory in the stmt_context during BuildCachedPlan.
Sadly, this code has problems a lot worse than that (per the
documentation I added), so I suspect 525392d57 will get reverted
and we won't need this patch.

Author: Tom Lane <tgl@sss.pgh.pa.us>
Discussion: https://postgr.es/m/285483.1746756246@sss.pgh.pa.us
---
 src/backend/utils/cache/plancache.c | 37 ++++++++++++++++++++++-------
 1 file changed, 29 insertions(+), 8 deletions(-)

diff --git a/src/backend/utils/cache/plancache.c b/src/backend/utils/cache/plancache.c
index 9bcbc4c3e97..40ba3e9df7c 100644
--- a/src/backend/utils/cache/plancache.c
+++ b/src/backend/utils/cache/plancache.c
@@ -1109,22 +1109,32 @@ BuildCachedPlan(CachedPlanSource *plansource, List *qlist,
      */
     if (!plansource->is_oneshot)
     {
+        List       *stmt_plist;
+
         plan_context = AllocSetContextCreate(CurrentMemoryContext,
                                              "CachedPlan",
                                              ALLOCSET_START_SMALL_SIZES);
         MemoryContextCopyAndSetIdentifier(plan_context, plansource->query_string);

-        stmt_context = AllocSetContextCreate(CurrentMemoryContext,
+        stmt_context = AllocSetContextCreate(plan_context,
                                              "CachedPlan PlannedStmts",
                                              ALLOCSET_START_SMALL_SIZES);
         MemoryContextCopyAndSetIdentifier(stmt_context, plansource->query_string);
-        MemoryContextSetParent(stmt_context, plan_context);

+        /*
+         * Copy plans into the stmt_context.
+         */
         MemoryContextSwitchTo(stmt_context);
-        plist = copyObject(plist);
+        stmt_plist = copyObject(plist);

+        /*
+         * We actually need the top-level List object to be in the long-lived
+         * plan_context, in case UpdateCachedPlan wants to update it; see
+         * comments therein.  Do a shallow copy to make that happen.
+         */
         MemoryContextSwitchTo(plan_context);
-        plist = list_copy(plist);
+        plist = list_copy(stmt_plist);
+        list_free(stmt_plist);    /* be tidy */
     }
     else
         plan_context = CurrentMemoryContext;
@@ -1251,12 +1261,22 @@ UpdateCachedPlan(CachedPlanSource *plansource, int query_index,

     /*
      * Planning work is done in the caller's memory context.  The resulting
-     * PlannedStmt is then copied into plan->stmt_context after throwing away
-     * the old ones.
+     * PlannedStmt(s) are then copied into plan->stmt_context after throwing
+     * away the old ones.  But note that we re-use the long-lived
+     * plan->stmt_list list to hold the pointers to the PlannedStmts.  This
+     * kluge avoids breaking code that is iterating over that list, so long as
+     * it's between statements and not currently using one of the contained
+     * PlannedStmts.
+     *
+     * XXX this is, if not actively broken, at least unbelievably fragile.
+     * Aside from the likelihood that the just-stated assumption doesn't hold
+     * universally, there is not a good reason to believe that the length of
+     * the plan list is constant.
      */
     plan_list = pg_plan_queries(query_list, plansource->query_string,
                                 plansource->cursor_options, NULL);
-    Assert(list_length(plan_list) == list_length(plan->stmt_list));
+    if (list_length(plan_list) != list_length(plan->stmt_list))
+        elog(ERROR, "UpdateCachedPlan(): plan list length changed");

     MemoryContextReset(plan->stmt_context);
     oldcxt = MemoryContextSwitchTo(plan->stmt_context);
@@ -1276,7 +1296,8 @@ UpdateCachedPlan(CachedPlanSource *plansource, int query_index,

     /*
      * We've updated all the plans that might have been invalidated, so mark
-     * the CachedPlan as valid.
+     * the CachedPlan as valid.  XXX wrong: we could already have hit a new
+     * invalidation event.
      */
     plan->is_valid = true;

--
2.43.5

drop table if exists test_table;

CREATE TABLE test_table (a int);

create or replace function doit(r int, a int) returns bool
language plpgsql as $$
begin
  raise notice 'r = %, a = %', r, a;
  if (r = 10) then
    CREATE RULE make_noise AS ON DELETE TO test_table
    DO ALSO INSERT INTO test_table SELECT 2;
    raise notice 'made rule';
  end if;
  if (r = 20 and a = 1) then
    CREATE RULE make_noise_2 AS ON DELETE TO test_table
    DO ALSO INSERT INTO test_table SELECT 3;
    raise notice 'made rule 2';
  end if;
  return true;
end$$;

set plan_cache_mode to force_generic_plan;

DO $$
BEGIN
    FOR r IN 1..30 LOOP
        TRUNCATE test_table;
        INSERT INTO test_table SELECT 1;
        DELETE FROM test_table where doit(r,a);
    END LOOP;
END$$;

table test_table;

pgsql-hackers by date:

Previous
From: Tatsuo Ishii
Date:
Subject: Adding null patch entry to cfbot/CommitFest
Next
From: shveta malik
Date:
Subject: Re: Conflict detection for update_deleted in logical replication