Re: generic plans and "initial" pruning - Mailing list pgsql-hackers
From | Tom Lane |
---|---|
Subject | Re: generic plans and "initial" pruning |
Date | |
Msg-id | 605328.1747710381@sss.pgh.pa.us Whole thread Raw |
In response to | Re: generic plans and "initial" pruning (Amit Langote <amitlangote09@gmail.com>) |
List | pgsql-hackers |
Amit Langote <amitlangote09@gmail.com> writes: > Pushed after some tweaks to comments and the test case. My attention was drawn to commit 525392d57 after observing that Valgrind complained about a memory leak in some code that commit added to BuildCachedPlan(). I tried to make sense of said code so I could remove the leak, and eventually arrived at the attached patch, which is part of a series of leak-fixing things hence the high sequence number. Unfortunately, the bad things I speculated about in the added comments seem to be reality. The second attached file is a test case that triggers TRAP: failed Assert("list_length(plan_list) == list_length(plan->stmt_list)"), File: "plancache.c", Line: 1259, PID: 602087 because it adds a DO ALSO rule that causes the rewriter to generate more PlannedStmts than it did before. This is quite awful, because it does more than simply break the klugy (and undocumented) business about keeping the top-level List in a different context. What it means is that any outside code that is busy iterating that List is very fundamentally broken: it's not clear what List index it ought to resume at, except that "the one it was at" is demonstrably incorrect. I also don't really believe the (also undocumented) assumption that such outside code is in between executions of PlannedStmts of the List and hence can tolerate those being ripped out and replaced. I have not attempted to build an example, because the one I have seems sufficiently damning. But I bet that a recursive function could be constructed in such a way that an outer execution is still in progress when an inner call triggers UpdateCachedPlan. Another small problem (much more easily fixable than the above, probably) is that summarily setting "plan->is_valid = true" at the end is not okay. We could already have received an invalidation that should result in marking the plan stale. (Holding locks on the tables involved is not sufficient to prevent that, as there are other sources of inval events.) It's possible that this code can be fixed, but I fear it's going to involve some really fundamental redesign, which probably shouldn't be happening after beta1. I think there is no alternative but to revert for v18. regards, tom lane From a680e6b6885378beb0164e465b50afd81558ebc5 Mon Sep 17 00:00:00 2001 From: Tom Lane <tgl@sss.pgh.pa.us> Date: Mon, 19 May 2025 00:02:20 -0400 Subject: [PATCH v2 10/20] Partially fix some extremely broken code from 525392d57. Avoid leaking memory in the stmt_context during BuildCachedPlan. Sadly, this code has problems a lot worse than that (per the documentation I added), so I suspect 525392d57 will get reverted and we won't need this patch. Author: Tom Lane <tgl@sss.pgh.pa.us> Discussion: https://postgr.es/m/285483.1746756246@sss.pgh.pa.us --- src/backend/utils/cache/plancache.c | 37 ++++++++++++++++++++++------- 1 file changed, 29 insertions(+), 8 deletions(-) diff --git a/src/backend/utils/cache/plancache.c b/src/backend/utils/cache/plancache.c index 9bcbc4c3e97..40ba3e9df7c 100644 --- a/src/backend/utils/cache/plancache.c +++ b/src/backend/utils/cache/plancache.c @@ -1109,22 +1109,32 @@ BuildCachedPlan(CachedPlanSource *plansource, List *qlist, */ if (!plansource->is_oneshot) { + List *stmt_plist; + plan_context = AllocSetContextCreate(CurrentMemoryContext, "CachedPlan", ALLOCSET_START_SMALL_SIZES); MemoryContextCopyAndSetIdentifier(plan_context, plansource->query_string); - stmt_context = AllocSetContextCreate(CurrentMemoryContext, + stmt_context = AllocSetContextCreate(plan_context, "CachedPlan PlannedStmts", ALLOCSET_START_SMALL_SIZES); MemoryContextCopyAndSetIdentifier(stmt_context, plansource->query_string); - MemoryContextSetParent(stmt_context, plan_context); + /* + * Copy plans into the stmt_context. + */ MemoryContextSwitchTo(stmt_context); - plist = copyObject(plist); + stmt_plist = copyObject(plist); + /* + * We actually need the top-level List object to be in the long-lived + * plan_context, in case UpdateCachedPlan wants to update it; see + * comments therein. Do a shallow copy to make that happen. + */ MemoryContextSwitchTo(plan_context); - plist = list_copy(plist); + plist = list_copy(stmt_plist); + list_free(stmt_plist); /* be tidy */ } else plan_context = CurrentMemoryContext; @@ -1251,12 +1261,22 @@ UpdateCachedPlan(CachedPlanSource *plansource, int query_index, /* * Planning work is done in the caller's memory context. The resulting - * PlannedStmt is then copied into plan->stmt_context after throwing away - * the old ones. + * PlannedStmt(s) are then copied into plan->stmt_context after throwing + * away the old ones. But note that we re-use the long-lived + * plan->stmt_list list to hold the pointers to the PlannedStmts. This + * kluge avoids breaking code that is iterating over that list, so long as + * it's between statements and not currently using one of the contained + * PlannedStmts. + * + * XXX this is, if not actively broken, at least unbelievably fragile. + * Aside from the likelihood that the just-stated assumption doesn't hold + * universally, there is not a good reason to believe that the length of + * the plan list is constant. */ plan_list = pg_plan_queries(query_list, plansource->query_string, plansource->cursor_options, NULL); - Assert(list_length(plan_list) == list_length(plan->stmt_list)); + if (list_length(plan_list) != list_length(plan->stmt_list)) + elog(ERROR, "UpdateCachedPlan(): plan list length changed"); MemoryContextReset(plan->stmt_context); oldcxt = MemoryContextSwitchTo(plan->stmt_context); @@ -1276,7 +1296,8 @@ UpdateCachedPlan(CachedPlanSource *plansource, int query_index, /* * We've updated all the plans that might have been invalidated, so mark - * the CachedPlan as valid. + * the CachedPlan as valid. XXX wrong: we could already have hit a new + * invalidation event. */ plan->is_valid = true; -- 2.43.5 drop table if exists test_table; CREATE TABLE test_table (a int); create or replace function doit(r int, a int) returns bool language plpgsql as $$ begin raise notice 'r = %, a = %', r, a; if (r = 10) then CREATE RULE make_noise AS ON DELETE TO test_table DO ALSO INSERT INTO test_table SELECT 2; raise notice 'made rule'; end if; if (r = 20 and a = 1) then CREATE RULE make_noise_2 AS ON DELETE TO test_table DO ALSO INSERT INTO test_table SELECT 3; raise notice 'made rule 2'; end if; return true; end$$; set plan_cache_mode to force_generic_plan; DO $$ BEGIN FOR r IN 1..30 LOOP TRUNCATE test_table; INSERT INTO test_table SELECT 1; DELETE FROM test_table where doit(r,a); END LOOP; END$$; table test_table;
pgsql-hackers by date: