Thread: Memory context can be its own parent and child in replication command
Hi, While running some tests with logical replication, I've run in a situation where a walsender process was stuck in an infinite loop with the following backtrace: #0 in MemoryContextDelete (...) at ../src/backend/utils/mmgr/mcxt.c:474 #1 in exec_replication_command (cmd_string=... "BEGIN") at ../src/backend/replication/walsender.c:2005 Which matches the following while loop: while (curr->firstchild != NULL) curr = curr->firstchild; Inspecting the memory context, I have the following: $9 = (MemoryContext) 0xafb741c35360 (gdb) p *curr $10 = {type = T_AllocSetContext, isReset = false, allowInCritSection = false, mem_allocated = 8192, methods = 0xafb7278a82d8 <mcxt_methods+216>, parent = 0xafb741c35360, firstchild = 0xafb741c35360, prevchild = 0x0, nextchild = 0x0, name = 0xafb7275f68a8 "Replication command context", ident = 0x0, reset_cbs = 0x0} So the memory context is 0xafb741c35360, which is also the same value for parent and firstchild. This explains the infinite loop as MemoryContextDelete tries to find the leaf with no children. I was able to get a rr recording of the issue and trace how this happened. This can be reproduced by triggering 2 replication commands, with the first one doing a snapshot export: CREATE_REPLICATION_SLOT "test_slot" LOGICAL "test_decoding" ( SNAPSHOT "export"); DROP_REPLICATION_SLOT "test_slot"; - CreateReplicationSlot will start a new transaction to handle the snapshot export. - This transaction will save the replication command context (let's call it ctx1) in its state. - ctx1 is deleted at the end of exec_replication_command - During the next replication command, the transaction is aborted in SnapBuildClearExportedSnapshot - The transaction restores ctx1 as the CurrentMemoryContext - AllocSetContextCreate is called with ctx1 as a parent, and it will pull ctx1 from the freelist - ctx1's parent and child will be set to ctx1 and returned - During ctx1 deletion, it will be stuck in an infinite loop I've added a tap test to reproduce the issue, along with an assertion during context creation to check the parent and returned context are not the same so the test would immediately abort and not stay stuck. To fix this, it seems like saving and restoring the memory context after the call AbortCurrentTransaction was the best approach. It is similar to what's done with CurrentResourceOwner. I've thought of switching to the TopMemoryContext before exporting the snapshot so aborting the transaction will switch back to TopMemoryContext, but this would still require restoring the memory context after the transaction is aborted. Regards, Anthonin Bonnefoy
Attachment
Re: Memory context can be its own parent and child in replication command
From
Anthonin Bonnefoy
Date:
The issue seems to have been introduced by 2129d184a206c when transactionState->priorContext was added and is only present in HEAD.
Re: Memory context can be its own parent and child in replication command
From
Michael Paquier
Date:
On Mon, Mar 10, 2025 at 09:07:59AM +0100, Anthonin Bonnefoy wrote: > The issue seems to have been introduced by 2129d184a206c when > transactionState->priorContext was added and is only present in HEAD. It seems to me that you mean 1afe31f03cd2, no? -- Michael
Attachment
Michael Paquier <michael@paquier.xyz> writes: > On Mon, Mar 10, 2025 at 09:07:59AM +0100, Anthonin Bonnefoy wrote: >> The issue seems to have been introduced by 2129d184a206c when >> transactionState->priorContext was added and is only present in HEAD. > It seems to me that you mean 1afe31f03cd2, no? I dunno about this patch: it seems to me it's doing things exactly backwards, namely trying to restore the CurrentMemoryContext after a transaction abort to what it was just beforehand. With only a slightly different set of suppositions, this is introducing use of a deleted context rather than removing it. I'm inclined to suspect that the real problem is somebody installed the wrong context as current just before the transaction was started. I don't have the energy to look closer right now, though. Independently of where the actual bug is there, it seems not nice that it's so easy to bollix the memory context data structures. I wonder if we can make that more detectable at reasonable cost. regards, tom lane
Re: Memory context can be its own parent and child in replication command
From
Anthonin Bonnefoy
Date:
On Tue, Mar 11, 2025 at 1:09 AM Michael Paquier <michael@paquier.xyz> wrote: > It seems to me that you mean 1afe31f03cd2, no? Yes, that was a bad copy/paste from me. On Tue, Mar 11, 2025 at 2:13 AM Tom Lane <tgl@sss.pgh.pa.us> wrote: > > I dunno about this patch: it seems to me it's doing things exactly > backwards, namely trying to restore the CurrentMemoryContext after > a transaction abort to what it was just beforehand. With only > a slightly different set of suppositions, this is introducing > use of a deleted context rather than removing it. Yeah, I'm not super happy about the fix either. This leaves the issue that CurrentMemoryContext is set to a deleted context, even if temporarily. > I'm inclined to suspect that the real problem is somebody installed > the wrong context as current just before the transaction was started. > I don't have the energy to look closer right now, though. The context installed is the replication command context created at the beginning of exec_replication_command. This context is deleted at the end of the function while the transaction is still running. Maybe a better fix would be to not delete the Replication command context when a transaction was started to handle the export, and switch back to this context at the start of the next exec_replication_command? This way, the memory context referenced by priorContext would still be valid when the transaction is aborted. This would also require the Replication command context to not be attached under the MessageContext to avoid being deleted at the end of the query cycle. > Independently of where the actual bug is there, it seems not > nice that it's so easy to bollix the memory context data > structures. I wonder if we can make that more detectable > at reasonable cost. One thing that made it hard to trace the issue was that there's no way to know if a MemoryContext was deleted or not. Setting the MemoryContext's node type to T_Invalid during reset could be a way to tag the context as deleted. And this will be able to leverage the existing MemoryContextIsValid check and additional MemoryContextIsValid checks could be added before restoring the priorContext. --- a/src/backend/utils/mmgr/mcxt.c +++ b/src/backend/utils/mmgr/mcxt.c @@ -527,6 +527,7 @@ MemoryContextDeleteOnly(MemoryContext context) context->methods->delete_context(context); + context->type = T_Invalid; VALGRIND_DESTROY_MEMPOOL(context); } However, when testing this on my mac, it seems to trigger a heap corruption during initdb. postgres(9057,0x200690840) malloc: Heap corruption detected, free list is damaged at 0x60000322bb10 *** Incorrect guard value: 276707563012096 postgres(9057,0x200690840) malloc: *** set a breakpoint in malloc_error_break to debug While it ran fine on linux. I didn't have time to check why yet.
Re: Memory context can be its own parent and child in replication command
From
Anthonin Bonnefoy
Date:
On Tue, Mar 11, 2025 at 10:26 AM Anthonin Bonnefoy <anthonin.bonnefoy@datadoghq.com> wrote: > --- a/src/backend/utils/mmgr/mcxt.c > +++ b/src/backend/utils/mmgr/mcxt.c > @@ -527,6 +527,7 @@ MemoryContextDeleteOnly(MemoryContext context) > > context->methods->delete_context(context); > > + context->type = T_Invalid; > VALGRIND_DESTROY_MEMPOOL(context); > } > > However, when testing this on my mac, it seems to trigger a heap > corruption during initdb. Which is normal since this would only work with AllocSet as other delelte_context methods would just free the context... A better spot would be just before putting the context in the freelist in AllocSetDelete.
Re: Memory context can be its own parent and child in replication command
From
Anthonin Bonnefoy
Date:
I've updated the patch with another approach: 0001: This sets the MemoryContext type to T_Invalid just before adding it to the aset freelist, allowing to distinguish between a MemoryContext that was placed in the freelist and shouldn't be used, and a valid one. The existing MemoryContextIsValid checks will be triggered if a MemoryContext in the freelist is used. I've also added additional MemoryContextIsValid to make sure the MemoryContext restored by a transaction is valid. 0002: Keep the replication command alive when there's an ongoing snapshot export. This way, the context restored is valid and can be reused. This requires the replication command context to be created under the TopMemoryContext. Regards, Anthonin Bonnefoy
Attachment
Anthonin Bonnefoy <anthonin.bonnefoy@datadoghq.com> writes: > I've updated the patch with another approach: I got back to looking at this today. I still don't like very much about it. After thinking for awhile, I concur that we want to create the cmd_context under TopMemoryContext, but I think we can make things a great deal simpler and less fragile by just keeping it in existence indefinitely, as attached. The fundamental problem here is that exec_replication_command is creating and deleting the context without regard to where transaction boundaries are. I don't think the best way to deal with that is to require it to know where those boundaries are. I also don't like the proposed test script. The test case alone would be fine, but spinning up an entire new instance seems a rather huge overhead to carry forevermore for a single test that likely will never find anything again. I tried to cram the test case into one of the existing scripts, but it was hard to find one where it would fit naturally. An obvious candidate is subscription/t/100_bugs.pl, but the test failed there for lack of access to the test_decoding plugin. Perhaps we could use another plugin, but I didn't try hard. As for the memory-context-bug-catching changes, I'm not very convinced by the idea of setting context->type = T_Invalid. That would help if someone tries to reference the context while it's still in the freelist, but not if the reference is from re-use of a stale pointer after the context has been reassigned (which IIUC is the case here). I also think that decorating a few MemoryContextSwitchTos with assertions looks pretty random. I wonder if we should content ourselves with adding some assertions that catch attempts to make a context be its own parent. It looks like MemoryContextSetParent already does that, so the only new assert would be in MemoryContextCreate. regards, tom lane diff --git a/src/backend/replication/walsender.c b/src/backend/replication/walsender.c index 987a7336ec8..9fa8beb6103 100644 --- a/src/backend/replication/walsender.c +++ b/src/backend/replication/walsender.c @@ -1973,8 +1973,10 @@ exec_replication_command(const char *cmd_string) int parse_rc; Node *cmd_node; const char *cmdtag; - MemoryContext cmd_context; - MemoryContext old_context; + MemoryContext old_context = CurrentMemoryContext; + + /* We save and re-use the cmd_context across calls */ + static MemoryContext cmd_context = NULL; /* * If WAL sender has been told that shutdown is getting close, switch its @@ -2003,11 +2005,30 @@ exec_replication_command(const char *cmd_string) /* * Prepare to parse and execute the command. + * + * Because replication command execution can involve beginning or ending + * transactions, we need a working context that will survive that, so we + * make it a child of TopMemoryContext. That in turn creates a hazard of + * long-lived memory leaks if we lose track of the working context. We + * deal with that by creating it only once per walsender, and resetting it + * for each new command. (Normally this reset is a no-op, but if the + * prior exec_replication_command call failed with an error, it won't be.) + * + * This is subtler than it looks. The transactions we manage can extend + * across replication commands, indeed SnapBuildClearExportedSnapshot + * might have just ended one. Because transaction exit will revert to the + * memory context that was current at transaction start, we need to be + * sure that that context is still valid. That motivates re-using the + * same cmd_context rather than making a new one each time. */ - cmd_context = AllocSetContextCreate(CurrentMemoryContext, - "Replication command context", - ALLOCSET_DEFAULT_SIZES); - old_context = MemoryContextSwitchTo(cmd_context); + if (cmd_context == NULL) + cmd_context = AllocSetContextCreate(TopMemoryContext, + "Replication command context", + ALLOCSET_DEFAULT_SIZES); + else + MemoryContextReset(cmd_context); + + MemoryContextSwitchTo(cmd_context); replication_scanner_init(cmd_string, &scanner); @@ -2020,7 +2041,7 @@ exec_replication_command(const char *cmd_string) replication_scanner_finish(scanner); MemoryContextSwitchTo(old_context); - MemoryContextDelete(cmd_context); + MemoryContextReset(cmd_context); /* XXX this is a pretty random place to make this check */ if (MyDatabaseId == InvalidOid) @@ -2180,9 +2201,12 @@ exec_replication_command(const char *cmd_string) cmd_node->type); } - /* done */ + /* + * Done. Revert to caller's memory context, and clean out the cmd_context + * to recover memory right away. + */ MemoryContextSwitchTo(old_context); - MemoryContextDelete(cmd_context); + MemoryContextReset(cmd_context); /* * We need not update ps display or pg_stat_activity, because PostgresMain
I wrote: > ... An obvious candidate is subscription/t/100_bugs.pl, but > the test failed there for lack of access to the test_decoding plugin. > Perhaps we could use another plugin, but I didn't try hard. Ah, I realized we could just use pgoutput, since the test doesn't actually do anything that would exercise the plugin. So here's a v4 with the test restored. regards, tom lane From cd8ff75cf45cb9515e9bf3ad1696f80901b2035f Mon Sep 17 00:00:00 2001 From: Tom Lane <tgl@sss.pgh.pa.us> Date: Sun, 20 Apr 2025 15:24:14 -0400 Subject: [PATCH v4] Use the same cmd_context throughout a walsender's lifetime. exec_replication_command created a cmd_context to work in and then deleted it on exit. This is pretty dangerous because some replication commands start/finish transactions. In the wake of commit 1afe31f03, that could lead to restoring a CurrentMemoryContext that's already been deleted (and maybe even reassigned, leading to hilarity such as a memory context that is its own parent). To fix, let's make the cmd_context persist across exec_replication_command calls; instead of deleting it, we'll just reset it each time. In this way it retains the same identity and there's no problem if transaction abort restores it as the working context. It probably even saves a few microseconds to do this. This fix also ensures that exec_replication_command returns to the caller (PostgresMain) with the same context active that had been when it was called (probably MessageContext). The previous coding could get that wrong too. Reported-by: Anthonin Bonnefoy <anthonin.bonnefoy@datadoghq.com> Author: Anthonin Bonnefoy <anthonin.bonnefoy@datadoghq.com> Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us> Discussion: https://postgr.es/m/CAO6_XqoJA7-_G6t7Uqe5nWF3nj+QBGn4F6Ptp=rUGDr0zo+KvA@mail.gmail.com --- src/backend/replication/walsender.c | 42 ++++++++++++++++++++++------- src/test/subscription/t/100_bugs.pl | 18 +++++++++++++ 2 files changed, 51 insertions(+), 9 deletions(-) diff --git a/src/backend/replication/walsender.c b/src/backend/replication/walsender.c index 987a7336ec8..9fa8beb6103 100644 --- a/src/backend/replication/walsender.c +++ b/src/backend/replication/walsender.c @@ -1973,8 +1973,10 @@ exec_replication_command(const char *cmd_string) int parse_rc; Node *cmd_node; const char *cmdtag; - MemoryContext cmd_context; - MemoryContext old_context; + MemoryContext old_context = CurrentMemoryContext; + + /* We save and re-use the cmd_context across calls */ + static MemoryContext cmd_context = NULL; /* * If WAL sender has been told that shutdown is getting close, switch its @@ -2003,11 +2005,30 @@ exec_replication_command(const char *cmd_string) /* * Prepare to parse and execute the command. + * + * Because replication command execution can involve beginning or ending + * transactions, we need a working context that will survive that, so we + * make it a child of TopMemoryContext. That in turn creates a hazard of + * long-lived memory leaks if we lose track of the working context. We + * deal with that by creating it only once per walsender, and resetting it + * for each new command. (Normally this reset is a no-op, but if the + * prior exec_replication_command call failed with an error, it won't be.) + * + * This is subtler than it looks. The transactions we manage can extend + * across replication commands, indeed SnapBuildClearExportedSnapshot + * might have just ended one. Because transaction exit will revert to the + * memory context that was current at transaction start, we need to be + * sure that that context is still valid. That motivates re-using the + * same cmd_context rather than making a new one each time. */ - cmd_context = AllocSetContextCreate(CurrentMemoryContext, - "Replication command context", - ALLOCSET_DEFAULT_SIZES); - old_context = MemoryContextSwitchTo(cmd_context); + if (cmd_context == NULL) + cmd_context = AllocSetContextCreate(TopMemoryContext, + "Replication command context", + ALLOCSET_DEFAULT_SIZES); + else + MemoryContextReset(cmd_context); + + MemoryContextSwitchTo(cmd_context); replication_scanner_init(cmd_string, &scanner); @@ -2020,7 +2041,7 @@ exec_replication_command(const char *cmd_string) replication_scanner_finish(scanner); MemoryContextSwitchTo(old_context); - MemoryContextDelete(cmd_context); + MemoryContextReset(cmd_context); /* XXX this is a pretty random place to make this check */ if (MyDatabaseId == InvalidOid) @@ -2180,9 +2201,12 @@ exec_replication_command(const char *cmd_string) cmd_node->type); } - /* done */ + /* + * Done. Revert to caller's memory context, and clean out the cmd_context + * to recover memory right away. + */ MemoryContextSwitchTo(old_context); - MemoryContextDelete(cmd_context); + MemoryContextReset(cmd_context); /* * We need not update ps display or pg_stat_activity, because PostgresMain diff --git a/src/test/subscription/t/100_bugs.pl b/src/test/subscription/t/100_bugs.pl index 83120f1cb6f..9c9701d58a8 100644 --- a/src/test/subscription/t/100_bugs.pl +++ b/src/test/subscription/t/100_bugs.pl @@ -477,6 +477,24 @@ $result = is( $result, qq(2|f 3|t), 'check replicated update on subscriber'); +# Test create and immediate drop of replication slot via replication commands +# (this exposed a memory-management bug in v18) +my $publisher_host = $node_publisher->host; +my $publisher_port = $node_publisher->port; +my $connstr_db = + "host=$publisher_host port=$publisher_port replication=database dbname=postgres"; + +is( $node_publisher->psql( + 'postgres', + qq[ + CREATE_REPLICATION_SLOT "test_slot" LOGICAL pgoutput ( SNAPSHOT "export"); + DROP_REPLICATION_SLOT "test_slot"; + ], + timeout => $PostgreSQL::Test::Utils::timeout_default, + extra_params => [ '-d', $connstr_db ]), + 0, + 'create and immediate drop of replication slot'); + $node_publisher->stop('fast'); $node_subscriber->stop('fast'); -- 2.43.5
Hi,
So here's a v4 with the test restored.
I tested this patch, it fixes the issue reported. It passes Github CI tests.
already does that, so the only new assert would be in
MemoryContextCreate.
+1 for adding the assertion to increase the chances of this bug being
caught by memory context infrastructure.
I had the following comment.
Why do we do this:
I had the following comment.
Why do we do this:
- MemoryContext old_context;
+ MemoryContext old_context = CurrentMemoryContext;
Instead of implementing it as done in the previous version of this code, i.e.
old_context = MemoryContextSwitchTo(cmd_context);
Thank you,
+ MemoryContext old_context = CurrentMemoryContext;
Instead of implementing it as done in the previous version of this code, i.e.
old_context = MemoryContextSwitchTo(cmd_context);
Thank you,
Rahila Syed