Re: Memory context can be its own parent and child in replication command - Mailing list pgsql-hackers
From | Tom Lane |
---|---|
Subject | Re: Memory context can be its own parent and child in replication command |
Date | |
Msg-id | 382092.1745173107@sss.pgh.pa.us Whole thread Raw |
In response to | Re: Memory context can be its own parent and child in replication command (Anthonin Bonnefoy <anthonin.bonnefoy@datadoghq.com>) |
Responses |
Re: Memory context can be its own parent and child in replication command
|
List | pgsql-hackers |
Anthonin Bonnefoy <anthonin.bonnefoy@datadoghq.com> writes: > I've updated the patch with another approach: I got back to looking at this today. I still don't like very much about it. After thinking for awhile, I concur that we want to create the cmd_context under TopMemoryContext, but I think we can make things a great deal simpler and less fragile by just keeping it in existence indefinitely, as attached. The fundamental problem here is that exec_replication_command is creating and deleting the context without regard to where transaction boundaries are. I don't think the best way to deal with that is to require it to know where those boundaries are. I also don't like the proposed test script. The test case alone would be fine, but spinning up an entire new instance seems a rather huge overhead to carry forevermore for a single test that likely will never find anything again. I tried to cram the test case into one of the existing scripts, but it was hard to find one where it would fit naturally. An obvious candidate is subscription/t/100_bugs.pl, but the test failed there for lack of access to the test_decoding plugin. Perhaps we could use another plugin, but I didn't try hard. As for the memory-context-bug-catching changes, I'm not very convinced by the idea of setting context->type = T_Invalid. That would help if someone tries to reference the context while it's still in the freelist, but not if the reference is from re-use of a stale pointer after the context has been reassigned (which IIUC is the case here). I also think that decorating a few MemoryContextSwitchTos with assertions looks pretty random. I wonder if we should content ourselves with adding some assertions that catch attempts to make a context be its own parent. It looks like MemoryContextSetParent already does that, so the only new assert would be in MemoryContextCreate. regards, tom lane diff --git a/src/backend/replication/walsender.c b/src/backend/replication/walsender.c index 987a7336ec8..9fa8beb6103 100644 --- a/src/backend/replication/walsender.c +++ b/src/backend/replication/walsender.c @@ -1973,8 +1973,10 @@ exec_replication_command(const char *cmd_string) int parse_rc; Node *cmd_node; const char *cmdtag; - MemoryContext cmd_context; - MemoryContext old_context; + MemoryContext old_context = CurrentMemoryContext; + + /* We save and re-use the cmd_context across calls */ + static MemoryContext cmd_context = NULL; /* * If WAL sender has been told that shutdown is getting close, switch its @@ -2003,11 +2005,30 @@ exec_replication_command(const char *cmd_string) /* * Prepare to parse and execute the command. + * + * Because replication command execution can involve beginning or ending + * transactions, we need a working context that will survive that, so we + * make it a child of TopMemoryContext. That in turn creates a hazard of + * long-lived memory leaks if we lose track of the working context. We + * deal with that by creating it only once per walsender, and resetting it + * for each new command. (Normally this reset is a no-op, but if the + * prior exec_replication_command call failed with an error, it won't be.) + * + * This is subtler than it looks. The transactions we manage can extend + * across replication commands, indeed SnapBuildClearExportedSnapshot + * might have just ended one. Because transaction exit will revert to the + * memory context that was current at transaction start, we need to be + * sure that that context is still valid. That motivates re-using the + * same cmd_context rather than making a new one each time. */ - cmd_context = AllocSetContextCreate(CurrentMemoryContext, - "Replication command context", - ALLOCSET_DEFAULT_SIZES); - old_context = MemoryContextSwitchTo(cmd_context); + if (cmd_context == NULL) + cmd_context = AllocSetContextCreate(TopMemoryContext, + "Replication command context", + ALLOCSET_DEFAULT_SIZES); + else + MemoryContextReset(cmd_context); + + MemoryContextSwitchTo(cmd_context); replication_scanner_init(cmd_string, &scanner); @@ -2020,7 +2041,7 @@ exec_replication_command(const char *cmd_string) replication_scanner_finish(scanner); MemoryContextSwitchTo(old_context); - MemoryContextDelete(cmd_context); + MemoryContextReset(cmd_context); /* XXX this is a pretty random place to make this check */ if (MyDatabaseId == InvalidOid) @@ -2180,9 +2201,12 @@ exec_replication_command(const char *cmd_string) cmd_node->type); } - /* done */ + /* + * Done. Revert to caller's memory context, and clean out the cmd_context + * to recover memory right away. + */ MemoryContextSwitchTo(old_context); - MemoryContextDelete(cmd_context); + MemoryContextReset(cmd_context); /* * We need not update ps display or pg_stat_activity, because PostgresMain
pgsql-hackers by date: