Thread: Re: Memory leak in WAL sender with pgoutput (v10~)

Re: Memory leak in WAL sender with pgoutput (v10~)

From
Alvaro Herrera
Date:
On 2024-Dec-03, Michael Paquier wrote:

> So how about the attached that introduces a FreePublication() matching
> with GetPublication(), used to do the cleanup?  Feel free to comment.

I think this doubles down on bad design in the logical replication code,
or at least it goes against what we do almost everywhere else in backend
code.  We should do less freeing, more context deleting/resetting.
(Storing stuff in CacheMemoryContext was surely a mistake.)

If you don't like the idea of a static memcxt in the one block where
it's needed, I propose to store a new memcxt in PGOutputData, to be used
exclusively for publications, with a well defined lifetime.  I'm against
reusing data->cachecxt, because the lifetime of that is 100% unclear.

-- 
Álvaro Herrera         PostgreSQL Developer  —  https://www.EnterpriseDB.com/



Re: Memory leak in WAL sender with pgoutput (v10~)

From
Amit Kapila
Date:
On Tue, Dec 3, 2024 at 4:03 PM Alvaro Herrera <alvherre@alvh.no-ip.org> wrote:
>
> If you don't like the idea of a static memcxt in the one block where
> it's needed, I propose to store a new memcxt in PGOutputData, to be used
> exclusively for publications, with a well defined lifetime.
>

+1. This sounds like a way to proceed at least for HEAD. For
back-branches, it is less clear whether changing PGOutputData is a
good idea. Can such a change in back branches break any existing
non-core code (extensions)?

--
With Regards,
Amit Kapila.



Re: Memory leak in WAL sender with pgoutput (v10~)

From
Alvaro Herrera
Date:
On 2024-Dec-03, Amit Kapila wrote:

> On Tue, Dec 3, 2024 at 4:03 PM Alvaro Herrera <alvherre@alvh.no-ip.org> wrote:
> >
> > If you don't like the idea of a static memcxt in the one block where
> > it's needed, I propose to store a new memcxt in PGOutputData, to be used
> > exclusively for publications, with a well defined lifetime.
> 
> +1. This sounds like a way to proceed at least for HEAD. For
> back-branches, it is less clear whether changing PGOutputData is a
> good idea. Can such a change in back branches break any existing
> non-core code (extensions)?

We can put the new member at the end of the struct, it shouldn't damage
anything even if they're using this struct -- which I find pretty
unlikely.  The only way that could break anything is if somebody is
allocating/using arrays of it, which sounds even more unlikely.

If we don't want to accept that risk (for which I see no argument, but
happy to be proven wrong), I would suggest to use the foreach-pfree
pattern Michael first proposed for the backbranches, and the new memory
context in master.  I think this is conducive to better coding overall
as we clean things up in this area.

-- 
Álvaro Herrera        Breisgau, Deutschland  —  https://www.EnterpriseDB.com/



Re: Memory leak in WAL sender with pgoutput (v10~)

From
"Euler Taveira"
Date:
On Tue, Dec 3, 2024, at 7:41 PM, Michael Paquier wrote:
On Tue, Dec 03, 2024 at 02:45:22PM +0100, Alvaro Herrera wrote:
> If we don't want to accept that risk (for which I see no argument, but
> happy to be proven wrong), I would suggest to use the foreach-pfree
> pattern Michael first proposed for the backbranches, and the new memory
> context in master.  I think this is conducive to better coding overall
> as we clean things up in this area.

Is it really worth betting on nobody doing something that does a
sizeof(PGOutputData) for the stable branches?  People like doing fancy
things, and we would not hear about such problems except if we push
the button making it a possibility because compiled code suddenly
breaks after a minor release update of the core engine.

Although, Debian code search [1] says this data structure is not used outside
PostgreSQL, I wouldn't risk breaking third-party extensions during a minor
upgrade (even if it is known that such data structure is from that particular
output plugin -- pgoutput -- and other output plugins generally have its own
data structure). +1 from Alvaro's proposal.


--
Euler Taveira

RE: Memory leak in WAL sender with pgoutput (v10~)

From
"Zhijie Hou (Fujitsu)"
Date:
On Wednesday, December 4, 2024 8:55 AM Michael Paquier <michael@paquier.xyz> wrote:
>
> On Tue, Dec 03, 2024 at 09:46:06PM -0300, Euler Taveira wrote:
> > Although, Debian code search [1] says this data structure is not used
> outside
> > PostgreSQL, I wouldn't risk breaking third-party extensions during a minor
> > upgrade (even if it is known that such data structure is from that particular
> > output plugin -- pgoutput -- and other output plugins generally have its own
> > data structure). +1 from Alvaro's proposal.
>
> A lookup of the public repos of github did not show fancy with the
> manipulation of the structure for peoject related to Postgres, either.
>
> FWIW, I'm OK with the memory context reset solution as much as the
> direct free calls as we are sure that they will be safe.  And at the
> end of the day, the problem would be solved with any of these
> solutions.  My votes would be +0.6 for the free and +0.5 for the mcxt
> manipulation, so let's say that they are tied.
>
> As Alvaro and yourself are in favor of the mcxt approach, then let's
> go for it.

+1

> Amit has concerns with other code paths that could be
> similarly leaking.  I'm not sure if this is worth waiting too long
> based on how local the fix for the existing leak is with any of these
> solutions.

It appears there is an additional memory leak caused by allocating publication
names within the CacheMemoryContext, as noted in [1]. And it can also be fixed by
creating a separate memctx for publication names under the logical decoding
context. I think the approach makes sense since the lifespan of publication
names should ideally align with that of the logical decoding context.

[1]
https://www.postgresql.org/message-id/OS0PR01MB57166A4DA0ABBB94F2FBB28694362%40OS0PR01MB5716.jpnprd01.prod.outlook.com

Best Regards,
Hou zj



Re: Memory leak in WAL sender with pgoutput (v10~)

From
Amit Kapila
Date:
On Wed, Dec 4, 2024 at 7:39 AM Zhijie Hou (Fujitsu)
<houzj.fnst@fujitsu.com> wrote:
>
> On Wednesday, December 4, 2024 8:55 AM Michael Paquier <michael@paquier.xyz> wrote:
> >
>
> > Amit has concerns with other code paths that could be
> > similarly leaking.  I'm not sure if this is worth waiting too long
> > based on how local the fix for the existing leak is with any of these
> > solutions.
>
> It appears there is an additional memory leak caused by allocating publication
> names within the CacheMemoryContext, as noted in [1]. And it can also be fixed by
> creating a separate memctx for publication names under the logical decoding
> context. I think the approach makes sense since the lifespan of publication
> names should ideally align with that of the logical decoding context.
>

Yeah, I don't think we can go with the proposed patch for the local
memory context as it is.

--
With Regards,
Amit Kapila.



RE: Memory leak in WAL sender with pgoutput (v10~)

From
"Zhijie Hou (Fujitsu)"
Date:
On Wednesday, December 4, 2024 2:22 PM Michael Paquier <michael@paquier.xyz> wrote:
> 
> On Wed, Dec 04, 2024 at 11:05:43AM +0530, Amit Kapila wrote:
> > On Wed, Dec 4, 2024 at 7:39 AM Zhijie Hou (Fujitsu)
> <houzj.fnst@fujitsu.com> wrote:
> >> It appears there is an additional memory leak caused by allocating
> >> publication names within the CacheMemoryContext, as noted in [1]. And
> >> it can also be fixed by creating a separate memctx for publication
> >> names under the logical decoding context. I think the approach makes
> >> sense since the lifespan of publication names should ideally align with that
> of the logical decoding context.
> >
> > Yeah, I don't think we can go with the proposed patch for the local
> > memory context as it is.
> 
> Ah, indeed.  I was missing your point.  Would any of you like to write a patch
> to achieve that?

I can try to write a patch if no one else is working on this.

Best Regards,
Hou zj

Re: Memory leak in WAL sender with pgoutput (v10~)

From
Masahiko Sawada
Date:
On Thu, Dec 5, 2024 at 1:32 PM Zhijie Hou (Fujitsu)
<houzj.fnst@fujitsu.com> wrote:
>
> On Wednesday, December 4, 2024 7:39 PM Michael Paquier <michael@paquier.xyz> wrote:
> >
> > On Wed, Dec 04, 2024 at 06:42:55AM +0000, Zhijie Hou (Fujitsu) wrote:
> > > I can try to write a patch if no one else is working on this.
> >
> > If you have some room to write a patch, that would be really nice.
> > Thanks.
>
> No problem. Here is the patch for the HEAD. This patch introduces a new memory
> context within PGOutputData, specifically for allocating memory for
> publication_names. The new memory context is nested under the logical decoding
> context, ensuring it is freed at the end of decoding through
> FreeDecodingContext.

+1 for using new memory context to fix the issue for HEAD.

>
> I realized that this patch cannot be backpatched because it introduces a new
> field into the public PGOutputData structure. Therefore, I think we may need to
> use Alvaro's version [1] for the back branches.

FWIW for back branches, I prefer using the foreach-pfree pattern
Michael first proposed, just in case. It's not elegant but it can
solve the problem while there is no risk of breaking non-core
extensions. I think we can live with such (a bit of) ugliness on back
branches.

Regards,

--
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com



Re: Memory leak in WAL sender with pgoutput (v10~)

From
"Euler Taveira"
Date:
On Thu, Dec 5, 2024, at 1:31 AM, Zhijie Hou (Fujitsu) wrote:
No problem. Here is the patch for the HEAD. This patch introduces a new memory
context within PGOutputData, specifically for allocating memory for
publication_names. The new memory context is nested under the logical decoding
context, ensuring it is freed at the end of decoding through
FreeDecodingContext.

Thanks for taking care of it. I suggest 2 small adjustments: (a) use
ALLOCSET_SMALL_SIZES instead of ALLOCSET_DEFAULT_SIZES and (b) replace
pubmemcxt with pubmemctx (that's the same abbreviation used by
cachectx). I think you could remove 'mem' from this variable. My
suggestions are pubcxt or pubnamescxt. Although, I prefer the former, if
other publication elements are added to this context in the future.


--
Euler Taveira

RE: Memory leak in WAL sender with pgoutput (v10~)

From
"Zhijie Hou (Fujitsu)"
Date:
On Thursday, December 5, 2024 12:52 PM Michael Paquier <michael@paquier.xyz> wrote:

Hi,

>
> On Thu, Dec 05, 2024 at 04:31:56AM +0000, Zhijie Hou (Fujitsu) wrote:
> > I realized that this patch cannot be backpatched because it introduces
> > a new field into the public PGOutputData structure. Therefore, I think
> > we may need to use Alvaro's version [1] for the back branches.
> >
>
> Thanks for the patch.
>
> For HEAD it should be as good as it can be as it avoids the problem of
> CacheMemoryContext bloating for your case and my case.  Alvaro's patch
> would not take care of your case, unfortunately, but I'm less worried about this
> case in the back branches and we don't track the parent context where
> StartupDecodingContext() has begun its work when building PGOutputData.
> Thoughts?

I am fine with the plan. Thanks.

Best Regards,
Hou zj




Re: Memory leak in WAL sender with pgoutput (v10~)

From
Amit Kapila
Date:
On Thu, Dec 5, 2024 at 2:56 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
>
> >
> > I realized that this patch cannot be backpatched because it introduces a new
> > field into the public PGOutputData structure. Therefore, I think we may need to
> > use Alvaro's version [1] for the back branches.
>
> FWIW for back branches, I prefer using the foreach-pfree pattern
> Michael first proposed, just in case. It's not elegant but it can
> solve the problem while there is no risk of breaking non-core
> extensions.
>

It couldn't solve the problem completely even in back-branches. The
SQL API case I mentioned and tested by Hou-San in the email [1] won't
be solved.

[1] -
https://www.postgresql.org/message-id/OS0PR01MB57166A4DA0ABBB94F2FBB28694362%40OS0PR01MB5716.jpnprd01.prod.outlook.com

--
With Regards,
Amit Kapila.



Re: Memory leak in WAL sender with pgoutput (v10~)

From
Masahiko Sawada
Date:
On Mon, Dec 9, 2024 at 2:06 AM Amit Kapila <amit.kapila16@gmail.com> wrote:
>
> On Thu, Dec 5, 2024 at 2:56 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
> >
> > >
> > > I realized that this patch cannot be backpatched because it introduces a new
> > > field into the public PGOutputData structure. Therefore, I think we may need to
> > > use Alvaro's version [1] for the back branches.
> >
> > FWIW for back branches, I prefer using the foreach-pfree pattern
> > Michael first proposed, just in case. It's not elegant but it can
> > solve the problem while there is no risk of breaking non-core
> > extensions.
> >
>
> It couldn't solve the problem completely even in back-branches. The
> SQL API case I mentioned and tested by Hou-San in the email [1] won't
> be solved.

True. There seems another place where we possibly leak memory on
CacheMemoryContext when using pgoutput via SQL APIs:

        /* Map must live as long as the session does. */
        oldctx = MemoryContextSwitchTo(CacheMemoryContext);

        entry->attrmap = build_attrmap_by_name_if_req(indesc, outdesc, false);

        MemoryContextSwitchTo(oldctx);
        RelationClose(ancestor);

entry->attrmap is pfree'd only when validating the RelationSyncEntry
so remains even after logical decoding API calls.


Regards,

--
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com



Re: Memory leak in WAL sender with pgoutput (v10~)

From
Amit Kapila
Date:
On Tue, Dec 10, 2024 at 2:17 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
>
> On Mon, Dec 9, 2024 at 2:06 AM Amit Kapila <amit.kapila16@gmail.com> wrote:
> >
> > On Thu, Dec 5, 2024 at 2:56 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
> > >
> > > >
> > > > I realized that this patch cannot be backpatched because it introduces a new
> > > > field into the public PGOutputData structure. Therefore, I think we may need to
> > > > use Alvaro's version [1] for the back branches.
> > >
> > > FWIW for back branches, I prefer using the foreach-pfree pattern
> > > Michael first proposed, just in case. It's not elegant but it can
> > > solve the problem while there is no risk of breaking non-core
> > > extensions.
> > >
> >
> > It couldn't solve the problem completely even in back-branches. The
> > SQL API case I mentioned and tested by Hou-San in the email [1] won't
> > be solved.
>
> True. There seems another place where we possibly leak memory on
> CacheMemoryContext when using pgoutput via SQL APIs:
>
>         /* Map must live as long as the session does. */
>         oldctx = MemoryContextSwitchTo(CacheMemoryContext);
>
>         entry->attrmap = build_attrmap_by_name_if_req(indesc, outdesc, false);
>
>         MemoryContextSwitchTo(oldctx);
>         RelationClose(ancestor);
>
> entry->attrmap is pfree'd only when validating the RelationSyncEntry
> so remains even after logical decoding API calls.
>

We have also noticed this but it needs more analysis on the fix which
one of my colleagues is doing. I think we can fix this as a separate
issue unless you think otherwise.

--
With Regards,
Amit Kapila.



Re: Memory leak in WAL sender with pgoutput (v10~)

From
Amit Kapila
Date:
On Tue, Dec 10, 2024 at 8:54 AM vignesh C <vignesh21@gmail.com> wrote:
>
> On Tue, 10 Dec 2024 at 04:56, Michael Paquier <michael@paquier.xyz> wrote:
> >
> > On Mon, Dec 09, 2024 at 03:36:15PM +0530, Amit Kapila wrote:
> > > It couldn't solve the problem completely even in back-branches. The
> > > SQL API case I mentioned and tested by Hou-San in the email [1] won't
> > > be solved.
> > >
> > > [1] -
https://www.postgresql.org/message-id/OS0PR01MB57166A4DA0ABBB94F2FBB28694362%40OS0PR01MB5716.jpnprd01.prod.outlook.com
> >
> > Yeah, exactly (wanted to reply exactly that yesterday but lacked time,
> > thanks!).
>
> Yes, that makes sense. How about something like the attached patch.
>

- oldctx = MemoryContextSwitchTo(CacheMemoryContext);
- if (data->publications)
- {
- list_free_deep(data->publications);
- data->publications = NIL;
- }
+ static MemoryContext pubctx = NULL;
+
+ if (pubctx == NULL)
+ pubctx = AllocSetContextCreate(CacheMemoryContext,
+    "logical replication publication list context",
+    ALLOCSET_SMALL_SIZES);
+ else
+ MemoryContextReset(pubctx);
+
+ oldctx = MemoryContextSwitchTo(pubctx);

Considering the SQL API case, why is it okay to allocate this context
under CacheMemoryContext?

--
With Regards,
Amit Kapila.



Re: Memory leak in WAL sender with pgoutput (v10~)

From
Amit Kapila
Date:
On Tue, Dec 10, 2024 at 11:24 AM Amit Kapila <amit.kapila16@gmail.com> wrote:
>
> On Tue, Dec 10, 2024 at 8:54 AM vignesh C <vignesh21@gmail.com> wrote:
> >
> > On Tue, 10 Dec 2024 at 04:56, Michael Paquier <michael@paquier.xyz> wrote:
> > >
> > > On Mon, Dec 09, 2024 at 03:36:15PM +0530, Amit Kapila wrote:
> > > > It couldn't solve the problem completely even in back-branches. The
> > > > SQL API case I mentioned and tested by Hou-San in the email [1] won't
> > > > be solved.
> > > >
> > > > [1] -
https://www.postgresql.org/message-id/OS0PR01MB57166A4DA0ABBB94F2FBB28694362%40OS0PR01MB5716.jpnprd01.prod.outlook.com
> > >
> > > Yeah, exactly (wanted to reply exactly that yesterday but lacked time,
> > > thanks!).
> >
> > Yes, that makes sense. How about something like the attached patch.
> >
>
> - oldctx = MemoryContextSwitchTo(CacheMemoryContext);
> - if (data->publications)
> - {
> - list_free_deep(data->publications);
> - data->publications = NIL;
> - }
> + static MemoryContext pubctx = NULL;
> +
> + if (pubctx == NULL)
> + pubctx = AllocSetContextCreate(CacheMemoryContext,
> +    "logical replication publication list context",
> +    ALLOCSET_SMALL_SIZES);
> + else
> + MemoryContextReset(pubctx);
> +
> + oldctx = MemoryContextSwitchTo(pubctx);
>
> Considering the SQL API case, why is it okay to allocate this context
> under CacheMemoryContext?
>

On further thinking, we can't allocate it under
LogicalDecodingContext->context because once that is freed at the end
of SQL API pg_logical_slot_get_changes(), pubctx will be pointing to a
dangling memory. One idea is that we use
MemoryContextRegisterResetCallback() to invoke a reset callback
function where we can reset pubctx but not sure if we want to go there
in back branches. OTOH, the currently proposed fix won't leak memory
on repeated calls to pg_logical_slot_get_changes(), so that might be
okay as well.

Thoughts?

--
With Regards,
Amit Kapila.



Re: Memory leak in WAL sender with pgoutput (v10~)

From
Masahiko Sawada
Date:
On Mon, Dec 9, 2024 at 6:52 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
>
> On Tue, Dec 10, 2024 at 2:17 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
> >
> > On Mon, Dec 9, 2024 at 2:06 AM Amit Kapila <amit.kapila16@gmail.com> wrote:
> > >
> > > On Thu, Dec 5, 2024 at 2:56 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
> > > >
> > > > >
> > > > > I realized that this patch cannot be backpatched because it introduces a new
> > > > > field into the public PGOutputData structure. Therefore, I think we may need to
> > > > > use Alvaro's version [1] for the back branches.
> > > >
> > > > FWIW for back branches, I prefer using the foreach-pfree pattern
> > > > Michael first proposed, just in case. It's not elegant but it can
> > > > solve the problem while there is no risk of breaking non-core
> > > > extensions.
> > > >
> > >
> > > It couldn't solve the problem completely even in back-branches. The
> > > SQL API case I mentioned and tested by Hou-San in the email [1] won't
> > > be solved.
> >
> > True. There seems another place where we possibly leak memory on
> > CacheMemoryContext when using pgoutput via SQL APIs:
> >
> >         /* Map must live as long as the session does. */
> >         oldctx = MemoryContextSwitchTo(CacheMemoryContext);
> >
> >         entry->attrmap = build_attrmap_by_name_if_req(indesc, outdesc, false);
> >
> >         MemoryContextSwitchTo(oldctx);
> >         RelationClose(ancestor);
> >
> > entry->attrmap is pfree'd only when validating the RelationSyncEntry
> > so remains even after logical decoding API calls.
> >
>
> We have also noticed this but it needs more analysis on the fix which
> one of my colleagues is doing. I think we can fix this as a separate
> issue unless you think otherwise.

I agree to fix this as a separate patch.

Regards,

--
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com



Re: Memory leak in WAL sender with pgoutput (v10~)

From
Masahiko Sawada
Date:
On Tue, Dec 10, 2024 at 1:16 AM Amit Kapila <amit.kapila16@gmail.com> wrote:
>
> On Tue, Dec 10, 2024 at 11:24 AM Amit Kapila <amit.kapila16@gmail.com> wrote:
> >
> > On Tue, Dec 10, 2024 at 8:54 AM vignesh C <vignesh21@gmail.com> wrote:
> > >
> > > On Tue, 10 Dec 2024 at 04:56, Michael Paquier <michael@paquier.xyz> wrote:
> > > >
> > > > On Mon, Dec 09, 2024 at 03:36:15PM +0530, Amit Kapila wrote:
> > > > > It couldn't solve the problem completely even in back-branches. The
> > > > > SQL API case I mentioned and tested by Hou-San in the email [1] won't
> > > > > be solved.
> > > > >
> > > > > [1] -
https://www.postgresql.org/message-id/OS0PR01MB57166A4DA0ABBB94F2FBB28694362%40OS0PR01MB5716.jpnprd01.prod.outlook.com
> > > >
> > > > Yeah, exactly (wanted to reply exactly that yesterday but lacked time,
> > > > thanks!).
> > >
> > > Yes, that makes sense. How about something like the attached patch.
> > >
> >
> > - oldctx = MemoryContextSwitchTo(CacheMemoryContext);
> > - if (data->publications)
> > - {
> > - list_free_deep(data->publications);
> > - data->publications = NIL;
> > - }
> > + static MemoryContext pubctx = NULL;
> > +
> > + if (pubctx == NULL)
> > + pubctx = AllocSetContextCreate(CacheMemoryContext,
> > +    "logical replication publication list context",
> > +    ALLOCSET_SMALL_SIZES);
> > + else
> > + MemoryContextReset(pubctx);
> > +
> > + oldctx = MemoryContextSwitchTo(pubctx);
> >
> > Considering the SQL API case, why is it okay to allocate this context
> > under CacheMemoryContext?
> >
>
> On further thinking, we can't allocate it under
> LogicalDecodingContext->context because once that is freed at the end
> of SQL API pg_logical_slot_get_changes(), pubctx will be pointing to a
> dangling memory. One idea is that we use
> MemoryContextRegisterResetCallback() to invoke a reset callback
> function where we can reset pubctx but not sure if we want to go there
> in back branches. OTOH, the currently proposed fix won't leak memory
> on repeated calls to pg_logical_slot_get_changes(), so that might be
> okay as well.
>
> Thoughts?

Alternative idea is to declare pubctx as a file static variable. And
we create the memory context under LogicalDecodingContext->context in
the startup callback and free it in the shutdown callback.

Regards,

--
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com



RE: Memory leak in WAL sender with pgoutput (v10~)

From
"Zhijie Hou (Fujitsu)"
Date:
On Wednesday, December 11, 2024 2:14 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
> 
> On Tue, Dec 10, 2024 at 1:16 AM Amit Kapila <amit.kapila16@gmail.com>
> wrote:
> >
> > On Tue, Dec 10, 2024 at 11:24 AM Amit Kapila <amit.kapila16@gmail.com>
> wrote:
> > >
> > > On Tue, Dec 10, 2024 at 8:54 AM vignesh C <vignesh21@gmail.com>
> wrote:
> > > >
> > > > On Tue, 10 Dec 2024 at 04:56, Michael Paquier <michael@paquier.xyz>
> wrote:
> > > > >
> > > > > On Mon, Dec 09, 2024 at 03:36:15PM +0530, Amit Kapila wrote:
> > > > > > It couldn't solve the problem completely even in back-branches. The
> > > > > > SQL API case I mentioned and tested by Hou-San in the email [1]
> won't
> > > > > > be solved.
> > > > > >
> > > > > > [1] -
> https://www.postgresql.org/message-id/OS0PR01MB57166A4DA0ABBB94F
> 2FBB28694362%40OS0PR01MB5716.jpnprd01.prod.outlook.com
> > > > >
> > > > > Yeah, exactly (wanted to reply exactly that yesterday but lacked time,
> > > > > thanks!).
> > > >
> > > > Yes, that makes sense. How about something like the attached patch.
> > > >
> > >
> > > - oldctx = MemoryContextSwitchTo(CacheMemoryContext);
> > > - if (data->publications)
> > > - {
> > > - list_free_deep(data->publications);
> > > - data->publications = NIL;
> > > - }
> > > + static MemoryContext pubctx = NULL;
> > > +
> > > + if (pubctx == NULL)
> > > + pubctx = AllocSetContextCreate(CacheMemoryContext,
> > > +    "logical replication publication list context",
> > > +    ALLOCSET_SMALL_SIZES);
> > > + else
> > > + MemoryContextReset(pubctx);
> > > +
> > > + oldctx = MemoryContextSwitchTo(pubctx);
> > >
> > > Considering the SQL API case, why is it okay to allocate this context
> > > under CacheMemoryContext?
> > >
> >
> > On further thinking, we can't allocate it under
> > LogicalDecodingContext->context because once that is freed at the end
> > of SQL API pg_logical_slot_get_changes(), pubctx will be pointing to a
> > dangling memory. One idea is that we use
> > MemoryContextRegisterResetCallback() to invoke a reset callback
> > function where we can reset pubctx but not sure if we want to go there
> > in back branches. OTOH, the currently proposed fix won't leak memory
> > on repeated calls to pg_logical_slot_get_changes(), so that might be
> > okay as well.
> >
> > Thoughts?
> 
> Alternative idea is to declare pubctx as a file static variable. And
> we create the memory context under LogicalDecodingContext->context in
> the startup callback and free it in the shutdown callback.

I think when an ERROR occurs during the execution of the pg_logical_slot_xx()
API, the shutdown callback function is not invoked. This would result in the
static variable not being reset, which, I think, is why Amit mentioned the use
of MemoryContextRegisterResetCallback().

Best Regards,
Hou zj

Re: Memory leak in WAL sender with pgoutput (v10~)

From
vignesh C
Date:
On Tue, 10 Dec 2024 at 23:36, Masahiko Sawada <sawada.mshk@gmail.com> wrote:
>
> On Mon, Dec 9, 2024 at 6:52 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
> >
> > On Tue, Dec 10, 2024 at 2:17 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
> > >
> > > On Mon, Dec 9, 2024 at 2:06 AM Amit Kapila <amit.kapila16@gmail.com> wrote:
> > > >
> > > > On Thu, Dec 5, 2024 at 2:56 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
> > > > >
> > > > > >
> > > > > > I realized that this patch cannot be backpatched because it introduces a new
> > > > > > field into the public PGOutputData structure. Therefore, I think we may need to
> > > > > > use Alvaro's version [1] for the back branches.
> > > > >
> > > > > FWIW for back branches, I prefer using the foreach-pfree pattern
> > > > > Michael first proposed, just in case. It's not elegant but it can
> > > > > solve the problem while there is no risk of breaking non-core
> > > > > extensions.
> > > > >
> > > >
> > > > It couldn't solve the problem completely even in back-branches. The
> > > > SQL API case I mentioned and tested by Hou-San in the email [1] won't
> > > > be solved.
> > >
> > > True. There seems another place where we possibly leak memory on
> > > CacheMemoryContext when using pgoutput via SQL APIs:
> > >
> > >         /* Map must live as long as the session does. */
> > >         oldctx = MemoryContextSwitchTo(CacheMemoryContext);
> > >
> > >         entry->attrmap = build_attrmap_by_name_if_req(indesc, outdesc, false);
> > >
> > >         MemoryContextSwitchTo(oldctx);
> > >         RelationClose(ancestor);
> > >
> > > entry->attrmap is pfree'd only when validating the RelationSyncEntry
> > > so remains even after logical decoding API calls.
> > >
> >
> > We have also noticed this but it needs more analysis on the fix which
> > one of my colleagues is doing. I think we can fix this as a separate
> > issue unless you think otherwise.
>
> I agree to fix this as a separate patch.

Thanks Sawada-san, I have started a new thread with a test case which
can reproduce this issue at [1]:
[1] - https://www.postgresql.org/message-id/flat/CALDaNm1hewNAsZ_e6FF52a%3D9drmkRJxtEPrzCB6-9mkJyeBBqA%40mail.gmail.com

Regards,
Vignesh



Re: Memory leak in WAL sender with pgoutput (v10~)

From
Masahiko Sawada
Date:
On Tue, Dec 10, 2024 at 6:13 PM Zhijie Hou (Fujitsu)
<houzj.fnst@fujitsu.com> wrote:
>
> On Wednesday, December 11, 2024 2:14 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
> >
> > On Tue, Dec 10, 2024 at 1:16 AM Amit Kapila <amit.kapila16@gmail.com>
> > wrote:
> > >
> > > On Tue, Dec 10, 2024 at 11:24 AM Amit Kapila <amit.kapila16@gmail.com>
> > wrote:
> > > >
> > > > On Tue, Dec 10, 2024 at 8:54 AM vignesh C <vignesh21@gmail.com>
> > wrote:
> > > > >
> > > > > On Tue, 10 Dec 2024 at 04:56, Michael Paquier <michael@paquier.xyz>
> > wrote:
> > > > > >
> > > > > > On Mon, Dec 09, 2024 at 03:36:15PM +0530, Amit Kapila wrote:
> > > > > > > It couldn't solve the problem completely even in back-branches. The
> > > > > > > SQL API case I mentioned and tested by Hou-San in the email [1]
> > won't
> > > > > > > be solved.
> > > > > > >
> > > > > > > [1] -
> > https://www.postgresql.org/message-id/OS0PR01MB57166A4DA0ABBB94F
> > 2FBB28694362%40OS0PR01MB5716.jpnprd01.prod.outlook.com
> > > > > >
> > > > > > Yeah, exactly (wanted to reply exactly that yesterday but lacked time,
> > > > > > thanks!).
> > > > >
> > > > > Yes, that makes sense. How about something like the attached patch.
> > > > >
> > > >
> > > > - oldctx = MemoryContextSwitchTo(CacheMemoryContext);
> > > > - if (data->publications)
> > > > - {
> > > > - list_free_deep(data->publications);
> > > > - data->publications = NIL;
> > > > - }
> > > > + static MemoryContext pubctx = NULL;
> > > > +
> > > > + if (pubctx == NULL)
> > > > + pubctx = AllocSetContextCreate(CacheMemoryContext,
> > > > +    "logical replication publication list context",
> > > > +    ALLOCSET_SMALL_SIZES);
> > > > + else
> > > > + MemoryContextReset(pubctx);
> > > > +
> > > > + oldctx = MemoryContextSwitchTo(pubctx);
> > > >
> > > > Considering the SQL API case, why is it okay to allocate this context
> > > > under CacheMemoryContext?
> > > >
> > >
> > > On further thinking, we can't allocate it under
> > > LogicalDecodingContext->context because once that is freed at the end
> > > of SQL API pg_logical_slot_get_changes(), pubctx will be pointing to a
> > > dangling memory. One idea is that we use
> > > MemoryContextRegisterResetCallback() to invoke a reset callback
> > > function where we can reset pubctx but not sure if we want to go there
> > > in back branches. OTOH, the currently proposed fix won't leak memory
> > > on repeated calls to pg_logical_slot_get_changes(), so that might be
> > > okay as well.
> > >
> > > Thoughts?
> >
> > Alternative idea is to declare pubctx as a file static variable. And
> > we create the memory context under LogicalDecodingContext->context in
> > the startup callback and free it in the shutdown callback.
>
> I think when an ERROR occurs during the execution of the pg_logical_slot_xx()
> API, the shutdown callback function is not invoked. This would result in the
> static variable not being reset, which, I think, is why Amit mentioned the use
> of MemoryContextRegisterResetCallback().

My idea is that since that new context is cleaned up together with its
parent context (LogicalDecodingContext->context), we unconditionally
set that new context to the static variable at the startup callback.
That being said, Amit's idea would be cleaner.

Regards,

--
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com



Re: Memory leak in WAL sender with pgoutput (v10~)

From
Amit Kapila
Date:
On Wed, Dec 11, 2024 at 11:09 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
>
> On Tue, Dec 10, 2024 at 6:13 PM Zhijie Hou (Fujitsu)
> <houzj.fnst@fujitsu.com> wrote:
> >
> > On Wednesday, December 11, 2024 2:14 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
> > >
> > > On Tue, Dec 10, 2024 at 1:16 AM Amit Kapila <amit.kapila16@gmail.com>
> > > wrote:
> > > >
> > > > On Tue, Dec 10, 2024 at 11:24 AM Amit Kapila <amit.kapila16@gmail.com>
> > > wrote:
> > > > >
> > > > > On Tue, Dec 10, 2024 at 8:54 AM vignesh C <vignesh21@gmail.com>
> > > wrote:
> > > > > >
> > > > > > On Tue, 10 Dec 2024 at 04:56, Michael Paquier <michael@paquier.xyz>
> > > wrote:
> > > > > > >
> > > > > > > On Mon, Dec 09, 2024 at 03:36:15PM +0530, Amit Kapila wrote:
> > > > > > > > It couldn't solve the problem completely even in back-branches. The
> > > > > > > > SQL API case I mentioned and tested by Hou-San in the email [1]
> > > won't
> > > > > > > > be solved.
> > > > > > > >
> > > > > > > > [1] -
> > > https://www.postgresql.org/message-id/OS0PR01MB57166A4DA0ABBB94F
> > > 2FBB28694362%40OS0PR01MB5716.jpnprd01.prod.outlook.com
> > > > > > >
> > > > > > > Yeah, exactly (wanted to reply exactly that yesterday but lacked time,
> > > > > > > thanks!).
> > > > > >
> > > > > > Yes, that makes sense. How about something like the attached patch.
> > > > > >
> > > > >
> > > > > - oldctx = MemoryContextSwitchTo(CacheMemoryContext);
> > > > > - if (data->publications)
> > > > > - {
> > > > > - list_free_deep(data->publications);
> > > > > - data->publications = NIL;
> > > > > - }
> > > > > + static MemoryContext pubctx = NULL;
> > > > > +
> > > > > + if (pubctx == NULL)
> > > > > + pubctx = AllocSetContextCreate(CacheMemoryContext,
> > > > > +    "logical replication publication list context",
> > > > > +    ALLOCSET_SMALL_SIZES);
> > > > > + else
> > > > > + MemoryContextReset(pubctx);
> > > > > +
> > > > > + oldctx = MemoryContextSwitchTo(pubctx);
> > > > >
> > > > > Considering the SQL API case, why is it okay to allocate this context
> > > > > under CacheMemoryContext?
> > > > >
> > > >
> > > > On further thinking, we can't allocate it under
> > > > LogicalDecodingContext->context because once that is freed at the end
> > > > of SQL API pg_logical_slot_get_changes(), pubctx will be pointing to a
> > > > dangling memory. One idea is that we use
> > > > MemoryContextRegisterResetCallback() to invoke a reset callback
> > > > function where we can reset pubctx but not sure if we want to go there
> > > > in back branches. OTOH, the currently proposed fix won't leak memory
> > > > on repeated calls to pg_logical_slot_get_changes(), so that might be
> > > > okay as well.
> > > >
> > > > Thoughts?
> > >
> > > Alternative idea is to declare pubctx as a file static variable. And
> > > we create the memory context under LogicalDecodingContext->context in
> > > the startup callback and free it in the shutdown callback.
> >
> > I think when an ERROR occurs during the execution of the pg_logical_slot_xx()
> > API, the shutdown callback function is not invoked. This would result in the
> > static variable not being reset, which, I think, is why Amit mentioned the use
> > of MemoryContextRegisterResetCallback().
>
> My idea is that since that new context is cleaned up together with its
> parent context (LogicalDecodingContext->context), we unconditionally
> set that new context to the static variable at the startup callback.
> That being said, Amit's idea would be cleaner.
>

Your preference is not completely clear. Are you okay with the idea of
Vignesh's currently proposed patch for back-branches, or do you prefer
to use a memory context reset callback, or do you have a different
idea that should be adopted for back-branches?

--
With Regards,
Amit Kapila.



Re: Memory leak in WAL sender with pgoutput (v10~)

From
Amit Kapila
Date:
On Fri, Dec 13, 2024 at 1:22 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
>
> On Wed, Dec 11, 2024 at 9:43 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
> >
> > On Wed, Dec 11, 2024 at 11:09 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
> > >
> > > On Tue, Dec 10, 2024 at 6:13 PM Zhijie Hou (Fujitsu)
> > > <houzj.fnst@fujitsu.com> wrote:
> > > >
> > > > On Wednesday, December 11, 2024 2:14 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
> > > > >
> > > > > On Tue, Dec 10, 2024 at 1:16 AM Amit Kapila <amit.kapila16@gmail.com>
> > > > > wrote:
> > > > > >
> > > > > > On Tue, Dec 10, 2024 at 11:24 AM Amit Kapila <amit.kapila16@gmail.com>
> > > > > wrote:
> > > > > > >
> > > > > > > On Tue, Dec 10, 2024 at 8:54 AM vignesh C <vignesh21@gmail.com>
> > > > > wrote:
> > > > > > > >
> > > > > > > > On Tue, 10 Dec 2024 at 04:56, Michael Paquier <michael@paquier.xyz>
> > > > > wrote:
> > > > > > > > >
> > > > > > > > > On Mon, Dec 09, 2024 at 03:36:15PM +0530, Amit Kapila wrote:
> > > > > > > > > > It couldn't solve the problem completely even in back-branches. The
> > > > > > > > > > SQL API case I mentioned and tested by Hou-San in the email [1]
> > > > > won't
> > > > > > > > > > be solved.
> > > > > > > > > >
> > > > > > > > > > [1] -
> > > > > https://www.postgresql.org/message-id/OS0PR01MB57166A4DA0ABBB94F
> > > > > 2FBB28694362%40OS0PR01MB5716.jpnprd01.prod.outlook.com
> > > > > > > > >
> > > > > > > > > Yeah, exactly (wanted to reply exactly that yesterday but lacked time,
> > > > > > > > > thanks!).
> > > > > > > >
> > > > > > > > Yes, that makes sense. How about something like the attached patch.
> > > > > > > >
> > > > > > >
> > > > > > > - oldctx = MemoryContextSwitchTo(CacheMemoryContext);
> > > > > > > - if (data->publications)
> > > > > > > - {
> > > > > > > - list_free_deep(data->publications);
> > > > > > > - data->publications = NIL;
> > > > > > > - }
> > > > > > > + static MemoryContext pubctx = NULL;
> > > > > > > +
> > > > > > > + if (pubctx == NULL)
> > > > > > > + pubctx = AllocSetContextCreate(CacheMemoryContext,
> > > > > > > +    "logical replication publication list context",
> > > > > > > +    ALLOCSET_SMALL_SIZES);
> > > > > > > + else
> > > > > > > + MemoryContextReset(pubctx);
> > > > > > > +
> > > > > > > + oldctx = MemoryContextSwitchTo(pubctx);
> > > > > > >
> > > > > > > Considering the SQL API case, why is it okay to allocate this context
> > > > > > > under CacheMemoryContext?
> > > > > > >
> > > > > >
> > > > > > On further thinking, we can't allocate it under
> > > > > > LogicalDecodingContext->context because once that is freed at the end
> > > > > > of SQL API pg_logical_slot_get_changes(), pubctx will be pointing to a
> > > > > > dangling memory. One idea is that we use
> > > > > > MemoryContextRegisterResetCallback() to invoke a reset callback
> > > > > > function where we can reset pubctx but not sure if we want to go there
> > > > > > in back branches. OTOH, the currently proposed fix won't leak memory
> > > > > > on repeated calls to pg_logical_slot_get_changes(), so that might be
> > > > > > okay as well.
> > > > > >
> > > > > > Thoughts?
> > > > >
> > > > > Alternative idea is to declare pubctx as a file static variable. And
> > > > > we create the memory context under LogicalDecodingContext->context in
> > > > > the startup callback and free it in the shutdown callback.
> > > >
> > > > I think when an ERROR occurs during the execution of the pg_logical_slot_xx()
> > > > API, the shutdown callback function is not invoked. This would result in the
> > > > static variable not being reset, which, I think, is why Amit mentioned the use
> > > > of MemoryContextRegisterResetCallback().
> > >
> > > My idea is that since that new context is cleaned up together with its
> > > parent context (LogicalDecodingContext->context), we unconditionally
> > > set that new context to the static variable at the startup callback.
> > > That being said, Amit's idea would be cleaner.
> > >
> >
> > Your preference is not completely clear. Are you okay with the idea of
> > Vignesh's currently proposed patch for back-branches, or do you prefer
> > to use a memory context reset callback, or do you have a different
> > idea that should be adopted for back-branches?
>
> IIUC the current Vignesh's patch[1] doesn't solve the memory leak in
> case of using logical decoding APIs, as you mentioned.
>

Right, but note that it wouldn't leak memory on repeated calls to the
API. Only if the backend ever makes a single call for get_changes will
it leak memory once, which is not ideal. Still, we can live with it if
the other approaches are complex for back branches.

> I've tried the
> idea of using memory context reset callback to reset pubctx. We need
> to register the callback to LogicalContextDecodingContext->context,
> meaning that we need to pass it to get_rel_sync_entry() (see
> fix_memory_leak_v1.patch). I don't prefer this approach as it could
> make backpatching complex in the future. Alternatively, we can declare
> pubctx as a file static variable, create the memory context at the
> startup callback, reset  the pubctx at the shutdown callback, and use
> the memory context reset callback to ensure the pubctx is reset (see
> fix_memory_leak_v2.patch).Or I think we might not necessarily need to
> use the memory context reset callback (see fix_memory_leak_v3.patch).
> I prefer the latter two approaches.
>

Will fix_memory_leak_v3.patch avoid the leak in case of an ERROR in
SQL API? If so, how?

--
With Regards,
Amit Kapila.



Re: Memory leak in WAL sender with pgoutput (v10~)

From
Masahiko Sawada
Date:
On Tue, Dec 17, 2024 at 2:48 AM Amit Kapila <amit.kapila16@gmail.com> wrote:
>
> On Fri, Dec 13, 2024 at 1:22 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
> >
> > On Wed, Dec 11, 2024 at 9:43 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
> > >
> > > On Wed, Dec 11, 2024 at 11:09 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
> > > >
> > > > On Tue, Dec 10, 2024 at 6:13 PM Zhijie Hou (Fujitsu)
> > > > <houzj.fnst@fujitsu.com> wrote:
> > > > >
> > > > > On Wednesday, December 11, 2024 2:14 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
> > > > > >
> > > > > > On Tue, Dec 10, 2024 at 1:16 AM Amit Kapila <amit.kapila16@gmail.com>
> > > > > > wrote:
> > > > > > >
> > > > > > > On Tue, Dec 10, 2024 at 11:24 AM Amit Kapila <amit.kapila16@gmail.com>
> > > > > > wrote:
> > > > > > > >
> > > > > > > > On Tue, Dec 10, 2024 at 8:54 AM vignesh C <vignesh21@gmail.com>
> > > > > > wrote:
> > > > > > > > >
> > > > > > > > > On Tue, 10 Dec 2024 at 04:56, Michael Paquier <michael@paquier.xyz>
> > > > > > wrote:
> > > > > > > > > >
> > > > > > > > > > On Mon, Dec 09, 2024 at 03:36:15PM +0530, Amit Kapila wrote:
> > > > > > > > > > > It couldn't solve the problem completely even in back-branches. The
> > > > > > > > > > > SQL API case I mentioned and tested by Hou-San in the email [1]
> > > > > > won't
> > > > > > > > > > > be solved.
> > > > > > > > > > >
> > > > > > > > > > > [1] -
> > > > > > https://www.postgresql.org/message-id/OS0PR01MB57166A4DA0ABBB94F
> > > > > > 2FBB28694362%40OS0PR01MB5716.jpnprd01.prod.outlook.com
> > > > > > > > > >
> > > > > > > > > > Yeah, exactly (wanted to reply exactly that yesterday but lacked time,
> > > > > > > > > > thanks!).
> > > > > > > > >
> > > > > > > > > Yes, that makes sense. How about something like the attached patch.
> > > > > > > > >
> > > > > > > >
> > > > > > > > - oldctx = MemoryContextSwitchTo(CacheMemoryContext);
> > > > > > > > - if (data->publications)
> > > > > > > > - {
> > > > > > > > - list_free_deep(data->publications);
> > > > > > > > - data->publications = NIL;
> > > > > > > > - }
> > > > > > > > + static MemoryContext pubctx = NULL;
> > > > > > > > +
> > > > > > > > + if (pubctx == NULL)
> > > > > > > > + pubctx = AllocSetContextCreate(CacheMemoryContext,
> > > > > > > > +    "logical replication publication list context",
> > > > > > > > +    ALLOCSET_SMALL_SIZES);
> > > > > > > > + else
> > > > > > > > + MemoryContextReset(pubctx);
> > > > > > > > +
> > > > > > > > + oldctx = MemoryContextSwitchTo(pubctx);
> > > > > > > >
> > > > > > > > Considering the SQL API case, why is it okay to allocate this context
> > > > > > > > under CacheMemoryContext?
> > > > > > > >
> > > > > > >
> > > > > > > On further thinking, we can't allocate it under
> > > > > > > LogicalDecodingContext->context because once that is freed at the end
> > > > > > > of SQL API pg_logical_slot_get_changes(), pubctx will be pointing to a
> > > > > > > dangling memory. One idea is that we use
> > > > > > > MemoryContextRegisterResetCallback() to invoke a reset callback
> > > > > > > function where we can reset pubctx but not sure if we want to go there
> > > > > > > in back branches. OTOH, the currently proposed fix won't leak memory
> > > > > > > on repeated calls to pg_logical_slot_get_changes(), so that might be
> > > > > > > okay as well.
> > > > > > >
> > > > > > > Thoughts?
> > > > > >
> > > > > > Alternative idea is to declare pubctx as a file static variable. And
> > > > > > we create the memory context under LogicalDecodingContext->context in
> > > > > > the startup callback and free it in the shutdown callback.
> > > > >
> > > > > I think when an ERROR occurs during the execution of the pg_logical_slot_xx()
> > > > > API, the shutdown callback function is not invoked. This would result in the
> > > > > static variable not being reset, which, I think, is why Amit mentioned the use
> > > > > of MemoryContextRegisterResetCallback().
> > > >
> > > > My idea is that since that new context is cleaned up together with its
> > > > parent context (LogicalDecodingContext->context), we unconditionally
> > > > set that new context to the static variable at the startup callback.
> > > > That being said, Amit's idea would be cleaner.
> > > >
> > >
> > > Your preference is not completely clear. Are you okay with the idea of
> > > Vignesh's currently proposed patch for back-branches, or do you prefer
> > > to use a memory context reset callback, or do you have a different
> > > idea that should be adopted for back-branches?
> >
> > IIUC the current Vignesh's patch[1] doesn't solve the memory leak in
> > case of using logical decoding APIs, as you mentioned.
> >
>
> Right, but note that it wouldn't leak memory on repeated calls to the
> API. Only if the backend ever makes a single call for get_changes will
> it leak memory once, which is not ideal. Still, we can live with it if
> the other approaches are complex for back branches.

True. I missed that point.

>
> > I've tried the
> > idea of using memory context reset callback to reset pubctx. We need
> > to register the callback to LogicalContextDecodingContext->context,
> > meaning that we need to pass it to get_rel_sync_entry() (see
> > fix_memory_leak_v1.patch). I don't prefer this approach as it could
> > make backpatching complex in the future. Alternatively, we can declare
> > pubctx as a file static variable, create the memory context at the
> > startup callback, reset  the pubctx at the shutdown callback, and use
> > the memory context reset callback to ensure the pubctx is reset (see
> > fix_memory_leak_v2.patch).Or I think we might not necessarily need to
> > use the memory context reset callback (see fix_memory_leak_v3.patch).
> > I prefer the latter two approaches.
> >
>
> Will fix_memory_leak_v3.patch avoid the leak in case of an ERROR in
> SQL API? If so, how?

The pubctx is created as a child of LogicalDecodingContext->context.
On an error, the pubctx is cleaned up altogether when cleaning up
LogicalDecodingContext->context.

Regards,

--
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com



Re: Memory leak in WAL sender with pgoutput (v10~)

From
Amit Kapila
Date:
On Wed, Dec 18, 2024 at 12:32 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
>
> On Tue, Dec 17, 2024 at 2:48 AM Amit Kapila <amit.kapila16@gmail.com> wrote:
> >
> >
> > Will fix_memory_leak_v3.patch avoid the leak in case of an ERROR in
> > SQL API? If so, how?
>
> The pubctx is created as a child of LogicalDecodingContext->context.
> On an error, the pubctx is cleaned up altogether when cleaning up
> LogicalDecodingContext->context.
>

The difference between fix_memory_leak_v2 and fix_memory_leak_v3 is
that the earlier one resets the pubctx to NULL along with freeing the
context memory. Resetting a file-level global variable is a good idea,
similar to what we do for RelationSyncCache, so I prefer v2 over v3,
but I am fine if you would like to proceed with v3.

--
With Regards,
Amit Kapila.



Re: Memory leak in WAL sender with pgoutput (v10~)

From
Masahiko Sawada
Date:
On Wed, Dec 18, 2024 at 6:26 PM Michael Paquier <michael@paquier.xyz> wrote:
>
> On Thu, Dec 19, 2024 at 07:50:57AM +0530, Amit Kapila wrote:
> > The difference between fix_memory_leak_v2 and fix_memory_leak_v3 is
> > that the earlier one resets the pubctx to NULL along with freeing the
> > context memory. Resetting a file-level global variable is a good idea,
> > similar to what we do for RelationSyncCache, so I prefer v2 over v3,
> > but I am fine if you would like to proceed with v3.
>
> FWIW, I am not OK with v3.  I've raised this exact point a couple of
> days ago upthread:
> https://www.postgresql.org/message-id/Z1t5pXsNEYwS4P5k@paquier.xyz
>
> v2 does not have these weaknesses by design.

I agree that v2 is better than v3 in terms of that.

Regards,

--
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com



Re: Memory leak in WAL sender with pgoutput (v10~)

From
Masahiko Sawada
Date:
On Thu, Dec 19, 2024 at 6:31 PM Michael Paquier <michael@paquier.xyz> wrote:
>
> On Thu, Dec 19, 2024 at 09:27:04AM -0800, Masahiko Sawada wrote:
> > On Wed, Dec 18, 2024 at 6:26 PM Michael Paquier <michael@paquier.xyz> wrote:
> >> v2 does not have these weaknesses by design.
> >
> > I agree that v2 is better than v3 in terms of that.
>
> Okay.  In terms of the backbranches, would you prefer that I handle
> this patch myself as I have done the HEAD part?  This would need a
> second, closer, review but I could do that at the beginning of next
> week.

Thanks. Please proceed with this fix as you've already fixed the HEAD part.

Regards,

--
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com