Thread: Re: Memory leak in WAL sender with pgoutput (v10~)

Re: Memory leak in WAL sender with pgoutput (v10~)

From
Amit Kapila
Date:
On Mon, Dec 2, 2024 at 7:19 AM Michael Paquier <michael@paquier.xyz> wrote:
>
> On Sat, Nov 30, 2024 at 09:28:31AM +0100, Alvaro Herrera wrote:
> > I'm not sure about your proposed fix.  Isn't it simpler to have another
> > memory context which we can reset instead of doing list_free_deep()?  It
> > doesn't have to be a global memory context -- since this is not
> > reentrant and not referenced anywhere else, it can be a simple static
> > variable in that block, as in the attached.  I ran the stock tests (no
> > sysbench) and at least it doesn't crash.
> >
> > This should be easily backpatchable also, since there's no ABI change.
>
> Yeah.  Using more memory contexts around these API calls done without
> the cache manipulation is something I've also mentioned to Amit a few
> days ago, and I'm feeling that this concept should be applied to a
> broader area than just the cached publication list in light of this
> thread and ea792bfd93ab.
>

We already have PGOutputData->cachectx which could be used for it. I
think we should be able to reset such a context when we are
revalidating the publications. Even, if we want a new context for some
localized handling, we should add that in PGOutputData rather than a
local context as the proposed patch is doing at the very least for
HEAD.

Can't we consider freeing the publication names individually that can
be backpatchable and have no or minimal risk of breaking anything?
>
> I am slightly concerned about the current design of GetPublication()
> in the long-term, TBH.  LoadPublications() has hidden the leak behind
> two layers of routines in the WAL sender, and that's easy to miss once
> you call anything that loads a Publication depending on how the caller
> caches its data.  So I would still choose for modifying the structure
> on HEAD removing the pstrdup() for the publication name.
>

BTW, the subscription structure also used the name in a similar way.
This will make the publication/subscription names handled differently.

--
With Regards,
Amit Kapila.



Re: Memory leak in WAL sender with pgoutput (v10~)

From
Amit Kapila
Date:
On Mon, Dec 2, 2024 at 12:49 PM Michael Paquier <michael@paquier.xyz> wrote:
>
> On Mon, Dec 02, 2024 at 11:47:09AM +0530, Amit Kapila wrote:
> > We already have PGOutputData->cachectx which could be used for it. I
> > think we should be able to reset such a context when we are
> > revalidating the publications. Even, if we want a new context for some
> > localized handling, we should add that in PGOutputData rather than a
> > local context as the proposed patch is doing at the very least for
> > HEAD.
>
> cachectx is used for the publications and the hash table holding
> all the RelationSyncEntry entries, but we lack control of individual
> parts within it.  So you cannot reset the whole context when
> processing a publication invalication.  Perhaps adding that to
> PGOutputData would be better, but that would be inconsistent with
> RelationSyncCache.
>

AFAICS, RelationSyncCache is not allocated in PGOutputData->cachectx.
It is allocated in CacheMemoryContext, see caller of
init_rel_sync_cache(). I think you are talking about individual hash
entries. Ideally, we can free all entries together and reset cachectx
but right now, we are freeing the allocated memory in those entries,
if required, at the next access. So, resetting the entire
PGOutputData->cachectx won't be possible. But, I don't get why adding
new context in PGOutputData for publications would be inconsistent
with RelationSyncCache? Anyway, I think it would be okay to
retail-free in this case, see the below responses.

> > Can't we consider freeing the publication names individually that can
> > be backpatchable and have no or minimal risk of breaking anything?
>
> Sure.  The first thing I did was a loop that goes through the
> publication list and does individual pfree() for the publication
> names.  That works, but IMO that's weird as we rely on the internals
> of GetPublication() hidden two levels down in pg_publication.c.
>

We can look at it from a different angle which is that the
FreePublication(s) relies on how the knowledge of Publication
structure is built. So, it doesn't look weird if we think from that
angle.

> >> I am slightly concerned about the current design of GetPublication()
> >> in the long-term, TBH.  LoadPublications() has hidden the leak behind
> >> two layers of routines in the WAL sender, and that's easy to miss once
> >> you call anything that loads a Publication depending on how the caller
> >> caches its data.  So I would still choose for modifying the structure
> >> on HEAD removing the pstrdup() for the publication name.
> >>
> >
> > BTW, the subscription structure also used the name in a similar way.
> > This will make the publication/subscription names handled differently.
>
> Good point about the inconsistency, so the name could also be switched
> to a fixed-NAMEDATALEN there if we were to do that.  The subscription
> has much more pstrdup() fields, though..  How about having some Free()
> routines instead that deal with the whole cleanup of a single list
> entry?  If that's kept close to the GetPublication() and
> GetSubscription() routines, a refresh when changing these structures
> would be hard to miss.
>

We already have FreeSubscription() which free name and other things
before calling list_free_deep. So, I thought a call on those lines for
publications wouldn't be a bad idea.

--
With Regards,
Amit Kapila.



Re: Memory leak in WAL sender with pgoutput (v10~)

From
Alvaro Herrera
Date:
On 2024-Dec-02, Amit Kapila wrote:

> We already have PGOutputData->cachectx which could be used for it.
> I think we should be able to reset such a context when we are
> revalidating the publications.

I don't see that context being reset anywhere, so I have a hard time
imagining that this would work without subtle risk of breakage
elsewhere.

> Even, if we want a new context for some localized handling, we should
> add that in PGOutputData rather than a local context as the proposed
> patch is doing at the very least for HEAD.

I don't necessarily agree, given that this context is not needed
anywhere else.

> Can't we consider freeing the publication names individually that can
> be backpatchable and have no or minimal risk of breaking anything?

That was my first thought, but then it occurred to me that such a thing
would be totally pointless.

> > you call anything that loads a Publication depending on how the caller
> > caches its data.  So I would still choose for modifying the structure
> > on HEAD removing the pstrdup() for the publication name.
> 
> BTW, the subscription structure also used the name in a similar way.
> This will make the publication/subscription names handled differently.

True (with conninfo, slotname, synccommit, and origin).

FWIW it seems FreeSubscription is incomplete, and not only because it
fails to free the publication names ...

(Why are we storing a string in Subscription->synccommit?)

-- 
Álvaro Herrera         PostgreSQL Developer  —  https://www.EnterpriseDB.com/
"Los trabajadores menos efectivos son sistematicamente llevados al lugar
donde pueden hacer el menor daño posible: gerencia."  (El principio Dilbert)



Re: Memory leak in WAL sender with pgoutput (v10~)

From
Amit Kapila
Date:
On Tue, Dec 3, 2024 at 2:12 AM Alvaro Herrera <alvherre@alvh.no-ip.org> wrote:
>
> On 2024-Dec-02, Amit Kapila wrote:
>
> > Even, if we want a new context for some localized handling, we should
> > add that in PGOutputData rather than a local context as the proposed
> > patch is doing at the very least for HEAD.
>
> I don't necessarily agree, given that this context is not needed
> anywhere else.
>

But that suits the current design more. We allocate PGOutputData and
other contexts in that structure in a "Logical decoding context". A
few of its members (publications, publication_names) residing in
totally unrelated contexts sounds odd. In the first place, we don't
need to allocate publications under CacheMemoryContext, they should be
allocated in PGOutputData->cachectx. However, because we need to free
those entirely at one-shot during invalidation processing, we could
use a new context as a child context of PGOutputData->cachectx. Unless
I am missing something, the current memory context usage appears more
like a coding convenience than a thoughtful design decision.

--
With Regards,
Amit Kapila.



Re: Memory leak in WAL sender with pgoutput (v10~)

From
Peter Smith
Date:
On Tue, Dec 3, 2024 at 5:47 PM Michael Paquier <michael@paquier.xyz> wrote:
>
> On Mon, Dec 02, 2024 at 03:29:56PM +0530, Amit Kapila wrote:
> > We can look at it from a different angle which is that the
> > FreePublication(s) relies on how the knowledge of Publication
> > structure is built. So, it doesn't look weird if we think from that
> > angle.
>
> OK, I can live with that on all the stable branches with an extra
> list free rather than a deep list free.
>
> I agree that the memory handling of this whole area needs some rework
> to make such leaks harder to introduce in the WAL sender.  Still,
> let's first solve the problem at hand :)
>
> So how about the attached that introduces a FreePublication() matching
> with GetPublication(), used to do the cleanup?  Feel free to comment.
> --

Perhaps the patch can use foreach_ptr macro instead of foreach?

======
Kind Regards,
Peter Smith.
Fujitsu Australia



Re: Memory leak in WAL sender with pgoutput (v10~)

From
Alvaro Herrera
Date:
On 2024-Dec-03, Peter Smith wrote:

> Perhaps the patch can use foreach_ptr macro instead of foreach?

That cannot be backpatched, though.

-- 
Álvaro Herrera               48°01'N 7°57'E  —  https://www.EnterpriseDB.com/
"Thou shalt not follow the NULL pointer, for chaos and madness await
thee at its end." (2nd Commandment for C programmers)



Re: Memory leak in WAL sender with pgoutput (v10~)

From
Amit Kapila
Date:
On Tue, Dec 3, 2024 at 11:57 AM Michael Paquier <michael@paquier.xyz> wrote:
>
> On Tue, Dec 03, 2024 at 09:50:42AM +0530, Amit Kapila wrote:
> > But that suits the current design more. We allocate PGOutputData and
> > other contexts in that structure in a "Logical decoding context". A
> > few of its members (publications, publication_names) residing in
> > totally unrelated contexts sounds odd. In the first place, we don't
> > need to allocate publications under CacheMemoryContext, they should be
> > allocated in PGOutputData->cachectx. However, because we need to free
> > those entirely at one-shot during invalidation processing, we could
> > use a new context as a child context of PGOutputData->cachectx. Unless
> > I am missing something, the current memory context usage appears more
> > like a coding convenience than a thoughtful design decision.
>
> PGOutputData->cachectx has been introduced in 2022 in commit 52e4f0cd4,
> while the decision to have RelationSyncEntry and the publication list
> in CacheMemoryContext gets down to v10 where this logical replication
> has been introduced.  This was a carefully-thought choice back then
> because this is data that belongs to the process cache, so yes, this
> choice makes sense to me.
>

The parent structure (PGOutputData) was stored in the "Logical
decoding context" even in v11. So, how does storing its member
'publications' in CacheMemoryContext a good idea? It is possible that
we are leaking memory while doing decoding via SQL APIs where we free
decoding context after getting changes though I haven't tested the
same.

--
With Regards,
Amit Kapila.



Re: Memory leak in WAL sender with pgoutput (v10~)

From
Amit Kapila
Date:
On Tue, Dec 3, 2024 at 12:17 PM Michael Paquier <michael@paquier.xyz> wrote:
>
> So how about the attached that introduces a FreePublication() matching
> with GetPublication(), used to do the cleanup?  Feel free to comment.
>

As you would have noted I am fine with the fix on these lines but I
suggest holding it till we conclude the memory context point raised by
me today. It is possible that we are still leaking some memory in
other related scenarios.

--
With Regards,
Amit Kapila.



RE: Memory leak in WAL sender with pgoutput (v10~)

From
"Zhijie Hou (Fujitsu)"
Date:
On Tuesday, December 3, 2024 4:28 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
> 
> On Tue, Dec 3, 2024 at 11:57 AM Michael Paquier <michael@paquier.xyz>
> wrote:
> >
> > On Tue, Dec 03, 2024 at 09:50:42AM +0530, Amit Kapila wrote:
> > > But that suits the current design more. We allocate PGOutputData and
> > > other contexts in that structure in a "Logical decoding context". A
> > > few of its members (publications, publication_names) residing in
> > > totally unrelated contexts sounds odd. In the first place, we don't
> > > need to allocate publications under CacheMemoryContext, they should
> > > be allocated in PGOutputData->cachectx. However, because we need to
> > > free those entirely at one-shot during invalidation processing, we
> > > could use a new context as a child context of
> > > PGOutputData->cachectx. Unless I am missing something, the current
> > > memory context usage appears more like a coding convenience than a
> thoughtful design decision.
> >
> > PGOutputData->cachectx has been introduced in 2022 in commit
> > PGOutputData->52e4f0cd4,
> > while the decision to have RelationSyncEntry and the publication list
> > in CacheMemoryContext gets down to v10 where this logical replication
> > has been introduced.  This was a carefully-thought choice back then
> > because this is data that belongs to the process cache, so yes, this
> > choice makes sense to me.
> >
> 
> The parent structure (PGOutputData) was stored in the "Logical decoding
> context" even in v11. So, how does storing its member 'publications' in
> CacheMemoryContext a good idea? It is possible that we are leaking memory
> while doing decoding via SQL APIs where we free decoding context after
> getting changes though I haven't tested the same.

Right. I think I have faced this memory leak recently. It might be true for
walsender that 'publications' is a per-process content. But SQL APIs might use
different publication names each time during execution.

I can reproduce the memory leak due to allocating the publication
names under CacheMemoryContext like the following:

--
CREATE PUBLICATION pub FOR ALL TABLES;
CREATE TABLE stream_test(a int);
SELECT 'init' FROM pg_create_logical_replication_slot('isolation_slot', 'pgoutput');
INSERT INTO stream_test SELECT generate_series(1, 2, 1);

- he backend's memory usage increases with each execution of the following function
SELECT count(*) FROM pg_logical_slot_peek_binary_changes('isolation_slot', NULL, NULL, 'proto_version', '4',
'publication_names','pub,pub,........ <lots of pub names>');
 

Best Regards,
Hou zj



RE: Memory leak in WAL sender with pgoutput (v10~)

From
"Zhijie Hou (Fujitsu)"
Date:
On Tuesday, December 3, 2024 4:43 AM Alvaro Herrera <alvherre@alvh.no-ip.org> wrote:
> 
> On 2024-Dec-02, Amit Kapila wrote:
> > > you call anything that loads a Publication depending on how the
> > > caller caches its data.  So I would still choose for modifying the
> > > structure on HEAD removing the pstrdup() for the publication name.
> >
> > BTW, the subscription structure also used the name in a similar way.
> > This will make the publication/subscription names handled differently.
> 
> True (with conninfo, slotname, synccommit, and origin).
...
>
> (Why are we storing a string in Subscription->synccommit?)

I think it's because the primary purpose of sub->synccommit is to serve as a
parameter for SetConfigOption() in the apply worker, which requires a string
value. Additionally, the existing function set_config_option() that validates
this option only accepts a string input. Although we could convert
sub->synccommit to an integer, this would necessitate additional conversion
code before passing it to these functions.

Best Regards,
Hou zj