RE: Memory leak in WAL sender with pgoutput (v10~) - Mailing list pgsql-hackers

From Zhijie Hou (Fujitsu)
Subject RE: Memory leak in WAL sender with pgoutput (v10~)
Date
Msg-id OS0PR01MB57166A4DA0ABBB94F2FBB28694362@OS0PR01MB5716.jpnprd01.prod.outlook.com
Whole thread Raw
In response to Re: Memory leak in WAL sender with pgoutput (v10~)  (Amit Kapila <amit.kapila16@gmail.com>)
List pgsql-hackers
On Tuesday, December 3, 2024 4:28 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
> 
> On Tue, Dec 3, 2024 at 11:57 AM Michael Paquier <michael@paquier.xyz>
> wrote:
> >
> > On Tue, Dec 03, 2024 at 09:50:42AM +0530, Amit Kapila wrote:
> > > But that suits the current design more. We allocate PGOutputData and
> > > other contexts in that structure in a "Logical decoding context". A
> > > few of its members (publications, publication_names) residing in
> > > totally unrelated contexts sounds odd. In the first place, we don't
> > > need to allocate publications under CacheMemoryContext, they should
> > > be allocated in PGOutputData->cachectx. However, because we need to
> > > free those entirely at one-shot during invalidation processing, we
> > > could use a new context as a child context of
> > > PGOutputData->cachectx. Unless I am missing something, the current
> > > memory context usage appears more like a coding convenience than a
> thoughtful design decision.
> >
> > PGOutputData->cachectx has been introduced in 2022 in commit
> > PGOutputData->52e4f0cd4,
> > while the decision to have RelationSyncEntry and the publication list
> > in CacheMemoryContext gets down to v10 where this logical replication
> > has been introduced.  This was a carefully-thought choice back then
> > because this is data that belongs to the process cache, so yes, this
> > choice makes sense to me.
> >
> 
> The parent structure (PGOutputData) was stored in the "Logical decoding
> context" even in v11. So, how does storing its member 'publications' in
> CacheMemoryContext a good idea? It is possible that we are leaking memory
> while doing decoding via SQL APIs where we free decoding context after
> getting changes though I haven't tested the same.

Right. I think I have faced this memory leak recently. It might be true for
walsender that 'publications' is a per-process content. But SQL APIs might use
different publication names each time during execution.

I can reproduce the memory leak due to allocating the publication
names under CacheMemoryContext like the following:

--
CREATE PUBLICATION pub FOR ALL TABLES;
CREATE TABLE stream_test(a int);
SELECT 'init' FROM pg_create_logical_replication_slot('isolation_slot', 'pgoutput');
INSERT INTO stream_test SELECT generate_series(1, 2, 1);

- he backend's memory usage increases with each execution of the following function
SELECT count(*) FROM pg_logical_slot_peek_binary_changes('isolation_slot', NULL, NULL, 'proto_version', '4',
'publication_names','pub,pub,........ <lots of pub names>');
 

Best Regards,
Hou zj



pgsql-hackers by date:

Previous
From: Andrei Lepikhov
Date:
Subject: Re: An inefficient query caused by unnecessary PlaceHolderVar
Next
From: Nazir Bilal Yavuz
Date:
Subject: Re: meson missing test dependencies