Re: Invalid pointer access in logical decoding after error - Mailing list pgsql-hackers

From Masahiko Sawada
Subject Re: Invalid pointer access in logical decoding after error
Date
Msg-id CAD21AoCgqZ0BUpXjVY6tD1jSLtVSdWpG+LZyZimq4Uu3TymTAA@mail.gmail.com
In response to RE: Invalid pointer access in logical decoding after error  ("Zhijie Hou (Fujitsu)" <houzj.fnst@fujitsu.com>)
List pgsql-hackers
On Thu, Jul 3, 2025 at 7:55 AM vignesh C <vignesh21@gmail.com> wrote:
>
> On Wed, 2 Jul 2025 at 13:21, Zhijie Hou (Fujitsu)
> <houzj.fnst@fujitsu.com> wrote:
> >
> > On Wed, Jul 2, 2025 at 2:42 PM vignesh C wrote:
> >
> > >
> > > Hi,
> > >
> > > I encountered an invalid pointer access issue. Below are the steps to
> > > reproduce the issue:
> > ...
> > > The error occurs because entry->columns is allocated in the entry's
> > > private context (entry->entry_cxt) by pub_collist_to_bitmapset(). This
> > > context is a child of the PortalContext, which is cleared after an
> > > error via:
> > > AbortTransaction -> AtAbort_Portals -> MemoryContextDeleteChildren
> > > -> MemoryContextDelete -> MemoryContextDeleteOnly
> > > As a result, the memory backing entry->columns is freed, but the
> > > RelationSyncCache, which resides in CacheMemoryContext and thus
> > > survives the error, still holds a dangling pointer to this freed
> > > memory, which leads to a pfree of an invalid pointer.
> > > In the normal (non-error) execution flow, pgoutput_shutdown() is called
> > > to clean up the RelationSyncCache. This happens via:
> > > FreeDecodingContext -> shutdown_cb_wrapper -> pgoutput_shutdown
> > > However, this is not called in the error case. To handle this case
> > > safely, I suggest calling FreeDecodingContext in the PG_CATCH block to
> > > ensure pgoutput_shutdown is invoked and the stale cache is cleared.
> > > The attached patch has the changes for this.
> > > Thoughts?
> >
> > Thank you for reporting the issue and providing a fix.
> >
> > I recall that we identified this general issue with the hash table in pgoutput
> > in other threads as well [1]. The basic consensus [2] is that calling
> > FreeDecodingContext() within PG_CATCH is not ideal, as this function includes
> > user code, increasing the risk of encountering another error within PG_CATCH.
> > This scenario could prevent execution of subsequent code to invalidate syscache
> > entries, which is problematic.
>
> Yes, let's avoid this.
>
> > I think a better fix could be to introduce a memory context reset callback (on
> > data->cachectx) and perform the actions of pgoutput_shutdown() within it.
>
> The attached v2 patch has the changes for this.

We've addressed several memory-related issues in pgoutput. While most
of these issues didn't affect logical replication, they did impact
logical decoding called via the SQL API. I think these problems stem
from RelationSyncCache being defined as a file-scope static variable
and being allocated in CacheMemoryContext. I'm wondering if we could
move it to PGOutputData and create it under the logical decoding
context. This would ensure it's automatically cleaned up along with
the logical decoding context.
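
To make the ownership idea concrete, here is a toy model in plain C. It
is not PostgreSQL code: ToyContext, toy_alloc(), toy_reset(),
ToyOutputData and the other names are all invented for illustration, and
the "context" is just a linked list of malloc'd chunks. It only sketches
the point that if the cache hangs off the per-session data and is
allocated in the session's context, then resetting that context on error
releases the cache too, without any shutdown callback having to run.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Toy stand-in for a memory context: a linked list of allocations. */
typedef struct ToyChunk {
    struct ToyChunk *next;
} ToyChunk;

typedef struct ToyContext {
    ToyChunk *chunks;       /* everything allocated in this context */
    size_t    live_chunks;  /* number of chunks not yet freed */
} ToyContext;

/* Allocate size bytes whose lifetime is tied to ctx. */
static void *toy_alloc(ToyContext *ctx, size_t size)
{
    ToyChunk *c = malloc(sizeof(ToyChunk) + size);
    if (c == NULL)
        abort();
    c->next = ctx->chunks;
    ctx->chunks = c;
    ctx->live_chunks++;
    return c + 1;           /* usable memory starts after the header */
}

/* Free everything that was allocated in ctx. */
static void toy_reset(ToyContext *ctx)
{
    while (ctx->chunks != NULL) {
        ToyChunk *next = ctx->chunks->next;
        free(ctx->chunks);
        ctx->chunks = next;
        ctx->live_chunks--;
    }
}

/* Hypothetical cache entry, loosely modelled on a per-relation entry. */
typedef struct ToyCacheEntry {
    int  relid;
    int *columns;
} ToyCacheEntry;

/* Hypothetical per-session state; the cache pointer lives here,
 * not in a file-scope static variable. */
typedef struct ToyOutputData {
    ToyContext    *decoding_ctx;
    ToyCacheEntry *cache;
} ToyOutputData;

static ToyOutputData *toy_startup(ToyContext *ctx)
{
    ToyOutputData *data = toy_alloc(ctx, sizeof(*data));
    data->decoding_ctx = ctx;
    data->cache = NULL;
    return data;
}

/* Build the (single-entry) cache lazily, inside the session context. */
static ToyCacheEntry *toy_cache_entry(ToyOutputData *data, int relid)
{
    if (data->cache == NULL) {
        data->cache = toy_alloc(data->decoding_ctx, sizeof(ToyCacheEntry));
        data->cache->relid = relid;
        data->cache->columns = toy_alloc(data->decoding_ctx, 4 * sizeof(int));
    }
    return data->cache;
}

/* Simulate a decoding session that errors out before any shutdown
 * callback runs; returns how many chunks are still live afterwards. */
static size_t toy_decode_with_error(void)
{
    ToyContext ctx = { NULL, 0 };
    ToyOutputData *data = toy_startup(&ctx);

    (void) toy_cache_entry(data, 16384);

    /* Error path: only the context reset happens. */
    toy_reset(&ctx);
    return ctx.live_chunks;
}
```

Calling toy_decode_with_error() leaves nothing allocated, and since no
static pointer outlives the session there is nothing left to dangle.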

I also noticed another concerning issue: the entry->streamed_txns list
is maintained in CacheMemoryContext (see
set_schema_sent_in_streamed_txn()). This could lead to memory leaks
when logical decoding called via the SQL API encounters an error. This
issue isn't addressed in the current patch.
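
As a rough illustration of that leak, again in toy C rather than
PostgreSQL code (LeakContext, leak_alloc(), leak_reset(), XidCell and
leaked_after_error() are hypothetical names): a list built in a
long-lived context is untouched when only the session's context is
reset after an error, whereas the same list built in the session
context goes away with it.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Toy stand-in for a memory context: a linked list of allocations. */
typedef struct LeakChunk {
    struct LeakChunk *next;
} LeakChunk;

typedef struct LeakContext {
    LeakChunk *chunks;
    size_t     live;        /* chunks currently allocated */
} LeakContext;

static void *leak_alloc(LeakContext *ctx, size_t size)
{
    LeakChunk *c = malloc(sizeof(LeakChunk) + size);
    if (c == NULL)
        abort();
    c->next = ctx->chunks;
    ctx->chunks = c;
    ctx->live++;
    return c + 1;
}

static void leak_reset(LeakContext *ctx)
{
    while (ctx->chunks != NULL) {
        LeakChunk *next = ctx->chunks->next;
        free(ctx->chunks);
        ctx->chunks = next;
        ctx->live--;
    }
}

/* One cell of a hypothetical "streamed transactions" list. */
typedef struct XidCell {
    struct XidCell *next;
    unsigned int    xid;
} XidCell;

/* Build a three-element list, then simulate an error in which only the
 * session context is reset. Returns the number of chunks still live in
 * the long-lived context, i.e. what the failed session left behind. */
static size_t leaked_after_error(int use_session_ctx)
{
    LeakContext long_lived = { NULL, 0 };
    LeakContext session = { NULL, 0 };
    LeakContext *list_ctx = use_session_ctx ? &session : &long_lived;
    XidCell *streamed = NULL;
    unsigned int xid;
    size_t leaked;

    for (xid = 700; xid < 703; xid++) {
        XidCell *cell = leak_alloc(list_ctx, sizeof(*cell));
        cell->xid = xid;
        cell->next = streamed;
        streamed = cell;
    }

    /* Error path: the session context is cleaned up, nothing else. */
    leak_reset(&session);
    leaked = long_lived.live;

    /* Tidy up so the demo itself does not leak. */
    leak_reset(&long_lived);
    return leaked;
}
```

With the list in the long-lived context (use_session_ctx = 0) all
three cells survive the error; with use_session_ctx = 1 none do.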

Regards,

--
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com


