Re: Invalid pointer access in logical decoding after error - Mailing list pgsql-hackers
From | Masahiko Sawada |
---|---|
Subject | Re: Invalid pointer access in logical decoding after error |
Date | |
Msg-id | CAD21AoCgqZ0BUpXjVY6tD1jSLtVSdWpG+LZyZimq4Uu3TymTAA@mail.gmail.com |
In response to | RE: Invalid pointer access in logical decoding after error ("Zhijie Hou (Fujitsu)" <houzj.fnst@fujitsu.com>) |
List | pgsql-hackers |
On Thu, Jul 3, 2025 at 7:55 AM vignesh C <vignesh21@gmail.com> wrote:
>
> On Wed, 2 Jul 2025 at 13:21, Zhijie Hou (Fujitsu)
> <houzj.fnst@fujitsu.com> wrote:
> >
> > On Wed, Jul 2, 2025 at 2:42 PM vignesh C wrote:
> > >
> > >
> > > Hi,
> > >
> > > I encountered an invalid pointer access issue. Below are the steps to
> > > reproduce the issue:
> > ...
> > > The error occurs because entry->columns is allocated in the entry
> > > private context (entry->entry_cxt) by pub_collist_to_bitmapset(). This
> > > context is a child of the PortalContext, which is cleared after an
> > > error via: AbortTransaction
> > > -> AtAbort_Portals ->
> > > MemoryContextDeleteChildren -> MemoryContextDelete ->
> > > MemoryContextDeleteOnly
> > > As a result, the memory backing entry->columns is freed, but the
> > > RelationSyncCache which resides in CacheMemoryContext and thus
> > > survives the error still holds a dangling pointer to this freed
> > > memory, causing it to pfree an invalid pointer.
> > > In the normal (positive) execution flow, pgoutput_shutdown() is called
> > > to clean up the RelationSyncCache. This happens via:
> > > FreeDecodingContext -> shutdown_cb_wrapper -> pgoutput_shutdown But
> > > this is not called in case of an error case. To handle this case
> > > safely, I suggest calling FreeDecodingContext in the PG_CATCH block to
> > > ensure pgoutput_shutdown is invoked and the stale cache is cleared appropriately.
> > > Attached patch has the changes for the same.
> > > Thoughts?
> >
> > Thank you for reporting the issue and providing a fix.
> >
> > I recall that we identified this general issue with the hash table in pgoutput
> > in other threads as well [1]. The basic consensus [2] is that calling
> > FreeDecodingContext() within PG_CATCH is not ideal, as this function includes
> > user code, increasing the risk of encountering another error within PG_CATCH.
> > This scenario could prevent execution of subsequent code to invalidate syscache
> > entries, which is problematic.
>
> Yes, let's avoid this.
>
> > I think a better fix could be to introduce a memory context reset callback(on
> > data->cachectx) and perform the actions of pgoutput_shutdown() within it.
>
> The attached v2 version patch has the changes for the same.

We've addressed several memory-related issues in pgoutput. While most
of these issues didn't affect logical replication, they did impact
logical decoding called via SQL API. I find that these problems stem
from RelationSyncCache being defined as a file-scope static variable
and being allocated in CacheMemoryContext. I'm wondering if we could
move it to PGOutputData and create it under the logical decoding
context. This would ensure it's automatically cleaned up along with
the logical decoding context.

I also noticed another concerning issue: the entry->streamed_txns list
is maintained in CacheMemoryContext (see
set_schema_sent_in_streamed_txn()). This could lead to memory leaks
when logical decoding called via the SQL API encounters an error. This
issue isn't addressed in the current patch.

Regards,

--
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com
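
The lifetime mismatch discussed in this thread can be sketched with a small toy model in plain C. This is only an illustration under stated assumptions: ToyContext, ToyCache, toy_alloc(), toy_context_delete(), toy_cache_shutdown() and simulate_error() are hypothetical names invented here. None of them are PostgreSQL APIs, and this is not the code from the patch. The model has a long-lived, file-scope cache that points into a shorter-lived context, plus an optional reset callback that discards the cache when that context is torn down.

```c
#include <assert.h>
#include <stdlib.h>

#define TOY_MAX_CHUNKS 8

/* Hypothetical toy stand-in for a memory context; NOT PostgreSQL's MemoryContext. */
typedef void (*toy_reset_cb) (void *arg);

typedef struct ToyContext
{
    void        *chunks[TOY_MAX_CHUNKS];    /* allocations owned by this context */
    int          nchunks;
    toy_reset_cb callback;                  /* runs just before the chunks are freed */
    void        *callback_arg;
} ToyContext;

/* Hypothetical stand-in for a file-scope cache that outlives the context. */
typedef struct ToyCache
{
    int *columns;       /* points into a ToyContext, so it can dangle */
} ToyCache;

static ToyCache *toy_cache = NULL;

/* Allocate a chunk that is owned (and later freed) by the context. */
static void *
toy_alloc(ToyContext *cxt, size_t size)
{
    void *p;

    assert(cxt->nchunks < TOY_MAX_CHUNKS);
    p = malloc(size);
    if (p != NULL)
        cxt->chunks[cxt->nchunks++] = p;
    return p;
}

/* Tear the context down: run the callback first, then free every chunk. */
static void
toy_context_delete(ToyContext *cxt)
{
    if (cxt->callback != NULL)
        cxt->callback(cxt->callback_arg);
    for (int i = 0; i < cxt->nchunks; i++)
        free(cxt->chunks[i]);
    cxt->nchunks = 0;
}

/* Discard the whole cache so that no pointer into the context survives. */
static void
toy_cache_shutdown(void *arg)
{
    (void) arg;
    free(toy_cache);
    toy_cache = NULL;
}

/*
 * Build the cache, then simulate an error path that only deletes the context.
 * Returns 1 if a stale cache survives, 0 if it was cleaned up, -1 on OOM.
 */
static int
simulate_error(int register_callback)
{
    ToyContext cxt = {0};

    toy_cache_shutdown(NULL);               /* drop anything left by an earlier run */
    toy_cache = malloc(sizeof(ToyCache));   /* outlives cxt */
    if (toy_cache == NULL)
        return -1;
    toy_cache->columns = toy_alloc(&cxt, 4 * sizeof(int));

    if (register_callback)
        cxt.callback = toy_cache_shutdown;

    toy_context_delete(&cxt);               /* frees the memory behind columns */
    return toy_cache != NULL;
}
```

In this sketch, simulate_error(0) leaves toy_cache set while toy_cache->columns points at freed memory, which is the stale-entry situation described above. simulate_error(1) leaves toy_cache as NULL because the callback runs as part of the context teardown. Allocating the cache itself inside the context, along the lines of the PGOutputData idea above, would make the separate callback unnecessary in this toy model.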