RE: memory leak in logical WAL sender with pgoutput's cachectx - Mailing list pgsql-hackers
From: Zhijie Hou (Fujitsu)
Subject: RE: memory leak in logical WAL sender with pgoutput's cachectx
Date:
Msg-id: OS0PR01MB5716C4A6AEC69BE733820AA49434A@OS0PR01MB5716.jpnprd01.prod.outlook.com
In response to: Re: memory leak in logical WAL sender with pgoutput's cachectx (Xuneng Zhou <xunengzhou@gmail.com>)
List: pgsql-hackers
On Friday, August 15, 2025 10:59 AM Xuneng Zhou <xunengzhou@gmail.com> wrote:
> Thanks for your clarification!
>
> On Fri, Aug 15, 2025 at 10:10 AM Zhijie Hou (Fujitsu) <houzj.fnst@fujitsu.com> wrote:
> > On Thursday, August 14, 2025 8:49 PM Hayato Kuroda (Fujitsu) <kuroda.hayato@fujitsu.com> wrote:
> > > Dear Xuneng,
> > >
> > > > Is it safe to free the substructure from within rel_sync_cache_relation_cb()?
> > >
> > > You are referring to the comment in rel_sync_cache_relation_cb(), right?
> > > I understood it to mean that we must not access any *system caches*.
> > > Here we do not rebuild caches, so we do not access the syscaches; that
> > > is permitted.
> >
> > We should also avoid freeing the cache within the callback function, as
> > entries might still be accessed after the callback is invoked. This
> > callback can be triggered during the execution of pgoutput_change, which
> > poses a risk of concurrent access issues. For reference, see commit
> > 7f481b8, which addresses a similar issue. Ideally, cache release should
> > occur within the normal code execution path, such as within
> > pgoutput_change() or get_rel_sync_entry().
>
> I am still getting familiar with this module, so I cannot confirm this
> soundly yet. I'll take a look at the commit Zhijie mentioned and trace the
> callback paths.
>
> > > > I'm also interested in the reasoning behind setting
> > > > NINVALIDATION_THRESHOLD to 100.
> > >
> > > This is the debatable point of this implementation. I set it to 100
> > > because it was sufficient for the provided workload, but it may cause
> > > replication lag. We may have to devise a benchmark workload, measure the
> > > results, and choose an appropriate value. 100 is just a starting point.
> >
> > I think it's worthwhile to test the performance impact of this approach.
> > If the table is altered but not dropped, removing the entries could
> > introduce additional overhead. Although this overhead might be negligible
> > if within an acceptable threshold, it still warrants consideration.
> > Additionally, if the number of invalidations is significant, the cleanup
> > process may take a considerable amount of time, potentially resulting in
> > performance degradation. In such cases, it might be beneficial to destroy
> > the hash table and create a new one. Therefore, analyzing these scenarios
> > is essential.
>
> Intuitively, a heuristic threshold is straightforward but can be hard to
> get "right", since workloads vary. Designing and writing tests that reflect
> real-world use cases well may not be easy either. I'm wondering whether
> it's beneficial to brainstorm other potential alternatives before settling
> on the current approach and starting to benchmark and tune.
>
> Sequential cleaning could be costly in certain cases. I don't know much
> about RelationSyncCache's life cycle, but it seems we should be cautious
> about shortening it, to avoid introducing subtle issues.

I don't have a great idea that avoids introducing overhead in all cases. An
alternative approach I considered earlier is deleting the hash entry upon
invalidation if the entry is not in use. This would require adding an in_use
flag to RelationSyncEntry, setting it to true when get_rel_sync_entry() is
called, and marking it false once the entry is no longer in use. This
approach is inspired by relcache invalidation.
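To make that idea concrete, here is a rough sketch of how the invalidation
callback could behave. This is illustrative only, not the attached POC: the
in_use field and the free_rel_sync_entry() helper are assumptions I am using
for the example, and the real callback must also handle relid == InvalidOid
by scanning all entries, which is omitted here.

    /* Sketch: hypothetical in_use flag added to pgoutput's cache entry */
    typedef struct RelationSyncEntry
    {
        Oid         relid;           /* hash key */
        bool        in_use;          /* set in get_rel_sync_entry(), cleared
                                      * once the caller is done with it */
        bool        replicate_valid; /* false forces a rebuild on next use */
        /* ... remaining cached state unchanged ... */
    } RelationSyncEntry;

    static void
    rel_sync_cache_relation_cb(Datum arg, Oid relid)
    {
        RelationSyncEntry *entry;

        if (RelationSyncCache == NULL)
            return;

        entry = (RelationSyncEntry *) hash_search(RelationSyncCache,
                                                  &relid, HASH_FIND, NULL);
        if (entry == NULL)
            return;

        if (entry->in_use)
        {
            /*
             * The entry may still be referenced, e.g. from within
             * pgoutput_change(), so only mark it stale; the next
             * get_rel_sync_entry() call rebuilds it.
             */
            entry->replicate_valid = false;
        }
        else
        {
            /*
             * Nobody holds the entry, so release its substructure and
             * remove it, keeping the hash table from growing unboundedly.
             */
            free_rel_sync_entry(entry);   /* hypothetical helper */
            (void) hash_search(RelationSyncCache, &relid,
                               HASH_REMOVE, NULL);
        }
    }

Under this scheme, get_rel_sync_entry() would set entry->in_use = true before
returning, and the output plugin callbacks would reset it to false once they
are done with the entry, similar in spirit to how a relcache entry is only
freed after its reference count drops to zero.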
I am not entirely sure it's worth the effort, but I am sharing the idea with
a POC patch for reference, which has been modified based on Kuroda-San's
version.

Best Regards,
Hou zj