On Thu, Jan 6, 2022 at 3:42 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:
>
> Commit 6ce16088b caused me to look at pgoutput.c's handling of
> cache invalidations, and I was pretty appalled by what I found.
>
> * rel_sync_cache_relation_cb does the wrong thing when called for
> a cache flush (i.e., relid == 0). Instead of invalidating all
> RelationSyncCache entries as it should, it will do nothing.
>
> * When rel_sync_cache_relation_cb does invalidate an entry,
> it immediately zaps the entry->map structure, even though that
> might still be in use (as per the adjacent comment that carefully
> explains why this isn't safe). I'm not sure if this could lead
> to a dangling-pointer core dump, but it sure seems like it could
> lead to failing to translate tuples that are about to be sent.
>
> * Similarly, rel_sync_cache_publication_cb is way too eager to
> reset the pubactions flags, which would likely lead to failing
> to transmit changes that we should transmit.
>
Are you planning to proceed with this patch? AFAICS, this is good to
go. Yesterday, while debugging/analyzing one cfbot failure[1] with one
of my colleagues for row filter patch [2], we have seen the problem
due to the exact reason (second reason) you outlined here. After using
your patch and adapting the row filter patch atop it we see problem
got fixed.
[1] - https://cirrus-ci.com/task/5450648090050560?logs=test_world#L3975
[2] - https://commitfest.postgresql.org/36/2906/
--
With Regards,
Amit Kapila.