Thread: Selectively invalidate caches in pgoutput module

Selectively invalidate caches in pgoutput module

From
"Hayato Kuroda (Fujitsu)"
Date:
Dear hackers,

Hi, this is a fork thread from [1]. I want to propose a small optimization for
logical replication system.

Background
==========

When the ALTER PUBLICATION command is executed, all entries in RelationSyncCache
will be discarded anyway. This mechanism works well but is sometimes not efficient.
For example, when the ALTER PUBLICATION DROP TABLE is executed,
1) the specific entry in RelationSyncCache will be removed, and then
2) all entries will be discarded twice.

This happens because the pgoutput plugin registers both RelcacheCallback
(rel_sync_cache_relation_cb) and SyscacheCallback (publication_invalidation_cb,
rel_sync_cache_publication_cb). Then, when ALTER PUBLICATION ADD/SET/DROP is executed,
both the relation cache of added tables and the syscache of pg_publication_rel and
pg_publication are invalidated.
The callback for the relation cache will remove an entry from the hash table, and
syscache callbacks will look up all entries and invalidate them. However, AFAICS
does not need to invalidate all of them.

I grepped source codes and found this happens since the initial version.

Currently the effect of the behavior may not be large, but [1] may affect
significantly because it propagates invalidation messages to all in-progress
decoding transactions.

Patch overview
============

Based on the background, the patch avoids dropping all entries in RelationSyncCache
when ALTER PUBLICATION is executed. It removes sys cache callbacks for pg_publication_rel
and pg_publication_namespace and avoids discarding entries in sys cache for pg_publication.

Apart from the above, this patch also ensures that relcaches of publishing tables
are invalidated when ALTER PUBLICATION is executed. ADD/SET/DROP already has this
mechanism, but ALTER PUBLICATION OWNER TO and RENAME TO do not.
Regarding RENAME TO, now we are using a common function, but it is replaced with
RenamePublication() to do invalidations.

How do you think?

[1]: https://www.postgresql.org/message-id/de52b282-1166-1180-45a2-8d8917ca74c6@enterprisedb.com

Best regards,
Hayato Kuroda
FUJITSU LIMITED


Attachment

Re: Selectively invalidate caches in pgoutput module

From
Amit Kapila
Date:
On Mon, Mar 3, 2025 at 1:27 PM Hayato Kuroda (Fujitsu)
<kuroda.hayato@fujitsu.com> wrote:
>
> Hi, this is a fork thread from [1]. I want to propose a small optimization for
> logical replication system.
>
> Background
> ==========
>
> When the ALTER PUBLICATION command is executed, all entries in RelationSyncCache
> will be discarded anyway. This mechanism works well but is sometimes not efficient.
> For example, when the ALTER PUBLICATION DROP TABLE is executed,
> 1) the specific entry in RelationSyncCache will be removed, and then
> 2) all entries will be discarded twice.
>

In (2) Why twice? Is it because we call
rel_sync_cache_publication_cb() first for PUBLICATIONRELMAP and then
for PUBLICATIONOID via publication_invalidation_cb()?

> This happens because the pgoutput plugin registers both RelcacheCallback
> (rel_sync_cache_relation_cb) and SyscacheCallback (publication_invalidation_cb,
> rel_sync_cache_publication_cb). Then, when ALTER PUBLICATION ADD/SET/DROP is executed,
> both the relation cache of added tables and the syscache of pg_publication_rel and
> pg_publication are invalidated.
> The callback for the relation cache will remove an entry from the hash table, and
> syscache callbacks will look up all entries and invalidate them. However, AFAICS
> does not need to invalidate all of them.
>

IIUC, you want to say in the above case only (1) would be sufficient, right?

> I grepped source codes and found this happens since the initial version.
>
> Currently the effect of the behavior may not be large,
>

Is it possible to see the impact? I guess if there are a large number
of relations (or partitions), then all will get invalidated even if
one relation is added/dropped from the publication; if so, this should
impact the performance.

>
 but [1] may affect
> significantly because it propagates invalidation messages to all in-progress
> decoding transactions.
>
> Patch overview
> ============
>
> Based on the background, the patch avoids dropping all entries in RelationSyncCache
> when ALTER PUBLICATION is executed. It removes sys cache callbacks for pg_publication_rel
> and pg_publication_namespace and avoids discarding entries in sys cache for pg_publication.
>
> Apart from the above, this patch also ensures that relcaches of publishing tables
> are invalidated when ALTER PUBLICATION is executed. ADD/SET/DROP already has this
> mechanism, but ALTER PUBLICATION OWNER TO and RENAME TO do not.
> Regarding RENAME TO, now we are using a common function, but it is replaced with
> RenamePublication() to do invalidations.
>

For Rename/Owner, can we use a new invalidation instead of using
relcache invalidation, as that can impact other sessions for no
benefit? IIUC, changing the other properties of publication like
dropping the table requires us to invalidate even the corresponding
relcache entry because it contains the publication descriptor
(rd_pubdesc). See RelationBuildPublicationDesc().

Also, it is better to consider doing rename/owner_change related
changes in a separate patch as that will make the first patch easier
to review and commit.

--
With Regards,
Amit Kapila.



RE: Selectively invalidate caches in pgoutput module

From
"Hayato Kuroda (Fujitsu)"
Date:
Dear Amit,

> > When the ALTER PUBLICATION command is executed, all entries in
> RelationSyncCache
> > will be discarded anyway. This mechanism works well but is sometimes not
> efficient.
> > For example, when the ALTER PUBLICATION DROP TABLE is executed,
> > 1) the specific entry in RelationSyncCache will be removed, and then
> > 2) all entries will be discarded twice.
> >
> 
> In (2) Why twice? Is it because we call
> rel_sync_cache_publication_cb() first for PUBLICATIONRELMAP and then
> for PUBLICATIONOID via publication_invalidation_cb()?

This part was not correct. I considered that ALTER PUBLICATION DROP TABLE also
invalidated the pg_publication syscache, but it would not do. So...the command
firstly invalidate a specific entry, and then invalidate all entries once.

> > This happens because the pgoutput plugin registers both RelcacheCallback
> > (rel_sync_cache_relation_cb) and SyscacheCallback
> (publication_invalidation_cb,
> > rel_sync_cache_publication_cb). Then, when ALTER PUBLICATION
> ADD/SET/DROP is executed,
> > both the relation cache of added tables and the syscache of pg_publication_rel
> and
> > pg_publication are invalidated.
> > The callback for the relation cache will remove an entry from the hash table, and
> > syscache callbacks will look up all entries and invalidate them. However,
> AFAICS
> > does not need to invalidate all of them.
> >
> 
> IIUC, you want to say in the above case only (1) would be sufficient, right?

Right, this is a main aim of this proposal.

> Is it possible to see the impact? I guess if there are a large number
> of relations (or partitions), then all will get invalidated even if
> one relation is added/dropped from the publication; if so, this should
> impact the performance.

Agreed. I'm planning to do and share results. Please wait...

> For Rename/Owner, can we use a new invalidation instead of using
> relcache invalidation, as that can impact other sessions for no
> benefit? IIUC, changing the other properties of publication like
> dropping the table requires us to invalidate even the corresponding
> relcache entry because it contains the publication descriptor
> (rd_pubdesc). See RelationBuildPublicationDesc().

Attached patchset implemented with the approach.
0001 contains only part to avoid whole cache invalidation, for ADD/SET/DROP command.
0002 introduces new type of invalidation message which only invalidates caches on
the decoding plugin. Better name is very welcome.
0003 uses the mechanism to avoid discarding relcache for RENAME/OWNER case.

Best regards,
Hayato Kuroda
FUJITSU LIMITED


Attachment

RE: Selectively invalidate caches in pgoutput module

From
"Zhijie Hou (Fujitsu)"
Date:
On Tuesday, March 4, 2025 7:44 PM Kuroda, Hayato/黒田 隼人 <kuroda.hayato@fujitsu.com> wrote:

Hi,

> 
> > > When the ALTER PUBLICATION command is executed, all entries in
> > RelationSyncCache
> > > will be discarded anyway. This mechanism works well but is sometimes
> > > not
> > efficient.
> > > For example, when the ALTER PUBLICATION DROP TABLE is executed,
> > > 1) the specific entry in RelationSyncCache will be removed, and then
> > > 2) all entries will be discarded twice.
> > >
> >
> > In (2) Why twice? Is it because we call
> > rel_sync_cache_publication_cb() first for PUBLICATIONRELMAP and then
> > for PUBLICATIONOID via publication_invalidation_cb()?
> 
> This part was not correct. I considered that ALTER PUBLICATION DROP TABLE
> also invalidated the pg_publication syscache, but it would not do. So...the
> command firstly invalidate a specific entry, and then invalidate all entries once.
> 
> > > This happens because the pgoutput plugin registers both
> > > RelcacheCallback
> > > (rel_sync_cache_relation_cb) and SyscacheCallback
> > (publication_invalidation_cb,
> > > rel_sync_cache_publication_cb). Then, when ALTER PUBLICATION
> > ADD/SET/DROP is executed,
> > > both the relation cache of added tables and the syscache of
> > > pg_publication_rel
> > and
> > > pg_publication are invalidated.
> > > The callback for the relation cache will remove an entry from the
> > > hash table, and syscache callbacks will look up all entries and
> > > invalidate them. However,
> > AFAICS
> > > does not need to invalidate all of them.
> > >
> >
> > IIUC, you want to say in the above case only (1) would be sufficient, right?
> 
> Right, this is a main aim of this proposal.
> 
> > Is it possible to see the impact? I guess if there are a large number
> > of relations (or partitions), then all will get invalidated even if
> > one relation is added/dropped from the publication; if so, this should
> > impact the performance.
> 
> Agreed. I'm planning to do and share results. Please wait...
> 
> > For Rename/Owner, can we use a new invalidation instead of using
> > relcache invalidation, as that can impact other sessions for no
> > benefit? IIUC, changing the other properties of publication like
> > dropping the table requires us to invalidate even the corresponding
> > relcache entry because it contains the publication descriptor
> > (rd_pubdesc). See RelationBuildPublicationDesc().
> 
> Attached patchset implemented with the approach.
> 0001 contains only part to avoid whole cache invalidation, for ADD/SET/DROP
> command.

I think the changes proposed in 0001 are reasonable. Basically, any
modifications to pg_publication_rel or pg_publication_namespace should trigger
relcache invalidation messages. This is necessary for rebuilding the cache
stored in RelationData::rd_pubdesc, given that these changes could impact
whether a table is published. Following this logic, it should be safe and
appropriate to depend on individual relcache invalidation messages to
invalidate the relsynccache in pgoutput, rather than invalidating the entire
relsynccache for a single catalog change.

Additionally, I've verified the scenario involving partitioned tables to ensure
the invalidation behaves correctly. When a parent table is added to a
publication (with pubviaroot=true), it should influence the publication status
of its child tables as well. And this case works as expected, since
invalidation messages are dispatched for all items in the partition tree
(GetPubPartitionOptionRelations()).

One nitpick for 0001: the comments atop of the rel_sync_cache_publication_cb()
need updating, as they still reference pg_publication_rel and
pg_publication_namespace. Aside from this, 0001 looks good to me.

Best Regards,
Hou zj

RE: Selectively invalidate caches in pgoutput module

From
"Hayato Kuroda (Fujitsu)"
Date:
Dear Hou,

Thanks for reading the thread!

> > Attached patchset implemented with the approach.
> > 0001 contains only part to avoid whole cache invalidation, for ADD/SET/DROP
> > command.
> 
> I think the changes proposed in 0001 are reasonable. Basically, any
> modifications to pg_publication_rel or pg_publication_namespace should trigger
> relcache invalidation messages. This is necessary for rebuilding the cache
> stored in RelationData::rd_pubdesc, given that these changes could impact
> whether a table is published. Following this logic, it should be safe and
> appropriate to depend on individual relcache invalidation messages to
> invalidate the relsynccache in pgoutput, rather than invalidating the entire
> relsynccache for a single catalog change.

Right. I should have clarified these points on my previous post.

> One nitpick for 0001: the comments atop of the rel_sync_cache_publication_cb()
> need updating, as they still reference pg_publication_rel and
> pg_publication_namespace. Aside from this, 0001 looks good to me.

Thanks for pointing out, fixed. PSA new version.

Additionally, I found a feasibility that discarding whole-cache can be avoided
more. Currently, all caches are unconditionally removed when pg_namesace is updated.
Invalidation is needed to update the qualified name of tables in the replication
messages after the schema renames, but IIUC it is OK to do for relations that
belong to the renamed schema.

IIUC there are two approaches to implement it.
The first idea is to introduce a new type of invalidation message. This is similar
to what 0002 does. The invalidation message can contain relids that belong to
the renaming schema, and pgoutput can register callback for it.
The second idea is to use syscache callback. IIUC, the callback is passed the
hash of oid. Thus, RelSyncCache can contain hash values of its schema, and they
can be compared with a callback function.

I noted the idea on the code as XXX code comment in 0001.

Best regards,
Hayato Kuroda
FUJITSU LIMITED


Attachment

RE: Selectively invalidate caches in pgoutput module

From
"Hayato Kuroda (Fujitsu)"
Date:
Dear hackers,

I did a performance testing with HEAD and v2-0001, and confirmed that it could
improve performance around 20% in the typical case.

Workload
======
I emulated a scenario that there are many tables to be published and only one
table is stop and resume publishing. In HEAD, ALTER PUBLICATION DROP/ADD command
invalidates all entries in the RelSyncCache, but v2 invalidates only a needed
one - this means the decoding time would be reduced after patching.

1. Initialized an instance
2. Created a root table and 5000 leaf tables.
3. Created another table which did not have parent-child relationship.
4. Created a publication which included a root table and another table
5. Created a replication slot with pgoutput plugin.
6. Executed a transaction which would be decoded. In the transaction:
    a. Inserted tuples to all the leaf tables
    b. Altered publication to drop another table
    c. Altered publication again to add the dropped one
    d. Inserted tuples to all the leaf tables again.
7. measured decoding time via SQL interfaces, five times.

Attached script automated above.

Results
=====
Each cell is a median value of 5 runs. Compared with HEAD, V2-0001 can reduce
the decoding time, relatively 20%.

head [ms]    v2-0001 [ms]
252        198

I'm planning to do further tests to prove the benefit for 0002, 0003 patches.

Best regards,
Hayato Kuroda
FUJITSU LIMITED


Attachment

Re: Selectively invalidate caches in pgoutput module

From
Amit Kapila
Date:
On Wed, Mar 5, 2025 at 2:35 PM Hayato Kuroda (Fujitsu)
<kuroda.hayato@fujitsu.com> wrote:
>
> I did a performance testing with HEAD and v2-0001, and confirmed that it could
> improve performance around 20% in the typical case.
>
> Workload
> ======
> I emulated a scenario that there are many tables to be published and only one
> table is stop and resume publishing. In HEAD, ALTER PUBLICATION DROP/ADD command
> invalidates all entries in the RelSyncCache, but v2 invalidates only a needed
> one - this means the decoding time would be reduced after patching.
>
> 1. Initialized an instance
> 2. Created a root table and 5000 leaf tables.
> 3. Created another table which did not have parent-child relationship.
> 4. Created a publication which included a root table and another table
> 5. Created a replication slot with pgoutput plugin.
> 6. Executed a transaction which would be decoded. In the transaction:
>         a. Inserted tuples to all the leaf tables
>         b. Altered publication to drop another table
>         c. Altered publication again to add the dropped one
>         d. Inserted tuples to all the leaf tables again.
> 7. measured decoding time via SQL interfaces, five times.
>
> Attached script automated above.
>
> Results
> =====
> Each cell is a median value of 5 runs. Compared with HEAD, V2-0001 can reduce
> the decoding time, relatively 20%.
>

Yeah, this is a good improvement and the patch looks safe to me, so I
pushed with minor changes in the comments.

..
>
> I'm planning to do further tests to prove the benefit for 0002, 0003 patches.
>

Ideally, we need the time of only step-6d with and without the patch.
Step 6-a is required to build the cache, but in actual workloads via
walsender, we won't need that, and also, at the end of SQL API
execution, we will forcibly invalidate all caches, so that is also not
required. See, if in the performance measurement of the next set of
patches, you can avoid that.

--
With Regards,
Amit Kapila.



RE: Selectively invalidate caches in pgoutput module

From
"Hayato Kuroda (Fujitsu)"
Date:
Dear Amit, hackers,

> Yeah, this is a good improvement and the patch looks safe to me, so I
> pushed with minor changes in the comments.

Thanks! PSA rebased patch. While considering more about 0002, I found a
conceptual bug - when relsync cache could not be re-built when SQL interfaces are
used. In old vesrion inval messages were not recorded on the WAL, so relsync
cache could not be discarded if decoding happed later.
In new version, the invalidation is handled as transactional and serialized to
the WAL. More detail, messages are stored in InvalMessageArrays and consumed at
the end of transactions. For simplfication, the subgroup of the message is set
as "relcache" when serializing. But IIUC the handling is OK - snapshot
invalidation does the same approach.

I did a performance testing with given patch. ()
Mostly same as previous test but used two publications.

1. Initialize an instance
2. Create a root table and leaf tables. The variable $NUM_PART controls
  how many partitions exist on the instance.
3. Create another table
4. Create publications (p1, p2). One includes a root table, and another one includes
  independent table.
5. Create a replication slot with pgoutput plugint.
6. Execute a transaction which would be decoded. In the transaction:
              a. Insert tuples to all the leaf tables
              b. Rename publication p2 to p2_renamed
              c. Rename publication p2_renamed to p2
              d. Insert tuples to all the leaf tables again.
7. decode all the changes via SQL interfaces.

My expectation is for HEAD, leaves are cached at a), but they would be
discarded at b) and c), then they are cached again at d).
For patched case, it only an independent table would be discarded at b) and c) so the
decoding time should be reduced.

Below results are the median of five runs. ITSM around 15% improvements.

head [ms]    patched (0001-0003)
239        200

> Ideally, we need the time of only step-6d with and without the patch.
> Step 6-a is required to build the cache, but in actual workloads via
> walsender, we won't need that, and also, at the end of SQL API
> execution, we will forcibly invalidate all caches, so that is also not
> required. See, if in the performance measurement of the next set of
> patches, you can avoid that.

Hmm. It requires additional debug messages. Let me consider.

Best regards,
Hayato Kuroda
FUJITSU LIMITED 


Attachment

Re: Selectively invalidate caches in pgoutput module

From
Amit Kapila
Date:
On Thu, Mar 6, 2025 at 5:16 PM Hayato Kuroda (Fujitsu)
<kuroda.hayato@fujitsu.com> wrote:
>
> Dear Amit, hackers,
>
> > Yeah, this is a good improvement and the patch looks safe to me, so I
> > pushed with minor changes in the comments.
>
> Thanks! PSA rebased patch.
>

Few comments:
1. Why do we need to invalidate relsync entries when owner of its
publication changes?

I think the owner change will impact the future Alter Publication ...
Add/Drop/Set/Rename operations as that will be allowed only to new
owner (or super users), otherwise, there shouldn't be an impact on
RelSyncCache entries. Am, I missing something?

2.
+ if (pubform->puballtables)
+ {
+ CacheInvalidateRelSyncAll();
+ }
+ else
+ {
+ List    *relids = NIL;
+ List    *schemarelids = NIL;
+
+ /*
+ * For partition table, when we insert data, get_rel_sync_entry is
+ * called and a hash entry is created for the corresponding leaf table.
+ * So invalidating the leaf nodes would be sufficient here.
+ */
+ relids = GetPublicationRelations(pubform->oid,
+ PUBLICATION_PART_LEAF);
+ schemarelids = GetAllSchemaPublicationRelations(pubform->oid,
+ PUBLICATION_PART_LEAF);
+
+ relids = list_concat_unique_oid(relids, schemarelids);
+
+ InvalidateRelSyncCaches(relids);
+ }
+
+ CatalogTupleUpdate(rel, &tup->t_self, tup);

Shouldn't we need to update the CatalogTuple before invalidations.

3.
+ if (pubform->puballtables)
+ {
+ CacheInvalidateRelSyncAll();
+ }
+ else
+ {
+ List    *relids = NIL;
+ List    *schemarelids = NIL;
+
+ /*
+ * For partition table, when we insert data, get_rel_sync_entry is
+ * called and a hash entry is created for the corresponding leaf table.
+ * So invalidating the leaf nodes would be sufficient here.
+ */
+ relids = GetPublicationRelations(pubform->oid,
+ PUBLICATION_PART_LEAF);
+ schemarelids = GetAllSchemaPublicationRelations(pubform->oid,
+ PUBLICATION_PART_LEAF);
+
+ relids = list_concat_unique_oid(relids, schemarelids);
+
+ InvalidateRelSyncCaches(relids);
+ }

This is a duplicate code. Can we write a function to eliminate this duplicacy?

--
With Regards,
Amit Kapila.



RE: Selectively invalidate caches in pgoutput module

From
"Hayato Kuroda (Fujitsu)"
Date:
Dear hackers,

> Hmm. It requires additional debug messages. Let me consider.

I did further performance testing to see the re-building time, and confirmed that
the decoding time of second INSERT part became around 1/10. In this test number
of partitions was also varied. Each cell is a median of 5 runs.

number of partitions    HEAD [micro second]    v4 patched [micro second]
--------------------------------------------------------------------------------
1000            6030            635
5000            33456            3555
15000            106688            14049

Workload
======
Workload is mostly the same, but I ensured that step6 is done within a same transaction.
I also attached just in case.

Used source
========
Attached diff file was applied atop HEAD (d611f8) and v4 patchset. Patched source outputs
current timestamp (in micro seconds) when 1) INSERT is decoded, 2) invalidation
happens or 3) WAL processing is finished for the transaction.
Also, it has a ratchet mechanism not to log every time: continuous INSERT or
INVALIDATION is ignored. Typically, the output is like below:

```
postgres=# SELECT count(*) FROM pg_logical_slot_peek_binary_changes('test', NULL, NULL, 'proto_version', '4',
'publication_names','pub1');
 
NOTICE:  XXX start inserting. Time 794645780406191
NOTICE:  XXX start invalidating. Time 794645780407423
NOTICE:  XXX start inserting. Time 794645780407448       // (a)
NOTICE:  XXX finish processing. Time 794645780407488    // (b)
count 
-------
     7
(1 row)
```

How I compared
===========
We can obtain the elapsed time for second INSERT part, by (b) - (a).
I took the calculated data for each runs and noted a median in above table.

Best regards,
Hayato Kuroda
FUJITSU LIMITED


Attachment

RE: Selectively invalidate caches in pgoutput module

From
"Hayato Kuroda (Fujitsu)"
Date:
Dear Amit,

Thanks for the comment! PSA new version.

> 
> Few comments:
> 1. Why do we need to invalidate relsync entries when owner of its
> publication changes?
> 
> I think the owner change will impact the future Alter Publication ...
> Add/Drop/Set/Rename operations as that will be allowed only to new
> owner (or super users), otherwise, there shouldn't be an impact on
> RelSyncCache entries. Am, I missing something?

Actually, I did not find reasons to invalidate even OWNER command for now. I
included it to 1) follow current rule, and to 2) prepare for future update.
 
1) - Currently RelSyncCache entries are discarded even by ALTER PUBLICATION OWNER
     statements. I wanted to preserve it.
2) - In future versions, privilege might be introduced for the publications,
     some publications would not be visible for other db users. In this case
     I feel we should modify RelationSyncEntry::pubactions,  columns and
     exprstate thus entries must be discarded.

Now I removed invalidations for OWNER command. Let's revert the change if we miss
something.

> 2.
> + if (pubform->puballtables)
> + {
> + CacheInvalidateRelSyncAll();
> + }
> + else
> + {
> + List    *relids = NIL;
> + List    *schemarelids = NIL;
> +
> + /*
> + * For partition table, when we insert data, get_rel_sync_entry is
> + * called and a hash entry is created for the corresponding leaf table.
> + * So invalidating the leaf nodes would be sufficient here.
> + */
> + relids = GetPublicationRelations(pubform->oid,
> + PUBLICATION_PART_LEAF);
> + schemarelids = GetAllSchemaPublicationRelations(pubform->oid,
> + PUBLICATION_PART_LEAF);
> +
> + relids = list_concat_unique_oid(relids, schemarelids);
> +
> + InvalidateRelSyncCaches(relids);
> + }
> +
> + CatalogTupleUpdate(rel, &tup->t_self, tup);
> 
> Shouldn't we need to update the CatalogTuple before invalidations.

Right, fixed.

> 3.
> + if (pubform->puballtables)
> + {
> + CacheInvalidateRelSyncAll();
> + }
> + else
> + {
> + List    *relids = NIL;
> + List    *schemarelids = NIL;
> +
> + /*
> + * For partition table, when we insert data, get_rel_sync_entry is
> + * called and a hash entry is created for the corresponding leaf table.
> + * So invalidating the leaf nodes would be sufficient here.
> + */
> + relids = GetPublicationRelations(pubform->oid,
> + PUBLICATION_PART_LEAF);
> + schemarelids = GetAllSchemaPublicationRelations(pubform->oid,
> + PUBLICATION_PART_LEAF);
> +
> + relids = list_concat_unique_oid(relids, schemarelids);
> +
> + InvalidateRelSyncCaches(relids);
> + }
> 
> This is a duplicate code. Can we write a function to eliminate this duplicacy?

Since the part has been removed from OWNER command, duplicacy was removed.
I did not introduce a function for this.

Best regards,
Hayato Kuroda
FUJITSU LIMITED


Attachment

Re: Selectively invalidate caches in pgoutput module

From
Amit Kapila
Date:
On Fri, Mar 7, 2025 at 4:28 PM Hayato Kuroda (Fujitsu)
<kuroda.hayato@fujitsu.com> wrote:
>
> Dear Amit,
>
> Thanks for the comment! PSA new version.
>

Don't we need to call this invalidation function from
InvalidateSystemCachesExtended()?

--
With Regards,
Amit Kapila.



RE: Selectively invalidate caches in pgoutput module

From
"Hayato Kuroda (Fujitsu)"
Date:
Dear Amit,

> Don't we need to call this invalidation function from
> InvalidateSystemCachesExtended()?

Hmm. I did not call relsync callback functions in InvalidateSystemCachesExtended()
intentionally, but added now. Please see my analysis below and let me know how
you think.

In the first place, InvalidateSystemCachesExtended() can be called by
1) InvalidateSystemCaches(), and 2) AcceptInvalidationMessages().

Regarding the 1), it could be used when a) the parallel worker is launched,
b) decoder starts decoding, or c) decoder finishes decoding and after decoding context is free'd.
However, parallel worker won't do a decoding, initially the relsync cache is empty
and relsync would be dropped when the decding context is removed.
Based on the fact I did not called the function.

As for the 2), the path exists only when the debug build mode and debug_discard_caches is set.
According to the manual and code comments, it must discard all caches anytime.
So...OK, I decided to follow the rule.

Best regards,
Hayato Kuroda
FUJITSU LIMITED


Attachment

RE: Selectively invalidate caches in pgoutput module

From
"Hayato Kuroda (Fujitsu)"
Date:
I found cfbot got angry due to a variable-shadowing. PSA fixed version.

Best regards,
Hayato Kuroda
FUJITSU LIMITED


Attachment

RE: Selectively invalidate caches in pgoutput module

From
"Hayato Kuroda (Fujitsu)"
Date:
Dear hackers,

I did a self-reviewing and updated a patch. PSA new version. What's new:

1. Fixed a bug which existing name can be specified by ALTER PUBLICATION RENAME.
   A validation is added in RenamePublication().

Just in case, we want to discuss the case that the renaming publication has thousands
of tables. Before the patch, The ALTER PUBLICATION RENAME command just invalidates
a syscache, and registered callback would invalidate all of them.
With the patch, however, all relsync entries are invalidated within the command
so that the execution time can be longer. I feel this can be acceptable because
some codes have such a part; E.g., ATTACH/DETACH partition command invalidates
relaches of the children when the specified relation also has children - even
if it has thousands of leaves. (See d6f1e16). Also note that all other processes
like walsender can be faster by avoiding discarding entire caches.

how do you feel?

Best regards,
Hayato Kuroda
FUJITSU LIMITED


Attachment

RE: Selectively invalidate caches in pgoutput module

From
"Zhijie Hou (Fujitsu)"
Date:
On Monday, March 10, 2025 7:00 PM Kuroda, Hayato <kuroda.hayato@fujitsu.com> wrote:
> 
> I did a self-reviewing and updated a patch. PSA new version. What's new:

Thanks for updating the patch.

I tested the behavior for partitioned table and have a comment on this.

> +        relids = GetPublicationRelations(pubform->oid,
> +                                         PUBLICATION_PART_LEAF);

Currently, only the leaf partition is invalidated when the published table is
partitioned. However, I think pgoutput could cache both the partitioned table
and the leaf partition table as relsync entries.

For INSERT/UPDATE/DELETE on a partitioned table, only the leaf partition's
relsync entry is used in pgoutput, but the TRUNCATE references the parent
table's relsync entry.

For example[1], if the parent table's relsync entry is not invalidated after a
RENAME operation, it results in the TRUNCATE to be missed. So I think we should
Invalidate all the tables in the partition tree by passing
PUBLICATION_PART_ALL, which is also consistent with ALTER PUB ADD/SET/DROP
TABLE.


[1]--- Example ---
create table test(a int primary key) PARTITION BY RANGE (a);
CREATE TABLE test_1 PARTITION OF test
    FOR VALUES FROM (0) TO (2);
CREATE TABLE test_2 PARTITION OF test
    FOR VALUES FROM (2) TO (4);

CREATE PUBLICATION pub;
CREATE PUBLICATION pub2 FOR TABLE test WITH (PUBLISH_VIA_PARTITION_ROOT);

SELECT 'init' FROM pg_create_logical_replication_slot('isolation_slot', 'pgoutput');

TRUNCATE test;
ALTER PUBLICATION pub RENAME TO pub3;
ALTER PUBLICATION pub2 RENAME TO pub;
TRUNCATE test;

-- I can consume some changes using the following function on HEAD, but got
-- nothing after applying the patch.
SELECT * FROM pg_logical_slot_get_binary_changes('isolation_slot', NULL, NULL, 'proto_version', '4',
'publication_names','pub', 'streaming', 'on');
 

---

Best Regards,
Hou zj

RE: Selectively invalidate caches in pgoutput module

From
"Hayato Kuroda (Fujitsu)"
Date:
Dear Hou,

> Currently, only the leaf partition is invalidated when the published table is
> partitioned. However, I think pgoutput could cache both the partitioned table
> and the leaf partition table as relsync entries.
> 
> For INSERT/UPDATE/DELETE on a partitioned table, only the leaf partition's
> relsync entry is used in pgoutput, but the TRUNCATE references the parent
> table's relsync entry.

I think your analysis is correct. PSA new version. Below part contains my analysis.

In ExecuteTruncate(), if the specified relation has children, all of them are
checked via find_all_inheritors() and listed as target. Also, ExecuteTruncateGuts()
serializes both a parent and children in XLOG_HEAP_TRUNCATE WAL record.
Decoding layer passes relations as-is. These facts mean that output plugins can
store caches on the memory.

Best regards,
Hayato Kuroda
FUJITSU LIMITED


Attachment

Re: Selectively invalidate caches in pgoutput module

From
Amit Kapila
Date:
On Mon, Mar 10, 2025 at 6:42 PM Hayato Kuroda (Fujitsu)
<kuroda.hayato@fujitsu.com> wrote:
>
> > Currently, only the leaf partition is invalidated when the published table is
> > partitioned. However, I think pgoutput could cache both the partitioned table
> > and the leaf partition table as relsync entries.
> >
> > For INSERT/UPDATE/DELETE on a partitioned table, only the leaf partition's
> > relsync entry is used in pgoutput, but the TRUNCATE references the parent
> > table's relsync entry.
>
> I think your analysis is correct. PSA new version. Below part contains my analysis.
>

I have made several cosmetic changes atop 0001 patch in the attached.
Additionally, fixed an issue in AddRelsyncInvalidationMessage() to
consider invalidating all the RelSyncCache entries. Kindly include
these in the next version if you find the changes are okay.

--
With Regards,
Amit Kapila.

Attachment

RE: Selectively invalidate caches in pgoutput module

From
"Zhijie Hou (Fujitsu)"
Date:
On Monday, March 10, 2025 9:12 PM Kuroda, Hayato <kuroda.hayato@fujitsu.com> wrote:
> 
> > Currently, only the leaf partition is invalidated when the published
> > table is partitioned. However, I think pgoutput could cache both the
> > partitioned table and the leaf partition table as relsync entries.
> >
> > For INSERT/UPDATE/DELETE on a partitioned table, only the leaf
> > partition's relsync entry is used in pgoutput, but the TRUNCATE
> > references the parent table's relsync entry.
> 
> I think your analysis is correct. PSA new version. Below part contains my
> analysis.
> 
> In ExecuteTruncate(), if the specified relation has children, all of them are
> checked via find_all_inheritors() and listed as target. Also,
> ExecuteTruncateGuts() serializes both a parent and children in
> XLOG_HEAP_TRUNCATE WAL record.
> Decoding layer passes relations as-is. These facts mean that output plugins
> can store caches on the memory.

Thanks for updating the patch.

I have reviewed patch 0001 and did not find issues, aside from a few issues for
code comments that were mentioned in Amit's email. Here are some analyses and
insights gathered during the review of 0001:

The functions and variables in the 0001 patch uses 'Relsync' (e.g.,
RegisterRelsyncInvalidation) instead of the longer 'RelsyncCache'. After
internal discussions, we think it's OK, as using 'RelsyncCache' could
unnecessarily lengthen the names.

Furthermore, considering we're introducing a new invalidation message for the
RelationSyncCache within pgoutput, which is an output plugin, we discussed
whether a more general invalidation name should be adopted in case other output
plugins might use it. However, after reviewing all the plugins listed in
Wiki[1], we did not find any other plugins that reference the built-in
publication catalog. Therefore, this scenario appears to be uncommon.
Additionally, the current naming convention is sufficiently intuitive for
output plugin developers. Hence, we feel it is reasonable to retain the
existing names.


For patch 0002, I think the implementation could be improved. The
current patch introduces a new function, RenamePublication, to replace the
existing generic approach in ExecRenameStmt->AlterObjectRename_internal.
However, this creates inconsistency because the original code uses
AccessExclusiveLock for publication renaming, making concurrent renaming
impossible. The current patch seems to overlook this aspect.

Additionally, introducing a new function results in code duplication, which can
be avoided. After further consideration, handling the publication rename
separately seems unnecessary, given it requires only sending a few extra
invalidation messages. Therefore, I suggest reusing the generic handling and
simply extending AlterObjectRename_internal to include the additional
invalidation messages.

I've attached a diff with the proposed changes for 0002.

Best Regards,
Hou zj

Attachment

RE: Selectively invalidate caches in pgoutput module

From
"Hayato Kuroda (Fujitsu)"
Date:
Dear Amit, Hou,

Thanks for giving comments!

> For patch 0002, I think the implementation could be improved. The
> current patch introduces a new function, RenamePublication, to replace the
> existing generic approach in ExecRenameStmt->AlterObjectRename_internal.
> However, this creates inconsistency because the original code uses
> AccessExclusiveLock for publication renaming, making concurrent renaming
> impossible. The current patch seems to overlook this aspect.

Oh, I missed the point, thanks.

> Additionally, introducing a new function results in code duplication, which can
> be avoided. After further consideration, handling the publication rename
> separately seems unnecessary, given it requires only sending a few extra
> invalidation messages. Therefore, I suggest reusing the generic handling and
> simply extending AlterObjectRename_internal to include the additional
> invalidation messages.
> 
> I've attached a diff with the proposed changes for 0002.

Hmm, possible. previously I introduced new function to preserve existing codes as
much as possible. But this introduced some duplicy. Your approach which extends
the common function looks nice, apart from two points;

1. Relsync cache invalidations should be done after the catalog update, but
   proposed one did before that. I preferred to add new if-statementfor post catalog-update.
2. Also, common check for the name conflict can be reused.


Attached patch address comments by you and Amit [1]. What's new;

* Skip adding new relsync messages when message for InvalidOid has already exist.
  Per comment from Amit [1].
* Update some code comments. Some of them are pointed out by [1].
* Stop using RenamePublication() and extend the common function.

Best regards,
Hayato Kuroda
FUJITSU LIMITED


Attachment

Re: Selectively invalidate caches in pgoutput module

From
Amit Kapila
Date:
On Tue, Mar 11, 2025 at 5:47 PM Hayato Kuroda (Fujitsu)
<kuroda.hayato@fujitsu.com> wrote:
>
> Attached patch address comments by you and Amit [1]. What's new;
>

Thanks, the patch looks mostly good to me. I have made few cosmetic
changes in the attached and combined both the patches. The existing
Alter Publication ... Rename tests don't test invalidations arising
from that command. As this patch changes that code path, it would be
good to add a few tests for the same. We can add one for individual
relations and another for ALL Tables publication.

--
With Regards,
Amit Kapila.

Attachment

RE: Selectively invalidate caches in pgoutput module

From
"Hayato Kuroda (Fujitsu)"
Date:
Dear Amit,

> Thanks, the patch looks mostly good to me. I have made few cosmetic
> changes in the attached and combined both the patches.

Thanks, it looks good to me.

> The existing
> Alter Publication ... Rename tests don't test invalidations arising
> from that command. As this patch changes that code path, it would be
> good to add a few tests for the same. We can add one for individual
> relations and another for ALL Tables publication.

I created new patch which adds a test code. I added in 007_ddl.pl, but I feel
it is OK to introduce new test file. How do you think?

Best regards,
Hayato Kuroda
FUJITSU LIMITED


Attachment

Re: Selectively invalidate caches in pgoutput module

From
Amit Kapila
Date:
On Wed, Mar 12, 2025 at 5:46 PM Hayato Kuroda (Fujitsu)
<kuroda.hayato@fujitsu.com> wrote:
>
> > Thanks, the patch looks mostly good to me. I have made few cosmetic
> > changes in the attached and combined both the patches.
>
> Thanks, it looks good to me.
>
> > The existing
> > Alter Publication ... Rename tests don't test invalidations arising
> > from that command. As this patch changes that code path, it would be
> > good to add a few tests for the same. We can add one for individual
> > relations and another for ALL Tables publication.
>
> I created new patch which adds a test code.
>

Thanks. I have pushed the patch after minor changes in the test.

--
With Regards,
Amit Kapila.



Re: Selectively invalidate caches in pgoutput module

From
Andres Freund
Date:
Hi,

On 2025-03-13 14:13:39 +0530, Amit Kapila wrote:
> On Wed, Mar 12, 2025 at 5:46 PM Hayato Kuroda (Fujitsu)
> <kuroda.hayato@fujitsu.com> wrote:
> >
> > > Thanks, the patch looks mostly good to me. I have made few cosmetic
> > > changes in the attached and combined both the patches.
> >
> > Thanks, it looks good to me.
> >
> > > The existing
> > > Alter Publication ... Rename tests don't test invalidations arising
> > > from that command. As this patch changes that code path, it would be
> > > good to add a few tests for the same. We can add one for individual
> > > relations and another for ALL Tables publication.
> >
> > I created new patch which adds a test code.
> >
>
> Thanks. I have pushed the patch after minor changes in the test.

I think the new tests just failed in CI:
https://cirrus-ci.com/task/5602950271205376?logs=test_world#L268

[13:34:38.562] ――――――――――――――――――――――――――――――――――――― ✀  ―――――――――――――――――――――――――――――――――――――
[13:34:38.562] stderr:
[13:34:38.562] #   Failed test 'check replication worked well before renaming a publication'
[13:34:38.562] #   at /tmp/cirrus-ci-build/src/test/subscription/t/007_ddl.pl line 93.
[13:34:38.562] #          got: ''
[13:34:38.562] #     expected: '1'
[13:34:38.562] #   Failed test 'check the tuple inserted after the RENAME was not replicated'
[13:34:38.562] #   at /tmp/cirrus-ci-build/src/test/subscription/t/007_ddl.pl line 110.
[13:34:38.562] #          got: ''
[13:34:38.562] #     expected: '1'
[13:34:38.562] # Looks like you failed 2 tests of 8.
[13:34:38.562]
[13:34:38.562] (test program exited with status code 2)
[13:34:38.562] ――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――
[13:34:38.562]

Greetings,

Andres Freund



RE: Selectively invalidate caches in pgoutput module

From
"Hayato Kuroda (Fujitsu)"
Date:
Dear Andres,

> I think the new tests just failed in CI:
> https://cirrus-ci.com/task/5602950271205376?logs=test_world#L268

Thanks for reporting, I'll look into it.

Best regards,
Hayato Kuroda
FUJITSU LIMITED 


Re: Selectively invalidate caches in pgoutput module

From
Amit Kapila
Date:
On Fri, Mar 28, 2025 at 6:03 AM Hayato Kuroda (Fujitsu)
<kuroda.hayato@fujitsu.com> wrote:
>
> Dear Andres,
>
> > I think the new tests just failed in CI:
> > https://cirrus-ci.com/task/5602950271205376?logs=test_world#L268
>
> Thanks for reporting, I'll look into it.
>

The problem here is that after ALTER SUBSCRIPTION tap_sub SET
PUBLICATION ..., we didn't wait for the new walsender on publisher to
start. We must use wait_for_subscription_sync both after the "CREATE
SUBSCRIPTION ..." and the "ALTER SUBSCRIPTION ..." commands and keep
copy_data=true to ensure the initial replication is setup between
publisher and subscriber. This is how we use these commands at other
places.

--
With Regards,
Amit Kapila.



RE: Selectively invalidate caches in pgoutput module

From
"Hayato Kuroda (Fujitsu)"
Date:
Dear Amit,

> The problem here is that after ALTER SUBSCRIPTION tap_sub SET
> PUBLICATION ..., we didn't wait for the new walsender on publisher to
> start. We must use wait_for_subscription_sync both after the "CREATE
> SUBSCRIPTION ..." and the "ALTER SUBSCRIPTION ..." commands and keep
> copy_data=true to ensure the initial replication is setup between
> publisher and subscriber. This is how we use these commands at other
> places.

Agreed. PSA the patch to fix the issue.

Best regards,
Hayato Kuroda
FUJITSU LIMITED


Attachment

Re: Selectively invalidate caches in pgoutput module

From
Amit Kapila
Date:
On Fri, Mar 28, 2025 at 10:46 AM Hayato Kuroda (Fujitsu)
<kuroda.hayato@fujitsu.com> wrote:
>
> > The problem here is that after ALTER SUBSCRIPTION tap_sub SET
> > PUBLICATION ..., we didn't wait for the new walsender on publisher to
> > start. We must use wait_for_subscription_sync both after the "CREATE
> > SUBSCRIPTION ..." and the "ALTER SUBSCRIPTION ..." commands and keep
> > copy_data=true to ensure the initial replication is setup between
> > publisher and subscriber. This is how we use these commands at other
> > places.
>
> Agreed. PSA the patch to fix the issue.
>

Pushed after slight modification.

--
With Regards,
Amit Kapila.