Re: Excessive LOG messages from replication slot sync worker - Mailing list pgsql-hackers

From Japin Li
Subject Re: Excessive LOG messages from replication slot sync worker
Date
Msg-id ME0P300MB04452400335D1F925CCDC865B62BA@ME0P300MB0445.AUSP300.PROD.OUTLOOK.COM
Whole thread Raw
In response to Re: Excessive LOG messages from replication slot sync worker  (shveta malik <shveta.malik@gmail.com>)
Responses Re: Excessive LOG messages from replication slot sync worker
List pgsql-hackers
On Tue, Aug 12, 2025 at 12:24:10PM +0530, shveta malik wrote:
> On Tue, Aug 12, 2025 at 8:52 AM Fujii Masao <masao.fujii@gmail.com> wrote:
> >
> > Hi,
> >
> > While testing the replication slot synchronization feature, I set up
> > three instances on the same machine:
> >
> > - Physical replication primary (also Logical replication publisher)
> > - Physical replication standby
> > - Logical replication subscriber
> >
> > ----------------------------------------
> > initdb -D pri --encoding=UTF8 --locale=C
> > cat <<EOF >> pri/postgresql.conf
> > wal_level = logical
> > log_line_prefix = '%m [pri][%b] '
> > logging_collector = on
> > EOF
> > pg_ctl -D pri start
> > pgbench -i
> > cat <<EOF | psql
> > CREATE PUBLICATION mypub FOR ALL TABLES;
> > SELECT pg_create_physical_replication_slot('pslot');
> > EOF
> > cat <<EOF >> pri/postgresql.conf
> > synchronized_standby_slots = 'pslot'
> > EOF
> > pg_ctl -D pri reload
> >
> > pg_basebackup -D sby -c fast
> > touch sby/standby.signal
> > cat <<EOF >> sby/postgresql.conf
> > port = 5433
> > primary_conninfo = 'host=localhost port=5432 dbname=postgres'
> > primary_slot_name = 'pslot'
> > sync_replication_slots = on
> > hot_standby_feedback = on
> > log_line_prefix = '%m [sby][%b] '
> > EOF
> > pg_ctl -D sby start
> >
> > initdb -D sub --encoding=UTF8 --locale=C
> > cat <<EOF >> sub/postgresql.conf
> > port = 5434
> > wal_level = logical
> > log_line_prefix = '%m [sub][%b] '
> > logging_collector = on
> > EOF
> > pg_ctl -D sub start
> > pgbench -i -Itp -p 5434
> > cat <<EOF | psql -p 5434
> > CREATE SUBSCRIPTION mysub CONNECTION 'host=localhost port=5432
> > dbname=postgres' PUBLICATION mypub WITH (failover);
> > EOF
> > ----------------------------------------
> >
> >
> > I noticed that the replication slot sync worker on the standby generates
> > a large number of LOG messages, about two per 200–300 ms, during
> > pgbench runs (pgbench -c 10 -j 10 -T 60). For example,
> >
> > ----------------------------------------
> > 2025-08-12 11:48:21.067 JST [sby][slotsync worker] LOG:  starting
> > logical decoding for slot "mysub"
> > 2025-08-12 11:48:21.067 JST [sby][slotsync worker] DETAIL:  Streaming
> > transactions committing after 0/04F317B8, reading WAL from 0/04006FC0.
> > 2025-08-12 11:48:21.067 JST [sby][slotsync worker] LOG:  logical
> > decoding found consistent point at 0/04006FC0
> > 2025-08-12 11:48:21.067 JST [sby][slotsync worker] DETAIL:  There are
> > no running transactions.
> > 2025-08-12 11:48:21.296 JST [sby][slotsync worker] LOG:  starting
> > logical decoding for slot "mysub"
> > 2025-08-12 11:48:21.296 JST [sby][slotsync worker] DETAIL:  Streaming
> > transactions committing after 0/04F9DC90, reading WAL from 0/04006FC0.
> > 2025-08-12 11:48:21.297 JST [sby][slotsync worker] LOG:  logical
> > decoding found consistent point at 0/04006FC0
> > 2025-08-12 11:48:21.297 JST [sby][slotsync worker] DETAIL:  There are
> > no running transactions.
> > 2025-08-12 11:48:21.526 JST [sby][slotsync worker] LOG:  starting
> > logical decoding for slot "mysub"
> > 2025-08-12 11:48:21.526 JST [sby][slotsync worker] DETAIL:  Streaming
> > transactions committing after 0/0506A5C0, reading WAL from 0/04006FC0.
> > 2025-08-12 11:48:21.527 JST [sby][slotsync worker] LOG:  logical
> > decoding found consistent point at 0/04006FC0
> > 2025-08-12 11:48:21.527 JST [sby][slotsync worker] DETAIL:  There are
> > no running transactions.
> > 2025-08-12 11:48:21.762 JST [sby][slotsync worker] LOG:  starting
> > logical decoding for slot "mysub"
> > 2025-08-12 11:48:21.762 JST [sby][slotsync worker] DETAIL:  Streaming
> > transactions committing after 0/050C7F70, reading WAL from 0/04006FC0.
> > 2025-08-12 11:48:21.763 JST [sby][slotsync worker] LOG:  logical
> > decoding found consistent point at 0/04006FC0
> > 2025-08-12 11:48:21.763 JST [sby][slotsync worker] DETAIL:  There are
> > no running transactions.
> > 2025-08-12 11:48:21.999 JST [sby][slotsync worker] LOG:  starting
> > logical decoding for slot "mysub"
> > 2025-08-12 11:48:21.999 JST [sby][slotsync worker] DETAIL:  Streaming
> > transactions committing after 0/05111420, reading WAL from 0/04006FC0.
> > 2025-08-12 11:48:21.999 JST [sby][slotsync worker] LOG:  logical
> > decoding found consistent point at 0/04006FC0
> > 2025-08-12 11:48:21.999 JST [sby][slotsync worker] DETAIL:  There are
> > no running transactions.
> > ----------------------------------------
> >
> >
> >  Is this expected behavior, a configuration mistake on my side, or perhaps
> >  an oversight in the commit of the replication slot synchronization feature?
> >
> > This seems more like debug output left over from development rather than
> > something useful for users. If this is true, would it make sense to change
> > these messages from LOG to DEBUG?
> >
> 
> Thank You for reporting the issue.
> 
> It looks like commit 2633dae (mistakenly) introduced a change ([1]) in
> SnapBuildSnapshotExists(), altering the format used for snapshot file
> names during the search. However, SnapBuildSerialize still uses the
> old format ("%s/%X-%X.snap"), which led to the slot-sync worker being
> unable to locate existing snapshot files. Reverting the format
> resolved the issue for me. Could you please check if the attached
> patch works for you as well?
> 
> [1]:
> SnapBuildSnapshotExists()
> - sprintf(path, "%s/%X-%X.snap",
> + sprintf(path, "%s/%08X-%08X.snap",
> 

Sorry, it's my fault.  I forgot to update the format in SnapBuildSerialize().

I'd prefer to update the format in SnapBuildSerialize() instead of reverting
this change.

diff --git a/src/backend/replication/logical/snapbuild.c b/src/backend/replication/logical/snapbuild.c
index 8532bfd27e5..db7051170fc 100644
--- a/src/backend/replication/logical/snapbuild.c
+++ b/src/backend/replication/logical/snapbuild.c
@@ -1529,7 +1529,7 @@ SnapBuildSerialize(SnapBuild *builder, XLogRecPtr lsn)
      * unless the user used pg_resetwal or similar. If a user did so, there's
      * no hope continuing to decode anyway.
      */
-    sprintf(path, "%s/%X-%X.snap",
+    sprintf(path, "%s/%08X-%08X.snap",
             PG_LOGICAL_SNAPSHOTS_DIR,
             LSN_FORMAT_ARGS(lsn));

@@ -1573,7 +1573,7 @@ SnapBuildSerialize(SnapBuild *builder, XLogRecPtr lsn)
     elog(DEBUG1, "serializing snapshot to %s", path);

     /* to make sure only we will write to this tempfile, include pid */
-    sprintf(tmppath, "%s/%X-%X.snap.%d.tmp",
+    sprintf(tmppath, "%s/%08X-%08X.snap.%d.tmp",
             PG_LOGICAL_SNAPSHOTS_DIR,
             LSN_FORMAT_ARGS(lsn), MyProcPid);

-- 
Best regards,
Japin Li
ChengDu WenWu Information Technology Co., LTD.



pgsql-hackers by date:

Previous
From: Gavin Panella
Date:
Subject: Re: `pg_ctl init` crashes when run concurrently; semget(2) suspected
Next
From: Japin Li
Date:
Subject: Update the LSN format in the comment example