Thread: Re: Documentation update of wal_retrieve_retry_interval to mention table sync worker

From: Peter Smith
Date: 31 December 2024, 03:17:51

On Thu, Dec 26, 2024 at 1:37 AM vignesh C <vignesh21@gmail.com> wrote:
>
> Hi,
>
> Currently, we restart the table synchronization worker after the
> duration specified by wal_retrieve_retry_interval following the last
> failure. While this behavior is documented for apply workers, it is
> not mentioned for table synchronization workers. I believe this detail
> should be included in the documentation for table synchronization
> workers as well. Attached is a patch to address this omission.
>
> Regards,
> Vignesh

Hi Vignesh,

Here are some review comments for your v1 patch.

+1 to enhance the documentation.

======

1.
        <para>
         In logical replication, this parameter also limits how often a failing
-        replication apply worker will be respawned.
+        replication apply worker, and table synchronization worker will be
+        respawned.
        </para>

Replace ", and" with "or":

SUGGESTION
In logical replication, this parameter also limits how often a failing
replication apply worker or table synchronization worker will be
respawned.

======

2.
I think the reader might never become aware of this throttled relaunch
behaviour unless they accidentally stumble across the docs for this
GUC, so IMO this information should also be mentioned elsewhere --
wherever the tablesync worker errors are documented. But, TBH, I can't
find anywhere in the PostgreSQL docs that even mentions re-launching
failed tablesync workers!

Anyway, I think it might be good to include such information in some
suitable place (maybe in the CREATE SUBSCRIPTION notes? or maybe in
Chapter 29?) to say something like...

SUGGESTION:
In practice, if a table synchronization worker fails during logical
replication, the apply worker detects the failure and attempts to
respawn the table synchronization worker to continue the
synchronization process. This behaviour ensures that transient errors
do not permanently disrupt the replication setup. See also
wal_retrieve_retry_interval.
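
To make the throttling concrete, here is a minimal sketch of the
behaviour described above. It is only an illustration, not PostgreSQL
source code: the class RelaunchThrottle and the method may_launch are
hypothetical names, and the sketch assumes the interval is tracked per
table, using the GUC's default of 5 seconds.

```python
class RelaunchThrottle:
    """Hypothetical model of throttled tablesync worker relaunch."""

    def __init__(self, retry_interval_ms=5000):
        # Stands in for wal_retrieve_retry_interval (default 5s).
        self.retry_interval_ms = retry_interval_ms
        # Assumption: last launch time is remembered per table.
        self.last_start_ms = {}

    def may_launch(self, table, now_ms):
        """Return True if a worker for `table` may be (re)launched now."""
        last = self.last_start_ms.get(table)
        if last is not None and now_ms - last < self.retry_interval_ms:
            # Too soon since the previous launch: skip this attempt.
            return False
        self.last_start_ms[table] = now_ms
        return True


throttle = RelaunchThrottle(retry_interval_ms=5000)
first = throttle.may_launch("t1", now_ms=0)         # initial launch
too_soon = throttle.may_launch("t1", now_ms=1000)   # worker failed after 1s
after_wait = throttle.may_launch("t1", now_ms=5000) # interval has elapsed
other_table = throttle.may_launch("t2", now_ms=5001)
```

In this sketch a failed worker for one table does not delay the launch
of a worker for a different table; only repeated launches for the same
table are spaced out by the interval.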

======
Kind Regards,
Peter Smith.
Fujitsu Australia