Re: synchronized_standby_slots used in logical replication - Mailing list pgsql-hackers
From | Fabrice Chapuis |
---|---|
Subject | Re: synchronized_standby_slots used in logical replication |
Msg-id | CAA5-nLCKwP3qHUH7z0=bh4Uzwe5Km9T_v5M9mAefp29hMPzTqg@mail.gmail.com |
In response to | Re: synchronized_standby_slots used in logical replication (shveta malik <shveta.malik@gmail.com>) |
List | pgsql-hackers |
Thank you very much for the detailed response. I will proceed with the native implementation for synchronizing logical replication slots. In a maintenance context, when a standby is shut down, it's possible to temporarily disable the synchronized_standby_slots parameter to avoid blocking logical replication on the primary.

Regards,
Fabrice
On Thu, Jun 5, 2025 at 8:57 AM shveta malik <shveta.malik@gmail.com> wrote:
On Wed, Jun 4, 2025 at 4:01 PM Fabrice Chapuis <fabrice636861@gmail.com> wrote:
>
> Hi,
>
> I'm working with logical replication in a PostgreSQL 17 setup, and I'm exploring the new synchronized_standby_slots parameter to make replication slots failover safe in a highly available environment using physical standby nodes managed by Patroni.
>
> While testing this feature, I encountered a blocking behavior: when a standby is listed in synchronized_standby_slots and that standby goes offline, logical replication on the primary stops progressing. From what I understand, the primary node waits for the standby to acknowledge received WAL records, effectively stalling WAL decoding for the logical slot. I noticed that the failover slot on the standby continues to be synced.
Yes, your understanding is correct.
>
> This raises several questions about the tradeoffs and implications of using this feature:
>
> What are the risks or limitations if synchronized_standby_slots is left empty (the default)? Is there a risk of data loss or inconsistency for logical subscribers in such cases?
If the 'synchronized_standby_slots' setting is left unset, logical
replication subscribers may progress ahead of the physical standby
servers. In the event of a failover under such conditions, the new
primary might lack the necessary data to continue supporting logical
replication, even if synchronized slots are in place, resulting in
unexpected behavior. Therefore, it is strongly recommended to
configure 'synchronized_standby_slots' properly to ensure that all
configured physical standbys have received and flushed the changes
before those changes are made visible to logical replication
subscribers.
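
To make the guarantee above concrete, here is a minimal Python sketch of the
gating rule. It is a toy model under stated assumptions, not PostgreSQL source
code: the function names (parse_lsn, blocking_slots, visible_to_subscribers),
the slot names and the dictionary of slot positions are all hypothetical. The
only things taken from PostgreSQL are the textual LSN format (two hexadecimal
numbers separated by a slash) and the rule that a change is held back until
every listed physical slot has reached it.

```python
from typing import Dict, List


def parse_lsn(text: str) -> int:
    """Convert a textual LSN such as '0/3000060' into a 64-bit integer."""
    high, low = text.split("/")
    return (int(high, 16) << 32) | int(low, 16)


def blocking_slots(change_lsn: str,
                   listed_slots: List[str],
                   slot_positions: Dict[str, str]) -> List[str]:
    """Return the listed slots that have not yet reached change_lsn.

    listed_slots stands in for the synchronized_standby_slots setting
    (hypothetical representation). slot_positions maps a physical slot name
    to the last position its standby confirmed. A slot that is missing from
    the map, or whose standby is offline and therefore stuck at an old
    position, is treated as blocking.
    """
    target = parse_lsn(change_lsn)
    blockers = []
    for name in listed_slots:
        position = slot_positions.get(name)
        if position is None or parse_lsn(position) < target:
            blockers.append(name)
    return blockers


def visible_to_subscribers(change_lsn: str,
                           listed_slots: List[str],
                           slot_positions: Dict[str, str]) -> bool:
    """A change may be sent to logical subscribers only if nothing blocks it.

    With an empty list (the default setting) nothing is ever waited for,
    which is why subscribers can then get ahead of the standbys.
    """
    return not blocking_slots(change_lsn, listed_slots, slot_positions)
```

In this model an offline standby simply stops advancing its position, so every
change past that position stays invisible to subscribers until the standby
returns or its slot is removed from the list.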
> Is it expected behavior that any failure of a standby listed in synchronized_standby_slots stalls logical decoding on the primary? If so, are there any ways to avoid blocking WAL decoding while still having slot synchronization?
Yes, this is expected behavior. It is similar to how
'synchronous_standby_names' works, where a commit on the primary is
allowed to proceed only after the configured standby servers
acknowledge receipt of the data. The main difference is that
'synchronous_standby_names' provides more configuration options, such
as FIRST and ANY, allowing the system to wait for a subset of standbys
rather than all of them. However, if none of the configured standbys
are available, the primary will still wait, just as in this case,
until a standby becomes available or the configuration is changed. In
the future, if needed, similar flexibility (e.g., support for ANY,
FIRST) could potentially be extended to 'synchronized_standby_slots'
as well. For now, the way to move forward is either by updating the
configuration or by restoring the standby to an operational state.
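
The difference between waiting for all listed standbys and the FIRST/ANY modes
of 'synchronous_standby_names' can be illustrated with a small Python sketch.
This is a simplified model, not PostgreSQL code: the function wait_satisfied,
the "ALL" label (used here to stand for the current behavior of
'synchronized_standby_slots') and the integer LSNs are assumptions made for
illustration. FIRST/ANY support for 'synchronized_standby_slots' does not
exist today; the sketch only shows what such semantics would mean.

```python
from typing import Dict, List


def wait_satisfied(mode: str,
                   num: int,
                   names: List[str],
                   flushed: Dict[str, int],
                   lsn: int) -> bool:
    """Decide whether the primary may stop waiting for position lsn.

    names   -- standbys in priority order, as written in the setting
    flushed -- connected standbys mapped to the position they have flushed;
               a standby absent from this map is considered offline
    mode    -- "FIRST", "ANY", or the hypothetical label "ALL"
    """
    connected = [name for name in names if name in flushed]
    if mode == "ANY":
        # Quorum: any num connected standbys that reached lsn are enough.
        return sum(flushed[name] >= lsn for name in connected) >= num
    if mode == "FIRST":
        # Priority: the num highest-priority connected standbys must all
        # have reached lsn.
        chosen = connected[:num]
        return len(chosen) == num and all(flushed[n] >= lsn for n in chosen)
    if mode == "ALL":
        # Every listed standby must be connected and caught up.
        return all(n in flushed and flushed[n] >= lsn for n in names)
    raise ValueError(f"unknown mode: {mode}")
```

With three listed standbys and the first one offline, "ANY" with num=1 or
"FIRST" with num=1 can still be satisfied by a remaining standby, whereas
"ALL" cannot. When no standby is connected at all, every mode keeps waiting,
which matches the point made above.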
> Is Patroni managing failover (FO) slots better than the native Postgres implementation?
I'm not entirely certain about that. However, PostgreSQL does handle
several complex scenarios, such as:
--Ensuring seamless logical replication on failover by allowing users
to configure potential failover candidates via
synchronized_standby_slots, making synced slots ready for failover in
all situations.
--Ensuring consistency by avoiding a direct copy of the slot unless a
consistent point can be reached with the new values. Otherwise, after
promotion, the slots may not reach a consistent point, potentially
resulting in data loss.
--Supporting two-phase transactions for failover slots, where
transactions prepared before two_phase decoding is enabled are handled
correctly even if the failover occurs immediately afterward.
You may want to check with the Patroni community for more detailed
insights. We're open to considering any gaps or missing functionality
in PostgreSQL as well.
thanks
Shveta