Re: Unexpected Standby Shutdown on sync_replication_slots change - Mailing list pgsql-bugs
From | shveta malik |
---|---|
Subject | Re: Unexpected Standby Shutdown on sync_replication_slots change |
Msg-id | CAJpy0uBY7_V8fdA6X2Ajq3zaEgSp8wyUVqoVGM-bhoBUoDt5dw@mail.gmail.com |
In response to | Re: Unexpected Standby Shutdown on sync_replication_slots change (Fujii Masao <masao.fujii@gmail.com>) |
Responses | Re: Unexpected Standby Shutdown on sync_replication_slots change |
List | pgsql-bugs |

On Fri, Jul 25, 2025 at 12:20 AM Fujii Masao <masao.fujii@gmail.com> wrote:
>
> On Fri, Jul 25, 2025 at 12:55 AM Fujii Masao <masao.fujii@gmail.com> wrote:
> >
> > On Thu, Jul 24, 2025 at 10:54 PM Hugo DUBOIS <hdubois@scaleway.com> wrote:
> > >
> > > Hello,
> > >
> > > I'm not sure if it's a bug, but I've encountered unexpected behavior when dynamically changing the sync_replication_slots parameter on a PostgreSQL 17 standby server. Instead of logging an error and continuing to run, the standby instance shuts down with a FATAL error, which is not the anticipated behavior for a dynamic parameter change, especially when the documentation doesn't indicate such an outcome.
> > >
> > > Steps to Reproduce
> > >
> > > Set up physical replication between two PostgreSQL 17.5 instances.
> > >
> > > Ensure wal_level on the primary (and consequently on the standby) is set to replica.
> > >
> > > Start both the primary and standby instances, confirming replication is active.
> > >
> > > On the standby instance, dynamically change the sync_replication_slots parameter (I ran the following: ALTER SYSTEM SET sync_replication_slots = 'on'; followed by SELECT pg_reload_conf();)
> > >
> > > Expected Behavior
> > >
> > > I expected the standby instance to continue running and log an error message (similar to how hot_standby_feedback behaves when not enabled, e.g., a loop of LOG: replication slot synchronization requires "hot_standby_feedback" to be enabled). A FATAL error leading to an unexpected shutdown for a dynamic parameter change on a running standby is not the anticipated behavior. The documentation for sync_replication_slots also doesn't indicate that a misconfiguration or an incompatible wal_level would lead to a shutdown.
> > >
> > > Actual Behavior
> > >
> > > Upon setting sync_replication_slots to on on the standby with wal_level set to replica, the standby instance immediately shuts down with the following log messages:
> > >
> > > LOG: database system is ready to accept read-only connections
> > > LOG: started streaming WAL from primary at 0/3000000 on timeline 1
> > > LOG: received SIGHUP, reloading configuration files
> > > LOG: parameter "sync_replication_slots" changed to "on"
> > > FATAL: replication slot synchronization requires "wal_level" >= "logical"
> > >
> > > Environment
> > >
> > > PostgreSQL Version: 17.5
> >
> > Thanks for the report!
> >
> > I was able to reproduce the issue even on the latest master (v19dev).
> > I agree that the current behavior, where changing a GUC parameter can
> > cause the server to shut down, is unexpected and should be avoided.
> >
> > From what I've seen in the code, the problem stems from postmaster
> > calling ValidateSlotSyncParams() before starting the slot sync worker.
> > That function raises an ERROR if wal_level is not logical while
> > sync_replication_slots is enabled. Since ERROR is treated as FATAL
> > in postmaster, it causes the server to exit.
> >
> > To fix this, we could modify ValidateSlotSyncParams() so it doesn't
> > raise an ERROR in this case, as follows.
> >
> >  ValidateSlotSyncParams(int elevel)
> >  {
> >      /*
> >       * Logical slot sync/creation requires wal_level >= logical.
> > -     *
> > -     * Since altering the wal_level requires a server restart, so error out in
> > -     * this case regardless of elevel provided by caller.
> >       */
> >      if (wal_level < WAL_LEVEL_LOGICAL)
> > -        ereport(ERROR,
> > +    {
> > +        ereport(elevel,
> >                  errcode(ERRCODE_INVALID_PARAMETER_VALUE),
> >                  errmsg("replication slot synchronization requires \"wal_level\" >= \"logical\""));
> > +        return false;
> > +    }
>
> I've created a patch to implement the above (attached).

Thank you for the patch.

> Note that this patch does not change the existing behavior when
> the misconfiguration (sync_replication_slots enabled but wal_level not
> set to logical) is detected at server startup. In that case, the server
> still shuts down with a FATAL error, which is consistent with other
> settings like summarize_wal.
>

Validated the behaviour; the patch looks good to me.

thanks
Shveta
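The difference between the reload path and the startup path discussed in this thread can be sketched as a small standalone C model. This is a simplified illustration, not PostgreSQL source: the enums, report(), validate_slot_sync_params(), maybe_start_slot_sync_worker() and check_at_startup() are hypothetical stand-ins, and the model assumes that postmaster passes a non-fatal level such as LOG when validating before it starts the slot sync worker. It shows the patched behavior: a reload that enables sync_replication_slots with wal_level = replica is reported and the worker is simply not started, while the same misconfiguration found at startup is still fatal, because ERROR is treated as FATAL in postmaster.

```c
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical stand-ins for the wal_level settings, ordered so that
 * "wal_level < WAL_LEVEL_LOGICAL" works as a comparison. */
typedef enum { WAL_LEVEL_MINIMAL, WAL_LEVEL_REPLICA, WAL_LEVEL_LOGICAL } WalLevel;

/* Hypothetical reporting levels: LEVEL_LOG only logs; LEVEL_ERROR models an
 * ERROR raised inside postmaster, which is promoted to FATAL. */
typedef enum { LEVEL_LOG, LEVEL_ERROR } ReportLevel;

/* Set to true when the modeled postmaster would have exited. */
static bool postmaster_exited = false;

/* Hypothetical substitute for ereport(elevel, ...). */
static void report(ReportLevel elevel, const char *msg)
{
    fprintf(stderr, "%s:  %s\n", elevel == LEVEL_ERROR ? "FATAL" : "LOG", msg);
    if (elevel == LEVEL_ERROR)
        postmaster_exited = true;
}

static const char *const WAL_LEVEL_MSG =
    "replication slot synchronization requires \"wal_level\" >= \"logical\"";

/* Patched shape of the check: report at the caller's level and return false
 * instead of unconditionally raising ERROR. */
static bool validate_slot_sync_params(WalLevel wal_level, ReportLevel elevel)
{
    if (wal_level < WAL_LEVEL_LOGICAL) {
        report(elevel, WAL_LEVEL_MSG);
        return false;
    }
    return true;
}

/* Reload path (assumption: validation uses LOG here), so a bad setting only
 * skips the worker. Returns true if the worker would be started. */
static bool maybe_start_slot_sync_worker(bool sync_replication_slots, WalLevel wal_level)
{
    if (!sync_replication_slots)
        return false;
    return validate_slot_sync_params(wal_level, LEVEL_LOG);
}

/* Startup path: the misconfiguration is still fatal, as with summarize_wal. */
static void check_at_startup(bool sync_replication_slots, WalLevel wal_level)
{
    if (sync_replication_slots && wal_level < WAL_LEVEL_LOGICAL)
        report(LEVEL_ERROR, WAL_LEVEL_MSG);
}
```

Calling maybe_start_slot_sync_worker(true, WAL_LEVEL_REPLICA) models Hugo's reload scenario: it logs the message, returns false, and leaves postmaster_exited unset. Calling check_at_startup(true, WAL_LEVEL_REPLICA) models the startup case that, per the note above, still ends in a FATAL error.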