Improve handling of parameter differences in physical replication - Mailing list pgsql-hackers

From Peter Eisentraut
Subject Improve handling of parameter differences in physical replication
Date
Msg-id 4ad69a4c-cc9b-0dfe-0352-8b1b0cd36c7b@2ndquadrant.com
Responses Re: Improve handling of parameter differences in physical replication  (Sergei Kornilov <sk@zsrv.org>)
Re: Improve handling of parameter differences in physical replication  (Fujii Masao <masao.fujii@oss.nttdata.com>)
Re: Improve handling of parameter differences in physical replication  (Alvaro Herrera <alvherre@2ndquadrant.com>)
List pgsql-hackers
When certain parameters are changed on a physical replication primary, 
this is communicated to standbys using the XLOG_PARAMETER_CHANGE WAL 
record.  The standby then checks whether its own settings are at least 
as big as the ones on the primary.  If not, the standby shuts down with 
a fatal error.
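
To make the check concrete, here is a minimal sketch in C. It is not the actual PostgreSQL code: the struct, the function name, and the choice of fields are assumptions for illustration, using only settings named in this message. It returns the name of the first standby setting that is lower than the primary's, which is the condition that currently leads to the fatal shutdown.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical container for the settings carried by the parameter
 * change record; the field list is illustrative, not exhaustive. */
typedef struct ReplParams
{
    int max_connections;
    int max_worker_processes;
    int max_wal_senders;
    int max_prepared_transactions;
} ReplParams;

/*
 * Return the name of the first setting whose standby value is lower
 * than the primary's value, or NULL if every standby value is at
 * least as big as the primary's.
 */
static const char *
first_insufficient_param(const ReplParams *standby, const ReplParams *primary)
{
    assert(standby != NULL && primary != NULL);

    if (standby->max_connections < primary->max_connections)
        return "max_connections";
    if (standby->max_worker_processes < primary->max_worker_processes)
        return "max_worker_processes";
    if (standby->max_wal_senders < primary->max_wal_senders)
        return "max_wal_senders";
    if (standby->max_prepared_transactions < primary->max_prepared_transactions)
        return "max_prepared_transactions";
    return NULL;                /* standby settings are sufficient */
}
```

With the current behavior, a non-NULL result here corresponds to the standby shutting down right away, whether or not the resource in question is ever used.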

The correspondence of settings between primary and standby is required 
because those settings influence certain shared memory sizings that are 
required for processing WAL records that the primary might send.  For 
example, if the primary sends a prepared transaction, the standby must 
have had max_prepared_transactions set appropriately or it won't be able 
to process those WAL records.

However, fatally shutting down the standby immediately upon receipt of 
the parameter change record might be a bit of an overreaction.  The 
resources related to those settings are not required immediately at that 
point, and might never be required if the activity on the primary does 
not exhaust all those resources.  An extreme example is raising 
max_prepared_transactions on the primary but never actually using 
prepared transactions.

Where this becomes a serious problem is if you have many standbys and 
you do a failover.  If the newly promoted standby happens to have a 
higher setting for one of the relevant parameters, all the other 
standbys that have followed it then shut down immediately and won't be 
able to continue until you change all their settings.

If we didn't do the hard shutdown and just let the standby roll on 
with recovery, nothing bad would happen and it would eventually produce an 
appropriate error when those resources are required (e.g., "maximum 
number of prepared transactions reached").

So I think there are better ways to handle this.  It might be reasonable 
to provide options.  The attached patch doesn't do that but it would be 
pretty easy.  What the attached patch does is:

Upon receipt of XLOG_PARAMETER_CHANGE, we still check the settings but 
only issue a warning and set a global flag if there is a problem.  Then 
when we actually hit the resource issue and the flag was set, we issue 
another warning message with relevant information.  Additionally, at 
that point we pause recovery instead of shutting down, so a hot standby 
remains usable.  (That could certainly be configurable.)

Btw., I think the current setup is slightly buggy.  The MaxBackends 
value that is used to size shared memory is computed as MaxConnections + 
autovacuum_max_workers + 1 + max_worker_processes + max_wal_senders, but 
we don't track autovacuum_max_workers in WAL.
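
As a small worked example of that formula (the function name is made up for illustration; only the arithmetic comes from the paragraph above):

```c
#include <assert.h>

/* MaxBackends as described above: every input except
 * autovacuum_max_workers is tracked in WAL. */
static int
compute_max_backends(int max_connections,
                     int autovacuum_max_workers,
                     int max_worker_processes,
                     int max_wal_senders)
{
    assert(max_connections >= 0 && autovacuum_max_workers >= 0);
    assert(max_worker_processes >= 0 && max_wal_senders >= 0);

    return max_connections + autovacuum_max_workers + 1 +
        max_worker_processes + max_wal_senders;
}
```

With max_connections = 100, max_worker_processes = 8 and max_wal_senders = 10 on both nodes, a primary with autovacuum_max_workers = 10 gets 129 while a standby with autovacuum_max_workers = 3 gets 122, even though every setting the standby checks matches.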

(This patch was developed together with Simon Riggs.)

-- 
Peter Eisentraut              http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

