Thread: max_connections and standby server
Today I encountered an interesting situation.

1) A streaming replication primary server and a standby server are
   running. At this point max_connections = 100 on both servers.

2) Shut down both servers.

3) Change max_connections to 1100 on both servers and restart both
   servers.

4) The primary server happily started but the standby server won't,
   because of lacking resources.

5) Shut down both servers.

6) Restore max_connections to 100 on both servers and restart both
   servers.

7) The primary server happily started but the standby server won't,
   for the reason below.

32695 2015-08-11 13:46:22 JST FATAL:  hot standby is not possible because max_connections = 100 is a lower setting than on the master server (its value was 1100)
32695 2015-08-11 13:46:22 JST CONTEXT:  xlog redo parameter change: max_connections=1100 max_worker_processes=8 max_prepared_xacts=10 max_locks_per_xact=64 wal_level=hot_standby wal_log_hints=off
32693 2015-08-11 13:46:22 JST LOG:  startup process (PID 32695) exited with exit code 1
32693 2015-08-11 13:46:22 JST LOG:  terminating any other active server processes

I think this is because pg_control on the standby remembers that the
previous primary server's max_connections = 1100 even if the standby
server fails to start. Shouldn't we update the pg_control file only
when the standby succeeds in starting?

Best regards,
--
Tatsuo Ishii
SRA OSS, Inc. Japan
English: http://www.sraoss.co.jp/index_en.php
Japanese: http://www.sraoss.co.jp
Tatsuo Ishii <ishii@postgresql.org> writes:
> I think this is because pg_control on the standby remembers that the
> previous primary server's max_connections = 1100 even if the standby
> server fails to start. Shouldn't we update pg_control file only when
> standby succeeds to start?

Somebody refresh my memory as to why we have this restriction (that is,
slave's max_connections >= master's max_connections) in the first place?
Seems like it should not be a necessary requirement, and working towards
getting rid of it would be far better than any other answer.

			regards, tom lane
> Somebody refresh my memory as to why we have this restriction (that is,
> slave's max_connections >= master's max_connections) in the first place?
> Seems like it should not be a necessary requirement, and working towards
> getting rid of it would be far better than any other answer.

If you care about max_connections, you might want to care about the
checks below as well (from xlog.c):

    RecoveryRequiresIntParameter("max_worker_processes",
                                 max_worker_processes,
                                 ControlFile->max_worker_processes);
    RecoveryRequiresIntParameter("max_prepared_transactions",
                                 max_prepared_xacts,
                                 ControlFile->max_prepared_xacts);
    RecoveryRequiresIntParameter("max_locks_per_transaction",

Best regards,
--
Tatsuo Ishii
SRA OSS, Inc. Japan
English: http://www.sraoss.co.jp/index_en.php
Japanese: http://www.sraoss.co.jp
On Tue, Aug 11, 2015 at 2:42 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Tatsuo Ishii <ishii@postgresql.org> writes:
>> I think this is because pg_control on the standby remembers that the
>> previous primary server's max_connections = 1100 even if the standby
>> server fails to start. Shouldn't we update pg_control file only when
>> standby succeeds to start?
>
> Somebody refresh my memory as to why we have this restriction (that is,
> slave's max_connections >= master's max_connections) in the first place?
> Seems like it should not be a necessary requirement, and working towards
> getting rid of it would be far better than any other answer.

If I recall correctly, that's because KnownAssignedXIDs and the lock
table need to be large enough on the standby for the largest snapshot
possible (procarray.c).
--
Michael
On Tue, Aug 11, 2015 at 2:57 PM, Michael Paquier <michael.paquier@gmail.com> wrote:
> On Tue, Aug 11, 2015 at 2:42 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> Tatsuo Ishii <ishii@postgresql.org> writes:
>>> I think this is because pg_control on the standby remembers that the
>>> previous primary server's max_connections = 1100 even if the standby
>>> server fails to start. Shouldn't we update pg_control file only when
>>> standby succeeds to start?
>>
>> Somebody refresh my memory as to why we have this restriction (that is,
>> slave's max_connections >= master's max_connections) in the first place?
>> Seems like it should not be a necessary requirement, and working towards
>> getting rid of it would be far better than any other answer.
>
> If I recall correctly, that's because KnownAssignedXIDs and the lock
> table need to be large enough on the standby for the largest snapshot
> possible (procarray.c).

... And the maximum number of locks possible on master (for the lock
table; wasn't it for the concurrent number of AccessExclusiveLocks,
btw?).
--
Michael
Michael Paquier <michael.paquier@gmail.com> writes:
> On Tue, Aug 11, 2015 at 2:42 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> Somebody refresh my memory as to why we have this restriction (that is,
>> slave's max_connections >= master's max_connections) in the first place?
>> Seems like it should not be a necessary requirement, and working towards
>> getting rid of it would be far better than any other answer.

> If I recall correctly, that's because KnownAssignedXIDs and the lock
> table need to be large enough on the standby for the largest snapshot
> possible (procarray.c).

Hm.  Surely KnownAssignedXIDs could be resized at need.  As for the
shared lock table on the standby, that could be completely occupied by
locks taken by hot-standby backend processes, so I don't see why we're
insisting on anything particular as to its size.

			regards, tom lane
On 2015-08-11 13:53:15 +0900, Tatsuo Ishii wrote:
> Today I encountered an interesting situation.
>
> 1) A streaming replication primary server and a standby server is
> running. At this point max_connections = 100 on both servers.
>
> 2) Shutdown both servers.
>
> 3) Change max_connections to 1100 on both servers and restart both
> servers.
>
> 4) The primary server happily started but the standby server won't
> because of lacking resource.
>
> 5) Shutdown both servers.
>
> 6) Restore max_connections to 100 on both servers and restart both
> servers.
>
> 7) The primary server happily started but the standby server won't
> because of the reason below.
>
> 32695 2015-08-11 13:46:22 JST FATAL:  hot standby is not possible because max_connections = 100 is a lower setting than on the master server (its value was 1100)
> 32695 2015-08-11 13:46:22 JST CONTEXT:  xlog redo parameter change: max_connections=1100 max_worker_processes=8 max_prepared_xacts=10 max_locks_per_xact=64 wal_level=hot_standby wal_log_hints=off
> 32693 2015-08-11 13:46:22 JST LOG:  startup process (PID 32695) exited with exit code 1
> 32693 2015-08-11 13:46:22 JST LOG:  terminating any other active server processes
>
> I think this is because pg_control on the standby remembers that the
> previous primary server's max_connections = 1100 even if the standby
> server fails to start. Shouldn't we update pg_control file only when
> standby succeeds to start?

I don't think that'd help. There's a WAL record generated that contains
the master's settings (c.f. XLogReportParameters()), and when replaying
we check that the local settings are compatible with the master's.

So you'll either have to have higher settings on the standby for at
least one restart or, maybe easier given 4), simply start the standby
for a second with hot_standby = off, and then re-enable it after it has
replayed the pending WAL.

Greetings,

Andres Freund
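[For readers following along: Andres's workaround could look roughly like the postgresql.conf sequence below on the standby — a sketch, assuming the offending XLOG_PARAMETER_CHANGE record is already in the standby's pending WAL and that hot_standby = off skips the resource check:]

```
# Step 1: disable hot standby on the standby, restart it, and let the
# startup process replay the parameter-change record without the
# hot-standby resource check.
hot_standby = off

# Step 2: once the standby has replayed past that record, re-enable
# hot standby and restart again.
hot_standby = on
```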
On 2015-08-11 02:06:53 -0400, Tom Lane wrote:
> Hm.  Surely KnownAssignedXIDs could be resized at need.

It's in shared memory so GetSnapshotData() can access it, so not
trivially.

> As for the shared lock table on the standby, that could be completely
> occupied by locks taken by hot-standby backend processes, so I don't
> see why we're insisting on anything particular as to its size.

The startup process alone needs to be able to hold all the master's
exclusive locks at once, since they're WAL-logged (and have to be). I
don't think common locks held by other processes are an actual problem
- if max_connections and max_locks_per_xact are the same, they can only
hold as many locks as the master could. They'd all conflict with WAL
replay of the exclusive locks anyway.

Now you probably could create a problematic situation by creating
hundreds of advisory locks or something. But that's a fairly different
scenario from an idle server not being able to replay the primary's WAL
records because it can't keep track of all the locks.

Now you can argue that it's uncommon to hold that many AE locks on the
primary in the first place. But I'm not sure that's true. The most
common reasons I've seen for exceeding locks are dumps and restores -
and the latter is all AELs.

Greetings,

Andres Freund
On 11 August 2015 at 06:42, Tom Lane <tgl@sss.pgh.pa.us> wrote:

> Tatsuo Ishii <ishii@postgresql.org> writes:
>> I think this is because pg_control on the standby remembers that the
>> previous primary server's max_connections = 1100 even if the standby
>> server fails to start. Shouldn't we update pg_control file only when
>> standby succeeds to start?
>
> Somebody refresh my memory as to why we have this restriction (that is,
> slave's max_connections >= master's max_connections) in the first place?
> Seems like it should not be a necessary requirement, and working towards
> getting rid of it would be far better than any other answer.
That was the consensus on how to control things on the standby, as 9.0 closed.
Yes, there are various other ways of specifying those things and these days they could be made to react more dynamically.
There are various major improvements to hot standby that could have happened, but that time has been spent on the more useful logical replication which is slowly making its way into core. More coming in 9.6.
--
Simon Riggs                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services