Thread: max_connections and standby server

max_connections and standby server

From
Tatsuo Ishii
Date:
Today I encountered an interesting situation.

1) A streaming replication primary server and a standby server is
   running. At this point max_connections = 100 on both servers.

2) Shutdown both servers.

3) Change max_connections to 1100 on both servers and restart both
   servers.

4) The primary server happily started but the standby server won't
   because of lacking resource.

5) Shutdown both servers.

6) Restore max_connections to 100 on both servers and restart both
   servers.

7) The primary server happily started but the standby server won't
   because of the reason below.

32695 2015-08-11 13:46:22 JST FATAL:  hot standby is not possible because max_connections = 100 is a lower setting than on the master server (its value was 1100)
32695 2015-08-11 13:46:22 JST CONTEXT:  xlog redo parameter change: max_connections=1100 max_worker_processes=8 max_prepared_xacts=10 max_locks_per_xact=64 wal_level=hot_standby wal_log_hints=off
32693 2015-08-11 13:46:22 JST LOG:  startup process (PID 32695) exited with exit code 1
32693 2015-08-11 13:46:22 JST LOG:  terminating any other active server processes

I think this is because pg_control on the standby remembers that the
previous primary server's max_connections = 1100 even if the standby
server fails to start. Shouldn't we update pg_control file only when
standby succeeds to start?

Best regards,
--
Tatsuo Ishii
SRA OSS, Inc. Japan
English: http://www.sraoss.co.jp/index_en.php
Japanese:http://www.sraoss.co.jp



Re: max_connections and standby server

From
Tom Lane
Date:
Tatsuo Ishii <ishii@postgresql.org> writes:
> I think this is because pg_control on the standby remembers that the
> previous primary server's max_connections = 1100 even if the standby
> server fails to start. Shouldn't we update pg_control file only when
> standby succeeds to start?

Somebody refresh my memory as to why we have this restriction (that is,
slave's max_connections >= master's max_connections) in the first place?
Seems like it should not be a necessary requirement, and working towards
getting rid of it would be far better than any other answer.
        regards, tom lane



Re: max_connections and standby server

From
Tatsuo Ishii
Date:
> Somebody refresh my memory as to why we have this restriction (that is,
> slave's max_connections >= master's max_connections) in the first place?
> Seems like it should not be a necessary requirement, and working towards
> getting rid of it would be far better than any other answer.

If you care about max_connections, you might want to care about below as well (from xlog.c)

    RecoveryRequiresIntParameter("max_worker_processes",
                                 max_worker_processes,
                                 ControlFile->max_worker_processes);
    RecoveryRequiresIntParameter("max_prepared_transactions",
                                 max_prepared_xacts,
                                 ControlFile->max_prepared_xacts);
    RecoveryRequiresIntParameter("max_locks_per_transaction",
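
For readers without the source at hand, here is a minimal sketch of the kind of comparison these calls perform, assuming the rule stated earlier in the thread (the standby's value must be >= the primary's). The ParamCheck struct and first_insufficient_param() are hypothetical names made up for this illustration, not the xlog.c code; only the message text is taken from the log quoted above.

```c
#include <stddef.h>
#include <stdio.h>

/* Hypothetical record of one setting as seen on the standby and the primary. */
typedef struct ParamCheck
{
    const char *name;
    int         standby_value;  /* value from the standby's configuration */
    int         primary_value;  /* value last reported by the primary */
} ParamCheck;

/*
 * Return the index of the first setting whose standby value is lower than
 * the primary's, or -1 if every setting is high enough.  On failure, a
 * message modelled on the log line quoted in this thread is written to msg.
 */
int
first_insufficient_param(const ParamCheck *checks, size_t n,
                         char *msg, size_t msglen)
{
    for (size_t i = 0; i < n; i++)
    {
        if (checks[i].standby_value < checks[i].primary_value)
        {
            snprintf(msg, msglen,
                     "hot standby is not possible because %s = %d is a lower "
                     "setting than on the master server (its value was %d)",
                     checks[i].name, checks[i].standby_value,
                     checks[i].primary_value);
            return (int) i;
        }
    }
    return -1;
}
```

In this sketch the check stops at the first failing setting, which mirrors the log above: only max_connections is reported even though several settings are compared.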

Best regards,
--
Tatsuo Ishii
SRA OSS, Inc. Japan
English: http://www.sraoss.co.jp/index_en.php
Japanese:http://www.sraoss.co.jp



Re: max_connections and standby server

From
Michael Paquier
Date:
On Tue, Aug 11, 2015 at 2:42 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Tatsuo Ishii <ishii@postgresql.org> writes:
>> I think this is because pg_control on the standby remembers that the
>> previous primary server's max_connections = 1100 even if the standby
>> server fails to start. Shouldn't we update pg_control file only when
>> standby succeeds to start?
>
> Somebody refresh my memory as to why we have this restriction (that is,
> slave's max_connections >= master's max_connections) in the first place?
> Seems like it should not be a necessary requirement, and working towards
> getting rid of it would be far better than any other answer.

If I recall correctly, that's because KnownAssignedXIDs and the lock
table need to be large enough on the standby for the largest snapshot
possible (procarray.c).
-- 
Michael



Re: max_connections and standby server

From
Michael Paquier
Date:
On Tue, Aug 11, 2015 at 2:57 PM, Michael Paquier
<michael.paquier@gmail.com> wrote:
> On Tue, Aug 11, 2015 at 2:42 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> Tatsuo Ishii <ishii@postgresql.org> writes:
>>> I think this is because pg_control on the standby remembers that the
>>> previous primary server's max_connections = 1100 even if the standby
>>> server fails to start. Shouldn't we update pg_control file only when
>>> standby succeeds to start?
>>
>> Somebody refresh my memory as to why we have this restriction (that is,
>> slave's max_connections >= master's max_connections) in the first place?
>> Seems like it should not be a necessary requirement, and working towards
>> getting rid of it would be far better than any other answer.
>
> If I recall correctly, that's because KnownAssignedXIDs and the lock
> table need to be large enough on the standby for the largest snapshot
> possible (procarray.c).

... And the maximum number of locks possible on master (for the lock
table, wasn't it for the concurrent number of AccessExclusiveLocks,
btw?).
-- 
Michael



Re: max_connections and standby server

From
Tom Lane
Date:
Michael Paquier <michael.paquier@gmail.com> writes:
> On Tue, Aug 11, 2015 at 2:42 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> Somebody refresh my memory as to why we have this restriction (that is,
>> slave's max_connections >= master's max_connections) in the first place?
>> Seems like it should not be a necessary requirement, and working towards
>> getting rid of it would be far better than any other answer.

> If I recall correctly, that's because KnownAssignedXIDs and the lock
> table need to be large enough on the standby for the largest snapshot
> possible (procarray.c).

Hm.  Surely KnownAssignedXIDs could be resized at need.  As for the shared
lock table on the standby, that could be completely occupied by locks
taken by hot-standby backend processes, so I don't see why we're insisting
on anything particular as to its size.
        regards, tom lane



Re: max_connections and standby server

From
Andres Freund
Date:
On 2015-08-11 13:53:15 +0900, Tatsuo Ishii wrote:
> Today I encountered an interesting situation.
> 
> 1) A streaming replication primary server and a standby server is
>    running. At this point max_connections = 100 on both servers.
> 
> 2) Shutdown both servers.
> 
> 3) Change max_connections to 1100 on both servers and restart both
>    servers.
> 
> 4) The primary server happily started but the standby server won't
>    because of lacking resource.
> 
> 5) Shutdown both servers.
> 
> 6) Restore max_connections to 100 on both servers and restart both
>    servers.
> 
> 7) The primary server happily started but the standby server won't
>    because of the reason below.
> 
> 32695 2015-08-11 13:46:22 JST FATAL:  hot standby is not possible because max_connections = 100 is a lower setting than on the master server (its value was 1100)
> 32695 2015-08-11 13:46:22 JST CONTEXT:  xlog redo parameter change: max_connections=1100 max_worker_processes=8 max_prepared_xacts=10 max_locks_per_xact=64 wal_level=hot_standby wal_log_hints=off
> 32693 2015-08-11 13:46:22 JST LOG:  startup process (PID 32695) exited with exit code 1
> 32693 2015-08-11 13:46:22 JST LOG:  terminating any other active server processes
> 
> I think this is because pg_control on the standby remembers that the
> previous primary server's max_connections = 1100 even if the standby
> server fails to start. Shouldn't we update pg_control file only when
> standby succeeds to start?

I don't think that'd help. There's a WAL record generated that contains
the master's settings (cf. XLogReportParameters()) and when replaying
we check that the local settings are compatible with the master's. So
you'll either have to have higher settings on the standby for at least
one restart or, maybe easier given 4), simply start the standby for a
second with hot_standby = off, and then re-enable it after it has
replayed pending WAL.
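
The sequence can be modelled with a toy sketch, assuming a heavily simplified standby that tracks only max_connections. ParamChangeRecord, StandbyModel, replay_param_changes() and hot_standby_can_start() are hypothetical names for this illustration and do not correspond to server code. The model assumes the standby records the primary's value before checking it, and that the check is skipped while hot_standby is off.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a parameter-change WAL record. */
typedef struct ParamChangeRecord
{
    int max_connections;        /* the primary's setting when it started */
} ParamChangeRecord;

/* Hypothetical, heavily simplified standby state. */
typedef struct StandbyModel
{
    int  local_max_connections;         /* the standby's own setting */
    bool hot_standby;                   /* is hot standby enabled? */
    int  primary_max_connections_seen;  /* last value replayed from WAL */
} StandbyModel;

/*
 * Replay parameter-change records in order.  Each record's value is
 * remembered first and checked afterwards, so a failed replay still leaves
 * the new value recorded.  Returns how many records were replayed without
 * failing the check.
 */
size_t
replay_param_changes(StandbyModel *s, const ParamChangeRecord *recs, size_t n)
{
    for (size_t i = 0; i < n; i++)
    {
        s->primary_max_connections_seen = recs[i].max_connections;

        /* The comparison is only made when hot standby is enabled. */
        if (s->hot_standby &&
            s->local_max_connections < recs[i].max_connections)
            return i;           /* startup would stop with FATAL here */
    }
    return n;
}

/* Would enabling hot standby pass the check against the last value seen? */
bool
hot_standby_can_start(const StandbyModel *s)
{
    return s->local_max_connections >= s->primary_max_connections_seen;
}
```

Feeding this model two records (1100, then 100) with the local setting at 100 stops at the first record while hot_standby is on. With hot_standby off both records are replayed, after which the last value seen is 100 and hot standby can be turned back on.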

Greetings,

Andres Freund



Re: max_connections and standby server

From
Andres Freund
Date:
On 2015-08-11 02:06:53 -0400, Tom Lane wrote:
> Hm.  Surely KnownAssignedXIDs could be resized at need.

It's in shared memory so GetSnapshotData() can access it, so not trivially.

> lock table on the standby, that could be completely occupied by locks
> taken by hot-standby backend processes, so I don't see why we're insisting
> on anything particular as to its size.

The startup process alone needs to be able to hold all the master's
exclusive locks at once since they're WAL logged (and have to be).

I don't think common locks held by other processes are an actual problem
- if max_connections and max_locks_per_xact are the same they can only
hold as many locks as the master could. They'd all conflict with WAL
replay of the exclusive locks anyway.
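
To put rough numbers on this, here is a small sketch assuming the documented sizing rule for the shared lock table, max_locks_per_transaction * (max_connections + max_prepared_transactions). The real server also counts some background processes, so treat the result as an approximation. lock_table_slots() is a hypothetical helper for this illustration.

```c
/*
 * Approximate number of lockable objects the shared lock table is sized
 * for, following the documented rule
 *   max_locks_per_transaction * (max_connections + max_prepared_transactions)
 * Background processes counted by the real server are ignored here.
 */
long
lock_table_slots(int max_locks_per_xact, int max_connections,
                 int max_prepared_xacts)
{
    return (long) max_locks_per_xact *
           ((long) max_connections + (long) max_prepared_xacts);
}
```

With the values from the log in this thread (max_locks_per_xact=64, max_prepared_xacts=10), a primary at max_connections=1100 is sized for about 64 * 1110 = 71040 objects, while a standby at max_connections=100 is sized for about 64 * 110 = 7040.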

Now you probably could create a problematic situation by creating
hundreds of advisory locks or something. But that's a fairly different
scenario from an idle server not being able to replay the primary's WAL
records because it can't keep track of all the locks.


Now you can argue that it's uncommon to hold that many AE locks on the
primary in the first place. But I'm not sure it's true. The most common
reasons I've seen for exceeding locks are dumps and restores - and the
latter is all AELs.

Greetings,

Andres Freund



Re: max_connections and standby server

From
Simon Riggs
Date:
On 11 August 2015 at 06:42, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Tatsuo Ishii <ishii@postgresql.org> writes:
>> I think this is because pg_control on the standby remembers that the
>> previous primary server's max_connections = 1100 even if the standby
>> server fails to start. Shouldn't we update pg_control file only when
>> standby succeeds to start?
>
> Somebody refresh my memory as to why we have this restriction (that is,
> slave's max_connections >= master's max_connections) in the first place?
> Seems like it should not be a necessary requirement, and working towards
> getting rid of it would be far better than any other answer.

That was the consensus on how to control things on the standby, as 9.0 closed.

Yes, there are various other ways of specifying those things and these days they could be made to react more dynamically.

There are various major improvements to hot standby that could have happened, but that time has been spent on the more useful logical replication which is slowly making its way into core. More coming in 9.6.

--
Simon Riggs                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services