Thread: Streaming replication document improvements

Streaming replication document improvements

From
Fujii Masao
Date:
Hi,

The attached patch improves the documents about SR as follows:

* Improve the description about the setup procedure of SR.
  o Add the link to the recovery.conf parameter if needed.
  o Add the example of recovery.conf setting.
* Document what should be monitored for avoiding the disk full
  failure on the primary.
* Clarify how to safely take an incrementally updated backups.
* Define the recovery.conf parameters as an index term.
* Describe how to use recovery.conf.

Any comments are welcome.

Regards,

--
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center

Attachment

Re: [DOCS] Streaming replication document improvements

From
Heikki Linnakangas
Date:
Fujii Masao wrote:
> ***************
> *** 829,834 **** if (!triggered)
> --- 826,834 ----
>         <para>
>          Set the maximum number of concurrent connections from the standby servers
>          (see <xref linkend="guc-max-wal-senders"> for details).
> +        Since those connections are for superusers,
> +        <xref linkend="guc-superuser-reserved-connections"> should be
> +        set accordingly.
>         </para>
>        </listitem>
>        <listitem>

That's an interesting point, do streaming replication connections
consume superuser_reserved_connections slots? How should
superuser_reserved_connections be set?

> *** a/src/backend/access/transam/recovery.conf.sample
> --- b/src/backend/access/transam/recovery.conf.sample
> ***************
> *** 88,94 ****
>   #recovery_target_timeline = '33'        # number or 'latest'
>   #
>   #---------------------------------------------------------------------------
> ! # LOG-STREAMING REPLICATION PARAMETERS
>   #---------------------------------------------------------------------------
>   #
>   # When standby_mode is enabled, the PostgreSQL server will work as
> --- 88,94 ----
>   #recovery_target_timeline = '33'        # number or 'latest'
>   #
>   #---------------------------------------------------------------------------
> ! # STANDBY SERVER PARAMETERS
>   #---------------------------------------------------------------------------
>   #
>   # When standby_mode is enabled, the PostgreSQL server will work as

Hmm, that makes the HOT STANDBY notice after that section look weird:

> #---------------------------------------------------------------------------
> # HOT STANDBY PARAMETERS
> #---------------------------------------------------------------------------
> #
> # Hot Standby related parameters are listed in postgresql.conf
> #
> #---------------------------------------------------------------------------

Do we need that notice? Maybe just move that sentence to the "STANDBY
SERVER PARAMETERS" section.


I just committed a patch to move around the high-availability sections a
bit. That caused conflicts with this patch, so I picked the changes from
the patch and applied them over the new layout, and I also did a lot of
other editing. So, I committed most parts of this patch (except the
above), with a lot of changes to fix the bit-rot, and also other editing
to my liking. I hope I made it better not worse.

--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com

Re: [DOCS] Streaming replication document improvements

From
Fujii Masao
Date:
On Thu, Apr 1, 2010 at 5:39 AM, Heikki Linnakangas
<heikki.linnakangas@enterprisedb.com> wrote:
> Fujii Masao wrote:
>> ***************
>> *** 829,834 **** if (!triggered)
>> --- 826,834 ----
>>         <para>
>>          Set the maximum number of concurrent connections from the standby servers
>>          (see <xref linkend="guc-max-wal-senders"> for details).
>> +        Since those connections are for superusers,
>> +        <xref linkend="guc-superuser-reserved-connections"> should be
>> +        set accordingly.
>>         </para>
>>        </listitem>
>>        <listitem>
>
> That's an interesting point, do streaming replication connections
> consume superuser_reserved_connections slots?

Yes. Since SR connection is superuser connection, setting
superuser_reserved_connections appropriately would be useful
to prevent non-superuser connections from blocking the connection
from the standby.

> How should
> superuser_reserved_connections be set?

Well.. setting the number of superuser connection required for
maintenance plus max_wal_senders seems to be reasonable.

>> *** a/src/backend/access/transam/recovery.conf.sample
>> --- b/src/backend/access/transam/recovery.conf.sample
>> ***************
>> *** 88,94 ****
>>   #recovery_target_timeline = '33'            # number or 'latest'
>>   #
>>   #---------------------------------------------------------------------------
>> ! # LOG-STREAMING REPLICATION PARAMETERS
>>   #---------------------------------------------------------------------------
>>   #
>>   # When standby_mode is enabled, the PostgreSQL server will work as
>> --- 88,94 ----
>>   #recovery_target_timeline = '33'            # number or 'latest'
>>   #
>>   #---------------------------------------------------------------------------
>> ! # STANDBY SERVER PARAMETERS
>>   #---------------------------------------------------------------------------
>>   #
>>   # When standby_mode is enabled, the PostgreSQL server will work as
>
> Hmm, that makes the HOT STANDBY notice after that section look weird:
>
>> #---------------------------------------------------------------------------
>> # HOT STANDBY PARAMETERS
>> #---------------------------------------------------------------------------
>> #
>> # Hot Standby related parameters are listed in postgresql.conf
>> #
>> #---------------------------------------------------------------------------
>
> Do we need that notice? Maybe just move that sentence to the "STANDBY
> SERVER PARAMETERS" section.

I don't think that the notice is necessary. But I agree to the move.

> I just committed a patch to move around the high-availability sections a
> bit. That caused conflicts with this patch, so I picked the changes from
> the patch and applied them over the new layout, and I also did a lot of
> other editing. So, I committed most parts of this patch (except the
> above), with a lot of changes to fix the bit-rot, and also other editing
> to my liking. I hope I made it better not worse.

Thanks a lot!

doc/src/sgml/monitoring.sgml
> +    In streaming replication (see <xref linkend="streaming-replication"> for details),
> +    WAL sender process is listed in the <command>ps</> display on the primary server.
> +    The form of its command line display is:
> + <screen>
> + postgres: wal sender process <replaceable>user</> <replaceable>host</> streaming <replaceable>location</>
> + </screen>
> +    The user and host items remain the same for the life of the replication connection.
> +    The location indicates how far WAL sender process has sent the WAL.
> +    On the other hand, WAL receiver process is listed in the <command>ps</> display
> +    on the standby server. The form of its command line display is:
> + <screen>
> + postgres: wal receiver process streaming <replaceable>location</>
> + </screen>
> +    The location indicates how far WAL receiver has received the WAL.
> +   </para>
> +
> +   <para>

The original patch includes the above change for monitoring.sgml, but
it's been excluded from the commit. You think that it's not worth applying
the change?

Regards,

--
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center

Re: [DOCS] Streaming replication document improvements

From
Robert Haas
Date:
On Wed, Mar 31, 2010 at 9:50 PM, Fujii Masao <masao.fujii@gmail.com> wrote:
>> That's an interesting point, do streaming replication connections
>> consume superuser_reserved_connections slots?
>
> Yes. Since SR connection is superuser connection, setting
> superuser_reserved_connections appropriately would be useful
> to prevent non-superuser connections from blocking the connection
> from the standby.

I think this is wacky, especially since we'd someday like to have
replication be a separate privilege from superuser.  I think we should
change this.

...Robert


Re: [DOCS] Streaming replication document improvements

From
Fujii Masao
Date:
On Thu, Apr 1, 2010 at 10:52 AM, Robert Haas <robertmhaas@gmail.com> wrote:
> On Wed, Mar 31, 2010 at 9:50 PM, Fujii Masao <masao.fujii@gmail.com> wrote:
>>> That's an interesting point, do streaming replication connections
>>> consume superuser_reserved_connections slots?
>>
>> Yes. Since SR connection is superuser connection, setting
>> superuser_reserved_connections appropriately would be useful
>> to prevent non-superuser connections from blocking the connection
>> from the standby.
>
> I think this is wacky, especially since we'd someday like to have
> replication be a separate privilege from superuser.  I think we should
> change this.

You mean that we should change replication connection not to consume
superuser_reserved_connections slots in 9.0?

Regards,

--
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center

Re: [DOCS] Streaming replication document improvements

From
Robert Haas
Date:
On Wed, Mar 31, 2010 at 9:58 PM, Fujii Masao <masao.fujii@gmail.com> wrote:
> On Thu, Apr 1, 2010 at 10:52 AM, Robert Haas <robertmhaas@gmail.com> wrote:
>> On Wed, Mar 31, 2010 at 9:50 PM, Fujii Masao <masao.fujii@gmail.com> wrote:
>>>> That's an interesting point, do streaming replication connections
>>>> consume superuser_reserved_connections slots?
>>>
>>> Yes. Since SR connection is superuser connection, setting
>>> superuser_reserved_connections appropriately would be useful
>>> to prevent non-superuser connections from blocking the connection
>>> from the standby.
>>
>> I think this is wacky, especially since we'd someday like to have
>> replication be a separate privilege from superuser.  I think we should
>> change this.
>
> You mean that we should change replication connection not to consume
> superuser_reserved_connections slots in 9.0?

Yes.

...Robert


Re: [DOCS] Streaming replication document improvements

From
Fujii Masao
Date:
On Thu, Apr 1, 2010 at 11:00 AM, Robert Haas <robertmhaas@gmail.com> wrote:
> On Wed, Mar 31, 2010 at 9:58 PM, Fujii Masao <masao.fujii@gmail.com> wrote:
>> On Thu, Apr 1, 2010 at 10:52 AM, Robert Haas <robertmhaas@gmail.com> wrote:
>>> On Wed, Mar 31, 2010 at 9:50 PM, Fujii Masao <masao.fujii@gmail.com> wrote:
>>>>> That's an interesting point, do streaming replication connections
>>>>> consume superuser_reserved_connections slots?
>>>>
>>>> Yes. Since SR connection is superuser connection, setting
>>>> superuser_reserved_connections appropriately would be useful
>>>> to prevent non-superuser connections from blocking the connection
>>>> from the standby.
>>>
>>> I think this is wacky, especially since we'd someday like to have
>>> replication be a separate privilege from superuser.  I think we should
>>> change this.
>>
>> You mean that we should change replication connection not to consume
>> superuser_reserved_connections slots in 9.0?
>
> Yes.

Preventing superuser connections from consuming superuser_reserved_connections
slots seems strange for me. So I'm leaning toward just removing superuser
privilege from replication connection again. Thought?

Regards,

--
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center

Re: [DOCS] Streaming replication document improvements

From
Heikki Linnakangas
Date:
Fujii Masao wrote:
> On Thu, Apr 1, 2010 at 11:00 AM, Robert Haas <robertmhaas@gmail.com> wrote:
>> On Wed, Mar 31, 2010 at 9:58 PM, Fujii Masao <masao.fujii@gmail.com> wrote:
>>> You mean that we should change replication connection not to consume
>>> superuser_reserved_connections slots in 9.0?
>> Yes.

I think it's good that walsenders can use the superuser reserved slots,
that way a client that opens max_connections connections can't block out
standby servers from connecting.

> Preventing superuser connections from consuming superuser_reserved_connections
> slots seems strange for me. So I'm leaning toward just removing superuser
> privilege from replication connection again. Thought?

That would be good, but I fear it's a bigger change than we should be
doing at this point.

How about we adjust the backends math a bit:

Currently:

ReservedBackends = superuser_reserved_connections
MaxBackends = max_connections + autovacuum_max_workers + 1;

Proposal:

ReservedBackends = superuser_reserved_connections + max_wal_senders
MaxBackends = max_connections + autovacuum_max_workers + max_wal_senders + 1

So we implicitly reserve a slot and a superuser reserved slot for each
walsender. Walsenders use the slots reserved for superusers, but if you
set superuser_reserved_connections=3, there's still always at least
three slots available for superuser to log in with psql, even if the
maximum number of walsenders are connected.

--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com

Re: [DOCS] Streaming replication document improvements

From
Tom Lane
Date:
Heikki Linnakangas <heikki.linnakangas@enterprisedb.com> writes:
> Fujii Masao wrote:
>> Preventing superuser connections from consuming superuser_reserved_connections
>> slots seems strange for me. So I'm leaning toward just removing superuser
>> privilege from replication connection again. Thought?

> That would be good, but I fear it's a bigger change than we should be
> doing at this point.

Exactly.  Despite Robert's unhappiness, NONE of this should get changed
right now.  When and if we create a separate replication privilege,
it'll be time enough to revisit the connection limit arithmetic.

            regards, tom lane

Re: [DOCS] Streaming replication document improvements

From
Robert Haas
Date:
On Thu, Apr 1, 2010 at 9:09 AM, Heikki Linnakangas
<heikki.linnakangas@enterprisedb.com> wrote:
> Fujii Masao wrote:
>> On Thu, Apr 1, 2010 at 11:00 AM, Robert Haas <robertmhaas@gmail.com> wrote:
>>> On Wed, Mar 31, 2010 at 9:58 PM, Fujii Masao <masao.fujii@gmail.com> wrote:
>>>> You mean that we should change replication connection not to consume
>>>> superuser_reserved_connections slots in 9.0?
>>> Yes.
>
> I think it's good that walsenders can use the superuser reserved slots,
> that way a client that opens max_connections connections can't block out
> standby servers from connecting.
>
>> Preventing superuser connections from consuming superuser_reserved_connections
>> slots seems strange for me. So I'm leaning toward just removing superuser
>> privilege from replication connection again. Thought?
>
> That would be good, but I fear it's a bigger change than we should be
> doing at this point.
>
> How about we adjust the backends math a bit:
>
> Currently:
>
> ReservedBackends = superuser_reserved_connections
> MaxBackends = max_connections + autovacuum_max_workers + 1;
>
> Proposal:
>
> ReservedBackends = superuser_reserved_connections + max_wal_senders
> MaxBackends = max_connections + autovacuum_max_workers + max_wal_senders + 1
>
> So we implicitly reserve a slot and a superuser reserved slot for each
> walsender. Walsenders use the slots reserved for superusers, but if you
> set superuser_reserved_connections=3, there's still always at least
> three slots available for superuser to log in with psql, even if the
> maximum number of walsenders are connected.

That seems pretty reasonable to me.  I haven't checked how much code
impact there is.  I know Tom doesn't think we should change it at all,
but surely pre-beta is the time to fix nasty corner cases that were
added by recently committed patches?

...Robert


Re: [DOCS] Streaming replication document improvements

From
Tom Lane
Date:
Robert Haas <robertmhaas@gmail.com> writes:
> That seems pretty reasonable to me.  I haven't checked how much code
> impact there is.  I know Tom doesn't think we should change it at all,
> but surely pre-beta is the time to fix nasty corner cases that were
> added by recently committed patches?

What nasty corner case?  Having replication connections use superuser
reserved slots seems exactly the behavior I'd expect given that they are
running as superuser.  I agree it would be good to decouple that later,
but we already decided we are not going to try to separate replication
privilege from superuser in 9.0.

(Also, autovacuum workers are a quite separate concept since the DBA
doesn't set them up or deal with them directly.  So I'm unimpressed by
pointing to the treatment of autovacuum_max_workers as a precedent.)

            regards, tom lane

Re: [DOCS] Streaming replication document improvements

From
"Kevin Grittner"
Date:
Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Having replication connections use superuser reserved slots seems
> exactly the behavior I'd expect given that they are running as
> superuser.
It seems within the realm of possibility that not all users would
think to boost superuser_reserved_connections by the number of
replication connections, and be surprised when they are unable to
connect in an emergency.
-Kevin


Re: [DOCS] Streaming replication document improvements

From
Josh Berkus
Date:
On 4/1/10 10:44 AM, Kevin Grittner wrote:
> It seems within the realm of possibility that not all users would
> think to boost superuser_reserved_connections by the number of
> replication connections, and be surprised when they are unable to
> connect in an emergency.

Well, that's easy to add to the documentation.

--
                                  -- Josh Berkus
                                     PostgreSQL Experts Inc.
                                     http://www.pgexperts.com

Re: [DOCS] Streaming replication document improvements

From
Robert Haas
Date:
On Thu, Apr 1, 2010 at 1:46 PM, Josh Berkus <josh@agliodbs.com> wrote:
> On 4/1/10 10:44 AM, Kevin Grittner wrote:
>> It seems within the realm of possibility that not all users would
>> think to boost superuser_reserved_connections by the number of
>> replication connections, and be surprised when they are unable to
>> connect in an emergency.
>
> Well, that's easy to add to the documentation.

It's probably also easy to fix so that it doesn't NEED to be documented.

The thing is, when dealing with new features, we reduce our overall
maintenance burden if we get it right the first time.  Obviously it's
too late for major changes, but minor adjustments to maintain the POLA
seem like exactly what we SHOULD be doing right now.

...Robert


Re: [DOCS] Streaming replication document improvements

From
Josh Berkus
Date:
> The thing is, when dealing with new features, we reduce our overall
> maintenance burden if we get it right the first time.  Obviously it's
> too late for major changes, but minor adjustments to maintain the POLA
> seem like exactly what we SHOULD be doing right now.

Oh, I agree.  Since we have a separate WALSender limit, it seems
counter-intuitive and difficult-to-admin to have the WALSenders also
limited by superuser_connections.  They should be their own separate
connection pool, just like the other "background" processes.

However, if this was somehow infeasible, it wouldn't be hard to
document.  That's all.

--
                                  -- Josh Berkus
                                     PostgreSQL Experts Inc.
                                     http://www.pgexperts.com

Re: [DOCS] Streaming replication document improvements

From
Dimitri Fontaine
Date:
Josh Berkus <josh@agliodbs.com> writes:
> Oh, I agree.  Since we have a separate WALSender limit, it seems
> counter-intuitive and difficult-to-admin to have the WALSenders also
> limited by superuser_connections.  They should be their own separate
> connection pool, just like the other "background" processes.

+1

Regards,
-- 
dim


Re: [DOCS] Streaming replication document improvements

From
Simon Riggs
Date:
On Thu, 2010-04-01 at 11:49 -0700, Josh Berkus wrote:
> > The thing is, when dealing with new features, we reduce our overall
> > maintenance burden if we get it right the first time.  Obviously it's
> > too late for major changes, but minor adjustments to maintain the POLA
> > seem like exactly what we SHOULD be doing right now.
>
> Oh, I agree.  Since we have a separate WALSender limit, it seems
> counter-intuitive and difficult-to-admin to have the WALSenders also
> limited by superuser_connections.  They should be their own separate
> connection pool, just like the other "background" processes.
>
> However, if this was somehow infeasible, it wouldn't be hard to
> document.  That's all.

+1

--
 Simon Riggs           www.2ndQuadrant.com


Re: [DOCS] Streaming replication document improvements

From
Fujii Masao
Date:
On Fri, Apr 2, 2010 at 2:58 AM, Robert Haas <robertmhaas@gmail.com> wrote:
> It's probably also easy to fix so that it doesn't NEED to be documented.
>
> The thing is, when dealing with new features, we reduce our overall
> maintenance burden if we get it right the first time.  Obviously it's
> too late for major changes, but minor adjustments to maintain the POLA
> seem like exactly what we SHOULD be doing right now.

The attached patch implements the Heikki's proposal:

----------
ReservedBackends = superuser_reserved_connections + max_wal_senders
MaxBackends = max_connections + autovacuum_max_workers + max_wal_senders + 1
----------

This change looks like minor adjustments rather than major changes.

I have one question:
In the patch, max_wal_senders is ignored by CheckRequiredParameterValues()
as autovacuum_max_workers is already. Is this OK for HS?

Regards,

--
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center

Attachment

Re: [DOCS] Streaming replication document improvements

From
Robert Haas
Date:
On Fri, Apr 2, 2010 at 2:06 AM, Fujii Masao <masao.fujii@gmail.com> wrote:
> On Fri, Apr 2, 2010 at 2:58 AM, Robert Haas <robertmhaas@gmail.com> wrote:
>> It's probably also easy to fix so that it doesn't NEED to be documented.
>>
>> The thing is, when dealing with new features, we reduce our overall
>> maintenance burden if we get it right the first time.  Obviously it's
>> too late for major changes, but minor adjustments to maintain the POLA
>> seem like exactly what we SHOULD be doing right now.
>
> The attached patch implements the Heikki's proposal:
>
> ----------
> ReservedBackends = superuser_reserved_connections + max_wal_senders
> MaxBackends = max_connections + autovacuum_max_workers + max_wal_senders + 1
> ----------
>
> This change looks like minor adjustments rather than major changes.

Instead of doing this, could we just change the logic in InitPostgres?

Current logic says we hit the connection limit if:

        if (!am_superuser &&
                ReservedBackends > 0 &&
                !HaveNFreeProcs(ReservedBackends))

Couldn't we just change this to:

        if ((!am_superuser || am_walsender) &&
                ReservedBackends > 0 &&
                !HaveNFreeProcs(ReservedBackends))

Seems like that'd be a whole lot simpler, if it'll do the job...

...Robert

Re: [DOCS] Streaming replication document improvements

From
Fujii Masao
Date:
On Tue, Apr 20, 2010 at 11:04 AM, Robert Haas <robertmhaas@gmail.com> wrote:
> Instead of doing this, could we just change the logic in InitPostgres?
>
> Current logic says we hit the connection limit if:
>
>        if (!am_superuser &&
>                ReservedBackends > 0 &&
>                !HaveNFreeProcs(ReservedBackends))
>
> Couldn't we just change this to:
>
>        if ((!am_superuser || am_walsender) &&
>                ReservedBackends > 0 &&
>                !HaveNFreeProcs(ReservedBackends))
>
> Seems like that'd be a whole lot simpler, if it'll do the job...

It's very simple, but prevents superuser replication connection
from being established when connection limit exceeds for
non-superusers. It seems strange to me that superuser cannot use
superuser_reserved_connections slots. If we'd like to forbid
replication connection to use the slots, I think that we should
just get rid of a superuser privilege from it instead.

Regards,

--
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center

Re: [DOCS] Streaming replication document improvements

From
Robert Haas
Date:
On Tue, Apr 20, 2010 at 5:47 AM, Fujii Masao <masao.fujii@gmail.com> wrote:
> On Tue, Apr 20, 2010 at 11:04 AM, Robert Haas <robertmhaas@gmail.com> wrote:
>> Instead of doing this, could we just change the logic in InitPostgres?
>>
>> Current logic says we hit the connection limit if:
>>
>>        if (!am_superuser &&
>>                ReservedBackends > 0 &&
>>                !HaveNFreeProcs(ReservedBackends))
>>
>> Couldn't we just change this to:
>>
>>        if ((!am_superuser || am_walsender) &&
>>                ReservedBackends > 0 &&
>>                !HaveNFreeProcs(ReservedBackends))
>>
>> Seems like that'd be a whole lot simpler, if it'll do the job...
>
> It's very simple, but prevents superuser replication connection
> from being established when connection limit exceeds for
> non-superusers. It seems strange to me that superuser cannot use
> superuser_reserved_connections slots. If we'd like to forbid
> replication connection to use the slots, I think that we should
> just get rid of a superuser privilege from it instead.

Let's just stop for a second and think about why we have
superuser_reserved_connections in the first place.  As I understand
it, the point is that we want to make sure that superusers don't get
locked out of the database, because superuser intervention might be
necessary to recover from whatever series of unfortunate events has
caused all of the connection slots to get used up.  For example, if
there are several different applications that connect to the database,
the superuser might like to log in and see which application has
requested more than its usual allotment of connections, or the
superuser might like to log in and terminate those backends which, in
his judgement, ought to be terminated.  In other words, the purpose of
superuser_reserved_connections is to allow the database to recover
from a bad state that it has gotten into: specifically, a state where
all the connection slots have been used up and regular users can't
connect.

If replication connections can use up superuser_reserved_connections
slots, then it's very possible that this safety valve will fail
completely.  If the server is being flooded with connection attempts,
and one of the streaming replication connection dies, then a regular
backend will immediate grab that slot.  When the streaming replication
slave automatically tries to reconnect, it will now grab one of the
superuser_reserved_connections slots, putting us one step closer to
the bad scenario where there's no way for the superuser to log in
interactively and troubleshoot.

In other words, I don't care whether or not the replication connection
is or is not technically a superuser connection.  What I think is
important is trying to preserve the ability for a superuser to log in
interactively and clean up the mess even when the regular supply of
connections is maxed out.

...Robert

Re: [DOCS] Streaming replication document improvements

From
"Kevin Grittner"
Date:
Robert Haas <robertmhaas@gmail.com> wrote: 
> I don't care whether or not the replication connection is or is
> not technically a superuser connection.  What I think is important
> is trying to preserve the ability for a superuser to log in
> interactively and clean up the mess even when the regular supply
> of connections is maxed out.
+1
-Kevin


Re: [DOCS] Streaming replication document improvements

From
Tom Lane
Date:
Robert Haas <robertmhaas@gmail.com> writes:
> If replication connections can use up superuser_reserved_connections
> slots, then it's very possible that this safety valve will fail
> completely.

Only if replication can use up *all* the superuser_reserved_connections
slots.

            regards, tom lane

Re: [DOCS] Streaming replication document improvements

From
Robert Haas
Date:
On Tue, Apr 20, 2010 at 9:40 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Robert Haas <robertmhaas@gmail.com> writes:
>> If replication connections can use up superuser_reserved_connections
>> slots, then it's very possible that this safety valve will fail
>> completely.
>
> Only if replication can use up *all* the superuser_reserved_connections
> slots.

Sure.  In many cases someone will have 3
superuser_reserved_connections and only 1 wal_sender, so it won't be
an issue.  However, I still think we oughta make this (apparently
one-line) fix so that people will have the number of emergency
superuser connections that they expect to have, rather than some
smaller number that might be 0 if they have a number of SR slaves.

...Robert

Re: [DOCS] Streaming replication document improvements

From
Fujii Masao
Date:
On Tue, Apr 20, 2010 at 7:52 PM, Robert Haas <robertmhaas@gmail.com> wrote:
> Let's just stop for a second and think about why we have
> superuser_reserved_connections in the first place.  As I understand
> it, the point is that we want to make sure that superusers don't get
> locked out of the database, because superuser intervention might be
> necessary to recover from whatever series of unfortunate events has
> caused all of the connection slots to get used up.  For example, if
> there are several different applications that connect to the database,
> the superuser might like to log in and see which application has
> requested more than its usual allotment of connections, or the
> superuser might like to log in and terminate those backends which, in
> his judgement, ought to be terminated.  In other words, the purpose of
> superuser_reserved_connections is to allow the database to recover
> from a bad state that it has gotten into: specifically, a state where
> all the connection slots have been used up and regular users can't
> connect.
>
> If replication connections can use up superuser_reserved_connections
> slots, then it's very possible that this safety valve will fail
> completely.  If the server is being flooded with connection attempts,
> and one of the streaming replication connection dies, then a regular
> backend will immediate grab that slot.  When the streaming replication
> slave automatically tries to reconnect, it will now grab one of the
> superuser_reserved_connections slots, putting us one step closer to
> the bad scenario where there's no way for the superuser to log in
> interactively and troubleshoot.
>
> In other words, I don't care whether or not the replication connection
> is or is not technically a superuser connection.  What I think is
> important is trying to preserve the ability for a superuser to log in
> interactively and clean up the mess even when the regular supply of
> connections is maxed out.

Yeah, I agree with you, but the difference is only how to achieve.
ISTM that there are three choices:

1. Heikki's proposal
> ReservedBackends = superuser_reserved_connections + max_wal_senders
> MaxBackends = max_connections + autovacuum_max_workers + max_wal_senders + 1

2. My proposal
    Remove superuser privilege from replication connection

3. Your proposal
    Treat superuser replication connection like non-superuser one

Since 3. is confusing for me, I like 1. or 2.

Regards,

--
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center

Re: [DOCS] Streaming replication document improvements

From
Robert Haas
Date:
On Tue, Apr 20, 2010 at 9:47 AM, Fujii Masao <masao.fujii@gmail.com> wrote:
> Yeah, I agree with you, but the difference is only how to achieve.
> ISTM that there are three choices:
>
> 1. Heikki's proposal
>> ReservedBackends = superuser_reserved_connections + max_wal_senders
>> MaxBackends = max_connections + autovacuum_max_workers + max_wal_senders + 1

This seemed sensible to me when Heikki first described it, but now it
seems overly complex.

> 2. My proposal
>    Remove superuser privilege from replication connection

I'm not sure this really fixes the problem.  If we add a separate
replication privilege, then presumably superusers will automatically
have that privilege, in accord with our usual policy on such things.
So potentially someone could still set up replication using a
superuser account and then they could still get bitten by this
problem.

> 3. Your proposal
>    Treat superuser replication connection like non-superuser one

Well, only for this one very specific purpose.  I would adjust the
docs like this:

Determines the number of connection "slots" that are reserved for
connections by PostgreSQL  superusers. At most max_connections
connections can ever be active simultaneously. Whenever the number of
active concurrent connections is at least max_connections minus
superuser_reserved_connections, new connections will be accepted only
for superusers, and no new replication connections will be accepted.

I think that's pretty simple and clear.  If we want to burn an extra
sentence explaining what this is all about, we could add:

(If replication connections were permitted to use the reserved
connection slots, an installation with max_wal_senders set to a value
greater than or equal to the value set for
superuser_reserved_connections might find that no reserved connections
remained for interactive access to the database.)

> Since 3. is confusing for me, I like 1. or 2.

What do others think?

...Robert

Re: [DOCS] Streaming replication document improvements

From
Tom Lane
Date:
Robert Haas <robertmhaas@gmail.com> writes:
> On Tue, Apr 20, 2010 at 9:47 AM, Fujii Masao <masao.fujii@gmail.com> wrote:
>> 3. Your proposal
>> � �Treat superuser replication connection like non-superuser one

> Well, only for this one very specific purpose.  I would adjust the
> docs like this:

> Determines the number of connection "slots" that are reserved for
> connections by PostgreSQL  superusers. At most max_connections
> connections can ever be active simultaneously. Whenever the number of
> active concurrent connections is at least max_connections minus
> superuser_reserved_connections, new connections will be accepted only
> for superusers, and no new replication connections will be accepted.

> I think that's pretty simple and clear.

+1.  I'm not sure about the consequences of converting walsender to
non-superuser across the board, and would prefer not to experiment
with it at this stage of the release cycle.

            regards, tom lane

Re: [DOCS] Streaming replication document improvements

From
"Kevin Grittner"
Date:
Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Robert Haas <robertmhaas@gmail.com> writes:
>> Fujii Masao <masao.fujii@gmail.com> wrote:
>>> 3. Your proposal
>>>    Treat superuser replication connection like non-superuser one
>
>> Well, only for this one very specific purpose.  I would adjust
>> the docs like this:
>
>> Determines the number of connection "slots" that are reserved for
>> connections by PostgreSQL  superusers. At most max_connections
>> connections can ever be active simultaneously. Whenever the
>> number of active concurrent connections is at least
>> max_connections minus superuser_reserved_connections, new
>> connections will be accepted only for superusers, and no new
>> replication connections will be accepted.
>
>> I think that's pretty simple and clear.
>
> +1.  I'm not sure about the consequences of converting walsender
> to non-superuser across the board, and would prefer not to
> experiment with it at this stage of the release cycle.

+1

-Kevin

Re: [DOCS] Streaming replication document improvements

From
Tom Lane
Date:
Robert Haas <robertmhaas@gmail.com> writes:
> Current logic says we hit the connection limit if:

>         if (!am_superuser &&
>                 ReservedBackends > 0 &&
>                 !HaveNFreeProcs(ReservedBackends))

> Couldn't we just change this to:

>         if ((!am_superuser || am_walsender) &&
>                 ReservedBackends > 0 &&
>                 !HaveNFreeProcs(ReservedBackends))

As of the patch I just committed, that code is not reached anymore by a
walsender process.  However, it shouldn't be hard to put a similar test
into the walsender code path.

            regards, tom lane

Re: [DOCS] Streaming replication document improvements

From
Fujii Masao
Date:
On Tue, Apr 20, 2010 at 11:01 PM, Robert Haas <robertmhaas@gmail.com> wrote:
>> 3. Your proposal
>>    Treat superuser replication connection like non-superuser one
>
> Well, only for this one very specific purpose.  I would adjust the
> docs like this:
>
> Determines the number of connection "slots" that are reserved for
> connections by PostgreSQL  superusers. At most max_connections
> connections can ever be active simultaneously. Whenever the number of
> active concurrent connections is at least max_connections minus
> superuser_reserved_connections, new connections will be accepted only
> for superusers, and no new replication connections will be accepted.
>
> I think that's pretty simple and clear.

Yeah, I'm sold.

Regards,

--
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center

Re: [DOCS] Streaming replication document improvements

From
Robert Haas
Date:
On Tue, Apr 20, 2010 at 7:53 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Robert Haas <robertmhaas@gmail.com> writes:
>> Current logic says we hit the connection limit if:
>
>>         if (!am_superuser &&
>>                 ReservedBackends > 0 &&
>>                 !HaveNFreeProcs(ReservedBackends))
>
>> Couldn't we just change this to:
>
>>         if ((!am_superuser || am_walsender) &&
>>                 ReservedBackends > 0 &&
>>                 !HaveNFreeProcs(ReservedBackends))
>
> As of the patch I just committed, that code is not reached anymore by a
> walsender process.  However, it shouldn't be hard to put a similar test
> into the walsender code path.

Thanks for the heads up.  It doesn't look hard to put a similar test
in the walsender code path, but is there any reason to duplicate the
code?  Seems like we might be able to just put this test (with the
necessary modification) right before this comment:

    /*
     * If walsender, we're done here --- we don't want to connect to any
     * particular database.
     */

In fact, in some ways, it seems better to put it up there.  If the
database is really being flooded with connection attempts, we want to
ephemerally consume a backend slot for as little time as possible...

...Robert

Re: [DOCS] Streaming replication document improvements

From
Tom Lane
Date:
Robert Haas <robertmhaas@gmail.com> writes:
> Thanks for the heads up.  It doesn't look hard to put a similar test
> in the walsender code path, but is there any reason to duplicate the
> code?  Seems like we might be able to just put this test (with the
> necessary modification) right before this comment:

Hm, actually I think you're right: we could move both of those
connection-rejecting tests up to before the walsender exit.  The
only extra state we need is ReservedBackends, which should be valid
at that point (in particular, it can't be affected by any process-local
GUC settings).

+1 for just moving the test.

            regards, tom lane

Re: [DOCS] Streaming replication document improvements

From
Robert Haas
Date:
On Wed, Apr 21, 2010 at 12:20 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Robert Haas <robertmhaas@gmail.com> writes:
>> Thanks for the heads up.  It doesn't look hard to put a similar test
>> in the walsender code path, but is there any reason to duplicate the
>> code?  Seems like we might be able to just put this test (with the
>> necessary modification) right before this comment:
>
> Hm, actually I think you're right: we could move both of those
> connection-rejecting tests up to before the walsender exit.  The

I am guessing that by "both of those connection-rejecting tests", you
mean the one that can throw "must be superuser to connect during
database shutdown" as well as the one we were discussing, which can
throw "connection limit exceeded for non-superusers", in which case...

> only extra state we need is ReservedBackends, which should be valid
> at that point (in particular, it can't be affected by any process-local
> GUC settings).
>
> +1 for just moving the test.

...shouldn't we move the "tests", plural, rather than just the one?
It seems right to reject new SR connections during shutdown.

...Robert

Re: [DOCS] Streaming replication document improvements

From
Tom Lane
Date:
Robert Haas <robertmhaas@gmail.com> writes:
> ...shouldn't we move the "tests", plural, rather than just the one?
> It seems right to reject new SR connections during shutdown.

Yeah; you'd also need to adjust both of them to consider am_walsender.
(IOW, we want to treat SR connections as non-superuser for both tests.)

            regards, tom lane