Re: Patch for reserved connections for replication users - Mailing list pgsql-hackers

From Amit Kapila
Subject Re: Patch for reserved connections for replication users
Date
Msg-id CAA4eK1LxK75xhMVfEPwaKTb+J2C15FxYf1rNry=8-CA8zMvtsA@mail.gmail.com
Whole thread Raw
In response to Re: Patch for reserved connections for replication users  (Gibheer <gibheer@zero-knowledge.org>)
Responses Re: Patch for reserved connections for replication users  (Amit Kapila <amit.kapila16@gmail.com>)
List pgsql-hackers
On Wed, Oct 16, 2013 at 4:30 AM, Gibheer <gibheer@zero-knowledge.org> wrote:
> On Mon, 14 Oct 2013 11:52:57 +0530
> Amit Kapila <amit.kapila16@gmail.com> wrote:
>
>> On Sun, Oct 13, 2013 at 2:08 PM, Gibheer <gibheer@zero-knowledge.org>
>> wrote:
>> > On Sun, 13 Oct 2013 11:38:17 +0530
>> > Amit Kapila <amit.kapila16@gmail.com> wrote:
>> >
>> >> On Thu, Oct 10, 2013 at 3:17 AM, Gibheer
>> >> <gibheer@zero-knowledge.org> wrote:
>> >> > On Mon, 7 Oct 2013 11:39:55 +0530
>> >> > Amit Kapila <amit.kapila16@gmail.com> wrote:
>> >> >> Robert Haas wrote:
>> >> >> On Mon, Aug 5, 2013 at 2:04 AM, Andres Freund
>> >> >> <andres(at)2ndquadrant(dot)com> wrote:
>> >> >> >>> Hmm.  It seems like this match is making MaxConnections no
>> >> >> >>> longer mean the maximum number of connections, but rather
>> >> >> >>> the maximum number of non-replication connections.  I don't
>> >> >> >>> think I support that definitional change, and I'm kinda
>> >> >> >>> surprised if this is sufficient to implement it anyway
>> >> >> >>> (e.g. see InitProcGlobal()).
>> >> >> >
>> >> >> >> I don't think the implementation is correct, but why don't
>> >> >> >> you like the definitional change? The set of things you can
>> >> >> >> do from replication connections are completely different
>> >> >> >> from a normal connection. So using separate "pools" for them
>> >> >> >> seems to make sense. That they end up allocating similar
>> >> >> >> internal data seems to be an implementation detail to me.
>> >> >>
>> >> >> > Because replication connections are still "connections".  If I
>> >> >> > tell the system I want to allow 100 connections to the server,
>> >> >> > it should allow 100 connections, not 110 or 95 or any other
>> >> >> > number.
>> >> >>
>> >> >> I think that to reserve connections for replication, mechanism
>> >> >> similar to superuser_reserved_connections be used rather than
>> >> >> auto vacuum workers or background workers.
>> >> >> This won't change the definition of MaxConnections. Another
>> >> >> thing is that rather than introducing new parameter for
>> >> >> replication reserved connections, it is better to use
>> >> >> max_wal_senders as it can serve the purpose.
>> >> >>
>> >> >> Review for replication_reserved_connections-v2.patch,
>> >> >> considering we are going to use mechanism similar to
>> >> >> superuser_reserved_connections and won't allow definition of
>> >> >> MaxConnections to change.
>> >> >>
>> >>
>> >> >
>> >> > Hi,
>> >> >
>> >> > I took the time and reworked the patch with the feedback till
>> >> > now. Thank you very much Amit!
>> >> >
>> >> > So this patch uses max_wal_senders together with the idea of the
>> >> > first patch I sent. The error messages are also adjusted to make
>> >> > it obvious, how it is supposed to be and all checks work, as far
>> >> > as I could tell.
>> >>
>> >> If I understand correctly, now the patch has implementation such
>> >> that a. if the number of connections left are (ReservedBackends +
>> >> max_wal_senders), then only superusers or replication connection's
>> >> will be allowed
>> >> b. if the number of connections left are ReservedBackend, then only
>> >> superuser connections will be allowed.
>> >
>> > That is correct.
>> >
>> >> So it will ensure that max_wal_senders is used for reserving
>> >> connection slots from being used by non-super user connections. I
>> >> find new usage of max_wal_senders acceptable, if anyone else thinks
>> >> otherwise, please let us know.
>> >>
>> >>
>> >> 1.
>> >> +        <varname>superuser_reserved_connections</varname>
>> >> +        <varname>max_wal_senders</varname> only superuser and WAL
>> >> connections
>> >> +        are allowed.
>> >>
>> >> Here minus seems to be missing before max_wal_senders and I think
>> >> it will be better to use replication connections rather than WAL
>> >> connections.
>> >
>> > This is fixed.
>> >
>> >> 2.
>> >> -        new replication connections will be accepted.
>> >> +        new WAL or other connections will be accepted.
>> >>
>> >> I think as per new implementation, we don't need to change this
>> >> line.
>> >
>> > I reverted that change.
>> >
>> >> 3.
>> >> + * reserved slots from max_connections for wal senders. If the
>> >> number of free
>> >> + * slots (max_connections - max_wal_senders) is depleted.
>> >>
>> >>  Above calculation (max_connections - max_wal_senders) needs to
>> >> include super user reserved connections.
>> >
>> > My first thought was, that I would not add it here. When superuser
>> > reserved connections are not set, then only max_wal_senders would
>> > count.
>> > But you are right, it has to be set, as 3 connections are reserved
>> > by default for superusers.
>>
>> + * slots (max_connections - superuser_reserved_connections -
>> max_wal_senders) here it should be ReservedBackends rather than
>> superuser_reserved_connections.
>
> fixed
>
>> >> 4.
>> >> + /*
>> >> + * Although replication connections currently require superuser
>> >> privileges, we
>> >> + * don't allow them to consume the superuser reserved slots, which
>> >> are
>> >> + * intended for interactive use.
>> >>   */
>> >>   if ((!am_superuser || am_walsender) &&
>> >>   ReservedBackends > 0 &&
>> >>   !HaveNFreeProcs(ReservedBackends))
>> >>   ereport(FATAL,
>> >>   (errcode(ERRCODE_TOO_MANY_CONNECTIONS),
>> >> - errmsg("remaining connection slots are reserved for
>> >> non-replication superuser connections")));
>> >> + errmsg("remaining connection slots are reserved for superuser
>> >> connections")));
>> >>
>> >> Will there be any problem if we do the above check before the check
>> >> for wal senders and reserved replication connections (+
>> >> !HaveNFreeProcs(max_wal_senders + ReservedBackends))) and don't
>> >> change the error message in this check. I think this will ensure
>> >> that users who doesn't enable max_wal_senders will see the same
>> >> error message as before and the purpose to reserve connections for
>> >> replication can be served by your second check.
>> >
>> > I have attached two patches, one that checks only max_wal_senders
>> > first and the other checks reserved_backends first. Both return the
>> > original message, when max_wal_sender is not set and I think it is
>> > only a matter of taste, which comes first.
>> > Me, I would prefer max_wal_senders to get from more connections to
>> > less.
>>
>> I think there is no major problem even if we keep max_wal_senders
>> check first, but there could be error message inconsistency. Please
>> consider below scenario:
>> max_connections=5
>> superuser_reserved_connections=3
>> max_wal_senders = 2
>>
>> 1. Start primary database server M1
>> 2. Start first standby S1
>> 3. Start second standby S2
>> 4. Now try to connect by non-super user to M1 -- here it will return
>> error msg as "psql: FATAL:  remaining connection slots are reserved
>> for replication
>>     and superuser connections"
>>     By above error message, it seems that replication connection is
>> allowed, but actually it will give error for new replication
>> connection, see next step
>> 5. Start third standby S3 -- here an appropriate error message "FATAL:
>>  could not connect to the primary server: FATAL:  remaining connection
>>     slots are reserved for non-replication superuser connections"
>>
>> I feel there is minor inconsistency in step-4 if we use
>> max_wal_senders check before ReservedBackends check.
>
> Yes, that is rather annoying. I fixed that with the other hint, that
> there should be a check for ReservedBackends + max_wal_senders <
> max_connections to have at least one free connection.
>
>> >> 5.
>> >> + gettext_noop("Sets the maximum number wal sender connections and
>> >> reserves them."),
>> >>
>> >> Sets the maximum number 'of' wal sender connections and reserves
>> >> them. a. 'of' is missing in above line.
>> >> b. I think 'reserves them' is not completely right, because super
>> >> user connections will be allowed. How about if we add something
>> >> similar to 'and reserves them from being used by non-superuser
>> >> connections' in above line.
>> >
>> > This is fixed.
>> + gettext_noop("Sets the maximum number of wal sender connections and
>> reserves "
>> + "them from being used non-superuser connections."),
>>
>> "them from being used 'by' non-superuser connections."
>> 'by' is missing in above line.
>>
>> 1.
>> if (ReservedBackends >= MaxConnections)
>> {
>> write_stderr("%s: superuser_reserved_connections must be less than
>> max_connections\n", progname);
>> ExitPostmaster(1);
>> }
>>
>> I think we should have one check similar to above for (max_wal_senders
>> + ReservedBackends >= MaxConnections) to avoid server starting
>> with values, where not even 1 non-superuser connection will be
>> allowed. I think this is a reasoning similar to why the check for
>> ReservedBackends exists.
>
> I have added this, see comment above.
>
>> With Regards,
>> Amit Kapila.
>> EnterpriseDB: http://www.enterprisedb.com
>>
>
> I would be glad, if you could also test the patch again, as I'm nearly
> code blind after testing it for 4 hours.
> I had the problem, that I could not attach as many replication
> connections as I wanted, as they were denied as normal connections. I
> think I got it fixed, but I'm not 100% sure at the moment.
> After some sleep, I will read the code again and test it again, to make
> sure, it really does what it is supposed to do.

You have forgotten to attach the patch. However, now it is important
to first get the consensus on approach to do this feature, currently
there are 3 approaches:
1. Have replication_reserved_connections as a separate parameter to
reserve connections for replication
2. Consider max_wal_sender to reserve connections for replication
3. Treat replication connections as a pool outside max_connections

Apart from above approaches, we need to think how user can view the
usage of connections, as pg_stat_activity doesn't show replication
connections, so its difficult for user to see how the connections are
used.

I am really not sure what is best way to goahead from here, but I
think it might be helpful if we can study some use cases or how other
databases solve this problem.

With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com



pgsql-hackers by date:

Previous
From: James Sewell
Date:
Subject: Re: [PATCH] Add an ldapoption to disable chasing LDAP referrals
Next
From: Stephen Frost
Date:
Subject: Re: FDW API / flow charts for the docs?