Re: Sync Rep v19 - Mailing list pgsql-hackers

From Fujii Masao
Subject Re: Sync Rep v19
Date
Msg-id AANLkTiko6-COABo+oVnRJ+t6Vh99FvYAM3Seu30=tnef@mail.gmail.com
In response to Re: Sync Rep v19  (Robert Haas <robertmhaas@gmail.com>)
List pgsql-hackers
On Mon, Mar 7, 2011 at 4:54 AM, Robert Haas <robertmhaas@gmail.com> wrote:
> On Mar 6, 2011, at 9:44 AM, Fujii Masao <masao.fujii@gmail.com> wrote:
>> On Sun, Mar 6, 2011 at 5:02 PM, Yeb Havinga <yebhavinga@gmail.com> wrote:
>>> On Sun, Mar 6, 2011 at 8:58 AM, Fujii Masao <masao.fujii@gmail.com> wrote:
>>>>
>>>> If unfortunately all connection slots are used by backends waiting for
>>>> replication, we cannot execute such a function. So it makes more sense
>>>> to introduce something like "pg_ctl standalone" command?
>>>
>>> If it is only for shutdown, maybe pg_ctl stop -m standalone?
>>
>> It's for not only shutdown but also running the primary in standalone mode.
>> So something like "pg_ctl standalone" is better.
>>
>> For now I think that pg_ctl command is better than built-in function because
>> sometimes we might want to wake waiters up even during shutdown in
>> order to cause shutdown to end. During shutdown, the server doesn't
>> accept any new connection (even from the standby). So, without something
>> like "pg_ctl standalone", there is no way to cause shutdown to end.
>
> This sounds like an awful hack to work around a bad design. Surely once
> shutdown reaches a point where new replication connections can no longer
> be accepted, any standbys hung on commit need to close the connection
> without responding to the COMMIT, per previous discussion.  It's completely
> unreasonable for sync rep to break the shutdown sequence.

Yeah, let's think about how shutdown should work. I'd like to propose the
following. Thoughts?

* Smart shutdown
Smart shutdown should wait until all the waiting backends have been acked,
and should not force them to exit. But this would leave shutdown stuck
indefinitely if no walsender is running at that time. To let those backends
be acked even in that situation, we need to change postmaster so that it
accepts replication connections even during smart shutdown (until we reach
the PM_SHUTDOWN_2 state). Postmaster already accepts superuser connections
during smart shutdown so that a backup can be canceled. So I don't think
that accepting replication connections during smart shutdown is so ugly.
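
A minimal sketch of the admission rule described above, not the actual
postmaster code. The enum names and the function sketch_can_accept() are
hypothetical and only illustrate the proposal: during smart shutdown,
replication connections are still accepted (like the superuser connections
already allowed for canceling a backup) until the PM_SHUTDOWN_2 stage is
reached.

```c
#include <stdbool.h>

/* Hypothetical, simplified stand-ins for postmaster states.
 * These are NOT the real pmState values. */
typedef enum
{
    SKETCH_PM_RUN,              /* normal operation */
    SKETCH_PM_SMART_SHUTDOWN,   /* smart shutdown requested, before PM_SHUTDOWN_2 */
    SKETCH_PM_SHUTDOWN_2        /* stands in for PM_SHUTDOWN_2 and later */
} SketchPMState;

/* Hypothetical classification of an incoming connection. */
typedef enum
{
    SKETCH_CONN_NORMAL,
    SKETCH_CONN_SUPERUSER,
    SKETCH_CONN_REPLICATION
} SketchConnKind;

/* Decide whether a new connection may be accepted in the given state. */
bool
sketch_can_accept(SketchPMState state, SketchConnKind kind)
{
    switch (state)
    {
        case SKETCH_PM_RUN:
            return true;
        case SKETCH_PM_SMART_SHUTDOWN:
            /* Superuser connections are already allowed (to cancel a
             * backup); the proposal adds replication connections so that
             * waiting backends can still be acked. */
            return kind == SKETCH_CONN_SUPERUSER ||
                   kind == SKETCH_CONN_REPLICATION;
        case SKETCH_PM_SHUTDOWN_2:
            return false;
    }
    return false;
}
```

With this rule a standby can still connect while smart shutdown is waiting,
so the waiting backends have a chance to be acked and the shutdown can
proceed.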

* Fast shutdown
I agree with you about fast shutdown. Fast shutdown should cause all the
backends, including waiting ones, to exit immediately. At that point, a
non-acked backend must not return success, according to the definition of
sync rep. So we need to change the backend so that, when it receives
SIGTERM, it removes itself from the waiting queue and exits before
returning success. With this change, waiting backends will do the same
when pg_terminate_backend is called. But since they've not been acked
yet, it seems reasonable to prevent them from returning the COMMIT.
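
A minimal sketch of the backend-side behavior described above, not the
actual sync rep code. The types and functions (SketchWaiter, SketchQueue,
sketch_wait_step and so on) are hypothetical, and the flag
terminate_requested stands in for "SIGTERM has been received". The sketch
assumes that an ack which has already arrived takes precedence over a
pending termination request.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical waiter entry; the real queue lives in shared memory. */
typedef struct SketchWaiter
{
    struct SketchWaiter *prev;
    struct SketchWaiter *next;
    bool        acked;                  /* standby has acknowledged the commit */
    bool        terminate_requested;    /* stand-in for a received SIGTERM */
} SketchWaiter;

/* Circular doubly linked list with a sentinel head. */
typedef struct
{
    SketchWaiter head;
} SketchQueue;

typedef enum
{
    SKETCH_STILL_WAITING,
    SKETCH_COMMIT_ACKED,        /* safe to return success to the client */
    SKETCH_EXIT_WITHOUT_REPLY   /* exit without returning the COMMIT */
} SketchWaitResult;

void
sketch_queue_init(SketchQueue *q)
{
    q->head.next = q->head.prev = &q->head;
}

bool
sketch_queue_is_empty(const SketchQueue *q)
{
    return q->head.next == &q->head;
}

/* Append a waiter at the tail of the queue. */
void
sketch_queue_add(SketchQueue *q, SketchWaiter *w)
{
    w->acked = false;
    w->terminate_requested = false;
    w->prev = q->head.prev;
    w->next = &q->head;
    q->head.prev->next = w;
    q->head.prev = w;
}

/* Unlink a waiter; it must currently be in a queue. */
void
sketch_queue_remove(SketchWaiter *w)
{
    assert(w->prev != NULL && w->next != NULL);
    w->prev->next = w->next;
    w->next->prev = w->prev;
    w->prev = w->next = NULL;
}

/* One iteration of the wait loop for a backend waiting on replication. */
SketchWaitResult
sketch_wait_step(SketchWaiter *w)
{
    if (w->acked)
    {
        sketch_queue_remove(w);
        return SKETCH_COMMIT_ACKED;
    }
    if (w->terminate_requested)
    {
        /* Not acked yet: leave the queue and exit without success. */
        sketch_queue_remove(w);
        return SKETCH_EXIT_WITHOUT_REPLY;
    }
    return SKETCH_STILL_WAITING;
}
```

A caller would loop on sketch_wait_step() and, on
SKETCH_EXIT_WITHOUT_REPLY, terminate the backend without sending the commit
response to the client.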

Comments? I'll create the patch barring objections.

Regards,

--
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center

