On Thu, 2010-04-01 at 06:48 -0400, Robert Haas wrote:
> On Thu, Apr 1, 2010 at 4:42 AM, Fujii Masao <masao.fujii@gmail.com> wrote:
> > On Thu, Apr 1, 2010 at 12:16 AM, Robert Haas <robertmhaas@gmail.com> wrote:
> >> On Wed, Mar 31, 2010 at 5:02 AM, Simon Riggs <simon@2ndquadrant.com> wrote:
> >>>> > From what I have seen, the comment about PM_WAIT_BACKENDS is incorrect.
> >>>> > "backends might be waiting for the WAL record that conflicts with their
> >>>> > queries to be replayed". Recovery sometimes waits for backends, but
> >>>> > backends never wait for recovery.
> >>>>
> >>>> Really? As Heikki explained before, backends might wait for the lock
> >>>> taken by the startup process.
> >>>> http://archives.postgresql.org/pgsql-hackers/2010-01/msg02984.php
> >>>
> >>> Backends wait for locks, yes, but they could be waiting for user locks
> >>> also. That is not "waiting for the WAL record", that concept does not
> >>> exist.
> >>
> >> Hmm... this is a good point, on two levels. First, the comment is not
> >> as well-phrased as it could be. Second, I wonder why we can't kill
> >> the startup process and WAL receiver right away, and then wait for the
> >> backends to die off afterwards.
> >
> > I tested whether killing the startup process and walreceiver releases
> > the lock which the backends are waiting for. Unfortunately it doesn't,
> > and the backends got stuck on my machine. Is it a bug that the startup
> > process shuts down without releasing the lock?
>
> I think that what this shows is that the original design of Hot
> Standby didn't contemplate ever having Hot Standby up without the
> startup process running. In retrospect, maybe we want to allow that,
> because a smart shutdown would be more likely to complete in a timely
> fashion if we stopped replication first and then waited for the
> backends to die rather than waiting for the backends to die first and
> then stopping replication. That's because, for so long as replication
> continues, it may take new locks as well as releasing old ones, to say
> nothing of using other system resources like CPU and I/O bandwidth.
> But, for 9.0, I'm not sure we have any real choice, unless making the
> startup process release locks when it goes away is a very simple
> change. Assuming that's not the case, I think we should apply this
> patch with some updates to the comments, document how it works and
> that it may change in a future release, and add a TODO for 9.1.

I'm not willing to investigate this further myself at this stage. This
looks like risk for little benefit.

--
Simon Riggs           www.2ndQuadrant.com