Re: Streaming replication - unable to stop the standby - Mailing list pgsql-hackers

From Robert Haas
Subject Re: Streaming replication - unable to stop the standby
Date
Msg-id AANLkTimRQAtM-EEnVYxTgBqQlgAYW74rRAX1PVyP00ko@mail.gmail.com
In response to Re: Streaming replication - unable to stop the standby  (Tom Lane <tgl@sss.pgh.pa.us>)
List pgsql-hackers
On Mon, May 3, 2010 at 2:47 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Robert Haas <robertmhaas@gmail.com> writes:
>> Hmm.  When I committed that patch to fix smart shutdown on the
>> standby, we discussed the fact that the startup process can't simply
>> release its locks and die at shutdown time because the locks it holds
>> prevent other backends from seeing the database in an inconsistent
>> state.  Therefore, if we were to terminate recovery as soon as the
>> smart shutdown request is received, we might never complete, because a
>> backend might be waiting on a lock that will never get released.  If
>> that's really a danger scenario, then it follows that we might also
>> fail to shut down if we can't connect to the primary, because we might
>> not be able to replay enough WAL to release the locks the remaining
>> backends are waiting for.  That sort of looks like what is happening
>> to you, except based on your test scenario I can't figure out where
>> this came from:
>
>> FATAL:  replication terminated by primary server
>
> I suspect you have it right, because my experiments where the standby
> did shut down correctly were all done with an idle master.
>
> Seems like we could go ahead and forcibly kill the startup process *once
> all the standby backends are gone*.  There is then no need to worry
> about not releasing locks, and re-establishing a consistent state when
> we later restart is logic that we have to have anyway.

That's exactly what we already do.  The problem is that smart shutdown
doesn't actually kill off the standby backends; it waits for them to
exit on their own.  But if they're blocked on a lock that's never
going to get released, they never do.
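
To make the hang concrete, here is a toy model of the situation described above.  It is only a sketch, not PostgreSQL code: the names Backend, smart_shutdown and wal_available are invented for illustration, and it assumes that any backend not blocked on a startup-process lock eventually exits on its own.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Backend:
    """Hypothetical stand-in for a standby backend."""
    name: str
    # True if the backend is blocked on a lock held by the startup process.
    waiting_on_startup_lock: bool


def smart_shutdown(backends: List[Backend], wal_available: bool) -> bool:
    """Return True if this toy smart shutdown completes, False if it hangs.

    Assumption: unblocked backends finish and exit on their own.
    """
    if wal_available:
        # Recovery can still replay WAL, so the startup process eventually
        # releases its locks and the blocked backends can proceed.
        for b in backends:
            b.waiting_on_startup_lock = False

    # Smart shutdown never kills backends; it only waits for them.
    # Backends still blocked on a startup-process lock never exit.
    remaining = [b for b in backends if b.waiting_on_startup_lock]

    # The startup process is killed only once no backends remain,
    # so any remaining blocked backend means the shutdown never finishes.
    return len(remaining) == 0


if __name__ == "__main__":
    stuck = [Backend("reader", waiting_on_startup_lock=True)]
    print(smart_shutdown(stuck, wal_available=False))
```

In this model the shutdown hangs exactly when a backend is blocked and no more WAL can be replayed, for example because the primary is unreachable.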

...Robert

