Re: 9.2.3 crashes during archive recovery - Mailing list pgsql-hackers

From Fujii Masao
Subject Re: 9.2.3 crashes during archive recovery
Date
Msg-id CAHGQGwHHci4daMLxJqoxgcJxyo8ZeH3hmQ3kJHaB2r5FCPaUSw@mail.gmail.com
In response to Re: 9.2.3 crashes during archive recovery  (Heikki Linnakangas <hlinnakangas@vmware.com>)
Responses Re: 9.2.3 crashes during archive recovery
List pgsql-hackers
On Thu, Feb 14, 2013 at 5:15 AM, Heikki Linnakangas
<hlinnakangas@vmware.com> wrote:
> On 13.02.2013 17:02, Tom Lane wrote:
>>
>> Heikki Linnakangas<hlinnakangas@vmware.com>  writes:
>>>
>>> At least in back-branches, I'd call this a pilot error. You can't turn a
>>> master into a standby just by creating a recovery.conf file. At least
>>> not if the master was not shut down cleanly first.
>>> ...
>>> I'm not sure that's worth the trouble, though. Perhaps it would be
>>> better to just throw an error if the control file state is
>>> DB_IN_PRODUCTION and a recovery.conf file exists.
>>
>>
>> +1 for that approach, at least until it's clear there's a market for
>> doing this sort of thing.  I think the error check could be
>> back-patched, too.
>
>
> Hmm, I just realized a little problem with that approach. If you take a base
> backup using an atomic filesystem backup from a running server, and start
> archive recovery from that, that's essentially the same thing as Kyotaro's
> test case.

Yes. The resource agent for streaming replication in Pacemaker (the open-source
clusterware) relies on that archive recovery scenario, too. When it starts up
the server, it always creates recovery.conf and starts the server as a standby.
It cannot start the master directly; in other words, the server always becomes
the master by being promoted from a standby. So when it restarts the server
after a crash, it runs the same recovery scenario that Kyotaro described
(i.e., it forces archive recovery instead of crash recovery).

The reason the resource agent cannot start the master directly is that it
manages three server states: Master, Slave and Down. It can move the server
from Down to Slave and back, and from Slave to Master and back, but there is
no way to move directly between Down and Master. I think this kind of state
transition model is an isolated case among clusterware.
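
To make the transition model concrete, here is a minimal sketch of it in
Python. It is only an illustration of the three states and four allowed moves
described above, not the resource agent's actual code; the names State,
ALLOWED, can_move and shortest_path are hypothetical.

```python
from collections import deque
from enum import Enum


class State(Enum):
    DOWN = "Down"
    SLAVE = "Slave"
    MASTER = "Master"


# Direct transitions the agent supports, per the description above.
# (Down, Master) and (Master, Down) are deliberately absent.
ALLOWED = frozenset({
    (State.DOWN, State.SLAVE),
    (State.SLAVE, State.DOWN),
    (State.SLAVE, State.MASTER),
    (State.MASTER, State.SLAVE),
})


def can_move(src, dst):
    """Return True if the agent can move directly from src to dst."""
    return (src, dst) in ALLOWED


def shortest_path(src, dst):
    """Breadth-first search for the shortest chain of allowed moves."""
    queue = deque([[src]])
    seen = {src}
    while queue:
        route = queue.popleft()
        if route[-1] == dst:
            return route
        for a, b in ALLOWED:
            if a == route[-1] and b not in seen:
                seen.add(b)
                queue.append(route + [b])
    return None  # dst is unreachable from src


if __name__ == "__main__":
    # Bringing a crashed master back always passes through Slave.
    print([s.value for s in shortest_path(State.DOWN, State.MASTER)])
```

In this model the only route from Down to Master goes through Slave, which is
why a crashed master is restarted with recovery.conf in place.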

Regards,

-- 
Fujii Masao


