Home > mailing lists

Re: Skip checkpoint on promoting from streaming replication - Mailing list pgsql-hackers

From	Simon Riggs
Subject	Re: Skip checkpoint on promoting from streaming replication
Date	January 24, 2013 20:44:34
Msg-id	CA+U5nMLfun0KcusiEn7iB1o-HYcGQ8B0VG0HE7U9byJtf9r6Sg@mail.gmail.com Whole thread Raw
In response to	Re: Skip checkpoint on promoting from streaming replication (Heikki Linnakangas <hlinnakangas@vmware.com>)
Responses	Re: Skip checkpoint on promoting from streaming replication Re: Skip checkpoint on promoting from streaming replication
List	pgsql-hackers

Tree view

On 24 January 2013 16:52, Heikki Linnakangas <hlinnakangas@vmware.com> wrote:
> On 24.01.2013 18:24, Simon Riggs wrote:
>>
>> On 6 January 2013 21:58, Simon Riggs<simon@2ndquadrant.com>  wrote:
>>>
>>> I've been torn between the need to remove the checkpoint for speed and
>>>
>>> being worried about the implications of doing so.
>>>
>>> We promote in multiple use cases. When we end a PITR, or are
>>> performing a switchover, it doesn't really matter how long the
>>> shutdown checkpoint takes, so I'm inclined to leave it there in those
>>> cases. For failover, we need fast promotion.
>>>
>>> So my thinking is to make   pg_ctl promote -m fast
>>> be the way to initiate a fast failover that skips the shutdown
>>> checkpoint.
>>>
>>> That way all existing applications work the same as before, while new
>>> users that explicitly choose to do so will gain from the new option.
>>
>>
>> Here's a patch to skip checkpoint when we do
>>
>>    pg_ctl promote -m fast
>>
>> We keep the end of recovery checkpoint in all other cases.
>
>
> Hmm, there seems to be no way to do a "fast" promotion with a trigger file.

True. I thought we were moving away from trigger files to use of "promote"

> I'm a bit confused why there needs to be special mode for this. Can't we
> just always do the "fast" promotion? I agree that there's no urgency when
> you're doing PITR, but shouldn't do any harm either. Or perhaps always do
> "fast" promotion when starting up from standby mode, and "slow" otherwise.
>
> Are we comfortable enough with this to skip the checkpoint after crash
> recovery?

I'm not. Maybe if we get no bugs we can make it do this always, in next release.

It;s fast when it needs to be and safe otherwise.


> I may be missing something, but it looks like after a "fast" promotion, you
> don't request a new checkpoint. So it can take quite a while for the next
> checkpoint to be triggered by checkpoint_timeout/segments. That shouldn't be
> a problem, but I feel that it'd be prudent to request a new checkpoint
> immediately (not necessarily an "immediate" checkpoint, though).

I thought of that and there is a long comment to explain why I didn't.

Two problems:

1) an immediate checkpoint can cause a disk/resource usage spike,
which is definitely not what you need just when a spike of connections
and new SQL hits the system.

2) If we did that, we would have an EndOfRecovery record, some other
records and then a Shutdown checkpoint.
As I right this, (2) is wrong, because we shouldn't do a a Shutdown
checkpoint anyway.

But I still think (1) is a valid concern.

>> The only thing left from Kyotaro's patch is a single line of code -
>> the call to ReadCheckpointRecord() that checks to see if the WAL
>> records for the last two restartpoints is on disk, which was an
>> important line of code.
>
>
> Why's that important, just for paranoia? If the last two restartpoints have
> disappeared, something's seriously wrong, and you will be in trouble e.g if
> you crash at that point. Do we need to be extra paranoid when doing a "fast"
> promotion?

The check is cheap, so what do we gain by skipping the check?

>> Patch implements a new record type XLOG_END_OF_RECOVERY that behaves
>> on replay like a shutdown checkpoint record. I put this back in from
>> my patch because I believe its important that we have a clear place
>> where the WAL history changes timelineId. WAL format change bump
>> required.
>
>
> Agreed, such a WAL record is essential.
>
> At replay, an end-of-recovery record should be a signal to the hot standby
> mechanism that there are no transactions running in the master at that
> point, same as a shutdown checkpoint.

I had a reason why I didn't do that, but it seems to have slipped my mind.

If I can't remember, I'll add it.

-- Simon Riggs                   http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training & Services

pgsql-hackers by date:

From: Andres Freund
Date: 24 January 2013, 20:39:38
Subject: Re: Proposal for Allow postgresql.conf values to be changed via SQL [review]

From: David Fetter
Date: 24 January 2013, 20:51:51
Subject: LATERAL, UNNEST and spec compliance

Re: Skip checkpoint on promoting from streaming replication - Mailing list pgsql-hackers

Previous

Next