Re: Re: BUG #13685: Archiving while idle every archive_timeout with wal_level hot_standby - Mailing list pgsql-hackers

From Michael Paquier
Subject Re: Re: BUG #13685: Archiving while idle every archive_timeout with wal_level hot_standby
Date
Msg-id CAB7nPqT81wuVzo1-rOFXTcagLviyX_rhqDPb1XkhfnQ9tTiPRA@mail.gmail.com
In response to Re: Re: BUG #13685: Archiving while idle every archive_timeout with wal_level hot_standby  (Andres Freund <andres@anarazel.de>)
Responses Re: Re: BUG #13685: Archiving while idle every archive_timeout with wal_level hot_standby  (Michael Paquier <michael.paquier@gmail.com>)
List pgsql-hackers
On Wed, Nov 4, 2015 at 7:33 PM, Andres Freund <andres@anarazel.de> wrote:
> On 2015-11-04 16:01:28 +0900, Michael Paquier wrote:
>> On Wed, Nov 4, 2015 at 8:39 AM, Andres Freund <andres@anarazel.de> wrote:
>> > On November 4, 2015 12:37:02 AM GMT+01:00, Michael Paquier wrote:
>> >>On a completely idle system, I don't think we should log any standby
>> >>records. This is what ~9.3 does.
>> >
>> > Are you sure? I think it'll around checkpoints, no? I thought Heikki
>> > had fixed that, but looking around, that doesn't seem to be the case.
>>
>> Er, yes, sorry. I should have used clearer words: I meant idle system
>> with something running nothing including internal checkpoints.
>
> Uh, but you'll always have checkpoints happen on wal_level =
> hot_standby, even in 9.3?  Maybe I'm not parsing your sentence right.

Rereading my previous sentence, I cannot make sense of it myself :)
Well, I just meant that in ~9.3 LogStandbySnapshot() is called at each
checkpoint, and checkpoints occur every checkpoint_timeout even if the
system is idle.

> As soon as a single checkpoint ever happened the early-return logic in
> CreateCheckPoint() will fail to take the LogStandbySnapshot() in
> CreateCheckPoint() into account. The test is:
>     if (curInsert == ControlFile->checkPoint +
>         MAXALIGN(SizeOfXLogRecord + sizeof(CheckPoint)) &&
>         ControlFile->checkPoint == ControlFile->checkPointCopy.redo)
> which obviously doesn't work if there's been a WAL record logged after
> the redo pointer has been determined etc.

Yes. If segment switches are forced more often than
checkpoint_timeout, this check concludes that a checkpoint needs to
happen because an XLOG_SWITCH record sits in between. I am a bit
surprised that this happens, actually: the segment switch triggers a
checkpoint record, and vice versa, even on idle systems. Shouldn't we
make this check a bit smarter, then?
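
To frame the question, here is a toy model of the test quoted above. It
is only a sketch under loud assumptions, not the code in xlog.c: the
ToyWal struct, the toy_* functions and both record sizes are
hypothetical names and values made up for illustration, and the only
thing modeled is that a standby snapshot record lands between the redo
pointer and the checkpoint record.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef uint64_t XLogRecPtr;    /* toy WAL position, in bytes */

/* Hypothetical record sizes, chosen only for illustration. */
enum { TOY_CHECKPOINT_REC_SIZE = 104, TOY_STANDBY_REC_SIZE = 56 };

typedef struct ToyWal
{
    XLogRecPtr  curInsert;      /* current WAL insert position */
    XLogRecPtr  checkPoint;     /* start of the last checkpoint record */
    XLogRecPtr  redo;           /* redo pointer of the last checkpoint */
} ToyWal;

/* The early-return test from CreateCheckPoint(), on toy state. */
static bool
toy_checkpoint_is_skippable(const ToyWal *wal)
{
    return wal->curInsert == wal->checkPoint + TOY_CHECKPOINT_REC_SIZE &&
           wal->checkPoint == wal->redo;
}

/*
 * Toy checkpoint: fix the redo pointer, optionally log a standby
 * snapshot record, then write the checkpoint record. Returns false
 * when the checkpoint is skipped.
 */
static bool
toy_checkpoint(ToyWal *wal, bool log_standby_snapshot)
{
    if (toy_checkpoint_is_skippable(wal))
        return false;

    wal->redo = wal->curInsert;
    if (log_standby_snapshot)
        wal->curInsert += TOY_STANDBY_REC_SIZE;
    wal->checkPoint = wal->curInsert;
    wal->curInsert += TOY_CHECKPOINT_REC_SIZE;

    assert(wal->redo <= wal->checkPoint);
    return true;
}

/* State right after a shutdown checkpoint written at position start. */
static ToyWal
toy_wal_after_shutdown(XLogRecPtr start)
{
    ToyWal      wal = {start + TOY_CHECKPOINT_REC_SIZE, start, start};

    return wal;
}
```

Starting from toy_wal_after_shutdown(), the test holds and the
checkpoint is skipped. Once a single checkpoint has logged a snapshot,
checkPoint no longer equals redo, so every later call to
toy_checkpoint() does the work again even though nothing else was
written. With log_standby_snapshot set to false the test keeps holding
after the first checkpoint.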

> The reason that a single checkpoint is needed to "jumpstart" the
> pointless checkpoints is that otherwise we'll never have issued a
> LogStandbySnapshot() and thus the above code block works if we started
> from a proper shutdown checkpoint.
>
> Independent of the idle issue, it seems to me that the location of the
> LogStandbySnapshot() is actually rather suboptimal - it really should
> be before the CheckPointGuts(), not afterwards. The closer it is to
> the redo pointer of the checkpoint a hot standby node starts up from,
> the sooner that node can reach consistency.  There's no difference for
> the first time a node starts from a basebackup (since we gotta replay
> that checkpoint anyway before we're consistent), but if we start from a
> restartpoint...

Agreed. LogStandbySnapshot() has been called after CheckPointGuts()
since its introduction in efc16ea5. Moving it earlier may save time.
It would surely be a master-only optimization, though.
-- 
Michael


