Re: Possible missing segments in archiving on standby - Mailing list pgsql-hackers

From Kyotaro Horiguchi
Subject Re: Possible missing segments in archiving on standby
Date
Msg-id 20210908.104526.1076005617766473945.horikyota.ntt@gmail.com
Whole thread Raw
In response to Re: Possible missing segments in archiving on standby  (Fujii Masao <masao.fujii@oss.nttdata.com>)
Responses Re: Possible missing segments in archiving on standby
List pgsql-hackers
At Tue, 7 Sep 2021 17:03:06 +0900, Fujii Masao <masao.fujii@oss.nttdata.com> wrote in 
> > + if (XLogArchivingAlways())
> > + XLogArchiveNotify(xlogfilename, true);
> > +                        else
> > + XLogArchiveForceDone(xlogfilename);
> > The path is used both for crash and archive recovery. If we pass there
> > while crash recovery on a primary with archive_mode=on, the file could
> > be marked .done before actually archived. On the other hand when
> > archive_mode=always, the file could be re-marked .ready even after it
> > has been already archived.  Why isn't it XLogArchiveCheckDone?
> 
> Yeah, you're right. ISTM what we should do is to just call
> XLogArchiveCheckDone() for the segment including XLOG_SWITCH record,
> i.e., to create .ready file if the segment has no archive notification
> file yet
> and archive_mode is set to 'always'. Even if we don't do this when we
> reach
> XLOG_SWITCH record, subsequent restartpoints eventually will call
> XLogArchiveCheckDone() for such segments.
> 
> One issue of this approach is that the WAL segment including
> XLOG_SWITCH
> record may be archived before its previous segments are. That is,
> the notification file of current segment is created when it's replayed
> because it includes XLOG_SWIATCH, but the notification files of
> its previous segments will be created by subsequent restartpoints
> because they don't have XLOG_SWITCH. Probably we should avoid this?

Anyway there's no guarantee on the archive ordering. As discussed in
the nearby thread [1], newer segment is often archived earlier. I
agree that that happens mainly on busy servers, though. The archiver
is designed to handle such "gaps" and/or out-of-order segment
notifications.  We could impose a strict ordering on archiving but I
think we would take total performance than such strictness.

In short, no.

> If yes, one approach for this issue is to call XLogArchiveCheckDone()
> for
> not only the segment including XLOG_SWITCH but also all the segments
> older than that. Thought?

At least currently, recovery fetches segments by a single process and
every file is marked immediately after being filled-up, so all files
other than the latest one in pg_wal including history files should
have been marked for sure unless file system gets into a trouble. So I
think we don't need to do that even if we want the strictness.

Addition to that that takes too long time when many segments reside in
pg_wal so I think we never want to run such a scan at every segment
end that recovery passes.  If I remember correctly, the reason we
don't fix archive status at start up but at checkpoint is we avoided
extra startup time.

> Anyway, I extracted the changes in walreceiver from the patch,
> because it's self-contained and can be applied separately.
> Patch attached.

I'm not sure I like that XLogWalRcvClose hides the segment-switch
condition.  If we do that check in the function, I'd like to name the
function XLogWalRcvCloseIfSwitched or something indicates the
condition.  I'd like to invert the condition to reduce indentation,
too.

Why don't we call it just after writing data, as my first proposal
did? There's no difference in functionality between doing that and the
patch.  If we do so, recvFile>=0 is always true and that condition can
be removed but that would be optional.  Anyway, by doing that, no
longer need to call the function twice or we can even live without the
new function.

[1] https://www.postgresql.org/message-id/20210504042755.ehuaoz5blcnjw5yk%40alap3.anarazel.de

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center



pgsql-hackers by date:

Previous
From: Peter Geoghegan
Date:
Subject: Re: The Free Space Map: Problems and Opportunities
Next
From: Kyotaro Horiguchi
Date:
Subject: Re: Remove Value node struct