Re: archive status ".ready" files may be created too early - Mailing list pgsql-hackers

From Bossart, Nathan
Subject Re: archive status ".ready" files may be created too early
Date
Msg-id F133B7B1-451E-41E9-9C07-B5019F641CBF@amazon.com
Whole thread Raw
In response to Re: archive status ".ready" files may be created too early  (Kyotaro Horiguchi <horikyota.ntt@gmail.com>)
Responses Re: archive status ".ready" files may be created too early  ("Bossart, Nathan" <bossartn@amazon.com>)
List pgsql-hackers
On 12/23/19, 6:09 PM, "Kyotaro Horiguchi" <horikyota.ntt@gmail.com> wrote:
> You're right. That doesn't seem to work. Another thing I had in my
> mind was that XLogWrite had an additional flag so that
> AdvanceXLInsertBuffer can tell not to mark .ready. The function is
> called while it *is* writing a complete record. So even if
> AdvanceXLInsertBuffer inhibit marking .ready the succeeding bytes
> comes soon and we can mark the old segment as .ready at the time.
>
> ..
> + * If record_write == false, we don't mark the last segment as .ready
> + * if the caller requested to write up to segment boundary.
> ..
>   static void
> - XLogWrite(XLogwrtRqst WriteRqst, bool flexible)
> + XLogWrite(XLogwrtRqst WriteRqst, bool flexible, bool record_write)
>
> When XLogWrite is called with record_write = false, we don't mark
> .ready and don't advance lastSegSwitchTime/LSN. At the next time
> XLogWrite is called with record_write=true, if lastSegSwitchLSN is
> behind the latest segment boundary before or equal to
> LogwrtResult.Write, mark the skipped segments as .ready and update
> lastSegSwitchTime/LSN.

Thanks for the suggestion.  I explored this proposal a bit today.
It looks like there are three places where XLogWrite() is called:
AdvanceXLInsertBuffer(), XLogFlush(), and XLogBackgroundFlush().  IIUC
while XLogFlush() generally seems to be used to write complete records
to disk, this might not be true for XLogBackgroundFlush(), and we're
reasonably sure we cannot make such an assumption for
AdvanceXLInsertBuffer().  Therefore, we would likely only set
record_write to true for XLogFlush() and for certain calls to
XLogBackgroundFlush (e.g. flushing asynchronous commits).

I'm worried that this approach could be fragile and that we could end
up waiting an arbitrarily long time before marking segments as ready
for archival.  Even if we pay very close attention to the latest
flushed LSN, it seems possible that a non-record_write call to
XLogWrite() advances things such that we avoid ever calling it with
record_write = true.  For example, XLogBackgroundFlush() may have
flushed the completed blocks, which we cannot assume are complete
records.  Then, XLogFlush() would skip calling XLogWrite() if
LogwrtResult.Flush is sufficiently far ahead.  In this scenario, I
don't think we would mark any eligible segments as ".ready" until the
next call to XLogWrite() with record_write = true, which may never
happen.

The next approach I'm going to try is having the callers of
XLogWrite() manage marking segments ready.  That might make it easier
to mitigate some of my concerns above, but I'm not tremendously
optimistic that this approach will fare any better.

Nathan


pgsql-hackers by date:

Previous
From: Andrew Dunstan
Date:
Subject: Re: [PATCH] Increase the maximum value track_activity_query_size
Next
From: Amit Kapila
Date:
Subject: Re: [HACKERS] Block level parallel vacuum