Re: .ready and .done files considered harmful - Mailing list pgsql-hackers

From Bossart, Nathan
Subject Re: .ready and .done files considered harmful
Msg-id BA908168-9407-4706-BE22-FCE8A1F33562@amazon.com
In response to Re: .ready and .done files considered harmful  (Robert Haas <robertmhaas@gmail.com>)
Responses Re: .ready and .done files considered harmful
List pgsql-hackers
On 9/13/21, 1:14 PM, "Robert Haas" <robertmhaas@gmail.com> wrote:
> On Thu, Sep 2, 2021 at 5:52 PM Bossart, Nathan <bossartn@amazon.com> wrote:
>> Let's say step 1 looks for WAL file 10, but 10.ready doesn't exist
>> yet.  The following directory scan ends up finding 11.ready.  Just
>> before we update the PgArch state, XLogArchiveNotify() is called and
>> creates 10.ready.  However, pg_readyXlog() has already decided to
>> return WAL segment 11 and update the state to look for 12 next.  If we
>> just used '<', we won't force a directory scan, and segment 10 will
>> not be archived until the next one happens.  If we use '<=', I don't
>> think we have the same problem.
>
> The latest post on this thread contained a link to this one, and it
> made me want to rewind to this point in the discussion. Suppose we
> have the following alternative scenario:
>
> Let's say step 1 looks for WAL file 10, but 10.ready doesn't exist
> yet.  The following directory scan ends up finding 12.ready.  Just
> before we update the PgArch state, XLogArchiveNotify() is called and
> creates 11.ready.  However, pg_readyXlog() has already decided to
> return WAL segment 12 and update the state to look for 13 next.
>
> Now, if I'm not mistaken, using <= doesn't help at all.

I think this is the scenario I was trying to touch on in the paragraph
immediately following the one you mentioned.  My theory was that we'd
still skip forcing a directory scan until 10.ready is created, so it
would eventually work out as long as we can safely assume that every
.ready file that should be created eventually will be.  Thinking about
it further, I don't think that's right.  We might've already renamed
10.ready to 10.done and removed it long ago, so there's a chance that
we wouldn't go back and pick up 11.ready until one of the "fallback"
directory scans forced by the checkpointer.  So, yes, I think you are
right.
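A toy Python model of the two race scenarios may make this concrete.  It
is not PostgreSQL code.  Segment numbers are plain integers,
`next_expected` stands in for the archiver's "look for this segment next"
state, and the notify side is assumed to force a directory scan when
`cmp(late, expected)` holds, where `cmp` is `<` or `<=`.  All names here
are hypothetical.

```python
import operator


def late_segment_missed(expected, ready, late, cmp):
    """Return True if the late .ready file is left for a fallback scan.

    expected: segment the archiver looked for first; its .ready file is absent
    ready:    segments whose .ready files exist when the directory is scanned
    late:     segment whose .ready file appears after the scan but before the
              archiver updates its state
    cmp:      operator.lt or operator.le, the notify-side out-of-order test
    """
    # Step 1: the direct lookup fails, so the archiver scans the directory
    # and picks the lowest segment that is ready at that moment.
    assert expected not in ready
    chosen = min(ready)

    # Race window: the notify path runs before the archiver updates its
    # state, so it still compares against the old expectation.
    force_scan = cmp(late, expected)

    # The archiver archives `chosen` and will look for chosen + 1 next.
    next_expected = chosen + 1

    # The archiver only looks forward from next_expected, so a segment
    # behind it is found again only by a forced or fallback directory scan.
    return late < next_expected and not force_scan


# Scenario 1: look for 10, scan finds 11, then 10.ready appears.
print(late_segment_missed(10, {11}, 10, operator.lt))  # True: 10 waits
print(late_segment_missed(10, {11}, 10, operator.le))  # False: scan forced
# Scenario 2: look for 10, scan finds 12, then 11.ready appears.
print(late_segment_missed(10, {12}, 11, operator.le))  # True: '<=' does not help
```

In this model, '<=' fixes the first scenario, but in the second one
segment 11 is missed with either operator.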

> In my opinion, the problem here is that the natural way to ask "is
> this file being archived out of order?" is to ask yourself "is the
> file that I'm marking as ready for archiving now the one that
> immediately follows the last one I marked as ready for archiving?" and
> then invert the result. That is, if I last marked 10 as ready, and now
> I'm marking 11 as ready, then it's in order, but if I'm now marking
> anything else whatsoever, then it's out of order. But that's not what
> this does. Instead of comparing what it's doing now to what it did
> last, it compares what it did now to what the archiver did last.
>
> And it's really not obvious that that's correct. I think that the
> above argument actually demonstrates a flaw in the logic, but even if
> not, or even if it's too small a flaw to be a problem in practice, it
> seems a lot harder to reason about.
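The alternative check described in the quoted text can be sketched the
same way.  The `ReadyNotifier` class below is hypothetical, not an actual
PostgreSQL structure.  It remembers only the last segment the notify side
marked as ready and reports every notification that is not the immediate
successor as out of order.  It assumes notifications are serialized
(e.g. under a lock) and treats the very first notification as out of
order to be safe.

```python
class ReadyNotifier:
    """Hypothetical notify-side state; not an actual PostgreSQL structure.

    Remembers the last segment marked ready and reports any notification
    that is not its immediate successor as out of order. Assumes callers
    serialize notify() calls; concurrency is not modeled.
    """

    def __init__(self, last_notified=None):
        self.last_notified = last_notified

    def notify(self, segno):
        """Record segno as ready; return True if a directory scan should be forced."""
        # With no history, be conservative and force a scan.
        out_of_order = (
            self.last_notified is None or segno != self.last_notified + 1
        )
        self.last_notified = segno
        return out_of_order


in_order = ReadyNotifier(last_notified=9)
print(in_order.notify(10), in_order.notify(11))   # False False

racy = ReadyNotifier(last_notified=9)
print([racy.notify(s) for s in (12, 11, 10)])     # [True, True, True]
```

Replaying the second scenario (12 marked ready, then 11, then 10) flags
all three notifications without consulting the archiver's state, so in
this model the window between the archiver's scan and its state update
does not affect detection.  How the archiver consumes the forced-scan
request is not modeled.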

I certainly agree that comparing against the archiver's state is harder
to reason about.  If we were to go the keep-trying-the-next-file route,
we could probably minimize much of the handling for these rare cases by
relying on the "fallback" directory scans.  Provided we believe these
situations are extremely rare, an occasional extra delay in archiving a
segment might be acceptable.
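To illustrate what relying on the fallback scans costs, here is a toy
simulation of a keep-trying-the-next-file archiver that never goes idle.
Everything in it is an assumption for illustration.  The archiver takes
`next_expected` directly when its .ready file exists and otherwise scans
and takes the lowest ready segment.  Every `fallback_every`-th cycle
stands in for a scan forced by the checkpointer.

```python
def archive_order(stranded, start, cycles, fallback_every=None):
    """Toy keep-trying-the-next-file archiver (hypothetical, not PostgreSQL code).

    stranded:       segment whose .ready file sits behind the archiver's
                    position, i.e. what a missed forced scan leaves behind
    start:          segment the archiver expects next
    cycles:         number of archiver iterations to simulate
    fallback_every: force a directory scan every Nth cycle, standing in for
                    the checkpointer-driven fallback; None disables it
    Returns the segments in the order they were archived.
    """
    ready = {stranded}
    next_expected = start
    produced = start
    archived = []
    for cycle in range(1, cycles + 1):
        # A busy WAL producer adds one new .ready file per cycle, so the
        # archiver always has work and usually hits the fast path.
        ready.add(produced)
        produced += 1
        forced = fallback_every is not None and cycle % fallback_every == 0
        if not forced and next_expected in ready:
            chosen = next_expected      # fast path: no directory listing
        else:
            chosen = min(ready)         # directory scan: lowest ready segment
        ready.remove(chosen)            # "archive" it (.ready -> .done)
        archived.append(chosen)
        next_expected = chosen + 1
    return archived


print(archive_order(11, 13, 5, fallback_every=3))  # [13, 14, 11, 15, 16]
print(archive_order(11, 13, 5))                    # [13, 14, 15, 16, 17]
```

In this model the stranded segment waits at most one fallback interval;
without fallback scans it is never picked up while the archiver stays
busy.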

Nathan

