Re: .ready and .done files considered harmful - Mailing list pgsql-hackers
From | Bossart, Nathan |
---|---|
Subject | Re: .ready and .done files considered harmful |
Date | |
Msg-id | BA908168-9407-4706-BE22-FCE8A1F33562@amazon.com |
In response to | Re: .ready and .done files considered harmful (Robert Haas <robertmhaas@gmail.com>) |
Responses | Re: .ready and .done files considered harmful |
List | pgsql-hackers |
On 9/13/21, 1:14 PM, "Robert Haas" <robertmhaas@gmail.com> wrote:
> On Thu, Sep 2, 2021 at 5:52 PM Bossart, Nathan <bossartn@amazon.com> wrote:
>> Let's say step 1 looks for WAL file 10, but 10.ready doesn't exist
>> yet. The following directory scan ends up finding 11.ready. Just
>> before we update the PgArch state, XLogArchiveNotify() is called and
>> creates 10.ready. However, pg_readyXlog() has already decided to
>> return WAL segment 11 and update the state to look for 12 next. If we
>> just used '<', we won't force a directory scan, and segment 10 will
>> not be archived until the next one happens. If we use '<=', I don't
>> think we have the same problem.
>
> The latest post on this thread contained a link to this one, and it
> made me want to rewind to this point in the discussion. Suppose we
> have the following alternative scenario:
>
> Let's say step 1 looks for WAL file 10, but 10.ready doesn't exist
> yet. The following directory scan ends up finding 12.ready. Just
> before we update the PgArch state, XLogArchiveNotify() is called and
> creates 11.ready. However, pg_readyXlog() has already decided to
> return WAL segment 12 and update the state to look for 13 next.
>
> Now, if I'm not mistaken, using <= doesn't help at all.

I think this is the scenario I was trying to touch on in the paragraph
immediately following the one you mentioned. My theory was that we'll
still skip forcing a directory scan until 10.ready is created, so it
would eventually work out as long as we can safely assume that all
.ready files that should be created eventually will be. Thinking
further, I don't think that's right. We might've already renamed
10.ready to 10.done and removed it long ago, so there's a chance that
we wouldn't go back and pick up 11.ready until one of our "fallback"
directory scans forced by the checkpointer. So, yes, I think you are
right.

> In my opinion, the problem here is that the natural way to ask "is
> this file being archived out of order?" is to ask yourself "is the
> file that I'm marking as ready for archiving now the one that
> immediately follows the last one I marked as ready for archiving?" and
> then invert the result. That is, if I last marked 10 as ready, and now
> I'm marking 11 as ready, then it's in order, but if I'm now marking
> anything else whatsoever, then it's out of order. But that's not what
> this does. Instead of comparing what it's doing now to what it did
> last, it compares what it did now to what the archiver did last.
>
> And it's really not obvious that that's correct. I think that the
> above argument actually demonstrates a flaw in the logic, but even if
> not, or even if it's too small a flaw to be a problem in practice, it
> seems a lot harder to reason about.

I certainly agree that it's harder to reason about. If we were to go
the keep-trying-the-next-file route, we could probably minimize a lot
of the handling for these rare cases by banking on the "fallback"
directory scans. Provided we believe these situations are extremely
rare, some extra delay for an archive every once in a while might be
acceptable.

Nathan
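
For anyone following along, the two scenarios above can be modelled with a small toy sketch. This is Python for illustration only, not PostgreSQL source. The function names are hypothetical. It assumes the existing check compares the segment being marked ready against the segment the archiver plans to look for next, as that state stands at the moment XLogArchiveNotify() runs. The alternative check compares against the last segment the notifier itself marked ready.

```python
from typing import Optional


def archiver_relative_check(segment: int, archiver_next: int, inclusive: bool) -> bool:
    """Return True if marking `segment` ready should force a directory scan.

    Assumed shape of the check under discussion: compare the newly ready
    segment with the segment the archiver plans to look for next.
    inclusive=False models '<', inclusive=True models '<='.
    """
    if inclusive:
        return segment <= archiver_next
    return segment < archiver_next


def notifier_relative_check(segment: int, last_marked: Optional[int]) -> bool:
    """Return True if `segment` is out of order relative to the notifier's history.

    A segment is in order only if it immediately follows the last segment
    marked ready. With no history (None), this sketch conservatively
    reports "out of order" (an assumption, not something from the thread).
    """
    return last_marked is None or segment != last_marked + 1


# Scenario 1: archiver state still says "look for 10", the scan found
# 11.ready, and 10.ready is created before the state is updated.
print(archiver_relative_check(10, archiver_next=10, inclusive=False))  # False: no scan forced
print(archiver_relative_check(10, archiver_next=10, inclusive=True))   # True: scan forced
print(notifier_relative_check(10, last_marked=11))                     # True: scan forced

# Scenario 2: archiver state still says "look for 10", the scan found
# 12.ready, and 11.ready is created before the state is updated.
print(archiver_relative_check(11, archiver_next=10, inclusive=True))   # False: 11 is stranded
print(notifier_relative_check(11, last_marked=12))                     # True: scan forced
```

In this model the '<=' comparison rescues segment 10 in the first scenario but not segment 11 in the second, while the notifier-relative comparison flags both. It ignores the "fallback" directory scans, which would eventually pick up a stranded segment.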