Re: .ready and .done files considered harmful - Mailing list pgsql-hackers
From | Bossart, Nathan |
---|---|
Subject | Re: .ready and .done files considered harmful |
Date | |
Msg-id | D68A2BF6-3BCE-4122-9CA8-40486812AFAE@amazon.com |
In response to | Re: .ready and .done files considered harmful (Robert Haas <robertmhaas@gmail.com>) |
Responses |
Re: .ready and .done files considered harmful
Re: .ready and .done files considered harmful |
List | pgsql-hackers |
On 8/17/21, 12:11 PM, "Bossart, Nathan" <bossartn@amazon.com> wrote:
> On 8/17/21, 11:28 AM, "Robert Haas" <robertmhaas@gmail.com> wrote:
>> I can't actually see that there's any kind of hard synchronization
>> requirement here at all. What we're trying to do is guarantee that if
>> the timeline changes, we'll pick up the timeline history for the new
>> timeline next, and that if files are archived out of order, we'll
>> switch to archiving the oldest file that is now present rather than
>> continuing with consecutive files. But suppose we just use an
>> unsynchronized bool. The worst case is that we'll archive one extra
>> file proceeding in order before we jump to the file that we were
>> supposed to archive next. It's not evident to me that this is all that
>> bad. The same thing would have happened if the previous file had been
>> archived slightly faster than it actually was, so that we began
>> archiving the next file just before, rather than just after, the
>> notification was sent. And if it is bad, wrapping an LWLock around the
>> accesses to the flag variable, or using an atomic, does nothing to
>> stop it.
>
> I am inclined to agree. The archiver only ever reads the flag and
> sets it to false (if we are doing a directory scan). Others only ever
> set the flag to true. The only case I can think of where we might
> miss the timeline switch or out-of-order .ready file is when the
> archiver sets the flag to false and then ReadDir() fails. However,
> that seems to cause the archiver process to restart, and we always
> start with a directory scan at first.

Thinking further, I think the most important thing to ensure is that
resetting the flag happens before we begin the directory scan.
Consider the following scenario in which a timeline history file would
potentially be lost:

  1. Archiver completes directory scan.
  2. A timeline history file is created and the flag is set.
  3. Archiver resets the flag.
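
The scenario above can be replayed with a small single-threaded model. This is only a sketch, not PostgreSQL code: the names `Archiver`, `force_dir_scan`, `notify`, and `scan` are hypothetical, and each function simply hard-codes one interleaving of the archiver and a notifying process.

```python
class Archiver:
    """Toy model of the archiver's flag handling (hypothetical names)."""

    def __init__(self):
        self.force_dir_scan = True  # stand-in for the shared flag
        self.status_dir = set()     # files waiting to be archived
        self.seen = set()           # files the archiver has noticed

    def notify(self, name):
        # Another process creates a file, then sets the flag.
        self.status_dir.add(name)
        self.force_dir_scan = True

    def scan(self):
        # Directory scan: notice everything present right now.
        self.seen |= self.status_dir


def reset_after_scan(a, arriving):
    a.scan()                  # 1. archiver completes directory scan
    a.notify(arriving)        # 2. history file is created, flag is set
    a.force_dir_scan = False  # 3. archiver resets the flag


def reset_before_scan(a, arriving):
    a.force_dir_scan = False  # reset first, then scan
    a.scan()
    a.notify(arriving)        # flag stays set for the next cycle


bad, good = Archiver(), Archiver()
reset_after_scan(bad, "00000002.history")
reset_before_scan(good, "00000002.history")
print(bad.force_dir_scan, good.force_dir_scan)  # False True
```

With the reset after the scan, the history file is neither seen nor flagged, so nothing prompts another scan. With the reset before the scan, the flag is still set when the cycle ends, and the worst case is one extra directory scan.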

I don't think there's any problem with the archiver reading a stale
value for the flag. It should eventually be updated and route us to
the directory scan code path.

I'd also note that we're depending on the directory scan logic for
picking up all timeline history files and out-of-order .ready files
that may have been created each time the flag is set. AFAICT that is
safe since we prioritize timeline history files and reset the archiver
state anytime we do a directory scan. We'll first discover timeline
history files via directory scans, and then we'll move on to .ready
files, starting at the one with the lowest segment number. If a new
timeline history file or out-of-order .ready file is created, the
archiver is notified, and we start over.

Nathan
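
As an illustration of the scan order described above (timeline history files first, then the lowest-numbered .ready file), here is a minimal sketch. It is not the archiver's actual implementation: `next_to_archive` is a hypothetical helper, and comparing file names lexicographically is assumed here as a stand-in for "lowest segment number".

```python
def next_to_archive(status_files):
    """Pick the next file from a list of status entries (toy model)."""
    suffix = ".ready"
    ready = [f[:-len(suffix)] for f in status_files if f.endswith(suffix)]
    if not ready:
        return None
    # History files sort ahead of WAL segments; within each group the
    # lexicographically smallest name wins (assumed ordering).
    return min(ready, key=lambda name: (not name.endswith(".history"), name))


files = [
    "000000010000000000000005.ready",
    "000000010000000000000003.ready",
    "00000002.history.ready",
    "000000010000000000000002.done",
]
print(next_to_archive(files))  # 00000002.history
```

Because every directory scan starts from this choice again, a history file or an out-of-order .ready file that shows up later is picked up on the next scan after the flag is set.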