At Fri, 6 Aug 2021 02:34:24 +0000, "Bossart, Nathan" <bossartn@amazon.com> wrote in
> On 8/5/21, 6:26 PM, "Kyotaro Horiguchi" <horikyota.ntt@gmail.com> wrote:
> > It always works the current way at the first iteration of
> > pgarch_ArchiveCopyLoop(), because in the last iteration of
> > pgarch_ArchiveCopyLoop(), pgarch_readyXlog() erases the last
> > anticipated segment. The shortcut works only when
> > pgarch_ArchiveCopyLoop() archives more than one successive segment
> > at once. If the anticipated next segment is found to be missing a
> > .ready file while archiving multiple files, pgarch_readyXlog() falls
> > back to the regular way.
> >
> > So I don't see how the danger you are perhaps considering could
> > happen.
>
> I think my concern is that there's no guarantee that we will ever do
> another directory scan. A server that's generating a lot of WAL could
> theoretically keep us in the next-anticipated-log code path
> indefinitely.

Theoretically possible. Suppose .ready files can be created out of
order (for the reason discussed below, as one possibility). Once the
fast path has bailed out and the fallback path finds that the
second-oldest file has a .ready file, the subsequent fast path keeps
running and leaves the oldest file behind.
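
To make that scenario concrete, here is a toy model in Python. It is
not PostgreSQL code: segments are plain integers, simulate() and its
arguments are names I made up, and it assumes one long busy loop in
which the fast path takes the anticipated segment whenever its .ready
file exists and otherwise falls back to an oldest-first scan.

```python
def simulate(arrivals):
    """Toy archiver loop. arrivals[i] is the set of segment numbers whose
    .ready file appears just before step i. Returns (archived order,
    segments still waiting)."""
    ready = set()
    archived = []
    anticipated = None          # segment the fast path expects next
    for new in arrivals:
        ready |= set(new)
        if anticipated is not None and anticipated in ready:
            seg = anticipated   # fast path: no directory scan
        elif ready:
            seg = min(ready)    # fallback: scan, oldest segment first
        else:
            anticipated = None  # nothing to do; next step starts with a scan
            continue
        ready.remove(seg)
        archived.append(seg)
        anticipated = seg + 1
    return archived, sorted(ready)


# Segment 1's .ready file shows up late, after segment 2 was archived.
archived, left = simulate([{2}, {1, 3}, {4}, {5}, {6}])
print(archived, left)
```

In this model segment 1 stays unarchived for as long as the next
anticipated segment keeps having a .ready file.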

> > In the first place, .ready files are added while holding
> > WALWriteLock in XLogWrite, and while removing old segments after a
> > checkpoint (which happens during recovery). Assuming that no one
> > manually removes .ready files on an active server, the former is the
> > sole place doing that. So I don't see a chance that .ready files are
> > created out of order.
>
> Perhaps a more convincing example is when XLogArchiveNotify() fails.
> AFAICT this can fail without ERROR-ing, in which case the server can
> continue writing WAL and creating .ready files for later segments. At
> some point, the checkpointer process will call RemoveOldXlogFiles()
> and try to create the missing .ready file.

Mmm. Assuming that can happen, a history file permanently loses its
chance to be archived once that failure hits it. Separately from this
patch, maybe we need some measure to re-notify history files that have
missed their chance.

Assuming that all such forgotten files are eventually re-marked as
.ready somewhere, the archiver can find them again if the fallback
path is triggered explicitly. Currently the trigger fires implicitly,
by checking for shared timeline movement, but if it were also fired
by, for example, a signal as mentioned in a nearby message, that
behavior would be easy to implement.
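
As a rough illustration of the explicit trigger, here is a toy model
in Python. It is not PostgreSQL code: segments are plain integers, and
simulate() and forced_scans are hypothetical names. forced_scans
stands in for whatever would fire the trigger (a signal, for example);
at those steps the anticipated segment is forgotten so that the
oldest-first fallback scan runs.

```python
def simulate(arrivals, forced_scans=()):
    """Toy archiver loop. arrivals[i] is the set of segment numbers whose
    .ready file appears just before step i. forced_scans lists the steps
    at which an explicit trigger forces the fallback scan."""
    ready = set()
    archived = []
    anticipated = None              # segment the fast path expects next
    for step, new in enumerate(arrivals):
        ready |= set(new)
        if step in forced_scans:
            anticipated = None      # explicit trigger: skip the fast path
        if anticipated is not None and anticipated in ready:
            seg = anticipated       # fast path: no directory scan
        elif ready:
            seg = min(ready)        # fallback: scan, oldest segment first
        else:
            anticipated = None
            continue
        ready.remove(seg)
        archived.append(seg)
        anticipated = seg + 1
    return archived, sorted(ready)


arrivals = [{2}, {1, 3}, {4}, {5}, {6}]
print(simulate(arrivals))                     # no trigger
print(simulate(arrivals, forced_scans={3}))   # trigger fires at step 3
```

Without the trigger, segment 1 is never archived in this model; with
the trigger at step 3, the forced scan picks it up and archiving then
resumes from the oldest remaining segment.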
regards.
--
Kyotaro Horiguchi
NTT Open Source Software Center