Re: .ready and .done files considered harmful - Mailing list pgsql-hackers

From Kyotaro Horiguchi
Subject Re: .ready and .done files considered harmful
Date
Msg-id 20210806.133958.962850705445445360.horikyota.ntt@gmail.com
Whole thread Raw
In response to Re: .ready and .done files considered harmful  ("Bossart, Nathan" <bossartn@amazon.com>)
Responses Re: .ready and .done files considered harmful
List pgsql-hackers
At Fri, 6 Aug 2021 02:34:24 +0000, "Bossart, Nathan" <bossartn@amazon.com> wrote in 
> On 8/5/21, 6:26 PM, "Kyotaro Horiguchi" <horikyota.ntt@gmail.com> wrote:
> > It works the current way always at the first iteration of
> > pgarch_ArchiveCopyLoop() becuse in the last iteration of
> > pgarch_ArchiveCopyLoop(), pgarch_readyXlog() erases the last
> > anticipated segment.  The shortcut works only when
> > pgarch_ArchiveCopyLoop archives more than once successive segments at
> > once.  If the anticipated next segment found to be missing a .ready
> > file while archiving multiple files, pgarch_readyXLog falls back to
> > the regular way.
> >
> > So I don't see the danger to happen perhaps you are considering.
> 
> I think my concern is that there's no guarantee that we will ever do
> another directory scan.  A server that's generating a lot of WAL could
> theoretically keep us in the next-anticipated-log code path
> indefinitely.

Theoretically possible. Supposing that .ready may be created
out-of-order (for the following reason, as a possibility), when once
the fast path bailed out then the fallback path finds that the second
oldest file has .ready, the succeeding fast path continues running
leaving the oldest file.

> > In the first place, .ready are added while holding WALWriteLock in
> > XLogWrite, and while removing old segments after a checkpoint (which
> > happens while recovery). Assuming that no one manually remove .ready
> > files on an active server, the former is the sole place doing that. So
> > I don't see a chance that .ready files are created out-of-order way.
> 
> Perhaps a more convincing example is when XLogArchiveNotify() fails.
> AFAICT this can fail without ERROR-ing, in which case the server can
> continue writing WAL and creating .ready files for later segments.  At
> some point, the checkpointer process will call RemoveOldXlogFiles()
> and try to create the missing .ready file.

Mmm. Assuming that could happen, a history file gets cursed to lose a
chance to be archived forever once that disaster falls onto it.  Apart
from this patch, maybe we need a measure to notify the history files
that are once missed a chance.

Assuming that all such forgotten files would be finally re-marked as
.ready anywhere, they can be re-found by archiver by explicitly
triggering the fallback path.  Currently the trigger fires implicitly
by checking shared timeline movement, but by causing the trigger by,
for example by a signal as mentioned in a nearby message, that
behavior would be easily to implement.

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center



pgsql-hackers by date:

Previous
From: Amit Kapila
Date:
Subject: Re: [BUG] wrong refresh when ALTER SUBSCRIPTION ADD/DROP PUBLICATION
Next
From: Yura Sokolov
Date:
Subject: Re: RFC: Improve CPU cache locality of syscache searches