Re: .ready and .done files considered harmful - Mailing list pgsql-hackers

From Dipesh Pandit
Subject Re: .ready and .done files considered harmful
Date
Msg-id CAN1g5_G9WYmMM=2rJ3JKDf9x2YUBBPP80FWJe6C=tNGqKjzKUg@mail.gmail.com
Whole thread Raw
In response to Re: .ready and .done files considered harmful  ("Bossart, Nathan" <bossartn@amazon.com>)
List pgsql-hackers
Hi,

> 1. I've removed several calls to PgArchForceDirScan() in favor of
>     calling it at the top of pgarch_ArchiverCopyLoop().  I believe
>     there is some disagreement about this change, but I don't think
>     we gain enough to justify the complexity.  The main reason we
>     exit pgarch_ArchiverCopyLoop() should ordinarily be that we've
>     run out of files to archive, so incurring a directory scan the
>     next time it is called doesn't seem like it would normally be too
>     bad.  I'm sure there are exceptions (e.g., lots of .done files,
>     archive failures), but the patch is still not making things any
>     worse than they presently are for these cases.

Yes, I think when archiver is lagging behind then a call to force
directory scan at the top of pgarch_ArchiverCopyLoop() does not
have any impact. This may result into a directory scan in next cycle
only when the archiver is ahead or in sync but in that case also a
directory scan may not incur too much cost since the archiver is
ahead.I agree that we can remove the separate calls to force a
directory scan in failure scenarios with a single call at the top of
PgArchForceDirScan().

> 2. I removed all the logic that attempted to catch out-of-order
>     .ready files.  Instead, XLogArchiveNotify() only forces a
>     directory scan for files other than regular WAL files, and we
>     depend on our periodic directory scans to pick up anything that's
>     been left behind.
> 3. I moved the logic that forces directory scans every once in a
>     while.  We were doing that in the checkpoint/restartpoint logic,
>     which, upon further thought, might not be the best idea.  The
>     checkpoint interval can vary widely, and IIRC we won't bother
>     creating checkpoints at all if database activity stops.  Instead,
>     I've added logic in pgarch_readyXlog() that forces a directory
>     scan if one hasn't happened in a few minutes.
> 4. Finally, I've tried to ensure comments are clear and that the
>     logic is generally easy to reason about.
>
> What do you think?

I agree, If we force a periodic directory scan then we may not
require any special logic for handling scenarios where a .ready file
is created out of order in XLogArchiveNotify(). We need to force a
directory scan only in case of a non-regular WAL file in
XLogArchiveNotify().

Overall I think the periodic directory scan simplifies the patch and
makes sure that any missing file gets archived within a few mins.

Thanks,
Dipesh

pgsql-hackers by date:

Previous
From: Nikolay Samokhvalov
Date:
Subject: Re: Release 14 Schedule
Next
From: Pavel Luzanov
Date:
Subject: Re: psql: \dl+ to list large objects privileges