Re: .ready and .done files considered harmful - Mailing list pgsql-hackers

From Bossart, Nathan
Subject Re: .ready and .done files considered harmful
Date
Msg-id C472F58F-4E9B-49C2-82FF-2E373ADEFE34@amazon.com
Whole thread Raw
In response to Re: .ready and .done files considered harmful  (Dipesh Pandit <dipesh.pandit@gmail.com>)
Responses Re: .ready and .done files considered harmful
Re: .ready and .done files considered harmful
Re: .ready and .done files considered harmful
List pgsql-hackers
On 9/15/21, 4:28 AM, "Dipesh Pandit" <dipesh.pandit@gmail.com> wrote:
> I have incorporated these changes and updated a new patch. PFA patch.

Thanks for the new patch.

I've attached my take on the latest version.  My main focus this time
was simplifying the patch for this approach as much as possible.
Specifically, I've done the following:

  1. I've removed several calls to PgArchForceDirScan() in favor of
     calling it at the top of pgarch_ArchiverCopyLoop().  I believe
     there is some disagreement about this change, but I don't think
     we gain enough to justify the complexity.  The main reason we
     exit pgarch_ArchiverCopyLoop() should ordinarily be that we've
     run out of files to archive, so incurring a directory scan the
     next time it is called doesn't seem like it would normally be too
     bad.  I'm sure there are exceptions (e.g., lots of .done files,
     archive failures), but the patch is still not making things any
     worse than they presently are for these cases.
  2. I removed all the logic that attempted to catch out-of-order
     .ready files.  Instead, XLogArchiveNotify() only forces a
     directory scan for files other than regular WAL files, and we
     depend on our periodic directory scans to pick up anything that's
     been left behind.
  3. I moved the logic that forces directory scans every once in a
     while.  We were doing that in the checkpoint/restartpoint logic,
     which, upon further thought, might not be the best idea.  The
     checkpoint interval can vary widely, and IIRC we won't bother
     creating checkpoints at all if database activity stops.  Instead,
     I've added logic in pgarch_readyXlog() that forces a directory
     scan if one hasn't happened in a few minutes.
  4. Finally, I've tried to ensure comments are clear and that the
     logic is generally easy to reason about.

What do you think?

Nathan


Attachment

pgsql-hackers by date:

Previous
From: Ranier Vilela
Date:
Subject: Re: Possible fault with resolve column name (plpgsql)
Next
From: Sehrope Sarkuni
Date:
Subject: Re: Add jsonlog log_destination for JSON server logs