Thread: pgsql: Backpatch critical performance fixes to pgarch.c

pgsql: Backpatch critical performance fixes to pgarch.c

From
Álvaro Herrera
Date:
Backpatch critical performance fixes to pgarch.c

This backpatches commits beb4e9ba1652 and 1fb17b190341 (originally
appearing in previously in REL_15_STABLE) to REL_14_STABLE.  Performance
of the WAL archiver can become pretty critical at times, and reports
exist of users getting in serious trouble (hours of downtime, loss of
replicas) because of lack of this optimization.

We'd like to backpatch these to REL_13_STABLE too, but because of the
very invasive changes made by commit d75288fb27b8 in the 14 timeframe,
we deem it too risky :-(

Original commit messages appear below.

Discussion: https://postgr.es/m/202411131605.m66syq5i5ucl@alvherre.pgsql

  commit beb4e9ba1652a04f66ff20261444d06f678c0b2d
  Author:     Robert Haas <rhaas@postgresql.org>
  AuthorDate: Thu Nov 11 15:02:53 2021 -0500

  Improve performance of pgarch_readyXlog() with many status files.

  Presently, the archive_status directory was scanned for each file to
  archive.  When there are many status files, say because archive_command
  has been failing for a long time, these directory scans can get very
  slow.  With this change, the archiver remembers several files to archive
  during each directory scan, speeding things up.

  To ensure timeline history files are archived as quickly as possible,
  XLogArchiveNotify() forces the archiver to do a new directory scan as
  soon as the .ready file for one is created.

  Nathan Bossart, per a long discussion involving many people. It is
  not clear to me exactly who out of all those people reviewed this
  particular patch.

  Discussion: http://postgr.es/m/CA+TgmobhAbs2yabTuTRkJTq_kkC80-+jw=pfpypdOJ7+gAbQbw@mail.gmail.com
  Discussion: http://postgr.es/m/620F3CE1-0255-4D66-9D87-0EADE866985A@amazon.com

  commit 1fb17b1903414676bd371068739549cd2966fe87
  Author:     Tom Lane <tgl@sss.pgh.pa.us>
  AuthorDate: Wed Dec 29 17:02:50 2021 -0500

  Fix issues in pgarch's new directory-scanning logic.

  The arch_filenames[] array elements were one byte too small, so that
  a maximum-length filename would get corrupted if another entry
  were made after it.  (Noted by Thomas Munro, fix by Nathan Bossart.)

  Move these arrays into a palloc'd struct, so that we aren't wasting
  a few kilobytes of static data in each non-archiver process.

  Add a binaryheap_reset() call to make it plain that we start the
  directory scan with an empty heap.  I don't think there's any live
  bug of that sort, but it seems fragile, and this is very cheap
  insurance.

  Cleanup for commit beb4e9ba1, so no back-patch needed.

  Discussion: https://postgr.es/m/CA+hUKGLHAjHuKuwtzsW7uMJF4BVPcQRL-UMZG_HM-g0y7yLkUg@mail.gmail.com

Branch
------
REL_14_STABLE

Details
-------
https://git.postgresql.org/pg/commitdiff/4abf615cc8a8ca80430b5a0bfa18be6efcea96b2

Modified Files
--------------
src/backend/access/transam/xlogarchive.c |  14 +++
src/backend/postmaster/pgarch.c          | 208 +++++++++++++++++++++++++++----
src/include/postmaster/pgarch.h          |   1 +
3 files changed, 197 insertions(+), 26 deletions(-)