On Thu, Dec 23, 2021, at 9:58 AM, Bharath Rupireddy wrote:
> pg_archivecleanup currently takes a WAL file name as input and deletes
> the WAL files prior to it [1]. As suggested by Satya (cc-ed) in the
> pg_replslotdata thread [2], can we enhance pg_archivecleanup to
> automatically detect the last checkpoint LSN (from the control file),
> calculate the lowest restart_lsn required by the replication slots, if
> any (by reading the replication slot info from the pg_replslot directory),
> archive the unneeded WAL files (an archive_command similar to the one
> in the server config can be provided as an input) before finally
> deleting them? Making pg_archivecleanup an end-to-end solution will
> help greatly in disk-full situations caused by WAL file growth
> (inactive replication slots, archive command failures, infrequent
> checkpoints, etc.).

pg_archivecleanup is a tool to remove WAL files from the *archive*. Are you
suggesting using it to remove files from the pg_wal directory too? No, thanks.
WAL files are a key component of backup and replication. Hence, you cannot
deliberately allow a tool to remove WAL files from PGDATA. IMO this issue
wouldn't occur if you had a monitoring system with alerts and someone keeping
an eye on it. If the disk-full situation was caused by a failed archive command
or a disconnected standby, it is easy to figure out and the fix is simple.
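
To illustrate the kind of monitoring check meant here, below is a minimal
sketch in Python. It assumes the current WAL position and each slot's
restart_lsn have already been fetched as text, for example from
pg_current_wal_lsn() and the pg_replication_slots view. The function names
(parse_lsn, retained_wal_bytes, slots_over_threshold) and the threshold are
hypothetical and chosen only for this example; on the server itself
pg_wal_lsn_diff() does the same subtraction.

```python
def parse_lsn(lsn: str) -> int:
    """Convert a textual LSN such as '16/B374D848' to a 64-bit byte position."""
    high, low = lsn.split("/")
    return (int(high, 16) << 32) | int(low, 16)


def retained_wal_bytes(current_lsn: str, restart_lsn: str) -> int:
    """Bytes of WAL between a slot's restart_lsn and the current WAL position."""
    return parse_lsn(current_lsn) - parse_lsn(restart_lsn)


def slots_over_threshold(current_lsn, slots, max_bytes):
    """Return the sorted names of slots holding back more than max_bytes of WAL.

    `slots` maps a slot name to its restart_lsn text, or to None when the
    slot has no restart_lsn yet (hypothetical input shape for this sketch).
    """
    return sorted(
        name
        for name, restart_lsn in slots.items()
        if restart_lsn is not None
        and retained_wal_bytes(current_lsn, restart_lsn) > max_bytes
    )


# Example data, invented for illustration only.
example_slots = {
    "standby_a": "2/0F000000",    # 16 MB behind the current position
    "old_logical": "1/00000000",  # more than 4 GB behind
    "unused": None,               # no restart_lsn reserved yet
}
alerts = slots_over_threshold("2/10000000", example_slots, max_bytes=1 << 30)
print(alerts)
```

A check like this only reports which slots are holding back WAL; deciding
whether to drop a slot or repair the archive command stays with the
administrator.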