On Tue, Feb 02, 2021 at 07:14:16AM -0800, Noah Misch wrote:
> Recycling and preallocation are wasteful during archive recovery, because
> KeepFileRestoredFromArchive() unlinks every entry in its path. I propose to
> fix the race by adding an XLogCtl flag indicating which regime currently owns
> the right to add long-term pg_wal directory entries. In the archive recovery
> regime, the checkpointer will not preallocate and will unlink old segments
> instead of recycling them (like wal_recycle=off). XLogFileInit() will fail.

Here's the implementation. Patches 1-4 suffice to stop the user-visible
ERROR. Patch 5 avoids a spurious LOG-level message and wasted filesystem
writes, and it provides some future-proofing.
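
For readers skimming the thread, the regime flag from the quoted proposal can be sketched roughly as below. This is a standalone illustration, not code from the patches: every type and function name is hypothetical, the struct only stands in for the proposed XLogCtl flag, and the locking a shared flag would need is left out.

```c
#include <stdbool.h>

/* Hypothetical names throughout; none of these are PostgreSQL identifiers. */
typedef enum WalDirRegime
{
    WAL_DIR_ARCHIVE_RECOVERY,   /* restored archive files own pg_wal entries */
    WAL_DIR_NORMAL              /* checkpointer and backends may add segments */
} WalDirRegime;

typedef enum OldSegmentAction
{
    OLD_SEGMENT_UNLINK,
    OLD_SEGMENT_RECYCLE
} OldSegmentAction;

typedef struct SharedWalState
{
    WalDirRegime regime;        /* stand-in for the proposed XLogCtl flag */
    bool         wal_recycle;   /* mirrors the wal_recycle setting */
} SharedWalState;

/* The checkpointer preallocates only outside archive recovery. */
static bool
may_preallocate(const SharedWalState *state)
{
    return state->regime == WAL_DIR_NORMAL;
}

/* During archive recovery, old segments are unlinked as if wal_recycle=off. */
static OldSegmentAction
old_segment_action(const SharedWalState *state)
{
    if (state->regime == WAL_DIR_ARCHIVE_RECOVERY || !state->wal_recycle)
        return OLD_SEGMENT_UNLINK;
    return OLD_SEGMENT_RECYCLE;
}

/*
 * Stand-in for "XLogFileInit() will fail": returns false when the current
 * regime does not allow creating a new long-term segment.
 */
static bool
try_install_new_segment(const SharedWalState *state)
{
    return state->regime == WAL_DIR_NORMAL;
}
```

The point of a single flag is that all three decisions consult the same state, so the checkpointer cannot add a directory entry that KeepFileRestoredFromArchive() is about to unlink.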

I was tempted to (but did not) just remove preallocation. Creating one file
per checkpoint seems tiny relative to the max_wal_size=1GB default, so I
expect it's hard to isolate any benefit. Under the old checkpoint_segments=3
default, a preallocated segment covered a respectable third of the next
checkpoint. Before commit 63653f7 (2002), preallocation created more files.
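
The size comparison above can be made concrete with a little arithmetic. The sketch below assumes the default 16MB WAL segment size (not stated in this mail) and treats max_wal_size as the whole budget, which is a crude simplification since it is a soft limit. The helper names are made up for illustration.

```c
#define WAL_SEGMENT_MB 16   /* assumed default WAL segment size */

/* Number of whole segments that fit in a WAL budget given in megabytes. */
static int
segments_for_mb(int wal_budget_mb)
{
    return wal_budget_mb / WAL_SEGMENT_MB;
}

/* Fraction of a budget (counted in segments) covered by one segment. */
static double
one_segment_share(int budget_segments)
{
    return 1.0 / budget_segments;
}
```

Under those assumptions, max_wal_size=1GB holds 64 segments, so one preallocated file is 1/64 (about 1.6%) of the budget, compared with one third under checkpoint_segments=3.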