Hello,
The attached patch speeds up the removal of WAL files on old timelines. I'll add it to the next CF.
BACKGROUND
==================================================
We need to meet a strict availability requirement of a potential customer. They will use synchronous streaming
replication. The allowed failover duration, from the failure through failure detection to failover completion, is
10 seconds. Even one second is precious.
During testing on a fast machine with an SSD, we observed a gap of about 2 seconds between the following two
messages. There were no other messages between them.
LOG: archive recovery complete
LOG: MultiXact member wraparound protections are now enabled
CAUSE
==================================================
Examining the source code, RemoveNonParentXlogFiles() seems to account for the time. It syncs the pg_wal directory
every time it deletes a WAL file. max_wal_size was set to 48GB, so about 1,000 WAL files were probably deleted and
hence the pg_wal directory was synced that many times.
FIX
==================================================
unlink() the WAL files, then sync the pg_wal directory once at the end.
Unfortunately, the original machine is no longer available, so I confirmed the speedup on a VM with an HDD.
[time to remove 1,000 WAL files including the directory sync]
nonpatched: 2.45 seconds
patched: 0.81 seconds
Regards
Takayuki Tsunakawa