I don't like much this patch. While it takes correctly advantage of
the backward record read logic from SimpleXLogPageRead() able to
handle correctly timeline jumps, it creates a hidden dependency in the
code between the hash table from filemap.c and the page callback.
Wouldn't it be simpler to build a list of the segment names using the
information from WALOpenSegment and build this list in
findLastCheckpoint()?
I think the first version of the patch more or less did that. Not necessarily a list, but a hash table of WAL file names that we want to keep. But Kyotaro Horiguchi didn't like it and suggested creating entries in the filemap.c hash table instead.
But, I agree, doing it directly from the findLastCheckpoint() makes the code easier to understand.
Also, I am wondering if we should be smarter
with any potential conflict handling between the source and the
target, rather than just enforcing a FILE_ACTION_NONE for all these
files. In short, could it be better to copy the WAL file from the
source if it exists there?
Before the switchpoint these files are supposed to be the same on the old primary, new primary, and also in the archive. Also, if there is a restore_command postgres will fetch the same file from the archive even if it already exists in pg_wal, which effectively discards all pg_rewind efforts on copying WAL files.
+ /*
+ * Some entries (WAL segments) already have an action assigned
+ * (see SimpleXLogPageRead()).
+ */
+ if (entry->action == FILE_ACTION_UNDECIDED)
+ entry->action = decide_file_action(entry);
This change makes me a bit uneasy, per se my previous comment with the
additional code dependencies.
We can revert to the original approach (see v1-0001-pg_rewind-wal-deletion.patch from the very first email) if you like.
I think that this scenario deserves a test case. If one wants to
emulate a delay in WAL archiving, it is possible to set
archive_command to a command that we know will fail, for instance.
Yes, I totally agree, it is on our radar, but meanwhile please see the new version, just to check if I correctly understood your idea.
Regards,
--
Alexander Kukushkin