> Should we update ControlFile->data_checksum_version at the end-of-recovery? If not, the disk > is stale compared to in memory until the next checkpoint. The code two lines below updates > the control file anyways to set the DB_IN_PRODUCTION. Maybe combine the update with that? > It's no big deal if we don't do it because it will be self correct but tools like pg_controldata > give stale value for some time.
I've been thinking about this one and in the end went with adding it. It is written already with PerformRecoveryXLogAction but we might have an inprogress-off->off transition after that which seems better to reflect in the controlfile.
Also, an off-list question on yesterday's bugfix made me realize that it can be simplified even further. Instead of tracking which processes need cleanup in the exit handler, we can simply defer installing it at all till we know that there will be cleanup to process.
The attached also updates the commit messages to reflect the reviewers of this patchset as well as email id's.
Thanks for the updated patchset. I applied v3 on top of current master and did
a focused review/test pass.
Reran the duplicate-launcher reproducer three times against v3 with a more
aggressive variant, could not see any issue.
The v3 launcher_exit change looks good to me. Deferring installation of the on_shmem_exit() handler until after the process has confirmed that it owns the launcher role avoids the duplicate-launcher cleanup problem.
I also looked at the end-of-recovery control-file update. Copying XLogCtl->data_checksum_version into ControlFile->data_checksum_version under ControlFileLock before the existing UpdateControlFile() call is a good idea.