Hackers,
This is greatly simplified implementation of the patch proposed in [1]
and hopefully it addresses the concerns expressed there. Since the
implementation is quite different it seemed like a new thread was
appropriate, especially since the old thread title would be very
misleading regarding the new functionality.
The basic idea is to harden recovery by returning a copy of pg_control
from pg_backup_stop() that has a flag set to prevent recovery if the
backup_label file is missing. Instead of backup software copying
pg_control from PGDATA, it stores an updated version that is returned
from pg_backup_stop(). This is better for the following reasons:
* The user can no longer remove backup_label and get what looks like a
successful recovery (while almost certainly causing corruption). If
backup_label is removed the cluster will not start. The user may try
pg_resetwal, but that tool makes it pretty clear that corruption will
result from its use.
* We don't need to worry about backup software seeing a torn copy of
pg_control, since Postgres can safely read it out of memory and provide
a valid copy via pg_backup_stop(). This solves torn reads without
needing to write pg_control via a temp file, which may affect
performance on a standby.
* For backup from standby, we no longer need to instruct the backup
software to copy pg_control last. In fact the backup software should not
copy pg_control from PGDATA at all.
These changes have no impact on current backup software and they are
free to use the pg_control available from pg_stop_backup() or continue
to use pg_control from PGDATA. Of course they will miss the benefits of
getting a consistent copy of pg_control and the backup_label checking,
but will be no worse off than before.
I'll register this in the July CF.
Regards,
-David
[1]
https://www.postgresql.org/message-id/2daf8adc-8db7-4204-a7f2-a7e94e2bfa4b@pgmasters.net