Re: Add recovery to pg_control and remove backup_label - Mailing list pgsql-hackers
From | Michael Paquier |
---|---|
Subject | Re: Add recovery to pg_control and remove backup_label |
Date | |
Msg-id | ZVGn6PY0LQ7RbDVL@paquier.xyz Whole thread Raw |
In response to | Re: Add recovery to pg_control and remove backup_label (David Steele <david@pgmasters.net>) |
List | pgsql-hackers |
On Fri, Nov 10, 2023 at 02:55:19PM -0400, David Steele wrote: > On 11/10/23 00:37, Michael Paquier wrote: >> I've done a few more dozen runs, and still nothing. I am wondering >> what this disturbance was. > > OK, hopefully it was just a blip. Still nothing on this side. So that seems really like a random blip in the matrix. > This has been split out. Thanks, applied 0001. >> The term "backup recovery", that we've never used in the tree until >> now as far as I know. Perhaps this recovery method should just be >> referred as "recovery from backup"? > > Well, "backup recovery" is less awkward, I think. For instance "backup > recovery field" vs "recovery from backup field". Not sure. I've never used this term when referring to recovery from a backup. Perhaps I'm just not used to it, still that sounds a bit confusing here. >> Something in this area is that backupRecoveryRequired is the switch >> controlling if the fields set by the recovery initialization. Could >> it be actually useful to leave the other fields as they are and only >> reset backupRecoveryRequired before the first control file update? >> This would leave a trace of the backup history directly in the control >> file. > > Since the other recovery fields are cleared in ReachedEndOfBackup() this > would be a change from what we do now. > > None of these fields are ever visible (with the exception of > minRecoveryPoint/TLI) since they are reset when the database becomes > consistent and before logons are allowed. Viewing them with pg_controldata > makes sense, but I'm not sure pg_control_recovery() does. > > In fact, are backup_start_lsn, backup_end_lsn, and > end_of_backup_record_required ever non-zero when logged onto Postgres? Maybe > I'm missing something? Yeah, but custom backup/restore tools may want manipulate the contents of the control file for their own work, so at least for the sake of visibility it sounds important to me to show all the information at hand, and that there is no need to. - The backup label - file includes the label string you gave to <function>pg_backup_start</function>, + The backup history file (which is archived like WAL) includes the label + string you gave to <function>pg_backup_start</function>, as well as the time at which <function>pg_backup_start</function> was run, and the name of the starting WAL file. In case of confusion it is therefore - possible to look inside a backup file and determine exactly which + possible to look inside a backup history file and determine exactly which As a side note, it is a bit disappointing that we lose the backup label from the backup itself, even if the patch changes correctly the documentation to reflect the new behavior. It is in the backup history file on the node from where the base backup has been taken or in the archives, hopefully. However there is nothing that remains in the base backup itself, and backups can be self-contained (easy with pg_basebackup --wal-method=stream). I think that we should retain a minimum amount of information as a replacement for the backup_label, at least. With this state, the current patch slightly reduces the debuggability of deployments. That could be annoying for some users. > New patches attached based on eb81e8e790. Diving into the code for references about the backup label file, I have spotted this log in pg_rewind that is now incorrect: if (showprogress) pg_log_info("creating backup label and updating control file"); + printf(_("Backup start location's timeline: %u\n"), + ControlFile->backupStartPointTLI); printf(_("Backup end location: %X/%X\n"), LSN_FORMAT_ARGS(ControlFile->backupEndPoint)); Perhaps these two should be reversed to match with the header file. + /* + * After pg_backup_stop() returns this field will contain a copy of + * pg_control that should be stored with the backup. Fields have been + * updated for recovery and the CRC has been recalculated. The buffer + * is padded to PG_CONTROL_MAX_SAFE_SIZE so that pg_control is always + * a consistent size but smaller (and hopefully easier to handle) than + * PG_CONTROL_FILE_SIZE. Bytes after sizeof(ControlFileData) are zeroed. + */ + uint8_t controlFile[PG_CONTROL_MAX_SAFE_SIZE]; I don't mind the addition of a control file with the max safe size, because it will never be higher than that. However: + /* End the backup before sending pg_control */ + basebackup_progress_wait_wal_archive(&state); + do_pg_backup_stop(backup_state, !opt->nowait); + + /* Send copy of pg_control containing recovery info */ + sendFileWithContent(sink, XLOG_CONTROL_FILE, + (char *)backup_state->controlFile, + PG_CONTROL_MAX_SAFE_SIZE, &manifest); It seems to me that the base backup protocol should always send an 8k file for the control file so as we maintain consistency with the on-disk format. Currently, a base backup taken with this patch results in a control file of size 512B. + /* Build the contents of pg_control */ + pg_control_bytea = (bytea *) palloc(PG_CONTROL_MAX_SAFE_SIZE + VARHDRSZ); + SET_VARSIZE(pg_control_bytea, PG_CONTROL_MAX_SAFE_SIZE + VARHDRSZ); + memcpy(VARDATA(pg_control_bytea), backup_state->controlFile, PG_CONTROL_MAX_SAFE_SIZE); Similar comment for the control file returned by pg_backup_stop(), which could just be made a 8k field? + <function>pg_backup_stop</function> returns the + <filename>pg_control</filename> file, which must be stored in the + <filename>global</filename> directory of the backup. It also returns the And perhaps emphasize that this file should be an 8kB file in the paragraph mentioning the data returned by pg_backup_stop()? - Create a <filename>backup_label</filename> file to begin WAL replay at + Update <filename>pg_control</filename> file to begin WAL replay at the checkpoint created at failover and configure the <filename>pg_control</filename> file with a minimum consistency LSN pg_control is mentioned twice, so perhaps this could be worded better? PG_CONTROL_VERSION is important to not forget about.. Perhaps this should be noted somewhere, or just changed in the patch itself. Contrary to catalog changes, we do few of these in the control file so there is close to zero risk of conflicts with other patches in the CF app. -- Michael
Attachment
pgsql-hackers by date: