> After some investigation, I finally concluded that we should reset
> abortedRecPtr and missingContrecPtr at processing
> XLOG_OVERWRITE_CONTRECORD.
> --- a/src/backend/access/transam/xlogrecovery.c
> +++ b/src/backend/access/transam/xlogrecovery.c
> @@ -1953,6 +1953,11 @@ xlogrecovery_redo(XLogReaderState *record, TimeLineID replayTLI)
LSN_FORMAT_ARGS(xlrec.overwritten_lsn),
timestamptz_to_str(xlrec.overwrite_time))));
> + /* We have safely skipped the aborted record */
> + abortedRecPtr = InvalidXLogRecPtr;
> + missingContrecPtr = InvalidXLogRecPtr;
> +
> /* Verifying the record should only happen once */
> record->overwrittenRecPtr = InvalidXLogRecPtr;
> }
> The last check in the test against "resetting aborted record" is no
> longer useful since it is already checked by
> 026_verwrite_contrecord.pl.
> regards.
+1 for this. Resetting abortedRecPtr and missingContrecPtr after the broken record is skipped in is cleaner. I also
thinkit would make sense to move the " successfully skipped missing contrecord" to after we invalidate the aborted and
missingrecord pointers, like below.
+++ b/src/backend/access/transam/xlogrecovery.c
@@ -1948,15 +1948,15 @@ xlogrecovery_redo(XLogReaderState *record, TimeLineID replayTLI)
LSN_FORMAT_ARGS(xlrec.overwritten_lsn),
LSN_FORMAT_ARGS(record->overwrittenRecPtr));
+ /* We have safely skipped the aborted record */
+ abortedRecPtr = InvalidXLogRecPtr;
+ missingContrecPtr = InvalidXLogRecPtr;
+
ereport(LOG,
(errmsg("successfully skipped missing contrecord at %X/%X, overwritten at %s",
LSN_FORMAT_ARGS(xlrec.overwritten_lsn),
timestamptz_to_str(xlrec.overwrite_time))));
Also, instead of a new test, the current 026_overwrite_contrecord.pl test can include a promotion test at the end.
Attaching a new patch for review.
--
Sami Imseih
Amazon Web Services