Re: [BUG] Panic due to incorrect missingContrecPtr after promotion - Mailing list pgsql-hackers

From Imseih (AWS), Sami
Subject Re: [BUG] Panic due to incorrect missingContrecPtr after promotion
Date
Msg-id EF51690F-5006-47C4-9473-9DFAFC0B1778@amazon.com
Whole thread Raw
In response to Re: [BUG] Panic due to incorrect missingContrecPtr after promotion  (Kyotaro Horiguchi <horikyota.ntt@gmail.com>)
Responses Re: [BUG] Panic due to incorrect missingContrecPtr after promotion
List pgsql-hackers
>    After some investigation, I finally concluded that we should reset
>    abortedRecPtr and missingContrecPtr at processing
>    XLOG_OVERWRITE_CONTRECORD.


 >   --- a/src/backend/access/transam/xlogrecovery.c
 >   +++ b/src/backend/access/transam/xlogrecovery.c
 >   @@ -1953,6 +1953,11 @@ xlogrecovery_redo(XLogReaderState *record, TimeLineID replayTLI)
                                                    LSN_FORMAT_ARGS(xlrec.overwritten_lsn),
                                                    timestamptz_to_str(xlrec.overwrite_time))));

 >   +               /* We have safely skipped the aborted record */
 >   +               abortedRecPtr = InvalidXLogRecPtr;
 >   +               missingContrecPtr = InvalidXLogRecPtr;
 >   +
 >                   /* Verifying the record should only happen once */
 >                   record->overwrittenRecPtr = InvalidXLogRecPtr;
 >           }

>    The last check in the test against "resetting aborted record" is no
>    longer useful since it is already checked by
>    026_verwrite_contrecord.pl.

>    regards.

+1 for this. Resetting abortedRecPtr and missingContrecPtr after the broken record is skipped in is cleaner. I also
thinkit would make sense to move the " successfully skipped missing contrecord" to after we invalidate the aborted and
missingrecord pointers, like below.
 

+++ b/src/backend/access/transam/xlogrecovery.c
@@ -1948,15 +1948,15 @@ xlogrecovery_redo(XLogReaderState *record, TimeLineID replayTLI)
                                 LSN_FORMAT_ARGS(xlrec.overwritten_lsn),
                                 LSN_FORMAT_ARGS(record->overwrittenRecPtr));

+               /* We have safely skipped the aborted record */
+               abortedRecPtr = InvalidXLogRecPtr;
+               missingContrecPtr = InvalidXLogRecPtr;
+
                ereport(LOG,
                                (errmsg("successfully skipped missing contrecord at %X/%X, overwritten at %s",
                                                LSN_FORMAT_ARGS(xlrec.overwritten_lsn),
                                                timestamptz_to_str(xlrec.overwrite_time))));

Also, instead of a new test, the current 026_overwrite_contrecord.pl test can include a promotion test at the end.

Attaching a new patch for review.

 --
Sami Imseih
Amazon Web Services


Attachment

pgsql-hackers by date:

Previous
From: Andres Freund
Date:
Subject: Re: [PATCH] Expose port->authn_id to extensions and triggers
Next
From: Melanie Plageman
Date:
Subject: Re: why do hash index builds use smgrextend() for new splitpoint pages