Thread: [BUG] Panic due to incorrect missingContrecPtr after promotion

[BUG] Panic due to incorrect missingContrecPtr after promotion

From
"Imseih (AWS), Sami"
Date:

On 13.5, a WAL flush PANIC is encountered after a standby is promoted.

 

With debugging, it was found that when a standby skips a missing continuation record on recovery, the missingContrecPtr is not invalidated after the record is skipped. Therefore, when the standby is promoted to a primary, it writes an overwrite_contrecord with an LSN of the missingContrecPtr, which is now in the past. At flush time, this causes a PANIC. From what I can see, this failure scenario can only occur after a standby is promoted.

 

The overwrite_contrecord was introduced in 13.5 with https://git.postgresql.org/gitweb/?p=postgresql.git;a=commit;h=ff9f111bce24.

 

Attached is a patch and a TAP test to handle this condition. The patch ensures that an overwrite_contrecord is only created if the missingContrecPtr is ahead of the last wal record.
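
Conceptually, the guard looks something like the sketch below (illustrative only; the attached patch is authoritative, "lastReplayedEnd" is just a placeholder for the end LSN of the last WAL record, and the CreateOverwriteContrecordRecord() call site and signature differ across branches):

    /*
     * Sketch, not the attached patch itself: only write an
     * OVERWRITE_CONTRECORD if the missing continuation record position is
     * still ahead of the end of the last WAL record.  Otherwise the
     * overwrite record was already written by the old primary, and
     * re-inserting it here would point into the past.
     */
    if (!XLogRecPtrIsInvalid(abortedRecPtr) &&
        missingContrecPtr >= lastReplayedEnd)
        CreateOverwriteContrecordRecord(abortedRecPtr);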

 

To reproduce:

Run the new tap test recovery/t/029_overwrite_contrecord_promotion.pl without the attached patch

 

2022-02-22 18:38:15.526 UTC [31138] LOG:  started streaming WAL from primary at 0/2000000 on timeline 1

2022-02-22 18:38:15.535 UTC [31105] LOG:  successfully skipped missing contrecord at 0/1FFC620, overwritten at 2022-02-22 18:38:15.136482+00

2022-02-22 18:38:15.535 UTC [31105] CONTEXT:  WAL redo at 0/2000028 for XLOG/OVERWRITE_CONTRECORD: lsn 0/1FFC620; time 2022-02-22 18:38:15.136482+00

…..

2022-02-22 18:38:15.575 UTC [31103] PANIC:  xlog flush request 0/201EC70 is not satisfied --- flushed only to 0/2000088

2022-02-22 18:38:15.575 UTC [31101] LOG:  checkpointer process (PID 31103) was terminated by signal 6: Aborted

…

 

With the patch, running the same tap test succeeds and a PANIC is not observed.

 

Thanks

 

Sami Imseih

Amazon Web Services

 

 

 

Attachment

Re: [BUG] Panic due to incorrect missingContrecPtr after promotion

From
Alvaro Herrera
Date:
On 2022-Feb-22, Imseih (AWS), Sami wrote:

> On 13.5 a wal flush PANIC is encountered after a standby is promoted.
> 
> With debugging, it was found that when a standby skips a missing
> continuation record on recovery, the missingContrecPtr is not
> invalidated after the record is skipped. Therefore, when the standby
> is promoted to a primary it writes an overwrite_contrecord with an LSN
> of the missingContrecPtr, which is now in the past. At flush time,
> this causes a PANIC. From what I can see, this failure scenario can
> only occur after a standby is promoted.

Ooh, nice find and diagnosis.  I can confirm that the test fails as you
described without the code fix, and doesn't fail with it.

I attach the same patch, with the test file put in its final place
rather than as a patch.  Due to recent xlog.c changes this needs a bit of
work to apply to back branches; I'll see about getting it in all
branches soon.

-- 
Álvaro Herrera         PostgreSQL Developer  —  https://www.EnterpriseDB.com/
"I'm impressed how quickly you are fixing this obscure issue. I came from 
MS SQL and it would be hard for me to put into words how much of a better job
you all are doing on [PostgreSQL]."
 Steve Midgley, http://archives.postgresql.org/pgsql-sql/2008-08/msg00000.php

Attachment

Re: [BUG] Panic due to incorrect missingContrecPtr after promotion

From
"Imseih (AWS), Sami"
Date:
>    Ooh, nice find and diagnosis.  I can confirm that the test fails as you
>    described without the code fix, and doesn't fail with it.

>    I attach the same patch, with the test file put in its final place
>    rather than as a patch.  Due to recent xlog.c changes this needs a bit of
>    work to apply to back branches; I'll see about getting it in all
>    branches soon.

Thank you for reviewing!

Sami



Re: [BUG] Panic due to incorrect missingContrecPtr after promotion

From
Kyotaro Horiguchi
Date:
At Wed, 23 Feb 2022 02:58:07 +0000, "Imseih (AWS), Sami" <simseih@amazon.com> wrote in 
> >    Ooh, nice find and diagnosis.  I can confirm that the test fails as you
> >    described without the code fix, and doesn't fail with it.
> 
> >    I attach the same patch, with the test file put in its final place
> >    rather than as a patch.  Due to recent xlog.c changes this needs a bit of
> >    work to apply to back branches; I'll see about getting it in all
> >    branches soon.
> 
> Thank you for reviewing!

Nice catch!  However, I'm not sure I like the patch.

      * made it through and start writing after the portion that persisted.
      * (It's critical to first write an OVERWRITE_CONTRECORD message, which
      * we'll do as soon as we're open for writing new WAL.)
+     *
+     * If the last wal record is ahead of the missing contrecord, this is a
+     * recently promoted primary and we should not write an overwrite
+     * contrecord.

Just before that part, the comment starts with the line shown below.

>     * Actually, if WAL ended in an incomplete record, skip the parts that

So, actually, WAL did not end in an incomplete record.  I think
FinishWalRecovery is the last place to do that. (But it could be
earlier.)

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center



Re: [BUG] Panic due to incorrect missingContrecPtr after promotion

From
Kyotaro Horiguchi
Date:
At Thu, 24 Feb 2022 16:26:42 +0900 (JST), Kyotaro Horiguchi <horikyota.ntt@gmail.com> wrote in 
> So, actually, WAL did not end in an incomplete record.  I think
> FinishWalRecovery is the last place to do that. (But it could be
> earlier.)

After some investigation, I finally concluded that we should reset
abortedRecPtr and missingContrecPtr at processing
XLOG_OVERWRITE_CONTRECORD.


--- a/src/backend/access/transam/xlogrecovery.c
+++ b/src/backend/access/transam/xlogrecovery.c
@@ -1953,6 +1953,11 @@ xlogrecovery_redo(XLogReaderState *record, TimeLineID replayTLI)
                                                LSN_FORMAT_ARGS(xlrec.overwritten_lsn),
                                                timestamptz_to_str(xlrec.overwrite_time))));
 
+               /* We have safely skipped the aborted record */
+               abortedRecPtr = InvalidXLogRecPtr;
+               missingContrecPtr = InvalidXLogRecPtr;
+
                /* Verifying the record should only happen once */
                record->overwrittenRecPtr = InvalidXLogRecPtr;
        }

The last check in the test against "resetting aborted record" is no
longer useful since it is already checked by
026_overwrite_contrecord.pl.

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center



Re: [BUG] Panic due to incorrect missingContrecPtr after promotion

From
"Imseih (AWS), Sami"
Date:
>    Nice catch!  However, I'm not sure I like the patch.

>             * made it through and start writing after the portion that persisted.
>             * (It's critical to first write an OVERWRITE_CONTRECORD message, which
>             * we'll do as soon as we're open for writing new WAL.)
>    +        *
>    +        * If the last wal record is ahead of the missing contrecord, this is a
>    +        * recently promoted primary and we should not write an overwrite
>    +        * contrecord.

>    Before the part, the comment follows the part shown below.

>    >        * Actually, if WAL ended in an incomplete record, skip the parts that

>    So, actually, WAL did not end in an incomplete record.  I think
>    FinishWalRecovery is the last place to do that. (But it could be
>    earlier.)

Thanks for the feedback. 

## On the primary, an OVERWRITE_CONTRECORD is written. The aborted record (abortedRecPtr) and the insert position of the
missing contrecord (missingContrecPtr) are set during FinishWalRecovery; after recovery, but before writes are
accepted, the OVERWRITE_CONTRECORD is written.
 
## on the primary logs

2022-02-25 02:25:16.208 UTC [7397] LOG:  redo done at 0/1FFD2B8 system usage: CPU: user: 0.01 s, system: 0.00 s, elapsed: 0.01 s
 
2022-02-25 02:25:16.208 UTC [7397] DEBUG:  resetting unlogged relations: cleanup 0 init 1
2022-02-25 02:25:16.209 UTC [7397] DEBUG:  creating and filling new WAL file
2022-02-25 02:25:16.217 UTC [7397] DEBUG:  done creating and filling new WAL file
2022-02-25 02:25:16.217 UTC [7397] DEBUG:  MultiXactId wrap limit is 2147483648, limited by database with OID 1
2022-02-25 02:25:16.217 UTC [7397] DEBUG:  MultiXact member stop limit is now 4294914944 based on MultiXact 1
2022-02-25 02:25:16.217 UTC [7397] DEBUG:  OVERWRITE_CONTRECORD inserted at 0/2000028 for aborted record 0/1FFD2E0
<<-------Attached V3 of patch that adds logging which shows the overwrite record created on the primary
 
2022-02-25 02:25:16.217 UTC [7395] LOG:  checkpoint starting: end-of-recovery immediate wait
2022-02-25 02:25:16.217 UTC [7395] DEBUG:  performing replication slot checkpoint
2022-02-25 02:25:16.218 UTC [7395] DEBUG:  attempting to remove WAL segments older than log file 000000000000000000000000
2022-02-25 02:25:16.218 UTC [7395] LOG:  checkpoint complete: wrote 131 buffers (102.3%); 0 WAL file(s) added, 0 removed, 0 recycled; write=0.001 s, sync=0.001 s, total=0.001 s; sync files=0, longest=0.000 s, average=0.000 s; distance=11381 kB, estimate=11381 kB
 
2022-02-25 02:25:16.219 UTC [7394] DEBUG:  starting background worker process "logical replication launcher"
2022-02-25 02:25:16.219 UTC [7394] LOG:  database system is ready to accept connections

## A downstream standby now skips over this OVERWRITE_CONTRECORD, but the value for missingContrecPtr is not
invalidated afterwards.
 
## on the standby logs

2022-02-25 02:25:16.616 UTC [7413] DEBUG:  sending hot standby feedback xmin 0 epoch 0 catalog_xmin 0 catalog_xmin_epoch 0
 
2022-02-25 02:25:16.616 UTC [7387] LOG:  successfully skipped missing contrecord at 0/1FFD2E0, overwritten at 2022-02-25 02:25:16.2175+00
 
2022-02-25 02:25:16.616 UTC [7387] CONTEXT:  WAL redo at 0/2000028 for XLOG/OVERWRITE_CONTRECORD: lsn 0/1FFD2E0; time 2022-02-25 02:25:16.2175+00
 

## After promotion, the standby attempts to write the overwrite_contrecord again using the missingContrecPtr LSN, which
is now in the past.
 
## on the standby logs

2022-02-25 02:25:16.646 UTC [7387] LOG:  invalid record length at 0/201EC70: wanted 24, got 0
2022-02-25 02:25:16.646 UTC [7387] LOG:  redo done at 0/201EC48 system usage: CPU: user: 0.01 s, system: 0.00 s, elapsed: 0.55 s
 
2022-02-25 02:25:16.646 UTC [7387] LOG:  last completed transaction was at log time 2022-02-25 02:25:16.301554+00
2022-02-25 02:25:16.646 UTC [7387] DEBUG:  resetting unlogged relations: cleanup 0 init 1
2022-02-25 02:25:16.646 UTC [7387] LOG:  selected new timeline ID: 2
2022-02-25 02:25:16.646 UTC [7387] DEBUG:  updated min recovery point to 0/201EC70 on timeline 1
2022-02-25 02:25:16.656 UTC [7387] DEBUG:  could not remove file "pg_wal/000000020000000000000002": No such file or directory
2022-02-25 02:25:16.656 UTC [7387] LOG:  archive recovery complete
2022-02-25 02:25:16.656 UTC [7387] DEBUG:  MultiXactId wrap limit is 2147483648, limited by database with OID 1
2022-02-25 02:25:16.656 UTC [7387] DEBUG:  MultiXact member stop limit is now 4294914944 based on MultiXact 1
2022-02-25 02:25:16.656 UTC [7387] DEBUG:  OVERWRITE_CONTRECORD inserted at 0/2000028 for aborted record 0/1FFD2E0
<<-------The same overwrite record is written on the recently promoted standby
 
2022-02-25 02:25:16.656 UTC [7387] DEBUG:  attempting to remove WAL segments newer than log file 000000020000000000000001
2022-02-25 02:25:16.656 UTC [7387] DEBUG:  recycled write-ahead log file "000000010000000000000002"
2022-02-25 02:25:16.656 UTC [7387] DEBUG:  release all standby locks
2022-02-25 02:25:16.656 UTC [7385] LOG:  checkpoint starting: force
2022-02-25 02:25:16.656 UTC [7385] DEBUG:  performing replication slot checkpoint
2022-02-25 02:25:16.656 UTC [7385] LOG:  request to flush past end of generated WAL; request 0/201EC70, current position 0/2000088
 
2022-02-25 02:25:16.656 UTC [7385] PANIC:  xlog flush request 0/201EC70 is not satisfied --- flushed only to 0/2000088
2022-02-25 02:25:16.657 UTC [7384] LOG:  checkpointer process (PID 7385) was terminated by signal 6: Aborted

The purpose of the patch is to check whether the current LSN is already beyond missingContrecPtr and, if so, to skip
writing an overwrite contrecord that was already written on the old writer (writing it again would be incorrect).
 

--
Sami Imseih
Amazon Web Services



Attachment

Re: [BUG] Panic due to incorrect missingContrecPtr after promotion

From
"Imseih (AWS), Sami"
Date:
>    After some investigation, I finally concluded that we should reset
>    abortedRecPtr and missingContrecPtr at processing
>    XLOG_OVERWRITE_CONTRECORD.


 >   --- a/src/backend/access/transam/xlogrecovery.c
 >   +++ b/src/backend/access/transam/xlogrecovery.c
 >   @@ -1953,6 +1953,11 @@ xlogrecovery_redo(XLogReaderState *record, TimeLineID replayTLI)
                                                    LSN_FORMAT_ARGS(xlrec.overwritten_lsn),
                                                    timestamptz_to_str(xlrec.overwrite_time))));

 >   +               /* We have safely skipped the aborted record */
 >   +               abortedRecPtr = InvalidXLogRecPtr;
 >   +               missingContrecPtr = InvalidXLogRecPtr;
 >   +
 >                   /* Verifying the record should only happen once */
 >                   record->overwrittenRecPtr = InvalidXLogRecPtr;
 >           }

>    The last check in the test against "resetting aborted record" is no
>    longer useful since it is already checked by
>    026_overwrite_contrecord.pl.

>    regards.

+1 for this. Resetting abortedRecPtr and missingContrecPtr after the broken record is skipped is cleaner. I also
think it would make sense to move the "successfully skipped missing contrecord" message to after we invalidate the aborted and
missing record pointers, like below.
 

+++ b/src/backend/access/transam/xlogrecovery.c
@@ -1948,15 +1948,15 @@ xlogrecovery_redo(XLogReaderState *record, TimeLineID replayTLI)
                                 LSN_FORMAT_ARGS(xlrec.overwritten_lsn),
                                 LSN_FORMAT_ARGS(record->overwrittenRecPtr));

+               /* We have safely skipped the aborted record */
+               abortedRecPtr = InvalidXLogRecPtr;
+               missingContrecPtr = InvalidXLogRecPtr;
+
                ereport(LOG,
                                (errmsg("successfully skipped missing contrecord at %X/%X, overwritten at %s",
                                                LSN_FORMAT_ARGS(xlrec.overwritten_lsn),
                                                timestamptz_to_str(xlrec.overwrite_time))));

Also, instead of a new test, the current 026_overwrite_contrecord.pl test can include a promotion test at the end.

Attaching a new patch for review.

 --
Sami Imseih
Amazon Web Services


Attachment

Re: [BUG] Panic due to incorrect missingContrecPtr after promotion

From
"alvherre@alvh.no-ip.org"
Date:
On 2022-Mar-07, Imseih (AWS), Sami wrote:

> I have gone ahead and backpatched this all the way to 10 as well.

Thanks!  I pushed this now.  I edited the test though: I don't
understand why you went to the trouble of setting stuff in order to call
'pg_ctl promote' (in different ways for older branches), when
$node_standby->promote does the same and is simpler to call.  So I
changed the tests to do that.  (I did verify that without the code fix,
the PANIC indeed is thrown.)

Thanks again,

-- 
Álvaro Herrera        Breisgau, Deutschland  —  https://www.EnterpriseDB.com/



Re: [BUG] Panic due to incorrect missingContrecPtr after promotion

From
"Imseih (AWS), Sami"
Date:
We see another occurrence of this bug with the last patch applied in 13.7.

After a promotion we observe the following in the logs:

2022-05-25 00:35:38 UTC::@:[371]:PANIC: xlog flush request 10/B1FA3D88 is not satisfied --- flushed only to 7/A8000060
2022-05-25 00:35:38 UTC:172.31.26.238(38610):administrator@postgres:[23433]:ERROR: current transaction is aborted, commands ignored until end of transaction block
 

However, the logs do not show "LOG: successfully skipped missing contrecord",
therefore we know that VerifyOverwriteContrecord
is not being called to invalidate the missingContrecPtr.

VerifyOverwriteContrecord(xl_overwrite_contrecord *xlrec, XLogReaderState *state)
{
   if (xlrec->overwritten_lsn != state->overwrittenRecPtr)
       elog(FATAL, "mismatching overwritten LSN %X/%X -> %X/%X",
            (uint32) (xlrec->overwritten_lsn >> 32),
            (uint32) xlrec->overwritten_lsn,
            (uint32) (state->overwrittenRecPtr >> 32),
            (uint32) state->overwrittenRecPtr);

   /* We have safely skipped the aborted record */
   abortedRecPtr = InvalidXLogRecPtr;
   missingContrecPtr = InvalidXLogRecPtr;

   ereport(LOG,
           (errmsg("successfully skipped missing contrecord at %X/%X, overwritten at %s",
                   (uint32) (xlrec->overwritten_lsn >> 32),
                   (uint32) xlrec->overwritten_lsn,
                   timestamptz_to_str(xlrec->overwrite_time))));

We think it is because VerifyOverwriteContrecord was not
called that we see this behavior.

Are there other places where missingContrecPtr
should be invalidated, such as after a successful promotion?

--
Sami Imseih
Amazon Web Services




Re: [BUG] Panic due to incorrect missingContrecPtr after promotion

From
Michael Paquier
Date:
On Tue, Feb 22, 2022 at 07:20:55PM +0000, Imseih (AWS), Sami wrote:
> The overwrite_contrecord was introduced in 13.5 with
> https://git.postgresql.org/gitweb/?p=postgresql.git;a=commit;h=ff9f111bce24.
>
> Attached is a patch and a TAP test to handle this condition. The
> patch ensures that an overwrite_contrecord is only created if the
> missingContrecPtr is ahead of the last wal record.

The test you are introducing to force a complete segment to be filled
is funky, and kind of nice actually while being cheap.  This part
particularly makes the test predictable:
++unlink $node->basedir . "/pgdata/pg_wal/$endfile"
++  or die "could not unlink " . $node->basedir . "/pgdata/pg_wal/$endfile: $!";

I really like that.

> With the patch, running the same tap test succeeds and a PANIC is
> not observed.

This needs a very close lookup, I'll try to check all that except if
somebody beats me to it.
--
Michael

Attachment

Re: [BUG] Panic due to incorrect missingContrecPtr after promotion

From
Michael Paquier
Date:
On Fri, May 27, 2022 at 08:53:03AM +0900, Michael Paquier wrote:
> This needs a very close lookup, I'll try to check all that except if
> somebody beats me to it.

Please ignore that..  I need more coffee, and likely a break.
--
Michael

Attachment

Re: [BUG] Panic due to incorrect missingContrecPtr after promotion

From
Kyotaro Horiguchi
Date:
At Thu, 26 May 2022 19:57:41 +0000, "Imseih (AWS), Sami" <simseih@amazon.com> wrote in 
> We see another occurrence of this bug with the last patch applied in 13.7.
> 
> After a promotion we observe the following in the logs:
...
> We think it's because VerifyOverwriteContrecord was not 
> called which is why we see this behavior. 
> 
> Are there are  other places where missingContrecPtr 
> should be invalidated, such as after a successful promotion?

The only cause known to me for EndOfLog being moved to such a location
is missingContrecPtr. But if the next record is not
XLOG_OVERWRITE_CONTRECORD, recovery should have stopped there. And if
XLOG_OVERWRITE_CONTRECORD is there, VerifyOverwriteContrecord should
have been called..

Could you inspect the WAL files of the environment and see if the first
record of the '7/A8' segment is an OVERWRITE_CONTRECORD?  I don't say that
makes much progress on this by itself, but it could be a first step.

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center



Re: [BUG] Panic due to incorrect missingContrecPtr after promotion

From
"Imseih (AWS), Sami"
Date:
After further research, we found the following.

Testing on 13.6 with the attached patch we see
that the missingContrecPtr is being incorrectly
set on the standby and the promotion in the TAP
test fails.

Per the comments in xlog.c, the
missingContrecPtr should not be set when
in standby mode.

            /*
             * When not in standby mode we find that WAL ends in an incomplete
             * record, keep track of that record.  After recovery is done,
             * we'll write a record to indicate downstream WAL readers that
             * that portion is to be ignored.
             */
            if (!StandbyMode &&
                !XLogRecPtrIsInvalid(xlogreader->abortedRecPtr))
            {
                abortedRecPtr = xlogreader->abortedRecPtr;
                missingContrecPtr = xlogreader->missingContrecPtr;
                elog(LOG, "missingContrecPtr == %ld", missingContrecPtr);
            }

If StandbyModeRequested is checked instead, which
checks for the presence of a standby signal file,
the missingContrecPtr is not set on the
standby and the test succeeds.

            if (!StandbyModeRequested &&
                !XLogRecPtrIsInvalid(xlogreader->abortedRecPtr))
            {
                abortedRecPtr = xlogreader->abortedRecPtr;
                missingContrecPtr = xlogreader->missingContrecPtr;
                elog(LOG, "missingContrecPtr == %ld", missingContrecPtr);
            }

If this is a bug, as it appears to be, the original patch
to resolve this issue is not needed and the ideal fix
is to ensure that a standby does not set
missingContrecPtr.

Would like to see what others think.

Thanks

--
Sami Imseih
Amazon Web Services



Attachment

Re: [BUG] Panic due to incorrect missingContrecPtr after promotion

From
Kyotaro Horiguchi
Date:
At Fri, 27 May 2022 02:01:27 +0000, "Imseih (AWS), Sami" <simseih@amazon.com> wrote in 
> After further research, we found the following.
> 
> Testing on 13.6 with the attached patch we see
> that the missingContrecPtr is being incorrectly
> set on the standby and the promotion in the TAP
> test fails.
> 
> Per the comments in xlog.c, the
> missingContrecPtr should not be set when
> in standby mode.
> 
>             /*
>              * When not in standby mode we find that WAL ends in an incomplete
>              * record, keep track of that record.  After recovery is done,
>              * we'll write a record to indicate downstream WAL readers that
>              * that portion is to be ignored.
>              */
>             if (!StandbyMode &&
>                 !XLogRecPtrIsInvalid(xlogreader->abortedRecPtr))
>             {
>                 abortedRecPtr = xlogreader->abortedRecPtr;
>                 missingContrecPtr = xlogreader->missingContrecPtr;
>                 elog(LOG, "missingContrecPtr == %ld", missingContrecPtr);
>             }
> 
> If StandbyModeRequested is checked instead, which
> checks for the presence of a standby signal file,
> the missingContrecPtr is not set on the
> standby and the test succeeds.
> 
>             if (!StandbyModeRequested &&
>                 !XLogRecPtrIsInvalid(xlogreader->abortedRecPtr))
>             {
>                 abortedRecPtr = xlogreader->abortedRecPtr;
>                 missingContrecPtr = xlogreader->missingContrecPtr;
>                 elog(LOG, "missingContrecPtr == %ld", missingContrecPtr);
>             }
> 
> If this is a bug, as it appears to be, the original patch
> to resolve this issue is not needed and the ideal fix
> is to ensure that a standby does not set
> missingContrecPtr.
> 
> Would like to see what others think.

Due to lack of information, I don't have a clear idea of what is
happening here. If that change "fixed" the "issue", that seems to mean
that crash recovery passed an immature contrecord (@6/A8000060) but
recovery did not stop at that point and continued until 10/B1FA3D88.  A
possibility is that crash recovery ended at the immature contrecord, then
the server moved to archive recovery mode and was able to continue
from the same (aborted) LSN on the archived WAL. This means 6/A8 in
pg_wal had somehow been removed after being archived, or the segment 6/A7
(not A8) in pg_xlog and in the archive are from different histories, which
seems to me to be what we wanted to prevent with
XLOG_OVERWRITE_CONTRECORD.  Didn't you use the same archive content
for repeated recovery testing?

Also, the fact that the server entered crash recovery means the cluster
didn't have an idea of how far it should recover to reach consistency;
that information is contained in the backup label / control file.  That could
happen when a crashed primary is reused as-is as the new standby of the new
primary.

So.. I'd like to hear exactly what you did in your testing.

When standby mode is requested, if crash recovery fails with an immature
contrecord, I think we shouldn't continue recovery. But I'm not sure
we need to explicitly reject that case.  Further study is needed..

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center



Re: [BUG] Panic due to incorrect missingContrecPtr after promotion

From
"Imseih (AWS), Sami"
Date:
>    So.. I'd like to hear exactly what you did as the testing.

>    When standby mode is requested, if crash recovery fails with immature
>    contrecord, I think we shouldn't continue recovery. But I'm not sure
>    we need to explicitly reject that case.  Further study is needed..


Here are more details about my findings and testing.

Even with commit 9d92582abf918215d27659d45a4c9e78bda50aff 
we still see the issue with the post-promotion checkpoint
resulting in "request to flush past end of generated WAL;"

i.e.

2022-05-25 00:35:38 UTC::@:[371]:LOG:  checkpoint starting: immediate force wait wal time
2022-05-25 00:35:38 UTC::@:[371]:LOG:  request to flush past end of generated WAL; request 10/B1FA3D88, currpos 7/A8000060
2022-05-25 00:35:38 UTC::@:[371]:PANIC:  xlog flush request 10/B1FA3D88 is not satisfied --- flushed only to 7/A8000060
2022-05-25 00:35:38 UTC:172.31.26.238(38610):administrator@postgres:[23433]:ERROR:  current transaction is aborted, commands ignored until end of transaction block
 

The intent of commit 9d92582abf918215d27659d45a4c9e78bda50aff 
was to make sure the standby skips the overwrite contrecord.

However, we still see that "missingContrecPtr" is set
on the standby before promotion; after the instance is
promoted, the missingContrecPtr is written to WAL
and the subsequent flush throws a "PANIC:  xlog flush request"

To reproduce using TAP tests:

1) apply the attached patch 0001-Patch_to_repro.patch to the head branch.
   This patch adds logging when an OverWriteContRecord
   is created and comments out the invalidation code
   added in commit  9d92582abf918215d27659d45a4c9e78bda50aff
   
2) Run a tap test under "test/recovery"
   make check PROVE_TESTS='t/026_overwrite_contrecord.pl'

3) What is observed in the logfiles for both the standby
   and primary instance in the TAP test is
   that an overwrite contrecord is created on the primary,
   which is correct, but also on the standby after promotion,
   which is incorrect, as that contrecord was already overwritten on the old primary.

simseih@88665a22795f recovery % cat tmp_check/log/*prim* | grep 'creating\|promo'
2022-05-27 13:17:50.843 CDT [98429] LOG:  creating overwrite contrecord at 0/2000058 for aborted_lsn 0/1FFD000

simseih@88665a22795f recovery % cat tmp_check/log/*stan* | grep 'creating\|promo'
2022-05-27 13:17:51.361 CDT [98421] LOG:  received promote request
2022-05-27 13:17:51.394 CDT [98421] LOG:  creating overwrite contrecord at 0/2000058 for aborted_lsn 0/1FFD000
simseih@88665a22795f recovery %

What we found:

1. missingContrecPtr is set when 
   StandbyMode is false, therefore
   only a writer should set this value
   and a record is then sent downstream.

   But a standby going through crash 
   recovery will always have StandbyMode = false,
   causing the missingContrecPtr to be incorrectly
   set.

2. If StandbyModeRequested is checked instead,
     we ensure that a standby will not set a 
     missingContrecPtr.

3. After applying the patch below, the tap test succeeded

diff --git a/src/backend/access/transam/xlogrecovery.c b/src/backend/access/transam/xlogrecovery.c
index 5ee90b6..a727aaf 100644
--- a/src/backend/access/transam/xlogrecovery.c
+++ b/src/backend/access/transam/xlogrecovery.c
@@ -2977,7 +2977,7 @@ ReadRecord(XLogPrefetcher *xlogprefetcher, int emode,
                         * we'll write a record to indicate to downstream WAL readers that
                         * that portion is to be ignored.
                         */
-                       if (!StandbyMode &&
+                       if (!StandbyModeRequested &&
                                !XLogRecPtrIsInvalid(xlogreader->abortedRecPtr))
                        {
                                abortedRecPtr = xlogreader->abortedRecPtr;

So, it might be that the best fix is not the commit in 9d92582abf918215d27659d45a4c9e78bda50aff
but to check StandbyModeRequested = false before setting 
missingContrecPtr.

Thank you
---
Sami Imseih
Amazon Web Services


Attachment

Re: [BUG] Panic due to incorrect missingContrecPtr after promotion

From
"Hsu, John"
Date:
Hi,

As an update we have a test suite running for the last few days with 
constant workload and failovers/promotions. With the proposed change 
from Sami we no longer see PANICs due to this.

The comment indicates that the abortedRecPtr and missingContrecPtr 
should only be set when the database is a writer, since they are used to 
write a new record for downstream readers. !StandbyMode is a poor proxy 
for this, as Sami mentioned, since there are cases where a replica going 
through crash recovery would then set these values.

Thanks,

John H

On 5/27/22 12:01 PM, Imseih (AWS), Sami wrote:
>>     So.. I'd like to hear exactly what you did as the testing.
>>     When standby mode is requested, if crash recovery fails with immature
>>     contrecord, I think we shouldn't continue recovery. But I'm not sure
>>     we need to explicitly reject that case.  Further study is needed..
>
> Here is more details about my findings and testing.
>
> Even with commit 9d92582abf918215d27659d45a4c9e78bda50aff
> we still see the issue with post promotion checkpoint
> resulting in "request to flush past end of generated WAL;"
>
> i.e.
>
> 2022-05-25 00:35:38 UTC::@:[371]:LOG:  checkpoint starting: immediate force wait wal time
> 2022-05-25 00:35:38 UTC::@:[371]:LOG:  request to flush past end of generated WAL; request 10/B1FA3D88, currpos 7/A8000060
> 2022-05-25 00:35:38 UTC::@:[371]:PANIC:  xlog flush request 10/B1FA3D88 is not satisfied --- flushed only to 7/A8000060
> 2022-05-25 00:35:38 UTC:172.31.26.238(38610):administrator@postgres:[23433]:ERROR:  current transaction is aborted, commands ignored until end of transaction block
>
> The intent of commit 9d92582abf918215d27659d45a4c9e78bda50aff
> was to make sure the standby skips the overwrite contrecord.
>
> However, we still see "missingContrecPtr" is being set
> on the standby before promotion and after the instance is
> promoted, the missingContrecPtr is written to WAL
> and the subsequent flush throws a "PANIC:  xlog flush request"
>
> To Reproduce using TAP tests;
>
> 1) apply the attached patch 0001-Patch_to_repro.patch to the head branch.
>     This patch adds logging when an OverWriteContRecord
>     is created and comments out the invalidation code
>     added in commit  9d92582abf918215d27659d45a4c9e78bda50aff
>     
> 2) Run a tap test under "test/recovery"
>     make check PROVE_TESTS='t/026_overwrite_contrecord.pl'
>
> 3) What is observed in the logfiles for both the standby
>     and primary instance in the tap test is
>     that an overwrite contrecord is created on the primary,
>     which is correct, but also on the standby after promotion,
>     which is incorrect. This is incorrect as the contrecord
>
> simseih@88665a22795f recovery % cat tmp_check/log/*prim* | grep 'creating\|promo'
> 2022-05-27 13:17:50.843 CDT [98429] LOG:  creating overwrite contrecord at 0/2000058 for aborted_lsn 0/1FFD000
>
> simseih@88665a22795f recovery % cat tmp_check/log/*stan* | grep 'creating\|promo'
> 2022-05-27 13:17:51.361 CDT [98421] LOG:  received promote request
> 2022-05-27 13:17:51.394 CDT [98421] LOG:  creating overwrite contrecord at 0/2000058 for aborted_lsn 0/1FFD000
> simseih@88665a22795f recovery %
>
> What we found:
>
> 1. missingContrecPtr is set when
>     StandbyMode is false, therefore
>     only a writer should set this value
>     and a record is then sent downstream.
>
>     But a standby going through crash
>     recovery will always have StandbyMode = false,
>     causing the missingContrecPtr to be incorrectly
>     set.
>
> 2. If StandbyModeRequested is checked instead,
>       we ensure that a standby will not set a
>       missingContrecPtr.
>
> 3. After applying the patch below, the tap test succeeded
>
> diff --git a/src/backend/access/transam/xlogrecovery.c b/src/backend/access/transam/xlogrecovery.c
> index 5ee90b6..a727aaf 100644
> --- a/src/backend/access/transam/xlogrecovery.c
> +++ b/src/backend/access/transam/xlogrecovery.c
> @@ -2977,7 +2977,7 @@ ReadRecord(XLogPrefetcher *xlogprefetcher, int emode,
>                           * we'll write a record to indicate to downstream WAL readers that
>                           * that portion is to be ignored.
>                           */
> -                       if (!StandbyMode &&
> +                       if (!StandbyModeRequested &&
>                                  !XLogRecPtrIsInvalid(xlogreader->abortedRecPtr))
>                          {
>                                  abortedRecPtr = xlogreader->abortedRecPtr;
>
> So, it might be that the best fix is not the commit in 9d92582abf918215d27659d45a4c9e78bda50aff
> but to check StandbyModeRequested = false before setting
> missingContrecPtr.
>
> Thank you
> ---
> Sami Imseih
> Amazon Web Services
>



Re: [BUG] Panic due to incorrect missingContrecPtr after promotion

From
Michael Paquier
Date:
On Fri, May 27, 2022 at 07:01:37PM +0000, Imseih (AWS), Sami wrote:
> What we found:
>
> 1. missingContrecPtr is set when
>    StandbyMode is false, therefore
>    only a writer should set this value
>    and a record is then sent downstream.
>
>    But a standby going through crash
>    recovery will always have StandbyMode = false,
>    causing the missingContrecPtr to be incorrectly
>    set.

That stands as true as far as I know, StandbyMode would be switched
only once we get out of crash recovery, and only if archive recovery
completes when there is a restore_command.

> 2. If StandbyModeRequested is checked instead,
>      we ensure that a standby will not set a
>      missingContrecPtr.
>
> 3. After applying the patch below, the tap test succeeded

Hmm.  I have not looked at that in depth, but if the intention is to
check that the database is able to write WAL, looking at
XLogCtl->SharedRecoveryState would be the way to go, because that's the
switch that flips between crash recovery, archive recovery and the end of
recovery (when WAL can be safely written).
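
For illustration, reading that state looks roughly like the following (sketch only; where such a check would actually live, and whether to use an accessor instead of the raw field, is exactly what needs that closer look):

    /*
     * Sketch: the shared recovery state, rather than StandbyMode, reflects
     * whether this node has finished recovery and may write WAL.
     */
    RecoveryState state;

    SpinLockAcquire(&XLogCtl->info_lck);
    state = XLogCtl->SharedRecoveryState;
    SpinLockRelease(&XLogCtl->info_lck);

    if (state == RECOVERY_STATE_DONE)
    {
        /* recovery has ended; WAL can be written safely */
    }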

The check in xlogrecovery_redo() still looks like a good thing to have
anyway, because we know that we can safely skip the contrecord.  Now,
for any patch produced, could the existing TAP test be extended so that
we are able to get a PANIC even if we keep around the sanity check in
xlogrecovery_redo()?  That would likely involve an immediate shutdown
of a standby followed by a start sequence.
--
Michael

Attachment

Re: [BUG] Panic due to incorrect missingContrecPtr after promotion

From
Kyotaro Horiguchi
Date:
At Mon, 20 Jun 2022 16:13:43 +0900, Michael Paquier <michael@paquier.xyz> wrote in 
> On Fri, May 27, 2022 at 07:01:37PM +0000, Imseih (AWS), Sami wrote:
> > What we found:
> > 
> > 1. missingContrecPtr is set when 
> >    StandbyMode is false, therefore
> >    only a writer should set this value
> >    and a record is then sent downstream.
> > 
> >    But a standby going through crash 
> >    recovery will always have StandbyMode = false,
> >    causing the missingContrecPtr to be incorrectly
> >    set.
> 
> That stands as true as far as I know, StandbyMode would be switched
> only once we get out of crash recovery, and only if archive recovery
> completes when there is a restore_command.

Anyway, the change:
-        abortedRecPtr = InvalidXLogRecPtr;
-        missingContrecPtr = InvalidXLogRecPtr;
+        //abortedRecPtr = InvalidXLogRecPtr;
+        //missingContrecPtr = InvalidXLogRecPtr;

is injecting a bug whereby an invalid "aborted contrecord" record can be
inserted under certain conditions.  If a bug is intentionally injected,
it's quite natural that some behavior gets broken..

> > 2. If StandbyModeRequested is checked instead,
> >      we ensure that a standby will not set a 
> >      missingContrecPtr.
> > 
> > 3. After applying the patch below, the tap test succeeded
> 
> Hmm.  I have not looked at that in depth, but if the intention is to
> check that the database is able to write WAL, looking at
> XLogCtl->SharedRecoveryState would be the way to go because that's the
> flip switching between crash recovery, archive recovery and the end of
> recovery (when WAL can be safely written).

What we are checking there is "we are going to write WAL", not "we are
writing".

(!StandbyMode && StandbyModeRequested) means the server has been
running crash-recovery before starting archive recovery.  In that
case, the server is supposed to continue with archived WAL without
inserting a record.  However, if no archived WAL is available and the
server is promoted, the server needs to insert an "aborted contrecord"
record again.  (I'm not sure how that case happens in the field,
though.)

So I don't think !StandbyModeRequested is the right thing
here. Actually the attached test fails with the fix.

> The check in xlogrecovery_redo() still looks like a good thing to have
> anyway, because we know that we can safely skip the contrecord.  Now,
> for any patch produced, could the existing TAP test be extended so as
> we are able to get a PANIC even if we keep around the sanity check in
> xlogrecovery_redo().  That would likely involve an immediate shutdown
> of a standby followed by a start sequence?

Thus, I still don't see what has happened on Imseih's side, but I can
cause a PANIC with somewhat tricky steps, which I don't think are valid.  This
is why I wanted to know the exact steps that cause the PANIC.

The first attachment is the PoC of the TAP test (it uses system()..), and
the second is a tentative fix for that.  (I don't like the fix either,
though...)

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center

Attachment

Re: [BUG] Panic due to incorrect missingContrecPtr after promotion

From
Robert Haas
Date:
On Mon, Jun 20, 2022 at 7:28 AM Kyotaro Horiguchi
<horikyota.ntt@gmail.com> wrote:
> > Hmm.  I have not looked at that in depth, but if the intention is to
> > check that the database is able to write WAL, looking at
> > XLogCtl->SharedRecoveryState would be the way to go because that's the
> > flip switching between crash recovery, archive recovery and the end of
> > recovery (when WAL can be safely written).
>
> What we are checking there is "we are going to write WAL", not "we are
> writing".
>
> (!StandbyMode && StandbyModeRequested) means the server has been
> running crash-recovery before starting archive recovery.  In that
> case, the server is supposed to continue with archived WAL without
> inserting a record.  However, if no archived WAL is available and the
> server is promoted, the server needs to insert an "aborted contrecord"
> record again.  (I'm not sure how that case happens in the field,
> though.)
>
> So I don't think !StandbyModeRequested is the right thing
> here. Actually the attached test fails with the fix.

It seems to me that what we want to do is: if we're about to start
allowing WAL writes, then consider whether to insert an aborted
contrecord record. Now, if we are about to start allowing WAL write,
we must determine the LSN at which the first such record will be
written. Then there are two possibilities: either that LSN is on an
existing record boundary, or it isn't. In the former case, no aborted
contrecord record is required. In the latter case, we're writing at
LSN which would have been in the middle of a previous record, except
that the record in question was never finished or at least we don't
have all of it. So the first WAL we insert at our chosen starting LSN
must be an aborted contrecord record.
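
Expressed as a purely hypothetical helper (names invented for illustration, not taken from any patch), the rule is:

/*
 * Hypothetical sketch: an OVERWRITE_CONTRECORD is needed iff the LSN where
 * WAL writing resumes is not a record boundary, i.e. it is exactly the spot
 * where the missing continuation of the aborted record would have gone.
 */
static bool
NeedOverwriteContrecord(XLogRecPtr startLSN)
{
    return !XLogRecPtrIsInvalid(missingContrecPtr) &&
           startLSN == missingContrecPtr;
}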

I'm not quite sure I understand the nature of the remaining bug here,
but this logic seems quite sketchy to me:

            /*
             * When not in standby mode we find that WAL ends in an incomplete
             * record, keep track of that record.  After recovery is done,
             * we'll write a record to indicate to downstream WAL readers that
             * that portion is to be ignored.
             */
            if (!StandbyMode &&
                !XLogRecPtrIsInvalid(xlogreader->abortedRecPtr))
            {
                abortedRecPtr = xlogreader->abortedRecPtr;
                missingContrecPtr = xlogreader->missingContrecPtr;
            }

I don't see what StandbyMode has to do with anything here. The
question is whether the place where we're going to begin writing WAL
is on an existing record boundary or not. If it so happens that when
StandbyMode is turned on we can never decide to promote in the middle
of a record, then maybe there's no need to track this information when
StandbyMode = true, but I don't see a good reason why we should make
that assumption. What if in the future somebody added a command that
says "don't keep trying to read more WAL, just promote RIGHT HERE?". I
think this logic would surely be incorrect in that case. It feels to
me like the right thing to do is to always keep track if WAL ends in
an incomplete record, and then when we promote, we write an aborted
contrecord record if WAL ended in an incomplete record, full stop.

Now, we do need to keep in mind that, while in StandbyMode, we might
reach the end of WAL more than once, because we have a polling loop
that looks for more WAL. So it does not work to just set the values
once and then assume that's the whole truth forever. But why not
handle that by storing the abortedRecPtr and missingContrecPtr
unconditionally after every call to XLogPrefetcherReadRecord()? They
will go back and forth between XLogRecPtrIsInvalid and other values
many times but the last value should -- I think -- be however things
ended up at the point where we decided to stop reading WAL.
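
In code, the idea would be roughly the following (illustrative sketch only, not a patch; record, errormsg and xlogreader are the locals already used in ReadRecord()):

    /*
     * Sketch: unconditionally mirror the reader's view after each read
     * attempt; the values flip between invalid and valid, and the last
     * assignment reflects the point where we stopped reading WAL.
     */
    record = XLogPrefetcherReadRecord(xlogprefetcher, &errormsg);
    abortedRecPtr = xlogreader->abortedRecPtr;
    missingContrecPtr = xlogreader->missingContrecPtr;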

I haven't really dived into this issue much so I might be totally off
base, but it just doesn't feel right to me that this should depend on
whether we're in standby mode. That seems at best incidentally related
to whether an aborted contrecord record is required.

P.S. "aborted contrecord record" is really quite an awkward turn of phrase!

--
Robert Haas
EDB: http://www.enterprisedb.com



Re: [BUG] Panic due to incorrect missingContrecPtr after promotion

From
Kyotaro Horiguchi
Date:
At Mon, 20 Jun 2022 11:57:20 -0400, Robert Haas <robertmhaas@gmail.com> wrote in 
> It seems to me that what we want to do is: if we're about to start
> allowing WAL writes, then consider whether to insert an aborted
> contrecord record. Now, if we are about to start allowing WAL write,
> we must determine the LSN at which the first such record will be
> written. Then there are two possibilities: either that LSN is on an
> existing record boundary, or it isn't. In the former case, no aborted
> contrecord record is required. In the latter case, we're writing at
> LSN which would have been in the middle of a previous record, except
> that the record in question was never finished or at least we don't
> have all of it. So the first WAL we insert at our chosen starting LSN
> must be an aborted contrecord record.

Right.

> I'm not quite sure I understand the nature of the remaining bug here,
> but this logic seems quite sketchy to me:
> 
>             /*
>              * When not in standby mode we find that WAL ends in an incomplete
>              * record, keep track of that record.  After recovery is done,
>              * we'll write a record to indicate to downstream WAL readers that
>              * that portion is to be ignored.
>              */
>             if (!StandbyMode &&
>                 !XLogRecPtrIsInvalid(xlogreader->abortedRecPtr))
>             {
>                 abortedRecPtr = xlogreader->abortedRecPtr;
>                 missingContrecPtr = xlogreader->missingContrecPtr;
>             }
> 
> I don't see what StandbyMode has to do with anything here. The
> question is whether the place where we're going to begin writing WAL
> is on an existing record boundary or not. If it so happens that when
> StandbyMode is turned on we can never decide to promote in the middle
> of a record, then maybe there's no need to track this information when
> StandbyMode = true, but I don't see a good reason why we should make
> that assumption. What if in the future somebody added a command that

Right.  I came to the same conclusion before reading this section. It
is rarer than other cases but surely possible.

> says "don't keep trying to read more WAL, just promote RIGHT HERE?". I
> think this logic would surely be incorrect in that case. It feels to
> me like the right thing to do is to always keep track if WAL ends in
> an incomplete record, and then when we promote, we write an aborted
> contrecord record if WAL ended in an incomplete record, full stop.

Agreed. Actually, with the second patch applied, removing !StandbyMode
from the condition makes no difference in behavior.

> Now, we do need to keep in mind that, while in StandbyMode, we might
> reach the end of WAL more than once, because we have a polling loop
> that looks for more WAL. So it does not work to just set the values
> once and then assume that's the whole truth forever. But why not
> handle that by storing the abortedRecPtr and missingContrecPtr
> unconditionally after every call to XLogPrefetcherReadRecord()? They
> will go back and forth between XLogRecPtrIsInvalid and other values
> many times but the last value should -- I think -- be however things
> ended up at the point where we decided to stop reading WAL.

Unfortunately it doesn't work because we read a record already known
to be complete again at the end of recovery.  That is the reason for
"abortedRecPtr < xlogreader->EndRecPtr" in my PoC patch.  Without it,
abortedRecPtr is erased when it is actually needed.  I don't like that
expedient-looking condition, but the strict condition for resetting
abortedRecPtr is iff "we have read a complete record at the same or
greater LSN than abortedRecPtr"...

Come to think of it, I noticed that we can get rid of the
file-global abortedRecPtr by letting FinishWalRecovery copy
abortedRecPtr *before* reading the last record. This also allows us to
remove the "expedient" condition.


> I haven't really dived into this issue much so I might be totally off
> base, but it just doesn't feel right to me that this should depend on
> whether we're in standby mode. That seems at best incidentally related
> to whether an aborted contrecord record is required.
> 
> P.S. "aborted contrecord record" is really quite an awkward turn of phrase!

That's true!  Always open to better phrasings :(

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center

Attachment

Re: [BUG] Panic due to incorrect missingContrecPtr after promotion

From
Michael Paquier
Date:
On Tue, Jun 21, 2022 at 10:35:33AM +0900, Kyotaro Horiguchi wrote:
> At Mon, 20 Jun 2022 11:57:20 -0400, Robert Haas <robertmhaas@gmail.com> wrote in
>> says "don't keep trying to read more WAL, just promote RIGHT HERE?". I
>> think this logic would surely be incorrect in that case. It feels to
>> me like the right thing to do is to always keep track if WAL ends in
>> an incomplete record, and then when we promote, we write an aborted
>> contrecord record if WAL ended in an incomplete record, full stop.

Agreed.  The state is maintained in the WAL reader as far as I
understand the logic behind it.

>> Now, we do need to keep in mind that, while in StandbyMode, we might
>> reach the end of WAL more than once, because we have a polling loop
>> that looks for more WAL. So it does not work to just set the values
>> once and then assume that's the whole truth forever. But why not
>> handle that by storing the abortedRecPtr and missingContrecPtr
>> unconditionally after every call to XLogPrefetcherReadRecord()? They
>> will go back and forth between XLogRecPtrIsInvalid and other values
>> many times but the last value should -- I think -- be however things
>> ended up at the point where we decided to stop reading WAL.
>
> Unfortunately it doesn't work because we read a record already known
> to be complete again at the end of recovery.  That is the reason for
> "abortedRecPtr < xlogreader->EndRecPtr" in my PoC patch.  Without it,
> abortedRecPtr is erased when it is actually needed.  I don't like that
> expedient-looking condition, but the strict condition for resetting
> abortedRecPtr is iff "we have read a complete record at the same or
> greater LSN than abortedRecPtr"...
>
> Come to think of this, I noticed that we can get rid of the
> file-global abortedRecPtr by letting FinishWalRecovery copy
> abortedRecPtr *before* reading the last record. This also allows us to
> remove the "expedient" condition.

Interesting point, though I am not sure I get why this is safe
compared to the existing solution of setting
missingContrecPtr/abortedRecPtr while reading a set of records when we
look for the last LSN at the end of recovery.

>> I haven't really dived into this issue much so I might be totally off
>> base, but it just doesn't feel right to me that this should depend on
>> whether we're in standby mode. That seems at best incidentally related
>> to whether an aborted contrecord record is required.
>>
>> P.S. "aborted contrecord record" is really quite an awkward turn of phrase!
>
> That's true!  Always open to better phrasings :(

The term that would be appropriate here is continuation record?
contrecord is a bit confusing for French-speakers, actually, as
"contre" means "against" ;)

By the way, something that itches me with the proposed patch is that
we don't actually stress the second problem reported, which is that
the use of StandbyMode is incorrect.  Isn't that a sign that we'd
better extend the tests of 026_overwrite_contrecord.pl with more
end-of-recovery scenarios?  Two things that immediately come to mind
are the use of recovery_target_lsn that would force a promotion in the
middle of a continuation record and cascading standbys to make sure
that these get the extra OVERWRITE_CONTRECORD record generated at the
end of recovery.
--
Michael

Attachment

Re: [BUG] Panic due to incorrect missingContrecPtr after promotion

From
Robert Haas
Date:
On Mon, Jun 20, 2022 at 9:35 PM Kyotaro Horiguchi
<horikyota.ntt@gmail.com> wrote:
> Unfortunately it doesn't work because we read a record already known
> to be complete again at the end of recovery.  That is the reason for
> "abortedRecPtr < xlogreader->EndRecPtr" in my PoC patch.  Without it,
> abortedRecPtr is erased when it is actually needed.  I don't like that
> expedient-looking condition, but the strict condition for resetting
> abortedRecPtr is iff "we have read a complete record at the same or
> greater LSN than abortedRecPtr"...

Yeah, we need to work around that somehow. I noticed the
record-rereading behavior when I was working on some patches for this
release cycle and I think what we ought to do is get rid of that. It
serves no purpose other than to make things complicated. However, we
shouldn't back-patch a change like that, I think, so we'll need to
work around the issue somehow. Perhaps it'd be better to return these
values to the caller somehow and then the caller can decide whether to
save them based on context. The last re-read can choose not to do so.

-- 
Robert Haas
EDB: http://www.enterprisedb.com



Re: [BUG] Panic due to incorrect missingContrecPtr after promotion

From
"Imseih (AWS), Sami"
Date:
>    Thus, I still don't see what have happened at Imseih's hand, but I can
>    cause PANIC with a bit tricky steps, which I don't think valid.  This
>    is what I wanted to know the exact steps to cause the PANIC.

>    The attached 1 is the PoC of the TAP test (it uses system()..), and
>    the second is a tentative fix for that.  (I don't like the fix, too,
>    though...)

It has been difficult to get a generic repro, but the way we reproduce
is through our test suite. To give more details, we are running tests
in which we constantly fail over and promote standbys. The issue
surfaces after we have gone through a few promotions, which occur
every few hours or so (not really important, but to give context).

I am adding some additional debugging to see if I can draw a better
picture of what is happening. I will also give aborted_contrec_reset_3.patch
a go, although I suspect it will not handle the specific case we are dealing with.


Regards,

Sami imseih
Amazon Web Services


Re: [BUG] Panic due to incorrect missingContrecPtr after promotion

From
Michael Paquier
Date:
On Fri, Jun 24, 2022 at 04:17:34PM +0000, Imseih (AWS), Sami wrote:
> It has been difficult to get a generic repro, but the way we reproduce
> is through our test suite. To give more details, we are running tests
> in which we constantly fail over and promote standbys. The issue
> surfaces after we have gone through a few promotions, which occur
> every few hours or so (not really important, but to give context).

Hmm.  Could you describe exactly the failover scenario you are using?
Is the test using a set of cascading standbys linked to the promoted
one?  Are the standbys recycled from the promoted nodes with pg_rewind
or created from scratch with a new base backup taken from the
freshly-promoted primary?  I have been looking more at this thread
through the day but I don't see a remaining issue.  It could be
perfectly possible that we are missing a piece related to the handling
of those new overwrite contrecords in some cases, like in a rewind.

> I am adding some additional debugging  to see if I can draw a better
> picture of what is happening. Will also give aborted_contrec_reset_3.patch
> a go, although I suspect it will not handle the specific case we are dealing with.

Yeah, this is not going to change much if you are still seeing
an issue.  This patch does not change the logic, aka it just
simplifies the tracking of the continuation record data, resetting it
when a complete record has been read.  Saying that, getting rid of the
dependency on StandbyMode because we cannot promote in the middle of a
record is nice (my memories around that were a bit blurry but even
recovery_target_lsn would not recover in the middle of a continuation
record), and this is not a bug so there is limited reason to backpatch
this part of the change.
--
Michael

Attachment

Re: [BUG] Panic due to incorrect missingContrecPtr after promotion

From
Kyotaro Horiguchi
Date:
At Mon, 27 Jun 2022 15:02:11 +0900, Michael Paquier <michael@paquier.xyz> wrote in 
> On Fri, Jun 24, 2022 at 04:17:34PM +0000, Imseih (AWS), Sami wrote:
> > It has been difficult to get a generic repro, but the way we reproduce
> > is through our test suite. To give more details, we are running tests
> > in which we constantly fail over and promote standbys. The issue
> > surfaces after we have gone through a few promotions, which occur
> > every few hours or so (not really important, but to give context).
> 
> Hmm.  Could you describe exactly the failover scenario you are using?
> Is the test using a set of cascading standbys linked to the promoted
> one?  Are the standbys recycled from the promoted nodes with pg_rewind
> or created from scratch with a new base backup taken from the
> freshly-promoted primary?  I have been looking more at this thread
> through the day but I don't see a remaining issue.  It could be
> perfectly possible that we are missing a piece related to the handling
> of those new overwrite contrecords in some cases, like in a rewind.
> 
> > I am adding some additional debugging  to see if I can draw a better
> > picture of what is happening. Will also give aborted_contrec_reset_3.patch 
> > a go, although I suspect it will not handle the specific case we are dealing with.
> 
> Yeah, this is not going to change much if you are still seeing
> an issue.  This patch does not change the logic, aka it just

True. That is a significant hint about what happened at the time.

- Are there only two hosts in the replication set?  I am concerned about
  whether it is a cascading set or not.

- Exactly what do you perform at every failover?  In particular, do
  the steps contain pg_rewind, and do you copy pg_wal and/or archive
  files between the failover hosts?

> simplifies the tracking of the continuation record data, resetting it
> when a complete record has been read.  That said, getting rid of the
> dependency on StandbyMode because we cannot promote in the middle of a
> record is nice (my memories around that were a bit blurry, but even
> recovery_target_lsn would not recover in the middle of a continuation
> record), and this is not a bug, so there is limited reason to backpatch
> this part of the change.

Agreed.  In the first place, my "repro" (or the test case) is a bit too
intricate to happen in the field.

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center



Re: [BUG] Panic due to incorrect missingContrecPtr after promotion

From
Kyotaro Horiguchi
Date:
I'd like to look into the WAL segments related to the failure.

Mmm... With the patch, xlogreader->abortedRecPtr is valid only and
always when the last record whose read failed was an aborted contrec.
If recovery ends there, the first inserted record is an "aborted
contrec" (overwrite) record.  I still see only one way that an aborted
contrecord could be followed by a non-"aborted contrec" record:
recovery somehow fetches two consecutive WAL segments that are
inconsistent at the boundary.
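
(For context, the end-of-recovery code that consumes these values has
roughly the following shape; this is a simplified sketch of StartupXLOG
in xlog.c, not an exact excerpt from any branch.)

    /* If WAL ended in an incomplete record, back up over the broken part. */
    if (!XLogRecPtrIsInvalid(missingContrecPtr))
    {
        Assert(!XLogRecPtrIsInvalid(abortedRecPtr));
        EndOfLog = missingContrecPtr;   /* a stale value rewinds EndOfLog into the past */
    }

    /* ... other end-of-recovery work ... */

    /* Once WAL is writable again, emit the overwrite record at that point. */
    if (!XLogRecPtrIsInvalid(abortedRecPtr))
        CreateOverwriteContrecordRecord(abortedRecPtr);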


I found the reason the TAP test doesn't respond to the first
proposed patch (shown below).

-            if (!StandbyMode &&
+            if (!StandbyModeRequested &&
                !XLogRecPtrIsInvalid(xlogreader->abortedRecPtr))

The cause was that I disabled standby mode in the test (to make the test
reliable and simpler), while the change above only has an effect when
standby mode is on. The first attached patch detects the same error (in a
somewhat possibly-unstable way) and responds to the fix above, and also
responds to aborted_contrec_reset_3.patch.

So, aborted_contrec_reset_3 looks closer to the issue than before.

Would you mind trying the second attached patch to obtain detailed logs
in your testing environment?  With the patch, the modified TAP test
yields log lines like the ones below.

2022-06-28 15:49:20.661 JST [165472] LOG:  ### [A] @0/1FFD338: abort=(0/1FFD338)0/0, miss=(0/2000000)0/0, SbyMode=0, SbyModeReq=1
...
2022-06-28 15:49:20.681 JST [165472] LOG:  ### [F] @0/2094610: abort=(0/0)0/1FFD338, miss=(0/0)0/2000000, SbyMode=1, SbyModeReq=1
...
2022-06-28 15:49:20.767 JST [165472] LOG:  ### [S] @0/2094610: abort=(0/0)0/1FFD338, miss=(0/0)0/2000000, SbyMode=0, SbyModeReq=1
...
2022-06-28 15:49:20.777 JST [165470] PANIC:  xlog flush request 0/2094610 is not satisfied --- flushed only to 0/2000088

In this example, abortedRecPtr is set at the first line and recovery
continued to 0/2094610, but abortedRecPtr was not reset, and then we
PANICed.  ([A] means an aborted-contrec failure; [F] and [S] are failed
and successful reads, respectively.)
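
(For reference, each of these lines comes from an elog(LOG, ...) call of
roughly the following shape; this is a reconstruction from the output
format, and the variable names are assumptions rather than the patch's
exact code.  The value in parentheses appears to be the xlogreader-side
field, and the value after it the startup process's saved copy.)

    elog(LOG, "### [%c] @%X/%X: abort=(%X/%X)%X/%X, miss=(%X/%X)%X/%X, SbyMode=%d, SbyModeReq=%d",
         kind,                                       /* 'A' aborted contrec, 'F' failed read, 'S' successful read */
         LSN_FORMAT_ARGS(xlogreader->EndRecPtr),     /* assumed: the position being read */
         LSN_FORMAT_ARGS(xlogreader->abortedRecPtr),
         LSN_FORMAT_ARGS(abortedRecPtr),
         LSN_FORMAT_ARGS(xlogreader->missingContrecPtr),
         LSN_FORMAT_ARGS(missingContrecPtr),
         StandbyMode, StandbyModeRequested);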

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center

Attachment

Re: [BUG] Panic due to incorrect missingContrecPtr after promotion

From
"Imseih (AWS), Sami"
Date:
> Would you mind trying the second attached patch to obtain detailed logs
> in your testing environment?  With the patch, the modified TAP test
> yields log lines like the ones below.

Thanks for this. I will apply this to the testing environment and
will share the output.

Regards,

Sami Imseih
Amazon Web Services (AWS)


Re: [BUG] Panic due to incorrect missingContrecPtr after promotion

From
"alvherre@alvh.no-ip.org"
Date:
Hello,

On 2022-Jun-29, Imseih (AWS), Sami wrote:

> > Would you mind trying the second attached patch to obtain detailed logs
> > in your testing environment?  With the patch, the modified TAP test
> > yields log lines like the ones below.
> 
> Thanks for this. I will apply this to the testing environment and
> will share the output.

Any luck with this?

-- 
Álvaro Herrera               48°01'N 7°57'E  —  https://www.EnterpriseDB.com/
"Doing what he did amounts to sticking his fingers under the hood of the
implementation; if he gets his fingers burnt, it's his problem."  (Tom Lane)



Re: [BUG] Panic due to incorrect missingContrecPtr after promotion

From
"Imseih (AWS), Sami"
Date:
> Any luck with this?

Apologies for the delay, as I have been away. 
I will test this next week and report back my findings.

Thanks

Sami Imseih
Amazon Web Services (AWS)



Re: [BUG] Panic due to incorrect missingContrecPtr after promotion

From
"Imseih (AWS), Sami"
Date:
> Would you mind trying the second attached patch to obtain detailed logs
> in your testing environment?  With the patch, the modified TAP test
> yields log lines like the ones below.

I applied the logging patch to 13.7 (attached is the backport) and repro'd the
issue.

I stripped out the relevant parts of the file. Let me know if this is
helpful.

postgresql.log.2022-08-05-17:2022-08-05 17:18:51 UTC::@:[359]:LOG:  ### [S] @0/48A3400: abort=(0/0)0/0, miss=(0/0)0/0, SbyMode=0, SbyModeReq=1
postgresql.log.2022-08-05-17:2022-08-05 17:18:51 UTC::@:[359]:LOG:  ### [F] @0/10000000: abort=(0/0)0/0, miss=(0/0)0/0, SbyMode=0, SbyModeReq=1
postgresql.log.2022-08-05-17:2022-08-05 17:22:21 UTC::@:[359]:LOG:  ### [S] @0/10000060: abort=(0/0)0/0, miss=(0/0)0/0, SbyMode=1, SbyModeReq=1
postgresql.log.2022-08-05-18:2022-08-05 18:38:00 UTC::@:[359]:LOG:  ### [F] @6/B6CB27D0: abort=(0/0)0/0, miss=(0/0)0/0, SbyMode=1, SbyModeReq=1
postgresql.log.2022-08-05-18:2022-08-05 18:38:05 UTC::@:[359]:LOG:  ### [F] @6/B6CB27D0: abort=(0/0)0/0, miss=(0/0)0/0, SbyMode=1, SbyModeReq=1
postgresql.log.2022-08-05-18:2022-08-05 18:38:06 UTC::@:[359]:LOG:  ### [F] @6/B6CB27D0: abort=(0/0)0/0, miss=(0/0)0/0, SbyMode=1, SbyModeReq=1
postgresql.log.2022-08-05-18:2022-08-05 18:38:11 UTC::@:[359]:LOG:  ### [F] @6/B6CB27D0: abort=(0/0)0/0, miss=(0/0)0/0, SbyMode=1, SbyModeReq=1
postgresql.log.2022-08-05-18:2022-08-05 18:38:14 UTC::@:[359]:LOG:  ### [F] @6/B6CB27D0: abort=(0/0)0/0, miss=(0/0)0/0, SbyMode=1, SbyModeReq=1
postgresql.log.2022-08-05-18:2022-08-05 18:38:14 UTC::@:[359]:LOG:  ### [S] @6/B6CB27D0: abort=(0/0)0/0, miss=(0/0)0/0, SbyMode=0, SbyModeReq=1
postgresql.log.2022-08-05-18:2022-08-05 18:50:13 UTC::@:[359]:LOG:  ### [S] @6/B8000198: abort=(0/0)0/0, miss=(0/0)0/0, SbyMode=0, SbyModeReq=1
postgresql.log.2022-08-05-18:2022-08-05 18:50:20 UTC::@:[359]:LOG:  ### [A] @6/F3FFFF20: abort=(6/F3FFFF20)0/0, miss=(6/F4000000)0/0, SbyMode=0, SbyModeReq=1
postgresql.log.2022-08-05-18:2022-08-05 18:50:20 UTC::@:[359]:LOG:  ### [S] @6/F4000030: abort=(0/0)6/F3FFFF20, miss=(0/0)6/F4000000, SbyMode=1, SbyModeReq=1
postgresql.log.2022-08-05-20:2022-08-05 20:38:06 UTC::@:[359]:LOG:  ### [F] @E/17A78298: abort=(0/0)6/F3FFFF20, miss=(0/0)6/F4000000, SbyMode=1, SbyModeReq=1
postgresql.log.2022-08-05-20:2022-08-05 20:38:11 UTC::@:[359]:LOG:  ### [F] @E/17A78298: abort=(0/0)6/F3FFFF20, miss=(0/0)6/F4000000, SbyMode=1, SbyModeReq=1
postgresql.log.2022-08-05-20:2022-08-05 20:38:13 UTC::@:[359]:LOG:  ### [F] @E/17A78298: abort=(0/0)6/F3FFFF20, miss=(0/0)6/F4000000, SbyMode=1, SbyModeReq=1
postgresql.log.2022-08-05-20:2022-08-05 20:38:18 UTC::@:[359]:LOG:  ### [F] @E/17A78298: abort=(0/0)6/F3FFFF20, miss=(0/0)6/F4000000, SbyMode=1, SbyModeReq=1
postgresql.log.2022-08-05-20:2022-08-05 20:38:21 UTC::@:[359]:LOG:  ### [F] @E/17A78298: abort=(0/0)6/F3FFFF20, miss=(0/0)6/F4000000, SbyMode=1, SbyModeReq=1
postgresql.log.2022-08-05-20:2022-08-05 20:38:21 UTC::@:[359]:LOG:  ### [S] @E/17A78298: abort=(0/0)6/F3FFFF20, miss=(0/0)6/F4000000, SbyMode=0, SbyModeReq=1
 
....
.......
.........

2022-08-05 20:38:21 UTC::@:[359]:LOG:  received promote request
2022-08-05 20:38:21 UTC::@:[359]:LOG:  redo done at E/17A78270
2022-08-05 20:38:21 UTC::@:[359]:LOG:  last completed transaction was at log time 2022-08-05 20:38:06.398382+00
recovering 000000020000000E00000005
000000020000000E00000005 archive /rdsdbdata/log/restore/pg-wal-archive.901.* is not yet downloaded, exiting restore script for now
 
2022-08-05 20:38:21 UTC::@:[359]:LOG:  ### [S] @E/17A78298: abort=(0/0)6/F3FFFF20, miss=(0/0)6/F4000000, SbyMode=0, SbyModeReq=1
recovering 00000003.history
2022-08-05 20:38:21 UTC::@:[359]:LOG:  selected new timeline ID: 3
2022-08-05 20:38:21 UTC::@:[359]:LOG:  archive recovery complete
recovering 00000002.history
2022-08-05 20:38:22 UTC::@:[373]:LOG:  request to flush past end of generated WAL; request D/C7A425C0, currpos 6/F4000088
2022-08-05 20:38:22 UTC::@:[373]:CONTEXT:  writing block 0 of relation base/14301/26678
2022-08-05 20:38:22 UTC::@:[373]:ERROR:  xlog flush request D/C7A425C0 is not satisfied --- flushed only to 6/F4000088
2022-08-05 20:38:22 UTC::@:[373]:CONTEXT:  writing block 0 of relation base/14301/26678
2022-08-05 20:38:22 UTC::@:[357]:LOG:  database system is ready to accept connections
2022-08-05 20:38:23 UTC::@:[373]:LOG:  checkpoint starting: immediate force wait wal time
2022-08-05 20:38:23 UTC::@:[373]:LOG:  request to flush past end of generated WAL; request D/F5873CF0, currpos 6/F8000060
2022-08-05 20:38:23 UTC::@:[373]:PANIC:  xlog flush request D/F5873CF0 is not satisfied --- flushed only to 6/F8000060
2022-08-05 20:38:23 UTC::@:[357]:LOG:  checkpointer process (PID 373) was terminated by signal 6: Aborted
2022-08-05 20:38:23 UTC::@:[357]:LOG:  terminating any other active server processes
/etc/rds/dbbin/pgscripts/rds_wal_archive: line 19: 14546 Quit                    (core dumped) usleep 50000
2022-08-05 20:38:23 UTC::@:[375]:FATAL:  archive command failed with exit code 131
2022-08-05 20:38:23 UTC::@:[375]:DETAIL:  The failed archive command was: /etc/rds/dbbin/pgscripts/rds_wal_archive pg_wal/00000003000000060000003D


Thanks,

Sami Imseih
Amazon Web Services



Attachment

Re: [BUG] Panic due to incorrect missingContrecPtr after promotion

From
Kyotaro Horiguchi
Date:
At Fri, 5 Aug 2022 21:28:16 +0000, "Imseih (AWS), Sami" <simseih@amazon.com> wrote in 
> > Would you mind trying the second attached patch to obtain detailed logs
> > in your testing environment?  With the patch, the modified TAP test
> > yields log lines like the ones below.
> 
> I applied the logging patch to 13.7 (attached is the backport) and repro'd the
> issue.
> 
> I stripped out the relevant parts of the file. Let me know if this is
> helpful.

Thank you very much!

> postgresql.log.2022-08-05-17:2022-08-05 17:18:51 UTC::@:[359]:LOG:  ### [F] @0/10000000: abort=(0/0)0/0, miss=(0/0)0/0, SbyMode=0, SbyModeReq=1
> postgresql.log.2022-08-05-17:2022-08-05 17:22:21 UTC::@:[359]:LOG:  ### [S] @0/10000060: abort=(0/0)0/0, miss=(0/0)0/0, SbyMode=1, SbyModeReq=1

The server seems to have started as a standby after crashing as a
primary.  Is that correct?

> postgresql.log.2022-08-05-18:2022-08-05 18:38:14 UTC::@:[359]:LOG:  ### [F] @6/B6CB27D0: abort=(0/0)0/0, miss=(0/0)0/0, SbyMode=1, SbyModeReq=1
> postgresql.log.2022-08-05-18:2022-08-05 18:38:14 UTC::@:[359]:LOG:  ### [S] @6/B6CB27D0: abort=(0/0)0/0, miss=(0/0)0/0, SbyMode=0, SbyModeReq=1

Archive recovery ended here; the server should have been promoted at
that time.  Do you see any interesting log lines around this point?

> postgresql.log.2022-08-05-18:2022-08-05 18:50:13 UTC::@:[359]:LOG:  ### [S] @6/B8000198: abort=(0/0)0/0, miss=(0/0)0/0, SbyMode=0, SbyModeReq=1

But recovery then continues in non-standby mode.  I don't see why it
behaves that way.

> postgresql.log.2022-08-05-18:2022-08-05 18:50:20 UTC::@:[359]:LOG:  ### [A] @6/F3FFFF20: abort=(6/F3FFFF20)0/0, miss=(6/F4000000)0/0, SbyMode=0, SbyModeReq=1
> postgresql.log.2022-08-05-18:2022-08-05 18:50:20 UTC::@:[359]:LOG:  ### [S] @6/F4000030: abort=(0/0)6/F3FFFF20, miss=(0/0)6/F4000000, SbyMode=1, SbyModeReq=1

Then archive recovery starts again.


regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center



Re: [BUG] Panic due to incorrect missingContrecPtr after promotion

From
"Imseih (AWS), Sami"
Date:
> The server seems to have started as a standby after crashing as a
> primary.  Is that correct?

Yes, that is correct.

2022-08-05 17:18:51 UTC::@:[359]:LOG:  database system was interrupted; last known up at 2022-08-05 17:08:52 UTC
2022-08-05 17:18:51 UTC::@:[359]:LOG:  creating missing WAL directory "pg_wal/archive_status"
recovering 00000002.history
2022-08-05 17:18:51 UTC::@:[359]:LOG:  entering standby mode
2022-08-05 17:18:51 UTC::@:[359]:LOG:  ### [S] @0/48A3400: abort=(0/0)0/0, miss=(0/0)0/0, SbyMode=0, SbyModeReq=1
2022-08-05 17:18:51 UTC::@:[359]:DEBUG:  checkpoint record is at 0/48A3388
2022-08-05 17:18:51 UTC::@:[359]:DEBUG:  redo record is at 0/48A3388; shutdown true
2022-08-05 17:18:51 UTC::@:[359]:DEBUG:  next transaction ID: 533; next OID: 16395
2022-08-05 17:18:51 UTC::@:[359]:DEBUG:  next MultiXactId: 1; next MultiXactOffset: 0
2022-08-05 17:18:51 UTC::@:[359]:DEBUG:  oldest unfrozen transaction ID: 479, in database 1
2022-08-05 17:18:51 UTC::@:[359]:DEBUG:  oldest MultiXactId: 1, in database 1
2022-08-05 17:18:51 UTC::@:[359]:DEBUG:  commit timestamp Xid oldest/newest: 0/0
2022-08-05 17:18:51 UTC::@:[359]:DEBUG:  transaction ID wrap limit is 2147484126, limited by database with OID 1
2022-08-05 17:18:51 UTC::@:[359]:DEBUG:  MultiXactId wrap limit is 2147483648, limited by database with OID 1
2022-08-05 17:18:51 UTC::@:[359]:DEBUG:  starting up replication slots
2022-08-05 17:18:51 UTC::@:[359]:LOG:  database system was not properly shut down; automatic recovery in progress
2022-08-05 17:18:51 UTC::@:[359]:DEBUG:  resetting unlogged relations: cleanup 1 init 0
2022-08-05 17:18:51 UTC::@:[359]:DEBUG:  initializing for hot standby
2022-08-05 17:18:51 UTC::@:[359]:DEBUG:  recovery snapshots are now enabled
2022-08-05 17:18:51 UTC::@:[359]:LOG:  redo starts at 0/48A3400
2022-08-05 17:18:51 UTC::@:[359]:LOG:  ### [F] @0/10000000: abort=(0/0)0/0, miss=(0/0)0/0, SbyMode=0, SbyModeReq=1
2022-08-05 17:18:51 UTC::@:[359]:DEBUG:  reached end of WAL in pg_wal, entering archive recovery
2022-08-05 17:18:51 UTC::@:[359]:LOG:  consistent recovery state reached at 0/10000000
2022-08-05 17:18:51 UTC::@:[357]:LOG:  database system is ready to accept read only connections


> Archive recovery ended here; the server should have been promoted at
> that time.  Do you see any interesting log lines around this point?

The server did promote around that time

2022-08-05 18:38:14 UTC::@:[359]:LOG:  ### [F] @6/B6CB27D0: abort=(0/0)0/0, miss=(0/0)0/0, SbyMode=1, SbyModeReq=1
2022-08-05 18:38:14 UTC::@:[359]:LOG:  received promote request
2022-08-05 18:38:14 UTC::@:[359]:LOG:  redo done at 6/B6CB27A8
2022-08-05 18:38:14 UTC::@:[359]:LOG:  last completed transaction was at log time 2022-08-05 18:38:00.832047+00
recovering 00000001000000060000002D
....
........
2022-08-05 18:38:14 UTC::@:[359]:LOG:  ### [S] @6/B6CB27D0: abort=(0/0)0/0, miss=(0/0)0/0, SbyMode=0, SbyModeReq=1
recovering 00000002.history
2022-08-05 18:38:14 UTC::@:[359]:LOG:  selected new timeline ID: 2
2022-08-05 18:38:14 UTC::@:[359]:LOG:  archive recovery complete
recovering 00000001.history
2022-08-05 18:38:14 UTC::@:[357]:LOG:  database system is ready to accept connections
2022-08-05 18:38:15 UTC::@:[367]:LOG:  restartpoint complete: wrote 21388 buffers (2.1%); 0 WAL file(s) added, 9 removed, 0 recycled; write=98.394 s, sync=0.041 s, total=98.586 s; sync files=46, longest=0.012 s, average=0.001 s; distance=1048565 kB, estimate=1048584 kB
 
2022-08-05 18:38:15 UTC::@:[367]:LOG:  recovery restart point at 6/5C0003B0

>    But recovery then continues in non-standby mode.  I don't see why it
>    behaves that way.

But the server crashed sometime after that, which is why recovery started again.

2022-08-05 18:50:13 UTC::@:[357]:LOG:  listening on IPv4 address "0.0.0.0", port 5432
2022-08-05 18:50:13 UTC::@:[357]:LOG:  listening on IPv6 address "::", port 5432
2022-08-05 18:50:13 UTC::@:[357]:LOG:  listening on Unix socket "/tmp/.s.PGSQL.5432"
2022-08-05 18:50:13 UTC::@:[359]:LOG:  database system was interrupted; last known up at 2022-08-05 18:38:15 UTC
2022-08-05 18:50:13 UTC::@:[359]:LOG:  creating missing WAL directory "pg_wal/archive_status"
recovering 00000003.history
2022-08-05 18:50:13 UTC::@:[359]:LOG:  entering standby mode
recovering 00000002.history
2022-08-05 18:50:13 UTC::@:[359]:LOG:  ### [S] @6/B8000198: abort=(0/0)0/0, miss=(0/0)0/0, SbyMode=0, SbyModeReq=1
2022-08-05 18:50:13 UTC::@:[359]:LOG:  database system was not properly shut down; automatic recovery in progress
2022-08-05 18:50:13 UTC::@:[359]:LOG:  redo starts at 6/B80000E8

And a few hours later is when we see a PANIC.

Thanks

--
Sami Imseih
Amazon Web Services (AWS)



Re: [BUG] Panic due to incorrect missingContrecPtr after promotion

From
Kyotaro Horiguchi
Date:
Hello.

> Yes, that is correct.

Mmm. I believed that the log came from a single server run, since the
PIDs (I believe [359] and [357] are PIDs) did not change across the
log lines.

> 2022-08-05 18:50:13 UTC::@:[359]:LOG:  creating missing WAL directory "pg_wal/archive_status"

This means that someone removed the contents of the pg_wal directory.
Removing some WAL files from pg_wal may lead to this symptom.

> 2022-08-05 18:50:13 UTC::@:[359]:LOG:  entering standby mode
> recovering 00000002.history
> 2022-08-05 18:50:13 UTC::@:[359]:LOG:  ### [S] @6/B8000198: abort=(0/0)0/0, miss=(0/0)0/0, SbyMode=0, SbyModeReq=1
> 2022-08-05 18:50:13 UTC::@:[359]:LOG:  database system was not properly shut down; automatic recovery in progress
> 2022-08-05 18:50:13 UTC::@:[359]:LOG:  redo starts at 6/B80000E8
> 
> And a few hours later, is when we see a panic

So, it seems that the *standby* received an inconsistent WAL stream
(an aborted-contrecord not followed by an overwriting-missing-contrecord)
from the primary.  Thus the inconsistency happened on the primary, not
on the standby.

So... I'm still failing to draw the big picture of what is happening
here.

Could you show us the server configuration (the database servers involved and
their roles (primary/standby, ...)), and the exact steps you take to restart
the server after a crash?

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center



Re: [BUG] Panic due to incorrect missingContrecPtr after promotion

From
Robert Haas
Date:
On Wed, Aug 10, 2022 at 4:37 AM Kyotaro Horiguchi
<horikyota.ntt@gmail.com> wrote:
> So, it seems that the *standby* received an inconsistent WAL stream
> (an aborted-contrecord not followed by an overwriting-missing-contrecord)
> from the primary.  Thus the inconsistency happened on the primary, not
> on the standby.
>
> So... I'm still failing to draw the big picture of what is happening
> here.

For the benefit of anyone who may be looking at this thread in the
archive later, I believe this commit will have fixed this issue:


-- 
Robert Haas
EDB: http://www.enterprisedb.com



Re: [BUG] Panic due to incorrect missingContrecPtr after promotion

From
Robert Haas
Date:
On Mon, Aug 29, 2022 at 12:54 PM Robert Haas <robertmhaas@gmail.com> wrote:
> On Wed, Aug 10, 2022 at 4:37 AM Kyotaro Horiguchi
> <horikyota.ntt@gmail.com> wrote:
> > So, it seems that the *standby* received an inconsistent WAL stream
> > (an aborted-contrecord not followed by an overwriting-missing-contrecord)
> > from the primary.  Thus the inconsistency happened on the primary, not
> > on the standby.
> >
> > So... I'm still failing to draw the big picture of what is happening
> > here.
>
> For the benefit of anyone who may be looking at this thread in the
> archive later, I believe this commit will have fixed this issue:

Err, this commit here, I mean:

https://git.postgresql.org/gitweb/?p=postgresql.git;a=commit;h=6672d7913929b82ec723a54381773d9cdc20fe9d

-- 
Robert Haas
EDB: http://www.enterprisedb.com



Re: [BUG] Panic due to incorrect missingContrecPtr after promotion

From
"Imseih (AWS), Sami"
Date:
>    > For the benefit of anyone who may be looking at this thread in the
>    > archive later, I believe this commit will have fixed this issue:

I can also confirm that the commit https://git.postgresql.org/gitweb/?p=postgresql.git;a=commitdiff;h=6672d79
does indeed fix this issue.

Thanks!

--
Sami Imseih
Amazon Web Services (AWS)