From: Imseih (AWS), Sami
Subject: Re: [BUG] Panic due to incorrect missingContrecPtr after promotion
Date:
Msg-id: 379F2F42-F19A-4026-AB09-42E6F9ABB5F5@amazon.com
In response to: Re: [BUG] Panic due to incorrect missingContrecPtr after promotion (Kyotaro Horiguchi <horikyota.ntt@gmail.com>)
List: pgsql-hackers
>    Nice catch!  However, I'm not sure I like the patch.

>             * made it through and start writing after the portion that persisted.
>             * (It's critical to first write an OVERWRITE_CONTRECORD message, which
>             * we'll do as soon as we're open for writing new WAL.)
>    +        *
>    +        * If the last wal record is ahead of the missing contrecord, this is a
>    +        * recently promoted primary and we should not write an overwrite
>    +        * contrecord.

>    Before the part, the comment follows the part shown below.

>    >        * Actually, if WAL ended in an incomplete record, skip the parts that

>    So, actually WAL did not ended in an incomplete record.  I think
>    FinishWalRecover is the last place to do that. (But it could be
>    earlier.)

Thanks for the feedback. 

## On the primary, the aborted record (abortedRecPtr) and the insert position of the missing contrecord (missingContrecPtr) are set during FinishWalRecovery; after recovery, but before writes are accepted, an OVERWRITE_CONTRECORD is written.
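
For reference, the relevant end-of-recovery logic lives in StartupXLOG() in xlog.c. Paraphrased and trimmed here (not the exact upstream source; the precise layout and the CreateOverwriteContrecordRecord() signature differ between branches), it looks roughly like:

    /*
     * Paraphrase of StartupXLOG() (xlog.c), trimmed.  abortedRecPtr and
     * missingContrecPtr were remembered while reading WAL during recovery,
     * if the continuation of the last record was missing.
     */
    if (!XLogRecPtrIsInvalid(missingContrecPtr))
    {
        /*
         * WAL ended in an incomplete record: resume writing at the position
         * of the missing contrecord rather than at the apparent end of WAL.
         */
        Assert(!XLogRecPtrIsInvalid(abortedRecPtr));
        EndOfLog = missingContrecPtr;
    }

    /* ... WAL insert state is initialized from EndOfLog ... */

    /* As soon as we are open for writing new WAL, record the overwrite. */
    if (!XLogRecPtrIsInvalid(abortedRecPtr))
        CreateOverwriteContrecordRecord(abortedRecPtr);

On a freshly promoted standby this same code runs again, which is where the problem described below comes from.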
 
## on the primary logs

2022-02-25 02:25:16.208 UTC [7397] LOG:  redo done at 0/1FFD2B8 system usage: CPU: user: 0.01 s, system: 0.00 s, elapsed: 0.01 s
 
2022-02-25 02:25:16.208 UTC [7397] DEBUG:  resetting unlogged relations: cleanup 0 init 1
2022-02-25 02:25:16.209 UTC [7397] DEBUG:  creating and filling new WAL file
2022-02-25 02:25:16.217 UTC [7397] DEBUG:  done creating and filling new WAL file
2022-02-25 02:25:16.217 UTC [7397] DEBUG:  MultiXactId wrap limit is 2147483648, limited by database with OID 1
2022-02-25 02:25:16.217 UTC [7397] DEBUG:  MultiXact member stop limit is now 4294914944 based on MultiXact 1
2022-02-25 02:25:16.217 UTC [7397] DEBUG:  OVERWRITE_CONTRECORD inserted at 0/2000028 for aborted record 0/1FFD2E0
<<------- Attached V3 of the patch, which adds this logging showing the overwrite record created on the primary
 
2022-02-25 02:25:16.217 UTC [7395] LOG:  checkpoint starting: end-of-recovery immediate wait
2022-02-25 02:25:16.217 UTC [7395] DEBUG:  performing replication slot checkpoint
2022-02-25 02:25:16.218 UTC [7395] DEBUG:  attempting to remove WAL segments older than log file 000000000000000000000000
2022-02-25 02:25:16.218 UTC [7395] LOG:  checkpoint complete: wrote 131 buffers (102.3%); 0 WAL file(s) added, 0 removed, 0 recycled; write=0.001 s, sync=0.001 s, total=0.001 s; sync files=0, longest=0.000 s, average=0.000 s; distance=11381 kB, estimate=11381 kB
 
2022-02-25 02:25:16.219 UTC [7394] DEBUG:  starting background worker process "logical replication launcher"
2022-02-25 02:25:16.219 UTC [7394] LOG:  database system is ready to accept connections

## A downstream standby now skips over this OVERWRITE_CONTRECORD, but the value for missingContrecPtr is not invalidated afterwards.
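
The replay side, paraphrasing xlog_redo() and VerifyOverwriteContrecord() in xlog.c (trimmed; the trailing note is mine), only verifies the record and logs the skip:

    else if (info == XLOG_OVERWRITE_CONTRECORD)
    {
        xl_overwrite_contrecord xlrec;

        memcpy(&xlrec, XLogRecGetData(record), sizeof(xl_overwrite_contrecord));

        /*
         * Cross-checks the overwritten LSN and emits the "successfully
         * skipped missing contrecord" LOG line shown below.
         */
        VerifyOverwriteContrecord(&xlrec, record);

        /*
         * Note: as of this report, nothing here clears the standby's own
         * abortedRecPtr / missingContrecPtr, so they survive until promotion.
         */
    }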
 
## on the standby logs

2022-02-25 02:25:16.616 UTC [7413] DEBUG:  sending hot standby feedback xmin 0 epoch 0 catalog_xmin 0 catalog_xmin_epoch 0
 
2022-02-25 02:25:16.616 UTC [7387] LOG:  successfully skipped missing contrecord at 0/1FFD2E0, overwritten at 2022-02-25 02:25:16.2175+00
 
2022-02-25 02:25:16.616 UTC [7387] CONTEXT:  WAL redo at 0/2000028 for XLOG/OVERWRITE_CONTRECORD: lsn 0/1FFD2E0; time 2022-02-25 02:25:16.2175+00
 

## After promotion, the standby attempts to write the OVERWRITE_CONTRECORD again, using the missingContrecPtr LSN, which is now in the past; the WAL insert position is therefore rewound behind the real end of redo.
 
## on the standby logs

2022-02-25 02:25:16.646 UTC [7387] LOG:  invalid record length at 0/201EC70: wanted 24, got 0
2022-02-25 02:25:16.646 UTC [7387] LOG:  redo done at 0/201EC48 system usage: CPU: user: 0.01 s, system: 0.00 s, elapsed: 0.55 s
 
2022-02-25 02:25:16.646 UTC [7387] LOG:  last completed transaction was at log time 2022-02-25 02:25:16.301554+00
2022-02-25 02:25:16.646 UTC [7387] DEBUG:  resetting unlogged relations: cleanup 0 init 1
2022-02-25 02:25:16.646 UTC [7387] LOG:  selected new timeline ID: 2
2022-02-25 02:25:16.646 UTC [7387] DEBUG:  updated min recovery point to 0/201EC70 on timeline 1
2022-02-25 02:25:16.656 UTC [7387] DEBUG:  could not remove file "pg_wal/000000020000000000000002": No such file or directory
2022-02-25 02:25:16.656 UTC [7387] LOG:  archive recovery complete
2022-02-25 02:25:16.656 UTC [7387] DEBUG:  MultiXactId wrap limit is 2147483648, limited by database with OID 1
2022-02-25 02:25:16.656 UTC [7387] DEBUG:  MultiXact member stop limit is now 4294914944 based on MultiXact 1
2022-02-25 02:25:16.656 UTC [7387] DEBUG:  OVERWRITE_CONTRECORD inserted at 0/2000028 for aborted record 0/1FFD2E0
<<------- The same overwrite record is written on the recently promoted standby
 
2022-02-25 02:25:16.656 UTC [7387] DEBUG:  attempting to remove WAL segments newer than log file 000000020000000000000001
2022-02-25 02:25:16.656 UTC [7387] DEBUG:  recycled write-ahead log file "000000010000000000000002"
2022-02-25 02:25:16.656 UTC [7387] DEBUG:  release all standby locks
2022-02-25 02:25:16.656 UTC [7385] LOG:  checkpoint starting: force
2022-02-25 02:25:16.656 UTC [7385] DEBUG:  performing replication slot checkpoint
2022-02-25 02:25:16.656 UTC [7385] LOG:  request to flush past end of generated WAL; request 0/201EC70, current position 0/2000088
 
2022-02-25 02:25:16.656 UTC [7385] PANIC:  xlog flush request 0/201EC70 is not satisfied --- flushed only to 0/2000088
2022-02-25 02:25:16.657 UTC [7384] LOG:  checkpointer process (PID 7385) was terminated by signal 6: Aborted
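
For reference, the two messages above correspond to two cross-checks in xlog.c (paraphrased and trimmed):

    /*
     * In WaitXLogInsertionsToFinish(): the flush request (0/201EC70) lies
     * beyond the reserved insert position (0/2000088), so it is capped.
     */
    if (upto > reservedUpto)
    {
        ereport(LOG,
                (errmsg("request to flush past end of generated WAL; request %X/%X, current position %X/%X",
                        LSN_FORMAT_ARGS(upto), LSN_FORMAT_ARGS(reservedUpto))));
        upto = reservedUpto;
    }

    /*
     * At the end of XLogFlush(): the capped flush can never reach the
     * requested LSN, so the sanity check fires, surfacing as the PANIC in
     * the checkpointer above.
     */
    if (LogwrtResult.Flush < record)
        elog(ERROR,
             "xlog flush request %X/%X is not satisfied --- flushed only to %X/%X",
             LSN_FORMAT_ARGS(record),
             LSN_FORMAT_ARGS(LogwrtResult.Flush));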

The purpose of the patch is to check whether the current LSN is already beyond missingContrecPtr and, if so, to skip incorrectly writing an overwrite contrecord that was already written on the old writer.
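
For readers without the attachment, the idea is roughly the following sketch (not the v3 patch itself; variable names follow StartupXLOG() in xlog.c, and clearing the two pointers is just one way to express the skip):

    /*
     * Sketch of the idea only, not the attached v3 patch.  If the WAL
     * replayed during recovery already extends past missingContrecPtr, the
     * overwrite contrecord was written by the old primary and has been
     * replayed here, so do not rewind EndOfLog or write it again after
     * promotion.
     */
    if (!XLogRecPtrIsInvalid(missingContrecPtr) &&
        EndOfLog <= missingContrecPtr)
    {
        /* WAL really ended in the incomplete record */
        Assert(!XLogRecPtrIsInvalid(abortedRecPtr));
        EndOfLog = missingContrecPtr;
    }
    else
    {
        /* stale state left over from replaying the old primary's record */
        abortedRecPtr = InvalidXLogRecPtr;
        missingContrecPtr = InvalidXLogRecPtr;
    }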
 

--
Sami Imseih
Amazon Web Services



