On 2020/12/09 17:43, Kyotaro Horiguchi wrote:
> Hello.
>
> We found a behavioral change (which seems to be a bug) in recovery in
> PG13.
>
> The following steps might seem somewhat strange, but the replication
> code deliberately copes with this case. This is a sequence seen while
> operating an HA cluster using Pacemaker.
>
> - Run initdb to create a primary.
> - Set archive_mode=on on the primary.
> - Start the primary.
>
> - Create a standby using pg_basebackup from the primary.
> - Stop the standby.
> - Stop the primary.
>
> - Put standby.signal on the primary, then start it.
> - Promote the primary.
>
> - Start the standby.
>
>
> Until PG12, the primary signals end-of-timeline to the standby and
> switches to the next timeline. Since PG13, that doesn't happen and
> the standby keeps requesting the segment of the older timeline,
> which no longer exists.
>
> FATAL: could not receive data from WAL stream: ERROR: requested WAL segment 000000010000000000000003 has already been removed
>
> This is because WalSndSegmentOpen() can fail to detect a timeline switch
> on a historic timeline, due to use of the wrong variable for that
> check. It uses state->seg.ws_segno, which seems to be a thinko
> introduced when the surrounding code was refactored in 709d003fbd.
>
> The first patch detects the wrong behavior. The second small patch
> fixes it.
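
To make the quoted analysis concrete, here is a minimal standalone C sketch of the kind of check being described. It is not the PostgreSQL source: the helper names (seg_of, choose_tli, wal_file_name), the parameter names, and the 16MB segment size are assumptions chosen for illustration. It only shows why the timeline chosen for a segment file depends on which segment number is compared against the segment containing the switch point.

```c
#include <stdint.h>
#include <stdio.h>

typedef uint64_t XLogRecPtr;   /* byte position in the WAL stream */
typedef uint64_t XLogSegNo;    /* WAL segment number */
typedef uint32_t TimeLineID;

/* Assumed segment size for this sketch (16MB). */
#define WAL_SEGSIZE ((uint64_t) 16 * 1024 * 1024)

/* Segment number that contains the given WAL position. */
static XLogSegNo
seg_of(XLogRecPtr ptr, uint64_t segsize)
{
    return ptr / segsize;
}

/*
 * Hypothetical helper: while streaming a historic timeline hist_tli that
 * ends at valid_upto and is followed by next_tli, decide which timeline's
 * file should be opened for the segment segno_to_check.
 */
static TimeLineID
choose_tli(XLogSegNo segno_to_check, TimeLineID hist_tli,
           XLogRecPtr valid_upto, TimeLineID next_tli, uint64_t segsize)
{
    XLogSegNo end_segno = seg_of(valid_upto, segsize);

    /* The segment holding the switch point is read from the newer timeline. */
    if (segno_to_check == end_segno)
        return next_tli;
    return hist_tli;
}

/* Format a 24-character WAL file name: timeline, then the two segno halves. */
static void
wal_file_name(char *buf, size_t len, TimeLineID tli, XLogSegNo segno,
              uint64_t segsize)
{
    uint64_t segs_per_id = UINT64_C(0x100000000) / segsize;

    snprintf(buf, len, "%08X%08X%08X",
             (unsigned) tli,
             (unsigned) (segno / segs_per_id),
             (unsigned) (segno % segs_per_id));
}
```

As a usage example under these assumptions, take a switch point at 0x3000028, historic timeline 1 and next timeline 2. Calling choose_tli() with the segment actually being opened (3) selects timeline 2, whereas calling it with a different, stale segment number such as 2 selects timeline 1, and wal_file_name() for timeline 1 and segment 3 then yields 000000010000000000000003, the file named in the error above.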
Thanks for reporting this! This looks like a bug.
When I applied the two patches to the master branch and
ran "make check-world", I got the following error.
============== creating database "contrib_regression" ==============
# Looks like you planned 37 tests but ran 36.
# Looks like your test exited with 255 just after 36.
t/001_stream_rep.pl ..................
Dubious, test returned 255 (wstat 65280, 0xff00)
Failed 1/37 subtests
...
Test Summary Report
-------------------
t/001_stream_rep.pl (Wstat: 65280 Tests: 36 Failed: 0)
Non-zero exit status: 255
Parse errors: Bad plan. You planned 37 tests but ran 36.
Files=21, Tests=239, 302 wallclock secs ( 0.10 usr 0.05 sys + 41.69 cusr 39.84 csys = 81.68 CPU)
Result: FAIL
make[2]: *** [check] Error 1
make[1]: *** [check-recovery-recurse] Error 2
make[1]: *** Waiting for unfinished jobs....
t/070_dropuser.pl ......... ok
Regards,
--
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION