On Fri, Nov 22, 2019 at 05:31:55AM +0000, matsumura.ryo@fujitsu.com wrote:
>Hi all
>
>I found a situation where a WAL archive file is missing even though no WAL segment file is lost.
>This causes archive recovery to fail. Is this behavior a bug?
>
>example:
>
> WAL segment files
> 000000010000000000000001
> 000000010000000000000002
> 000000010000000000000003
>
> Archive files
> 000000010000000000000001
> 000000010000000000000003
>
> Archive file 000000010000000000000002 is missing, but the WAL segment files
> are contiguous. Recovery from the archive (i.e. PITR) stops at the end of
> 000000010000000000000001.
>
>How to reproduce:
>- Set up replication (primary and standby).
>- Set [archive_mode = always] on the standby.
>- The WAL receiver exits (e.g. because the primary goes down)
> after it writes the last record of some WAL segment file but
> before it notifies the archiver about that segment file (i.e. creates the .ready file).
>
>Even if the WAL receiver restarts, the archiver is never notified about that
>WAL segment file.
>
That does indeed seem like a bug. We should certainly archive all WAL
segments, irrespective of primary shutdowns/restarts/whatever. I guess
we should make sure the archiver is properly notified before the exit.
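
To check whether an existing archive already has such a hole, something
along these lines might help. This is only a rough sketch: the function
name find_archive_gaps is made up for illustration, and it assumes the
default 16MB WAL segment size (256 segments per log file number) and
uncompressed archive file names. It only reports holes between the first
and last archived segment of each timeline.

```python
import re

# A plain WAL segment name is 24 uppercase hex digits:
# 8 for the timeline, 8 for the log file number, 8 for the segment number.
SEGMENT_NAME = re.compile(r"^[0-9A-F]{24}$")

# Assumption: default 16MB wal_segment_size, i.e. 256 segments per log file number.
SEGMENTS_PER_XLOGID = 256


def find_archive_gaps(names):
    """Return the names of segments missing between the first and last
    archived segment of each timeline (hypothetical helper)."""
    by_timeline = {}
    for name in names:
        if not SEGMENT_NAME.match(name):
            continue  # skip .history, .backup, .partial and anything else
        timeline = name[:8]
        segno = int(name[8:16], 16) * SEGMENTS_PER_XLOGID + int(name[16:24], 16)
        by_timeline.setdefault(timeline, set()).add(segno)

    missing = []
    for timeline in sorted(by_timeline):
        present = by_timeline[timeline]
        for segno in range(min(present), max(present) + 1):
            if segno not in present:
                log, seg = divmod(segno, SEGMENTS_PER_XLOGID)
                missing.append(f"{timeline}{log:08X}{seg:08X}")
    return missing
```

Fed the archive listing from the example above (for instance via
os.listdir() on the archive directory), it reports
000000010000000000000002 as missing.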

regards
--
Tomas Vondra http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services