Standby receiving part of missing WAL segment - Mailing list pgsql-hackers

From Thom Brown
Subject Standby receiving part of missing WAL segment
Date
Msg-id CAA-aLv5StMF=oeoP9WbjEbWuj+Y-EKqBhcp=5aP7WYvO_kSPhw@mail.gmail.com
Responses Re: Standby receiving part of missing WAL segment  (Robert Haas <robertmhaas@gmail.com>)
List pgsql-hackers
Hi,

Today I witnessed a situation which appears to have gone down like this:

- The primary server started streaming WAL data from segment 00A8 to the standby
- The standby server started receiving that data
- Before 00A8 is finished, the WAL sender process dies on the primary, but the archiver process continues, and 00A8 ends up being archived as usual
- The primary continues to generate WAL and cleans up old WAL files from pg_xlog until 00A8 is gone
- The primary is restarted and the WAL sender process is back up and running
- The standby says "waiting for 00A8", which it can no longer get from the primary
- 00A8 is in the standby's archive directory, but the standby is waiting for the rest of the segment from the primary via streaming replication, so it doesn't check the archive
- The standby is restarted
- The standby goes back into recovery and eventually replays 00A8 and continues as normal.

Should the standby be able to get feedback from the primary that the requested segment is no longer available, and therefore know to check its archive?

Or should it check the archive anyway if it hasn't received any further WAL data via the streaming replication connection after a certain amount of time?
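
To make the timeout idea concrete, here is a rough sketch of the decision I have in mind. It is purely illustrative and is not PostgreSQL code: the function name, its parameters, and the "stream"/"archive" return values are all made up for this example, and the archive is modelled as a plain set of segment names.

```python
def choose_wal_source(seconds_since_last_stream_data, timeout_s,
                      segment, archived_segments):
    """Pick where a standby should look next for `segment`.

    Hypothetical helper, not part of PostgreSQL. `archived_segments`
    stands in for whatever the standby can see in its archive.
    """
    # The stream counts as stalled once no WAL data has arrived
    # for at least `timeout_s` seconds.
    stalled = seconds_since_last_stream_data >= timeout_s

    # Only fall back when the stream is stalled AND the archive
    # actually holds the segment being waited for.
    if stalled and segment in archived_segments:
        return "archive"

    # Otherwise keep waiting on streaming replication.
    return "stream"


# The situation described above: the standby has waited a long time
# for 00A8, which is already sitting in its archive.
source = choose_wal_source(600, 30, "00A8", {"00A7", "00A8"})
```

With a check like this the standby would switch to the archive for 00A8 instead of waiting indefinitely; the timeout value itself is an open question.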

At the moment, the standby gets stuck forever in this situation, even though it has access to the WAL it needs.

Thom
