Re: Possible missing segments in archiving on standby - Mailing list pgsql-hackers

From Fujii Masao
Subject Re: Possible missing segments in archiving on standby
Msg-id e07f65f2-4f79-1c1b-6a7d-35d84dd67b0d@oss.nttdata.com
In response to Re: Possible missing segments in archiving on standby  (Kyotaro Horiguchi <horikyota.ntt@gmail.com>)
Responses Re: Possible missing segments in archiving on standby
List pgsql-hackers

On 2021/08/31 16:35, Kyotaro Horiguchi wrote:
> I'm not sure which is simpler, but it works except for B, the case of
> a long-jump by a segment switch.  When a segment switch happens,
> walsender sends filling zero-pages but even if walreceiver is
> terminated before the segment is completed, walsender restarts from
> the next segment at the next startup. Concretely like the following.
> 
> - pg_switch_wal() invoked at 6003228 (for example)
> - walreceiver terminates at 6500000 (or a bit later).
> - walreceiver restarts from 7000000
> 
> In this case segment 6 is not notified even with the patch, and my
> old patches work the same way. (In other words, the call to
> XLogWalRcvClose() at the end of XLogWalRcvWrite doesn't work for the
> case as you might expect.) If we think it ok that we don't notify the
> segment earlier than a future checkpoint removes it, yours or only the
> last half of my one is sufficient, but do we really think so?
> Furthermore, your patch or only the last half of my second patch
> doesn't save the case of a crash unlike the case of a graceful
> termination.

Thanks for the clarification!
Please let me check my understanding about the issue.

The issue happens when walreceiver exits after it receives an XLOG_SWITCH record
but before it receives the remaining bytes of the segment containing that
XLOG_SWITCH record. In this case, the startup process tries to replay that
"half-received" segment, finds the XLOG_SWITCH record in it, moves to the next
segment, and then starts a new walreceiver from that next segment. Therefore,
even with my patch, the segment containing that XLOG_SWITCH record is not
archived soon. Is my understanding right? I agree that we should also address
this issue.
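
To make the example concrete, here is a small Python sketch of the segment
arithmetic. It assumes that the positions in the quoted example are
hexadecimal byte offsets and that wal_segment_size is the default 16MB;
segment_of() and segment_end() are illustrative helper names, not PostgreSQL
functions.

```python
# Assumption: positions are hexadecimal byte offsets, and the WAL segment
# size is the default 16MB (0x1000000 bytes).
WAL_SEGMENT_SIZE = 16 * 1024 * 1024


def segment_of(lsn: int) -> int:
    """Number of the WAL segment that contains byte position lsn."""
    return lsn // WAL_SEGMENT_SIZE


def segment_end(lsn: int) -> int:
    """First byte position after the segment that contains lsn."""
    return (segment_of(lsn) + 1) * WAL_SEGMENT_SIZE


switch_at = 0x6003228    # pg_switch_wal() invoked here
received_to = 0x6500000  # walreceiver terminates around here
restart_at = 0x7000000   # the new walreceiver starts streaming from here

# The switch record and the termination point are both in segment 6.
print(segment_of(switch_at), segment_of(received_to))

# Segment 6 was only partially received when walreceiver exited ...
print(received_to < segment_end(switch_at))

# ... and streaming resumes in segment 7, so walreceiver never reaches
# the end of segment 6 and never gets the chance to notify it.
print(segment_of(restart_at))
```

Under these assumptions, segment 6 is left behind without walreceiver ever
writing up to its end, which is why a notification made only when walreceiver
finishes a segment misses it.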

ISTM that, to address the issue, it's simpler and less fragile to make the
startup process call XLogArchiveCheckDone() or something similar whenever it
moves to the next segment, rather than making walreceiver do that. Thoughts?
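
As a rough illustration of the idea only, here is a toy Python model, not
PostgreSQL code. ToyStartup, replay_at() and archive_check_done() are
hypothetical names; archive_check_done() merely stands in for the point where
the startup process would call XLogArchiveCheckDone() or something similar,
and the 16MB segment size is again an assumption.

```python
# Toy model of the proposal; nothing here is actual PostgreSQL code.
WAL_SEGMENT_SIZE = 16 * 1024 * 1024  # assumed default segment size


class ToyStartup:
    """Hypothetical stand-in for the startup process during replay."""

    def __init__(self):
        self.current_segment = None
        self.notified = []  # segments for which the archive check was made

    def archive_check_done(self, segno):
        # Placeholder for calling XLogArchiveCheckDone() or something similar.
        if segno not in self.notified:
            self.notified.append(segno)

    def replay_at(self, lsn):
        segno = lsn // WAL_SEGMENT_SIZE
        if self.current_segment is not None and segno > self.current_segment:
            # Replay is moving to a later segment, so check the one being
            # left, no matter how much of it walreceiver had written.
            self.archive_check_done(self.current_segment)
        self.current_segment = segno


startup = ToyStartup()
startup.replay_at(0x6003228)  # replays the XLOG_SWITCH record in segment 6
startup.replay_at(0x7000000)  # jumps to the next segment
print(startup.notified)
```

In this model the check is tied to replay progress rather than to how far
walreceiver got, so the "half-received" segment is covered as well.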

Regards,

-- 
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION


