Re: Should the archiver process always make sure that the timeline history files exist in the archive? - Mailing list pgsql-hackers

From Kyotaro Horiguchi
Subject Re: Should the archiver process always make sure that the timeline history files exist in the archive?
Date
Msg-id 20230824.171500.418533297821162665.horikyota.ntt@gmail.com
In response to Re: Should the archiver process always make sure that the timeline history files exist in the archive?  (Jimmy Yih <jyih@vmware.com>)
Responses Re: Should the archiver process always make sure that the timeline history files exist in the archive?
List pgsql-hackers
At Wed, 16 Aug 2023 07:33:29 +0000, Jimmy Yih <jyih@vmware.com> wrote in 
> Hello pgsql-hackers,
> 
> After doing some more debugging on the matter, I believe this issue might be a
> minor regression from commit 5332b8cec541. Prior to that commit, the archiver
> process when first started on a previously promoted primary would have all the
> timeline history files marked as ready for immediate archiving. If that had
> happened, none of my mentioned failure scenarios would be theoretically possible
> (barring someone manually deleting the timeline history files). With that in
> mind, I decided to look more into my Question 1 and created a patch proposal.
> The attached patch will try to archive the current timeline history file if it
> has not been archived yet when the archiver process starts up.

In essence, after a subtle but not necessarily wrong sequence of steps,
there's a case where a primary server lacks the timeline history file
for the current timeline in both pg_wal and the archive, even though
that timeline is larger than 1. This primary can start, but a new
standby created from the primary cannot start streaming, as it can't
fetch the timeline history file for the initial TLI.
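
To make the problematic state concrete, here is a minimal sketch of a
check for it. It is not taken from the PostgreSQL source. It assumes the
archive is a plain directory readable from the same host (an
archive_command can of course ship files anywhere), and the function
names are hypothetical. The one thing it relies on from PostgreSQL is
the file naming: a history file is named after the timeline ID as eight
uppercase hex digits followed by ".history", and timeline 1 has none.

```python
import os


def history_file_name(tli: int) -> str:
    """Name of the history file for a timeline: 8 uppercase hex digits + '.history'."""
    return f"{tli:08X}.history"


def history_file_missing(tli: int, pg_wal_dir: str, archive_dir: str) -> bool:
    """Return True if the history file for `tli` is in neither directory.

    `archive_dir` is an assumption: it stands in for wherever the
    archive actually lives.
    """
    if tli <= 1:
        # Timeline 1 never has a history file, so nothing can be missing.
        return False
    name = history_file_name(tli)
    return not any(
        os.path.exists(os.path.join(directory, name))
        for directory in (pg_wal_dir, archive_dir)
    )
```

A primary for which this returns True on its current timeline is in the
state described above: it runs fine itself, but has no copy of the file
to hand to a new standby.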

A. The OP suggests archiving the timeline history file for the current
 timeline every time the archiver starts. However, I don't think we
 want to keep archiving the same file over and over. (Granted, we're
 not always perfect at avoiding that..)

B. Given that the steps are valid, I concur with what is described in
 the test script provided: standbys don't really need that history
 file for the initial TLI (though I have yet to fully verify this).
 If the walreceiver just overlooks a fetch error for this file, the
 standby can successfully start. (Just skipping the first history
 file seems to work, but it feels a tad aggressive to me.)

C. If those steps aren't valid, we might want to add a note stating
 that -X none basebackups do need the timeline history file for the
 initial TLI, and that archive mode must be enabled before the
 latest timeline switch, if any.
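
For reference, the behavior in option B can be modelled roughly as
below. This is only a toy model of the idea, not walreceiver code: the
function name and the `fetch` callback (returning True when the history
file for a timeline could be obtained) are hypothetical. A failed fetch
is tolerated for the initial TLI only; a failure for any later timeline
is still treated as an error.

```python
from typing import Callable, List


def fetch_history_files(first_tli: int, last_tli: int,
                        fetch: Callable[[int], bool]) -> List[int]:
    """Try to fetch history files for first_tli..last_tli.

    Returns the list of timelines whose history file was fetched.
    `fetch(tli)` is a hypothetical callback returning True on success.
    """
    fetched = []
    for tli in range(first_tli, last_tli + 1):
        if tli == 1:
            # Timeline 1 has no history file.
            continue
        if fetch(tli):
            fetched.append(tli)
        elif tli == first_tli:
            # Option B: overlook the missing file for the initial TLI.
            continue
        else:
            raise RuntimeError(
                f"could not fetch history file for timeline {tli}")
    return fetched
```

The alternative mentioned in the parenthesis under B, skipping the first
history file unconditionally, would amount to never calling `fetch` for
`first_tli` at all.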


regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center
