Re: pg_rewind: warn when checkpoint hasn't happened after promotion - Mailing list pgsql-hackers
From | Kyotaro Horiguchi |
---|---|
Subject | Re: pg_rewind: warn when checkpoint hasn't happened after promotion |
Date | |
Msg-id | 20220607.123938.1165349070995545320.horikyota.ntt@gmail.com |
In response to | Re: pg_rewind: warn when checkpoint hasn't happened after promotion (James Coleman <jtc331@gmail.com>) |
Responses | Re: pg_rewind: warn when checkpoint hasn't happened after promotion; Re: pg_rewind: warn when checkpoint hasn't happened after promotion |
List | pgsql-hackers |
At Mon, 6 Jun 2022 08:32:01 -0400, James Coleman <jtc331@gmail.com> wrote in
> On Mon, Jun 6, 2022 at 1:26 AM Kyotaro Horiguchi <horikyota.ntt@gmail.com> wrote:
> >
> > At Sat, 4 Jun 2022 19:09:41 +0530, Bharath Rupireddy <bharath.rupireddyforpostgres@gmail.com> wrote in
> > > On Sat, Jun 4, 2022 at 6:29 PM James Coleman <jtc331@gmail.com> wrote:
> > > >
> > > > A few weeks back I sent a bug report [1] directly to the -bugs mailing list, and I haven't seen any activity on it (maybe this is because I emailed directly instead of using the form?), but I got some time to take a look and concluded that a first-level fix is pretty simple.
> > > >
> > > > A quick background refresher: after promoting a standby, rewinding the former primary requires that a checkpoint have been completed on the new primary after promotion. This is correctly documented. However, pg_rewind incorrectly reports to the user that a rewind isn't necessary because the source and target are on the same timeline.
> > ...
> > > > Attached is a patch that detects this condition and reports it as an error to the user.
> >
> > I have some random thoughts on this.
> >
> > There could be a problem in the case of a gracefully shut down old primary, so I think it is worth doing something if it can be done in a simple way.
> >
> > However, I don't think we can simply rely on minRecoveryPoint to detect that situation, since it won't be reset on a standby. A standby can also still be the upstream of a cascading standby. So, as discussed in the thread for the comment [2], what we can do here would be simply waiting for the timeline ID to advance, maybe with a timeout.
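The "wait for the timeline ID to advance, maybe with a timeout" idea can be sketched in C as a polling loop. This is a minimal illustration under stated assumptions, not pg_rewind code: the `tli_fetcher` callback stands in for however the tool would read the source's current timeline ID (for example from its control file), and `wait_for_timeline_advance`, `tli_fetcher` and the scripted source are hypothetical names introduced only for this example.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <time.h>
#include <unistd.h>

typedef uint32_t TimeLineID;

/* Hypothetical callback returning the source server's current timeline ID. */
typedef TimeLineID (*tli_fetcher)(void *ctx);

/*
 * Poll the source until its timeline ID is greater than target_tli (i.e. a
 * promotion has become visible) or timeout_secs elapse. Returns true and
 * stores the new timeline in *source_tli on success, false on timeout.
 */
static bool
wait_for_timeline_advance(tli_fetcher fetch, void *ctx, TimeLineID target_tli,
                          unsigned int timeout_secs, unsigned int poll_secs,
                          TimeLineID *source_tli)
{
    time_t deadline = time(NULL) + (time_t) timeout_secs;

    for (;;)
    {
        TimeLineID tli = fetch(ctx);

        if (tli > target_tli)
        {
            *source_tli = tli;
            return true;
        }
        if (time(NULL) >= deadline)
            return false;
        sleep(poll_secs);
    }
}

/* Hypothetical stand-in source: replays a fixed sequence of timeline IDs,
 * repeating the last one once the sequence is exhausted. */
struct scripted_source
{
    const TimeLineID *values;
    size_t n;
    size_t next;
};

static TimeLineID
scripted_fetch(void *ctx)
{
    struct scripted_source *src = ctx;
    TimeLineID tli;

    assert(src->n > 0);
    tli = src->values[src->next];
    if (src->next + 1 < src->n)
        src->next++;
    return tli;
}
```

Returning false on timeout leaves the caller to decide whether to report an error, as the attached patch does, or to request a checkpoint on the source instead.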
>
> To confirm I'm following you correctly, you're envisioning a situation like:
>
> - Primary A
> - Replica B replicating from primary
> - Replica C replicating from replica B
>
> then on failover from A to B you end up with:
>
> - Primary B
> - Replica C replicating from primary
> - [needs rewind] A
>
> and you try to rewind A from C as the source?

Yes. I think it is a legitimate use case. That being said, like the other points, it might be acceptable.

> > In the case of a single-step replication setup, a checkpoint request to the primary makes the end-of-recovery checkpoint fast. It won't work as expected in cascading replicas, but it might be acceptable.
>
> "Won't work as expected" because there's no way to guarantee replication is caught up or even advancing?

Maybe not. I meant that restartpoints don't run more often than once per checkpoint_timeout interval, even if checkpoint records arrive more frequently.

> > > > In the spirit of the new-ish "ensure shutdown" functionality, I could imagine extending this to automatically issue a checkpoint when this situation is detected. I haven't started to code that up, however, wanting to first get buy-in on that.
> > > >
> > > > 1: https://www.postgresql.org/message-id/CAAaqYe8b2DBbooTprY4v=BiZEd9qBqVLq+FD9j617eQFjk1KvQ@mail.gmail.com
> > >
> > > Thanks. I had a quick look over the issue and patch. Just a thought: can't we let pg_rewind issue a checkpoint on the new primary instead of erroring out, maybe optionally? It might sound like too much, but it helps pg_rewind be self-reliant, i.e. it avoids needing an external actor to detect the error and issue a checkpoint on the new primary before pg_rewind can successfully run on the old primary and repair it for use as a new standby.
> >
> > At the time of the discussion [2], the hindrance was that this requires superuser privileges. Now that has been narrowed down to the pg_checkpointer privileges.
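The restartpoint remark above can be illustrated with a deliberately simplified C model. It assumes the standby's checkpointer wakes every checkpoint_timeout seconds and bases a restartpoint on the newest checkpoint record replayed so far; it ignores restartpoints triggered by max_wal_size or by an explicit CHECKPOINT on the standby, as well as the checkpointer's retry behaviour. The function name is hypothetical and this is not PostgreSQL code.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Simplified model: the standby's checkpointer wakes every
 * checkpoint_timeout seconds, starting at time 0, and creates a restartpoint
 * from the newest checkpoint record replayed by then, if that record is newer
 * than the one used last time. replay_times must be sorted ascending.
 * Returns the number of restartpoints up to `horizon`; the index of the
 * checkpoint record each one used is written to used_out (up to max_out).
 */
static size_t
simulate_timed_restartpoints(const double *replay_times, size_t n_records,
                             double checkpoint_timeout, double horizon,
                             size_t *used_out, size_t max_out)
{
    size_t count = 0;
    size_t replayed = 0;   /* number of checkpoint records replayed so far */
    size_t last_used = 0;  /* value of `replayed` at the last restartpoint */

    assert(checkpoint_timeout > 0);

    for (double t = checkpoint_timeout; t <= horizon; t += checkpoint_timeout)
    {
        while (replayed < n_records && replay_times[replayed] <= t)
            replayed++;

        /* No checkpoint record newer than the last restartpoint: skip. */
        if (replayed == last_used)
            continue;

        last_used = replayed;
        if (count < max_out)
            used_out[count] = replayed - 1;
        count++;
    }
    return count;
}
```

With a checkpoint record replayed every 30 seconds and checkpoint_timeout set to 300 seconds, the 20 records replayed in the first 600 seconds produce only two restartpoints in this model (based on the records replayed at 300 s and 600 s), so anything updated only at a restartpoint can lag by up to about one checkpoint_timeout interval.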
> >
> > If we know that the timeline IDs are different, we don't need to wait for a checkpoint.
>
> Correct.
>
> > It seems to me that the exit status is significant. pg_rewind exits with 1 when an invalid option is given. I don't think it is great if we report this state with the same code.
>
> I'm happy to change that; I only chose "1" as a placeholder for "non-zero exit status".
>
> > I don't think we always want to request a non-spreading checkpoint.
>
> I'm not familiar with the terminology "non-spreading checkpoint".

Does "immediate checkpoint" work? That is, a checkpoint that runs at full speed (i.e. with no delays between writes).

> > [2] https://www.postgresql.org/message-id/flat/CABUevEz5bpvbwVsYCaSMV80CBZ5-82nkMzbb%2BBu%3Dh1m%3DrLdn%3Dg%40mail.gmail.com
>
> I read through that thread, and one interesting idea stuck out to me: making "timeline IDs are the same" an error exit status. On the one hand that makes a certain amount of sense because it's unexpected. But on the other hand there are entirely legitimate situations where upon failover the timeline IDs happen to match (e.g., for us it happens some percentage of the time naturally as we are using sync replication and failovers often involve STONITHing the original primary, so it's entirely possible that the promoted replica begins with exactly the same WAL ending LSN from the primary before it stopped).

Yes, that is true for most cases, unless the old primary wrote some records that had not been sent to the standby before its death. So if we don't inspect WAL records (on the target cluster), we should always run the rewind, even in the STONITH-killed (or immediate-shutdown) cases.

One possible way to detect promotion reliably is to look at the timeline history files. A history file is written immediately at promotion, even on standbys.

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center
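The suggestion to look at timeline history files can be sketched in C. Each line of a history file (for example `00000002.history`) records a parent timeline ID, the LSN at which the new timeline branched off it, and a free-text reason, separated by tabs. The sketch assumes the text of the source's history file is already in hand, finds where the source branched off the target's timeline, and compares that switch point with the target's end of WAL, which is roughly the comparison pg_rewind makes once it knows the divergence point. `find_switchpoint`, `classify_rewind` and `RewindDecision` are hypothetical names, and `target_end_lsn` is assumed to be known already; determining it requires reading the target's WAL, which is exactly the inspection discussed above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

typedef uint32_t TimeLineID;
typedef uint64_t XLogRecPtr;

/*
 * Scan the text of a timeline history file for the entry whose parent
 * timeline is parent_tli. Lines look like
 *     1<TAB>0/3000158<TAB>no recovery target specified
 * Lines that do not parse (blank lines, '#' comments) are skipped.
 */
static bool
find_switchpoint(const char *history, TimeLineID parent_tli,
                 XLogRecPtr *switchpoint)
{
    const char *line = history;

    assert(history != NULL);
    while (*line != '\0')
    {
        char buf[256];
        size_t len = strcspn(line, "\n");
        size_t copy = len < sizeof(buf) - 1 ? len : sizeof(buf) - 1;
        unsigned int tli, hi, lo;

        /* Parse one line at a time so sscanf cannot run into the next one. */
        memcpy(buf, line, copy);
        buf[copy] = '\0';

        if (sscanf(buf, "%u\t%X/%X", &tli, &hi, &lo) == 3 && tli == parent_tli)
        {
            *switchpoint = ((XLogRecPtr) hi << 32) | lo;
            return true;
        }

        line += len;
        if (*line == '\n')
            line++;
    }
    return false;
}

typedef enum
{
    NO_FORK_RECORDED,   /* history shows no branch off the target's timeline */
    REWIND_NOT_NEEDED,  /* target's WAL ends at or before the switch point */
    REWIND_NEEDED       /* target wrote WAL past the switch point */
} RewindDecision;

static RewindDecision
classify_rewind(const char *source_history, TimeLineID target_tli,
                XLogRecPtr target_end_lsn)
{
    XLogRecPtr switchpoint;

    if (!find_switchpoint(source_history, target_tli, &switchpoint))
        return NO_FORK_RECORDED;
    return target_end_lsn > switchpoint ? REWIND_NEEDED : REWIND_NOT_NEEDED;
}
```

`NO_FORK_RECORDED` does not by itself mean that no rewind is needed; it only says that the source's history shows no promotion away from the target's timeline.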