Re: Strange decreasing value of pg_last_wal_receive_lsn() - Mailing list pgsql-hackers
From | Jehan-Guillaume de Rorthais |
---|---|
Subject | Re: Strange decreasing value of pg_last_wal_receive_lsn() |
Date | |
Msg-id | 20200514184457.48d58ef5@firost |
In response to | Re: Strange decreasing value of pg_last_wal_receive_lsn() (godjan • <g0dj4n@gmail.com>) |
Responses | Re: Strange decreasing value of pg_last_wal_receive_lsn() |
List | pgsql-hackers |
(please, the list policy is bottom-posting to keep history clean, thanks).

On Thu, 14 May 2020 07:18:33 +0500
godjan • <g0dj4n@gmail.com> wrote:

> -> Why do you kill -9 your standby?
> Hi, it’s Jepsen test for our HA solution. It checks that we don’t lose data
> in such situation.

OK. This test is highly useful to stress data high availability and durability, of course. However, how useful is this test in the context of automatic failover for **service** high availability? If all your nodes are killed in the same disaster, how and why should an automatic cluster manager take care of starting all nodes again and picking the right node to promote?

> So, now we update logic as Michael said. All ha alive standbys now waiting
> for replaying all WAL that they have and after we use pg_last_replay_lsn() to
> choose which standby will be promoted in failover.
>
> It fixed our trouble, but there is one another. Now we should wait when all
> ha alive hosts finish replaying WAL to failover. It might take a while(for
> example WAL contains wal_record about splitting b-tree).

Indeed, this is the concern I wrote about yesterday in a second mail on this thread.

> We are looking for options that will allow us to find a standby that contains
> all data and replay all WAL only for this standby before failover.

Note that when you promote a node, it first replays available WALs before acting as a primary. So you can safely signal the promotion to the node and wait for it to finish the replay and promote.

> Maybe you have ideas on how to keep the last actual value of
> pg_last_wal_receive_lsn()?

Nope, no clean and elegant idea. Once your instances are killed, maybe you can force flush the system cache (to secure in-memory-only data) and read the latest received WAL using pg_waldump?

But what if some more data are available from archives, but were not received from streaming replication because of a high lag?

> As I understand WAL receiver doesn’t write to disk walrcv->flushedUpto.

I'm not sure I understand what you mean here. pg_last_wal_receive_lsn() reports the actual value of walrcv->flushedUpto. walrcv->flushedUpto reports the latest LSN force-flushed to disk.

> > On 13 May 2020, at 19:52, Jehan-Guillaume de Rorthais <jgdr@dalibo.com>
> > wrote:
> >
> >
> > (too bad the history has been removed to keep context)
> >
> > On Fri, 8 May 2020 15:02:26 +0500
> > godjan • <g0dj4n@gmail.com> wrote:
> >
> >> I got it, thank you.
> >> Can you recommend what to use to determine which quorum standby should be
> >> promoted in such case? We planned to use pg_last_wal_receive_lsn() to
> >> determine which has fresh data but if it returns the beginning of the
> >> segment on both replicas we can’t determine which standby confirmed that
> >> write transaction to disk.
> >
> > Wait, pg_last_wal_receive_lsn() only decreases because you killed your
> > standby.
> >
> > pg_last_wal_receive_lsn() returns the value of walrcv->flushedUpto. The
> > latter is set to the beginning of the segment requested only during the
> > first walreceiver startup or a timeline fork:
> >
> >     /*
> >      * If this is the first startup of walreceiver (on this timeline),
> >      * initialize flushedUpto and latestChunkStart to the starting point.
> >      */
> >     if (walrcv->receiveStart == 0 || walrcv->receivedTLI != tli)
> >     {
> >         walrcv->flushedUpto = recptr;
> >         walrcv->receivedTLI = tli;
> >         walrcv->latestChunkStart = recptr;
> >     }
> >     walrcv->receiveStart = recptr;
> >     walrcv->receiveStartTLI = tli;
> >
> > After a primary loss, as long as the standbys are up and running, it is
> > fine to use pg_last_wal_receive_lsn().
> >
> > Why do you kill -9 your standby? What am I missing? Could you explain the
> > usecase you are working on to justify this?
> >
> > Regards,
> >

--
Jehan-Guillaume de Rorthais
Dalibo
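
The selection step discussed in this thread (collect the LSN each surviving standby reports, then promote the most advanced one) can be sketched as follows. This is only an illustration, not code from PostgreSQL or from any HA tool: the helper names `parse_lsn` and `pick_most_advanced` are hypothetical, and it assumes the LSNs were already fetched as text in the usual `hi/lo` hexadecimal form, where the part before the slash is the high 32 bits of the 64-bit WAL position. Which function the values come from (receive or replay LSN) is the open question of the thread; as explained above, a received LSN read just after a walreceiver restart may point to the beginning of a segment.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical helper: parse a textual LSN such as "0/3000148" into a
 * 64-bit WAL position. Returns 0 on success, -1 if the text is not of
 * the form "hex/hex" (trailing characters are rejected). */
int parse_lsn(const char *text, uint64_t *lsn)
{
    unsigned int hi, lo;
    char trailing;

    if (sscanf(text, "%X/%X%c", &hi, &lo, &trailing) != 2)
        return -1;
    *lsn = ((uint64_t) hi << 32) | lo;
    return 0;
}

/* Hypothetical helper: return the index of the standby reporting the
 * highest LSN, or -1 if no entry could be parsed. On a tie, the first
 * standby in the list wins. */
int pick_most_advanced(const char *const lsns[], size_t n)
{
    int best = -1;
    uint64_t best_lsn = 0;

    for (size_t i = 0; i < n; i++)
    {
        uint64_t lsn;

        if (parse_lsn(lsns[i], &lsn) != 0)
            continue;           /* skip standbys with unusable output */
        if (best < 0 || lsn > best_lsn)
        {
            best = (int) i;
            best_lsn = lsn;
        }
    }
    return best;
}
```

Comparing the parsed 64-bit values rather than the text matters: a plain string comparison would sort "0/A000000" and "0/10000000" in the wrong order.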