Re: Strange decreasing value of pg_last_wal_receive_lsn() - Mailing list pgsql-hackers

From godjan •
Subject Re: Strange decreasing value of pg_last_wal_receive_lsn()
Msg-id D3A6D0DE-A8C7-4E3A-A1B6-406C53662928@gmail.com
In response to Re: Strange decreasing value of pg_last_wal_receive_lsn()  (Jehan-Guillaume de Rorthais <jgdr@dalibo.com>)
Responses Re: Strange decreasing value of pg_last_wal_receive_lsn()
List pgsql-hackers
> Why do you kill -9 your standby?
Hi, this is a Jepsen test for our HA solution. It checks that we don’t lose data in such a situation.

So we have now updated our logic as Michael suggested. All live HA standbys now wait until they have replayed all the
WAL they have, and afterwards we use pg_last_wal_replay_lsn() to choose which standby will be promoted during failover.
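
A minimal sketch of that selection step, assuming each live standby has already reported its replay LSN as text (for
example from pg_last_wal_replay_lsn(), the function name in PostgreSQL 10 and later). The host names, LSN values, and
helper names are hypothetical; this is not our actual failover code.

```python
from typing import Dict


def parse_lsn(text: str) -> int:
    """Convert the textual pg_lsn form 'hi/lo' (two hex numbers) to a 64-bit integer."""
    hi, lo = text.split("/")
    return (int(hi, 16) << 32) | int(lo, 16)


def pick_promotion_candidate(replay_lsns: Dict[str, str]) -> str:
    """Return the standby whose reported replay LSN is the furthest ahead."""
    if not replay_lsns:
        raise ValueError("no live standby reported an LSN")
    return max(replay_lsns, key=lambda host: parse_lsn(replay_lsns[host]))


# Hypothetical values, as each standby might return them from
# SELECT pg_last_wal_replay_lsn();
reported = {
    "standby-a": "0/3000148",
    "standby-b": "0/3000060",
    "standby-c": "0/2FFFFD8",
}
candidate = pick_promotion_candidate(reported)
print(candidate)
```

The LSNs are compared as integers rather than as strings, because the text form is not zero-padded and a plain string
comparison would order "0/A000000" after "0/10000000".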

That fixed our problem, but there is another one. We now have to wait until all live HA hosts finish replaying WAL
before we can fail over. This might take a while (for example, when the WAL contains a record about a b-tree split).

We are looking for options that would let us find the standby that contains all the data and replay all WAL only on
that standby before failover.

Maybe you have ideas on how to keep the last actual value of pg_last_wal_receive_lsn()? As I understand it, the WAL
receiver doesn’t write walrcv->flushedUpto to disk.

> On 13 May 2020, at 19:52, Jehan-Guillaume de Rorthais <jgdr@dalibo.com> wrote:
>
>
> (too bad the history has been removed to keep context)
>
> On Fri, 8 May 2020 15:02:26 +0500
> godjan • <g0dj4n@gmail.com> wrote:
>
>> I got it, thank you.
>> Can you recommend what to use to determine which quorum standby should be
>> promoted in such case? We planned to use pg_last_wal_receive_lsn() to
>> determine which has fresh data but if it returns the beginning of the segment
>> on both replicas we can’t determine which standby confirmed that write
>> transaction to disk.
>
> Wait, pg_last_wal_receive_lsn() only decreases because you killed your standby.
>
> pg_last_wal_receive_lsn() returns the value of walrcv->flushedUpto. The latter
> is set to the beginning of the requested segment only during the first
> walreceiver startup or a timeline fork:
>
>     /*
>      * If this is the first startup of walreceiver (on this timeline),
>      * initialize flushedUpto and latestChunkStart to the starting point.
>      */
>     if (walrcv->receiveStart == 0 || walrcv->receivedTLI != tli)
>     {
>         walrcv->flushedUpto = recptr;
>         walrcv->receivedTLI = tli;
>         walrcv->latestChunkStart = recptr;
>     }
>     walrcv->receiveStart = recptr;
>     walrcv->receiveStartTLI = tli;
>
> After a primary loss, as long as the standbys are up and running, it is fine
> to use pg_last_wal_receive_lsn().
>
> Why do you kill -9 your standby? What am I missing? Could you explain the
> use case you are working on to justify this?
>
> Regards,



