Re: standby apply lag on inactive servers - Mailing list pgsql-hackers

From Alvaro Herrera
Subject Re: standby apply lag on inactive servers
Date
Msg-id 20200131144757.GA3354@alvherre.pgsql
In response to Re: standby apply lag on inactive servers  (Fujii Masao <masao.fujii@oss.nttdata.com>)
Responses Re: standby apply lag on inactive servers  (Fujii Masao <masao.fujii@oss.nttdata.com>)
List pgsql-hackers
On 2020-Jan-31, Fujii Masao wrote:
> On 2020/01/31 22:40, Alvaro Herrera wrote:
> > On 2020-Jan-31, Fujii Masao wrote:
> > 
> > > You're thinking of applying this change to the back branches? Sorry
> > > if my understanding is not right. But I don't think that back-patching
> > > is OK, because it changes the documented existing behavior
> > > of pg_last_xact_replay_timestamp(). So it looks like a behavior
> > > change, not a bug fix.
> > 
> > Yeah, I am thinking of backpatching it.  The documented behavior is
> > already not what the code does.
> 
> Maybe you thought this because getRecordTimestamp() extracts the
> timestamp even from the WAL record of a restore point? That is, you're
> concerned that pg_last_xact_replay_timestamp() returns the timestamp
> not only of a commit/abort record but also of a restore point record.
> Right?

right.

> As far as I read the code, this problem doesn't occur because
> SetLatestXTime() is called only for commit/abort records, in
> recoveryStopsAfter(). No?

... uh, wow, you're right about that too.  IMO this is extremely
fragile, easy to break, and under-documented.  But you're right, there's
no bug there at present.
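
To make the fragility concrete, here is a toy Python model of the
situation.  It is not PostgreSQL code: the record kinds and the replay()
helper are made up for illustration.  Every record in it carries a
timestamp, but only commit/abort records are allowed to advance the
"last xtime":

```python
from datetime import datetime
from typing import Iterable, Optional, Tuple

# Hypothetical record kinds; real WAL record types are more varied.
XACT_KINDS = {"commit", "abort"}


def replay(records: Iterable[Tuple[str, datetime]]) -> Optional[datetime]:
    """Return the "last xtime" after replaying (kind, timestamp) records."""
    last_xtime = None
    for kind, ts in records:
        # Restore points and checkpoints carry timestamps too,
        # but only commit/abort records update the value here.
        if kind in XACT_KINDS:
            last_xtime = ts
    return last_xtime


wal = [
    ("commit", datetime(2020, 1, 31, 10, 0, 0)),
    ("restore_point", datetime(2020, 1, 31, 10, 5, 0)),
    ("checkpoint", datetime(2020, 1, 31, 10, 10, 0)),
]
print(replay(wal))  # 2020-01-31 10:00:00
```

The result depends entirely on that one filter on the record kind, which
is the sort of thing that is easy to break by accident.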

> >  Do you have a situation where this
> > change would break something?  If so, can you please explain what it is?
> 
> For example, use the return value of pg_last_xact_replay_timestamp()
> (and also the timestamp in the log message output at the end of
> recovery) as a HINT when setting recovery_target_time later.

Hmm.

I'm not sure how you would use it in that way.  I mean, I understand how
it *can* be used that way, but it seems too fragile to be done in
practice, in a scenario that's not just laboratory games.

> Use it to compare with the timestamp retrieved from the master server,
> in order to monitor the replication delay.

That's precisely the use case that I'm aiming at.  The timestamp is
currently not useful for this, because the comparison breaks when the
primary is inactive (no COMMIT records occur).  During such periods of
inactivity, CHECKPOINT records would keep the "last xtime" current.
This has actually happened in a production setting; it's not a thought
experiment.
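
A rough sketch of the arithmetic, in Python with made-up timestamps (the
apparent_lag() helper is hypothetical, just a subtraction of the
standby's last replayed timestamp from the primary's clock):

```python
from datetime import datetime, timedelta


def apparent_lag(now_on_primary: datetime, last_replay_ts: datetime) -> timedelta:
    """Lag as a monitoring script would compute it from the two timestamps."""
    return now_on_primary - last_replay_ts


# The standby has replayed everything, but the primary has been idle
# for an hour since its last COMMIT (illustrative values only).
last_commit = datetime(2020, 1, 31, 10, 0, 0)
now = datetime(2020, 1, 31, 11, 0, 0)
print(apparent_lag(now, last_commit))  # 1:00:00

# If a checkpoint record replayed at 10:55 also advanced the timestamp,
# the reported lag would stay small while the primary is idle.
last_checkpoint = datetime(2020, 1, 31, 10, 55, 0)
print(apparent_lag(now, max(last_commit, last_checkpoint)))  # 0:05:00
```

In the first case the monitor reports an hour of lag on a standby that is
fully caught up.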

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


