Re: Re: [COMMITTERS] pgsql: Send new protocol keepalive messages to standby servers. - Mailing list pgsql-hackers

From Robert Haas
Subject Re: Re: [COMMITTERS] pgsql: Send new protocol keepalive messages to standby servers.
Date
Msg-id CA+TgmoZLWgr0StwDMxxmN2fmS57EX7vryQP3Pda5B6Ap0DbHPQ@mail.gmail.com
In response to Re: Re: [COMMITTERS] pgsql: Send new protocol keepalive messages to standby servers.  (Simon Riggs <simon@2ndQuadrant.com>)
List pgsql-hackers
On Tue, Jun 5, 2012 at 4:51 PM, Simon Riggs <simon@2ndquadrant.com> wrote:
> We might want to have a different definition of apply delay for
> different purposes, so an improved definition of apply delay doesn't
> necessarily mean changing standby delay mechanism.
>
> An improved definition of apply delay would be, IMHO
> if (XLByteLE(receivePtr, replayPtr))
>    return 0;
> if (recoveryLastXTime > currentChunkStartTime)
>  then LastKnownTS = LastAppliedTS
>  else
>         LastKnownTS = StartChunkTS
> ApplyDelay = TimestampDifference(LastKnownTS, GetCurrentTimestamp()….);
>
> Which assumes the clocks are in sync. It also doesn't give very useful
> answers when no commits are occurring, and can hide the effects of
> large amounts of WAL generated by VACUUMs. So we need a better
> definition.

Another problem is that it sometimes subtracts two slave timestamps,
and sometimes subtracts a master timestamp from a slave timestamp.  If
we're assuming that the clocks must be in sync, you could argue that's
OK, but I think it will lead to weird edge-case behavior.

Suppose that we have the master guarantee that at least one
timestamped WAL record will be emitted every N seconds.  For the sake
of argument, let's say N = 5.  So, every 5 seconds, some process wakes
up on the master and checks whether any commit or abort record - or
any other kind of WAL record that carries a timestamp - has been
emitted in the last 5 seconds.  If so, then it does nothing.  If not,
it checks whether any WAL at all has been emitted since the last
timestamped record was generated.  If not, then it again does
nothing.  But if so, then it emits a WAL record which consists solely
of a master timestamp.
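
To make the master-side check concrete, here is a rough sketch of the
decision the periodic process would make.  Every name in it
(MasterWalState, need_timestamp_record, HEARTBEAT_INTERVAL_US, the
microsecond timestamp type) is made up for illustration and is not
existing PostgreSQL code; it only models the two tests described above.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-ins, not PostgreSQL's real types. */
typedef int64_t  TimestampUs;   /* microseconds on the master clock */
typedef uint64_t WalPosition;   /* byte position in the WAL stream */

#define HEARTBEAT_INTERVAL_US (5 * 1000000LL)   /* N = 5 seconds */

typedef struct MasterWalState
{
    TimestampUs last_timestamped_record_time; /* when the last timestamped record was emitted */
    WalPosition last_timestamped_record_end;  /* WAL position just after that record */
    WalPosition current_insert_pos;           /* current end of WAL */
} MasterWalState;

/*
 * Return true if the periodic process should emit a timestamp-only
 * WAL record at time "now".
 */
static bool
need_timestamp_record(const MasterWalState *st, TimestampUs now)
{
    /* A timestamped record was emitted in the last N seconds: do nothing. */
    if (now - st->last_timestamped_record_time < HEARTBEAT_INTERVAL_US)
        return false;

    /* No WAL at all since the last timestamped record: again do nothing. */
    if (st->current_insert_pos <= st->last_timestamped_record_end)
        return false;

    /* WAL has been generated, but nothing in it carries a timestamp. */
    return true;
}
```

The second test is what keeps an idle master from generating WAL
indefinitely: with no new WAL there is nothing for the slave to fall
behind on, so no timestamp record is needed.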

On the slave, every time we reach a commit record, an abort record, or
one of these new master-timestamp records, or any other record that
happens to have a timestamp, we update some shared memory area which
stores (a) the last master timestamp we saw during replay and (b) the
slave timestamp at the time we replayed it.  Apply delay (ignoring
time skew) can be calculated by subtracting the first value from the
second one, or we could expose the two values separately, which might
be even better, since users can then answer questions like "how long
has it been since we were able to recalculate the apply delay?".
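
As a sketch of the slave-side bookkeeping, assuming a hypothetical
ReplayTimestamps struct standing in for the shared memory area (the
names below are illustrative, and the locking a real shared-memory
structure would need is left out):

```c
#include <stdint.h>

/* Hypothetical stand-in, not PostgreSQL's real timestamp type. */
typedef int64_t TimestampUs;    /* microseconds */

typedef struct ReplayTimestamps
{
    TimestampUs last_master_ts;     /* (a) last master timestamp seen during replay */
    TimestampUs slave_ts_at_replay; /* (b) slave clock when that record was replayed */
} ReplayTimestamps;

/* Called whenever replay reaches any record that carries a master timestamp. */
static void
record_replayed_timestamp(ReplayTimestamps *rt,
                          TimestampUs master_ts, TimestampUs slave_now)
{
    rt->last_master_ts = master_ts;
    rt->slave_ts_at_replay = slave_now;
}

/* Apply delay, ignoring clock skew: (b) minus (a). */
static TimestampUs
apply_delay_us(const ReplayTimestamps *rt)
{
    return rt->slave_ts_at_replay - rt->last_master_ts;
}

/* How long ago the apply delay could last be recalculated. */
static TimestampUs
time_since_last_update_us(const ReplayTimestamps *rt, TimestampUs slave_now)
{
    return slave_now - rt->slave_ts_at_replay;
}
```

Exposing both fields rather than only the difference is what lets a
monitoring query compute the second quantity as well as the first.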

I'm sure that at least one member of the audience will have some rocks
to throw at this proposal... fire away, but be gentle, since we are
all on the same team here.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company