LogStandbySnapshot (was another thread) - Mailing list pgsql-hackers

From Simon Riggs
Subject LogStandbySnapshot (was another thread)
Date
Msg-id 1273016535.4535.3155.camel@ebony
In response to Re: Pause/Resume feature for Hot Standby  (Tom Lane <tgl@sss.pgh.pa.us>)
Responses Re: LogStandbySnapshot (was another thread)
List pgsql-hackers
On Tue, 2010-05-04 at 13:23 -0400, Tom Lane wrote:

> * LogStandbySnapshot is merest fantasy: no guarantee that either the
> XIDs list or the locks list will be consistent with the point in WAL
> where it will get inserted.  What's worse, locking things down enough
> to guarantee consistency would be horrid for performance, or maybe
> even deadlock-inducing. Could lose both ways: list might contain an
> XID whose commit/abort went to WAL before the snapshot did, or list
> might be missing an XID started just after snap was taken. The latter
> case could possibly be dealt with via nextXid filtering, but that
> doesn't fix the former case, and anyway we have both ends of the same
> problem for locks.

This was the only serious complaint on your list, so let's address it.

Clearly we don't want to lock everything down, for all the reasons you
say. Not locking leaves a gap between when the data is derived and when
it is logged to WAL.

LogStandbySnapshot() occurs during online checkpoints on or after the
logical checkpoint location and before the physical checkpoint location.

We start recovery from a checkpoint, so we have a starting point in WAL
for our processing. The time sequence on the primary of these related
events is

Logical Checkpoint location
newxids/commits/locks "Before1"
AccessExclusiveLocks derived
newxids/commits/locks "Before2"
AccessExclusiveLocks WAL record inserted
newxids/commits/locks "After1"
RunningXact derived
newxids/commits/locks "After2"
RunningXact WAL record inserted

When we read them back from WAL, though, they appear in the order shown
next, and we cannot tell the difference between events at Before1 and
Before2, or between events at After1 and After2.

Logical Checkpoint location <= STANDBY_INITIALIZED
newxids/commits/locks "Before1"
newxids/commits/locks "Before2"
AccessExclusiveLocks WAL record
newxids/commits/locks "After1"
newxids/commits/locks "After2"
RunningXact WAL record <= STANDBY_SNAPSHOT_READY

We're looking for a consistent point. We don't know what the exact
time-synchronised point is on master, so we have to use an exact point
in WAL and work from there. We need to understand that the serialization
of events in the log can be slightly different to how they occurred on
the primary, but that doesn't change anything important.
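To illustrate the reordering, here is a toy Python model, not PostgreSQL
code. The names PRIMARY_TIMELINE and wal_view are invented for this
sketch. It assumes only what the two lists above say: the "derived" steps
write nothing to WAL, so the standby sees just the inserted records, in
insertion order.

```python
# Toy model only: each primary-side event is (description, written_to_wal).
# The "derived" steps read shared state and write nothing to WAL.
PRIMARY_TIMELINE = [
    ("Logical Checkpoint location", True),
    ('newxids/commits/locks "Before1"', True),
    ("AccessExclusiveLocks derived", False),
    ('newxids/commits/locks "Before2"', True),
    ("AccessExclusiveLocks WAL record", True),
    ('newxids/commits/locks "After1"', True),
    ("RunningXact derived", False),
    ('newxids/commits/locks "After2"', True),
    ("RunningXact WAL record", True),
]


def wal_view(timeline):
    """Return only the events a standby can observe, in WAL order."""
    return [name for name, in_wal in timeline if in_wal]


if __name__ == "__main__":
    for name in wal_view(PRIMARY_TIMELINE):
        print(name)
```

In the output, Before1 and Before2 are adjacent, as are After1 and After2,
so nothing in WAL marks the moment at which either snapshot was derived.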

So to get a set of xids + locks that are consistent at the moment the
RunningXact WAL record is read, we need to:

1. Begin processing incoming changes from the time we are
STANDBY_INITIALIZED, though forgive any errors for removals of missing
items until we hit STANDBY_SNAPSHOT_READY
a) locks - we ignore missing locks in StandbyReleaseLocks()
b) xids - we ignore missing xids in KnownAssignedXidsRemove()

2. Any transaction commits/aborts seen from the time we are
STANDBY_INITIALIZED through to STANDBY_SNAPSHOT_READY need to be saved,
so that we can remove them again from the snapshot state. Otherwise,
entries might exist in the standby that will never be removed from the
snapshot. We do this with a simple test of whether the related xid has
already completed.
a) locks - we ignore locks for already completed xids in
StandbyAcquireAccessExclusiveLock()
b) xids - we ignore already completed xids in
ProcArrayApplyRecoveryInfo()
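The two rules can be sketched as a toy Python simulation. This is not
PostgreSQL code: the class ToyStandby, its methods, and the WAL tuple
format are all hypothetical, and the "completed" set stands in for
whatever test the real code uses to decide that an xid has already
completed. The PostgreSQL function names appear only in comments, to
show which rule each toy method is meant to mirror.

```python
from enum import Enum


class StandbyState(Enum):
    INITIALIZED = 1
    SNAPSHOT_READY = 2


class ToyStandby:
    """Hypothetical model of standby xid/lock tracking during startup."""

    def __init__(self):
        self.state = StandbyState.INITIALIZED
        self.known_xids = set()
        self.locks = {}          # xid -> set of relation names
        self.completed = set()   # xids whose commit/abort we have replayed

    def xid_seen(self, xid):
        if xid not in self.completed:
            self.known_xids.add(xid)

    def acquire_lock(self, xid, rel):
        # Rule 2a (cf. StandbyAcquireAccessExclusiveLock): skip locks
        # belonging to xids that have already completed.
        if xid in self.completed:
            return
        self.locks.setdefault(xid, set()).add(rel)

    def finish_xact(self, xid):
        # Rule 1 (cf. StandbyReleaseLocks, KnownAssignedXidsRemove):
        # missing items are forgiven until the snapshot is ready.
        # Being strict afterwards is an assumption of this toy model.
        if self.state is StandbyState.SNAPSHOT_READY and xid not in self.known_xids:
            raise KeyError(f"unknown xid {xid}")
        self.known_xids.discard(xid)
        self.locks.pop(xid, None)
        self.completed.add(xid)

    def apply_lock_snapshot(self, entries):
        for xid, rel in entries:
            self.acquire_lock(xid, rel)

    def apply_running_xacts(self, xids):
        # Rule 2b (cf. ProcArrayApplyRecoveryInfo): skip completed xids.
        for xid in xids:
            self.xid_seen(xid)
        self.state = StandbyState.SNAPSHOT_READY


def replay(standby, wal):
    handlers = {
        "xid": standby.xid_seen,
        "finish": standby.finish_xact,
        "lock_snapshot": standby.apply_lock_snapshot,
        "running_xacts": standby.apply_running_xacts,
    }
    for kind, payload in wal:
        handlers[kind](payload)


# WAL as read by the standby, starting at the logical checkpoint location.
wal = [
    ("finish", 100),                                # Before2: 100 commits
    ("lock_snapshot", [(100, "t1"), (101, "t2")]),  # derived before that commit
    ("finish", 103),                                # After2: 103 commits
    ("xid", 102),                                   # After2: 102 starts
    ("running_xacts", [101, 103]),                  # derived before After2
]

standby = ToyStandby()
replay(standby, wal)
print(standby.known_xids, standby.locks)
```

In this run xids 100 and 103 are both in a snapshot but committed before
that snapshot's record reached WAL, so they are filtered out. Xid 102
started after the RunningXact data was derived and is picked up from the
ordinary WAL stream. The standby ends with xids 101 and 102 and only the
lock held by 101.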

We currently do all of the above. So it looks correct to me.

-- Simon Riggs           www.2ndQuadrant.com


