
From Simon Riggs
Subject Re: Hot Standby, deferred conflict resolution for cleanup records (v2)
Date
Msg-id 1260788864.1955.581.camel@ebony
In response to Re: Hot Standby, deferred conflict resolution for cleanup records (v2)  (Greg Stark <gsstark@mit.edu>)
List pgsql-hackers
On Mon, 2009-12-14 at 04:57 +0000, Greg Stark wrote:
> On Sat, Dec 12, 2009 at 3:06 PM, Simon Riggs <simon@2ndquadrant.com> wrote:
> > Anyone acquiring a lock on a table should check the latestRemovedXid for
> > the table and abort if their xmin is too old. This prevents new lockers
> > from accessing a cleaned relation immediately after we decide to abort
> > anyone looking at that table. (Anyone queuing for the existing locks
> > would be caught by this).
> 
> I fear given HOT pruning that this could mean no query can even get
> started against a busy table. It seems like you would have to start
> your transaction several times until you manage to get a lock on the
> busy table soon enough after taking the snapshot to not have missed
> any cleanups in the table. 

The proposal improves this situation. Right now we would cancel all
queries, not just the ones looking at the busy table.

> Or have I missed something that protects
> against that?

At your suggestion, I previously added a feature, described in the docs:

"It is also possible to set vacuum_defer_cleanup_age on the primary to
defer the cleanup of records by autovacuum, vacuum and HOT. This may
allow more time for queries to execute before they are cancelled on the
standby, without the need for setting a high max_standby_delay."

vacuum_defer_cleanup_age delays globalxmin by a fixed number of xids.
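
To show the effect concretely, here is a rough sketch of the horizon
calculation; it is illustrative only (invented names, xid wraparound
subtleties ignored), not the code the primary actually runs:

#include <stdint.h>

typedef uint32_t TransactionId;
#define FirstNormalTransactionId ((TransactionId) 3)

/*
 * Push the horizon back by vacuum_defer_cleanup_age xids, clamping so we
 * never step back past the first normal xid. Rows deleted by xids newer
 * than the returned horizon are not yet eligible for cleanup.
 */
static TransactionId
deferred_cleanup_horizon(TransactionId globalxmin, uint32_t defer_age)
{
    if (globalxmin > defer_age + FirstNormalTransactionId)
        return globalxmin - defer_age;
    return FirstNormalTransactionId;
}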

vacuum_defer_cleanup_age is fairly crude, though, so the proposal here is
to add finer-grained conflict resolution; please read on.

> The bigger problem with this is that I don't see any way to tune this
> to have a safe replica. In the current system you can set
> standby_max_delay to 0 or -1 or whatever to completely disable killing
> off valid queries on the replica. In this setup you're going ahead
> with cleanup records which may or may not be safe and then have no
> recourse if they turn out to conflict.

Attempting a full analysis...

An example of current and proposed behaviours, using tables A, B and C:
T0: An AccessExclusiveLock is applied to B
T1: Q1 takes snapshot, takes lock on A and begins query
T2: Q2 takes snapshot, queues for lock on B behind the AccessExclusiveLock
T3: A cleanup record for table C is replayed that conflicts with both snapshots
T4: Q3 takes snapshot, takes lock on C and begins query (if possible)
T5: A cleanup record for table C is replayed that conflicts with Q3

Current: At T3, current conflict resolution will wait for
max_standby_delay and then cancel Q1 and Q2. Q3 can begin processing
immediately because the snapshot it takes will always be the same as or
later than the xmin that generated the cleanup at T3 (*). At T5, Q3 will be
quickly cancelled because all the standby delay was used up at T3 and
there is none left to spend on delaying for Q3.
(*) is obviously a key assumption.
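
For comparison with the proposals below, the current rule amounts to
something like this sketch; the struct and function names are invented
for illustration and the waiting step is reduced to a comment, so it is
not the actual standby code:

#include <stdbool.h>
#include <stdint.h>

typedef uint32_t TransactionId;

typedef struct StandbyQuery
{
    TransactionId xmin;       /* snapshot horizon of the query */
    bool          cancelled;
} StandbyQuery;

/*
 * Current behaviour: once a cleanup record conflicts, recovery waits out
 * whatever remains of max_standby_delay (waiting not shown here) and then
 * cancels every query whose snapshot could still see the removed rows,
 * regardless of which table the query is using. Comparisons ignore
 * xid wraparound.
 */
static void
ResolveCleanupConflictCurrent(TransactionId latestRemovedXid,
                              StandbyQuery *queries, int nqueries)
{
    for (int i = 0; i < nqueries; i++)
    {
        if (queries[i].xmin <= latestRemovedXid)
            queries[i].cancelled = true;
    }
}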

Proposal1: Conflict resolution will not wait at T3 at all and Q1 and Q2
will continue towards completion. At T5, Q3 will be cancelled without
much delay, as explained for current.

Proposal1 seems better than the current situation.
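
The essence of Proposal1, per the scheme quoted at the top, is to record
latestRemovedXid per relation when the cleanup record is replayed and to
test it only when a query takes a lock on that relation. A rough sketch,
again with invented names rather than real data structures:

#include <stdbool.h>
#include <stdint.h>

typedef uint32_t TransactionId;

/* Per-relation conflict bookkeeping, invented here for illustration. */
typedef struct RelCleanupInfo
{
    TransactionId latestRemovedXid;   /* newest xid whose dead rows were removed */
} RelCleanupInfo;

/*
 * Replaying a cleanup record only records the horizon for that relation;
 * nothing is cancelled at this point. Comparisons ignore xid wraparound.
 */
static void
RecordRelationCleanup(RelCleanupInfo *rel, TransactionId latestRemovedXid)
{
    if (rel->latestRemovedXid < latestRemovedXid)
        rel->latestRemovedXid = latestRemovedXid;
}

/*
 * At lock acquisition time: cancel only if this query's snapshot could
 * still see rows already removed from this particular relation.
 */
static bool
LockerMustCancel(const RelCleanupInfo *rel, TransactionId query_xmin)
{
    return query_xmin <= rel->latestRemovedXid;
}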

Taking up your point about timing delays, if the sequence of actions is
T6: Q4 takes snapshot
T7: commit of transaction that advances xmin
T8: Cleanup on table C handled without delay
T9: Q4 takes lock on C and cancels
then yes, Q4 is cancelled without a delay. There is a possible race
condition that would allow this, but for the vast majority of read
committed queries the window is small, since T7 and T8 are seldom
adjacent in WAL, whereas the gap between T6 and T9 is typically very
short. If the race does occur, the effect is not incorrect query
results, just a query cancelled earlier than we would ideally like.

A slight modification to the proposal would be to check for conflicts
based upon the snapshot first, wait, then check for lock conflicts
before cancelling, rather than the other way around. That closes the
timing window you've pointed out, at least to the extent max_standby_delay
allows. Call that Proposal2.
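
As a sketch only (invented names, waiting reduced to a comment),
Proposal2's ordering would look something like this:

#include <stdbool.h>
#include <stdint.h>

typedef uint32_t TransactionId;

typedef struct StandbyQuery
{
    TransactionId xmin;             /* snapshot horizon */
    bool          uses_relation;    /* holds or requests a lock on the cleaned table */
    bool          cancelled;
} StandbyQuery;

/*
 * Proposal2 ordering: snapshot check first, then the max_standby_delay
 * wait (reduced to a comment), and only then the lock check before any
 * cancellation. Comparisons ignore xid wraparound.
 */
static void
ResolveCleanupConflictProposal2(TransactionId latestRemovedXid,
                                StandbyQuery *queries, int nqueries)
{
    bool snapshot_conflict = false;

    /* 1. Does any query's snapshot conflict with this cleanup at all? */
    for (int i = 0; i < nqueries; i++)
        if (queries[i].xmin <= latestRemovedXid)
            snapshot_conflict = true;

    if (!snapshot_conflict)
        return;

    /* 2. Wait here for up to max_standby_delay, giving queries time to finish. */

    /* 3. Cancel only the queries that both conflict by snapshot and
     *    actually touch the cleaned relation. */
    for (int i = 0; i < nqueries; i++)
        if (queries[i].xmin <= latestRemovedXid && queries[i].uses_relation)
            queries[i].cancelled = true;
}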

The first example would then play out like this:
Proposal2: Conflict resolution will wait at T3; Q1 and Q2 will continue
towards completion because there is no lock conflict. At T5, Q3 will be
cancelled without much delay, as explained for current. So the outcome is
almost identical in the typical case, but Proposal2 doesn't cancel
queries early in some cases.

In summary:
* Current: Q1, Q2 and Q3 are cancelled.
* Proposal1: Q1 and Q2 continue until completion. Q3 is cancelled, with
roughly the same delay as in the current behaviour.
* Proposal2: Q1 and Q2 continue until completion. Q3 is cancelled, with
roughly the same delay as in the current behaviour, though handled
consistently in all cases, not just the typical ones.

So Proposal2 wins, though problems are still possible.

AFAICS the faster a table generates cleanup records, the shorter the
window of opportunity for queries to run against it without conflict on
Hot Standby. The worst case is a single-row table being updated
constantly by short transactions, which will generate a rapid stream of
cleanup records in WAL. The only simple solution in that case is to pause
the standby entirely while queries complete. That option was removed
during review to allow the remainder of the patch to proceed, though it
clearly needs to be replaced.

I don't think I've solved every usability issue and I think this needs
to go to Alpha to get better feedback on that. More ideas welcome.

--
Simon Riggs           www.2ndQuadrant.com


