Re: Hot Standby: Relation-specific deferred conflict resolution - Mailing list pgsql-hackers
From | Simon Riggs |
---|---|
Subject | Re: Hot Standby: Relation-specific deferred conflict resolution |
Date | |
Msg-id | 1264851488.13782.1858.camel@ebony |
In response to | Re: Hot Standby: Relation-specific deferred conflict resolution (Greg Stark <gsstark@mit.edu>) |
List | pgsql-hackers |
On Fri, 2010-01-29 at 14:52 +0000, Greg Stark wrote:

> Can you explain what it does in
> more detail so we can understand why it's necessary for a sensible set
> of features?

I've slimmed down the patch to make it clearer what it does, having committed some refactoring.

Problem: Currently when we perform conflict resolution we do not use the relid from the WAL record; we target all users regardless of which relations they have accessed or intend to access. So changes to table X can cause cancelation of someone accessing table Y because **they might later in the transaction access table X**. That is too heavy-handed and is most often overkill.

This is the same problem you and I have discussed many times over the last 14 months, though the problem itself has been discussed on hackers many times over the last 20 months, with many potential solutions offered by me.

An example of current behaviour, using tables A, B and C:

T0: An AccessExclusiveLock is applied to B
T1: Q1 takes snapshot, takes lock on A and begins query
T2: Q2 takes snapshot, queues for lock on B behind AccessExclusiveLock
T3: Cleanup on table C is handled that will conflict with both snapshots
T4: Q3 takes snapshot, takes lock on C and begins query (if possible)
T5: Cleanup on table C is handled that will conflict with Q3

Current: At T3, current conflict resolution will wait for max_standby_delay and then cancel Q1 and Q2. Q3 can begin processing immediately because the snapshot it takes will always be the same as or later than the xmin that generated the cleanup at T3. At T5, Q3 will be quickly cancelled because all the standby delay was used up at T3 and there is none left to spend on delaying for Q3.

Proposed Resolution: as presented to hackers in 12/2009
http://archives.postgresql.org/pgsql-hackers/2009-12/msg01175.php

Let's look at the effect first, then return to the detail.
In this proposal, the above sequence of actions will look like this: Conflict resolution will wait at T3 until we hit max_standby_delay, at which point we learn that Q1 and Q2 do not conflict and we let them continue on their way. At T5, Q3 will be cancelled without much delay, because we have now used up most of max_standby_delay.

So in both approaches, Q3 that accessed table C will be canceled fairly quickly. The key to this is that in the new proposal, Q1 and Q2 will not be canceled: they will continue to completion.

How it works: When we process a snapshot conflict we check which queries have snapshots that conflict. We then wait for max_standby_delay and then check for lock conflicts. (We do it this way because of a timing issue described on the link above, pointed out by Greg). When we check for lock conflicts we also set a latestRemovedXid on "that relation", so that we capture all current lockers *and* allow all future lockers to check the latestRemovedXid against their snapshot. In either case, if a lock conflict occurs then we will cancel the query.

I mention "that relation" because *where* we record the xid limit for each relation is an important aspect of the design. In the current patch we take a simple approach; others are possible. If there is already a lock in the shared lock table, then we add the latestRemovedXid to that. If not, we keep track of the latestRemovedXid for the whole lock partition. So we aren't tracking each relation separately in most cases, except when a table is being frequently accessed, or accessed for a long period.

There is also an optimization added here. When we defer cancelation of queries, the same query keeps re-appearing in the conflict list for later WAL records. As a result there is a mechanism to avoid constant re-listing of a conflict.

The attached patch is for review and discussion only at this stage. I'm working on other areas now while discussion takes place, or not.

-- 
Simon Riggs www.2ndQuadrant.com
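The bookkeeping described under "How it works" can be sketched in C roughly as follows. This is a simplified illustration under stated assumptions, not code from the patch: the types `LockTable` and `LockEntry`, the functions `take_lock`, `record_cleanup` and `snapshot_conflicts`, the partition count and the `relid % NUM_PARTITIONS` mapping are all hypothetical. Xids are treated as plain integers (no wraparound), and the conflict test "snapshot xmin is not later than latestRemovedXid" is an assumption about how a locker would compare its snapshot against the recorded limit.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Everything here is hypothetical and simplified; it is not the patch's code. */
typedef uint32_t Xid;                /* plain integers; xid wraparound is ignored */
#define INVALID_XID      0u
#define NUM_PARTITIONS   4u          /* illustrative value only */
#define MAX_LOCK_ENTRIES 8u

typedef struct {
    bool     in_use;
    uint32_t relid;
    Xid      latest_removed_xid;     /* per-relation limit kept on the lock entry */
} LockEntry;

typedef struct {
    LockEntry entries[MAX_LOCK_ENTRIES];
    Xid       partition_limit[NUM_PARTITIONS]; /* fallback when no entry exists */
} LockTable;

static Xid max_xid(Xid a, Xid b) { return a > b ? a : b; }

static LockEntry *find_entry(LockTable *t, uint32_t relid)
{
    for (size_t i = 0; i < MAX_LOCK_ENTRIES; i++)
        if (t->entries[i].in_use && t->entries[i].relid == relid)
            return &t->entries[i];
    return NULL;
}

void lock_table_init(LockTable *t)
{
    assert(t != NULL);
    memset(t, 0, sizeof *t);
}

/* A query takes a lock on relid; returns false if this toy table is full. */
bool take_lock(LockTable *t, uint32_t relid)
{
    if (find_entry(t, relid) != NULL)
        return true;
    for (size_t i = 0; i < MAX_LOCK_ENTRIES; i++) {
        if (!t->entries[i].in_use) {
            t->entries[i] = (LockEntry){ true, relid, INVALID_XID };
            return true;
        }
    }
    return false;
}

/* Replay of a cleanup record: store the limit on the relation's lock entry
 * if one exists, otherwise on the whole lock partition. */
void record_cleanup(LockTable *t, uint32_t relid, Xid latest_removed_xid)
{
    LockEntry *e = find_entry(t, relid);
    if (e != NULL) {
        e->latest_removed_xid = max_xid(e->latest_removed_xid, latest_removed_xid);
    } else {
        Xid *p = &t->partition_limit[relid % NUM_PARTITIONS];
        *p = max_xid(*p, latest_removed_xid);
    }
}

/* Check made by a current or future locker: does its snapshot conflict with
 * the limit recorded for this relation (entry or partition, whichever is later)? */
bool snapshot_conflicts(LockTable *t, uint32_t relid, Xid snapshot_xmin)
{
    LockEntry *e = find_entry(t, relid);
    Xid limit = t->partition_limit[relid % NUM_PARTITIONS];
    if (e != NULL)
        limit = max_xid(limit, e->latest_removed_xid);
    return limit != INVALID_XID && snapshot_xmin <= limit;
}
```

In this sketch a cleanup on an unlocked table is remembered only at partition granularity, so an unrelated relation that maps to the same partition sees the same limit. That is the coarse tracking the text refers to: relations are tracked separately only while a lock entry for them exists.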