Re: hot_standby_feedback vs excludeVacuum and snapshots - Mailing list pgsql-hackers
From | Noah Misch |
---|---|
Subject | Re: hot_standby_feedback vs excludeVacuum and snapshots |
Date | |
Msg-id | 20180704202756.GA269861@rfd.leadboat.com Whole thread Raw |
In response to | Re: hot_standby_feedback vs excludeVacuum and snapshots (Andres Freund <andres@anarazel.de>) |
Responses |
Re: hot_standby_feedback vs excludeVacuum and snapshots
|
List | pgsql-hackers |
On Fri, Jun 08, 2018 at 11:03:38AM -0700, Andres Freund wrote: > On 2018-06-08 09:23:02 +0100, Simon Riggs wrote: > > For context, AEL locks are normally removed by COMMIT or ABORT. > > StandbyReleaseOldLocks() is just a sweeper to catch anything that > > didn't send an abort before it died, so it hardly ever activates. The > > coding of StandbyReleaseOldLocks() is backwards... if it ain't in the > > running list, then we remove it. > > > > But that wasn't working correctly either, since as of 49bff5300d527 we > > assigned AELs to subxids. Subxids weren't in the running list and so > > AELs held by them would have been removed at the wrong time, an extant > > bug in PG10. It looks to me like they would have been removed either > > early or late, up to the next runningxact info record. They would be > > removed, so no leakage, but the late timing wouldn't be noticeable for > > tests or most usage, since it would look just like lock contention. > > Early release might give same issue of block access to non-existent > > block/file. > > Yikes, that's kinda bad. It can also just cause plain crashes, no? The > on-disk / catalog state isn't necessarily consistent during DDL, which > is why we hold AE locks. At the very least it can cause corruption of > in-use relcache entries and such. Yes. If I'm reading the commit message right, the committed code also releases locks too early: > commit 15378c1 > Author: Simon Riggs <simon@2ndQuadrant.com> > AuthorDate: Sat Jun 16 14:03:29 2018 +0100 > Commit: Simon Riggs <simon@2ndQuadrant.com> > CommitDate: Sat Jun 16 14:03:29 2018 +0100 > > Remove AELs from subxids correctly on standby > > Issues relate only to subtransactions that hold AccessExclusiveLocks > when replayed on standby. > > Prior to PG10, aborting subtransactions that held an > AccessExclusiveLock failed to release the lock until top level commit or > abort. 49bff5300d527 fixed that. > > However, 49bff5300d527 also introduced a similar bug where subtransaction > commit would fail to release an AccessExclusiveLock, leaving the lock to > be removed sometimes early and sometimes late. This commit fixes > that bug also. Backpatch to PG10 needed. Subtransaction commit is too early to release an arbitrary AccessExclusiveLock. The primary releases every AccessExclusiveLock at top-level transaction commit, top-level transaction abort, or subtransaction abort. CommitSubTransaction() doesn't do that; it transfers locks to the parent sub(xact). Standby nodes can't safely remove an arbitrary lock earlier than the primary would.
pgsql-hackers by date: