Re: snapshot too old, configured by time - Mailing list pgsql-hackers
From | Peter Geoghegan |
---|---|
Subject | Re: snapshot too old, configured by time |
Date | |
Msg-id | CAM3SWZSiL52kGFPVVST=hA=oA+-YWEHQAPK_pVocReBoOEeQqQ@mail.gmail.com Whole thread Raw |
In response to | Re: snapshot too old, configured by time (Kevin Grittner <kgrittn@gmail.com>) |
Responses |
Re: snapshot too old, configured by time
|
List | pgsql-hackers |
On Wed, Mar 30, 2016 at 11:53 AM, Kevin Grittner <kgrittn@gmail.com> wrote: > When the initial "proof of concept" patch was tested by the > customer, it was not effective due to issues related to what you > raise. Autovacuum workers were blocking due to the page pins for > scans using these old snapshots, causing the bloat to accumulate in > spite of this particular patch. This was addressed, at least to a > degree sufficient for this customer, with this patch: > > http://git.postgresql.org/gitweb/?p=postgresql.git;a=commitdiff;h=2ed5b87f96d473962ec5230fd820abfeaccb2069 > > Basically, for most common cases the "super-exclusive" locking has > been eliminated from btree logic; and I have been happy to see that > Simon has been working on dealing with the corner cases where I > hadn't rooted it out. > >> I worry that something weird could happen there. For example, perhaps >> the page LSN on what is actually a newly recycled page could be set >> such that the backend following a stale right spuriously raises a >> "snapshot too old" error. > > That particular detail doesn't seem to be a realistic concern, > though -- if a page has been deleted, assigned to a new place in > the index with an LSN corresponding to that action, it would be a > pretty big bug if a right-pointer still referenced it. Yes, that would be a big bug. But I wasn't talking about "super-exclusive" locking. Rather, I was talking about the general way in which index scans are guaranteed to not land on an already-recycled page (not a half-dead page, and not a fully deleted page -- a fully reclaimed/recycled page). This works without buffer pins or buffer locks needing to be held at all -- there is a global interlock against page *recycling* based on RecentGlobalXmin, per the nbtree README. So, this vague concern of mine is about VACUUM's B-Tree page recycling. During an index scan, we expect to be able to land on the next page, and to be at least able to reason about it being deleted, even though we don't hold a pin on anything for a period. We certainly are shy about explaining all this, but if you look at a routine like _bt_search() carefully (the routine that is used to descend a B-Tree), it doesn't actually hold a pin concurrently, as we drop a level (and certainly not a buffer lock, either). The next page/block should be substantively the same page as expected from the downlink (or right link) we followed, entirely because of the RecentGlobalXmin interlock. Backwards scans also rely on this. This is just a vague concern, and possibly this is completely irrelevant. I haven't read the patch. >> I suggest you consider making amcheck [1] a part of your testing >> strategy. I think that this patch is a good idea, and I'd be happy to >> take feedback from you on how to make amcheck more effective for >> testing this patch in particular. > > I'm not sure how that would fit in; could you elaborate? Well, amcheck is a tool that in essence makes sure that B-Trees look structurally sound, and respect invariants like having every item on each page in logical order. That could catch a bug of the kind I just described, because it's quite likely that the recycled page would happen to have items that didn't comport with the ordering on the page. The block has been reused essentially at random. Importantly, amcheck can catch this without anything more than an AccessShareLock, so you have some hope of catching this kind of race condition (the stale downlink that you followed to get to the spuriously-recycled-early page doesn't stay stale for long). Or, maybe it would happen to catch some other random problem. Difficult to say. Again, this is based on a speculation that might be wrong. But it's worth considering. -- Peter Geoghegan
pgsql-hackers by date: