Re: [HACKERS] amcheck (B-Tree integrity checking tool) - Mailing list pgsql-hackers
From | Andres Freund |
---|---|
Subject | Re: [HACKERS] amcheck (B-Tree integrity checking tool) |
Date | |
Msg-id | 20170306234939.2g7n34gaof5ukpc6@alap3.anarazel.de Whole thread Raw |
In response to | Re: [HACKERS] amcheck (B-Tree integrity checking tool) (Robert Haas <robertmhaas@gmail.com>) |
List | pgsql-hackers |
Hi, On 2017-02-13 11:52:53 -0500, Robert Haas wrote: > On Thu, Feb 9, 2017 at 5:32 PM, Andres Freund <andres@anarazel.de> wrote: > >> > Again, some parts of the code doing something bad isn't a good argument > >> > for doing again. Releasing locks early is a bad pattern, because backend > >> > code isn't generally safe against it, we have special code-paths for > >> > catalog tables to support it for those. > >> > >> I don't agree with that. The reason we keep relation locks until the > >> end of the transaction is to avoid surprising application code, not > >> because the system can't tolerate it. But here it's more likely that > >> retaining the lock will surprise the user. > > > > It's both. A bunch of code paths rely on early release ony happening on > > catalogs. E.g., and that's just the first thing that comes to mind, > > cache invalidations which really only are guaranteed to be delivered and > > visibile for other cache accesses, if the lock is only released after > > the end of the transaction. Object accesses like relation_open() accept > > invalidations *after* acquiring the lock. Combined with lock releases > > only happening *after* transaction commits that guarantees an up2date > > view of the cache; but if others release the lock earlier and have cache > > invalidations, that doesn't work anymore. Now you can argue that it's > > possibly harmless here because there's presumably no way this can ever > > trigger cache invals - and you're probably right - but this isn't the > > only place with such assumption and you'd definitely have to argue *why* > > it's right. > > I agree that code which issues cache invalidations needs to hold a > blocking lock until commit IF it's critical for other backends to see > the update. Right. And I think that needs to be the default assumption, because otherwise you need to be careful about all the relevant code paths. I'm fine with adding something along the lines of /** Release lock on relation early, this function doesn't generate any* cache invalidations, and it's convenient for callersto be able to* check multiple functions in one go.*/ (after checking that's true of course) and then skipping the lock release. What I'm absolutely not ok with is doing so without consideration and documentation. > I don't really understand your demand for "an argument why it's > right". My argument is simple: I can't think of a single thing that > it would break, and I don't believe there is any such thing. That's pretty much an argument. I just want documentation that calls attention that we're doing something that requires a modicum of care. I fail to see why that's unreasonable. I don't think it's that unlikely will come around adding something like "hey we can update the free page statistic here, because we just read the whole damn thing", suddenly invalidating the logic. Or we add a heap check that then might trigger pruning, which quite conceivably could trigger an invalidation. Or ... > I think you're trying to invent a coding rule that doesn't exist, but > I can't prove a negative. We had bugs around this in the past, so ... > Your argument so far is that "yeah, well, maybe inval doesn't emit > sinval traffic (duh?) but it might do something else that breaks, > prove to me that there is no such thing". I have no idea what you're trying to say with this. The rest of that paragraph just seems like bunch of odd hyperbole ("impossible to exceed the speed of light..."). > From my point of view, it's likely to be common to want to run amcheck > in series on a bunch of indexes, like by writing a SELECT query > against pg_class or pg_index and passing the OIDs to amcheck. With > the retain-locks design, that query accumulates locks on a large > number of objects, possibly blowing out the lock table or blocking > DDL. On the other hand, retaining the locks actually increases the chance to run a check to completion. If you e.g. write a query checking the indexes for a table, you suddenly can't be sure that subsequent indexes for the same table can be checked, because the relation lock has been released inbetween. Which then'd lead to erroring out after doing part of the expensive work. I don't think that refutes your argument, but I also don't think it's as clear cut as you make it out to be. Greetings, Andres Freund
pgsql-hackers by date: