Re: [HACKERS] amcheck (B-Tree integrity checking tool) - Mailing list pgsql-hackers

From Andres Freund
Subject Re: [HACKERS] amcheck (B-Tree integrity checking tool)
Date
Msg-id 20170306234939.2g7n34gaof5ukpc6@alap3.anarazel.de
Whole thread Raw
In response to Re: [HACKERS] amcheck (B-Tree integrity checking tool)  (Robert Haas <robertmhaas@gmail.com>)
List pgsql-hackers
Hi,

On 2017-02-13 11:52:53 -0500, Robert Haas wrote:
> On Thu, Feb 9, 2017 at 5:32 PM, Andres Freund <andres@anarazel.de> wrote:
> >> > Again, some parts of the code doing something bad isn't a good argument
> >> > for doing again. Releasing locks early is a bad pattern, because backend
> >> > code isn't generally safe against it, we have special code-paths for
> >> > catalog tables to support it for those.
> >>
> >> I don't agree with that.  The reason we keep relation locks until the
> >> end of the transaction is to avoid surprising application code, not
> >> because the system can't tolerate it.  But here it's more likely that
> >> retaining the lock will surprise the user.
> >
> > It's both.  A bunch of code paths rely on early release ony happening on
> > catalogs. E.g., and that's just the first thing that comes to mind,
> > cache invalidations which really only are guaranteed to be delivered and
> > visibile for other cache accesses, if the lock is only released after
> > the end of the transaction.  Object accesses like relation_open() accept
> > invalidations *after* acquiring the lock. Combined with lock releases
> > only happening *after* transaction commits that guarantees an up2date
> > view of the cache; but if others release the lock earlier and have cache
> > invalidations, that doesn't work anymore.  Now you can argue that it's
> > possibly harmless here because there's presumably no way this can ever
> > trigger cache invals - and you're probably right - but this isn't the
> > only place with such assumption and you'd definitely have to argue *why*
> > it's right.
> 
> I agree that code which issues cache invalidations needs to hold a
> blocking lock until commit IF it's critical for other backends to see
> the update.

Right. And I think that needs to be the default assumption, because
otherwise you need to be careful about all the relevant code paths.  I'm
fine with adding something along the lines of
/** Release lock on relation early, this function doesn't generate any* cache invalidations, and it's convenient for
callersto be able to* check multiple functions in one go.*/
 
(after checking that's true of course) and then skipping the lock
release.  What I'm absolutely not ok with is doing so without
consideration and documentation.


> I don't really understand your demand for "an argument why it's
> right".  My argument is simple: I can't think of a single thing that
> it would break, and I don't believe there is any such thing.

That's pretty much an argument.  I just want documentation that calls
attention that we're doing something that requires a modicum of care.  I
fail to see why that's unreasonable.  I don't think it's that unlikely
will come around adding something like "hey we can update the free page
statistic here, because we just read the whole damn thing", suddenly
invalidating the logic.  Or we add a heap check that then might trigger
pruning, which quite conceivably could trigger an invalidation. Or ...


> I think you're trying to invent a coding rule that doesn't exist, but
> I can't prove a negative.

We had bugs around this in the past, so ...


> Your argument so far is that "yeah, well, maybe inval doesn't emit
> sinval traffic (duh?) but it might do something else that breaks,
> prove to me that there is no such thing".

I have no idea what you're trying to say with this.  The rest of that
paragraph just seems like bunch of odd hyperbole ("impossible to exceed
the speed of light...").


> From my point of view, it's likely to be common to want to run amcheck
> in series on a bunch of indexes, like by writing a SELECT query
> against pg_class or pg_index and passing the OIDs to amcheck.  With
> the retain-locks design, that query accumulates locks on a large
> number of objects, possibly blowing out the lock table or blocking
> DDL.

On the other hand, retaining the locks actually increases the chance to
run a check to completion.  If you e.g. write a query checking the
indexes for a table, you suddenly can't be sure that subsequent indexes
for the same table can be checked, because the relation lock has been
released inbetween.  Which then'd lead to erroring out after doing part
of the expensive work.

I don't think that refutes your argument, but I also don't think it's
as clear cut as you make it out to be.


Greetings,

Andres Freund



pgsql-hackers by date:

Previous
From: Michael Paquier
Date:
Subject: Re: [HACKERS] Automatic cleanup of oldest WAL segments with pg_receivexlog
Next
From: Robert Haas
Date:
Subject: Re: [HACKERS] Proposal : Parallel Merge Join