Re: group locking: incomplete patch, just for discussion - Mailing list pgsql-hackers

From Andres Freund
Subject Re: group locking: incomplete patch, just for discussion
Date
Msg-id 20141120214552.GC3461@alap3.anarazel.de
Whole thread Raw
In response to Re: group locking: incomplete patch, just for discussion  (Robert Haas <robertmhaas@gmail.com>)
Responses Re: group locking: incomplete patch, just for discussion  (Robert Haas <robertmhaas@gmail.com>)
Re: group locking: incomplete patch, just for discussion  (Robert Haas <robertmhaas@gmail.com>)
List pgsql-hackers
On 2014-11-20 15:47:39 -0500, Robert Haas wrote:
> On Thu, Nov 20, 2014 at 3:15 PM, Andres Freund <andres@2ndquadrant.com> wrote:
> >> Haven't I responded to this a few times already?
> > Not in a particularly convincing way at least.
> 
> That's not a very helpful response.

Well...

> >> I see no way, even theoretically, that it can be sane for
> >> CheckTableNotInUse() to succeed in a parallel context.  Ever.
> >
> > It's not exactly inconceivable that you'd want a parallel
> > CLUSTER/REINDEX/VACUUM or something similar. That'll require cooperating
> > backends, true, but it's still a pretty sane case.
> >
> > You haven't really made your case why you want to allow access to AEL
> > locked relations in a parallel context in the first place.
> 
> Well, what the heck was I doing here?
> 
> http://www.postgresql.org/message-id/CA+Tgmoa_+u-qCcFZ9zAVZFk+TGXbqQB1+0ktPZRGgPJ5PvrJVw@mail.gmail.com

I don't see anything about actual use cases there.

> I've *repeatedly* point out that if you *don't* allow this, stuff
> breaks.

You've pointed out some strange corner cases that might break yes.

> And you haven't answered how you'd like to fix that.  The
> only answer you've given so far, that I can see, is that maybe we can
> foresee all the locks that the child might take and not initiate
> parallelism if any of them conflict with locks we already hold. But
> that's not a reasonable approach; we have no reasonable way of
> predicting what locks the child will need, and even if we did, things
> can change between plan time and execution time.

We don't need to predict all future locks. We only need to check which
self-exclusive locks we are already holding.

I don't buy the plan time/execution time argument. We'll need to be able
to adapt plans to the availability of bgworker slots *anyway*. Otherwise
we either need to to set the number of bgworkers to "MaxConnections *
MaxParallelism" or live with errors as soon as too many processes use
parallelism. The former is clearly unrealistic given PG's per-backend
overhead and the latter will be a *much* bigger PITA than all the
deadlock scenarios talked about here. It'll constantly fail, even if
everything is ok.

> >> If the
> >> deadlock detector would kill the processes anyway, it doesn't matter,
> >> because CheckTableNotInUse() should do it first, so that we get a
> >> better error and without waiting for deadlock_timeout.  So any
> >> scenario that's predicated on the assumption that CheckTableNotInUse()
> >> will succeed in a parallel context is 100% unconvincing to me as an
> >> argument for anything.
> >
> > Meh, there's plenty cases where CheckTableNotInUse() isn't used where we
> > still rely on locks actually working. And it's not just postgres code,
> > this *will* break user code.
> 
> It would help if you'd provide some specific examples of the problems
> you foresee.  In the absence of specificity, it's hard to tell come to
> any conclusions about whether your concerns are well-founded, or
> propose any way to overcome them.

How about a frontend process having created a relation that then starts
a parallel query. Then the frontend process ERRORs out and, in the
course of that, does smgrDoPendingDeletes(). Which will delete the new
relation. Boom. The background process might just be accessing it. If
you think thats harmless, think e.g. what'd happen with heap cleanup
records generated in the background process. They'd not replay.

Another one is that the frontend process does, from some function or so,
CREATE POLICY. Triggering a relcache flush. And RelationClearRelation()
essentially assumes that a referenced relcache entry can't change (which
is the reason the keep_* stuff is in RelationClearRelation()).

I'm pretty sure there's at least a dozen other such hazards around.

> > Your argument essentially is that we don't actually need to lock objects
> > if the other side only reads. Yes, you add a condition or two ontop
> > (e.g. CheckTableNotInUse() fails), but that's not that much.
> 
> I don't think that's my argument at all.  What I'm arguing is that
> there's more than one reason for taking locks.  Broadly, I think we
> take some locks to preserve transaction isolation guarantees and
> others to preserve data integrity.  If you're adding or dropping a
> column, nobody else had better be looking at that relation, because
> they might get confused and crash the server or something.  Backends,
> even cooperating parallel backends, won't be able to cope with the
> tuple descriptor changing under them.  But once that operation is
> complete, there's no reason why the *next* statement in that
> transaction can't see the relation you just modified, even though it's
> still now exclusively locked.  As long as we make sure that new
> parallel backends use our snapshot, they'll see the right metadata for
> that relation and can safely access it.  We still can't let *other*
> people see that relation because the ALTER TABLE is an uncommitted
> change, but that's no argument against letting our own parallel
> workers look at it.

I think the transaction abort example from above refutes that theory.

> > Btw, what are your thoughts about SERIALIZABLE and parallel query?
> > Afaics that'd be pretty hard to support?
> 
> Hmm, I haven't thought about that.  Since we never *wait* for SIREAD
> locks, and there's no deadlock detection to worry about, we might be
> able to finagle things so that the parallel backends just take the
> same SIREAD locks the parent would have taken, using the parent's
> PGPROC.  But that could easily turn out to be unworkable,
> insufficient, or otherwise brain-damaged.

I think there's some local state as well, so that's probably not so
easy. I guess this needs input from Kevin.

Greetings,

Andres Freund

-- Andres Freund                       http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training &
Services



pgsql-hackers by date:

Previous
From: Peter Geoghegan
Date:
Subject: Re: INSERT ... ON CONFLICT {UPDATE | IGNORE}
Next
From: David G Johnston
Date:
Subject: Re: Doing better at HINTing an appropriate column within errorMissingColumn()