From: Robert Haas
Subject: Re: group locking: incomplete patch, just for discussion
Msg-id: CA+Tgmoa7NNngUANO4eknJ3Me_gVdTWxEqnimPCcSHoeg4AiFLQ@mail.gmail.com
In response to: Re: group locking: incomplete patch, just for discussion (Andres Freund <andres@2ndquadrant.com>)
List: pgsql-hackers
On Thu, Nov 20, 2014 at 3:15 PM, Andres Freund <andres@2ndquadrant.com> wrote:
>> Haven't I responded to this a few times already?
> Not in a particularly convincing way at least.

That's not a very helpful response.

>> I see no way, even theoretically, that it can be sane for
>> CheckTableNotInUse() to succeed in a parallel context.  Ever.
>
> It's not exactly inconceivable that you'd want a parallel
> CLUSTER/REINDEX/VACUUM or something similar. That'll require cooperating
> backends, true, but it's still a pretty sane case.
>
> You haven't really made your case why you want to allow access to AEL
> locked relations in a parallel context in the first place.

Well, what the heck was I doing here?

http://www.postgresql.org/message-id/CA+Tgmoa_+u-qCcFZ9zAVZFk+TGXbqQB1+0ktPZRGgPJ5PvrJVw@mail.gmail.com

I've *repeatedly* pointed out that if you *don't* allow this, stuff
breaks.  And you haven't answered how you'd like to fix that.  The
only answer you've given so far, that I can see, is that maybe we can
foresee all the locks the child might take and refuse to initiate
parallelism if any of them conflict with locks we already hold.  But
that's not workable: we have no reasonable way of predicting what
locks the child will need, and even if we did, things can change
between plan time and execution time.
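
To make that concrete, here's the sort of sequence I have in mind
(table and column names are made up):

BEGIN;
-- Takes AccessExclusiveLock on foo for the rest of the transaction.
ALTER TABLE foo ADD COLUMN newcol integer;

-- Suppose the planner picks a parallel plan for this query.  Each
-- worker needs AccessShareLock on foo, which conflicts with the
-- AccessExclusiveLock the leader already holds, so without some form
-- of group locking the workers just queue up behind their own leader.
SELECT count(*) FROM foo;
COMMIT;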

>> If the
>> deadlock detector would kill the processes anyway, it doesn't matter,
>> because CheckTableNotInUse() should do it first, so that we get a
>> better error and without waiting for deadlock_timeout.  So any
>> scenario that's predicated on the assumption that CheckTableNotInUse()
>> will succeed in a parallel context is 100% unconvincing to me as an
>> argument for anything.
>
> Meh, there's plenty cases where CheckTableNotInUse() isn't used where we
> still rely on locks actually working. And it's not just postgres code,
> this *will* break user code.

It would help if you'd provide some specific examples of the problems
you foresee.  In the absence of specifics, it's hard to come to any
conclusions about whether your concerns are well-founded, or to
propose any way to overcome them.

> Your argument essentially is that we don't actually need to lock objects
> if the other side only reads. Yes, you add a condition or two ontop
> (e.g. CheckTableNotInUse() fails), but that's not that much.

I don't think that's my argument at all.  What I'm arguing is that
there's more than one reason for taking locks.  Broadly, I think we
take some locks to preserve transaction isolation guarantees and
others to preserve data integrity.  If you're adding or dropping a
column, nobody else had better be looking at that relation, because
they might get confused and crash the server or something.  Backends,
even cooperating parallel backends, won't be able to cope with the
tuple descriptor changing under them.  But once that operation is
complete, there's no reason why the *next* statement in that
transaction can't see the relation you just modified, even though it's
still exclusively locked.  As long as we make sure that new
parallel backends use our snapshot, they'll see the right metadata for
that relation and can safely access it.  We still can't let *other*
people see that relation because the ALTER TABLE is an uncommitted
change, but that's no argument against letting our own parallel
workers look at it.
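
To spell that out with a made-up example:

-- Our transaction (and, under this proposal, our parallel workers,
-- which share our snapshot):
BEGIN;
ALTER TABLE foo ADD COLUMN newcol integer;  -- AccessExclusiveLock on foo
SELECT newcol FROM foo;                     -- fine: we see our own uncommitted DDL

-- Any *other* backend, meanwhile:
SELECT * FROM foo;                          -- blocks until we commit or abort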

> If we want
> to do this we really need to forbid *any* modification to catalog/user
> tables while parallel operations are ongoing. In the primary and worker
> backends.  I think that's going to end up being much more
> problematic/restrictive than not allowing non-cooperative paralellism
> after >= ShareUpdateExclusive. If we ever want to parallelize writing
> operations we'll regret this quite a bit.

I can't really see how this follows from anything I said.

> Breaking the locking system - and that's what you're doing by removing
> the usual guarantees - seems just fundamentally wrong.  Yes, I can't
> name all the dangers that I think are out there.
>
> The 'lets grant all the current locks at start of parallel query'
> approach seems quite a bit safer than always doing that during parallel
> query, don't get me wrong.

"Lets grant all the current locks at start of parallel query" is the
proposal I'm focused on right now.  Based on your arguments and those
of others, I have long since abandoned the original proposal of having
locks within a group never conflict.  It seemed like a good idea at
the time, but it would put way too much burden on the parallel workers
to coordinate their activity with each other, so it's dead.  What I
want to establish at this time is what types of problems, if any, you
or others can foresee with "Lets grant all the current locks at start
of parallel query" - because I sense considerable skepticism, but
cannot put my finger on anything concretely wrong with it.
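
Just so we're picturing the same thing: the sketch below only uses
pg_locks to show what the leader already holds; how the workers come
to hold the same thing is exactly the part under discussion.

-- In the leader, inside a transaction that already holds locks:
SELECT locktype, relation::regclass AS rel, mode, granted
  FROM pg_locks
 WHERE pid = pg_backend_pid();

-- Under the proposal, each new parallel worker would start out with
-- these already-granted locks treated as held by its lock group,
-- rather than queueing behind the leader's own AccessExclusiveLock.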

> Btw, what are your thoughts about SERIALIZABLE and parallel query?
> Afaics that'd be pretty hard to support?

Hmm, I haven't thought about that.  Since we never *wait* for SIREAD
locks, and there's no deadlock detection to worry about, we might be
able to finagle things so that the parallel backends just take the
same SIREAD locks the parent would have taken, using the parent's
PGPROC.  But that could easily turn out to be unworkable,
insufficient, or otherwise brain-damaged.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


