Re: group locking: incomplete patch, just for discussion - Mailing list pgsql-hackers

From: Robert Haas
Subject: Re: group locking: incomplete patch, just for discussion
Date:
Msg-id: CA+TgmoafNsHUDBOKchYQNN3yrniPMmBbNy84Hs+dmsy0dgy26w@mail.gmail.com
In response to: Re: group locking: incomplete patch, just for discussion (Andres Freund <andres@2ndquadrant.com>)
Responses: Re: group locking: incomplete patch, just for discussion (Andres Freund <andres@2ndquadrant.com>)
List: pgsql-hackers
On Thu, Nov 20, 2014 at 4:45 PM, Andres Freund <andres@2ndquadrant.com> wrote:
>> And you haven't answered how you'd like to fix that.  The
>> only answer you've given so far, that I can see, is that maybe we can
>> foresee all the locks that the child might take and not initiate
>> parallelism if any of them conflict with locks we already hold. But
>> that's not a reasonable approach; we have no reasonable way of
>> predicting what locks the child will need, and even if we did, things
>> can change between plan time and execution time.
>
> We don't need to predict all future locks. We only need to check which
> self-exclusive locks we are already holding.

I can't follow that logic.  Why do you think self-exclusive locks are
the only ones that matter?
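
If I'm reading you right, though, the check you're proposing is roughly
this (a sketch only; GetHeldLocks() and LockModeConflictsWithSelf() are
made-up names, nothing that exists today):

    /* Hypothetical pre-parallelism check, per the proposal above. */
    static bool
    parallelism_is_safe(void)
    {
        List       *held = GetHeldLocks();      /* made up */
        ListCell   *lc;

        foreach(lc, held)
        {
            HeldLock   *hl = (HeldLock *) lfirst(lc);

            /* e.g. AccessExclusiveLock conflicts with itself */
            if (LockModeConflictsWithSelf(hl->lockmode))    /* made up */
                return false;   /* don't initiate parallelism */
        }
        return true;
    }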

> I don't buy the plan time/execution time argument. We'll need to be able
> to adapt plans to the availability of bgworker slots *anyway*. Otherwise
> we either need to set the number of bgworkers to "MaxConnections *
> MaxParallelism" or live with errors as soon as too many processes use
> parallelism. The former is clearly unrealistic given PG's per-backend
> overhead and the latter will be a *much* bigger PITA than all the
> deadlock scenarios talked about here. It'll constantly fail, even if
> everything is ok.

I certainly think that any parallel node needs to be able to cope with
getting fewer workers than it would ideally like to have.  But that's
different from saying that when our own backend takes new locks, we
have to invalidate plans.  Those seem like largely unrelated problems.
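
To illustrate what "cope with fewer workers" means in practice, worker
startup could look roughly like this (a sketch only; setup_one_worker()
is a made-up placeholder around RegisterDynamicBackgroundWorker()):

    int         nlaunched = 0;
    int         i;

    /* Ask for nworkers_wanted, but settle for whatever slots are free. */
    for (i = 0; i < nworkers_wanted; i++)
    {
        if (!setup_one_worker(i))   /* registration failed: no free slot */
            break;                  /* stop asking for more */
        nlaunched++;
    }

    /*
     * The node must produce correct results for any nlaunched >= 0,
     * including zero, where the leader does all the work itself.
     */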

>> >> If the
>> >> deadlock detector would kill the processes anyway, it doesn't matter,
>> >> because CheckTableNotInUse() should do it first, so that we get a
>> >> better error and without waiting for deadlock_timeout.  So any
>> >> scenario that's predicated on the assumption that CheckTableNotInUse()
>> >> will succeed in a parallel context is 100% unconvincing to me as an
>> >> argument for anything.
>> >
>> > Meh, there's plenty of cases where CheckTableNotInUse() isn't used where we
>> > still rely on locks actually working. And it's not just postgres code,
>> > this *will* break user code.
>>
>> It would help if you'd provide some specific examples of the problems
>> you foresee.  In the absence of specificity, it's hard to come to
>> any conclusions about whether your concerns are well-founded, or
>> propose any way to overcome them.
>
> How about a frontend process having created a relation that then starts
> a parallel query. Then the frontend process ERRORs out and, in the
> course of that, does smgrDoPendingDeletes(). Which will delete the new
> relation. Boom. The background process might just be accessing it. If
> you think that's harmless, consider e.g. what'd happen with heap cleanup
> records generated in the background process. They'd not replay.

OK, that's an excellent example.  Thanks.  I'll think about that.
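
Just to restate the sequence, to make sure I've got it: (1) the leader
creates a new relation, queueing a pending-delete entry for its file;
(2) it starts a parallel query, and a background worker begins using
that relation; (3) the leader hits an ERROR, and abort processing calls
smgrDoPendingDeletes(), which unlinks the file; (4) the worker is still
running against the now-deleted relation, and any WAL it has generated
(e.g. heap cleanup records) refers to a file that no longer exists, so
it wouldn't replay.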

> Another one is that the frontend process does, from some function or so,
> CREATE POLICY. Triggering a relcache flush. And RelationClearRelation()
> essentially assumes that a referenced relcache entry can't change (which
> is the reason the keep_* stuff is in RelationClearRelation()).

I don't quite follow that one.  Are you saying that the frontend would
do a CREATE POLICY while there are background workers running?  I
seriously doubt that can ever be made safe, but I'd choose to prohibit
it using something other than the heavyweight lock manager.
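
Something along these lines, say (made-up names, just to show the shape
of it; ParallelWorkersActive() doesn't exist today):

    void
    CheckNotInParallelOperation(const char *cmdname)
    {
        if (ParallelWorkersActive())        /* made up */
            ereport(ERROR,
                    (errcode(ERRCODE_INVALID_TRANSACTION_STATE),
                     errmsg("cannot execute %s while parallel workers are active",
                            cmdname)));
    }

i.e. the entry points for commands like CREATE POLICY would just refuse
to run while the backend has parallel workers, rather than relying on
heavyweight lock conflicts to make that safe.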

> I think there's some local state as well, so that's probably not so
> easy. I guess this needs input from Kevin.

Yeah.  I'd be OK with having v1 disable parallelism under serializable
isolation and fixing that in v2, but of course it's even better if we
can avoid that.
The heavyweight locking issue is really killing me, though.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


