Re: Re: [COMMITTERS] pgsql: Introduce group locking to prevent parallel processes from deadl - Mailing list pgsql-hackers

From Robert Haas
Subject Re: Re: [COMMITTERS] pgsql: Introduce group locking to prevent parallel processes from deadl
Msg-id CA+TgmoZrL5eHLX6N8nxx4RfrobVSD=PAiut4HxqB7mFbpLVDfw@mail.gmail.com
In response to Re: Re: [COMMITTERS] pgsql: Introduce group locking to prevent parallel processes from deadl  (Craig Ringer <craig@2ndquadrant.com>)
List pgsql-hackers
On Sat, Feb 13, 2016 at 9:20 PM, Craig Ringer <craig@2ndquadrant.com> wrote:
> The case that comes to mind for me is in logical decoding, for decoding
> prepared xacts. Being able to make the prepared xact a member of a "lock
> group" along with the decoding session's xact may provide a solution to the
> locking-related challenges there.
>
> I haven't looked closely at what's involved in the decoding prepared xact
> locking issues yet, just an idea.
>
> To do this it'd have to be possible to add an existing session/xact to a
> lock group (or make it the leader of a new lock group then join that group).
> Do you think that's practical with your design?

I doubt it.  It's only safe to join a locking group if you don't yet
hold any heavyweight locks.  I'm not going to say it'd be impossible
to lift that restriction, but it'd be pretty complex, because doing so
could either create deadlocks that didn't exist before or remove ones
that did.  For example, suppose A, wanting AccessExclusiveLock, waits
for B, wanting AccessExclusiveLock, which in turn waits for C, holding
AccessShareLock.  Then, C joins A's lock group.  If A's lock request
can't be granted immediately - say D also holds AccessShareLock on the
object - this is a deadlock.  Moreover, C can't detect the deadlock in the normal
course of things because C is not waiting.  Sorting this out does not
sound simple.
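
To make the A/B/C/D example concrete, here is a toy sketch in Python.
It is not how the deadlock detector actually works; it only assumes,
for illustration, that each lock group counts as a single node in the
wait-for graph.  The function names and the leader_of mapping are
made up for this example.

```python
from collections import defaultdict


def collapse(edges, leader_of):
    """Rewrite waiter -> blocker edges so each lock group is one node."""
    graph = defaultdict(set)
    for waiter, blocker in edges:
        w = leader_of.get(waiter, waiter)
        b = leader_of.get(blocker, blocker)
        if w != b:  # members of one group do not block each other
            graph[w].add(b)
    return graph


def has_cycle(graph):
    """Depth-first search for a cycle in the wait-for graph."""
    done, on_path = set(), set()

    def visit(node):
        on_path.add(node)
        for nxt in graph.get(node, ()):
            if nxt in on_path:
                return True
            if nxt not in done and visit(nxt):
                return True
        on_path.discard(node)
        done.add(node)
        return False

    return any(node not in done and visit(node) for node in list(graph))


# A waits for B and for D; B waits for C.  C and D are not waiting.
edges = [("A", "B"), ("B", "C"), ("A", "D")]

before = has_cycle(collapse(edges, {}))          # nobody shares a group
after = has_cycle(collapse(edges, {"C": "A"}))   # C has joined A's group
print(before, after)  # prints: False True
```

Before C joins, the graph is a simple chain with no cycle.  Afterwards
B's wait on C becomes a wait on A's group, which is itself waiting on
B, and C never notices because it has no outgoing edge of its own.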

It could possibly work if the decoding transaction holds no locks at
all, joins the prepared xact's locking group, does stuff, and then,
when it again reaches a point where it holds no locks, leaves the lock
group.  I wonder, though, what happens if you deadlock.  The decoding
transaction gets killed, but you can't kill the prepared transaction,
so any locks it held would be retained.  Maybe that's OK, but I have a
sneaking suspicion there might be situations where we kill the
decoding transaction without resolving the deadlock.  Sharing locks
with a prepared transaction is not really what this was designed for.
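
The join-while-lock-free, leave-while-lock-free discipline described
above could be sketched roughly like this.  This is a toy model in
Python, not PostgreSQL code: the Session class, its attributes and
its method names are all hypothetical, and the rule for leaving is an
assumption that simply mirrors the rule for joining.

```python
class Session:
    """Toy backend; every name here is hypothetical."""

    def __init__(self, name):
        self.name = name
        self.held_locks = set()    # heavyweight locks currently held
        self.group_leader = None   # None means "not in any lock group"

    def join_lock_group(self, leader):
        # The restriction from the text: join only while holding no locks.
        if self.held_locks:
            raise RuntimeError(f"{self.name} holds locks; cannot join")
        self.group_leader = leader

    def leave_lock_group(self):
        # Assumed symmetric rule: leave only once all locks are released.
        if self.held_locks:
            raise RuntimeError(f"{self.name} still holds locks; cannot leave")
        self.group_leader = None


prepared = Session("prepared_xact")
decoder = Session("decoder")

decoder.join_lock_group(prepared)          # fine: no locks held yet
decoder.held_locks.add(("rel", "AccessShareLock"))
# ... decoding work happens here ...
decoder.held_locks.clear()                 # back to holding nothing
decoder.leave_lock_group()                 # fine again
```

Note that nothing in this sketch says what should happen to the
prepared transaction's locks if the decoder is killed partway through,
which is exactly the part that worries me.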

I don't really understand what problem you are trying to solve here,
but I suspect there is a better solution than group locking.  The
thing is, in the normal course of events, heavyweight locking prevents
a lot of bad stuff from happening.  When you become a member of a lock
group, you're on your own recognizance to prevent that bad stuff.  The
parallel code does that (or hopefully does that, anyway) by imposing
severe restrictions on what you can do while in parallel mode; those
restrictions include "no writes whatsoever" and "no DDL".  If you
wanted to allow either of those things, you would need to think very,
very carefully about that, and then if you decided that it was going
to be safe, you'd need to think carefully about it a second time.
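
As a rough picture of what "on your own recognizance" means, consider
a conflict check that ignores locks held by members of the requester's
own group.  The sketch below is a simplified Python model, not the
lock manager's code: it covers only two lock modes, and the blockers()
function and the tuple layout are invented for the example.

```python
ACCESS_SHARE = "AccessShareLock"
ACCESS_EXCLUSIVE = "AccessExclusiveLock"

# Only two modes are modelled.  AccessExclusiveLock conflicts with both;
# AccessShareLock conflicts only with AccessExclusiveLock.
CONFLICTS = {
    ACCESS_SHARE: {ACCESS_EXCLUSIVE},
    ACCESS_EXCLUSIVE: {ACCESS_SHARE, ACCESS_EXCLUSIVE},
}


def blockers(requester_group, mode, holders):
    """Return the names of holders whose locks conflict with the request.

    holders is a list of (name, group, held_mode); group may be None.
    Locks held by members of the requester's own group are skipped.
    """
    return [
        name
        for name, group, held in holders
        if held in CONFLICTS[mode]
        and not (group is not None and group == requester_group)
    ]


holders = [
    ("worker1", "g1", ACCESS_EXCLUSIVE),
    ("other", None, ACCESS_SHARE),
]

same_group = blockers("g1", ACCESS_EXCLUSIVE, holders)
outsider = blockers(None, ACCESS_SHARE, holders)
print(same_group, outsider)  # prints: ['other'] ['worker1']
```

A requester in group g1 is not blocked by worker1's exclusive lock at
all, so the lock manager offers no protection between the two.  Any
safety has to come from restrictions on what group members may do.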

As I mentioned to Simon on another thread a while back, Thomas Munro
is working on a hash table that uses dynamic shared memory, and as
part of that work, he is taking the allocator work that I did a year
or two ago and turning that into a full-fledged allocator for dynamic
shared memory.  Once we have a real allocator and a real hash table
for DSM, I believe we'll be able to solve some of the problems that
currently require that parallel query - and probably anything that
uses group locking - be strictly read-only.  For example, we can use
that infrastructure to store a combo CID hash table that can grow
arbitrarily in a data structure that all cooperating processes can
share.  Unfortunately, it does not look like that work will be ready
in time to be included in PostgreSQL 9.6.  I think he will be in a
position to submit it in time for PostgreSQL 9.7, though.

>> I don't have any plans to implement anything like that but I
>> felt it was better to keep the concept of a lock group - which is a
>> group of processes that cooperate so closely that their locks need not
>> conflict - separate from the concept of a parallel context - which is
>> a leader process that is most likely connected to a user plus a bunch of
>> ephemeral background workers that aren't.  That way, if somebody later
>> wants to try to reuse the lock grouping stuff for something else,
>> nothing will get in the way of that; if not, no harm done, but keeping
>> the two things decoupled is at least easier to understand, IMHO.
>
> Yeah, strong +1

Thanks.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


