Re: group locking: incomplete patch, just for discussion - Mailing list pgsql-hackers

From Robert Haas
Subject Re: group locking: incomplete patch, just for discussion
Date
Msg-id CA+TgmoYhSq+xw_XeyEBZa_3zHxOUt3G6yg9PsPFMaKMLxZ8Yqg@mail.gmail.com
In response to Re: group locking: incomplete patch, just for discussion  (Jeff Davis <pgsql@j-davis.com>)
Responses Re: group locking: incomplete patch, just for discussion  (Jeff Davis <pgsql@j-davis.com>)
List pgsql-hackers
On Wed, Nov 12, 2014 at 2:57 AM, Jeff Davis <pgsql@j-davis.com> wrote:
> Trying to catch up on this thread, please excuse me if these questions
> are already covered.

Welcome to the party.  The more, the merrier.  :-)

> You mention the possibility of undetected deadlocks, which is surely
> unacceptable. But why not improve the deadlock detector? How bad are
> _detected_ deadlocks? A lot of the concern was around catalog accesses,
> but those lock requests would rarely wait anyway.

Detected deadlocks are fine.  Improving the deadlock detector is the
heart of what needs to be done here.  As you say, the lock requests
we're talking about will rarely wait, so deadlocks won't be frequent.
The issue is making sure that, if they do happen, we get a better
behavior than "your parallel query hangs forever; good luck figuring
out why".
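
To make that concrete, here's a rough sketch - not the actual patch
code - of the conflict check a lock-group design implies.  The helper
and the lockGroupLeader field are illustrative of what such a design
needs, not what's in the tree today:

#include "postgres.h"

#include "storage/lock.h"
#include "storage/proc.h"

/*
 * Illustrative only: members of the same lock group never block one
 * another; everybody else goes through the normal conflict table.
 */
static bool
group_aware_conflict(LockMethod lockMethodTable,
                     LOCKMODE requested, LOCKMODE held,
                     PGPROC *requester, PGPROC *holder)
{
    /* cooperating processes in one lock group don't conflict */
    if (requester->lockGroupLeader != NULL &&
        requester->lockGroupLeader == holder->lockGroupLeader)
        return false;

    /* otherwise, consult the ordinary conflict table */
    return (lockMethodTable->conflictTab[requested] &
            LOCKBIT_ON(held)) != 0;
}

With a rule like that in place, the remaining work is teaching the
deadlock detector to treat the whole group as a single waiting entity,
so that a cycle through any member gets detected and reported instead
of hanging.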

> I also wonder if group locking is generally the wrong approach to
> parallelism. Parallel scan/sort should work by assigning workers to
> chunks of data, and then merging the results. In principle the workers
> don't need to touch the same data at all, so why are we trying so hard
> to get them to all take the same locks?
>
> The reason, I assume, is that a full table is the lockable object, but
> we are imagining the segment files as the chunks. But maybe there's a
> way to address this more directly with an extra field in the lock tag,
> and perhaps some catalog knowledge?

In the common case, we're only taking AccessShareLock on some relation
to prevent it from being concurrently dropped or rewritten.  Locking
only part of the relation wouldn't improve concurrency, because the
full-relation lock doesn't conflict with anything that the
partial-relation lock wouldn't also need to conflict with.
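
To spell that out (sketch only; the function is made up), every
cooperating process just does the usual thing against the same
relation:

#include "postgres.h"

#include "access/heapam.h"
#include "storage/lock.h"

/*
 * Sketch: leader and workers alike take the same weak, whole-relation
 * lock before scanning whatever chunk they've been assigned.
 */
void
scan_my_chunk(Oid relid)
{
    Relation    rel;

    rel = heap_open(relid, AccessShareLock);

    /* ... scan this worker's assigned block range ... */

    heap_close(rel, AccessShareLock);
}

Since AccessShareLock conflicts only with AccessExclusiveLock - DROP
TABLE, a rewriting ALTER TABLE, and so on - and those operations
affect the whole relation anyway, a finer-grained lock tag would
exclude exactly the same operations.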

More generally, I think there's some misunderstanding about the
overall goal of the parallelism infrastructure that I'm trying to
create.  Many people have proposed interesting strategies for how we
could do various things (locking, planning, execution) better.  Some
of those ideas are, doubtless, good ones.  But my goal is in some ways
the opposite: I'm trying to make it possible to run as much existing
PostgreSQL backend code as possible inside a parallel worker without
any modification.  To do that, I'm trying to modify the subsystems
that are widely used throughout the backend - such as locking - in
such a way that the people using that infrastructure need not put
conditional logic into their code to make it do one thing when in
parallel mode and another thing when not in parallel mode.  And I'm
also trying to avoid fundamentally changing the way major subsystems
work today as a precondition of parallelism.

So I'm taking a very surgical approach to the patches I propose.  I'm
not proposing patches that do anything fundamentally new or different;
instead, I'm proposing patches that let the stuff we already do
continue to work.  In a single backend executing a query, we have lots
of code that assumes you can allocate memory, look data up in a
syscache, throw an error, lock a buffer, examine the value of a GUC,
test a tuple against the active snapshot, and so on.
For parallel mode to be useful, those same things need to work in a
parallel worker.  Of course, not every crazy thing somebody may want
to do in a parallel worker needs to work, nor will it.  But anything that's
done in thousands of places throughout the code had better work, or
parallelism will be so restricted as to be useless.  So commit
2bd9e412f92bc6a68f3e8bcb18e04955cc35001d, for example, is about
letting "ereport" work in a parallel worker, and this patch is about
making things like "SearchSysCache1" and "PG_GETARG_TEXT_PP" work
*reliably* in a parallel worker.
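
In other words, a parallel worker has to be able to run perfectly
ordinary backend code like the following - a made-up function, but the
pattern appears all over the tree:

#include "postgres.h"

#include "access/htup_details.h"
#include "catalog/pg_class.h"
#include "utils/syscache.h"

/*
 * Representative example: syscache lookup, ereport, and palloc (via
 * pstrdup) all have to behave in a worker exactly as in the leader.
 */
static char *
get_relation_name_copy(Oid relid)
{
    HeapTuple   tup;
    Form_pg_class relform;
    char       *result;

    tup = SearchSysCache1(RELOID, ObjectIdGetDatum(relid));
    if (!HeapTupleIsValid(tup))
        ereport(ERROR,
                (errcode(ERRCODE_UNDEFINED_TABLE),
                 errmsg("relation with OID %u does not exist", relid)));

    relform = (Form_pg_class) GETSTRUCT(tup);
    result = pstrdup(NameStr(relform->relname));
    ReleaseSysCache(tup);

    return result;
}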

Now, the time to do new and different things is coming, and is maybe
even now not so very far away.  I'm not an expert on parallel
execution, and I'm sure there are a number of people far more skilled
than I at figuring out how to do a parallel scan, join, sort, or
whatever quickly.  What I'm trying to do is get the infrastructure to
a point where those people don't have to worry about whether the basic
facilities that we rely on throughout the backend are gonna work.

By way of comparison, think about the periodic discussions about
making the backend multi-threaded.  Since the backend relies on global
variables in an enormous number of places, it isn't thread-safe.  If
you spawn a thread inside a PostgreSQL backend, it can't safely
ereport(), palloc(), LWLockAcquire(), or just about anything else,
which means that, although anybody can write code that calls
pthread_create(), it's not useful to do so because there's practically
no existing backend code that you could safely call from the new
thread.  Using background worker processes eliminates a lot of those
problems - palloc and lwlocks just work, for example - but not all of
them.
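
For comparison, here's roughly what launching a worker looks like with
the 9.4 dynamic background worker API ("my_extension" and
"my_worker_main" are placeholders).  Registration is the easy part;
the point of all this infrastructure work is what the worker can
safely do once it starts:

#include "postgres.h"

#include "miscadmin.h"
#include "postmaster/bgworker.h"

/*
 * Sketch of registering one dynamic background worker.  Unlike a
 * spawned thread, the resulting process can use palloc, lwlocks,
 * ereport, and so on.
 */
static void
launch_one_worker(void)
{
    BackgroundWorker worker;
    BackgroundWorkerHandle *handle;

    memset(&worker, 0, sizeof(worker));
    snprintf(worker.bgw_name, BGW_MAXLEN, "demo worker");
    worker.bgw_flags = BGWORKER_SHMEM_ACCESS;
    worker.bgw_start_time = BgWorkerStart_ConsistentState;
    worker.bgw_restart_time = BGW_NEVER_RESTART;
    snprintf(worker.bgw_library_name, BGW_MAXLEN, "my_extension");
    snprintf(worker.bgw_function_name, BGW_MAXLEN, "my_worker_main");
    worker.bgw_main_arg = (Datum) 0;
    worker.bgw_notify_pid = MyProcPid;

    if (!RegisterDynamicBackgroundWorker(&worker, &handle))
        ereport(ERROR,
                (errmsg("could not register background worker")));
}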

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


