Re: assessing parallel-safety - Mailing list pgsql-hackers

From Robert Haas
Subject Re: assessing parallel-safety
Date
Msg-id CA+TgmoZH6Z7_MsFUz1RX5d8yBSCiO8j2PzTqwQ6g6KfObRsjrw@mail.gmail.com
Whole thread Raw
In response to Re: assessing parallel-safety  (Noah Misch <noah@leadboat.com>)
Responses Re: assessing parallel-safety  (Noah Misch <noah@leadboat.com>)
List pgsql-hackers
On Sat, Feb 14, 2015 at 12:09 AM, Noah Misch <noah@leadboat.com> wrote:
>> Tom also argued that (1) trying to assess parallel-safety before
>> preprocess_expressions() was doomed to fail, because
>> preprocess_expressions() can additional function calls via, at least,
>> inlining and default argument insertion and (2)
>> preprocess_expressions() can't be moved earlier than without changing
>> the semantics.  I'm not sure if he's right, but those are sobering
>> conclusions.  Andres pointed out to me via IM that inlining is
>> dismissable here; if inlining introduces a parallel-unsafe construct,
>> the inlined function was mislabeled to begin with, and the user has
>> earned the error message they get.  Default argument insertion might
>> not be dismissable although the practical risks seem low.
>
> All implementation difficulties being equal, I would opt to check for parallel
> safety after inserting default arguments and before inlining.  Checking before
> inlining reveals the mislabeling every time instead of revealing it only when
> inline_function() gives up.  Timing of the parallel safety check relative to
> default argument insertion matters less.  Remember, the risk is merely that a
> user will find cause to remove a parallel-safe marking where he/she expected
> the system to deduce parallel unsafety.  If implementation difficulties lead
> to some other timings, that won't bother me.

Fair.  For a first cut at this, I've implemented the approach you
suggested before: just do an extra pass over the whole query tree,
looking for parallel-unsafe functions.  The attached patch implements
that, via a new proparallel system catalog column.  A few notes:

- For testing purposes, I set this up so that the executor activates
parallel mode when running any query that seems to be parallel-safe.
It seems nearly certain we'd only want to do this when a parallel plan
was actually selected, but this behavior is awfully useful for
testing, and revealed a number of bugs in the parallel-safety
markings.  parallel-mode-v6.patch + this patch passes make
check-world.

- This patch doesn't include syntax to let the user set the
proparallel flag on new objects.  Presumably we could add PARALLEL {
SAFE | RESTRICTED | UNSAFE } to CREATE/ALTER FUNCTION, but I haven't
done that yet.  This is enough to assess whether the approach is
fundamentally workable, and to check what the performance
ramifications may be.

- The patch includes a script, fixpgproc.pl, which I used to insert
and update the parallel markings on the system catalogs.  That is, of
course, not intended for commit, but it's mighty useful for
experimentation.

- parallel-mode-v6.patch adds a new restriction for a function to be
considered parallel-safe: it must not acquire any heavyweight locks
that it does not also release.  This allows scans of system catalogs
and the scan relation, but generally rules out scanning other things.
To compensate, this patch marks more stuff parallel-unsafe than the
list I sent previously.  I think that's OK for now, but this is a
pretty onerous restriction which I think we will want to try to relax
at least to the extent of allowing a function to be parallel-safe if
it acquires AccessShareLock on additional relations.  See the README
in the other patch for why I imposed this restriction.

Thoughts?

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Attachment

pgsql-hackers by date:

Previous
From: Robert Haas
Date:
Subject: Re: parallel mode and parallel contexts
Next
From: David G Johnston
Date:
Subject: Re: Add pg_settings.pending_restart column