Re: assessing parallel-safety - Mailing list pgsql-hackers

From Noah Misch
Subject Re: assessing parallel-safety
Msg-id 20150214050959.GB3906203@tornado.leadboat.com
In response to Re: assessing parallel-safety  (Robert Haas <robertmhaas@gmail.com>)
Responses Re: assessing parallel-safety  (Robert Haas <robertmhaas@gmail.com>)
List pgsql-hackers
On Fri, Feb 13, 2015 at 05:13:06PM -0500, Robert Haas wrote:
> On Fri, Feb 13, 2015 at 12:10 AM, Noah Misch <noah@leadboat.com> wrote:
> > Given your wish to optimize, I recommend first investigating the earlier
> > thought to issue eval_const_expressions() once per planner() instead of once
> > per subquery_planner().  Compared to the parallelModeRequired/parallelModeOK
> > idea, it would leave us with a more maintainable src/backend/optimizer.  I
> > won't object to either design, though.
> 
> In off-list discussions with Tom Lane, he pressed hard on the question
> of whether we can zero out the number of functions that are
> parallel-unsafe (i.e. can't be run while parallel even in the master)
> vs. parallel-restricted (must be run in the master rather than
> elsewhere).  The latter category can be handled by strictly local
> decision-making, without needing to walk the entire plan tree; e.g.
> parallel seq scan can look like this:
> 
> Parallel Seq Scan on foo
>    Filter: a = pg_backend_pid()
>    Parallel Filter: b = 1
> 
> And, indeed, I was pleasantly surprised when surveying the catalogs by
> how few functions were truly unsafe, vs. merely needing to be
> restricted to the master.  But I can't convince myself that there's
> any sane way of allowing database writes even in the master; creating
> new combo CIDs there seems disastrous, and users will be sad if a
> parallel plan is chosen for some_plpgsql_function_that_does_updates()
> and this then errors out because of parallel mode.

Yep.  The scarcity of parallel-unsafe, built-in functions reflects the
dominant subject matter of built-in functions.  User-defined functions are
more diverse.  It would take quite a big hammer to beat the parallel-unsafe
category into irrelevancy.
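
To make the quoted "strictly local decision-making" concrete, here is a minimal sketch of sorting quals into a master-only Filter and a worker-side Parallel Filter. It is an illustration, not planner code: the Safety enum, FUNC_LABELS, and split_quals() are hypothetical names, and the labels are assumptions apart from pg_backend_pid() sitting in the master-only Filter in the example plan above.

```python
from enum import IntEnum


class Safety(IntEnum):
    SAFE = 0        # may run in any worker
    RESTRICTED = 1  # must run in the master
    UNSAFE = 2      # cannot run in parallel mode at all


# Assumed labels for illustration only, not a dump of any real catalog.
FUNC_LABELS = {
    "int4eq": Safety.SAFE,
    "pg_backend_pid": Safety.RESTRICTED,
    "some_plpgsql_function_that_does_updates": Safety.UNSAFE,
}


def qual_safety(funcs_called):
    """Return the worst label among the functions a qual calls.

    Functions with no known label are treated as unsafe.
    """
    return max(
        (FUNC_LABELS.get(f, Safety.UNSAFE) for f in funcs_called),
        default=Safety.SAFE,
    )


def split_quals(quals):
    """Split quals into (master_filter, parallel_filter).

    quals maps qual text to the list of function names it calls.
    Returns None if any qual is unsafe, meaning no parallel plan at all.
    """
    master, parallel = [], []
    for text, funcs in quals.items():
        level = qual_safety(funcs)
        if level is Safety.UNSAFE:
            return None
        (parallel if level is Safety.SAFE else master).append(text)
    return master, parallel
```

Only the unsafe case forces a decision about the whole query; the restricted case is settled qual by qual, which is why shrinking the unsafe category is attractive.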

> Tom also argued that (1) trying to assess parallel-safety before
> preprocess_expressions() was doomed to fail, because
> preprocess_expressions() can add additional function calls via, at least,
> inlining and default argument insertion and (2)
> preprocess_expressions() can't be moved earlier without changing
> the semantics.  I'm not sure if he's right, but those are sobering
> conclusions.  Andres pointed out to me via IM that inlining is
> dismissable here; if inlining introduces a parallel-unsafe construct,
> the inlined function was mislabeled to begin with, and the user has
> earned the error message they get.  Default argument insertion might
> not be dismissable although the practical risks seem low.

All implementation difficulties being equal, I would opt to check for parallel
safety after inserting default arguments and before inlining.  Checking before
inlining reveals the mislabeling every time instead of revealing it only when
inline_function() gives up.  Timing of the parallel safety check relative to
default argument insertion matters less.  Remember, the risk is merely that a
user will find cause to remove a parallel-safe marking where he/she expected
the system to deduce parallel unsafety.  If implementation difficulties lead
to some other timings, that won't bother me.
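
A toy model of the timing argument, under the assumption that the check simply takes the worst label it can see. CATALOG, label_before_inlining(), and label_after_inlining() are hypothetical names for illustration; "wrapper" stands for a function mislabeled as safe whose body calls an unsafe function.

```python
RANK = {"safe": 0, "restricted": 1, "unsafe": 2}

# Hypothetical catalog: "wrapper" is mislabeled safe, yet its body calls
# the unsafe "writer".
CATALOG = {
    "writer": {"label": "unsafe", "body": None},
    "wrapper": {"label": "safe", "body": ["writer"]},
}


def label_before_inlining(func):
    """Checking before inlining sees only the declared label."""
    return CATALOG[func]["label"]


def label_after_inlining(func, inlining_succeeds):
    """Checking after inlining sees the body only when inlining worked."""
    entry = CATALOG[func]
    if inlining_succeeds and entry["body"]:
        # Worst label among the functions exposed by the inlined body.
        return max((CATALOG[f]["label"] for f in entry["body"]), key=RANK.get)
    return entry["label"]
```

Before inlining, the check always reports "safe" for wrapper, so the mislabeling surfaces as an error on every parallel run. After inlining, the answer is "unsafe" when inlining succeeds and "safe" when it gives up, so the same mislabeling is hidden in one case and exposed in the other.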

Thanks,
nm


