Thread: assessing parallel-safety
Thread: assessing parallel-safety

Amit's parallel sequential scan assumes that we can enter parallel mode when the parallel sequential scan is initialized, exit parallel mode when the scan ends, and all the code that runs in between will be happy with that. Unfortunately, that's not necessarily the case. There are two ways it can fail:

1. Some other part of the query can contain functions that are not safe to run in parallel mode; e.g. a PL/pgsql function that writes data or uses subtransactions.

2. The user can partially execute the query and then, while execution is suspended, go do something not parallel-safe with the results before resuming query execution.

To properly assess whether a query is parallel-safe, we need to inspect the entire query for non-parallel-safe functions. We also need the code that's going to execute the plan to tell us whether or not it might want to do not-parallel-safe things between the time we start running the query and the time we finish running it.

So I tried writing some code to address this; a first cut is attached. Here's what it does:

1. As we parse each query, it sets a flag in the parse state if we see a non-immutable function. For the time being, I'm assuming immutable == parallel-safe, although that's probably not correct in detail. It also sets the flag if it sees a data-modifying operation, meaning an insert, update, delete, or locking clause. The point of this is to avoid making an extra pass over the query just to assess parallel-safety; we want to accumulate that information as we go along.

2. When parsing is complete, the parse-state flag is copied into the Query, similar to what we already do for flags like hasModifyingCTE.

3. When the query is planned, planner() sets a flag in the PlannerGlobal called parallelModeOK if the Query is not marked as parallel-mode unsafe. There's also a new cursor option, CURSOR_OPT_NO_PARALLEL, which forces parallelModeOK to false regardless of what the Query says.
It initializes another flag, parallelModeNeeded, to false as well. The idea here is that before generating a parallel path, the planner should examine parallelModeOK and skip it if that's false. If we end up creating a plan from a parallel path, then the plan-generation function should set parallelModeNeeded.

4. At the conclusion of planning, the parallelModeNeeded flag is copied from the PlannerGlobal to the PlannedStmt.

5. ExecutorStart() calls EnterParallelMode() if parallelModeNeeded is set and we're not already in parallel mode. ExecutorEnd() calls ExitParallelMode() if EnterParallelMode() was called in ExecutorStart().

There are a few problems with this design that I don't immediately know how to solve:

1. I'm concerned that the query-rewrite step could substitute a query that is not parallel-safe for one that is. The upper Query might still be flagged as safe, and that's all that planner() looks at.

2. Interleaving the execution of two parallel queries by firing up two copies of the executor simultaneously can result in leaving parallel mode at the wrong time.

3. Any code using SPI has to think hard about whether to pass OPT_CURSOR_NO_PARALLEL. For example, PL/pgsql doesn't need to pass this flag when caching a plan for a query that will be run to completion each time it's executed. But it DOES need to pass the flag for a FOR loop over an SQL statement, because the code inside the FOR loop might do parallel-unsafe things while the query is suspended.

Thoughts, either on the general approach or on what to do about the problems?

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
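[Editor's note: the ExecutorStart()/ExecutorEnd() hand-off in step 5 can be modeled by the following standalone toy sketch. All structure and field names here are illustrative, not taken from the actual patch; it also makes problem 2 concrete, since interleaving two executor instances lets one exit parallel mode while the other is still running.]

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model of step 5: ExecutorStart() enters parallel mode when the
 * plan needs it, and ExecutorEnd() exits only if this executor instance
 * was the one that entered.  Illustrative names only. */

static bool inParallelMode = false;

static void EnterParallelMode(void) { assert(!inParallelMode); inParallelMode = true; }
static void ExitParallelMode(void)  { assert(inParallelMode);  inParallelMode = false; }

typedef struct QueryDesc
{
    bool parallelModeNeeded;   /* copied from the PlannedStmt at plan time */
    bool enteredParallelMode;  /* did *this* ExecutorStart() enter? */
} QueryDesc;

static void ExecutorStart(QueryDesc *qd)
{
    if (qd->parallelModeNeeded && !inParallelMode)
    {
        EnterParallelMode();
        qd->enteredParallelMode = true;
    }
}

static void ExecutorEnd(QueryDesc *qd)
{
    if (qd->enteredParallelMode)
    {
        ExitParallelMode();
        qd->enteredParallelMode = false;
    }
}
```

With a boolean, finishing query A while query B is still suspended drops the backend out of parallel mode underneath B, which is exactly problem 2 above.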
On Sat, Feb 07, 2015 at 08:18:55PM -0500, Robert Haas wrote:
> There are a few problems with this design that I don't immediately
> know how to solve:
>
> 1. I'm concerned that the query-rewrite step could substitute a query
> that is not parallel-safe for one that is. The upper Query might
> still be flagged as safe, and that's all that planner() looks at.

I would look at determining the query's parallel safety early in the planner instead; simplify_function() might be a cheap place to check. Besides avoiding rewriter trouble, this allows one to alter parallel safety of a function without invalidating Query nodes serialized in the system catalogs.

> 2. Interleaving the execution of two parallel queries by firing up two
> copies of the executor simultaneously can result in leaving parallel
> mode at the wrong time.

Perhaps the parallel mode state should be a reference count, not a boolean. Alternately, as a first cut, just don't attempt parallelism when we're already in parallel mode.

> 3. Any code using SPI has to think hard about whether to pass
> OPT_CURSOR_NO_PARALLEL. For example, PL/pgsql doesn't need to pass
> this flag when caching a plan for a query that will be run to
> completion each time it's executed. But it DOES need to pass the flag
> for a FOR loop over an SQL statement, because the code inside the FOR
> loop might do parallel-unsafe things while the query is suspended.

That makes sense; the code entering SPI knows best which restrictions it can tolerate for the life of a given cursor. (One can imagine finer-grained rules in the future. If the current function is itself marked parallel-safe, it's safe in principle for a FOR-loop SQL statement to use parallelism.) I do recommend inverting the sense of the flag, so unmodified non-core PLs will continue to behave as they do today.

Thanks,
nm
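[Editor's note: the reference-count suggestion can be sketched in a few lines. This is a minimal standalone illustration, not backend code; the real functions would need proper error reporting rather than assert().]

```c
#include <assert.h>
#include <stdbool.h>

/* Parallel mode as a nesting level instead of a boolean, so that two
 * interleaved executor runs can each enter and exit independently.
 * Illustrative sketch only. */

static int parallelModeLevel = 0;

static void EnterParallelMode(void) { parallelModeLevel++; }

static void ExitParallelMode(void)
{
    assert(parallelModeLevel > 0);
    parallelModeLevel--;
}

static bool IsInParallelMode(void) { return parallelModeLevel > 0; }
```

With this, each ExecutorEnd() pairs its ExitParallelMode() with the EnterParallelMode() from the matching ExecutorStart(), and the backend only truly leaves parallel mode when the outermost query finishes.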
On Sun, Feb 8, 2015 at 11:31 AM, Noah Misch <noah@leadboat.com> wrote:
> On Sat, Feb 07, 2015 at 08:18:55PM -0500, Robert Haas wrote:
>> There are a few problems with this design that I don't immediately
>> know how to solve:
>>
>> 1. I'm concerned that the query-rewrite step could substitute a query
>> that is not parallel-safe for one that is. The upper Query might
>> still be flagged as safe, and that's all that planner() looks at.
>
> I would look at determining the query's parallel safety early in the planner
> instead; simplify_function() might be a cheap place to check. Besides
> avoiding rewriter trouble, this allows one to alter parallel safety of a
> function without invalidating Query nodes serialized in the system catalogs.

Thanks, I'll investigate that approach.

>> 2. Interleaving the execution of two parallel queries by firing up two
>> copies of the executor simultaneously can result in leaving parallel
>> mode at the wrong time.
>
> Perhaps the parallel mode state should be a reference count, not a boolean.
> Alternately, as a first cut, just don't attempt parallelism when we're already
> in parallel mode.

I think changing it to a reference count makes sense. I'll work on that.

>> 3. Any code using SPI has to think hard about whether to pass
>> OPT_CURSOR_NO_PARALLEL. For example, PL/pgsql doesn't need to pass
>> this flag when caching a plan for a query that will be run to
>> completion each time it's executed. But it DOES need to pass the flag
>> for a FOR loop over an SQL statement, because the code inside the FOR
>> loop might do parallel-unsafe things while the query is suspended.
>
> That makes sense; the code entering SPI knows best which restrictions it can
> tolerate for the life of a given cursor. (One can imagine finer-grained rules
> in the future. If the current function is itself marked parallel-safe, it's
> safe in principle for a FOR-loop SQL statement to use parallelism.)
> I do
> recommend inverting the sense of the flag, so unmodified non-core PLs will
> continue to behave as they do today.

Yeah, that's probably a good idea. Sort of annoying, but playing with the patch in the OP made it pretty clear that we cannot possibly just assume parallelism is OK by default. In the core code, parallelism is OK in more places than not, but in the PLs it seems to be the reverse.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
On Sun, Feb 8, 2015 at 12:28 PM, Robert Haas <robertmhaas@gmail.com> wrote:
> On Sun, Feb 8, 2015 at 11:31 AM, Noah Misch <noah@leadboat.com> wrote:
>> On Sat, Feb 07, 2015 at 08:18:55PM -0500, Robert Haas wrote:
>>> There are a few problems with this design that I don't immediately
>>> know how to solve:
>>>
>>> 1. I'm concerned that the query-rewrite step could substitute a query
>>> that is not parallel-safe for one that is. The upper Query might
>>> still be flagged as safe, and that's all that planner() looks at.
>>
>> I would look at determining the query's parallel safety early in the planner
>> instead; simplify_function() might be a cheap place to check. Besides
>> avoiding rewriter trouble, this allows one to alter parallel safety of a
>> function without invalidating Query nodes serialized in the system catalogs.
>
> Thanks, I'll investigate that approach.

This does not seem to work out nicely. The problem here is that simplify_function() gets called from eval_const_expressions() which gets called from a variety of places, but the principal one seems to be subquery_planner(). So if you have a query with two subqueries, and the second one contains something parallel-unsafe, you might by that time have already generated a parallel plan for the first one, which won't do. Unless we want to rejigger this so that we do a complete eval_const_expressions() pass over the entire query tree (including all subqueries) FIRST, and then only after that go back and plan all of those subqueries, I don't see how to make this work; and I'm guessing that there are good reasons not to do that.

Ideas?

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
On Wed, Feb 11, 2015 at 9:39 AM, Robert Haas <robertmhaas@gmail.com> wrote:
> On Sun, Feb 8, 2015 at 12:28 PM, Robert Haas <robertmhaas@gmail.com> wrote:
>> On Sun, Feb 8, 2015 at 11:31 AM, Noah Misch <noah@leadboat.com> wrote:
>>> On Sat, Feb 07, 2015 at 08:18:55PM -0500, Robert Haas wrote:
>>>> There are a few problems with this design that I don't immediately
>>>> know how to solve:
>>>>
>>>> 1. I'm concerned that the query-rewrite step could substitute a query
>>>> that is not parallel-safe for one that is. The upper Query might
>>>> still be flagged as safe, and that's all that planner() looks at.
>>>
>>> I would look at determining the query's parallel safety early in the planner
>>> instead; simplify_function() might be a cheap place to check. Besides
>>> avoiding rewriter trouble, this allows one to alter parallel safety of a
>>> function without invalidating Query nodes serialized in the system catalogs.
>>
>> Thanks, I'll investigate that approach.
>
> This does not seem to work out nicely. The problem here is that
> simplify_function() gets called from eval_const_expressions() which
> gets called from a variety of places, but the principal one seems to
> be subquery_planner(). So if you have a query with two subqueries,
> and the second one contains something parallel-unsafe, you might by
> that time have already generated a parallel plan for the first one,
> which won't do. Unless we want to rejigger this so that we do a
> complete eval_const_expressions() pass over the entire query tree
> (including all subqueries) FIRST, and then only after that go back and
> plan all of those subqueries, I don't see how to make this work; and
> I'm guessing that there are good reasons not to do that.

And ...
while I was just talking with Jan about this, I realized something that should have been blindingly obvious to me from the beginning, which is that anything we do at parse time is doomed to failure, because the properties of a function that we use to determine parallel-safety can be modified later:

rhaas=# alter function random() immutable;  -- no it isn't
ALTER FUNCTION

I think we may want a dedicated parallel-safe property for functions rather than piggybacking on provolatile, but that will probably also be changeable via ALTER FUNCTION, and stored rules won't get miraculously updated. So this definitely can't be something we figure out at parse-time ... it's got to be determined later. But at the moment I see no way to do that without an extra pass over the whole rewritten query tree. :-(

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
On Wed, Feb 11, 2015 at 03:21:12PM -0500, Robert Haas wrote:
> On Wed, Feb 11, 2015 at 9:39 AM, Robert Haas <robertmhaas@gmail.com> wrote:
> > This does not seem to work out nicely. The problem here is that
> > simplify_function() gets called from eval_const_expressions() which
> > gets called from a variety of places, but the principal one seems to
> > be subquery_planner(). So if you have a query with two subqueries,
> > and the second one contains something parallel-unsafe, you might by
> > that time have already generated a parallel plan for the first one,
> > which won't do.

That is a major mark against putting the check in simplify_function(), agreed.

> > Unless we want to rejigger this so that we do a
> > complete eval_const_expressions() pass over the entire query tree
> > (including all subqueries) FIRST, and then only after that go back and
> > plan all of those subqueries, I don't see how to make this work; and
> > I'm guessing that there are good reasons not to do that.

I expect that would work fine, but I do think it premature to venture that far out of your way to optimize this new tree examination. The cost may just not matter. Other parts of the planner use code like contain_volatile_functions() and contain_nonstrict_functions(), which have the same kind of inefficiency you're looking to avoid here.

> I think we may want a dedicated parallel-safe property for functions
> rather than piggybacking on provolatile, but that will probably also
> be changeable via ALTER FUNCTION, and stored rules won't get
> miraculously updated. So this definitely can't be something we figure
> out at parse-time ... it's got to be determined later.

Pretty much.

> But at the
> moment I see no way to do that without an extra pass over the whole
> rewritten query tree. :-(

+1 for doing that.
On Thu, Feb 12, 2015 at 1:51 AM, Robert Haas <robertmhaas@gmail.com> wrote:
>
>
> I think we may want a dedicated parallel-safe property for functions
> rather than piggybacking on provolatile, but that will probably also
> be changeable via ALTER FUNCTION, and stored rules won't get
> miraculously updated. So this definitely can't be something we figure
> out at parse-time ... it's got to be determined later. But at the
> moment I see no way to do that without an extra pass over the whole
> rewritten query tree. :-(
>
>
If we have to go this way, then isn't it better to evaluate the same
when we are trying to create parallel path (something like in the
parallel_seq scan patch - create_parallelscan_paths())?
On Thu, Feb 12, 2015 at 12:16 AM, Noah Misch <noah@leadboat.com> wrote:
> That is a major mark against putting the check in simplify_function(), agreed.

I do see one way to rescue that idea, which is this: put two flags, parallelModeOK and parallelModeRequired, into PlannerGlobal. At the beginning of planning, set parallelModeOK based on our best knowledge at that time; as we preprocess expressions, update it to false if we see something that's not parallel-safe. Emit paths for parallel plans only if the flag is currently true. At the end of planning, when we convert paths to plans, set parallelModeRequired if any parallel plan elements are generated. If we end up with parallelModeOK = true or with parallelModeRequired = false, then life is good. In the unfortunate event that we end up with parallelModeOK = false and parallelModeRequired = true, replan, this time with parallelModeOK = false from the beginning.

If there are no sub-queries involved, this will work out fine - parallelModeOK will always be definitively set to the right value before we rely on it for anything. This includes cases where subqueries are eliminated by pulling them up. If there *are* subqueries, we'll still be OK if we happen to hit the parallel-unsafe construct before we hit the part - if any - that can benefit from parallelism. So this would mostly be pretty cheap, but if you do hit the case where a re-plan is required it would be pretty painful.

>> > Unless we want to rejigger this so that we do a
>> > complete eval_const_expressions() pass over the entire query tree
>> > (including all subqueries) FIRST, and then only after that go back and
>> > plan all of those subqueries, I don't see how to make this work; and
>> > I'm guessing that there are good reasons not to do that.
>
> I expect that would work fine, but I do think it premature to venture that far
> out of your way to optimize this new tree examination. The cost may just not
> matter.
> Other parts of the planner use code like contain_volatile_functions()
> and contain_nonstrict_functions(), which have the same kind of inefficiency
> you're looking to avoid here.

They do, but those are run on much smaller pieces of the query tree, never on the whole thing.

I really wish Tom Lane would weigh in here, as an opinion from him could save me from spending time on a doomed approach. I expect criticism of this type:

http://www.postgresql.org/message-id/22266.1269641438@sss.pgh.pa.us

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
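[Editor's note: the optimistic-plan-then-replan scheme described above can be reduced to a small standalone model. Every name below is hypothetical; do_planning() stands in for the whole planning pass, and the boolean parameters merely simulate whether preprocessing discovers a parallel-unsafe construct.]

```c
#include <stdbool.h>

/* Toy model of the two-flag scheme: plan optimistically with
 * parallelModeOK = true; if expression preprocessing discovers a
 * parallel-unsafe construct after parallel plan elements were already
 * emitted, throw the plan away and replan with parallelism disabled. */

typedef struct PlannedStmt
{
    bool usesParallelism;     /* parallelModeRequired at end of planning */
    bool sawUnsafeFunction;   /* discovered during expression preprocessing */
} PlannedStmt;

/* Hypothetical single planning pass.  A parallel path may be chosen for
 * an early subquery before a later subquery's unsafe construct is seen,
 * which is exactly the awkward case. */
static PlannedStmt do_planning(bool parallelModeOK, bool queryHasUnsafeFunc)
{
    PlannedStmt stmt = { false, false };

    if (queryHasUnsafeFunc)
        stmt.sawUnsafeFunction = true;

    if (parallelModeOK)
        stmt.usesParallelism = true;   /* parallel elements already emitted */

    return stmt;
}

static PlannedStmt planner(bool queryHasUnsafeFunc)
{
    PlannedStmt stmt = do_planning(true, queryHasUnsafeFunc);

    /* Unfortunate case: parallel plan elements exist, but we learned
     * mid-planning that parallelism is not OK.  Replan from scratch. */
    if (stmt.usesParallelism && stmt.sawUnsafeFunction)
        stmt = do_planning(false, queryHasUnsafeFunc);

    return stmt;
}
```

The common case costs nothing extra; only the unlucky combination (unsafe construct found after parallel elements were generated) pays for a second full planning pass.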
On Thu, Feb 12, 2015 at 6:40 AM, Amit Kapila <amit.kapila16@gmail.com> wrote:
> If we have to go this way, then isn't it better to evaluate the same
> when we are trying to create parallel path (something like in the
> parallel_seq scan patch - create_parallelscan_paths())?

Probably not, because many queries will scan multiple relations, and we want to do all of this work just once per query. Also, when somebody adds another parallel node (e.g. parallel hash join), that will need this same information.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
On Wed, Feb 11, 2015 at 3:21 PM, Robert Haas <robertmhaas@gmail.com> wrote:
> I think we may want a dedicated parallel-safe property for functions
> rather than piggybacking on provolatile ...

I went through the current contents of pg_proc and tried to assess how much parallel-unsafe stuff we've got. I think there are actually three categories of things: (1) functions that can be called in parallel mode either in the worker or in the leader ("parallel safe"), (2) functions that can be called in parallel mode, but only in the leader, not in a worker ("parallel restricted"), and (3) functions that cannot be called in parallel mode at all ("parallel unsafe"). On a first read-through, the number of things that looked to be anything other than parallel-safe was fairly small; many of these could be made parallel-safe with more work, but it's unlikely to be worth the effort.

current_query() - Restricted because debug_query_string is not copied.

lo_open(), lo_close(), loread(), lowrite(), and other large object functions - Restricted because large object state is not shared.

age(xid) - Restricted because it uses a transaction-lifespan cache which is not shared.

now() - Restricted because transaction start timestamp is not copied.

statement_timestamp() - Restricted because statement start timestamp is not copied.

pg_conf_load_time() - Restricted because PgReloadTime is not copied.

nextval(), currval() - Restricted because sequence-related state is not shared.

setval() - Unsafe because no data can be written in parallel mode.

random(), setseed() - Restricted because random seed state is not shared. (We could alternatively treat setseed() as unsafe and random() as restricted only in sessions where setseed() has never been called, and otherwise safe.)

pg_stat_get_* - Restricted because there's no guarantee the value would be the same in the parallel worker as in the leader.

pg_backend_pid() - Restricted because the worker has a different PID.
set_config() - Unsafe because GUC state must remain synchronized.

pg_my_temp_schema() - Restricted because temporary namespaces aren't shared with parallel workers.

pg_export_snapshot() - Restricted because the worker will go away quickly.

pg_prepared_statement(), pg_cursor() - Restricted because the prepared statements and cursors are not synchronized with the worker.

pg_listening_channels() - Restricted because listening channels are not synchronized with the worker.

pg*advisory*lock*() - Restricted because advisory lock state is not shared with workers - and even if it were, the semantics would be hard to reason about.

txid_current() - Unsafe because it might attempt XID assignment.

pg_logical_slot*() - Unsafe because they do all kinds of crazy stuff.

That's not a lot, and very little of it is anything you'd care about parallelizing anyway. I expect that the incidence of user-written parallel-unsafe functions will be considerably higher. I'm not sure if this impacts the decision about how to design the facility for assessing parallel-safety or not, but I thought it was worth sharing.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
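[Editor's note: the three categories above can be sketched as a one-character marking in the style of pg_proc.provolatile. The macro names below are a sketch of what such a column might look like, not something that existed at the time of this thread.]

```c
#include <stdbool.h>

/* Sketch of a dedicated parallel-safety marking for functions, in the
 * one-character style the catalogs use for provolatile. */

#define PROPARALLEL_SAFE        's'  /* callable anywhere in a parallel query */
#define PROPARALLEL_RESTRICTED  'r'  /* leader only, e.g. now(), nextval() */
#define PROPARALLEL_UNSAFE      'u'  /* forces a non-parallel plan, e.g. setval() */

/* May a function with this marking run in the given process while the
 * query as a whole uses parallelism? */
static bool
func_parallel_ok(char proparallel, bool in_worker)
{
    switch (proparallel)
    {
        case PROPARALLEL_SAFE:
            return true;
        case PROPARALLEL_RESTRICTED:
            return !in_worker;   /* must stay in the leader */
        default:
            return false;        /* unsafe: no parallel mode at all */
    }
}
```

The key asymmetry is that a restricted function only constrains where one expression runs, while a single unsafe function disqualifies the entire query from parallelism.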
On Thu, Feb 12, 2015 at 10:07 PM, Robert Haas <robertmhaas@gmail.com> wrote:
>
> On Thu, Feb 12, 2015 at 6:40 AM, Amit Kapila <amit.kapila16@gmail.com> wrote:
> > If we have to go this way, then isn't it better to evaluate the same
> > when we are trying to create parallel path (something like in the
> > parallel_seq scan patch - create_parallelscan_paths())?
>
> Probably not, because many queries will scan multiple relations, and
> we want to do all of this work just once per query.
>
By this, do you mean to say that if there is any parallel-unsafe
expression (function call) in query, then we won't parallelize any
part of query, if so why is that mandatory?
Can't we parallelize scan on a particular relation if all the expressions
in which that relation is involved are parallel-safe?
> Also, when
> somebody adds another parallel node (e.g. parallel hash join) that
> will need this same information.
>
I think we should be able to handle this even if we have per-relation
information (something like: don't parallelize a node/path if any lower
node is not parallel-safe).
On Thu, Feb 12, 2015 at 3:52 PM, Amit Kapila <amit.kapila16@gmail.com> wrote:
>> Probably not, because many queries will scan multiple relations, and
>> we want to do all of this work just once per query.
>
> By this, do you mean to say that if there is any parallel-unsafe
> expression (function call) in query, then we won't parallelize any
> part of query, if so why is that mandatory?

Because of stuff like set_config() and txid_current(), which will fail outright in parallel mode. Also because the user may have defined a function that updates some other table in the database, which will also fail outright if called in parallel mode. Instead of failing, we want those kinds of things to fall back to a non-parallel plan.

> Can't we parallelize scan on a particular relation if all the expressions
> in which that relation is involved are parallel-safe

No, because the execution of that node can be interleaved in arbitrary fashion with all the other nodes in the plan tree. Once we've got parallel mode active, all the related prohibitions apply to *everything* we do thereafter, not just that one node.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
On Thu, Feb 12, 2015 at 07:40:12AM -0500, Robert Haas wrote:
> On Thu, Feb 12, 2015 at 12:16 AM, Noah Misch <noah@leadboat.com> wrote:
> > That is a major mark against putting the check in simplify_function(), agreed.
>
> I do see one way to rescue that idea, which is this: put two flags,
> parallelModeOK and parallelModeRequired, into PlannerGlobal. At the
> beginning of planning, set parallelModeOK based on our best knowledge
> at that time; as we preprocess expressions, update it to false if we
> see something that's not parallel-safe. Emit paths for parallel plans
> only if the flag is currently true. At the end of planning, when we
> convert paths to plans, set parallelModeRequired if any parallel plan
> elements are generated. If we end up with parallelModeOK = true or
> with parallelModeRequired = false, then life is good. In the
> unfortunate event that we end up with parallelModeOK = false and
> parallelModeRequired = true, replan, this time with parallelModeOK =
> false from the beginning.
>
> So this would mostly be pretty cheap, but if you do hit the case where
> a re-plan is required it would be pretty painful.
>
> >> > Unless we want to rejigger this so that we do a
> >> > complete eval_const_expressions() pass over the entire query tree
> >> > (including all subqueries) FIRST, and then only after that go back and
> >> > plan all of those subqueries, I don't see how to make this work; and
> >> > I'm guessing that there are good reasons not to do that.
> >
> > I expect that would work fine, but I do think it premature to venture that far
> > out of your way to optimize this new tree examination.

Given your wish to optimize, I recommend first investigating the earlier thought to issue eval_const_expressions() once per planner() instead of once per subquery_planner(). Compared to the parallelModeRequired/parallelModeOK idea, it would leave us with a more maintainable src/backend/optimizer. I won't object to either design, though.
Your survey of parallel safety among built-in functions was on-target.
On Fri, Feb 13, 2015 at 2:40 AM, Robert Haas <robertmhaas@gmail.com> wrote:
>
> On Thu, Feb 12, 2015 at 3:52 PM, Amit Kapila <amit.kapila16@gmail.com> wrote:
> >> Probably not, because many queries will scan multiple relations, and
> >> we want to do all of this work just once per query.
> >
> > By this, do you mean to say that if there is any parallel-unsafe
> > expression (function call) in query, then we won't parallelize any
> > part of query, if so why is that mandatory?
>
> Because of stuff like set_config() and txid_current(), which will fail
> outright in parallel mode. Also because the user may have defined a
> function that updates some other table in the database, which will
> also fail outright if called in parallel mode. Instead of failing, we
> want those kinds of things to fall back to a non-parallel plan.
>
> > Can't we parallelize scan on a particular relation if all the expressions
> > in which that relation is involved are parallel-safe
>
> No, because the execution of that node can be interleaved in arbitrary
> fashion with all the other nodes in the plan tree. Once we've got
> parallel mode active, all the related prohibitions apply to
> *everything* we do thereafter, not just that one node.
>
Okay, got the point.
On Fri, Feb 13, 2015 at 12:10 AM, Noah Misch <noah@leadboat.com> wrote:
> Given your wish to optimize, I recommend first investigating the earlier
> thought to issue eval_const_expressions() once per planner() instead of once
> per subquery_planner(). Compared to the parallelModeRequired/parallelModeOK
> idea, it would leave us with a more maintainable src/backend/optimizer. I
> won't object to either design, though.

In off-list discussions with Tom Lane, he pressed hard on the question of whether we can zero out the number of functions that are parallel-unsafe (i.e. can't be run while parallel even in the master) vs. parallel-restricted (must be run in the master rather than elsewhere). The latter category can be handled by strictly local decision-making, without needing to walk the entire plan tree; e.g. parallel seq scan can look like this:

Parallel Seq Scan on foo
  Filter: a = pg_backend_pid()
  Parallel Filter: b = 1

And, indeed, I was pleasantly surprised when surveying the catalogs by how few functions were truly unsafe, vs. merely needing to be restricted to the master. But I can't convince myself that there's any sane way of allowing database writes even in the master; creating new combo CIDs there seems disastrous, and users will be sad if a parallel plan is chosen for some_plpgsql_function_that_does_updates() and this then errors out because of parallel mode.

Tom also argued that (1) trying to assess parallel-safety before preprocess_expressions() was doomed to fail, because preprocess_expressions() can introduce additional function calls via, at least, inlining and default argument insertion, and (2) preprocess_expressions() can't be moved any earlier without changing the semantics. I'm not sure if he's right, but those are sobering conclusions. Andres pointed out to me via IM that inlining is dismissable here; if inlining introduces a parallel-unsafe construct, the inlined function was mislabeled to begin with, and the user has earned the error message they get.
Default argument insertion might not be dismissable, although the practical risks seem low.

It's pretty awful that this is such a thorny issue, and the patch may be rather cringe-worthy, given the (to me) essentially reasonable nature of wanting to know whether a query has any references to a function with a certain property in it.

> Your survey of parallel safety among built-in functions was on-target.

Thanks for having a look. A healthy amount of the way the parallel mode stuff works came from your brain, so your opinion, while always valuable, is especially appreciated here. I have a lot of confidence in your judgement about this stuff.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
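[Editor's note: the "Parallel Filter" idea above amounts to partitioning a node's quals by where they may run; a standalone sketch of that strictly local decision follows. The structures and names are illustrative toys, not planner code.]

```c
#include <stdbool.h>

/* Split a scan's quals into those any worker may evaluate ("Parallel
 * Filter") and those that must run in the leader ("Filter"), so a
 * parallel-restricted function never forces a fully serial plan. */

typedef enum { QUAL_PARALLEL_SAFE, QUAL_PARALLEL_RESTRICTED } QualSafety;

typedef struct Qual
{
    const char *desc;     /* e.g. "a = pg_backend_pid()" */
    QualSafety  safety;
} Qual;

typedef struct QualLists
{
    const Qual *worker_quals[8];   /* may run in any worker */
    int         n_worker;
    const Qual *leader_quals[8];   /* leader must evaluate these */
    int         n_leader;
} QualLists;

static void
split_quals(const Qual *quals, int nquals, QualLists *out)
{
    out->n_worker = out->n_leader = 0;
    for (int i = 0; i < nquals; i++)
    {
        if (quals[i].safety == QUAL_PARALLEL_SAFE)
            out->worker_quals[out->n_worker++] = &quals[i];
        else
            out->leader_quals[out->n_leader++] = &quals[i];
    }
}
```

For the EXPLAIN snippet above, a = pg_backend_pid() would land in the leader-only list and b = 1 in the worker list; no whole-tree walk is needed for restricted functions, which is exactly what makes the unsafe category the hard one.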
On Fri, Feb 13, 2015 at 05:13:06PM -0500, Robert Haas wrote:
> On Fri, Feb 13, 2015 at 12:10 AM, Noah Misch <noah@leadboat.com> wrote:
> > Given your wish to optimize, I recommend first investigating the earlier
> > thought to issue eval_const_expressions() once per planner() instead of once
> > per subquery_planner(). Compared to the parallelModeRequired/parallelModeOK
> > idea, it would leave us with a more maintainable src/backend/optimizer. I
> > won't object to either design, though.
>
> In off-list discussions with Tom Lane, he pressed hard on the question
> of whether we can zero out the number of functions that are
> parallel-unsafe (i.e. can't be run while parallel even in the master)
> vs. parallel-restricted (must be run in the master rather than
> elsewhere). The latter category can be handled by strictly local
> decision-making, without needing to walk the entire plan tree; e.g.
> parallel seq scan can look like this:
>
> Parallel Seq Scan on foo
>   Filter: a = pg_backend_pid()
>   Parallel Filter: b = 1
>
> And, indeed, I was pleasantly surprised when surveying the catalogs by
> how few functions were truly unsafe, vs. merely needing to be
> restricted to the master. But I can't convince myself that there's
> any sane way of allowing database writes even in the master; creating
> new combo CIDs there seems disastrous, and users will be sad if a
> parallel plan is chosen for some_plpgsql_function_that_does_updates()
> and this then errors out because of parallel mode.

Yep. The scarcity of parallel-unsafe, built-in functions reflects the dominant subject matter of built-in functions. User-defined functions are more diverse. It would take quite a big hammer to beat the parallel-unsafe category into irrelevancy.
> Tom also argued that (1) trying to assess parallel-safety before > preprocess_expressions() was doomed to fail, because > preprocess_expressions() can introduce additional function calls via, at least, > inlining and default argument insertion and (2) > preprocess_expressions() can't be moved earlier without changing > the semantics. I'm not sure if he's right, but those are sobering > conclusions. Andres pointed out to me via IM that inlining is > dismissable here; if inlining introduces a parallel-unsafe construct, > the inlined function was mislabeled to begin with, and the user has > earned the error message they get. Default argument insertion might > not be dismissable, although the practical risks seem low. All implementation difficulties being equal, I would opt to check for parallel safety after inserting default arguments and before inlining. Checking before inlining reveals the mislabeling every time instead of revealing it only when inline_function() gives up. Timing of the parallel safety check relative to default argument insertion matters less. Remember, the risk is merely that a user will find cause to remove a parallel-safe marking where he/she expected the system to deduce parallel unsafety. If implementation difficulties lead to some other timings, that won't bother me. Thanks, nm
On Sat, Feb 14, 2015 at 12:09 AM, Noah Misch <noah@leadboat.com> wrote: >> Tom also argued that (1) trying to assess parallel-safety before >> preprocess_expressions() was doomed to fail, because >> preprocess_expressions() can introduce additional function calls via, at least, >> inlining and default argument insertion and (2) >> preprocess_expressions() can't be moved earlier without changing >> the semantics. I'm not sure if he's right, but those are sobering >> conclusions. Andres pointed out to me via IM that inlining is >> dismissable here; if inlining introduces a parallel-unsafe construct, >> the inlined function was mislabeled to begin with, and the user has >> earned the error message they get. Default argument insertion might >> not be dismissable, although the practical risks seem low. > > All implementation difficulties being equal, I would opt to check for parallel > safety after inserting default arguments and before inlining. Checking before > inlining reveals the mislabeling every time instead of revealing it only when > inline_function() gives up. Timing of the parallel safety check relative to > default argument insertion matters less. Remember, the risk is merely that a > user will find cause to remove a parallel-safe marking where he/she expected > the system to deduce parallel unsafety. If implementation difficulties lead > to some other timings, that won't bother me. Fair. For a first cut at this, I've implemented the approach you suggested before: just do an extra pass over the whole query tree, looking for parallel-unsafe functions. The attached patch implements that, via a new proparallel system catalog column. A few notes: - For testing purposes, I set this up so that the executor activates parallel mode when running any query that seems to be parallel-safe.
It seems nearly certain we'd only want to do this when a parallel plan was actually selected, but this behavior is awfully useful for testing, and revealed a number of bugs in the parallel-safety markings. parallel-mode-v6.patch + this patch passes make check-world. - This patch doesn't include syntax to let the user set the proparallel flag on new objects. Presumably we could add PARALLEL { SAFE | RESTRICTED | UNSAFE } to CREATE/ALTER FUNCTION, but I haven't done that yet. This is enough to assess whether the approach is fundamentally workable, and to check what the performance ramifications may be. - The patch includes a script, fixpgproc.pl, which I used to insert and update the parallel markings on the system catalogs. That is, of course, not intended for commit, but it's mighty useful for experimentation. - parallel-mode-v6.patch adds a new restriction for a function to be considered parallel-safe: it must not acquire any heavyweight locks that it does not also release. This allows scans of system catalogs and the scan relation, but generally rules out scanning other things. To compensate, this patch marks more stuff parallel-unsafe than the list I sent previously. I think that's OK for now, but this is a pretty onerous restriction which I think we will want to try to relax at least to the extent of allowing a function to be parallel-safe if it acquires AccessShareLock on additional relations. See the README in the other patch for why I imposed this restriction. Thoughts? -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
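The extra pass described above — walking the whole query tree and looking for parallel-unsafe functions — boils down to folding each function's proparallel marking into a single worst-case verdict. Here is a self-contained sketch of that logic; the node layout and names are invented for illustration and are not the actual planner code, which walks PostgreSQL's real Node trees:

```c
#include <assert.h>
#include <stddef.h>

/*
 * Toy expression node: just enough structure to model one pass over a
 * query tree that folds per-function proparallel markings into a single
 * verdict.  's' = safe, 'r' = restricted (master only), 'u' = unsafe.
 */
typedef struct ToyNode
{
	char			proparallel;	/* marking of the function at this node */
	struct ToyNode *args[2];		/* child expressions; NULL if absent */
} ToyNode;

static int
hazard_rank(char p)
{
	return (p == 's') ? 0 : (p == 'r') ? 1 : 2;
}

/* Recursively walk the tree; the worst marking found anywhere wins. */
char
toy_max_parallel_hazard(const ToyNode *node)
{
	char		worst = 's';
	int			i;

	if (node == NULL)
		return worst;
	if (hazard_rank(node->proparallel) > hazard_rank(worst))
		worst = node->proparallel;
	for (i = 0; i < 2; i++)
	{
		char		sub = toy_max_parallel_hazard(node->args[i]);

		if (hazard_rank(sub) > hazard_rank(worst))
			worst = sub;
	}
	return worst;
}
```

Because a single 'u' anywhere in the tree disqualifies the query, the pass has to visit every expression rather than stopping at the top level — which is why accumulating the information during parsing, as in the earlier design, was attractive.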
It's important for parallelism to work under the extended query protocol, and that's nontrivial. exec_parse_message() sets cursorOptions. exec_bind_message() runs the planner. We don't know if a parallel plan is okay before seeing max_rows in exec_execute_message(). On Sun, Feb 15, 2015 at 01:29:06AM -0500, Robert Haas wrote: > - For testing purposes, I set this up so that the executor activates > parallel mode when running any query that seems to be parallel-safe. > It seems nearly certain we'd only want to do this when a parallel plan > was actually selected, but this behavior is awfully useful for > testing, and revealed a number of bugs in the parallel-safety > markings. Why wouldn't that be a good permanent behavior? The testing benefit extends to users. If I wrongly mark a function something other than PROPARALLEL_UNSAFE, I'd rather find out ASAP. > - The patch includes a script, fixpgproc.pl, which I used to insert > and update the parallel markings on the system catalogs. That is, of > course, not intended for commit, but it's mighty useful for > experimentation. The CREATE OR REPLACE FUNCTION calls in src/backend/catalog/system_views.sql reset proparallel=u for some built-in functions, such as make_interval.

Some categories of PROPARALLEL_SAFE functions deserve more attention:

- database_to_xml* need to match proparallel of schema_to_xml*, which are PROPARALLEL_UNSAFE due to the heavyweight lock rule. cursor_to_xml* probably need to do likewise.

- Several functions, such as array_in and anytextcat, can call arbitrary I/O functions. This implies a policy that I/O functions must be PROPARALLEL_SAFE. I agree with that decision, but it should be documented. Several json-related functions can invoke X->json cast functions; the marking implies that all such cast functions must be PROPARALLEL_SAFE. That's probably okay, too.

- All selectivity estimator functions are PROPARALLEL_SAFE.
That implies, for example, that any operator using eqjoinsel ought to be PROPARALLEL_SAFE or define its own restriction selectivity estimator function. On the other hand, the planner need not check the parallel safety of selectivity estimator functions. If we're already in parallel mode at the moment we enter the planner, we must be planning a query called within a PROPARALLEL_SAFE function. If the query uses an unsafe operator, the containing function was mismarked.

- inet_{client,server}_{addr,port} are PROPARALLEL_SAFE; will the information they need indeed be available in workers?

- Assuming you don't want to propagate XactLastRecEnd from the slave back to the master, restrict XLogInsert() during parallel mode. Restrict functions that call it, including pg_create_restore_point, pg_switch_xlog and pg_stop_backup.

- pg_extension_config_dump modifies the database, so it shall be PROPARALLEL_UNSAFE. Assuming pg_stat_clear_snapshot won't reach back to the master to clear the snapshot, it shall be PROPARALLEL_RESTRICTED.

If a mismarked function elicits "ERROR: cannot retain locks acquired while in parallel mode", innocent queries later in the transaction get the same error:

begin;
select 1; -- ok
savepoint q; select database_to_xml(true, true, 'x'); rollback to q; -- ERROR
select 1; -- ERROR
rollback;

> --- a/src/backend/commands/typecmds.c
> +++ b/src/backend/commands/typecmds.c
> @@ -1610,6 +1610,7 @@ makeRangeConstructors(const char *name, Oid namespace,
> 		false,	/* leakproof */
> 		false,	/* isStrict */
> 		PROVOLATILE_IMMUTABLE,	/* volatility */
> +		PROPARALLEL_UNSAFE,	/* parallel safety */

Range constructors are PROPARALLEL_SAFE.
> --- a/src/backend/executor/functions.c
> +++ b/src/backend/executor/functions.c
> @@ -496,7 +496,9 @@ init_execution_state(List *queryTree_list,
> 	if (queryTree->commandType == CMD_UTILITY)
> 		stmt = queryTree->utilityStmt;
> 	else
> -		stmt = (Node *) pg_plan_query(queryTree, 0, NULL);
> +		stmt = (Node *) pg_plan_query(queryTree,
> +			fcache->readonly_func ? CURSOR_OPT_PARALLEL_OK : 0,
> +			NULL);

(This code is deciding whether to allow parallelism in one statement of an SQL-language function.) Why does it care if the function is read-only? I see that PL/pgSQL never allows parallelism in queries it launches.

> --- a/src/include/utils/lsyscache.h
> +++ b/src/include/utils/lsyscache.h
> @@ -79,6 +79,7 @@ extern bool op_mergejoinable(Oid opno, Oid inputtype);
> extern bool op_hashjoinable(Oid opno, Oid inputtype);
> extern bool op_strict(Oid opno);
> extern char op_volatile(Oid opno);
> +extern char op_parallel(Oid opno);

No such function defined.

In passing, I noticed a few cosmetic problems in the prerequisite parallel-mode-v6.patch. Typos: initating Musn't paralell paralellism paralllelism processe. Capitalization of "transactionIdPrecedes". When applying the patch, this:

$ git apply ../rev/parallel-mode-v6.patch
../rev/parallel-mode-v6.patch:2897: indent with spaces.
    (nLocksAcquiredInParallelMode - n));
warning: 1 line adds whitespace errors.
On Sun, Feb 15, 2015 at 6:24 PM, Noah Misch <noah@leadboat.com> wrote: > It's important for parallelism to work under the extended query protocol, and > that's nontrivial. exec_parse_message() sets cursorOptions. > exec_bind_message() runs the planner. We don't know if a parallel plan is > okay before seeing max_rows in exec_execute_message(). Yes, that's a problem. One could require users of the extended query protocol to indicate their willingness to accept a parallel query plan when sending the bind message by setting the appropriate cursor option; and one could, when that option is specified, refuse execute messages with max_rows != 0. However, that has the disadvantage of forcing all clients to be updated for the new world order. Or, one could assume a parallel plan is OK and re-plan if we choose one and then a non-zero max_rows shows up. In the worst case, though, that could require every query to be planned twice. > On Sun, Feb 15, 2015 at 01:29:06AM -0500, Robert Haas wrote: >> - For testing purposes, I set this up so that the executor activates >> parallel mode when running any query that seems to be parallel-safe. >> It seems nearly certain we'd only want to do this when a parallel plan >> was actually selected, but this behavior is awfully useful for >> testing, and revealed a number of bugs in the parallel-safety >> markings. > > Why wouldn't that be a good permanent behavior? The testing benefit extends > to users. If I wrongly mark a function something other than > PROPARALLEL_UNSAFE, I'd rather find out ASAP. It might be a good permanent behavior, or it might just annoy users unnecessarily. I am unable to guess what is best. >> - The patch includes a script, fixpgproc.pl, which I used to insert >> and update the parallel markings on the system catalogs. That is, of >> course, not intended for commit, but it's mighty useful for >> experimentation. 
> > The CREATE OR REPLACE FUNCTION calls in src/backend/catalog/system_views.sql > reset proparallel=u for some built-in functions, such as make_interval. I can fix that if I add a PARALLEL { SAFE | RESTRICTED | UNSAFE } clause to CREATE FUNCTION. Or are you saying that CORF shouldn't change the existing value of the flag? I'm not sure exactly what our policy is here, but I would expect that the CORF is doing the right thing and we need to add the necessary syntax to CREATE FUNCTION and then add that clause to those calls. > Some categories of PROPARALLEL_SAFE functions deserve more attention: > > - database_to_xml* need to match proparallel of schema_to_xml*, which are > PROPARALLEL_UNSAFE due to the heavyweight lock rule. cursor_to_xml* > probably need to do likewise. Thanks, fixed. > - Several functions, such as array_in and anytextcat, can call arbitrary I/O > functions. This implies a policy that I/O functions must be > PROPARALLEL_SAFE. I agree with that decision, but it should be documented. Where? > Several json-related functions can invoke X->json cast functions; the > marking implies that all such cast functions must be PROPARALLEL_SAFE. > That's probably okay, too. > > - All selectivity estimator functions are PROPARALLEL_SAFE. That implies, for > example, that any operator using eqjoinsel ought to be PROPARALLEL_SAFE or > define its own restriction selectivity estimator function. On the other > hand, the planner need not check the parallel safety of selectivity > estimator functions. If we're already in parallel mode at the moment we > enter the planner, we must be planning a query called within a > PROPARALLEL_SAFE function. If the query uses an unsafe operator, the > containing function was mismarked. This seems backwards to me. If some hypothetical selectivity estimator were PROPARALLEL_UNSAFE, then any operator that uses that function would also need to be PROPARALLEL_UNSAFE. The reverse is not true. 
The operator can be as unsafe as it likes and still use a safe estimator.

> - inet_{client,server}_{addr,port} are PROPARALLEL_SAFE; will the information
> they need indeed be available in workers?

Nope, marked those PROPARALLEL_RESTRICTED. Thanks.

> - Assuming you don't want to propagate XactLastRecEnd from the slave back to
> the master, restrict XLogInsert() during parallel mode. Restrict functions
> that call it, including pg_create_restore_point, pg_switch_xlog and
> pg_stop_backup.

Hmm. Altogether prohibiting XLogInsert() in parallel mode seems unwise, because it would foreclose heap_page_prune_opt() in workers. I realize there's a separate conversation about whether pruning during SELECT queries is good policy, but in the interest of separating mechanism from policy, and in the sure knowledge that allowing at least some writes in parallel mode is certainly going to be something people will want, it seems better to look into propagating XactLastRecEnd.

> - pg_extension_config_dump modifies the database, so it shall be
> PROPARALLEL_UNSAFE. Assuming pg_stat_clear_snapshot won't reach back to
> the master to clear the snapshot, it shall be PROPARALLEL_RESTRICTED.

Right, makes sense.

> If a mismarked function elicits "ERROR: cannot retain locks acquired while in
> parallel mode", innocent queries later in the transaction get the same error:
>
> begin;
> select 1; -- ok
> savepoint q; select database_to_xml(true, true, 'x'); rollback to q; -- ERROR
> select 1; -- ERROR
> rollback;

Hmm, I guess that's a bug in the parallel-mode patch. Will fix.

>> --- a/src/backend/commands/typecmds.c
>> +++ b/src/backend/commands/typecmds.c
>> @@ -1610,6 +1610,7 @@ makeRangeConstructors(const char *name, Oid namespace,
>> 		false,	/* leakproof */
>> 		false,	/* isStrict */
>> 		PROVOLATILE_IMMUTABLE,	/* volatility */
>> +		PROPARALLEL_UNSAFE,	/* parallel safety */
>
> Range constructors are PROPARALLEL_SAFE.

Changed.
>> --- a/src/backend/executor/functions.c
>> +++ b/src/backend/executor/functions.c
>> @@ -496,7 +496,9 @@ init_execution_state(List *queryTree_list,
>> 	if (queryTree->commandType == CMD_UTILITY)
>> 		stmt = queryTree->utilityStmt;
>> 	else
>> -		stmt = (Node *) pg_plan_query(queryTree, 0, NULL);
>> +		stmt = (Node *) pg_plan_query(queryTree,
>> +			fcache->readonly_func ? CURSOR_OPT_PARALLEL_OK : 0,
>> +			NULL);
>
> (This code is deciding whether to allow parallelism in one statement of an
> SQL-language function.) Why does it care if the function is read-only?

I don't know what I was thinking there. Interestingly, changing that causes several tests to fail with complaints about lock retention:

SELECT p.name, name(p.hobbies), name(equipment(p.hobbies)) FROM ONLY person p;
SELECT p.name, name(p.hobbies), name(equipment(p.hobbies)) FROM person* p;
SELECT name(equipment(p.hobbies)), p.name, name(p.hobbies) FROM ONLY person p;
SELECT (p.hobbies).equipment.name, p.name, name(p.hobbies) FROM person* p;
SELECT (p.hobbies).equipment.name, name(p.hobbies), p.name FROM ONLY person p;
SELECT name(equipment(p.hobbies)), name(p.hobbies), p.name FROM person* p;

I don't immediately know why.

> I see that PL/pgSQL never allows parallelism in queries it launches.

Yeah, I meant to mention that limitation in my initial email. Fixed.

>> --- a/src/include/utils/lsyscache.h
>> +++ b/src/include/utils/lsyscache.h
>> @@ -79,6 +79,7 @@ extern bool op_mergejoinable(Oid opno, Oid inputtype);
>> extern bool op_hashjoinable(Oid opno, Oid inputtype);
>> extern bool op_strict(Oid opno);
>> extern char op_volatile(Oid opno);
>> +extern char op_parallel(Oid opno);
>
> No such function defined.

Removed.

> In passing, I noticed a few cosmetic problems in the prerequisite
> parallel-mode-v6.patch. Typos: initating Musn't paralell paralellism
> paralllelism processe. Capitalization of "transactionIdPrecedes".
> When applying the patch, this:
>
> $ git apply ../rev/parallel-mode-v6.patch
> ../rev/parallel-mode-v6.patch:2897: indent with spaces.
>     (nLocksAcquiredInParallelMode - n));
> warning: 1 line adds whitespace errors.

OK, will fix those. Thanks. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
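For the extended-protocol problem discussed at the top of this message, the "assume parallel is OK and re-plan if a non-zero max_rows shows up" fallback reduces to a small Execute-time decision. A hypothetical, self-contained model (the type and function names are invented; the real logic would live in exec_bind_message()/exec_execute_message()):

```c
#include <assert.h>

/* Schematic plan descriptor; the two flags are all that matters here. */
typedef struct ToyPlan
{
	int			is_parallel;	/* did the planner pick a parallel plan? */
	int			replanned;		/* was a second planning pass needed? */
} ToyPlan;

/* Stand-in for the planner: honors a "parallelism allowed" cursor option. */
static ToyPlan
toy_plan_query(int parallel_ok)
{
	ToyPlan		p = {parallel_ok ? 1 : 0, 0};

	return p;
}

/*
 * At Execute time we finally learn max_rows.  A parallel plan cannot be
 * partially executed and suspended, so if max_rows != 0 we must fall back
 * to a fresh, parallelism-forbidden plan.  Worst case: every query gets
 * planned twice.
 */
ToyPlan
toy_get_executable_plan(ToyPlan bound, long max_rows)
{
	if (bound.is_parallel && max_rows != 0)
	{
		ToyPlan		p = toy_plan_query(0);	/* re-plan without parallelism */

		p.replanned = 1;
		return p;
	}
	return bound;
}
```

The attraction of this shape is that libpq clients, which always send max_rows = 0, never pay the second planning pass.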
On Wed, Feb 18, 2015 at 12:23:05PM -0500, Robert Haas wrote: > On Sun, Feb 15, 2015 at 6:24 PM, Noah Misch <noah@leadboat.com> wrote: > > It's important for parallelism to work under the extended query protocol, and > > that's nontrivial. exec_parse_message() sets cursorOptions. > > exec_bind_message() runs the planner. We don't know if a parallel plan is > > okay before seeing max_rows in exec_execute_message(). > > Yes, that's a problem. One could require users of the extended query > protocol to indicate their willingness to accept a parallel query plan > when sending the bind message by setting the appropriate cursor > option; and one could, when that option is specified, refuse execute > messages with max_rows != 0. However, that has the disadvantage of > forcing all clients to be updated for the new world order. Right; that would imply a protocol version bump. > Or, one > could assume a parallel plan is OK and re-plan if we choose one and > then a non-zero max_rows shows up. In the worst case, though, that > could require every query to be planned twice. That sounds reasonable as a first cut. It is perfect for libpq clients, which always use zero max_rows. The policy deserves wider scrutiny before release, but it's the sort of thing we can tweak without wide-ranging effects on the rest of the parallelism work. One might use the Bind message portal name as an additional heuristic. libpq usually (always?) uses the empty portal name. A caller specifying a non-empty portal name is far more likely to have a non-zero max_rows on its way. More radical still: upon receiving the Bind message, peek ahead to see if an Execute message is waiting. If so, let the max_rows of the Execute inform Bind-associated planning. > > The CREATE OR REPLACE FUNCTION calls in src/backend/catalog/system_views.sql > > reset proparallel=u for some built-in functions, such as make_interval. > > I can fix that if I add a PARALLEL { SAFE | RESTRICTED | UNSAFE } > clause to CREATE FUNCTION. 
Eventually that, but adding "UPDATE pg_proc" after the CORF is a fine stopgap. > Or are you saying that CORF shouldn't > change the existing value of the flag? I'm not sure exactly what our > policy is here, but I would expect that the CORF is doing the right > thing and we need to add the necessary syntax to CREATE FUNCTION and > then add that clause to those calls. I agree that CREATE OR REPLACE FUNCTION is doing the right thing. > > - Several functions, such as array_in and anytextcat, can call arbitrary I/O > > functions. This implies a policy that I/O functions must be > > PROPARALLEL_SAFE. I agree with that decision, but it should be documented. > > Where? On further thought, it can wait for the CREATE FUNCTION syntax additions. At that time, xtypes.sgml is the place. Perhaps just adding PARALLEL SAFE to the example CREATE FUNCTION calls will suffice. > > - All selectivity estimator functions are PROPARALLEL_SAFE. That implies, for > > example, that any operator using eqjoinsel ought to be PROPARALLEL_SAFE or (I meant to write "eqsel", not "eqjoinsel". eqjoinsel is not a restriction selectivity estimator function, but the rest of the argument applies to both functions.) That argument does not apply to all possible selectivity estimator functions. It is common for an estimator to call the oprcode on MCV and/or histogram values, which makes the estimator function no more parallel safe than the weakest associated oprcode. eqsel, eqjoinsel and scalarltsel all do that. > > define its own restriction selectivity estimator function. On the other > > hand, the planner need not check the parallel safety of selectivity > > estimator functions. If we're already in parallel mode at the moment we > > enter the planner, we must be planning a query called within a > > PROPARALLEL_SAFE function. If the query uses an unsafe operator, the > > containing function was mismarked. > > This seems backwards to me.
> If some hypothetical selectivity > estimator were PROPARALLEL_UNSAFE, then any operator that uses that > function would also need to be PROPARALLEL_UNSAFE. It's fair to have a PROPARALLEL_UNSAFE estimator for a PROPARALLEL_SAFE function, because the planning of a parallel query is often not itself done in parallel mode. In that case, "SELECT * FROM tablename WHERE colname OP 0" might use a parallel seqscan but fail completely if called from inside a function running in parallel mode. That is to say, an affected query can itself use parallelism, but placing the query in a function makes the function PROPARALLEL_UNSAFE. Surprising, but not wrong. Rereading my previous message, I failed to make the bottom line clear: I recommend marking eqsel etc. PROPARALLEL_UNSAFE but *not* checking an estimator's proparallel before calling it in the planner. > The reverse is not > true. The operator can be as unsafe as it likes and still use a safe > estimator. Agreed. "contsel" is PROPARALLEL_SAFE no matter what operator uses it. > > - Assuming you don't want to propagate XactLastRecEnd from the slave back to > > the master, restrict XLogInsert() during parallel mode. Restrict functions > > that call it, including pg_create_restore_point, pg_switch_xlog and > > pg_stop_backup. > > Hmm. Altogether prohibiting XLogInsert() in parallel mode seems > unwise, because it would foreclose heap_page_prune_opt() in workers. > I realize there's a separate conversation about whether pruning during > SELECT queries is good policy, but in the interest of separating > mechanism from policy, and in the sure knowledge that allowing at > least some writes in parallel mode is certainly going to be something > people will want, it seems better to look into propagating > XactLastRecEnd. Good points; that works for me.
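Noah's Bind-time heuristics from earlier in this message — trust a peeked Execute message when one is waiting, otherwise distrust non-empty portal names — could combine into a single decision, roughly like this (a hypothetical sketch; all names are invented, and the real protocol handling lives in exec_bind_message()):

```c
#include <assert.h>

/*
 * Hypothetical Bind-time decision: may the planner consider a parallel
 * plan?  max_rows only arrives with the later Execute message, so we
 * lean on heuristics.  peeked_max_rows is the max_rows of an Execute
 * message already waiting in the input buffer, or -1 if none could be
 * peeked at.
 */
int
toy_parallel_ok_at_bind(const char *portal_name, long peeked_max_rows)
{
	/* Best evidence: a queued Execute message tells us max_rows directly. */
	if (peeked_max_rows >= 0)
		return peeked_max_rows == 0;

	/*
	 * Heuristic: libpq usually binds the unnamed portal and runs it to
	 * completion; a named portal suggests the client intends to fetch in
	 * chunks (non-zero max_rows), so forbid a parallel plan.
	 */
	return portal_name[0] == '\0';
}
```

Either heuristic can guess wrong, which is why the re-planning fallback would still be needed as a safety net.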
[ deferring responses to some points until a later time ] On Thu, Feb 19, 2015 at 1:19 AM, Noah Misch <noah@leadboat.com> wrote: >> This seems backwards to me. If some hypothetical selectivity >> estimator were PROPARALLEL_UNSAFE, then any operator that uses that >> function would also need to be PROPARALLEL_UNSAFE. > > It's fair to have a PROPARALLEL_UNSAFE estimator for a PROPARALLEL_SAFE > function, because the planning of a parallel query is often not itself done in > parallel mode. In that case, "SELECT * FROM tablename WHERE colname OP 0" > might use a parallel seqscan but fail completely if called from inside a > function running in parallel mode. That is to say, an affected query can > itself use parallelism, but placing the query in a function makes the function > PROPARALLEL_UNSAFE. Surprising, but not wrong. > > Rereading my previous message, I failed to make the bottom line clear: I > recommend marking eqsel etc. PROPARALLEL_UNSAFE but *not* checking an > estimator's proparallel before calling it in the planner. But what do these functions do that is actually unsafe? >> > - Assuming you don't want to propagate XactLastRecEnd from the slave back to >> > the master, restrict XLogInsert() during parallel mode. Restrict functions >> > that call it, including pg_create_restore_point, pg_switch_xlog and >> > pg_stop_backup. >> >> Hmm. Altogether prohibiting XLogInsert() in parallel mode seems >> unwise, because it would foreclose heap_page_prune_opt() in workers. >> I realize there's a separate conversation about whether pruning during >> SELECT queries is good policy, but in the interest of separating >> mechanism from policy, and in the sure knowledge that allowing at >> least some writes in parallel mode is certainly going to be something >> people will want, it seems better to look into propagating >> XactLastRecEnd. > > Good points; that works for me. The key design decision here seems to be this: How eagerly do we need to synchronize XactLastRecEnd?
What exactly goes wrong if it isn't synchronized? For example, if the value were absolutely critical in all circumstances, one could imagine storing a shared XactLastRecEnd in shared memory. This doesn't appear to be the case: the main requirement is that we definitely need an up-to-date value at commit time. Also, at abort time, we don't really need the value for anything critical, but it's worth kicking the WAL writer so that any accumulated WAL gets flushed. Here's an incremental patch - which I will incorporate into the parallel mode patch if it seems about right to you - that tries to tackle all this. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
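Under the lazy scheme sketched above, the only hard requirement is that, by the time the master reaches RecordTransactionCommit(), its XactLastRecEnd points past every record needed for durability. Mechanically, that means folding the workers' reported values in with a maximum. A toy model (the type and function names are invented; this is not the patch code):

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef uint64_t ToyXLogRecPtr;	/* stand-in for XLogRecPtr */

/*
 * Fold the XactLastRecEnd values reported by nworkers parallel workers
 * into the master's own value.  The result must point past the end of
 * every record whose replay is needed to satisfy the user's expectation
 * of durability, so the maximum is the only safe choice.
 */
ToyXLogRecPtr
toy_fold_last_rec_end(ToyXLogRecPtr master, const ToyXLogRecPtr *workers,
					  int nworkers)
{
	int			i;

	for (i = 0; i < nworkers; i++)
		if (workers[i] > master)
			master = workers[i];
	return master;
}
```

At any time other than the commit synchronization point, the folded value being stale is harmless — which is exactly why no eager shared-memory synchronization is required.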
On Thu, Mar 12, 2015 at 11:21:37AM -0400, Robert Haas wrote: > On Thu, Feb 19, 2015 at 1:19 AM, Noah Misch <noah@leadboat.com> wrote: > > Rereading my previous message, I failed to make the bottom line clear: I > > recommend marking eqsel etc. PROPARALLEL_UNSAFE but *not* checking an > > estimator's proparallel before calling it in the planner. > > But what do these functions do that is actually unsafe? They call the oprcode function of any operator naming them as an estimator. Users can add operators that use eqsel() as an estimator, and we have no bound on what those operators' oprcode can do. (In practice, parallel-unsafe operators using eqsel() as an estimator will be rare.) > >> Hmm. Altogether prohibiting XLogInsert() in parallel mode seems > >> unwise, because it would foreclose heap_page_prune_opt() in workers. > >> I realize there's a separate conversation about whether pruning during > >> SELECT queries is good policy, but in the interest of separating > >> mechanism from policy, and in the sure knowledge that allowing at > >> least some writes in parallel mode is certainly going to be something > >> people will want, it seems better to look into propagating > >> XactLastRecEnd. > > > > Good points; that works for me. > > The key design decision here seems to be this: How eagerly do we need > to synchronize XactLastRecEnd? What exactly goes wrong if it isn't > synchronized? For example, if the value were absolutely critical in > all circumstances, one could imagine storing a shared XactLastRecEnd > in shared memory. This doesn't appear to be the case: the main > requirement is that we definitely need an up-to-date value at commit > time. Also, at abort time, we don't really need the value for anything > critical, but it's worth kicking the WAL writer so that any > accumulated WAL gets flushed. Exactly.
At entry to RecordTransactionCommit(), XactLastRecEnd must point past the end of all records whose replay is required to satisfy the user's expectation of transaction durability. At other times, harm from its value being wrong is negligible. I do suggest adding a comment to its definition explaining when one can rely on it.

> Here's an incremental patch - which I will incorporate into the
> parallel mode patch if it seems about right to you - that tries to
> tackle all this.

I reviewed this. Minor things:

> @@ -73,10 +74,15 @@ typedef struct FixedParallelState
> 	/* Entrypoint for parallel workers. */
> 	parallel_worker_main_type entrypoint;
>
> -	/* Track whether workers have attached. */
> +	/* Mutex protects remaining fiedlds. */

Typo.

> @@ -2564,10 +2578,19 @@ AbortTransaction(void)
> 	 * worker, skip this; the user backend must be the one to write the abort
> 	 * record.
> 	 */
> -	if (parallel)
> -		latestXid = InvalidTransactionId;
> -	else
> +	if (!parallel)
> 		latestXid = RecordTransactionAbort(false);
> +	else
> +	{
> +		latestXid = InvalidTransactionId;
> +
> +		/*
> +		 * Since the parallel master won't get our value of XactLastRecEnd in this
> +		 * case, we nudge WAL-writer ourselves in this case. See related comments in
> +		 * RecordTransactionAbort for why this matters.
> +		 */
> +		XLogSetAsyncXactLSN(XactLastRecEnd);

RecordTransactionAbort() skips this for subtransaction aborts. I would omit it here, because a parallel worker abort is, in this respect, more like a subtransaction abort than like a top-level transaction abort.

While studying README.parallel from parallel-mode-v7.patch to understand the concept of worker commit/abort, this part gave me the wrong idea:

> +Transaction Integration
> +=======================
...
> +Transaction commit or abort requires careful coordination between backends.
> +Each backend has its own resource owners: buffer pins, catcache or relcache
> +reference counts, tuple descriptors, and so on are managed separately by each
> +backend, and each backend is separately responsible for releasing such
> +resources. Generally, the commit or abort of a parallel worker is much like
> +a top-transaction commit or abort, but there are a few exceptions. Most
> +importantly:
> +
> + - No commit or abort record is written; the initiating backend is
> + responsible for this.
> +
> + - Cleanup of pg_temp namespaces is not done. The initiating backend is
> + responsible for this, too.
> +
> +The master kills off all remaining workers as part of commit or abort
> +processing. It must not only kill the workers but wait for them to actually

I took this to mean that workers normally persist until the master commits a transaction. Rather, their lifecycle is like the lifecycle of a buffer pin. When an error interrupts a parallel operation, transaction abort destroys workers. Otherwise, the code driving the specific parallel task destroys them as early as is practical. (Strictly to catch bugs, transaction commit does detect and destroy leaked workers.)

> +exit; otherwise, chaos can ensue. For example, if the master is
> +rolling back the transaction that created the relation being scanned by
> +a worker, the relation could disappear while the worker is still busy
> +scanning it. That's not safe.
On Sun, Mar 15, 2015 at 2:39 AM, Noah Misch <noah@leadboat.com> wrote: > On Thu, Mar 12, 2015 at 11:21:37AM -0400, Robert Haas wrote: >> On Thu, Feb 19, 2015 at 1:19 AM, Noah Misch <noah@leadboat.com> wrote: >> > Rereading my previous message, I failed to make the bottom line clear: I >> > recommend marking eqsel etc. PROPARALLEL_UNSAFE but *not* checking an >> > estimator's proparallel before calling it in the planner. >> >> But what do these functions do that is actually unsafe? > > They call the oprcode function of any operator naming them as an estimator. > Users can add operators that use eqsel() as an estimator, and we have no bound > on what those operators' oprcode can do. (In practice, parallel-unsafe > operators using eqsel() as an estimator will be rare.) Is there a reason not to make a rule that opclass members must be parallel-safe? I ask because I think it's important that the process of planning a query be categorically parallel-safe. If we can't count on that, life gets a lot more difficult - what happens when we're in a parallel worker and call a SQL or PL/pgsql function? Thanks for reviewing the incremental patch. A few comments: > Exactly. At entry to RecordTransactionCommit(), XactLastRecEnd must point > past the end of all records whose replay is required to satisfy the user's > expectation of transaction durability. At other times, harm from its value > being wrong is negligible. I do suggest adding a comment to its definition > explaining when one can rely on it. Good idea. > RecordTransactionAbort() skips this for subtransaction aborts. I would omit > it here, because a parallel worker abort is, in this respect, more like a > subtransaction abort than like a top-level transaction abort. No, I don't think so. A subtransaction abort will be followed by either a toplevel commit or a toplevel abort, so any xlog written by the subtransaction will be flushed either synchronously or asynchronously at that time. 
But for an aborting worker, that's not true: there's nothing to force the worker's xlog out to disk if it's ahead of the master's XactLastRecEnd. If our XactLastRecEnd is behind the master's, then it doesn't matter what we do: an extra flush attempt is a no-op anyway. If it's ahead, then we need it to be sure of getting the same behavior that we would have gotten in the non-parallel case. > I took this to mean that workers normally persist until the master commits a > transaction. Rather, their lifecycle is like the lifecycle of a buffer pin. > When an error interrupts a parallel operation, transaction abort destroys > workers. Otherwise, the code driving the specific parallel task destroys them > as early as is practical. (Strictly to catch bugs, transaction commit does > detect and destroy leaked workers.) Good point; will revise. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Robert Haas <robertmhaas@gmail.com> writes: > Is there a reason not to make a rule that opclass members must be > parallel-safe? I ask because I think it's important that the process > of planning a query be categorically parallel-safe. I'm not seeing the connection between those two statements. The planner doesn't usually execute opclass members, at least not as such. regards, tom lane
On Mon, Mar 16, 2015 at 2:55 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote: > Robert Haas <robertmhaas@gmail.com> writes: >> Is there a reason not to make a rule that opclass members must be >> parallel-safe? I ask because I think it's important that the process >> of planning a query be categorically parallel-safe. > > I'm not seeing the connection between those two statements. The planner > doesn't usually execute opclass members, at least not as such. Hmm, I guess I'm spouting nonsense there. The way the operator gets invoked during planning is that eqsel() calls it. But that doesn't require it to be part of an opclass; it just has to be an operator that's chosen that eqsel as its selectivity estimator. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Robert Haas <robertmhaas@gmail.com> writes: > On Mon, Mar 16, 2015 at 2:55 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote: >> Robert Haas <robertmhaas@gmail.com> writes: >>> Is there a reason not to make a rule that opclass members must be >>> parallel-safe? I ask because I think it's important that the process >>> of planning a query be categorically parallel-safe. >> I'm not seeing the connection between those two statements. The planner >> doesn't usually execute opclass members, at least not as such. > Hmm, I guess I'm spouting nonsense there. The way the operator gets > invoked during planning is that eqsel() calls it. But that doesn't > require it to be part of an opclass; it just has to be an operator > that's chosen that eqsel as its selectivity estimator. Yeah. So what we'd want here is a rule that selectivity estimator functions must be parallel-safe. For operators using estimators similar to eqsel() that would imply a requirement on the operator's function as well, but it's the estimator not any opclass connection that creates that requirement. regards, tom lane
On Mon, Mar 16, 2015 at 02:38:34PM -0400, Robert Haas wrote: > On Sun, Mar 15, 2015 at 2:39 AM, Noah Misch <noah@leadboat.com> wrote: > > On Thu, Mar 12, 2015 at 11:21:37AM -0400, Robert Haas wrote: > >> On Thu, Feb 19, 2015 at 1:19 AM, Noah Misch <noah@leadboat.com> wrote: > >> > Rereading my previous message, I failed to make the bottom line clear: I > >> > recommend marking eqsel etc. PROPARALLEL_UNSAFE but *not* checking an > >> > estimator's proparallel before calling it in the planner. > >> > >> But what do these functions do that is actually unsafe? > > > > They call the oprcode function of any operator naming them as an estimator. > > Users can add operators that use eqsel() as an estimator, and we have no bound > > on what those operators' oprcode can do. (In practice, parallel-unsafe > > operators using eqsel() as an estimator will be rare.) > > Is there a reason not to make a rule that opclass members must be > parallel-safe? I ask because I think it's important that the process > of planning a query be categorically parallel-safe. If we can't count > on that, life gets a lot more difficult - what happens when we're in a > parallel worker and call a SQL or PL/pgsql function? Neither that rule, nor its variant downthread, would hurt operator authors too much. To make the planner categorically parallel-safe, though, means limiting evaluate_function() to parallel-safe functions. That would dramatically slow selected queries. It's enough for the PL scenario if planning a parallel-safe query is itself parallel-safe. If the planner is parallel-unsafe when planning a parallel-unsafe query, what would suffer? > > RecordTransactionAbort() skips this for subtransaction aborts. I would omit > > it here, because a parallel worker abort is, in this respect, more like a > > subtransaction abort than like a top-level transaction abort. > > No, I don't think so. 
A subtransaction abort will be followed by > either a toplevel commit or a toplevel abort, so any xlog written by > the subtransaction will be flushed either synchronously or > asynchronously at that time. But for an aborting worker, that's not > true: there's nothing to force the worker's xlog out to disk if it's > ahead of the master's XactLastRecEnd. If our XactLastRecEnd is behind > the master's, then it doesn't matter what we do: an extra flush > attempt is a no-op anyway. If it's ahead, then we need it to be sure > of getting the same behavior that we would have gotten in the > non-parallel case. Fair enough.
On Tue, Mar 17, 2015 at 2:26 AM, Noah Misch <noah@leadboat.com> wrote: > Neither that rule, nor its variant downthread, would hurt operator authors too > much. To make the planner categorically parallel-safe, though, means limiting > evaluate_function() to parallel-safe functions. That would dramatically slow > selected queries. It's enough for the PL scenario if planning a parallel-safe > query is itself parallel-safe. If the planner is parallel-unsafe when > planning a parallel-unsafe query, what would suffer? Good point. So I guess the rule can be that planning a parallel-safe query should be parallel-safe. From there, it follows that estimators for a parallel-safe operator must also be parallel-safe. Which seems fine. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
On Tue, Mar 17, 2015 at 9:48 AM, Robert Haas <robertmhaas@gmail.com> wrote: > On Tue, Mar 17, 2015 at 2:26 AM, Noah Misch <noah@leadboat.com> wrote: >> Neither that rule, nor its variant downthread, would hurt operator authors too >> much. To make the planner categorically parallel-safe, though, means limiting >> evaluate_function() to parallel-safe functions. That would dramatically slow >> selected queries. It's enough for the PL scenario if planning a parallel-safe >> query is itself parallel-safe. If the planner is parallel-unsafe when >> planning a parallel-unsafe query, what would suffer? > > Good point. So I guess the rule can be that planning a parallel-safe > query should be parallel-safe. From there, it follows that estimators > for a parallel-safe operator must also be parallel-safe. Which seems > fine. More work is needed here, but for now, here is a rebased patch, per Amit's request. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Attachment
On Wed, Feb 18, 2015 at 10:53 PM, Robert Haas <robertmhaas@gmail.com> wrote:
>
> On Sun, Feb 15, 2015 at 6:24 PM, Noah Misch <noah@leadboat.com> wrote:
> > It's important for parallelism to work under the extended query protocol, and
> > that's nontrivial. exec_parse_message() sets cursorOptions.
> > exec_bind_message() runs the planner. We don't know if a parallel plan is
> > okay before seeing max_rows in exec_execute_message().
>
> Yes, that's a problem. One could require users of the extended query
> protocol to indicate their willingness to accept a parallel query plan
> when sending the bind message by setting the appropriate cursor
> option; and one could, when that option is specified, refuse execute
> messages with max_rows != 0. However, that has the disadvantage of
> forcing all clients to be updated for the new world order.
>
Another way could be: when max_rows != 0, inform the executor to
just execute the plan locally, which means run the parallel seq scan
node in the master only. We already need to do the same when no workers
are available at the time of execution.
I think we can inform the executor by setting this information in EState
during ExecutorRun.
On Thu, Mar 19, 2015 at 7:05 PM, Amit Kapila <amit.kapila16@gmail.com> wrote:
>
> On Wed, Feb 18, 2015 at 10:53 PM, Robert Haas <robertmhaas@gmail.com> wrote:
> >
> > On Sun, Feb 15, 2015 at 6:24 PM, Noah Misch <noah@leadboat.com> wrote:
> > > It's important for parallelism to work under the extended query protocol, and
> > > that's nontrivial. exec_parse_message() sets cursorOptions.
> > > exec_bind_message() runs the planner. We don't know if a parallel plan is
> > > okay before seeing max_rows in exec_execute_message().
> >
> > Yes, that's a problem. One could require users of the extended query
> > protocol to indicate their willingness to accept a parallel query plan
> > when sending the bind message by setting the appropriate cursor
> > option; and one could, when that option is specified, refuse execute
> > messages with max_rows != 0. However, that has the disadvantage of
> > forcing all clients to be updated for the new world order.
>
> Another way could be when max_rows != 0, then inform executor to
> just execute the plan locally, which means run the parallel seq scan
>
Typo above: s/parallel/partial
On Wed, Mar 18, 2015 at 9:31 PM, Robert Haas <robertmhaas@gmail.com> wrote:
>
> On Tue, Mar 17, 2015 at 9:48 AM, Robert Haas <robertmhaas@gmail.com> wrote:
> > On Tue, Mar 17, 2015 at 2:26 AM, Noah Misch <noah@leadboat.com> wrote:
> >> Neither that rule, nor its variant downthread, would hurt operator authors too
> >> much. To make the planner categorically parallel-safe, though, means limiting
> >> evaluate_function() to parallel-safe functions. That would dramatically slow
> >> selected queries. It's enough for the PL scenario if planning a parallel-safe
> >> query is itself parallel-safe. If the planner is parallel-unsafe when
> >> planning a parallel-unsafe query, what would suffer?
> >
> > Good point. So I guess the rule can be that planning a parallel-safe
> > query should be parallel-safe. From there, it follows that estimators
> > for a parallel-safe operator must also be parallel-safe. Which seems
> > fine.
>
> More work is needed here, but for now, here is a rebased patch, per
> Amit's request.
>
Thanks for rebasing the patch. I have done some performance testing
of this + parallel-mode-v8.1.patch to see the impact of these patches
for non-parallel statements.
Test environment: hydra (IBM POWER-7, 16 cores, 64 hardware threads)
The first set of tests was done with the pgbench read-only workload.
The data below is the median of 3 runs (detailed data of the 3 runs can
be found in the attached document pert_data_assess_parallel_safety.ods):
Shared_buffers: 8GB
Scale factor: 100
HEAD commit: 8d1f2390

Clients |    1 |     8 |     16 |     32
HEAD    | 9098 | 70080 | 129891 | 195862
PATCH   | 8960 | 69678 | 130591 | 195570
This data shows no performance impact of these patches on the read-only
workload (apart from a 1~2% run-to-run difference, which I consider
noise).
The second set of tests consists of long SQL statements with many
expressions in the where clause, to check the impact of the extra pass
over the query tree in the planner to decide if it contains any
parallel-unsafe function.
Before executing the tests below, run the test_prepare_parallel_safety.sql
script.
Test-1
-----------
An SQL statement containing 100 expressions in the where clause, the
expressions chosen so that they contain a CoerceViaIO node (presumably
the costliest case in contain_parallel_unsafe_walker()); refer to the
attached sql script (test_parallel_safety.sql)
Data for 10 runs (Highest of 10 runs):
HEAD - 1.563 ms
PATCH - 1.589 ms
Test-2
------------
Similar SQL statement as used for Test-1, but containing 500 expressions,
refer attached script (test_more_parallel_safety.sql)
Data for 5 runs (Median of 5 runs):
HEAD - 5.011 ms
PATCH - 5.201 ms
The second set of tests shows about a 1.5 to 3.8% performance
regression with the patches. I think having such long expressions, and
ones containing CoerceViaIO nodes at that, is a very unusual scenario,
so this doesn't seem to be too much of a concern.
Apart from this, I have one observation:
static int
exec_stmt_execsql(PLpgSQL_execstate *estate,
                  PLpgSQL_stmt_execsql *stmt)
{
    ParamListInfo paramLI;
    long        tcount;
    int         rc;
    PLpgSQL_expr *expr = stmt->sqlstmt;

    /*
     * On the first call for this statement generate the plan, and detect
     * whether the statement is INSERT/UPDATE/DELETE
     */
    if (expr->plan == NULL)
    {
        ListCell   *l;

        exec_prepare_plan(estate, expr, 0);
Shouldn't we need parallelOK in exec_stmt_execsql() to pass the cursor
option to exec_prepare_plan() above, as we have done in
exec_run_select()?
Attachment
On Thu, Mar 19, 2015 at 8:53 PM, Amit Kapila <amit.kapila16@gmail.com> wrote:
>
> On Wed, Mar 18, 2015 at 9:31 PM, Robert Haas <robertmhaas@gmail.com> wrote:
> >
> > On Tue, Mar 17, 2015 at 9:48 AM, Robert Haas <robertmhaas@gmail.com> wrote:
> > > On Tue, Mar 17, 2015 at 2:26 AM, Noah Misch <noah@leadboat.com> wrote:
> > >> Neither that rule, nor its variant downthread, would hurt operator authors too
> > >> much. To make the planner categorically parallel-safe, though, means limiting
> > >> evaluate_function() to parallel-safe functions. That would dramatically slow
> > >> selected queries. It's enough for the PL scenario if planning a parallel-safe
> > >> query is itself parallel-safe. If the planner is parallel-unsafe when
> > >> planning a parallel-unsafe query, what would suffer?
> > >
> > > Good point. So I guess the rule can be that planning a parallel-safe
> > > query should be parallel-safe. From there, it follows that estimators
> > > for a parallel-safe operator must also be parallel-safe. Which seems
> > > fine.
> >
> > More work is needed here, but for now, here is a rebased patch, per
> > Amit's request.
> >
>
> Apart from this, I have one observation:
> static int
> exec_stmt_execsql(PLpgSQL_execstate *estate,
>                   PLpgSQL_stmt_execsql *stmt)
> {
>     ParamListInfo paramLI;
>     long        tcount;
>     int         rc;
>     PLpgSQL_expr *expr = stmt->sqlstmt;
>
>     /*
>      * On the first call for this statement generate the plan, and detect
>      * whether the statement is INSERT/UPDATE/DELETE
>      */
>     if (expr->plan == NULL)
>     {
>         ListCell   *l;
>
>         exec_prepare_plan(estate, expr, 0);
>
> Shouldn't we need parallelOk in function exec_stmt_execsql()
> to pass cursoroption in above function as we have done in
> exec_run_select()?
>
Today, while integrating the parallel_seqscan patch with this patch, I
had another observation: even if a function is parallel-unsafe, the
patch can still treat statements inside that function as parallel-safe
and allow parallelism on them.
I think the code in question is as below:
@@ -496,7 +496,9 @@ init_execution_state(List *queryTree_list,
 		if (queryTree->commandType == CMD_UTILITY)
 			stmt = queryTree->utilityStmt;
 		else
-			stmt = (Node *) pg_plan_query(queryTree, 0, NULL);
+			stmt = (Node *) pg_plan_query(queryTree,
+						fcache->readonly_func ? CURSOR_OPT_PARALLEL_OK : 0,
+						NULL);
Basically, this executes a statement inside a function; if the function
is parallel-unsafe but the statement it is trying to execute is
parallel-safe (contains no other parallel-unsafe expressions), then it
will choose a parallel plan for that statement. Shouldn't we try to
avoid such cases?
On 18 March 2015 at 16:01, Robert Haas <robertmhaas@gmail.com> wrote: > On Tue, Mar 17, 2015 at 9:48 AM, Robert Haas <robertmhaas@gmail.com> wrote: >> On Tue, Mar 17, 2015 at 2:26 AM, Noah Misch <noah@leadboat.com> wrote: >>> Neither that rule, nor its variant downthread, would hurt operator authors too >>> much. To make the planner categorically parallel-safe, though, means limiting >>> evaluate_function() to parallel-safe functions. That would dramatically slow >>> selected queries. It's enough for the PL scenario if planning a parallel-safe >>> query is itself parallel-safe. If the planner is parallel-unsafe when >>> planning a parallel-unsafe query, what would suffer? >> >> Good point. So I guess the rule can be that planning a parallel-safe >> query should be parallel-safe. From there, it follows that estimators >> for a parallel-safe operator must also be parallel-safe. Which seems >> fine. > > More work is needed here, but for now, here is a rebased patch, per > Amit's request. This no longer applies due to changes in commit 13dbc7a824b3f905904cab51840d37f31a07a9ef. -- Thom
Thom Brown wrote: > On 18 March 2015 at 16:01, Robert Haas <robertmhaas@gmail.com> wrote: > > On Tue, Mar 17, 2015 at 9:48 AM, Robert Haas <robertmhaas@gmail.com> wrote: > >> On Tue, Mar 17, 2015 at 2:26 AM, Noah Misch <noah@leadboat.com> wrote: > >>> Neither that rule, nor its variant downthread, would hurt operator authors too > >>> much. To make the planner categorically parallel-safe, though, means limiting > >>> evaluate_function() to parallel-safe functions. That would dramatically slow > >>> selected queries. It's enough for the PL scenario if planning a parallel-safe > >>> query is itself parallel-safe. If the planner is parallel-unsafe when > >>> planning a parallel-unsafe query, what would suffer? > >> > >> Good point. So I guess the rule can be that planning a parallel-safe > >> query should be parallel-safe. From there, it follows that estimators > >> for a parallel-safe operator must also be parallel-safe. Which seems > >> fine. > > > > More work is needed here, but for now, here is a rebased patch, per > > Amit's request. > > This no longer applies due to changes in commit > 13dbc7a824b3f905904cab51840d37f31a07a9ef. You should be able to drop the pg_proc.h changes and run the supplied perl program. (I'm not sure that sending the patched pg_proc.h together with this patch is all that useful, really.) -- Álvaro Herrera http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On 20 March 2015 at 13:16, Alvaro Herrera <alvherre@2ndquadrant.com> wrote: > Thom Brown wrote: >> On 18 March 2015 at 16:01, Robert Haas <robertmhaas@gmail.com> wrote: >> > On Tue, Mar 17, 2015 at 9:48 AM, Robert Haas <robertmhaas@gmail.com> wrote: >> >> On Tue, Mar 17, 2015 at 2:26 AM, Noah Misch <noah@leadboat.com> wrote: >> >>> Neither that rule, nor its variant downthread, would hurt operator authors too >> >>> much. To make the planner categorically parallel-safe, though, means limiting >> >>> evaluate_function() to parallel-safe functions. That would dramatically slow >> >>> selected queries. It's enough for the PL scenario if planning a parallel-safe >> >>> query is itself parallel-safe. If the planner is parallel-unsafe when >> >>> planning a parallel-unsafe query, what would suffer? >> >> >> >> Good point. So I guess the rule can be that planning a parallel-safe >> >> query should be parallel-safe. From there, it follows that estimators >> >> for a parallel-safe operator must also be parallel-safe. Which seems >> >> fine. >> > >> > More work is needed here, but for now, here is a rebased patch, per >> > Amit's request. >> >> This no longer applies due to changes in commit >> 13dbc7a824b3f905904cab51840d37f31a07a9ef. > > You should be able to drop the pg_proc.h changes and run the supplied > perl program. (I'm not sure that sending the patched pg_proc.h together > with this patch is all that useful, really.) Thanks. All patches applied and building okay. -- Thom
On 20 March 2015 at 13:55, Thom Brown <thom@linux.com> wrote: > On 20 March 2015 at 13:16, Alvaro Herrera <alvherre@2ndquadrant.com> wrote: >> Thom Brown wrote: >>> On 18 March 2015 at 16:01, Robert Haas <robertmhaas@gmail.com> wrote: >>> > On Tue, Mar 17, 2015 at 9:48 AM, Robert Haas <robertmhaas@gmail.com> wrote: >>> >> On Tue, Mar 17, 2015 at 2:26 AM, Noah Misch <noah@leadboat.com> wrote: >>> >>> Neither that rule, nor its variant downthread, would hurt operator authors too >>> >>> much. To make the planner categorically parallel-safe, though, means limiting >>> >>> evaluate_function() to parallel-safe functions. That would dramatically slow >>> >>> selected queries. It's enough for the PL scenario if planning a parallel-safe >>> >>> query is itself parallel-safe. If the planner is parallel-unsafe when >>> >>> planning a parallel-unsafe query, what would suffer? >>> >> >>> >> Good point. So I guess the rule can be that planning a parallel-safe >>> >> query should be parallel-safe. From there, it follows that estimators >>> >> for a parallel-safe operator must also be parallel-safe. Which seems >>> >> fine. >>> > >>> > More work is needed here, but for now, here is a rebased patch, per >>> > Amit's request. >>> >>> This no longer applies due to changes in commit >>> 13dbc7a824b3f905904cab51840d37f31a07a9ef. >> >> You should be able to drop the pg_proc.h changes and run the supplied >> perl program. (I'm not sure that sending the patched pg_proc.h together >> with this patch is all that useful, really.) > > Thanks. All patches applied and building okay. Okay, breakage experienced, but not sure which thread this belongs on. createdb pgbench pgbench -i -s 200 pgbench CREATE TABLE pgbench_accounts_1 (CHECK (bid = 1)) INHERITS (pgbench_accounts); ... CREATE TABLE pgbench_accounts_200 (CHECK (bid = 200)) INHERITS (pgbench_accounts); WITH del AS (DELETE FROM pgbench_accounts WHERE bid = 1 RETURNING *) INSERT INTO pgbench_accounts_1 SELECT * FROM del; ... 
WITH del AS (DELETE FROM pgbench_accounts WHERE bid = 200 RETURNING *) INSERT INTO pgbench_accounts_200 SELECT * FROM del;

VACUUM ANALYSE;

# SELECT name, setting FROM pg_settings WHERE name IN ('parallel_seqscan_degree','max_worker_processes','seq_page_cost');
          name           | setting
-------------------------+---------
 max_worker_processes    | 20
 parallel_seqscan_degree | 8
 seq_page_cost           | 1000
(3 rows)

# EXPLAIN SELECT DISTINCT bid FROM pgbench_accounts;
server closed the connection unexpectedly
This probably means the server terminated abnormally before or while processing the request.
The connection to the server was lost. Attempting reset: Failed.

Log file: 2015-03-20 14:19:30 GMT [4285]: [10-1] user=thom,db=pgbench,client=[local] DEBUG: StartTransactionCommand 2015-03-20 14:19:30 GMT [4285]: [11-1] user=thom,db=pgbench,client=[local] DEBUG: StartTransaction 2015-03-20 14:19:30 GMT [4285]: [12-1] user=thom,db=pgbench,client=[local] DEBUG: name: unnamed; blockState: DEFAULT; state: INPROGR, xid/subid/cid: 0/1/0, nestlvl: 1, children: 2015-03-20 14:19:30 GMT [4285]: [13-1] user=thom,db=pgbench,client=[local] DEBUG: ProcessUtility 2015-03-20 14:19:30 GMT [4285]: [14-1] user=thom,db=pgbench,client=[local] DEBUG: rehashing catalog cache id 45 for pg_class; 257 tups, 128 buckets 2015-03-20 14:19:30 GMT [4285]: [15-1] user=thom,db=pgbench,client=[local] DEBUG: rehashing catalog cache id 47 for pg_statistic; 257 tups, 128 buckets 2015-03-20 14:19:30 GMT [4285]: [16-1] user=thom,db=pgbench,client=[local] DEBUG: rehashing catalog cache id 47 for pg_statistic; 513 tups, 256 buckets 2015-03-20 14:19:30 GMT [4285]: [17-1] user=thom,db=pgbench,client=[local] DEBUG: rehashing catalog cache id 47 for pg_statistic; 1025 tups, 512 buckets 2015-03-20 14:19:31 GMT [4273]: [76-1] user=,db=,client= DEBUG: forked new backend, pid=4286 socket=10 2015-03-20 14:19:31 GMT [4286]: [1-1] user=thom,db=pgbench,client=[local] DEBUG: postgres child[4286]: starting with ( 2015-03-20 14:19:31 GMT [4273]:
[77-1] user=,db=,client= DEBUG: reaping dead processes
2015-03-20 14:19:31 GMT [4273]: [78-1] user=,db=,client= DEBUG: server process (PID 4285) was terminated by signal 11: Segmentation fault
2015-03-20 14:19:31 GMT [4273]: [79-1] user=,db=,client= DETAIL: Failed process was running: EXPLAIN SELECT DISTINCT bid FROM pgbench_accounts;
2015-03-20 14:19:31 GMT [4273]: [80-1] user=,db=,client= LOG: server process (PID 4285) was terminated by signal 11: Segmentation fault
2015-03-20 14:19:31 GMT [4273]: [81-1] user=,db=,client= DETAIL: Failed process was running: EXPLAIN SELECT DISTINCT bid FROM pgbench_accounts;
2015-03-20 14:19:31 GMT [4273]: [82-1] user=,db=,client= LOG: terminating any other active server processes
2015-03-20 14:19:31 GMT [4273]: [83-1] user=,db=,client= DEBUG: sending SIGQUIT to process 4286
2015-03-20 14:19:31 GMT [4273]: [84-1] user=,db=,client= DEBUG: sending SIGQUIT to process 4279
2015-03-20 14:19:31 GMT [4286]: [2-1] user=thom,db=pgbench,client=[local] DEBUG: postgres
2015-03-20 14:19:31 GMT [4273]: [85-1] user=,db=,client= DEBUG: sending SIGQUIT to process 4277
2015-03-20 14:19:31 GMT [4286]: [3-1] user=thom,db=pgbench,client=[local] DEBUG: )
2015-03-20 14:19:31 GMT [4273]: [86-1] user=,db=,client= DEBUG: sending SIGQUIT to process 4280
2015-03-20 14:19:31 GMT [4273]: [87-1] user=,db=,client= DEBUG: sending SIGQUIT to process 4281
2015-03-20 14:19:31 GMT [4273]: [88-1] user=,db=,client= DEBUG: sending SIGQUIT to process 4282
2015-03-20 14:19:31 GMT [4286]: [4-1] user=thom,db=pgbench,client=[local] WARNING: terminating connection because of crash of another server process
2015-03-20 14:19:31 GMT [4286]: [5-1] user=thom,db=pgbench,client=[local] DETAIL: The postmaster has commanded this server process to roll back the current transaction and exit, because another server process exited abnormally and possibly corrupted shared memory.
2015-03-20 14:19:31 GMT [4286]: [6-1] user=thom,db=pgbench,client=[local] HINT: In a moment you should be able to reconnect to the database and repeat your command.
2015-03-20 14:19:31 GMT [4277]: [2-1] user=,db=,client= DEBUG: shmem_exit(-1): 0 before_shmem_exit callbacks to make
2015-03-20 14:19:31 GMT [4277]: [3-1] user=,db=,client= DEBUG: shmem_exit(-1): 0 on_shmem_exit callbacks to make
2015-03-20 14:19:31 GMT [4277]: [4-1] user=,db=,client= DEBUG: proc_exit(-1): 0 callbacks to make
2015-03-20 14:19:31 GMT [4279]: [1-1] user=,db=,client= DEBUG: shmem_exit(-1): 0 before_shmem_exit callbacks to make
2015-03-20 14:19:31 GMT [4279]: [2-1] user=,db=,client= DEBUG: shmem_exit(-1): 0 on_shmem_exit callbacks to make
2015-03-20 14:19:31 GMT [4279]: [3-1] user=,db=,client= DEBUG: proc_exit(-1): 0 callbacks to make
2015-03-20 14:19:31 GMT [4286]: [7-1] user=thom,db=pgbench,client=[local] DEBUG: shmem_exit(-1): 0 before_shmem_exit callbacks to make
2015-03-20 14:19:31 GMT [4286]: [8-1] user=thom,db=pgbench,client=[local] DEBUG: shmem_exit(-1): 0 on_shmem_exit callbacks to make
2015-03-20 14:19:31 GMT [4286]: [9-1] user=thom,db=pgbench,client=[local] DEBUG: proc_exit(-1): 0 callbacks to make
2015-03-20 14:19:31 GMT [4280]: [1-1] user=,db=,client= DEBUG: shmem_exit(-1): 0 before_shmem_exit callbacks to make
2015-03-20 14:19:31 GMT [4280]: [2-1] user=,db=,client= DEBUG: shmem_exit(-1): 0 on_shmem_exit callbacks to make
2015-03-20 14:19:31 GMT [4280]: [3-1] user=,db=,client= DEBUG: proc_exit(-1): 0 callbacks to make
2015-03-20 14:19:31 GMT [4281]: [12-1] user=,db=,client= WARNING: terminating connection because of crash of another server process
2015-03-20 14:19:31 GMT [4281]: [13-1] user=,db=,client= DETAIL: The postmaster has commanded this server process to roll back the current transaction and exit, because another server process exited abnormally and possibly corrupted shared memory.
2015-03-20 14:19:31 GMT [4281]: [14-1] user=,db=,client= HINT: In a moment you should be able to reconnect to the database and repeat your command.
2015-03-20 14:19:31 GMT [4281]: [15-1] user=,db=,client= DEBUG: shmem_exit(-1): 0 before_shmem_exit callbacks to make
2015-03-20 14:19:31 GMT [4281]: [16-1] user=,db=,client= DEBUG: shmem_exit(-1): 0 on_shmem_exit callbacks to make
2015-03-20 14:19:31 GMT [4281]: [17-1] user=,db=,client= DEBUG: proc_exit(-1): 0 callbacks to make
2015-03-20 14:19:31 GMT [4282]: [4-1] user=,db=,client= DEBUG: writing stats file "pg_stat/global.stat"
2015-03-20 14:19:31 GMT [4282]: [5-1] user=,db=,client= DEBUG: writing stats file "pg_stat/db_16384.stat"
2015-03-20 14:19:31 GMT [4282]: [6-1] user=,db=,client= DEBUG: removing temporary stats file "pg_stat_tmp/db_16384.stat"
2015-03-20 14:19:31 GMT [4282]: [7-1] user=,db=,client= DEBUG: writing stats file "pg_stat/db_0.stat"
2015-03-20 14:19:31 GMT [4282]: [8-1] user=,db=,client= DEBUG: removing temporary stats file "pg_stat_tmp/db_0.stat"
2015-03-20 14:19:31 GMT [4282]: [9-1] user=,db=,client= DEBUG: shmem_exit(-1): 0 before_shmem_exit callbacks to make
2015-03-20 14:19:31 GMT [4282]: [10-1] user=,db=,client= DEBUG: shmem_exit(-1): 0 on_shmem_exit callbacks to make
2015-03-20 14:19:31 GMT [4282]: [11-1] user=,db=,client= DEBUG: proc_exit(-1): 0 callbacks to make
2015-03-20 14:19:31 GMT [4273]: [89-1] user=,db=,client= DEBUG: reaping dead processes
2015-03-20 14:19:31 GMT [4273]: [90-1] user=,db=,client= DEBUG: server process (PID 4286) exited with exit code 2
2015-03-20 14:19:31 GMT [4273]: [91-1] user=,db=,client= DEBUG: reaping dead processes
2015-03-20 14:19:31 GMT [4273]: [92-1] user=,db=,client= LOG: all server processes terminated; reinitializing
2015-03-20 14:19:31 GMT [4273]: [93-1] user=,db=,client= DEBUG: shmem_exit(1): 0 before_shmem_exit callbacks to make
2015-03-20 14:19:31 GMT [4273]: [94-1] user=,db=,client= DEBUG: shmem_exit(1): 4 on_shmem_exit callbacks to make
2015-03-20 14:19:31 GMT [4273]: [95-1] user=,db=,client= DEBUG: cleaning up dynamic shared memory control segment with ID 1804289383
2015-03-20 14:19:31 GMT [4273]: [96-1] user=,db=,client= DEBUG: invoking IpcMemoryCreate(size=4211515392)
2015-03-20 14:19:31 GMT [4273]: [97-1] user=,db=,client= DEBUG: mmap with MAP_HUGETLB failed, huge pages disabled: Cannot allocate memory
2015-03-20 14:19:31 GMT [4273]: [98-1] user=,db=,client= DEBUG: SlruScanDirectory invoking callback on pg_notify/0000
2015-03-20 14:19:31 GMT [4273]: [99-1] user=,db=,client= DEBUG: removing file "pg_notify/0000"
2015-03-20 14:19:31 GMT [4273]: [100-1] user=,db=,client= DEBUG: dynamic shared memory system will support 312 segments
2015-03-20 14:19:31 GMT [4273]: [101-1] user=,db=,client= DEBUG: created dynamic shared memory control segment 1478456815 (2508 bytes)
2015-03-20 14:19:31 GMT [4287]: [1-1] user=,db=,client= LOG: database system was interrupted; last known up at 2015-03-20 14:19:19 GMT
2015-03-20 14:19:31 GMT [4287]: [2-1] user=,db=,client= DEBUG: checkpoint record is at 1/C4354BA0
2015-03-20 14:19:31 GMT [4287]: [3-1] user=,db=,client= DEBUG: redo record is at 1/C4354BA0; shutdown TRUE
2015-03-20 14:19:31 GMT [4287]: [4-1] user=,db=,client= DEBUG: next transaction ID: 0/2046; next OID: 24576
2015-03-20 14:19:31 GMT [4287]: [5-1] user=,db=,client= DEBUG: next MultiXactId: 1; next MultiXactOffset: 0
2015-03-20 14:19:31 GMT [4287]: [6-1] user=,db=,client= DEBUG: oldest unfrozen transaction ID: 676, in database 1
2015-03-20 14:19:31 GMT [4287]: [7-1] user=,db=,client= DEBUG: oldest MultiXactId: 1, in database 1
2015-03-20 14:19:31 GMT [4287]: [8-1] user=,db=,client= DEBUG: commit timestamp Xid oldest/newest: 0/0
2015-03-20 14:19:31 GMT [4287]: [9-1] user=,db=,client= DEBUG: transaction ID wrap limit is 2147484323, limited by database with OID 1
2015-03-20 14:19:31 GMT [4287]: [10-1] user=,db=,client= DEBUG: MultiXactId wrap limit is 2147483648, limited by database with OID 1
2015-03-20 14:19:31 GMT [4287]: [11-1] user=,db=,client= DEBUG: starting up replication slots
2015-03-20 14:19:31 GMT [4287]: [12-1] user=,db=,client= LOG: database system was not properly shut down; automatic recovery in progress
2015-03-20 14:19:31 GMT [4287]: [13-1] user=,db=,client= DEBUG: resetting unlogged relations: cleanup 1 init 0
2015-03-20 14:19:31 GMT [4287]: [14-1] user=,db=,client= LOG: invalid record length at 1/C4354C10
2015-03-20 14:19:31 GMT [4287]: [15-1] user=,db=,client= LOG: redo is not required
2015-03-20 14:19:31 GMT [4287]: [16-1] user=,db=,client= DEBUG: resetting unlogged relations: cleanup 0 init 1
2015-03-20 14:19:31 GMT [4287]: [17-1] user=,db=,client= DEBUG: performing replication slot checkpoint
2015-03-20 14:19:31 GMT [4287]: [18-1] user=,db=,client= DEBUG: attempting to remove WAL segments older than log file 0000000000000001000000C3
2015-03-20 14:19:31 GMT [4287]: [19-1] user=,db=,client= DEBUG: SlruScanDirectory invoking callback on pg_multixact/offsets/0000
2015-03-20 14:19:31 GMT [4287]: [20-1] user=,db=,client= DEBUG: SlruScanDirectory invoking callback on pg_multixact/members/0000
2015-03-20 14:19:31 GMT [4287]: [21-1] user=,db=,client= DEBUG: SlruScanDirectory invoking callback on pg_multixact/offsets/0000
2015-03-20 14:19:31 GMT [4287]: [22-1] user=,db=,client= DEBUG: shmem_exit(0): 1 before_shmem_exit callbacks to make
2015-03-20 14:19:31 GMT [4287]: [23-1] user=,db=,client= DEBUG: shmem_exit(0): 3 on_shmem_exit callbacks to make
2015-03-20 14:19:31 GMT [4287]: [24-1] user=,db=,client= DEBUG: proc_exit(0): 2 callbacks to make
2015-03-20 14:19:31 GMT [4287]: [25-1] user=,db=,client= DEBUG: exit(0)
2015-03-20 14:19:31 GMT [4287]: [26-1] user=,db=,client= DEBUG: shmem_exit(-1): 0 before_shmem_exit callbacks to make
2015-03-20 14:19:31 GMT [4287]: [27-1] user=,db=,client= DEBUG: shmem_exit(-1): 0 on_shmem_exit callbacks to make
2015-03-20 14:19:31 GMT [4287]: [28-1] user=,db=,client= DEBUG: proc_exit(-1): 0 callbacks to make
2015-03-20 14:19:31 GMT [4273]: [102-1] user=,db=,client= DEBUG: reaping dead processes
2015-03-20 14:19:31 GMT [4288]: [1-1] user=,db=,client= DEBUG: checkpointer updated shared memory configuration values
2015-03-20 14:19:31 GMT [4273]: [103-1] user=,db=,client= LOG: database system is ready to accept connections

--
Thom
On Fri, Mar 20, 2015 at 10:24 AM, Thom Brown <thom@linux.com> wrote:
> 2015-03-20 14:19:31 GMT [4273]: [78-1] user=,db=,client= DEBUG:
> server process (PID 4285) was terminated by signal 11: Segmentation
> fault

Any chance you can get us a stack backtrace of this crash?

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
On 20 March 2015 at 15:02, Robert Haas <robertmhaas@gmail.com> wrote:
> On Fri, Mar 20, 2015 at 10:24 AM, Thom Brown <thom@linux.com> wrote:
>> 2015-03-20 14:19:31 GMT [4273]: [78-1] user=,db=,client= DEBUG:
>> server process (PID 4285) was terminated by signal 11: Segmentation
>> fault
>
> Any chance you can get us a stack backtrace of this crash?

(gdb) cont
Continuing.

Program received signal SIGSEGV, Segmentation fault.
0x0000000000770843 in pfree ()
(gdb) bt
#0 0x0000000000770843 in pfree ()
#1 0x00000000005a382f in ExecEndFunnel ()
#2 0x000000000059fe75 in ExecEndAppend ()
#3 0x00000000005920bd in standard_ExecutorEnd ()
#4 0x000000000055004b in ExplainOnePlan ()
#5 0x000000000055025d in ExplainOneQuery ()
#6 0x000000000055064d in ExplainQuery ()
#7 0x0000000000680db1 in standard_ProcessUtility ()
#8 0x000000000067e1c1 in PortalRunUtility ()
#9 0x000000000067ef1d in FillPortalStore ()
#10 0x000000000067f8eb in PortalRun ()
#11 0x000000000067d628 in PostgresMain ()
#12 0x0000000000462c5e in ServerLoop ()
#13 0x000000000062e363 in PostmasterMain ()
#14 0x00000000004636ad in main ()

--
Thom
On Fri, Mar 20, 2015 at 11:08 AM, Thom Brown <thom@linux.com> wrote:
> On 20 March 2015 at 15:02, Robert Haas <robertmhaas@gmail.com> wrote:
>> On Fri, Mar 20, 2015 at 10:24 AM, Thom Brown <thom@linux.com> wrote:
>>> 2015-03-20 14:19:31 GMT [4273]: [78-1] user=,db=,client= DEBUG:
>>> server process (PID 4285) was terminated by signal 11: Segmentation
>>> fault
>>
>> Any chance you can get us a stack backtrace of this crash?
>
> (gdb) cont
> Continuing.
>
> Program received signal SIGSEGV, Segmentation fault.
> 0x0000000000770843 in pfree ()
> (gdb) bt
> #0 0x0000000000770843 in pfree ()
> #1 0x00000000005a382f in ExecEndFunnel ()
> #2 0x000000000059fe75 in ExecEndAppend ()
> #3 0x00000000005920bd in standard_ExecutorEnd ()
> #4 0x000000000055004b in ExplainOnePlan ()
> #5 0x000000000055025d in ExplainOneQuery ()
> #6 0x000000000055064d in ExplainQuery ()
> #7 0x0000000000680db1 in standard_ProcessUtility ()
> #8 0x000000000067e1c1 in PortalRunUtility ()
> #9 0x000000000067ef1d in FillPortalStore ()
> #10 0x000000000067f8eb in PortalRun ()
> #11 0x000000000067d628 in PostgresMain ()
> #12 0x0000000000462c5e in ServerLoop ()
> #13 0x000000000062e363 in PostmasterMain ()
> #14 0x00000000004636ad in main ()

OK, thanks. That looks like it's probably the fault of the parallel seq
scan patch rather than this one. It would help if you could build with
debug symbols so that we can see line numbers and arguments.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
On 20 March 2015 at 15:25, Robert Haas <robertmhaas@gmail.com> wrote:
> On Fri, Mar 20, 2015 at 11:08 AM, Thom Brown <thom@linux.com> wrote:
>> On 20 March 2015 at 15:02, Robert Haas <robertmhaas@gmail.com> wrote:
>>> On Fri, Mar 20, 2015 at 10:24 AM, Thom Brown <thom@linux.com> wrote:
>>>> 2015-03-20 14:19:31 GMT [4273]: [78-1] user=,db=,client= DEBUG:
>>>> server process (PID 4285) was terminated by signal 11: Segmentation
>>>> fault
>>>
>>> Any chance you can get us a stack backtrace of this crash?
>>
>> (gdb) cont
>> Continuing.
>>
>> Program received signal SIGSEGV, Segmentation fault.
>> 0x0000000000770843 in pfree ()
>> (gdb) bt
>> #0 0x0000000000770843 in pfree ()
>> #1 0x00000000005a382f in ExecEndFunnel ()
>> #2 0x000000000059fe75 in ExecEndAppend ()
>> #3 0x00000000005920bd in standard_ExecutorEnd ()
>> #4 0x000000000055004b in ExplainOnePlan ()
>> #5 0x000000000055025d in ExplainOneQuery ()
>> #6 0x000000000055064d in ExplainQuery ()
>> #7 0x0000000000680db1 in standard_ProcessUtility ()
>> #8 0x000000000067e1c1 in PortalRunUtility ()
>> #9 0x000000000067ef1d in FillPortalStore ()
>> #10 0x000000000067f8eb in PortalRun ()
>> #11 0x000000000067d628 in PostgresMain ()
>> #12 0x0000000000462c5e in ServerLoop ()
>> #13 0x000000000062e363 in PostmasterMain ()
>> #14 0x00000000004636ad in main ()
>
> OK, thanks. That looks like it's probably the fault of parallel seq
> scan patch rather than this one. It would help if you could build
> with debug symbols so that we can see line numbers and arguments.

Sure.

Program received signal SIGABRT, Aborted.
0x00007f5a49fce1d5 in raise () from /lib/x86_64-linux-gnu/libc.so.6
(gdb) bt
#0 0x00007f5a49fce1d5 in raise () from /lib/x86_64-linux-gnu/libc.so.6
#1 0x00007f5a49fd1388 in abort () from /lib/x86_64-linux-gnu/libc.so.6
#2 0x00000000007a053a in ExceptionalCondition (conditionName=conditionName@entry=0x813a4b "!(IsInParallelMode())", errorType=errorType@entry=0x7da1d6 "FailedAssertion", fileName=fileName@entry=0x81397d "parallel.c", lineNumber=lineNumber@entry=123) at assert.c:54
#3 0x00000000004cd5ba in CreateParallelContext (entrypoint=entrypoint@entry=0x659d2c <ParallelQueryMain>, nworkers=nworkers@entry=8) at parallel.c:123
#4 0x000000000065a1c0 in InitializeParallelWorkers (plan=0x281e6a0, estate=estate@entry=0x28b99a8, rel=rel@entry=0x7f594eab2370, inst_options_space=inst_options_space@entry=0x28bbfa8, buffer_usage_space=buffer_usage_space@entry=0x28bbfb0, responseqp=responseqp@entry=0x28bbf98, pcxtp=pcxtp@entry=0x28bbf90, nWorkers=8) at backendworker.c:279
#5 0x00000000005d0e75 in InitFunnel (node=node@entry=0x28bbf00, estate=estate@entry=0x28b99a8, eflags=eflags@entry=17) at nodeFunnel.c:61
#6 0x00000000005d1026 in ExecInitFunnel (node=0x281e738, estate=0x28b99a8, eflags=17) at nodeFunnel.c:121
#7 0x00000000005c0f95 in ExecInitNode (node=0x281e738, estate=estate@entry=0x28b99a8, eflags=eflags@entry=17) at execProcnode.c:201
#8 0x00000000005cd316 in ExecInitAppend (node=<optimized out>, estate=0x28b99a8, eflags=17) at nodeAppend.c:168
#9 0x00000000005c0f25 in ExecInitNode (node=0x288b990, estate=estate@entry=0x28b99a8, eflags=eflags@entry=17) at execProcnode.c:163
#10 0x00000000005ce849 in ExecInitAgg (node=0x288ba28, estate=0x28b99a8, eflags=17) at nodeAgg.c:1580
#11 0x00000000005c10bf in ExecInitNode (node=node@entry=0x288ba28, estate=estate@entry=0x28b99a8, eflags=eflags@entry=17) at execProcnode.c:302
#12 0x00000000005bfb35 in InitPlan (queryDesc=queryDesc@entry=0x28b5868, eflags=eflags@entry=17) at execMain.c:939
#13 0x00000000005bfd49 in standard_ExecutorStart (queryDesc=0x28b5868, eflags=17) at execMain.c:234
#14 0x00000000005bfd95 in ExecutorStart (queryDesc=queryDesc@entry=0x28b5868, eflags=eflags@entry=1) at execMain.c:134
#15 0x0000000000573f21 in ExplainOnePlan (plannedstmt=plannedstmt@entry=0x28b7878, into=into@entry=0x0, es=es@entry=0x24cde68, queryString=queryString@entry=0x248a398 "EXPLAIN SELECT DISTINCT bid FROM pgbench_accounts;", params=params@entry=0x0, planduration=planduration@entry=0x7fffb64f4bf0) at explain.c:478
#16 0x0000000000574160 in ExplainOneQuery (query=<optimized out>, into=into@entry=0x0, es=es@entry=0x24cde68, queryString=queryString@entry=0x248a398 "EXPLAIN SELECT DISTINCT bid FROM pgbench_accounts;", params=params@entry=0x0) at explain.c:346
#17 0x000000000057478a in ExplainQuery (stmt=stmt@entry=0x248b1b0, queryString=queryString@entry=0x248a398 "EXPLAIN SELECT DISTINCT bid FROM pgbench_accounts;", params=params@entry=0x0, dest=dest@entry=0x24cddd0) at explain.c:234
#18 0x00000000006c6419 in standard_ProcessUtility (parsetree=0x248b1b0, queryString=0x248a398 "EXPLAIN SELECT DISTINCT bid FROM pgbench_accounts;", context=PROCESS_UTILITY_TOPLEVEL, params=0x0, dest=0x24cddd0, completionTag=0x7fffb64f4d90 "") at utility.c:657
#19 0x00000000006c6808 in ProcessUtility (parsetree=parsetree@entry=0x248b1b0, queryString=<optimized out>, context=context@entry=PROCESS_UTILITY_TOPLEVEL, params=<optimized out>, dest=dest@entry=0x24cddd0, completionTag=completionTag@entry=0x7fffb64f4d90 "") at utility.c:333
#20 0x00000000006c3272 in PortalRunUtility (portal=portal@entry=0x24f2e28, utilityStmt=0x248b1b0, isTopLevel=<optimized out>, dest=dest@entry=0x24cddd0, completionTag=completionTag@entry=0x7fffb64f4d90 "") at pquery.c:1188
#21 0x00000000006c4039 in FillPortalStore (portal=portal@entry=0x24f2e28, isTopLevel=isTopLevel@entry=1 '\001') at pquery.c:1062
#22 0x00000000006c4a12 in PortalRun (portal=portal@entry=0x24f2e28, count=count@entry=9223372036854775807, isTopLevel=isTopLevel@entry=1 '\001', dest=dest@entry=0x248b5e8, altdest=altdest@entry=0x248b5e8, completionTag=completionTag@entry=0x7fffb64f4fa0 "") at pquery.c:786
#23 0x00000000006c12c3 in exec_simple_query (query_string=query_string@entry=0x248a398 "EXPLAIN SELECT DISTINCT bid FROM pgbench_accounts;") at postgres.c:1107
#24 0x00000000006c2de4 in PostgresMain (argc=<optimized out>, argv=argv@entry=0x2421c28, dbname=0x2421a90 "pgbench", username=<optimized out>) at postgres.c:4118
#25 0x0000000000665c55 in BackendRun (port=port@entry=0x2447540) at postmaster.c:4148
#26 0x00000000006675a8 in BackendStartup (port=port@entry=0x2447540) at postmaster.c:3833
#27 0x000000000066784b in ServerLoop () at postmaster.c:1601
#28 0x000000000066898d in PostmasterMain (argc=argc@entry=1, argv=argv@entry=0x2420c90) at postmaster.c:1248
#29 0x00000000005f5a25 in main (argc=1, argv=0x2420c90) at main.c:221

--
Thom
On Fri, Mar 20, 2015 at 1:24 PM, Thom Brown <thom@linux.com> wrote:
>> OK, thanks. That looks like it's probably the fault of parallel seq
>> scan patch rather than this one. It would help if you could build
>> with debug symbols so that we can see line numbers and arguments.
>
> Sure.
>
> Program received signal SIGABRT, Aborted.
> 0x00007f5a49fce1d5 in raise () from /lib/x86_64-linux-gnu/libc.so.6
> (gdb) bt
> #0 0x00007f5a49fce1d5 in raise () from /lib/x86_64-linux-gnu/libc.so.6
> #1 0x00007f5a49fd1388 in abort () from /lib/x86_64-linux-gnu/libc.so.6
> #2 0x00000000007a053a in ExceptionalCondition (conditionName=conditionName@entry=0x813a4b "!(IsInParallelMode())", errorType=errorType@entry=0x7da1d6 "FailedAssertion", fileName=fileName@entry=0x81397d "parallel.c", lineNumber=lineNumber@entry=123) at assert.c:54
> #3 0x00000000004cd5ba in CreateParallelContext (entrypoint=entrypoint@entry=0x659d2c <ParallelQueryMain>, nworkers=nworkers@entry=8) at parallel.c:123
> #4 0x000000000065a1c0 in InitializeParallelWorkers (plan=0x281e6a0, estate=estate@entry=0x28b99a8, rel=rel@entry=0x7f594eab2370, inst_options_space=inst_options_space@entry=0x28bbfa8, buffer_usage_space=buffer_usage_space@entry=0x28bbfb0, responseqp=responseqp@entry=0x28bbf98, pcxtp=pcxtp@entry=0x28bbf90, nWorkers=8) at backendworker.c:279
> #5 0x00000000005d0e75 in InitFunnel (node=node@entry=0x28bbf00, estate=estate@entry=0x28b99a8, eflags=eflags@entry=17) at nodeFunnel.c:61
> #6 0x00000000005d1026 in ExecInitFunnel (node=0x281e738, estate=0x28b99a8, eflags=17) at nodeFunnel.c:121
> #7 0x00000000005c0f95 in ExecInitNode (node=0x281e738, estate=estate@entry=0x28b99a8, eflags=eflags@entry=17) at execProcnode.c:201
> #8 0x00000000005cd316 in ExecInitAppend (node=<optimized out>, estate=0x28b99a8, eflags=17) at nodeAppend.c:168
> #9 0x00000000005c0f25 in ExecInitNode (node=0x288b990, estate=estate@entry=0x28b99a8, eflags=eflags@entry=17) at execProcnode.c:163
> #10 0x00000000005ce849 in ExecInitAgg (node=0x288ba28, estate=0x28b99a8, eflags=17) at nodeAgg.c:1580
> #11 0x00000000005c10bf in ExecInitNode (node=node@entry=0x288ba28, estate=estate@entry=0x28b99a8, eflags=eflags@entry=17) at execProcnode.c:302
> #12 0x00000000005bfb35 in InitPlan (queryDesc=queryDesc@entry=0x28b5868, eflags=eflags@entry=17) at execMain.c:939
> #13 0x00000000005bfd49 in standard_ExecutorStart (queryDesc=0x28b5868, eflags=17) at execMain.c:234
> #14 0x00000000005bfd95 in ExecutorStart (queryDesc=queryDesc@entry=0x28b5868, eflags=eflags@entry=1) at execMain.c:134
> #15 0x0000000000573f21 in ExplainOnePlan (plannedstmt=plannedstmt@entry=0x28b7878, into=into@entry=0x0, es=es@entry=0x24cde68, queryString=queryString@entry=0x248a398 "EXPLAIN SELECT DISTINCT bid FROM pgbench_accounts;", params=params@entry=0x0, planduration=planduration@entry=0x7fffb64f4bf0) at explain.c:478
> #16 0x0000000000574160 in ExplainOneQuery (query=<optimized out>, into=into@entry=0x0, es=es@entry=0x24cde68, queryString=queryString@entry=0x248a398 "EXPLAIN SELECT DISTINCT bid FROM pgbench_accounts;", params=params@entry=0x0) at explain.c:346
> #17 0x000000000057478a in ExplainQuery (stmt=stmt@entry=0x248b1b0, queryString=queryString@entry=0x248a398 "EXPLAIN SELECT DISTINCT bid FROM pgbench_accounts;", params=params@entry=0x0, dest=dest@entry=0x24cddd0) at explain.c:234
> #18 0x00000000006c6419 in standard_ProcessUtility (parsetree=0x248b1b0, queryString=0x248a398 "EXPLAIN SELECT DISTINCT bid FROM pgbench_accounts;", context=PROCESS_UTILITY_TOPLEVEL, params=0x0, dest=0x24cddd0, completionTag=0x7fffb64f4d90 "") at utility.c:657
> #19 0x00000000006c6808 in ProcessUtility (parsetree=parsetree@entry=0x248b1b0, queryString=<optimized out>, context=context@entry=PROCESS_UTILITY_TOPLEVEL, params=<optimized out>, dest=dest@entry=0x24cddd0, completionTag=completionTag@entry=0x7fffb64f4d90 "") at utility.c:333
> #20 0x00000000006c3272 in PortalRunUtility (portal=portal@entry=0x24f2e28, utilityStmt=0x248b1b0, isTopLevel=<optimized out>, dest=dest@entry=0x24cddd0, completionTag=completionTag@entry=0x7fffb64f4d90 "") at pquery.c:1188
> #21 0x00000000006c4039 in FillPortalStore (portal=portal@entry=0x24f2e28, isTopLevel=isTopLevel@entry=1 '\001') at pquery.c:1062
> #22 0x00000000006c4a12 in PortalRun (portal=portal@entry=0x24f2e28, count=count@entry=9223372036854775807, isTopLevel=isTopLevel@entry=1 '\001', dest=dest@entry=0x248b5e8, altdest=altdest@entry=0x248b5e8, completionTag=completionTag@entry=0x7fffb64f4fa0 "") at pquery.c:786
> #23 0x00000000006c12c3 in exec_simple_query (query_string=query_string@entry=0x248a398 "EXPLAIN SELECT DISTINCT bid FROM pgbench_accounts;") at postgres.c:1107
> #24 0x00000000006c2de4 in PostgresMain (argc=<optimized out>, argv=argv@entry=0x2421c28, dbname=0x2421a90 "pgbench", username=<optimized out>) at postgres.c:4118
> #25 0x0000000000665c55 in BackendRun (port=port@entry=0x2447540) at postmaster.c:4148
> #26 0x00000000006675a8 in BackendStartup (port=port@entry=0x2447540) at postmaster.c:3833
> #27 0x000000000066784b in ServerLoop () at postmaster.c:1601
> #28 0x000000000066898d in PostmasterMain (argc=argc@entry=1, argv=argv@entry=0x2420c90) at postmaster.c:1248
> #29 0x00000000005f5a25 in main (argc=1, argv=0x2420c90) at main.c:221

That might be a different crash than the first one you showed. But it
looks like the problem here is that the parallel sequential scan patch
is calling CreateParallelContext even though this is just an EXPLAIN
and we're not actually running the query. It shouldn't do that. (This
might be an argument for postponing CreateParallelContext() until run
time, as I've suggested before.)

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
On Fri, Mar 20, 2015 at 7:54 PM, Thom Brown <thom@linux.com> wrote:
>
> On 20 March 2015 at 13:55, Thom Brown <thom@linux.com> wrote:
> > On 20 March 2015 at 13:16, Alvaro Herrera <alvherre@2ndquadrant.com> wrote:
> >> Thom Brown wrote:
> >>> On 18 March 2015 at 16:01, Robert Haas <robertmhaas@gmail.com> wrote:
> >>> > On Tue, Mar 17, 2015 at 9:48 AM, Robert Haas <robertmhaas@gmail.com> wrote:
> >>> >> On Tue, Mar 17, 2015 at 2:26 AM, Noah Misch <noah@leadboat.com> wrote:
> >>> >>> Neither that rule, nor its variant downthread, would hurt operator authors too
> >>> >>> much. To make the planner categorically parallel-safe, though, means limiting
> >>> >>> evaluate_function() to parallel-safe functions. That would dramatically slow
> >>> >>> selected queries. It's enough for the PL scenario if planning a parallel-safe
> >>> >>> query is itself parallel-safe. If the planner is parallel-unsafe when
> >>> >>> planning a parallel-unsafe query, what would suffer?
> >>> >>
> >>> >> Good point. So I guess the rule can be that planning a parallel-safe
> >>> >> query should be parallel-safe. From there, it follows that estimators
> >>> >> for a parallel-safe operator must also be parallel-safe. Which seems
> >>> >> fine.
> >>> >
> >>> > More work is needed here, but for now, here is a rebased patch, per
> >>> > Amit's request.
> >>>
> >>> This no longer applies due to changes in commit
> >>> 13dbc7a824b3f905904cab51840d37f31a07a9ef.
> >>
> >> You should be able to drop the pg_proc.h changes and run the supplied
> >> perl program. (I'm not sure that sending the patched pg_proc.h together
> >> with this patch is all that useful, really.)
> >
> > Thanks. All patches applied and building okay.
>
> Okay, breakage experienced, but not sure which thread this belongs on.
>
> createdb pgbench
> pgbench -i -s 200 pgbench
>
> CREATE TABLE pgbench_accounts_1 (CHECK (bid = 1)) INHERITS (pgbench_accounts);
> ...
> CREATE TABLE pgbench_accounts_200 (CHECK (bid = 200)) INHERITS
> (pgbench_accounts);
>
>
I think if you face the issue after applying the parallel_seqscan patch,
then you can report it on that thread, and if it turns out to be
related to another thread, we can shift the discussion of the
resolution to that thread.
> createdb pgbench
> pgbench -i -s 200 pgbench
>
> CREATE TABLE pgbench_accounts_1 (CHECK (bid = 1)) INHERITS (pgbench_accounts);
> ...
> CREATE TABLE pgbench_accounts_200 (CHECK (bid = 200)) INHERITS
> (pgbench_accounts);
>
I managed to reproduce the Assertion reported by you as:
#2 0x00000000007a053a in ExceptionalCondition (conditionName=conditionName@entry=0x813a4b "!(IsInParallelMode())", errorType=errorType@entry=0x7da1d6 "FailedAssertion", fileName=fileName@entry=0x81397d "parallel.c", lineNumber=lineNumber@entry=123) at assert.c:54
#3 0x00000000004cd5ba in CreateParallelContext (entrypoint=entrypoint@entry=0x659d2c <ParallelQueryMain>, nworkers=nworkers@entry=8) at parallel.c:123
The reason is that CreateParallelContext() expects to be called
in parallel mode, but we enter parallel mode only after InitPlan()
in standard_ExecutorStart(). So the probable fix could be
to call EnterParallelMode() before initializing the plan.
I still could not reproduce the crash you have reported as:
>> #0 0x0000000000770843 in pfree ()
>> #1 0x00000000005a382f in ExecEndFunnel ()
>> #2 0x000000000059fe75 in ExecEndAppend ()
>> #3 0x00000000005920bd in standard_ExecutorEnd ()
Could you let me know which patches you have tried,
and on top of which commit?
I am trying on the commit mentioned in mail [1]. Basically,
have you tried the versions mentioned in that mail:
HEAD Commit-id : 8d1f2390
parallel-mode-v8.1.patch [2]
assess-parallel-safety-v4.patch [1]
parallel-heap-scan.patch [3]
parallel_seqscan_v11.patch (Attached with this mail)
If you used something else, could you let me know so that I can
try to reproduce the issue you reported?
[1]
On 21 March 2015 at 14:28, Amit Kapila <amit.kapila16@gmail.com> wrote:
> On Fri, Mar 20, 2015 at 7:54 PM, Thom Brown <thom@linux.com> wrote:
> > createdb pgbench
> > pgbench -i -s 200 pgbench
> >
> > CREATE TABLE pgbench_accounts_1 (CHECK (bid = 1)) INHERITS (pgbench_accounts);
> > ...
> > CREATE TABLE pgbench_accounts_200 (CHECK (bid = 200)) INHERITS
> > (pgbench_accounts);
> >
> I managed to reproduce the Assertion reported by you as:
>
> #2 0x00000000007a053a in ExceptionalCondition (conditionName=conditionName@entry=0x813a4b "!(IsInParallelMode())", errorType=errorType@entry=0x7da1d6 "FailedAssertion", fileName=fileName@entry=0x81397d "parallel.c", lineNumber=lineNumber@entry=123) at assert.c:54
> #3 0x00000000004cd5ba in CreateParallelContext (entrypoint=entrypoint@entry=0x659d2c <ParallelQueryMain>, nworkers=nworkers@entry=8) at parallel.c:123
>
> The reason is that CreateParallelContext() expects to be called
> in ParallelMode and we enter into parallel-mode after InitPlan()
> in standard_ExecutorStart(). So the probable fix could be
> to EnterParallelMode before initializing the plan.
>
> I still could not reproduce the crash you have reported as:
> >> #0 0x0000000000770843 in pfree ()
> >> #1 0x00000000005a382f in ExecEndFunnel ()
> >> #2 0x000000000059fe75 in ExecEndAppend ()
> >> #3 0x00000000005920bd in standard_ExecutorEnd ()
>
> Could you let me know which all patches you have tried
> and on top of which commit.
>
> I am trying on the commit as mentioned in mail [1]. Basically
> have you tried the versions mentioned in that mail:
>
> HEAD Commit-id : 8d1f2390
> parallel-mode-v8.1.patch [2]
> assess-parallel-safety-v4.patch [1]
> parallel-heap-scan.patch [3]
> parallel_seqscan_v11.patch (Attached with this mail)
>
> If something else, could you let me know the same so that I can try
> that to reproduce the issue reported by you.
Looks like one of the patches I applied is newer than the one in your list:
HEAD Commit-id: 13a10c0ccd984643ef88997ac177da7c4b7e46a6
parallel-mode-v9.patch
assess-parallel-safety-v4.patch
parallel-heap-scan.patch
parallel_seqscan_v11.patch
--
Thom
On Sun, Mar 22, 2015 at 12:00 AM, Thom Brown <thom@linux.com> wrote:
>
> On 21 March 2015 at 14:28, Amit Kapila <amit.kapila16@gmail.com> wrote:
>>
>> On Fri, Mar 20, 2015 at 7:54 PM, Thom Brown <thom@linux.com> wrote:
>> > createdb pgbench
>> > pgbench -i -s 200 pgbench
>> >
>> > CREATE TABLE pgbench_accounts_1 (CHECK (bid = 1)) INHERITS (pgbench_accounts);
>> > ...
>> > CREATE TABLE pgbench_accounts_200 (CHECK (bid = 200)) INHERITS
>> > (pgbench_accounts);
>> >
>>
>> I managed to reproduce the Assertion reported by you as:
>>
>> #2 0x00000000007a053a in ExceptionalCondition (conditionName=conditionName@entry=0x813a4b "!(IsInParallelMode())", errorType=errorType@entry=0x7da1d6 "FailedAssertion", fileName=fileName@entry=0x81397d "parallel.c", lineNumber=lineNumber@entry=123) at assert.c:54
>> #3 0x00000000004cd5ba in CreateParallelContext (entrypoint=entrypoint@entry=0x659d2c <ParallelQueryMain>, nworkers=nworkers@entry=8) at parallel.c:123
>>
>> The reason is that CreateParallelContext() expects to be called
>> in ParallelMode and we enter into parallel-mode after InitPlan()
>> in standard_ExecutorStart(). So the probable fix could be
>> to EnterParallelMode before initializing the plan.
>>
>> I still could not reproduce the crash you have reported as:
>> >> #0 0x0000000000770843 in pfree ()
>> >> #1 0x00000000005a382f in ExecEndFunnel ()
>> >> #2 0x000000000059fe75 in ExecEndAppend ()
>> >> #3 0x00000000005920bd in standard_ExecutorEnd ()
>>
>>
>> Could you let me know which all patches you have tried
>> and on top of which commit.
>>
>> I am trying on the commit as mentioned in mail[1]. Basically
>> have you tried the versions mentioned in that mail:
>>
>> HEAD Commit-id : 8d1f2390
>> parallel-mode-v8.1.patch [2]
>> assess-parallel-safety-v4.patch [1]
>> parallel-heap-scan.patch [3]
>> parallel_seqscan_v11.patch (Attached with this mail)
>>
>> If something else, could you let me know the same so that I can try
>> that to reproduce the issue reported by you.
>>
>
> Looks like one of the patches I applied is newer than the one in your list:
>
Okay, then that was the reason why you were seeing the crash whereas
I could not reproduce it. However, I have integrated the patch with
v9 of the parallel-mode patch and posted it on the appropriate thread.
One point to note is that I think there is one small issue in one of the
latest commits [1]. After that is fixed, you can verify again with the
latest patch.
On Fri, Mar 20, 2015 at 11:37 PM, Robert Haas <robertmhaas@gmail.com> wrote:
>
>
> That might be a different crash than the first one you showed. But it
> looks like the problem here is that the parallel sequential scan patch
> is calling CreateParallelContext even though this is just an EXPLAIN
> and we're not actually running the query. It shouldn't do that.
> (This might be an argument for postponing CreateParallelContext()
> until run time, as I've suggested before.)
>
Okay, I have postponed the CreateParallelContext() until runtime in
the latest patch posted on Parallel Seq Scan thread.
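The shape of the change is roughly as below (just a sketch; the node and
field names here are illustrative and the actual patch may differ):

    /* in the Funnel node's first ExecProcNode call, not in ExecInitFunnel: */
    if (funnelstate->pcxt == NULL)
    {
        /*
         * First execution: create the parallel context and launch workers
         * only now.  A plain EXPLAIN never executes the node, so it never
         * reaches this code, and CreateParallelContext() is no longer
         * called outside of parallel mode.
         */
        funnelstate->pcxt = CreateParallelContext(ParallelQueryMain,
                                                  funnelstate->nworkers);
        LaunchParallelWorkers(funnelstate->pcxt);
    }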
On Sat, Mar 21, 2015 at 2:30 PM, Thom Brown <thom@linux.com> wrote:
> Looks like one of the patches I applied is newer than the one in your list:
>
> HEAD Commit-id: 13a10c0ccd984643ef88997ac177da7c4b7e46a6
> parallel-mode-v9.patch
> assess-parallel-safety-v4.patch
> parallel-heap-scan.patch
> parallel_seqscan_v11.patch

This patch hasn't been updated for a while, so here is a rebased
version now that we're hopefully mostly done making changes to
pg_proc.h for 9.5. parallel-mode-v10 was mostly committed, and
parallel-heap-scan has now been incorporated into the parallel_seqscan
patch. So you should now be able to test this feature by applying
just this patch, and the new version of the parallel seqscan patch,
which I am given to understand that Amit will be posting pretty soon.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
On Thu, May 21, 2015 at 10:19 PM, Robert Haas <robertmhaas@gmail.com> wrote:
>
> On Sat, Mar 21, 2015 at 2:30 PM, Thom Brown <thom@linux.com> wrote:
> > Looks like one of the patches I applied is newer than the one in your list:
> >
> > HEAD Commit-id: 13a10c0ccd984643ef88997ac177da7c4b7e46a6
> > parallel-mode-v9.patch
> > assess-parallel-safety-v4.patch
> > parallel-heap-scan.patch
> > parallel_seqscan_v11.patch
>
> This patch hasn't been updated for a while, so here is a rebased
> version now that we're hopefully mostly done making changes to
> pg_proc.h for 9.5. parallel-mode-v10 was mostly committed, and
> parallel-heap-scan has now been incorporated into the parallel_seqscan
> patch. So you should now be able to test this feature by applying
> just this patch, and the new version of the parallel seqscan patch
> which I am given to understand that Amit will be posting pretty soon.
>
This patch again needs rebase, but anyway I think this should be present
in CF-1, as the parallel seq scan patch depends on it. So I am moving
this to the upcoming CF unless there is any objection to doing so.
I know that CF-1 has already started, but I just now realized that this
patch should also be present in it.
On Thu, Jul 2, 2015 at 8:49 AM, Amit Kapila <amit.kapila16@gmail.com> wrote:
>
> On Thu, May 21, 2015 at 10:19 PM, Robert Haas <robertmhaas@gmail.com> wrote:
> >
> > On Sat, Mar 21, 2015 at 2:30 PM, Thom Brown <thom@linux.com> wrote:
> > > Looks like one of the patches I applied is newer than the one in your list:
> > >
> > > HEAD Commit-id: 13a10c0ccd984643ef88997ac177da7c4b7e46a6
> > > parallel-mode-v9.patch
> > > assess-parallel-safety-v4.patch
> > > parallel-heap-scan.patch
> > > parallel_seqscan_v11.patch
> >
> > This patch hasn't been updated for a while, so here is a rebased
> > version now that we're hopefully mostly done making changes to
> > pg_proc.h for 9.5. parallel-mode-v10 was mostly committed, and
> > parallel-heap-scan has now been incorporated into the parallel_seqscan
> > patch. So you should now be able to test this feature by applying
> > just this patch, and the new version of the parallel seqscan patch
> > which I am given to understand that Amit will be posting pretty soon.
> >
>
> This patch again needs rebase.
Attached, find the rebased patch, which can be used to review/test the latest
version of the parallel_seqscan patch that I am going to post on the parallel
seq. scan thread soonish.
On Fri, Jul 3, 2015 at 5:30 PM, Amit Kapila <amit.kapila16@gmail.com> wrote:
>
>
> Attached, find the rebased patch which can be used to review/test latest
> version of parallel_seqscan patch which I am going to post in the parallel
> seq. scan thread soonish.
>
I went ahead and completed this patch by adding the new clause to
CREATE/ALTER FUNCTION which allows users to define the
parallel property for user-defined functions.
The new clause is defined as
Create Function ... PARALLEL {SAFE | RESTRICTED | UNSAFE}
Alter Function ... PARALLEL {SAFE | RESTRICTED | UNSAFE}
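For example, with this clause a function could be declared or changed as
follows (illustrative only; the function itself is made up):

    CREATE FUNCTION add_one(i integer) RETURNS integer
    AS $$ SELECT i + 1 $$
    LANGUAGE sql
    PARALLEL SAFE;

    ALTER FUNCTION add_one(integer) PARALLEL RESTRICTED;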
I have added PARALLEL as a non-reserved keyword and stored the other
parameters as options, which are verified during CreateFunction.
Apart from this, I added pg_dump support and updated the docs.
I have changed the parallel safety for some of the newly added
functions for pg_replication_origin* which will be defined as:
pg_replication_origin_create - unsafe, because it can start a transaction
pg_replication_origin_drop - unsafe, because it can start a transaction
pg_replication_origin_session_setup - unsafe, because it changes shared
memory state (ReplicationState) that is not shared among workers.
pg_replication_origin_session_reset - unsafe, because it changes shared
memory state (ReplicationState) that is not shared among workers.
pg_replication_origin_advance - unsafe, because it changes shared
memory state (ReplicationState) that is not shared among workers.
pg_replication_origin_progress - unsafe, because it can call XlogFlush
which can change shared memory state.
pg_replication_origin_session_progress - unsafe, because it can call
XlogFlush which can change shared memory state.
pg_replication_origin_xact_setup - Restricted,
because replorigin_sesssion_origin_lsn is not shared
pg_replication_origin_xact_reset - Restricted,
because replorigin_sesssion_origin_lsn is not shared
pg_replication_origin_session_is_setup - Restricted,
because replorigin_sesssion_origin is not shared
pg_show_replication_origin_status - Restricted, because
ReplicationState is not shared.
pg_replication_origin_oid - Safe
One issue which I think we should fix in this patch, as pointed out
earlier, is that in some cases a function containing a SELECT statement
won't be able to use a parallel plan even though it is marked as parallel safe.
create or replace function parallel_func_select() returns integer
as $$
declare
ret_val int;
begin
Select c1 into ret_val from t1 where c1 = 1;
return ret_val;
end;
$$ language plpgsql Parallel Safe;
The above function won't be able to use a parallel plan for the SELECT
statement because of the code below:
static int
exec_stmt_execsql(PLpgSQL_execstate *estate,
                  PLpgSQL_stmt_execsql *stmt)
{
    ..
    if (expr->plan == NULL)
    {
        ..
        exec_prepare_plan(estate, expr, 0);
    }
    ..
}
As the cursorOptions passed in this function are always 0, the planner
will treat the statement as parallel-unsafe. Shouldn't we check parallelOK
and pass the appropriate cursor option in the above function, as is done
in exec_run_select() in the patch?
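For reference, exec_run_select() in the patch prepares the plan roughly
like this (a sketch; parallelOK stands for the result of the patch's
parallel-safety check, and the flag name is taken from my reading of the
patch):

    exec_prepare_plan(estate, expr,
                      parallelOK ? CURSOR_OPT_PARALLEL_OK : 0);

and the question is whether exec_stmt_execsql() should do the same for
the non-cursor path.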
On Wed, Jul 15, 2015 at 1:50 AM, Amit Kapila <amit.kapila16@gmail.com> wrote:
> One issue which I think we should fix in this patch as pointed earlier
> is, in some cases, Function containing Select stmt won't be able to
> use parallel plan even though it is marked as parallel safe.
>
> create or replace function parallel_func_select() returns integer
> as $$
> declare
> ret_val int;
> begin
> Select c1 into ret_val from t1 where c1 = 1;
> return ret_val;
> end;
> $$ language plpgsql Parallel Safe;
>
> Above function won't be able to use parallel plan for Select statement
> because of below code:
>
> static int
> exec_stmt_execsql(PLpgSQL_execstate *estate,
>                   PLpgSQL_stmt_execsql *stmt)
> {
>     ..
>     if (expr->plan == NULL)
>     {
>         ..
>         exec_prepare_plan(estate, expr, 0);
>     }
>     ..
> }
>
> As the cursorOptions passed in this function are always 0, planner
> will treat it as unsafe function. Shouldn't we need to check parallelOk
> to pass cursoroption in above function as is done in exec_run_select()
> in patch?

exec_stmt_execsql is called by exec_stmt_open and exec_stmt_forc.
Those are cursor operations and thus - I think - parallelism can't be
used there. I think it would be OK to enable parallelism when we are
using it to execute a toplevel SQL statement, but I didn't do it
because I wasn't entirely sure that we couldn't pass the same
stmt->sqlstmt into exec_stmt_execsql from multiple contexts, some of
which are parallel-safe and others not. I don't think that can
happen, but my understanding of PL/pgsql is not the greatest.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
On Thu, Jul 16, 2015 at 2:02 AM, Robert Haas <robertmhaas@gmail.com> wrote:
>
> exec_stmt_execsql is called by exec_stmt_open and exec_stmt_forc.
> Those are cursor operations and thus - I think - parallelism can't be
> used there.
>
Right, but it also gets called from exec_stmt, where a parallel-safe
statement could be passed to it. So it seems to me that we should
enable parallelism for that path in the code.
On Wed, Jul 15, 2015 at 11:20 AM, Amit Kapila <amit.kapila16@gmail.com> wrote:
>
>
> I went ahead and completed this patch with respect to adding new clause
> in CREATE/ALTER FUNCTION which can allow users to define the
> parallel property for User defined functions.
>
Attached, find the rebased patch which can be used to review/test latest
version of parallel_seqscan patch which I am going to post in the parallel
seq. scan thread soonish.
On Thu, Sep 3, 2015 at 6:05 AM, Amit Kapila <amit.kapila16@gmail.com> wrote:
> On Wed, Jul 15, 2015 at 11:20 AM, Amit Kapila <amit.kapila16@gmail.com>
> wrote:
>> I went ahead and completed this patch with respect to adding new clause
>> in CREATE/ALTER FUNCTION which can allow users to define the
>> parallel property for User defined functions.
>
> Attached, find the rebased patch which can be used to review/test latest
> version of parallel_seqscan patch which I am going to post in the parallel
> seq. scan thread soonish.

I've committed this with some modifications. There's lots of room for
improvement here, but this is a tolerable first cut.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company