Thread: Re: [COMMITTERS] pgsql: Don't generate parallel paths for rels with parallel-restricted

On Thu, Jun 9, 2016 at 1:02 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Robert Haas <rhaas@postgresql.org> writes:
>> Don't generate parallel paths for rels with parallel-restricted outputs.
>
> Surely this bit is wrong on its face:
>
> @@ -971,6 +980,7 @@ set_append_rel_size(PlannerInfo *root, RelOptInfo *rel,
>             adjust_appendrel_attrs(root,
>                                    (Node *) rel->reltarget->exprs,
>                                    appinfo);
> +       childrel->reltarget_has_non_vars = rel->reltarget_has_non_vars;
>
>         /*
>          * We have to make child entries in the EquivalenceClass data
>
> The entire point of what we are doing here is that Vars might get replaced
> with things that are not Vars.  See the comment a dozen lines above.

Well, it doesn't look wrong to me (I added it, Amit's patch lacked it)
but it might be wrong all the same.  Are you saying that
adjust_appendrel_attrs() might translate Vars into non-Vars?  If so,
then yeah, this isn't right.

> More generally, if we need such a flag (which I doubt really), perhaps
> it should be part of PathTarget rather than expecting that it can
> successfully be maintained separately?

The problem with that is that it seems to be rather invasive.  We
don't really need this information for every PathTarget we build.
Currently, at least, we just need it for the default PathTargets for
join and scan relations.

> It seems like the only reason that we would need such a flag is that
> applying has_parallel_hazard() to a Var is a bit expensive thanks to
> the typeid_is_temp() test --- but if you ask me, that test is stupid
> and should be removed.  What's the argument for supposing that a temp
> table's rowtype is any more mutable intra-query than any other rowtype?

Even if has_parallel_hazard didn't involve a typeid_is_temp() test,
walking a possibly-long linked list and recursing through everything
in side of it doesn't seem like something that's so cheap as to make
it not worth avoiding.

But in response to your specific question, as the comment immediately
before that test explains, that's not the motivation at all.  If you
run the regression tests with the attached patch and
force_parallel_mode=on, you get failures like this:

*** /Users/rhaas/pgsql/src/test/regress/expected/rowtypes.out
2016-06-09 12:16:16.000000000 -0400---
/Users/rhaas/pgsql/src/test/regress/results/rowtypes.out    2016-06-09
13:21:59.000000000 -0400
***************
*** 38,48 ****
  (1 row)

  select '(Joe,)'::fullname;    -- ok, null 2nd column
!  fullname
! ----------
!  (Joe,)
! (1 row)
!
  select '(Joe)'::fullname;     -- bad
  ERROR:  malformed record literal: "(Joe)"
  LINE 1: select '(Joe)'::fullname;
--- 38,44 ----
  (1 row)

  select '(Joe,)'::fullname;    -- ok, null 2nd column
! ERROR:  cannot access temporary tables during a parallel operation
  select '(Joe)'::fullname;     -- bad
  ERROR:  malformed record literal: "(Joe)"
  LINE 1: select '(Joe)'::fullname;

That error is coming from relation_open().  It might be possible to
find a way to nerf the check in relation_open() enough to let this
case work while making the cases that we need to fail still fail, but
it wasn't obvious to me how to do that when I posted this patch and it
still isn't.  It would certainly be a good idea to improve this at
some point - perhaps by finding a way to make access to temporary
relations safe from within a parallel query - but there was no time
for that this release cycle.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Attachment
Robert Haas <robertmhaas@gmail.com> writes:
> On Thu, Jun 9, 2016 at 1:02 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> It seems like the only reason that we would need such a flag is that
>> applying has_parallel_hazard() to a Var is a bit expensive thanks to
>> the typeid_is_temp() test --- but if you ask me, that test is stupid
>> and should be removed.  What's the argument for supposing that a temp
>> table's rowtype is any more mutable intra-query than any other rowtype?

> That error is coming from relation_open().  It might be possible to
> find a way to nerf the check in relation_open() enough to let this
> case work while making the cases that we need to fail still fail,

Well, yeah, you could remove it.  It's inappropriate.  The right place
for such an error check is an attempt to actually access any data within
a temp table, ie someplace in localbuf.c.  There is no reason a worker
shouldn't be allowed to look at the catalog entries for a temp table;
they're just like any other catalog entries.
        regards, tom lane



On Thu, Jun 9, 2016 at 1:44 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Robert Haas <robertmhaas@gmail.com> writes:
>> On Thu, Jun 9, 2016 at 1:02 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>>> It seems like the only reason that we would need such a flag is that
>>> applying has_parallel_hazard() to a Var is a bit expensive thanks to
>>> the typeid_is_temp() test --- but if you ask me, that test is stupid
>>> and should be removed.  What's the argument for supposing that a temp
>>> table's rowtype is any more mutable intra-query than any other rowtype?
>
>> That error is coming from relation_open().  It might be possible to
>> find a way to nerf the check in relation_open() enough to let this
>> case work while making the cases that we need to fail still fail,
>
> Well, yeah, you could remove it.  It's inappropriate.  The right place
> for such an error check is an attempt to actually access any data within
> a temp table, ie someplace in localbuf.c.  There is no reason a worker
> shouldn't be allowed to look at the catalog entries for a temp table;
> they're just like any other catalog entries.

That's a possibility.  Do you think it's a good idea to go making such
changes right now, with beta2 just around the corner?  Do you want to
work on it?  Are you asking me to work on it?

There's one other related thing I'm concerned about, which is that the
code in namespace.c that manages pg_temp doesn't know anything about
parallelism. So it will interpret pg_temp to mean the pg_temp_NNN
schema for its own backend ID rather than the leader's backend ID.
I'm not sure that's a problem, but I haven't thought deeply about it.

Could you answer my question about whether adjust_appendrel_attrs()
might translate Vars into non-Vars?  The code comment in that function
header doesn't seem to me to very clear about it.  I'm guessing that
the answer is yes, so maybe the line of code you're complaining about
should just say:

childrel->reltarget_has_non_vars = true;

...but that seems like it might suck.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Robert Haas <robertmhaas@gmail.com> writes:
> On Thu, Jun 9, 2016 at 1:44 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> Well, yeah, you could remove it.  It's inappropriate.  The right place
>> for such an error check is an attempt to actually access any data within
>> a temp table, ie someplace in localbuf.c.  There is no reason a worker
>> shouldn't be allowed to look at the catalog entries for a temp table;
>> they're just like any other catalog entries.

> That's a possibility.  Do you think it's a good idea to go making such
> changes right now, with beta2 just around the corner?  Do you want to
> work on it?  Are you asking me to work on it?

I'll do it, if you don't want to.  The rowtype test in has_parallel_hazard
has made me acutely uncomfortable since I first saw it, because I don't
think it's either maintainable or adequate for its alleged purpose.
Never mind that it makes has_parallel_hazard probably several times slower
than it needs to be.

> There's one other related thing I'm concerned about, which is that the
> code in namespace.c that manages pg_temp doesn't know anything about
> parallelism. So it will interpret pg_temp to mean the pg_temp_NNN
> schema for its own backend ID rather than the leader's backend ID.
> I'm not sure that's a problem, but I haven't thought deeply about it.

Hmmm.  I'll take a look.

> Could you answer my question about whether adjust_appendrel_attrs()
> might translate Vars into non-Vars?

Yes, absolutely.  It may be that this code accidentally fails to fail
because nothing is actually looking at the flag for a childrel ...
but that's obviously not something to rely on long-term.

> The code comment in that function
> header doesn't seem to me to very clear about it.  I'm guessing that
> the answer is yes, so maybe the line of code you're complaining about
> should just say:
> childrel->reltarget_has_non_vars = true;
> ...but that seems like it might suck.

[ shrug... ]  I'm still of the opinion that we should just drop
reltarget_has_non_vars; the most charitable thing I can say about it
is that it's premature optimization.
        regards, tom lane



On Thu, Jun 9, 2016 at 2:08 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Robert Haas <robertmhaas@gmail.com> writes:
>> On Thu, Jun 9, 2016 at 1:44 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>>> Well, yeah, you could remove it.  It's inappropriate.  The right place
>>> for such an error check is an attempt to actually access any data within
>>> a temp table, ie someplace in localbuf.c.  There is no reason a worker
>>> shouldn't be allowed to look at the catalog entries for a temp table;
>>> they're just like any other catalog entries.
>
>> That's a possibility.  Do you think it's a good idea to go making such
>> changes right now, with beta2 just around the corner?  Do you want to
>> work on it?  Are you asking me to work on it?
>
> I'll do it, if you don't want to.  The rowtype test in has_parallel_hazard
> has made me acutely uncomfortable since I first saw it, because I don't
> think it's either maintainable or adequate for its alleged purpose.
> Never mind that it makes has_parallel_hazard probably several times slower
> than it needs to be.

It's been on my list of things to look into for a long time, but I
don't think it's likely to make it to the top in the next week.  So if
you feel motivated to have a whack at it, that's great.  I have not
been convinced that it is entirely straightforward, but maybe it is.

>> There's one other related thing I'm concerned about, which is that the
>> code in namespace.c that manages pg_temp doesn't know anything about
>> parallelism. So it will interpret pg_temp to mean the pg_temp_NNN
>> schema for its own backend ID rather than the leader's backend ID.
>> I'm not sure that's a problem, but I haven't thought deeply about it.
>
> Hmmm.  I'll take a look.

Thanks.

>> Could you answer my question about whether adjust_appendrel_attrs()
>> might translate Vars into non-Vars?
>
> Yes, absolutely.  It may be that this code accidentally fails to fail
> because nothing is actually looking at the flag for a childrel ...
> but that's obviously not something to rely on long-term.

Yes, I think that's not likely to hold up very long at all.

<ponders>

Hmm.  I wonder if the right thing to do is to clear the child's flag
if it is currently set, but leave it clear if it already is clear.
Let me go look at how that translated_vars mapping is constructed...

>> The code comment in that function
>> header doesn't seem to me to very clear about it.  I'm guessing that
>> the answer is yes, so maybe the line of code you're complaining about
>> should just say:
>> childrel->reltarget_has_non_vars = true;
>> ...but that seems like it might suck.
>
> [ shrug... ]  I'm still of the opinion that we should just drop
> reltarget_has_non_vars; the most charitable thing I can say about it
> is that it's premature optimization.

I'm surprised by that.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



I wrote:
> Robert Haas <robertmhaas@gmail.com> writes:
>> There's one other related thing I'm concerned about, which is that the
>> code in namespace.c that manages pg_temp doesn't know anything about
>> parallelism. So it will interpret pg_temp to mean the pg_temp_NNN
>> schema for its own backend ID rather than the leader's backend ID.
>> I'm not sure that's a problem, but I haven't thought deeply about it.

> Hmmm.  I'll take a look.

Yeah, that was pretty broken, but I believe I've fixed it.
        regards, tom lane



<div dir="ltr"><div class="gmail_extra"><div class="gmail_quote">On Thu, Jun 9, 2016 at 11:38 PM, Tom Lane <<a
href="mailto:tgl@sss.pgh.pa.us">tgl@sss.pgh.pa.us</a>>wrote:<br />><br />> Robert Haas <<a
href="mailto:robertmhaas@gmail.com">robertmhaas@gmail.com</a>>writes:<br />><br />> > Could you answer my
questionabout whether adjust_appendrel_attrs()<br />> > might translate Vars into non-Vars?<br />><br />>
Yes,absolutely.<br /><br />Isn't this true only for UNION ALL cases and not for inheritance child relations (at least
thatis what seems to be mentioned in comments for translated_vars in AppendRelInfo)?   If that is right, then I think
thereshouldn't be a problem w.r.t parallel plans even if adjust_appendrel_attrs() translate Vars into non-Vars, because
forUNION ALL it seems parent rels rtekind is RTE_SUBQUERY and we consider such rels as parallel unsafe (see
set_rel_consider_parallel()). So it doesn't matter even if child rels target list contains any parallel unsafe
expression,as we are not going to create parallel paths for such relations.</div><div class="gmail_quote"><br
/></div><divclass="gmail_quote"><br />With Regards,<br />Amit Kapila.<br />EnterpriseDB: <a
href="http://www.enterprisedb.com/"target="_blank">http://www.enterprisedb.com</a><br /></div></div></div> 
Amit Kapila <amit.kapila16@gmail.com> writes:
> On Thu, Jun 9, 2016 at 11:38 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> Robert Haas <robertmhaas@gmail.com> writes:
>>> Could you answer my question about whether adjust_appendrel_attrs()
>>> might translate Vars into non-Vars?

>> Yes, absolutely.

> Isn't this true only for UNION ALL cases and not for inheritance child
> relations (at least that is what seems to be mentioned in comments
> for translated_vars in AppendRelInfo)?

Correct.

> If that is right, then I think
> there shouldn't be a problem w.r.t parallel plans even if
> adjust_appendrel_attrs() translate Vars into non-Vars, because for UNION
> ALL it seems parent rels rtekind is RTE_SUBQUERY and we consider such rels
> as parallel unsafe (see set_rel_consider_parallel()).

Hm, that would explain why you haven't been able to exhibit a bug on HEAD
as it stands; but it doesn't make the code any less broken on its own
terms, or any less of a risk for future bugs when we try to relax the
no-subqueries restriction.

I am still of the opinion that reltarget_has_non_vars is simply a bad
idea.  It's wrong as-committed, it will be a permanent maintenance hazard,
and there is no evidence that it saves anything meaningful even as the
code stood yesterday, let alone after cae1c788b.  I think we should just
remove it and allow those has_parallel_hazard calls to occur
unconditionally, and will go do that unless I hear some really good
argument to the contrary.
        regards, tom lane



On Fri, Jun 10, 2016 at 10:50 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Amit Kapila <amit.kapila16@gmail.com> writes:
>> On Thu, Jun 9, 2016 at 11:38 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>>> Robert Haas <robertmhaas@gmail.com> writes:
>>>> Could you answer my question about whether adjust_appendrel_attrs()
>>>> might translate Vars into non-Vars?
>
>>> Yes, absolutely.
>
>> Isn't this true only for UNION ALL cases and not for inheritance child
>> relations (at least that is what seems to be mentioned in comments
>> for translated_vars in AppendRelInfo)?
>
> Correct.
>
>> If that is right, then I think
>> there shouldn't be a problem w.r.t parallel plans even if
>> adjust_appendrel_attrs() translate Vars into non-Vars, because for UNION
>> ALL it seems parent rels rtekind is RTE_SUBQUERY and we consider such rels
>> as parallel unsafe (see set_rel_consider_parallel()).
>
> Hm, that would explain why you haven't been able to exhibit a bug on HEAD
> as it stands; but it doesn't make the code any less broken on its own
> terms, or any less of a risk for future bugs when we try to relax the
> no-subqueries restriction.
>
> I am still of the opinion that reltarget_has_non_vars is simply a bad
> idea.  It's wrong as-committed, it will be a permanent maintenance hazard,
> and there is no evidence that it saves anything meaningful even as the
> code stood yesterday, let alone after cae1c788b.  I think we should just
> remove it and allow those has_parallel_hazard calls to occur
> unconditionally, and will go do that unless I hear some really good
> argument to the contrary.

OK.  We can always revisit it another time if need be.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



On Thu, Jun 9, 2016 at 8:17 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> I wrote:
>> Robert Haas <robertmhaas@gmail.com> writes:
>>> There's one other related thing I'm concerned about, which is that the
>>> code in namespace.c that manages pg_temp doesn't know anything about
>>> parallelism. So it will interpret pg_temp to mean the pg_temp_NNN
>>> schema for its own backend ID rather than the leader's backend ID.
>>> I'm not sure that's a problem, but I haven't thought deeply about it.
>
>> Hmmm.  I'll take a look.
>
> Yeah, that was pretty broken, but I believe I've fixed it.

Thanks.  Looks reasonable on a quick once-over.

For the record, I think much of what constitutes "broken" is arbitrary
here - anything that doesn't work can be labelled "well, that's not
supported under parallel query, label your function
parallel-restricted or parallel-unsafe".  And I think we're going to
have to take exactly that approach in many cases.  I have no illusions
that the current infrastructure covers everything that users will want
to do, and I think we'll be uncovering and removing limitations of
this sort for a long time.  But clearly the more state we synchronize,
the happier users will be.  I probably should have done something like
what you did here sooner, but I didn't think it could be done that
simply.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



On Fri, Jun 10, 2016 at 8:20 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>
> Amit Kapila <amit.kapila16@gmail.com> writes:
> > On Thu, Jun 9, 2016 at 11:38 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> >> Robert Haas <robertmhaas@gmail.com> writes:
> >>> Could you answer my question about whether adjust_appendrel_attrs()
> >>> might translate Vars into non-Vars?
>
> >> Yes, absolutely.
>
> > Isn't this true only for UNION ALL cases and not for inheritance child
> > relations (at least that is what seems to be mentioned in comments
> > for translated_vars in AppendRelInfo)?
>
> Correct.
>
> > If that is right, then I think
> > there shouldn't be a problem w.r.t parallel plans even if
> > adjust_appendrel_attrs() translate Vars into non-Vars, because for UNION
> > ALL it seems parent rels rtekind is RTE_SUBQUERY and we consider such rels
> > as parallel unsafe (see set_rel_consider_parallel()).
>
> Hm, that would explain why you haven't been able to exhibit a bug on HEAD
> as it stands;
>

Right, so I have moved "Failed assertion in parallel worker (ExecInitSubPlan)" item to CLOSE_WAIT state as I don't think there is any known pending issue in that item.  I have moved it to CLOSE_WAIT state because we have derived our queries to reproduce the problem based on original report[1].  If next run of sqlsmith doesn't show any problem in this context then we will move it to resolved.


Amit Kapila writes:

> I have moved it to CLOSE_WAIT state because we have derived our
> queries to reproduce the problem based on original report[1].  If next
> run of sqlsmith doesn't show any problem in this context then we will
> move it to resolved.

I don't have access to my testing horse power this weekend so I can
report on tuesday at the earliest.  Unless someone else feels like
running sqlsmith…

regards,
Andreas
--
SQLsmith error of the day: time zone "Bruce Momjian" not recognized.



On Sat, Jun 11, 2016 at 2:07 PM, Andreas Seltenreich <seltenreich@gmx.de> wrote:
>
> Amit Kapila writes:
>
> > I have moved it to CLOSE_WAIT state because we have derived our
> > queries to reproduce the problem based on original report[1].  If next
> > run of sqlsmith doesn't show any problem in this context then we will
> > move it to resolved.
>
> I don't have access to my testing horse power this weekend so I can
> report on tuesday at the earliest.  Unless someone else feels like
> running sqlsmith…
>

No issues and many thanks for testing.


With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com
Amit Kapila writes:

> Right, so I have moved "Failed assertion in parallel worker
> (ExecInitSubPlan)" item to CLOSE_WAIT state as I don't think there is any
> known pending issue in that item.  I have moved it to CLOSE_WAIT state
> because we have derived our queries to reproduce the problem based on
> original report[1].  If next run of sqlsmith doesn't show any problem in
> this context then we will move it to resolved.

It ran for about 100e6 queries by now without tripping on any parallel
worker related assertions.  I tested with min_parallel_relation_size_v1.patch
applied and set to 64kB to have more exposure during testing.

regards,
Andreas