Thread: FailedAssertion on partprune
Hi, I was trying sqlsmith on REL_11_STABLE (commit 1b957e59b92dc44c14708762f882d7910463a9ac) with a database I have at hand, and got an assertion failure. It seems to happen during planning, at pruning time, but only when tables get bigger than a certain size. I configured it with "--enable-debug --enable-depend --enable-cassert"; attached are the backtrace and a script to reproduce the problem. -- Jaime Casanova www.2ndQuadrant.com PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Attachment
On 24 July 2018 at 16:55, Jaime Casanova <jaime.casanova@2ndquadrant.com> wrote: > I was trying sqlsmith on REL_11_STABLE (commit > 1b957e59b92dc44c14708762f882d7910463a9ac) with a database i have at > hand, and got an assertion failure. > It seems to happen during planning on prunning time but only when > tables get bigger than certain size. > > I configured it with "--enable-debug --enable-depend > --enable-cassert", attached is the backtrace and a script to reproduce > the problem. It looks like an AppendPath which has a non-NIL partitioned_rels List won't always only contain leaf partitions as subpaths. The plan that the planner is trying to make here is: https://explain.depesz.com/s/26zj In this case, the 2nd to 5th Append subplan's parents are actually partitioned tables rather than leaf partitions. make_partition_pruneinfo() assumes these will always be leaf partitions. Looking more closely at accumulate_append_subpath() I can see that there are cases where it won't pull sub-Append paths into the main Append. That's what we're seeing in this plan. The fix is either to have make_partition_pruneinfo() return NIL to disable run-time pruning when any of the subpaths are not a leaf partition (it might be a good safety precaution regardless), or we could maybe think harder about what can be done in accumulate_append_subpath() to flatten the Append/MergeAppend paths in this case. I've attached a patch for the former. -- David Rowley http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
Attachment
David Rowley <david.rowley@2ndquadrant.com> writes: > On 24 July 2018 at 16:55, Jaime Casanova <jaime.casanova@2ndquadrant.com> wrote: >> I was trying sqlsmith on REL_11_STABLE (commit >> 1b957e59b92dc44c14708762f882d7910463a9ac) with a database i have at >> hand, and got an assertion failure. > In this case, the 2nd to 5th Append subplan's parents are actually > partitioned tables rather than leaf partitions. > make_partition_pruneinfo() assumes these will always be leaf > partitions. Looking more closely at accumulate_append_subpath() I can > see that there are cases where it won't pull sub-Append paths into the > main Append. That's what we're seeing in this plan. Hm, wouldn't this be fixed by your pending patch at <CAKJS1f_eYwHk2x0xX7qW42rV_GRsJGBMe3AqN9MYLRSs1S+CiA@mail.gmail.com> ? (I apologize for having seemingly dropped the ball on that thread; RL has intruded on my time a bit lately. I hope to get it dealt with next week.) regards, tom lane
On 25 July 2018 at 01:46, Tom Lane <tgl@sss.pgh.pa.us> wrote: > Hm, wouldn't this be fixed by your pending patch at > <CAKJS1f_eYwHk2x0xX7qW42rV_GRsJGBMe3AqN9MYLRSs1S+CiA@mail.gmail.com> > ? It's a different issue. I coded run-time pruning with the incorrect assumption that we only get leaf partitions under an Append that has a non-empty partitioned_rels List. The other patch fixes it to support mixed hierarchies from UNION ALLs. It'll still trip up on anything apart from leaf partitions being in the subpaths list. Thinking again about the patch I submitted upthread, I wonder whether it's actually possible to support pruning with Jamie's query. Without looking at the code, I don't quite see the reason that the sub-partitioned table wouldn't be correctly pruned by the run-time pruning code. It could just be a matter of removing the failing Assert(). I'll do a bit more testing and confirm. -- David Rowley http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
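[ For illustration, a minimal sketch of the sort of "mixed hierarchy from a UNION ALL" being discussed; the table and column names here are invented and not taken from this thread: ]

create table parted (a int) partition by list (a);
create table parted_p1 partition of parted for values in (1);
create table parted_p2 partition of parted for values in (2);
create table plain (a int);

-- The UNION ALL yields an Append whose children mix the scans of parted's
-- leaf partitions with a scan of the non-partitioned table, so run-time
-- pruning has to cope with subplans that are not partitions at all.
explain (costs off)
select a from parted where a = (select 1)
union all
select a from plain;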
On 2018-Jul-25, David Rowley wrote: > Thinking again about the patch I submitted upthread; I wonder if it's > actually possible to support pruning with Jamie's query. Without > looking at the code, I don't quite see the reason that the > sub-partitioned table wouldn't be correctly pruned by the run-time > pruning code. It could just be a matter of removing the failing > Assert(). I'll do a bit more testing and confirm. Not looking at the code right now either, but removing that assert and then removing the TABLESAMPLE clause, the query returns identical results with and without pruning, so maybe you're right. No time for further looking now. (SELECT 'Jaime' <> 'Jamie' COLLATE es_EC) -- Álvaro Herrera https://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On 31 July 2018 at 11:25, Alvaro Herrera <alvherre@2ndquadrant.com> wrote: > Not looking at the code right now either, but removing that assert and > then removing the TABLESAMPLE clause, the query returns identical > results with and without pruning, so maybe you're right. No time for > further looking now. I thought about this a bit more and it seems fine just to remove the Assert. The patch that's pending in [1] adds all the correct handling for subplans that don't belong in any partition hierarchy. That's not the case for Jaime's plan, but those sub-partitioned tables will be identified just like any other partition by the run-time pruning code and can be pruned in the same way. The attached patch removes the Assert, but I think it should probably be done as part of [1]'s patch since that's also adding the code to handle subplans for tables that don't belong in the partition hierarchy. I also spent a bit of time trying to create a simple test case for this, and I've discovered that it's really not very easy; I also have doubts about how stable such a plan might be. > (SELECT 'Jaime' <> 'Jamie' COLLATE es_EC) Apologies. It was a finger auto-pilot malfunction. [1] https://www.postgresql.org/message-id/CAKJS1f_eYwHk2x0xX7qW42rV_GRsJGBMe3AqN9MYLRSs1S%2BCiA%40mail.gmail.com -- David Rowley http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
Attachment
[ bringing this discussion back to the original thread ] David Rowley <david.rowley@2ndquadrant.com> writes: > The attached patch removes the Assert, but I think it should be > probably be done as part of [1]'s patch since that's also adding the > code to handle subplans for tables that don't belong in the partition > hierarchy. Over in that thread, we had David Rowley <david.rowley@2ndquadrant.com> writes: > On 2 August 2018 at 11:48, Tom Lane <tgl@sss.pgh.pa.us> wrote: >> I didn't include this change because (a) it's late, (b) no test >> case was included, and (c) I don't entirely believe it anyway. >> How would a rel be both leaf and nonleaf? Isn't this indicative >> of a problem somewhere upstream in the planner? > It's probably best discussed on the other thread, but it seems to be > by design in accumulate_append_subpath(). Parallel Append nodes don't > get flattened if they contain a mix of parallel aware and non-parallel > aware subplans. Yeah, looking at the explain posted upthread, the issue is specifically that some of the child paths might be for Gathers over subsets of the partitioning hierarchy. It's not real clear to me whether such a subset would necessarily correspond to a subtree of the hierarchy, but if it does, you'd want to be able to apply run-time pruning at the parent Append, in hopes of not having to execute the Gather at all. Separately from that, if you do execute the Gather, applying runtime pruning within its child Append is also a good thing. So what's unclear to me is whether removing that Assert is sufficient to make all of that work nicely. I'm concerned in particular that the planner might get confused about which pruning tests to apply at each level. (EXPLAIN isn't a very adequate tool for testing this, since it fails to show anything about what pruning tests are attached to an Append. I wonder if we need to create something that does show that.) I think it'd be a real good idea to have a test case to exercise this. Jaime's presented test case is too expensive to consider as a regression test, but I bet that a skinnied-down version could be made to work once you'd set all the parallelization parameters to liberal values, like in select_parallel.sql. regards, tom lane
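[ For reference, "liberal values" here means settings along these lines; the exact values are illustrative rather than copied from select_parallel.sql: ]

set parallel_setup_cost = 0;
set parallel_tuple_cost = 0;
set min_parallel_table_scan_size = 0;
set max_parallel_workers_per_gather = 2;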
On Thu, Aug 2, 2018 at 1:54 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote: > (EXPLAIN isn't a very adequate tool for testing this, since > it fails to show anything about what pruning tests are attached to an > Append. I wonder if we need to create something that does show that.) I've asked for that more than once and still think it's a good idea. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
David Rowley <david.rowley@2ndquadrant.com> writes: >> It's probably best discussed on the other thread, but it seems to be >> by design in accumulate_append_subpath(). Parallel Append nodes don't >> get flattened if they contain a mix of parallel aware and non-parallel >> aware subplans. I don't really understand the issue at hand, but let me just make a comment about accumulate_append_subpath(). If we have a regular Append on top of another regular Append or on top of a MergeAppend, we can flatten the lower Merge(Append) into the upper one. No problem. If, however, the lower path is a Parallel Append, we can't. Consider this plan:

Gather
  -> Append
       -> Partial Subpath #1
       -> Parallel Append
            -> Partial Subpath #2
            -> Non-partial Subpath

Making Partial Subpath #2 a child of the Append rather than the Parallel Append would be correct; the only downside is that we'd lose the nice worker-balancing stuff that the Parallel Append does. However, making Non-partial Subpath a child of the Append rather than the Parallel Append would be dead wrong, because the Parallel Append will make sure that it only gets executed by one of the cooperating processes, and the regular Append won't. If the upper Append were changed to a Parallel Append, then we could flatten the whole thing just fine, as long as we're careful to keep partial paths and non-partial paths separated. Also, if the situation is reversed, with an upper Parallel Append and a regular Append beneath it, that's always flatten-able. > Yeah, looking at the explain posted upthread, the issue is specifically > that some of the child paths might be for Gathers over subsets of the > partitioning hierarchy. It's not real clear to me whether such a subset > would necessarily correspond to a subtree of the hierarchy, but if it > does, you'd want to be able to apply run-time pruning at the parent > Append, in hopes of not having to execute the Gather at all. Separately > from that, if you do execute the Gather, applying runtime pruning within > its child Append is also a good thing. That sounds right to me. If I'm not mistaken, in that EXPLAIN, the radicado2012 and radicado2017 partitions are ineligible for parallel query for some reason, maybe a reloption setting, but the other partitions are all eligible, or rather, their subpartitions are eligible. If they were all OK for parallel query, the plan would probably just end up with Gather over Parallel Append over individual sample scans. In general, I think the only ways to get an Append are inheritance, partitioning, and set operations. If an upper Append is due to partitioning or inheritance, then I think that any (Merge)Append nodes beneath it must be a subtree of the partitioning or inheritance hierarchy, respectively. Partitioning and inheritance can't be mixed, and set operations can't occur below partition expansion. However, if the upper Append was created by a set operation, then its child Append nodes could be from any of those causes. For instance, it seems possible to imagine an upper Append that is implementing EXCEPT and a lower Append that is implementing both partition expansion and UNION ALL. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
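[ If a reloption is indeed what makes those two partitions ineligible, the kind of setting Robert has in mind would look roughly like this; whether Jaime's schema actually does so is an assumption: ]

-- disallow parallel scans of just these two partitions
alter table radicado2012 set (parallel_workers = 0);
alter table radicado2017 set (parallel_workers = 0);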
On 3 August 2018 at 05:54, Tom Lane <tgl@sss.pgh.pa.us> wrote: > Yeah, looking at the explain posted upthread, the issue is specifically > that some of the child paths might be for Gathers over subsets of the > partitioning hierarchy. It's not real clear to me whether such a subset > would necessarily correspond to a subtree of the hierarchy, but if it > does, you'd want to be able to apply run-time pruning at the parent > Append, in hopes of not having to execute the Gather at all. Separately > from that, if you do execute the Gather, applying runtime pruning within > its child Append is also a good thing. Yeah, that "good thing" was the reason I changed my mind from aborting run-time pruning in my first patch idea to allowing it: if pruning says so, it's fine to remove an entire sub-partitioned hierarchy. If we look at the Assert in question:

/* Same rel cannot be both leaf and non-leaf */
Assert(relid_subplan_map[rti] == 0);

If this fails, then the subplan path that was found must have a parent that's a partitioned table. If we're worried about those containing subplans which are not partitions, then we can probably add some Asserts somewhere else. One place where we could do that with a bit of cover would be inside the bms_num_members(allmatchedsubplans) < list_length(subpaths) condition; there we could do:

Assert(root->simple_rte_array[parentrel->relid]->relkind != RELKIND_PARTITIONED_TABLE);

There should never be any other_subplans when the parent is a partitioned table. I've attached a patch to demonstrate what I mean here. > So what's unclear to me is whether removing that Assert is sufficient > to make all of that work nicely. I'm concerned in particular that the > planner might get confused about which pruning tests to apply at each > level. (EXPLAIN isn't a very adequate tool for testing this, since > it fails to show anything about what pruning tests are attached to an > Append. I wonder if we need to create something that does show that.) Pruning is only ever applied at the top-level when the RelOptInfo is a RELOPT_BASEREL. It might well be that we want to reconsider that now that we've seen some plans where the subpaths are not always pulled into the top-level [Merge]Append path, but as of now I don't quite see how this could get broken. We're just pruning with a coarser granularity. > I think it'd be a real good idea to have a test case to exercise this. > Jaime's presented test case is too expensive to consider as a regression > test, but I bet that a skinnied-down version could be made to work > once you'd set all the parallelization parameters to liberal values, > like in select_parallel.sql. Yeah, I did spend a bit of time trying to cut down that test and failed at it. The fact that it seemed so hard to do that didn't give me much confidence that the plan produced by the test would be very stable, but perhaps we can just deal with that if we see it's unstable. I'll try again with the test. -- David Rowley http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
Attachment
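[ A rough sketch of the kind of multi-level hierarchy such a test would need; this is only an illustration with made-up names, not the test being worked on. Actually getting the planner to keep the sub-partitioned level as an unflattened child Append also requires the parallel-costing corner case discussed later in the thread, which is what makes a stable regression test hard: ]

create table rt (a int, b int) partition by list (a);
create table rt_1 partition of rt for values in (1);
create table rt_2 partition of rt for values in (2) partition by list (b);
create table rt_2_1 partition of rt_2 for values in (1);
create table rt_2_2 partition of rt_2 for values in (2);

-- the comparison value is only known at executor startup, so pruning must
-- happen at run time rather than at plan time
explain (costs off) select * from rt where a = (select 2);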
On 3 August 2018 at 07:53, Robert Haas <robertmhaas@gmail.com> wrote: > I don't really understand the issue at hand, but let me just make a > comment about accumulate_append_subpath(). If we have a regular > Append on top of another regular Append or on top of a MergeAppend, we > can flatten the lower Merge(Append) into the upper one. No problem. > If, however, the lower path is a Parallel Append, we can't. Consider > this plan: > > Gather > -> Append > -> Partial Subpath #1 > -> Parallel Append > -> Partial Subpath #2 > -> Non-partial Subpath > > Making Partial Subpath #2 a child of the Append rather than the > Parallel Append would be correct; the only downside is that we'd lose > the nice worker-balancing stuff that the Parallel Append does. > However, making Non-partial Subpath a child of the Append rather than > the Parallel Append would be dead wrong, because the Parallel Append > will make sure that it only gets executed by one of the cooperating > processes, and the regular Append won't. If the upper Append were > changed to a Parallel Append, then we could flatten the whole thing > just fine, as long as we're careful to keep partial paths and > non-partial paths separated. Also, if the situation is reversed, with > an upper Parallel Append and a regular Append beneath it, that's > always flatten-able. Wouldn't that code have more flexibility to flatten the Append if it were to, instead of looking at the Append's subpaths, look at the subpath's Parent RelOptInfo paths and just use the Append and MergeAppend as a container to identify the relations that must be included, rather than the paths that should be? -- David Rowley http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
On Thu, Aug 2, 2018 at 8:42 PM, David Rowley <david.rowley@2ndquadrant.com> wrote: > Wouldn't that code have more flexibility to flatten the Append if it > were to, instead of looking at the Append's subpaths, look at the > subpath's Parent RelOptInfo paths and just use the Append and > MergeAppend as a container to identify the relations that must be > included, rather than the paths that should be? Well, we actually start with a list of RelOptInfos, take the cheapest path from each whether partial or non-partial, and then build a Parallel Append out of the result, so there's no need to work backward from the chosen path to the Parallel Append. As to whether it's ever better to take a path other than the cheapest to facilitate flattening, I'm not sure. At the moment, I can't think of a scenario in which that would pay off, but there may be one. I think that if we're using Parallel Append at all, we're likely going to use it throughout the plan tree, in which case everything will get flattened just fine. There might well be corner cases where that isn't so, though, and in some of those flattening may be advantageous, but I can't think of anything really clear and obvious. You might have better ideas than I do. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
I spent some more time poking at Jaime's example. I can reduce the problem query to this and still get the Assert crash:

select random() from radicado tablesample bernoulli (9.7)
where radi_texto = radi_inst_actu limit 33;

None of the elements of this query can be removed without causing the Assert to go away, which is weird because none of them have any apparent connection to partition pruning. With the Assert taken out, we find that this is the plan that's causing the problem:

                                             QUERY PLAN
----------------------------------------------------------------------------------------------------
 Limit  (cost=0.00..919.71 rows=33 width=8)
   ->  Append  (cost=0.00..12792.40 rows=459 width=8)
         ->  Sample Scan on radicado2012  (cost=0.00..2939.18 rows=93 width=8)
               Sampling: bernoulli ('9.7'::real)
               Filter: (radi_texto = radi_inst_actu)
         ->  Gather  (cost=1000.00..2485.72 rows=93 width=8)
               Workers Planned: 2
               ->  Parallel Append  (cost=0.00..1476.18 rows=39 width=0)
                     ->  Sample Scan on radicado2013_part00  (cost=0.00..1475.99 rows=47 width=0)
                           Sampling: bernoulli ('9.7'::real)
                           Filter: (radi_texto = radi_inst_actu)
                     ->  Sample Scan on radicado2013_part01  (cost=0.00..1461.88 rows=46 width=0)
                           Sampling: bernoulli ('9.7'::real)
                           Filter: (radi_texto = radi_inst_actu)
         ->  Gather  (cost=1000.00..2491.15 rows=93 width=8)
               Workers Planned: 2
               ->  Parallel Append  (cost=0.00..1481.62 rows=39 width=0)
                     ->  Sample Scan on radicado2014_part00  (cost=0.00..1481.42 rows=47 width=0)
                           Sampling: bernoulli ('9.7'::real)
                           Filter: (radi_texto = radi_inst_actu)
                     ->  Sample Scan on radicado2014_part01  (cost=0.00..1464.05 rows=46 width=0)
                           Sampling: bernoulli ('9.7'::real)
                           Filter: (radi_texto = radi_inst_actu)
         ->  Gather  (cost=1000.00..2482.47 rows=93 width=8)
               Workers Planned: 2
               ->  Parallel Append  (cost=0.00..1472.93 rows=39 width=0)
                     ->  Sample Scan on radicado2015_part00  (cost=0.00..1472.74 rows=47 width=0)
                           Sampling: bernoulli ('9.7'::real)
                           Filter: (radi_texto = radi_inst_actu)
                     ->  Sample Scan on radicado2015_part01  (cost=0.00..1454.28 rows=46 width=0)
                           Sampling: bernoulli ('9.7'::real)
                           Filter: (radi_texto = radi_inst_actu)
         ->  Gather  (cost=1000.00..2380.72 rows=86 width=8)
               Workers Planned: 2
               ->  Parallel Append  (cost=0.00..1371.90 rows=36 width=0)
                     ->  Sample Scan on radicado2016_part00  (cost=0.00..1371.72 rows=43 width=0)
                           Sampling: bernoulli ('9.7'::real)
                           Filter: (radi_texto = radi_inst_actu)
                     ->  Sample Scan on radicado2016_part01  (cost=0.00..1365.21 rows=43 width=0)
                           Sampling: bernoulli ('9.7'::real)
                           Filter: (radi_texto = radi_inst_actu)
         ->  Sample Scan on radicado2017  (cost=0.00..10.87 rows=1 width=8)
               Sampling: bernoulli ('9.7'::real)
               Filter: (radi_texto = radi_inst_actu)
(44 rows)

Now that seems to me to be a rather weird plan: why doesn't it prefer to flatten everything into one parallel append? Indeed, if you take out any of the remaining query parts such as the LIMIT, that's what it does. I think that its willingness to do this is actually kind of a bug, because this query is going to be a total disaster in terms of the number of workers it will try to use --- way more than the user would expect given max_parallel_workers_per_gather = 2. In any case, I now understand David's concern about creating a usable regression test case that produces a plan like this. It seems to depend on a very corner-case-y and possibly buggy set of costing behaviors.
So, while I'd be willing to go ahead and commit the Assert-removal, this actually increases my concern about whether a correct set of pruning instructions get generated in cases like this, because I now feel fairly confident in saying that we haven't tested the logic for such cases *at all*. (Note that Jaime's query doesn't have any pruning steps after the dust settles --- if make_partition_pruneinfo doesn't crash, it eventually figures out that there are no quals usable for pruning. So the fact that it goes through without the Assert does nothing to assuage my fear.) I think we'd be well advised to either change the logic to prohibit multi-level pruning plans from being generated, or do something to allow forcing them to be generated for testing purposes. regards, tom lane
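[ As a partial workaround for EXPLAIN's silence about the pruning tests themselves, EXPLAIN ANALYZE on an Append that does run-time pruning should report how many subplans were removed, though not which pruning steps produced that result. Using the hypothetical hierarchy sketched earlier: ]

explain (analyze, costs off, timing off, summary off)
select * from rt where a = (select 2);
-- the Append node's output should include a "Subplans Removed: N" line
-- when startup-time pruning has discarded some children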
I wrote: > Now that seems to me to be a rather weird plan: why doesn't it prefer > to flatten everything into one parallel append? Indeed, if you take > out any of the remaining query parts such as the LIMIT, that's what > it does. I think that its willingness to do this is actually kind > of a bug, because this query is going to be a total disaster in terms > of the number of workers it will try to use --- way more than the > user would expect given max_parallel_workers_per_gather = 2. Oh ... never mind that last. The parent Append will run its children sequentially, so that the Gathers execute one at a time, and at no point will more than 2 workers be active. Nonetheless, it's a damfool query plan, because we'll be going through parallel worker startup/shutdown overhead 4 times for no benefit. We really should put in some kind of logic to force those sibling Gathers to be aggregated into one, or else disallow them altogether. regards, tom lane
On 9 August 2018 at 15:33, Tom Lane <tgl@sss.pgh.pa.us> wrote: > Nonetheless, it's a damfool query plan, because we'll be going through > parallel worker startup/shutdown overhead 4 times for no benefit. > We really should put in some kind of logic to force those sibling > Gathers to be aggregated into one, or else disallow them altogether. I started debugging this to see where things go wrong. I discovered that add_paths_to_append_rel() is called yet again from apply_scanjoin_target_to_paths(), and this is where it's all going wrong. The problem is that the gather paths have been tagged onto the partial paths by this time, so accumulate_append_subpath() has no code to look through those to fetch the Append/MergeAppend subpaths, so it just appends the entire path to the subpaths List. This all worked before 96030f9a481. This commit moved where generate_gather_paths() is called. I'm having a hard time understanding why we need to call add_paths_to_append_rel() from apply_scanjoin_target_to_paths(). The name of the function does not seem to imply that we'd do this. The comment just explains "what" rather than "why", making it a bit useless.

/* Build new paths for this relation by appending child paths. */
if (live_children != NIL)
    add_paths_to_append_rel(root, rel, live_children);

I also think that accumulate_append_subpath() is a bit of a fragile way of flattening the append rel's paths, but I feel it's probably apply_scanjoin_target_to_paths() that's at fault here. -- David Rowley http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
On Wed, Aug 8, 2018 at 11:33 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote: > Oh ... never mind that last. The parent Append will run its children > sequentially, so that the Gathers execute one at a time, and at no > point will more than 2 workers be active. Yep. > Nonetheless, it's a damfool query plan, because we'll be going through > parallel worker startup/shutdown overhead 4 times for no benefit. > We really should put in some kind of logic to force those sibling > Gathers to be aggregated into one, or else disallow them altogether. Disallowing them could be a big performance regression. Combining them is reasonable, but it's not clear to me how you'd fit that into the planner structure. There's no convenient RelOptInfo to associate with the aggregated-Gather. It can't use the parent RelOptInfo unless all of the children have a partial path, and in that case this plan never would have been generated in the first place. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
On Sat, Aug 11, 2018 at 9:16 AM, David Rowley <david.rowley@2ndquadrant.com> wrote: > I started debugging this to see where things go wrong. I discovered > that add_paths_to_append_rel() is called yet again from > apply_scanjoin_target_to_paths() and this is where it's all going > wrong. The problem is that the gather paths have been tagged onto the > partial paths by this time, so accumulate_append_subpath() has no code > to look through those to fetch the Append/MergeAppend subpaths, so it > just appends the entire path to the subpaths List. This all worked > before 96030f9a481. This commit moved where generate_gather_paths() is > called. I'm baffled as to why looking through Gather to find Append/MergeAppend subpaths would ever be a sane thing to do. > I'm having a hard time understanding why we need to call > add_paths_to_append_rel() from apply_scanjoin_target_to_paths(). The > name of the function does not seem to imply that we'd do this. The > comment just explains "what" rather than "why" making it a bit > useless. > > /* Build new paths for this relation by appending child paths. */ > if (live_children != NIL) > add_paths_to_append_rel(root, rel, live_children); > > I also think that accumulate_append_subpath() is a bit of a fragile > way of flattening the append rel's paths, but I feel it's probably > apply_scanjoin_target_to_paths() that's at fault here. In my original design, apply_scanjoin_target_to_paths() -- or whatever I called it in the original patch versions -- built an entirely new RelOptInfo, so that we ended up with one RelOptInfo representing the unvarnished result of scan/join planning, and a second one representing the result of applying the scan/join target to that result. However, Amit Kapila was able to demonstrate that this caused a small but noticeable regression in planning speed, which seemed like it might plausibly cause some issues for people running very simple queries. In that original design, if the topmost scan/join rel was partitioned, the new "tlist" upper rel was also partitioned, and in the same way. In short, this was a kind of "partitionwise tlist" feature. For each child of the topmost scan/join rel, we built a child "tlist" rel, which got the same paths but with the correct tlist applied to each one. The path for the toplevel "tlist" upper rel could then be built by appending the children from each child "tlist" rel, or else by the usual method of sticking a Result node over the cheapest path for the topmost scan/join rel. In general, the first method is superior. Instead of a plan like Result -> Append -> (N children each producing the unprojected tlist), you get Append -> (N children each producing the projected tlist), which is cheaper. To avoid the performance overhead of creating a whole new tree of upper rels, I rewrote the code so that it modifies the RelOptInfos in place. That unfortunately makes the logic a bit more confusing, and it sounds from your remarks like I didn't comment it as clearly as might have been possible. But the basic idea is the same: we want the projection to be passed down to the child nodes rather than getting a Result node on top. The commit that did most of this -- 11cf92f6e2e13c0a6e3f98be3e629e6bd90b74d5 -- also solved a few other problems, as noted in the commit message, so I don't think we want to rip it out wholesale. There might be better ways to do some of it, though. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
On 2018-Aug-13, Robert Haas wrote: > In my original design, apply_scanjoin_target_to_paths() -- or whatever > I called it in the original patch versions -- built an entirely new > RelOptInfo, so that we ended up with one RelOptInfo representing the > unvarnished result of scan/join planning, and a second one > representing the result of applying the scan/join target to that > result. However, Amit Kapila was able to demonstrate that this caused > a small but noticeable regression in planning speed, which seemed like > it might plausibly cause some issues for people running very simple > queries. > > In that original design, if the topmost scan/join rel was partitioned, > the new "tlist" upper rel was also partitioned, and in the same way. > In short, this was a kind of "partitionwise tlist" feature. For each > child of the topmost scan/join rel, we built a child "tlist" rel, > which got the same paths but with the correct tlist applied to each > one. The path for the toplevel "tlist" upper rel could then be built > by appending the children from each child "tlist" rel, or else by the > usual method of sticking a Result node over the cheapest path for the > topmost scan/join rel. In general, the first method is superior. > Instead of a plan like Result -> Append -> (N children each producing > the unprojected tlist), you get Append -> (N children each producing > the projected tlist), which is cheaper. I have to admit I have a hard time understanding this stuff. I don't know what a "scan/join target" is (or scan/join relation for that matter) ... and this "tlist relation" thing is an entirely new concept to me. I agree that pushing down the projection looks worthwhile. > To avoid the performance overhead of creating a whole new tree of > upper rels, I rewrote the code so that it modifies the RelOptInfos in > place. That unfortunately makes the logic a bit more confusing, and > it sounds from your remarks like I didn't comment it as clearly as > might have been possible. But the basic idea is the same: we want the > projection to be passed down to the child nodes rather than getting a > Result node on top. The commit that did most of this -- > 11cf92f6e2e13c0a6e3f98be3e629e6bd90b74d5 -- also solved a few other > problems, as noted in the commit message, so I don't think we want to > rip it out wholesale. There might be better ways to do some of it, > though. So, 1. I'm not sure that partition pruning devs are on the hook for producing a test case for the problem originally reported, since we're clearly dealing with a wacko plan, 2. I wonder if this should be considered an open item related to commit 11cf92f6e2e1 instead of partprune. The other option is to ignore this whole affair, mark this as solved in the open items list, and move on. -- Álvaro Herrera https://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On 2018-Jul-23, Jaime Casanova wrote: > I was trying sqlsmith on REL_11_STABLE (commit > 1b957e59b92dc44c14708762f882d7910463a9ac) with a database i have at > hand, and got an assertion failure. > It seems to happen during planning on prunning time but only when > tables get bigger than certain size. Hi Jaime Did you try your luck running sqlsmith after this fix? It'd be interesting to know if there are further failures. Thanks for the report, -- Álvaro Herrera https://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On Thu, Aug 16, 2018 at 12:36 PM, Alvaro Herrera <alvherre@2ndquadrant.com> wrote: > I have to admit I have a hard time understanding this stuff. I don't > know what a "scan/join target" is (or scan/join relation for that > matter) ... and this "tlist relation" thing is an entirely new concept > to me. Well, I invented the "tlist relation" concept, and it never actually got committed, so it's not surprising that it's new to you. The point is that if you do something like SELECT x.a + y.b FROM x, y WHERE <stuff>, the planner will initially produce a plan that emits x.a and y.b. But that's not actually what we want to produce: we want to compute the expression x.a + y.b. So that expression is the scan/join target. After getting the initial plan that produces x.a and y.b as separate columns, we perform surgery on that plan to make it return the computed expression instead. If x and y are partitioned and the join is partition-wise, it's better to perform this transformation recursively, getting each child-join to return x.a + y.b; the alternative is for each child-join to return x.a and y.b as separate columns and then perform the projection at the top level. > So, > 1. I'm not sure that partition pruning devs are on the hook for > producing a test case for the problem originally reported, > since we're clearly dealing with a wacko plan, > 2. I wonder if this should be considered an open item related to commit > 11cf92f6e2e1 instead of partprune. > > The other option is to ignore this whole affair, mark this as solved in > the open items list, and move on. I don't know whether there's actually a defect here any more. I was trying to dispel some perceived confusion on the part of David and Tom about what this code was trying to accomplish, but the fact that the code caused some confusion does not necessarily mean that there's anything wrong with it. On the other hand, perhaps it does have functional deficiencies, or perhaps the comments need work. Or maybe this code is fine taken in isolation but there are still problems in how it interacts with partition pruning. Let's start by deciding what we think the problems are, and then we can decide who should fix them. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
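[ A concrete, if contrived, way to see the pushed-down projection; the table definitions are invented for illustration, and enable_partitionwise_join (off by default) is assumed to be available: ]

set enable_partitionwise_join = on;

create table x (k int, a int) partition by list (k);
create table x1 partition of x for values in (1);
create table y (k int, b int) partition by list (k);
create table y1 partition of y for values in (1);

-- With the scan/join target applied recursively, each child join can emit the
-- computed expression x.a + y.b itself, so the Append's children already
-- produce the final target list and no Result node is needed on top.
explain (verbose, costs off)
select x.a + y.b from x join y using (k);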
On 17 August 2018 at 06:52, Robert Haas <robertmhaas@gmail.com> wrote: > I don't know whether there's actually a defect here any more. I was > trying to dispel some perceived confusion on the part of David and Tom > about what this code was trying to accomplish, but the fact that the > code caused some confusion does not necessarily mean that there's > anything wrong with it. On the other hand, perhaps it does have > functional deficiencies, or perhaps the comments need work. Or maybe > this code is fine taken in isolation but there are still problems in > how it interacts with partition pruning. Let's start by deciding what > we think the problems are, and then we can decide who should fix them. I think Tom and I both agree that the plan mentioned in [1] and [2] is not quite as optimal as it could be. The investigation I did in [3] highlights where this got broken and the reason why it got broken. Tom removed the offending Assert() in 59ef49d26d2, but mentioned there's no test case which tries to ensure having a partitioned table as an Append child works correctly. I agree that it would be nice to have a test for this; I did try a couple of times to come up with a test case, but I was unable to come up with anything suitably compact for the regression tests, and even at that, given how difficult it is to get a plan into this shape, I didn't hold up much hope of the test's plan remaining stable. Tom seems to agree with this, as he mentioned in [2]. I think the best path forward is if you (Robert) could verify that the behaviour I mentioned in [3] is expected behaviour. If it is, then perhaps the path forward is for me to try harder to write a compact test for this; but if you discover that the behaviour is unexpected, then I think the best course of action is to fix it to properly flatten the Append, which will mean there's no need for a test case, and perhaps the Assert that was removed can be put back again. [1] https://www.postgresql.org/message-id/CAKJS1f9MQd5ntg6xXg7jJe0jB_bWvCDRmaH6yNzZGF%2BTVa5M%3DA%40mail.gmail.com [2] https://www.postgresql.org/message-id/8600.1533765148%40sss.pgh.pa.us [3] https://www.postgresql.org/message-id/CAKJS1f_j9BSJ7svDF7uid6EMm%2Bfu%2BcAhonZ7hcqiYU4ssv3rtg%40mail.gmail.com -- David Rowley http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
On 14 August 2018 at 09:23, Robert Haas <robertmhaas@gmail.com> wrote: > On Sat, Aug 11, 2018 at 9:16 AM, David Rowley > <david.rowley@2ndquadrant.com> wrote: >> I started debugging this to see where things go wrong. I discovered >> that add_paths_to_append_rel() is called yet again from >> apply_scanjoin_target_to_paths() and this is where it's all going >> wrong. The problem is that the gather paths have been tagged onto the >> partial paths by this time, so accumulate_append_subpath() has no code >> to look through those to fetch the Append/MergeAppend subpaths, so it >> just appends the entire path to the subpaths List. This all worked >> before 96030f9a481. This commit moved where generate_gather_paths() is >> called. > > I'm baffled as to why looking through Gather to find > Append/MergeAppend subpaths would ever be a sane thing to do. Can you explain why it's less sane than what the current code is doing? Below a Gather there will be partial paths, but we can also get those in a Parallel Append, which the accumulate_append_subpath() code already attempts to handle. If the Gather Path is there already then I guess one difference would be that the caller would need to ensure that another Gather path is placed below the Parallel Append again. -- David Rowley http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
On Aug 17, 2018, at 2:49 AM, David Rowley <david.rowley@2ndquadrant.com> wrote: > On 17 August 2018 at 06:52, Robert Haas <robertmhaas@gmail.com> wrote: >> I don't know whether there's actually a defect here any more. I was >> trying to dispel some perceived confusion on the part of David and Tom >> about what this code was trying to accomplish, but the fact that the >> code caused some confusion does not necessarily mean that there's >> anything wrong with it. On the other hand, perhaps it does have >> functional deficiencies, or perhaps the comments need work. Or maybe >> this code is fine taken in isolation but there are still problems in >> how it interacts with partition pruning. Let's start by deciding what >> we think the problems are, and then we can decide who should fix them. > I think Tom and I both agree that the plan mentioned in [1] and [2] is > not quite as optimal as it could be. The investigation I did in [3] > highlights where this got broken and the reason why it got broken. > Tom removed the offending Assert() in 59ef49d26d2, but mentioned > there's no test case which tries to ensure having a partitioned table > as an Append child works correctly. I agree that it would be nice to > have a test for this, but I did try a couple of times to come up with > a test case to do this, but I was unable to come up with anything > suitable compact for the regression tests, and even at that, given how > difficult it is to get a plan into this shape, I didn't hold up much > hope of the test's plan remaining stable. Tom seems to agree with this > as he mentioned in [2]. > I think the best path forward is if you (Robert) could verify the > behaviour that I mentioned in [3] is expected behaviour, if it is then > perhaps the path forward is for me to try harder to write a compact > test for this, but if you discover that the behaviour is unexpected > then I think the best course of action is to fix it to properly > flatten the Append which will mean there's no need for a test case, and perhaps > the Assert that was removed can be put back again. > [1] https://www.postgresql.org/message-id/CAKJS1f9MQd5ntg6xXg7jJe0jB_bWvCDRmaH6yNzZGF%2BTVa5M%3DA%40mail.gmail.com > [2] https://www.postgresql.org/message-id/8600.1533765148%40sss.pgh.pa.us > [3] https://www.postgresql.org/message-id/CAKJS1f_j9BSJ7svDF7uid6EMm%2Bfu%2BcAhonZ7hcqiYU4ssv3rtg%40mail.gmail.com On behalf of the RMT, I just want to make sure this keeps moving along. It sounds like the next step is for Robert to verify that [3] is the expected behavior and then David can decide what to do from there. Thanks! Jonathan
Attachment
On Mon, Aug 27, 2018 at 6:05 PM, Jonathan S. Katz <jkatz@postgresql.org> wrote: > On behalf of the RMT, I just want to make sure this keeps moving along. > It sounds like the next step is for Robert to verify that [3] is the > expected > behavior and then David can decide what to do from there. Yes, that's the expected behavior. If we didn't regenerate the paths there, we'd end up with

Result
  -> Append
       -> [various paths that produce a tlist which needs to be adjusted later by the result node]

Instead of:

Append
  -> [various paths that produce an adjusted tlist]

Paths of the former kind have already been generated; we regenerate paths here to get the latter kind as well, which end up displacing the old ones on cost. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
On Fri, Aug 17, 2018 at 2:58 AM, David Rowley <david.rowley@2ndquadrant.com> wrote: >> I'm baffled as to why looking through Gather to find >> Append/MergeAppend subpaths would ever be a sane thing to do. > > Can you explain why it's less sane than what the current code is > doing? Below a Gather there will be partial paths, but we can also > get those in a Parallel Append, which the accumulate_append_subpath() > code already attempts to handle. Sorry, I lost track of this email. accumulate_append_subpath's job is to flatten stacked Append nodes into a single Append node. But you can't in general flatten two Append nodes that are separated by something else in the middle, because that other node probably does something. To take a really simple example, consider:

Append
  -> Result
       -> Append
            -> ...
            -> ...

Well, the Result node computes a new target list, so the inner Append node can't just blindly be combined with the outer Append node. You might be able to make it work if you adjusted the inner Append node's children to produce the same target list that the Result node produces, but you can't just flatten it blindly. The same thing is true of Gather:

Append
  -> Gather
       -> Append
            -> ...
            -> ...

The child paths of the inner Append node are known to be parallel-safe and are necessarily partial paths unless the inner Append is a Parallel Append; the paths under the outer Append are definitely not partial and might not even be parallel-safe (they can't be parallel-unsafe, or we couldn't have a Gather anywhere in the plan, but they *could* be parallel-restricted). If you tried to flatten the inner Append into the outer one, you'd no longer have anything like a valid plan, because you can't have a partial path anywhere in the tree without a Gather someplace higher up, which wouldn't be true here any more, which I guess is what you meant by ... > If the Gather Path is there already > then I guess one difference would be that the caller would need to > ensure that another Gather path is placed below the Parallel Append > again. ...this, but that's not right. The Gather would have to go ABOVE the Parallel Append. The broader point here, though, is that even if you could do this kind of surgery, it's the wrong way of going about the problem. In any case where Append-Gather-Append could be flattened to Gather-Append, we should have just generated Gather-Append initially, rather than trying to rejigger things that way later. And I think that's what the code does. I'm not sure exactly what's going wrong with runtime partition pruning in this case, but ISTM that if runtime partition pruning doesn't work in some obscure case, that's just a limitation we should accept until somebody gets around to working on it. The key thing right now is not to have crashes or wrong behavior, so that we can ship. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
On Thu, 16 Aug 2018 at 11:38, Alvaro Herrera <alvherre@2ndquadrant.com> wrote: > > On 2018-Jul-23, Jaime Casanova wrote: > > > > I was trying sqlsmith on REL_11_STABLE (commit > > 1b957e59b92dc44c14708762f882d7910463a9ac) with a database i have at > > hand, and got an assertion failure. > > It seems to happen during planning on prunning time but only when > > tables get bigger than certain size. > > Hi Jaime > > Did you try your luck running sqlsmith after this fix? It'd be > interesting to know if there are further failures. > Intermittently... so far so good! I'm adding things (taken from the release notes) slowly, every so often. -- Jaime Casanova www.2ndQuadrant.com PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On 8/29/18 1:38 PM, Robert Haas wrote: > On Mon, Aug 27, 2018 at 6:05 PM, Jonathan S. Katz <jkatz@postgresql.org> wrote: >> On behalf of the RMT, I just want to make sure this keeps moving along. >> It sounds like the next step is for Robert to verify that [3] is the >> expected >> behavior and then David can decide what to do from there. > > Yes, that's the expected behavior. If we didn't regenerate the paths > there, we'd end up with > > Result > -> Append > -> [various paths that produce a tlist which needs to be adjusted > later by the result node] > > Instead of: > > Append > -> [various paths that produce an adjusted tlist] > > Paths of the former kind have already been generated; we regenerate > paths here to get the latter kind as well, which end up displacing the > old ones on cost. > An update from the RMT: after our meeting today, we decided that we would drop this from the list of open items for the v11 major release. The initial issue was already fixed, and we understand that developing the test for this will take some work and we do not think it is needed for the v11 release. Thanks again for working on this! Jonathan