Re: BUG #15947: Worse plan is chosen after giving the planner more freedom (partitionwise join) - Mailing list pgsql-bugs

From Tom Lane
Subject Re: BUG #15947: Worse plan is chosen after giving the planner more freedom (partitionwise join)
Date
Msg-id 786.1565541557@sss.pgh.pa.us
Whole thread Raw
In response to BUG #15947: Worse plan is chosen after giving the planner more freedom (partitionwise join)  (PG Bug reporting form <noreply@postgresql.org>)
Responses Re: BUG #15947: Worse plan is chosen after giving the planner morefreedom (partitionwise join)  (Keith <keith@keithf4.com>)
List pgsql-bugs
PG Bug reporting form <noreply@postgresql.org> writes:
> After creating of the tables below:
> ...
> ANALYZE sg_19_01_d, sg_19_02_d, sc_19_01_d, sc_19_02_d;

Hm, you made a tactical error there: you should have done

ANALYZE sg, sc;

ie analyze the parent partitioned tables not the partitions.
As you have it, we never compute inherited stats across the
whole partitioned tables, which leads to wrong estimates
about the join size and hence cost:

 Aggregate  (cost=147.25..147.26 rows=1 width=8) (actual time=9.934..9.934 rows=1 loops=1)
   ->  Hash Join  (cost=80.00..146.62 rows=250 width=0) (actual time=6.204..9.632 rows=2000 loops=1)
   ...

After analyzing the parents it's much better:

 Aggregate  (cost=195.00..195.01 rows=1 width=8) (actual time=6.199..6.200 rows=1 loops=1)
   ->  Hash Semi Join  (cost=88.00..190.00 rows=2000 width=0) (actual time=2.586..5.885 rows=2000 loops=1)
   ...

The size and cost estimates for partitionwise-join paths are made by
adding up the per-partition sizes/costs, so that those are a lot
closer to being correct even without any parent-level stats:

 Aggregate  (cost=175.00..175.01 rows=1 width=8) (actual time=4.562..4.563 rows=1 loops=1)
   ->  Append  (cost=39.00..170.00 rows=2000 width=0) (actual time=1.105..4.327 rows=2000 loops=1)
   ...

So *if* you have parent-level stats in place, this example works well:
the partitionwise join is estimated as cheaper than the other way, and
that estimate is correct, and all is great.

But when you don't, why doesn't it seize on the incorrectly-estimated-
as-cheaper non-partitioned join?  The answer seems to be that
apply_scanjoin_target_to_paths throws away all the non-partitioned
paths, if the join is partitioned.  It claims that

     * If the rel is partitioned, we want to drop its existing paths and
     * generate new ones.  This function would still be correct if we kept the
     * existing paths: we'd modify them to generate the correct target above
     * the partitioning Append, and then they'd compete on cost with paths
     * generating the target below the Append.  However, in our current cost
     * model the latter way is always the same or cheaper cost, so modifying
     * the existing paths would just be useless work.

but evidently that's not really true when considering partitioned vs
non-partitioned joins.  It holds only if cost estimates that are made
in very different ways are comparable.

I'm inclined to think that the partitioned estimates are more trustworthy
than the non-partitioned ones, so maybe we should leave things as they
stand for now.  Still, this is another point that's causing me to form
an increasingly hardened conviction that apply_scanjoin_target_to_paths
needs to be nuked from orbit.  It's been nothing but trouble since it
was committed, which is unsurprising considering how many planner
structural conventions it tramples on.  I'm not very sure what a better
way would look like though :-(.

Another point that this example raises is that letting autoanalyze
ignore partition parent tables is not just a minor problem, but
a potentially serious bug, causing very poor plan choices to be
made for any nontrivial queries involving partitioned tables.

I don't see us doing anything about either of these points in the
very short term, and certainly not back-patching any changes into
released branches.  But we ought to think about fixes going forward.

            regards, tom lane



pgsql-bugs by date:

Previous
From: Erik Rijkers
Date:
Subject: Re: BUG #15947: Worse plan is chosen after giving the planner morefreedom (partitionwise join)
Next
From: Tom Lane
Date:
Subject: Re: BUG #15892: URGENT: Using an ICU collation in a primary key column breaks ILIKE query