Re: Planning performance problem (67626.278ms) - Mailing list pgsql-performance

From Tom Lane
Subject Re: Planning performance problem (67626.278ms)
Date
Msg-id 508897.1625005916@sss.pgh.pa.us
Whole thread Raw
In response to Re: Planning performance problem (67626.278ms)  (Manuel Weitzman <manuelweitzman@gmail.com>)
Responses Re: Planning performance problem (67626.278ms)  (Manuel Weitzman <manuelweitzman@gmail.com>)
List pgsql-performance
Manuel Weitzman <manuelweitzman@gmail.com> writes:
> On 29-06-2021, at 15:43, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> That seems a bit broken; a given WHERE clause should produce only one
>> RestrictInfo.  Can you provide a more concrete example?

>> explain (analyze, buffers)
>> select * from a
>> join b b1 on (b1.a = a.a)
>> join b b2 on (b2.a = a.a)
>> where b1.a in (1,100,10000,1000000,1000001);

Hm.  By my count, this example generates 3 RestrictInfos during
deconstruct_jointree, representing the three original clauses
from the query, and then 4 more in generate_join_implied_equalities,
representing the EC-derived clauses

    a.a = b1.a
    a.a = b2.a
    b1.a = b2.a
    b1.a = a.a

The third of these seems legit enough; it's a new fact that we
want to apply while considering joining b1 directly to b2.
The other ones get made despite create_join_clause's efforts
to avoid making duplicate clauses, because of two things:

1. create_join_clause doesn't trouble to look for commuted
equivalents, which perhaps is penny-wise and pound-foolish.
The cost of re-deriving selectivity estimates could be way
more than the cost of checking this.

2. Although these look like they ought to be equivalent to the
original clauses (modulo commutation, for some of them), they don't
look that way to create_join_clause, because it's also checking
for parent_ec equality.  Per the comment,

 * parent_ec is either equal to ec (if the clause is a potentially-redundant
 * join clause) or NULL (if not).  We have to treat this as part of the
 * match requirements --- it's possible that a clause comparing the same two
 * EMs is a join clause in one join path and a restriction clause in another.

It might be worth digging into the git history to see why that
became a thing and then considering whether there's a way around it.
(I'm pretty sure that comment is mine, but I don't recall the details
anymore.)

Anyway, it's certainly not the case that we're making new
RestrictInfos for every pair of rels.  It looks that way in this
example because the join vars all belong to the same EC, but
that typically wouldn't be the case in more complex queries.

So we could look into whether this code can be improved to share
RestrictInfos across more cases.  Another thought is that even
if we need to keep original and derived clauses separate, maybe it'd
be all right to copy previously-determined cached selectivity values
from an original clause to an otherwise-identical derived clause
(cf. commute_restrictinfo()).  I'm not sure though whether it's
reliably the case that we'd have filled in selectivities for the
original clauses before this code wants to clone them.

            regards, tom lane



pgsql-performance by date:

Previous
From: Manuel Weitzman
Date:
Subject: Re: Planning performance problem (67626.278ms)
Next
From: Manuel Weitzman
Date:
Subject: Re: Planning performance problem (67626.278ms)