Re: Parameterized-path cost comparisons need some work - Mailing list pgsql-hackers

From Tom Lane
Subject Re: Parameterized-path cost comparisons need some work
Date
Msg-id 22521.1334797983@sss.pgh.pa.us
Whole thread Raw
In response to Re: Parameterized-path cost comparisons need some work  (Tom Lane <tgl@sss.pgh.pa.us>)
List pgsql-hackers
So while testing this patch I've found out that there is a pretty nasty
bug in HEAD as well as in my current formulation of the patch.  Here
is an example using the regression database:

select count(*) from (tenk1 a join tenk1 b on a.unique1=b.unique2) left join tenk1 c on a.unique2 = b.unique1 and
c.thousand= a.thousand join int4_tbl on b.thousand = f1;
 

The correct answer to this query is 10, according to all previous PG
branches, but HEAD is reporting zero.  A look at the query plan
soon finds the smoking gun:
Aggregate  (cost=264.29..264.30 rows=1 width=0)  ->  Nested Loop Left Join  (cost=0.00..264.16 rows=50 width=0)
JoinFilter: (a.unique2 = b.unique1)        ->  Nested Loop  (cost=0.00..234.21 rows=50 width=12)              Join
Filter:(b.unique2 = a.unique1)              ->  Nested Loop  (cost=0.00..211.85 rows=50 width=8)                    ->
SeqScan on int4_tbl  (cost=0.00..1.05 rows=5 width=4)                    ->  Index Scan using tenk1_thous_tenthous on
tenk1b  (cost=0.00..42.04 rows=10 width=12)                          Index Cond: (thousand = int4_tbl.f1)
-> Index Scan using tenk1_unique2 on tenk1 a  (cost=0.00..0.43 rows=1 width=12)                    Index Cond: (unique2
=b.unique1)        ->  Index Only Scan using tenk1_thous_tenthous on tenk1 c  (cost=0.00..0.45 rows=10 width=4)
    Index Cond: (thousand = a.thousand)
 

The condition a.unique2 = b.unique1 is getting pushed down into
the a/b join, where it should not go; applying it there causes
a/b join rows to be discarded when they ought to survive and
then be null-extended at the left join.

What this demonstrates is that the rule for identifying safe movable
clauses that's used in HEAD is not good enough.  It rejects clauses
that reference relations that are on the inside of a left join relative
to the target relation --- but the problematic clause doesn't actually
reference c, so that doesn't trigger.  The issue would go away if we
examined required_relids instead of clause_relids, but that causes
other, perfectly safe, optimizations to be discarded.

I believe that the correct formulation of the join clause movement rule
is "outer join clauses can't be pushed into the left-hand side of their
outer join".  However, the present representation of clauses doesn't
provide enough information to test this cheaply during scan planning
(indeed maybe it can't be done at all after distribute_qual_to_rels,
because we don't retain any explicit memory of which clauses are above
or below which outer joins).  Looking at nullable_relids isn't the right
thing because what that tells you about is what's on the right-hand side
of the outer join, not the left side.

So I'm coming to the conclusion that to get this right, we need to
add another relids field to RestrictInfo and have it filled in during
distribute_qual_to_rels.  This is really probably going to end up
cheaper than what we have today, because the join clause movement
tests are somewhat expensive as the code stands, and it should be
possible to reduce the number of operations needed there.

What I have in mind is that the new field would be named something like
outer_left_relids, and would list the relids of all base rels that are
in the left-hand side of the outer join that the clause belongs to
(or be NULL if the clause isn't an outer-join clause).  This is cheaply
constructable during distribute_qual_to_rels, we just are not bothering
to record the information at present.  Then the join clause movement
checks can easily detect that it would be unsafe to push down such a
clause to any of the listed relations.
        regards, tom lane


pgsql-hackers by date:

Previous
From: Robert Haas
Date:
Subject: Re: Bug #6593, extensions, and proposed new patch policy
Next
From: Peter Geoghegan
Date:
Subject: Timsort performance, quicksort (was: Re: Memory usage during sorting)