Re: [HACKERS] Performance improvement for joins where outer side is unique - Mailing list pgsql-hackers

From Tom Lane
Subject Re: [HACKERS] Performance improvement for joins where outer side is unique
Date
Msg-id 4444.1491506795@sss.pgh.pa.us
Whole thread Raw
In response to Re: Performance improvement for joins where outer side is unique  (Kyotaro HORIGUCHI <horiguchi.kyotaro@lab.ntt.co.jp>)
Responses Re: [HACKERS] Performance improvement for joins where outer side is unique  (David Rowley <david.rowley@2ndquadrant.com>)
List pgsql-hackers
David Rowley <david.rowley@2ndquadrant.com> writes:
> On 2 April 2017 at 21:21, David Rowley <david.rowley@2ndquadrant.com> wrote:
>> I've attached an updated patch which updates the regression test output of
>> a recent commit to include the "Unique Inner" in the expected results.

> The patch must've fallen off. Attempt number 2 at attaching.

I'm looking through this, and I'm failing to see where it deals with
the problem we discussed last time, namely that you can't apply the
optimization unless all clauses that were used in the uniqueness
proof are included in the join's merge/hash conditions + joinquals.

It might be sufficient to ignore is_pushed_down conditions (at an
outer join only) while trying to make the proofs; but if it's doing
that, I don't see where.

I don't especially like the centralized unique_rels cache structure.
It's not terribly clear what it's for, and you're making uncomfortably
large assumptions about never indexing off the end of the array, despite
not having any range checks for the subscripts.  Wouldn't it be better to
add simple List fields into RelOptInfo, representing the outer rels this
rel has been proven unique or not-unique for?  That would dodge the array
size question and would be more readily extensible to someday applying
this to join rels.

I also think some more thought is needed about handling JOIN_UNIQUE_OUTER
and JOIN_UNIQUE_INNER cases.  In the first place, the patch cavalierly
violates the statement in joinpath.c that those jointype values never
propagate outside that module.  In the second place, shouldn't
JOIN_UNIQUE_INNER automatically result in the path getting marked
inner_unique?  I suspect the logic in add_paths_to_joinrel ought to
look something like
   if (jointype == JOIN_UNIQUE_INNER)       extra.inner_unique = true;   else       extra.inner_unique =
innerrel_is_unique(root,outerrel, innerrel,                       (jointype == JOIN_UNIQUE_OUTER ? JOIN_INNER :
jointype),                                              restrictlist); 

        regards, tom lane



pgsql-hackers by date:

Previous
From: Peter Eisentraut
Date:
Subject: Re: [HACKERS] CONNECTION LIMIT and Parallel Query don't play welltogether
Next
From: Simon Riggs
Date:
Subject: Re: [pgsql-students] [HACKERS] [GSoC] Push-based query executor discussion