Re: Performance improvement for joins where outer side is unique - Mailing list pgsql-hackers

From Robert Haas
Subject Re: Performance improvement for joins where outer side is unique
Date
Msg-id CA+TgmoYJc+hBsvCQiPpuHM6sCRc1NGq5FWO10yeJtSeoH4ji8w@mail.gmail.com
Whole thread Raw
In response to Re: Performance improvement for joins where outer side is unique  (Tom Lane <tgl@sss.pgh.pa.us>)
Responses Re: Performance improvement for joins where outer side is unique
List pgsql-hackers
On Fri, Mar 11, 2016 at 4:32 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> So I started re-reading this thread in preparation for looking at the
> patch, and this bit in your initial message jumped out at me:
>
>> In all of our join algorithms in the executor, if the join type is SEMI,
>> we skip to the next outer row once we find a matching inner row. This is
>> because we don't want to allow duplicate rows in the inner side to
>> duplicate outer rows in the result set. Obviously this is required per SQL
>> spec. I believe we can also skip to the next outer row in this case when
>> we've managed to prove that no other row can possibly exist that matches
>> the current outer row, due to a unique index or group by/distinct clause
>> (for subqueries).
>
> I wondered why, instead of inventing an extra semantics-modifying flag,
> we couldn't just change the jointype to *be* JOIN_SEMI when we've
> discovered that the inner side is unique.
>
> Now of course this only works if the join type was INNER to start with.
> If it was a LEFT join, you'd need an "outer semi join" jointype which
> we haven't got at the moment.  But I wonder whether inventing that
> jointype wouldn't let us arrive at a less messy handling of things in
> the executor and EXPLAIN.  I'm not very enamored of plastering this
> "match_first_tuple_only" flag on every join, in part because it doesn't
> appear to have sensible semantics for other jointypes such as JOIN_RIGHT.
> And I'd really be happier to see the information reflected by join type
> than a new line in EXPLAIN, also.

The new join pushdown code in postgres_fdw does not grok SEMI and ANTI
joins because there is no straightforward way of reducing those back
to SQL.  They can originate in multiple ways and not all of those can
be represented easily.  I think it would be nice to do something to
fix this.  For example, if a LEFT join WHERE outer_column IS NULL
turns into an ANTI join, it would be nice if that were marked in some
way so that postgres_fdw could conveniently emit it in the original
form.

Maybe the people who have been working on that patch just haven't been
creative enough in thinking about how to solve this problem, but it
makes me greet the idea of more join types that don't map directly
back to SQL with somewhat mixed feelings.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



pgsql-hackers by date:

Previous
From: Robert Haas
Date:
Subject: Re: [COMMITTERS] pgsql: Only try to push down foreign joins if the user mapping OIDs mat
Next
From: Robert Haas
Date:
Subject: Re: [COMMITTERS] pgsql: Provide much better wait information in pg_stat_activity.