Re: [HACKERS] Performance improvement for joins where outer side is unique - Mailing list pgsql-hackers

From Tom Lane
Subject Re: [HACKERS] Performance improvement for joins where outer side is unique
Date
Msg-id 1545.1485535460@sss.pgh.pa.us
Whole thread Raw
In response to Re: [HACKERS] Performance improvement for joins where outer side is unique  (Tom Lane <tgl@sss.pgh.pa.us>)
Responses Re: [HACKERS] Performance improvement for joins where outer side is unique  (Robert Haas <robertmhaas@gmail.com>)
Re: [HACKERS] Performance improvement for joins where outer side is unique  (David Rowley <david.rowley@2ndquadrant.com>)
List pgsql-hackers
I wrote:
> David Rowley <david.rowley@2ndquadrant.com> writes:
>> hmm. I'm having trouble understanding why this is a problem for Unique
>> joins, but not for join removal?

> Ah, you know what, that's just mistaken.  I was thinking that we
> short-circuited the join on the strength of the hash (or merge) quals
> only, but actually we check all the joinquals first.  As long as the
> uniqueness proof uses only joinquals and not conditions that will end up
> as otherquals, it's fine.

Actually, after thinking about that some more, it seems to me that there
is a performance (not correctness) issue here: suppose that we have
something like
select ... from t1 left join t2 on t1.x = t2.x and t1.y < t2.y

If there's a unique index on t2.x, we'll be able to mark the join
inner-unique.  However, short-circuiting would only occur after
finding a row that passes both joinquals.  If the y condition is
true for only a few rows, this would pretty nearly disable the
optimization.  Ideally we would short-circuit after testing the x
condition only, but there's no provision for that.

This might not be a huge problem for outer joins.  My sense of typical
SQL style is that the joinquals (ON conditions) are likely to be
exactly what you need to prove inner uniqueness, while random other
conditions will be pushed-down from WHERE and hence will be otherquals.
But I'm afraid it is quite a big deal for inner joins, where we dump
all available conditions into the joinquals.  We might need to rethink
that choice.

At least for merge and hash joins, it's tempting to think about a
short-circuit test being made after testing just the merge/hash quals.
But we'd have to prove uniqueness using only the merge/hash quals,
so the planning cost might be unacceptably high --- particularly for
merge joins which often don't use all available mergeable quals.
In the end I think we probably want to keep the short-circuit in the
same place where it is for SEMI/ANTI cases (which have to have it
exactly there for semantic correctness).

I'm afraid though that we may have to do something about the
irrelevant-joinquals issue in order for this to be of much real-world
use for inner joins.
        regards, tom lane



pgsql-hackers by date:

Previous
From: Stephen Frost
Date:
Subject: Re: [HACKERS] pg_ls_dir & friends still have a hard-coded superusercheck
Next
From: Stephen Frost
Date:
Subject: Re: [HACKERS] pg_ls_dir & friends still have a hard-coded superusercheck