Re: join removal - Mailing list pgsql-hackers

From Robert Haas
Subject Re: join removal
Date
Msg-id 603c8f070908161431u7b07bf30g2c0ddd3eef220ebd@mail.gmail.com
Whole thread Raw
In response to Re: join removal  (Tom Lane <tgl@sss.pgh.pa.us>)
Responses Re: join removal  (Robert Haas <robertmhaas@gmail.com>)
List pgsql-hackers
On Sun, Aug 9, 2009 at 12:19 PM, Tom Lane<tgl@sss.pgh.pa.us> wrote:
> Robert Haas <robertmhaas@gmail.com> writes:
>> distinct_col_search() is going to return the relevant equality
>> operator from the argument list, which is ultimately going to come
>> from the RestrictInfo for the join clause.  So I need to see whether
>> that's compatible with the index, but equality_ops_are_compatible()
>> wants two equality operators, and what I have is one equality operator
>> and one operator class.
>
> For that you just check if the operator is a member of the class.
> (You might need to verify that it's an equality operator in the class
> too; not clear if the context is enough to be sure that it's not '<'
> for example.)

It seems that the needed checks are very similar to the ones that we
already implement when setting restrictinfo->mergeopfamilies.  That is
filled in by get_mergejoin_opfamilies(), which checks for btree
opfamilies where the strategy number is BTEqualStrategyNumber.  This
might cease to be the correct check in the (not-too-distant?) future
if we end up implementing other kinds of unique indices, but right now
btrees are all there is.

One possibility would be to have relation_is_distinct_for() call
get_mergejoin_opfamilies() for each operator; then for each index we
can check whether the opfamily of the relevant index column is in the
returned list.  This seems a bit wasteful, though, since I believe
that relation_is_distinct_for() would be called from joinpath.c, which
has access to restrictinfo->mergeopfamilies already.

I'm wondering whether it would make more sense to modify the proposed
API for relation_is_distinct_for() in some way so that we don't lose
this information.  It seems to me that the overall process here is
something like this (recalling that I'm focusing only on removing LEFT
joins at this point):

1. Given a joinrel, innerrel, and outerrel, find the list of
RestrictInfos for which (a) restrictinfo->mergeopfamilies != NIL, (b)
restrictinfo->outer_is_left is well-defined (as per logic in
select_mergejoin_clauses), and (c) the outer side is a Var.  If this
list is NIL, then give up; join removal is not possible.

2. Check whether any attributes from the outer side are used above the
join; if so, then give up; join removal is not possible.

3. Extract the column numbers from the Vars found in step 1(C) and the
mergeopfamilies found in step 1(A).

4. Look a unique, non-expression index (which must also have
index->indpred == NIL or index->predOK) for which every column number
appears in the list of column numbers computed in step 3, with one of
the corresponding opfamilies also found in step (2).  If one is found,
then the join is removable.

Thoughts?

...Robert


pgsql-hackers by date:

Previous
From: James Pye
Date:
Subject: Re: [PATCH] plpythonu datatype conversion improvements
Next
From: vanek
Date:
Subject: build error