Re: Improving our clauseless-join heuristics - Mailing list pgsql-hackers

From Tom Lane
Subject Re: Improving our clauseless-join heuristics
Date
Msg-id 22683.1334730536@sss.pgh.pa.us
Whole thread Raw
In response to Re: Improving our clauseless-join heuristics  (Amit Kapila <amit.kapila@huawei.com>)
Responses Re: Improving our clauseless-join heuristics  (Amit Kapila <amit.kapila@huawei.com>)
List pgsql-hackers
Amit Kapila <amit.kapila@huawei.com> writes:
>> I'm afraid I'm still not following you very well.  Perhaps you could
>> submit a proposed patch?

> Before that can you please explain in little more detail (if possible with
> small example) about the idea you have told in original mail : "is there any
> join clause that both these relations participate in?"

Um ... wasn't that well enough explained already?

I think there are basically two cases.  You can have a join clause that
is immediately useful for joining two relations, say
select ... from a,b where a.x = b.y;

This is "immediate" in the sense that you can apply it when joining a
to b, regardless of any other relations involved in the query.

Or you can have a case like
select ... from a,b,c where (a.x + b.y) = c.z;

This clause is not immediately useful for joining any two of the three
relations in the query.  It will be useful when we get to level 3,
particularly so if we chose to join a and b first and there's an index
on c.z.  But we would have had to accept doing a cartesian join of a and
b to arrive at that situation.  In this example, we have no alternative
except to do some cartesian join at level 2 --- but as soon as we add
some more tables and join clauses to the example, we could get
distracted from the possibility that a cartesian join of a and b might
be a good idea.

Given that make_rels_by_joins doesn't (and shouldn't IMO) have any
detailed understanding of the semantics of particular join clauses,
I would not expect it to realize that joining a to b is the most likely
option out of the three possible clauseless joins that are available
at level 2 in this query.  It's going to have to generate all 3, and
then costing at the next level will figure out what's best to do.

However, I think it *does* need to understand that clauses relating
3 or more relations can work like this.  In the code as it stood before
last week, it would actively reject joining a to b if there were any
additional relations in the query.  That's just not right.
        regards, tom lane


pgsql-hackers by date:

Previous
From: Wolfgang Wilhelm
Date:
Subject: Re: Bug tracker tool we need
Next
From: Peter Eisentraut
Date:
Subject: Re: Re: [COMMITTERS] pgsql: Don't override arguments set via options with positional argumen