Re: Improving our clauseless-join heuristics - Mailing list pgsql-hackers

From Tom Lane
Subject Re: Improving our clauseless-join heuristics
Date
Msg-id 4920.1334608527@sss.pgh.pa.us
Whole thread Raw
In response to Re: Improving our clauseless-join heuristics  (Amit Kapila <amit.kapila@huawei.com>)
Responses Re: Improving our clauseless-join heuristics  (Amit Kapila <amit.kapila@huawei.com>)
List pgsql-hackers
Amit Kapila <amit.kapila@huawei.com> writes:
> For this kind of query, currently (referring 9.0.3 code) also it considers
> join of b,c and b,d. 
> As there is no join clause between b,c,d so it will go in path of
> make_rels_by_clauseless_joins() where it considers join of b,c and b,d.

> In this kind of query, if the suggestion by me in below mail is followed,
> then it will consider joining a,b a,c a,d at level-2 in function
> make_rels_by_clause_joins() which it currently doesn't do which may generate
> useless join paths.
> However in real-world scenario's this kind of queries where 2 cols of
> different tables are
> used in one side expression (b.y + c.z) of where clause may be less.

> On the other hand, when we come to consider d, it will have no join
> clauses so we will consider joining it to each other rel in turn.  

> When it come to consider d, as at level -2 it only consider later rels. So
> it should not consider joining with each other rel.

I might still be misunderstanding, but I think what you are suggesting
is that in the loop in make_rels_by_clause_joins, if we find that the
old_rel doesn't have a join clause/restriction with the current
other_rel, we check to see whether other_rel has any join clauses at
all, and force the join to occur anyway if it doesn't.

I can't really get excited about doing it that way instead of the
current way.  In the first place, it seems to miss the need to
clauseless-join two relations when neither of them have any join
clauses, for instance plain old "SELECT * FROM a, b". So you still need
something like the make_rels_by_clauseless_joins code path, and it's
not entirely clear how to avoid duplicated work there.  In the second
place, instead of N tests to see whether a rel lacks any join clauses,
we'd now have something approaching O(N^2) such tests, in the typical
case where most base rels are directly joined to only a few other rels.
So it seems to make things slower for little obvious benefit.

In general, queries with join-clause-less rels are pretty uncommon,
so I don't want to introduce extra work into make_rels_by_clause_joins
to handle the case.
        regards, tom lane


pgsql-hackers by date:

Previous
From: Greg Smith
Date:
Subject: Re: Last gasp
Next
From: Jeff Davis
Date:
Subject: Re: 9.3 Pre-proposal: Range Merge Join