Re: Planner creating ineffective plans on LEFT OUTER joins - Mailing list pgsql-hackers

From Andres Freund
Subject Re: Planner creating ineffective plans on LEFT OUTER joins
Date
Msg-id 200806270321.54235.andres@anarazel.de
Whole thread Raw
In response to Re: Planner creating ineffective plans on LEFT OUTER joins  (Tom Lane <tgl@sss.pgh.pa.us>)
Responses Re: Planner creating ineffective plans on LEFT OUTER joins  (Tom Lane <tgl@sss.pgh.pa.us>)
List pgsql-hackers
Hi,

On Thursday 26 June 2008 04:36:09 Tom Lane wrote:
> Andres Freund <andres@anarazel.de> writes:
> > SELECT *
> > FROM
> >     ab LEFT OUTER JOIN (
> >         bc JOIN cd
> >         ON bc.c = cd.d
> >     )
> >     ON ab.b = bc.b
> >
> > WHERE
> >     ab.a = 20000
> >
> > As ab.a = 20000 occurs only once in ab one would expect that it just does
> > an index scan on bc for ab.b = bc.b.
There was a typo in here (ON bc.c = cd.d should be ON bc.c = cd.c):
http://anarazel.de/postgres/testtable_query4.plan
Better query plan, but it still not optimal - interestingly the query plan
works out perfecty for ab.a = 10:
http://anarazel.de/postgres/testtable_query3.plan
....

> The only way it could do that would be by interchanging the order of the
> left and inner joins, ie (ab left join bc) join cd; which would change
> the results.
My knowledge about the implementation side of relational databases is quite
limited, so my ideas may be quite flawed:
The planner already recognizes that the left side of the join is quite small
and the right side will be very big.
Why cant it optimize the query the same way it does for a inner join, namely
doing an index lookup on bc?
I dont see the fundamental problem?

> I believe it could interchange the joins if they were both LEFT or
> both INNER.  Do you really need exactly these semantics?
I don't see an easy/effective way to express it:
I need all data belonging left side of the join (proband) through a series
(participation -> answer_group -> answer -> data) of
inner joins and NULL if there is no data.
If there would be only one such join it wouldn't be a problem - but a normal
query has around 20 such LEFT JOINS.
Currently I solve this through separately inserting the data for each join
into a temporary table which is still way much faster. But not having the
statistics the planner has selecting a good order isn't that easy. Besides its
not very elegant.
So, if somebody has a better idea...

If I can use my time to improve pg instead of working around the problem on
clientside both me and my employer will be happy...



Thanks,

Andres

pgsql-hackers by date:

Previous
From: Bruce Momjian
Date:
Subject: Removal of the patches email list
Next
From: Tom Lane
Date:
Subject: Re: Planner creating ineffective plans on LEFT OUTER joins