Re: Outer Join performance in PostgreSQL - Mailing list pgsql-performance

From Michael Alan Dorman
Subject Re: Outer Join performance in PostgreSQL
Date
Msg-id 87d5l9r1mm.fsf@hero.mallet-assembly.org
Whole thread Raw
In response to Outer Join performance in PostgreSQL  (Ashok Agrawal <Ashok.Agrawal@Sun.COM>)
Responses Re: Outer Join performance in PostgreSQL  (Ashok Agrawal <Ashok.Agrawal@Sun.COM>)
List pgsql-performance
Ashok Agrawal <Ashok.Agrawal@Sun.COM> writes:
> I noticed outer join is very very slow in postgresql as compared
> to Oracle.

I think the three things the people best able to help you are going to
ask for are 1) what version of PostgreSQL, 2) what are the tables, and
how many rows in each, and 3) output from 'explain analyze' rather
than just 'explain'.

That said, I'm willing to take an amateurish stab at it even without
that.

In fact, I don't think the outer joins are the issue at all.  I see
that you're forcing a right join from ecms_certificate_types to
ecms_cert_headers.  This seems to be causing postgresql to think it
must (unnecessarily) consider three quarters of a billion rows, which,
if I'm reading right, seems to be producing the majority of the
estimated cost:

>  Hash Join  (cost=1666049.74..18486619.37 rows=157735046 width=874)
>    Hash Cond: ("outer".certificate_type_id = "inner".certificate_type_id)
>    ->  Merge Right Join  (cost=1666048.13..11324159.05 rows=643816513 width=826)

In fact, looking at the fact that you're doing a COALESCE on a column
from b, it seems to me that doing a right join from ecms_cert_headers
to ecms_certificate_types is just wrong.  It seems to me that that
should be a left join as well.

With that in mind, I would rewrite the whole FROM clause as:

           FROM ecms_cert_headers a
LEFT OUTER JOIN ecms_certificate_types b
             ON (a.certificate_type_id = b.certificate_type_id)
LEFT OUTER JOIN taxpack_user c
             ON (a.created_by = c.emp_no)
LEFT OUTER JOIN taxpack_user d
             ON (a.updated_by = d.emp_no)
LEFT OUTER JOIN taxpack_user e
             ON (a.approved_by = e.emp_no)
          WHERE a.dln_code = '17319'

It seems to me that this more reflects the intent of the data that is
being retrieved.  I would also expect it to be a boatload faster.

Assuming I've understood the intent correctly, I would guess that the
difference is the result of the Oracle planner being able to eliminate
the right join or something.

Mike

pgsql-performance by date:

Previous
From: Ron Peacetree
Date:
Subject: Re: Some help on buffers and other performance tricks
Next
From: Ron Peacetree
Date:
Subject: Re: Sort performance on large tables