Thread: Outer Join performance in PostgreSQL

Outer Join performance in PostgreSQL

From
Ashok Agrawal
Date:
I noticed outer join is very very slow in postgresql as compared
to Oracle.

SELECT a.dln_code, a.company_name,
to_char(a.certificate_date,'DD-MON-YYYY'),
to_char(a.certificate_type_id, '99'),
COALESCE(b.certificate_type_description,'None') ,
a.description, a.blanket_single, a.certificate_status,
COALESCE(a.sun_legal_entity, 'None'),
COALESCE(a.other_entity_name, 'None'),
COALESCE(a.notes, 'None'),COALESCE(c.name, NULL),
COALESCE(to_char(a.created_date,'DD-MON-YYYY'), 'N/A'),
COALESCE(c.name, NULL),
COALESCE(to_char(a.updated_date,'DD-MON-YYYY'), 'N/A'),
COALESCE(e.name, NULL),
COALESCE(to_char(a.approved_date,'DD-MON-YYYY'), 'N/A')
  FROM ((((ecms_cert_headers a
        LEFT OUTER JOIN taxpack_user c ON (a.created_by = c.emp_no))
        LEFT OUTER JOIN taxpack_user d ON (a.updated_by = d.emp_no))
        LEFT OUTER JOIN taxpack_user e ON (a.approved_by = e.emp_no))
        INNER JOIN ecms_certificate_types b ON
      (a.certificate_type_id= b.certificate_type_id ))
 WHERE a.dln_code = '17319'


This query return only 1 record but take 25 second to execute in postgreSQL
as compared to 1.3 second in Oracle. Any suggestion ? Below is explain output.


 Hash Join  (cost=1666049.74..18486619.37 rows=157735046 width=874)
   Hash Cond: ("outer".certificate_type_id = "inner".certificate_type_id)
   ->  Merge Right Join  (cost=1666048.13..11324159.05 rows=643816513 width=826)
         Merge Cond: ("outer"."?column3?" = "inner"."?column16?")
         ->  Sort  (cost=30776.19..31207.80 rows=172645 width=64)
               Sort Key: (e.emp_no)::text
               ->  Seq Scan on taxpack_user e  (cost=0.00..4898.45 rows=172645
width=64)
         ->  Sort  (cost=1635271.94..1637136.51 rows=745827 width=811)
               Sort Key: (a.approved_by)::text
               ->  Merge Left Join  (cost=25230.45..36422.18 rows=745827 width=811)
                     Merge Cond: ("outer"."?column17?" = "inner"."?column2?")
                     ->  Sort  (cost=3117.35..3119.51 rows=864 width=844)
                           Sort Key: (a.updated_by)::text
                           ->  Nested Loop Left Join  (cost=0.00..3075.21
rows=864 width=844)
                                 ->  Index Scan using pk_ecms_cert_headers on
ecms_cert_headers a  (cost=0.00..6.01 rows=1 width=829)
                                       Index Cond: ((dln_code)::text =
'17319'::text)
                                 ->  Index Scan using ash_n1 on taxpack_user c
(cost=0.00..3058.40 rows=864 width=64)
                                       Index Cond: (("outer".created_by)::text =
(c.emp_no)::text)
                     ->  Sort  (cost=22113.10..22544.71 rows=172645 width=16)
                           Sort Key: (d.emp_no)::text
                           ->  Seq Scan on taxpack_user d  (cost=0.00..4898.45
rows=172645 width=16)
   ->  Hash  (cost=1.49..1.49 rows=49 width=50)
         ->  Seq Scan on ecms_certificate_types b  (cost=0.00..1.49 rows=49
width=50)
(23 rows)

Thanks
Ashok


Re: Outer Join performance in PostgreSQL

From
Michael Alan Dorman
Date:
Ashok Agrawal <Ashok.Agrawal@Sun.COM> writes:
> I noticed outer join is very very slow in postgresql as compared
> to Oracle.

I think the three things the people best able to help you are going to
ask for are 1) what version of PostgreSQL, 2) what are the tables, and
how many rows in each, and 3) output from 'explain analyze' rather
than just 'explain'.

That said, I'm willing to take an amateurish stab at it even without
that.

In fact, I don't think the outer joins are the issue at all.  I see
that you're forcing a right join from ecms_certificate_types to
ecms_cert_headers.  This seems to be causing postgresql to think it
must (unnecessarily) consider three quarters of a billion rows, which,
if I'm reading right, seems to be producing the majority of the
estimated cost:

>  Hash Join  (cost=1666049.74..18486619.37 rows=157735046 width=874)
>    Hash Cond: ("outer".certificate_type_id = "inner".certificate_type_id)
>    ->  Merge Right Join  (cost=1666048.13..11324159.05 rows=643816513 width=826)

In fact, looking at the fact that you're doing a COALESCE on a column
from b, it seems to me that doing a right join from ecms_cert_headers
to ecms_certificate_types is just wrong.  It seems to me that that
should be a left join as well.

With that in mind, I would rewrite the whole FROM clause as:

           FROM ecms_cert_headers a
LEFT OUTER JOIN ecms_certificate_types b
             ON (a.certificate_type_id = b.certificate_type_id)
LEFT OUTER JOIN taxpack_user c
             ON (a.created_by = c.emp_no)
LEFT OUTER JOIN taxpack_user d
             ON (a.updated_by = d.emp_no)
LEFT OUTER JOIN taxpack_user e
             ON (a.approved_by = e.emp_no)
          WHERE a.dln_code = '17319'

It seems to me that this more reflects the intent of the data that is
being retrieved.  I would also expect it to be a boatload faster.

Assuming I've understood the intent correctly, I would guess that the
difference is the result of the Oracle planner being able to eliminate
the right join or something.

Mike

Re: Outer Join performance in PostgreSQL

From
Stephan Szabo
Date:
On Wed, 9 Nov 2005, Ashok Agrawal wrote:

> I noticed outer join is very very slow in postgresql as compared
> to Oracle.
>
> SELECT a.dln_code, a.company_name,
> to_char(a.certificate_date,'DD-MON-YYYY'),
> to_char(a.certificate_type_id, '99'),
> COALESCE(b.certificate_type_description,'None') ,
> a.description, a.blanket_single, a.certificate_status,
> COALESCE(a.sun_legal_entity, 'None'),
> COALESCE(a.other_entity_name, 'None'),
> COALESCE(a.notes, 'None'),COALESCE(c.name, NULL),
> COALESCE(to_char(a.created_date,'DD-MON-YYYY'), 'N/A'),
> COALESCE(c.name, NULL),
> COALESCE(to_char(a.updated_date,'DD-MON-YYYY'), 'N/A'),
> COALESCE(e.name, NULL),
> COALESCE(to_char(a.approved_date,'DD-MON-YYYY'), 'N/A')
>   FROM ((((ecms_cert_headers a
>         LEFT OUTER JOIN taxpack_user c ON (a.created_by = c.emp_no))
>         LEFT OUTER JOIN taxpack_user d ON (a.updated_by = d.emp_no))
>         LEFT OUTER JOIN taxpack_user e ON (a.approved_by = e.emp_no))
>         INNER JOIN ecms_certificate_types b ON
>       (a.certificate_type_id= b.certificate_type_id ))
>  WHERE a.dln_code = '17319'

I think in the above it's safe to do the inner join first, although
PostgreSQL won't determine that currently and that could have something to
do with the difference in performance if Oracle did reorder the joins.
If you were to run the query doing the INNER JOIN first, does that give
the correct results and run more quickly in PostgreSQL?  In either case,
explain analyze output would be handy to find the actual times taken by
the steps.

Re: Outer Join performance in PostgreSQL

From
Ashok Agrawal
Date:
Hello Michael,

Here is the information : I had executed explain analyze with modified
FROM clause.

Oops forgot to mention the version earlier.

Using postgres 8.0.0 on Solaris 9.

Rows Count :

cic=# select count(*) from taxpack_user;
 count
--------
 172645
(1 row)

cic=# select count(*) from ecms_certificate_types;
 count
-------
    10
(1 row)

cic=# select count(*) from ecms_cert_headers;
 count
-------
 17913
(1 row)

Table Information :

                  Table "ecms.ecms_certificate_types"
            Column            |            Type             | Modifiers
------------------------------+-----------------------------+-----------
 certificate_type_id          | smallint                    | not null
 certificate_type_description | character varying(60)       |
 created_by                   | character varying(30)       |
 created_date                 | timestamp without time zone |
 updated_by                   | character varying(30)       |
 updated_date                 | timestamp without time zone |
Indexes:
    "sys_c003733" PRIMARY KEY, btree (certificate_type_id)
    "pk_ecms_certificate_types" UNIQUE, btree (certificate_type_id)

 Table "ecms.ecms_cert_headers"
       Column        |            Type             | Modifiers
---------------------+-----------------------------+-----------
 dln_code            | character varying(10)       | not null
 sun_legal_entity    | character varying(12)       | not null
 other_entity_name   | character varying(20)       |
 company_name        | character varying(80)       | not null
 certificate_date    | timestamp without time zone | not null
 certificate_type_id | smallint                    | not null
 description         | character varying(80)       | not null
 blanket_single      | character(1)                | not null
 notes               | character varying(4000)     |
 certificate_status  | character(1)                | not null
 approved_by         | character varying(30)       |
 approved_date       | timestamp without time zone |
 created_by          | character varying(30)       |
 created_date        | timestamp without time zone |
 updated_by          | character varying(30)       |
 updated_date        | timestamp without time zone |
Indexes:
    "pk_ecms_cert_headers" UNIQUE, btree (dln_code)
    "ecms_cert_headers_idx1" btree (certificate_type_id)
    "ecms_cert_headers_idx2" btree (company_name)
    "ecms_cert_headers_idx3" btree (description)
Foreign-key constraints:
    "sys_c003754" FOREIGN KEY (certificate_type_id) REFERENCES
ecms_certificate_types(certificate_type_id)

  Table "ecms.taxpack_user"
   Column   |         Type          | Modifiers
------------+-----------------------+-----------
 emp_no     | character varying(12) | not null
 name       | character varying(60) | not null
 manager_id | character varying(12) |
 dept_no    | character varying(12) |
 mailstop   | character varying(12) |
 phone      | character varying(60) |
 email      | character varying(60) |
 active     | character varying(3)  | not null
 admin      | smallint              | not null
 super_user | smallint              | not null


Merge Right Join  (cost=1757437.54..21072796.15 rows=643816513 width=874)
(actual time=27800.250..27800.256 rows=1 loops=1)
   Merge Cond: ("outer"."?column3?" = "inner"."?column17?")
   ->  Sort  (cost=30776.19..31207.80 rows=172645 width=64) (actual
time=12229.482..12791.468 rows=172645 loops=1)
         Sort Key: (e.emp_no)::text
         ->  Seq Scan on taxpack_user e  (cost=0.00..4898.45 rows=172645
width=64) (actual time=0.050..1901.218 rows=172645 loops=1)
   ->  Sort  (cost=1726661.35..1728525.92 rows=745827 width=859) (actual
time=12675.899..12675.901 rows=1 loops=1)
         Sort Key: (a.approved_by)::text
         ->  Merge Left Join  (cost=29219.87..40411.59 rows=745827 width=859)
(actual time=12675.815..12675.830 rows=1 loops=1)
               Merge Cond: ("outer"."?column18?" = "inner"."?column2?")
               ->  Sort  (cost=7106.77..7108.93 rows=864 width=892) (actual
time=1441.644..1441.646 rows=1 loops=1)
                     Sort Key: (a.updated_by)::text
                     ->  Nested Loop Left Join  (cost=0.00..7064.62 rows=864
width=892) (actual time=435.864..1441.465 rows=1 loops=1)
                           Join Filter: (("outer".created_by)::text =
("inner".emp_no)::text)
                           ->  Nested Loop Left Join  (cost=0.00..8.11 rows=1
width=877) (actual time=0.251..0.361 rows=1 loops=1)
                                 Join Filter: ("outer".certificate_type_id =
"inner".certificate_type_id)
                                 ->  Index Scan using pk_ecms_cert_headers on
ecms_cert_headers a  (cost=0.00..6.01 rows=1 width=829) (actual
time=0.113..0.136 rows=1 loops=1)
                                       Index Cond: ((dln_code)::text =
'17319'::text)
                                 ->  Seq Scan on ecms_certificate_types b
(cost=0.00..1.49 rows=49 width=50) (actual time=0.018..0.059 rows=10 loops=1)
                        ->  Seq Scan on taxpack_user c  (cost=0.00..4898.45
rows=172645 width=64) (actual time=0.014..674.881 rows=172645 loops=1)
               ->  Sort  (cost=22113.10..22544.71 rows=172645 width=16) (actual
time=10689.742..10885.155 rows=71665 loops=1)
                     Sort Key: (d.emp_no)::text
                     ->  Seq Scan on taxpack_user d  (cost=0.00..4898.45
rows=172645 width=16) (actual time=0.031..1791.036 rows=172645 loops=1)
 Total runtime: 27802.014 ms
(23 rows)



Michael Alan Dorman wrote On 11/09/05 12:45,:
> Ashok Agrawal <Ashok.Agrawal@Sun.COM> writes:
>
>>I noticed outer join is very very slow in postgresql as compared
>>to Oracle.
>
>
> I think the three things the people best able to help you are going to
> ask for are 1) what version of PostgreSQL, 2) what are the tables, and
> how many rows in each, and 3) output from 'explain analyze' rather
> than just 'explain'.
>
> That said, I'm willing to take an amateurish stab at it even without
> that.
>
> In fact, I don't think the outer joins are the issue at all.  I see
> that you're forcing a right join from ecms_certificate_types to
> ecms_cert_headers.  This seems to be causing postgresql to think it
> must (unnecessarily) consider three quarters of a billion rows, which,
> if I'm reading right, seems to be producing the majority of the
> estimated cost:
>
>
>> Hash Join  (cost=1666049.74..18486619.37 rows=157735046 width=874)
>>   Hash Cond: ("outer".certificate_type_id = "inner".certificate_type_id)
>>   ->  Merge Right Join  (cost=1666048.13..11324159.05 rows=643816513 width=826)
>
>
> In fact, looking at the fact that you're doing a COALESCE on a column
> from b, it seems to me that doing a right join from ecms_cert_headers
> to ecms_certificate_types is just wrong.  It seems to me that that
> should be a left join as well.
>
> With that in mind, I would rewrite the whole FROM clause as:
>
>            FROM ecms_cert_headers a
> LEFT OUTER JOIN ecms_certificate_types b
>              ON (a.certificate_type_id = b.certificate_type_id)
> LEFT OUTER JOIN taxpack_user c
>              ON (a.created_by = c.emp_no)
> LEFT OUTER JOIN taxpack_user d
>              ON (a.updated_by = d.emp_no)
> LEFT OUTER JOIN taxpack_user e
>              ON (a.approved_by = e.emp_no)
>           WHERE a.dln_code = '17319'
>
> It seems to me that this more reflects the intent of the data that is
> being retrieved.  I would also expect it to be a boatload faster.
>
> Assuming I've understood the intent correctly, I would guess that the
> difference is the result of the Oracle planner being able to eliminate
> the right join or something.
>
> Mike
>
> ---------------------------(end of broadcast)---------------------------
> TIP 4: Have you searched our list archives?
>
>                http://archives.postgresql.org

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
NOTICE:  This email message is for the sole use of the intended
recipient(s) and may contain confidential and privileged
information.  Any unauthorized review, use, disclosure or
distribution is prohibited.  If you are not the intended
recipient, please contact the sender by reply email and destroy
all copies of the original message.
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~