Re: Outer Join performance in PostgreSQL - Mailing list pgsql-performance

From Ashok Agrawal
Subject Re: Outer Join performance in PostgreSQL
Date
Msg-id 43738EBD.1090507@Sun.COM
Whole thread Raw
In response to Re: Outer Join performance in PostgreSQL  (Michael Alan Dorman <mdorman@tendentious.org>)
List pgsql-performance
Hello Michael,

Here is the information : I had executed explain analyze with modified
FROM clause.

Oops forgot to mention the version earlier.

Using postgres 8.0.0 on Solaris 9.

Rows Count :

cic=# select count(*) from taxpack_user;
 count
--------
 172645
(1 row)

cic=# select count(*) from ecms_certificate_types;
 count
-------
    10
(1 row)

cic=# select count(*) from ecms_cert_headers;
 count
-------
 17913
(1 row)

Table Information :

                  Table "ecms.ecms_certificate_types"
            Column            |            Type             | Modifiers
------------------------------+-----------------------------+-----------
 certificate_type_id          | smallint                    | not null
 certificate_type_description | character varying(60)       |
 created_by                   | character varying(30)       |
 created_date                 | timestamp without time zone |
 updated_by                   | character varying(30)       |
 updated_date                 | timestamp without time zone |
Indexes:
    "sys_c003733" PRIMARY KEY, btree (certificate_type_id)
    "pk_ecms_certificate_types" UNIQUE, btree (certificate_type_id)

 Table "ecms.ecms_cert_headers"
       Column        |            Type             | Modifiers
---------------------+-----------------------------+-----------
 dln_code            | character varying(10)       | not null
 sun_legal_entity    | character varying(12)       | not null
 other_entity_name   | character varying(20)       |
 company_name        | character varying(80)       | not null
 certificate_date    | timestamp without time zone | not null
 certificate_type_id | smallint                    | not null
 description         | character varying(80)       | not null
 blanket_single      | character(1)                | not null
 notes               | character varying(4000)     |
 certificate_status  | character(1)                | not null
 approved_by         | character varying(30)       |
 approved_date       | timestamp without time zone |
 created_by          | character varying(30)       |
 created_date        | timestamp without time zone |
 updated_by          | character varying(30)       |
 updated_date        | timestamp without time zone |
Indexes:
    "pk_ecms_cert_headers" UNIQUE, btree (dln_code)
    "ecms_cert_headers_idx1" btree (certificate_type_id)
    "ecms_cert_headers_idx2" btree (company_name)
    "ecms_cert_headers_idx3" btree (description)
Foreign-key constraints:
    "sys_c003754" FOREIGN KEY (certificate_type_id) REFERENCES
ecms_certificate_types(certificate_type_id)

  Table "ecms.taxpack_user"
   Column   |         Type          | Modifiers
------------+-----------------------+-----------
 emp_no     | character varying(12) | not null
 name       | character varying(60) | not null
 manager_id | character varying(12) |
 dept_no    | character varying(12) |
 mailstop   | character varying(12) |
 phone      | character varying(60) |
 email      | character varying(60) |
 active     | character varying(3)  | not null
 admin      | smallint              | not null
 super_user | smallint              | not null


Merge Right Join  (cost=1757437.54..21072796.15 rows=643816513 width=874)
(actual time=27800.250..27800.256 rows=1 loops=1)
   Merge Cond: ("outer"."?column3?" = "inner"."?column17?")
   ->  Sort  (cost=30776.19..31207.80 rows=172645 width=64) (actual
time=12229.482..12791.468 rows=172645 loops=1)
         Sort Key: (e.emp_no)::text
         ->  Seq Scan on taxpack_user e  (cost=0.00..4898.45 rows=172645
width=64) (actual time=0.050..1901.218 rows=172645 loops=1)
   ->  Sort  (cost=1726661.35..1728525.92 rows=745827 width=859) (actual
time=12675.899..12675.901 rows=1 loops=1)
         Sort Key: (a.approved_by)::text
         ->  Merge Left Join  (cost=29219.87..40411.59 rows=745827 width=859)
(actual time=12675.815..12675.830 rows=1 loops=1)
               Merge Cond: ("outer"."?column18?" = "inner"."?column2?")
               ->  Sort  (cost=7106.77..7108.93 rows=864 width=892) (actual
time=1441.644..1441.646 rows=1 loops=1)
                     Sort Key: (a.updated_by)::text
                     ->  Nested Loop Left Join  (cost=0.00..7064.62 rows=864
width=892) (actual time=435.864..1441.465 rows=1 loops=1)
                           Join Filter: (("outer".created_by)::text =
("inner".emp_no)::text)
                           ->  Nested Loop Left Join  (cost=0.00..8.11 rows=1
width=877) (actual time=0.251..0.361 rows=1 loops=1)
                                 Join Filter: ("outer".certificate_type_id =
"inner".certificate_type_id)
                                 ->  Index Scan using pk_ecms_cert_headers on
ecms_cert_headers a  (cost=0.00..6.01 rows=1 width=829) (actual
time=0.113..0.136 rows=1 loops=1)
                                       Index Cond: ((dln_code)::text =
'17319'::text)
                                 ->  Seq Scan on ecms_certificate_types b
(cost=0.00..1.49 rows=49 width=50) (actual time=0.018..0.059 rows=10 loops=1)
                        ->  Seq Scan on taxpack_user c  (cost=0.00..4898.45
rows=172645 width=64) (actual time=0.014..674.881 rows=172645 loops=1)
               ->  Sort  (cost=22113.10..22544.71 rows=172645 width=16) (actual
time=10689.742..10885.155 rows=71665 loops=1)
                     Sort Key: (d.emp_no)::text
                     ->  Seq Scan on taxpack_user d  (cost=0.00..4898.45
rows=172645 width=16) (actual time=0.031..1791.036 rows=172645 loops=1)
 Total runtime: 27802.014 ms
(23 rows)



Michael Alan Dorman wrote On 11/09/05 12:45,:
> Ashok Agrawal <Ashok.Agrawal@Sun.COM> writes:
>
>>I noticed outer join is very very slow in postgresql as compared
>>to Oracle.
>
>
> I think the three things the people best able to help you are going to
> ask for are 1) what version of PostgreSQL, 2) what are the tables, and
> how many rows in each, and 3) output from 'explain analyze' rather
> than just 'explain'.
>
> That said, I'm willing to take an amateurish stab at it even without
> that.
>
> In fact, I don't think the outer joins are the issue at all.  I see
> that you're forcing a right join from ecms_certificate_types to
> ecms_cert_headers.  This seems to be causing postgresql to think it
> must (unnecessarily) consider three quarters of a billion rows, which,
> if I'm reading right, seems to be producing the majority of the
> estimated cost:
>
>
>> Hash Join  (cost=1666049.74..18486619.37 rows=157735046 width=874)
>>   Hash Cond: ("outer".certificate_type_id = "inner".certificate_type_id)
>>   ->  Merge Right Join  (cost=1666048.13..11324159.05 rows=643816513 width=826)
>
>
> In fact, looking at the fact that you're doing a COALESCE on a column
> from b, it seems to me that doing a right join from ecms_cert_headers
> to ecms_certificate_types is just wrong.  It seems to me that that
> should be a left join as well.
>
> With that in mind, I would rewrite the whole FROM clause as:
>
>            FROM ecms_cert_headers a
> LEFT OUTER JOIN ecms_certificate_types b
>              ON (a.certificate_type_id = b.certificate_type_id)
> LEFT OUTER JOIN taxpack_user c
>              ON (a.created_by = c.emp_no)
> LEFT OUTER JOIN taxpack_user d
>              ON (a.updated_by = d.emp_no)
> LEFT OUTER JOIN taxpack_user e
>              ON (a.approved_by = e.emp_no)
>           WHERE a.dln_code = '17319'
>
> It seems to me that this more reflects the intent of the data that is
> being retrieved.  I would also expect it to be a boatload faster.
>
> Assuming I've understood the intent correctly, I would guess that the
> difference is the result of the Oracle planner being able to eliminate
> the right join or something.
>
> Mike
>
> ---------------------------(end of broadcast)---------------------------
> TIP 4: Have you searched our list archives?
>
>                http://archives.postgresql.org

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
NOTICE:  This email message is for the sole use of the intended
recipient(s) and may contain confidential and privileged
information.  Any unauthorized review, use, disclosure or
distribution is prohibited.  If you are not the intended
recipient, please contact the sender by reply email and destroy
all copies of the original message.
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~


pgsql-performance by date:

Previous
From: Tom Lane
Date:
Subject: Re: Index Scan Costs versus Sort
Next
From: Charlie Savage
Date:
Subject: Re: Index Scan Costs versus Sort