Re: Outer Join performance in PostgreSQL - Mailing list pgsql-performance
From | Ashok Agrawal |
---|---|
Subject | Re: Outer Join performance in PostgreSQL |
Date | |
Msg-id | 43738EBD.1090507@Sun.COM Whole thread Raw |
In response to | Re: Outer Join performance in PostgreSQL (Michael Alan Dorman <mdorman@tendentious.org>) |
List | pgsql-performance |
Hello Michael, Here is the information : I had executed explain analyze with modified FROM clause. Oops forgot to mention the version earlier. Using postgres 8.0.0 on Solaris 9. Rows Count : cic=# select count(*) from taxpack_user; count -------- 172645 (1 row) cic=# select count(*) from ecms_certificate_types; count ------- 10 (1 row) cic=# select count(*) from ecms_cert_headers; count ------- 17913 (1 row) Table Information : Table "ecms.ecms_certificate_types" Column | Type | Modifiers ------------------------------+-----------------------------+----------- certificate_type_id | smallint | not null certificate_type_description | character varying(60) | created_by | character varying(30) | created_date | timestamp without time zone | updated_by | character varying(30) | updated_date | timestamp without time zone | Indexes: "sys_c003733" PRIMARY KEY, btree (certificate_type_id) "pk_ecms_certificate_types" UNIQUE, btree (certificate_type_id) Table "ecms.ecms_cert_headers" Column | Type | Modifiers ---------------------+-----------------------------+----------- dln_code | character varying(10) | not null sun_legal_entity | character varying(12) | not null other_entity_name | character varying(20) | company_name | character varying(80) | not null certificate_date | timestamp without time zone | not null certificate_type_id | smallint | not null description | character varying(80) | not null blanket_single | character(1) | not null notes | character varying(4000) | certificate_status | character(1) | not null approved_by | character varying(30) | approved_date | timestamp without time zone | created_by | character varying(30) | created_date | timestamp without time zone | updated_by | character varying(30) | updated_date | timestamp without time zone | Indexes: "pk_ecms_cert_headers" UNIQUE, btree (dln_code) "ecms_cert_headers_idx1" btree (certificate_type_id) "ecms_cert_headers_idx2" btree (company_name) "ecms_cert_headers_idx3" btree (description) Foreign-key constraints: "sys_c003754" FOREIGN KEY (certificate_type_id) REFERENCES ecms_certificate_types(certificate_type_id) Table "ecms.taxpack_user" Column | Type | Modifiers ------------+-----------------------+----------- emp_no | character varying(12) | not null name | character varying(60) | not null manager_id | character varying(12) | dept_no | character varying(12) | mailstop | character varying(12) | phone | character varying(60) | email | character varying(60) | active | character varying(3) | not null admin | smallint | not null super_user | smallint | not null Merge Right Join (cost=1757437.54..21072796.15 rows=643816513 width=874) (actual time=27800.250..27800.256 rows=1 loops=1) Merge Cond: ("outer"."?column3?" = "inner"."?column17?") -> Sort (cost=30776.19..31207.80 rows=172645 width=64) (actual time=12229.482..12791.468 rows=172645 loops=1) Sort Key: (e.emp_no)::text -> Seq Scan on taxpack_user e (cost=0.00..4898.45 rows=172645 width=64) (actual time=0.050..1901.218 rows=172645 loops=1) -> Sort (cost=1726661.35..1728525.92 rows=745827 width=859) (actual time=12675.899..12675.901 rows=1 loops=1) Sort Key: (a.approved_by)::text -> Merge Left Join (cost=29219.87..40411.59 rows=745827 width=859) (actual time=12675.815..12675.830 rows=1 loops=1) Merge Cond: ("outer"."?column18?" = "inner"."?column2?") -> Sort (cost=7106.77..7108.93 rows=864 width=892) (actual time=1441.644..1441.646 rows=1 loops=1) Sort Key: (a.updated_by)::text -> Nested Loop Left Join (cost=0.00..7064.62 rows=864 width=892) (actual time=435.864..1441.465 rows=1 loops=1) Join Filter: (("outer".created_by)::text = ("inner".emp_no)::text) -> Nested Loop Left Join (cost=0.00..8.11 rows=1 width=877) (actual time=0.251..0.361 rows=1 loops=1) Join Filter: ("outer".certificate_type_id = "inner".certificate_type_id) -> Index Scan using pk_ecms_cert_headers on ecms_cert_headers a (cost=0.00..6.01 rows=1 width=829) (actual time=0.113..0.136 rows=1 loops=1) Index Cond: ((dln_code)::text = '17319'::text) -> Seq Scan on ecms_certificate_types b (cost=0.00..1.49 rows=49 width=50) (actual time=0.018..0.059 rows=10 loops=1) -> Seq Scan on taxpack_user c (cost=0.00..4898.45 rows=172645 width=64) (actual time=0.014..674.881 rows=172645 loops=1) -> Sort (cost=22113.10..22544.71 rows=172645 width=16) (actual time=10689.742..10885.155 rows=71665 loops=1) Sort Key: (d.emp_no)::text -> Seq Scan on taxpack_user d (cost=0.00..4898.45 rows=172645 width=16) (actual time=0.031..1791.036 rows=172645 loops=1) Total runtime: 27802.014 ms (23 rows) Michael Alan Dorman wrote On 11/09/05 12:45,: > Ashok Agrawal <Ashok.Agrawal@Sun.COM> writes: > >>I noticed outer join is very very slow in postgresql as compared >>to Oracle. > > > I think the three things the people best able to help you are going to > ask for are 1) what version of PostgreSQL, 2) what are the tables, and > how many rows in each, and 3) output from 'explain analyze' rather > than just 'explain'. > > That said, I'm willing to take an amateurish stab at it even without > that. > > In fact, I don't think the outer joins are the issue at all. I see > that you're forcing a right join from ecms_certificate_types to > ecms_cert_headers. This seems to be causing postgresql to think it > must (unnecessarily) consider three quarters of a billion rows, which, > if I'm reading right, seems to be producing the majority of the > estimated cost: > > >> Hash Join (cost=1666049.74..18486619.37 rows=157735046 width=874) >> Hash Cond: ("outer".certificate_type_id = "inner".certificate_type_id) >> -> Merge Right Join (cost=1666048.13..11324159.05 rows=643816513 width=826) > > > In fact, looking at the fact that you're doing a COALESCE on a column > from b, it seems to me that doing a right join from ecms_cert_headers > to ecms_certificate_types is just wrong. It seems to me that that > should be a left join as well. > > With that in mind, I would rewrite the whole FROM clause as: > > FROM ecms_cert_headers a > LEFT OUTER JOIN ecms_certificate_types b > ON (a.certificate_type_id = b.certificate_type_id) > LEFT OUTER JOIN taxpack_user c > ON (a.created_by = c.emp_no) > LEFT OUTER JOIN taxpack_user d > ON (a.updated_by = d.emp_no) > LEFT OUTER JOIN taxpack_user e > ON (a.approved_by = e.emp_no) > WHERE a.dln_code = '17319' > > It seems to me that this more reflects the intent of the data that is > being retrieved. I would also expect it to be a boatload faster. > > Assuming I've understood the intent correctly, I would guess that the > difference is the result of the Oracle planner being able to eliminate > the right join or something. > > Mike > > ---------------------------(end of broadcast)--------------------------- > TIP 4: Have you searched our list archives? > > http://archives.postgresql.org ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ NOTICE: This email message is for the sole use of the intended recipient(s) and may contain confidential and privileged information. Any unauthorized review, use, disclosure or distribution is prohibited. If you are not the intended recipient, please contact the sender by reply email and destroy all copies of the original message. ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
pgsql-performance by date: