Thread: left outer join terrible slow compared to inner join
Hi, i've a speed problem withe the following statement: SELECT DISTINCT pz.l1_id, pz.l2_id, pz.l3_id, pz.l4_id FROM ot_adresse AS a, ot_produkt AS p LEFT OUTER JOIN ot_kat_prod AS pz ON ( p.p_id = pz.p_id ) WHERE p.a_id = a.id AND a.id = '105391105424941' AND a.m_id = '37'; This is terrible slow compared to the inner join: SELECT DISTINCT pz.l1_id, pz.l2_id, pz.l3_id, pz.l4_id FROM ot_adresse AS a, ot_produkt AS p, ot_kat_prod AS pz WHERE p.a_id = a.id AND a.id = '105391105424941' AND a.m_id = '37' AND p.p_id = pz.p_id; These are the EXPLAIN ANALYZE output of both statements on postgres 7.2.4: EXPLAIN ANALYZE SELECT DISTINCT pz.l1_id, pz.l2_id, pz.l3_id, pz.l4_id FROM ot_adresse AS a, ot_produkt AS p LEFT OUTER JOIN ot_kat_prod AS pz ON ( p.p_id = pz.p_id ) WHERE p.a_id = a.id AND a.id = '105391105424941' AND a.m_id = '37'; NOTICE: QUERY PLAN: Unique (cost=22061.28..22061.30 rows=1 width=272) (actual time=13332.01..13332.97 rows=11 loops=1) -> Sort (cost=22061.28..22061.28 rows=2 width=272) (actual time=13332.00..13332.03 rows=46 loops=1) -> Nested Loop (cost=21627.92..22061.27 rows=2 width=272) (actual time=13303.51..13328.98 rows=46 loops=1) -> Index Scan using o_adresse_id_uidx on o_adresse (cost=0.00..5.96 rows=1 width=34) (actual time=0.16..0.19rows=1 loops=1) -> Materialize (cost=21900.98..21900.98 rows=12347 width=238) (actual time=13071.53..13111.92 rows=51394loops=1) -> Merge Join (cost=21627.92..21900.98 rows=12347 width=238) (actual time=11724.45..12908.46 rows=51394loops=1) -> Sort (cost=16815.61..16815.61 rows=6640 width=68) (actual time=4283.02..4307.07 rows=26049loops=1) -> Seq Scan on o_produkt (cost=0.00..16394.06 rows=6640 width=68) (actual time=0.06..1126.96rows=26049 loops=1) -> Sort (cost=4812.31..4812.31 rows=40851 width=170) (actual time=7441.36..7481.73 rows=51521loops=1) -> Subquery Scan pz (cost=0.00..1683.51 rows=40851 width=170) (actual time=0.14..1161.81rows=40896 loops=1) -> Seq Scan on o_kat_prod (cost=0.00..1683.51 rows=40851 width=170) (actual time=0.13..419.07rows=40896 loops=1) Total runtime: 13377.02 msec EXPLAIN ANALYZE SELECT DISTINCT pz.l1_id, pz.l2_id, pz.l3_id, pz.l4_id FROM ot_adresse AS a, ot_produkt AS p, ot_kat_prodAS pz WHERE p.a_id = a.id AND a.id = '105391105424941' AND a.m_id = '37' AND p.p_id = pz.p_id; NOTICE: QUERY PLAN: Unique (cost=41.29..41.31 rows=1 width=272) (actual time=6.67..7.64 rows=11 loops=1) -> Sort (cost=41.29..41.29 rows=2 width=272) (actual time=6.67..6.71 rows=46 loops=1) -> Nested Loop (cost=0.00..41.28 rows=2 width=272) (actual time=0.68..3.73 rows=46 loops=1) -> Nested Loop (cost=0.00..23.80 rows=1 width=102) (actual time=0.46..0.87 rows=11 loops=1) -> Index Scan using o_adresse_id_uidx on o_adresse (cost=0.00..5.96 rows=1 width=34) (actual time=0.16..0.17rows=1 loops=1) -> Index Scan using o_produkt_a_id_idx on o_produkt (cost=0.00..17.83 rows=1 width=68) (actual time=0.29..0.65rows=11 loops=1) -> Index Scan using o_kat_prod_p_id_idx on o_kat_prod (cost=0.00..17.42 rows=5 width=170) (actual time=0.16..0.24rows=4 loops=11) Total runtime: 7.96 msec Do i've any chance to get the indexes used in the OUTER JOIN? Thanks for any hints! -tb -- Thomas Beutin tb@laokoon.IN-Berlin.DE Beam me up, Scotty. There is no intelligent live down in Redmond.
Thomas Beutin <tyrone@laokoon.IN-Berlin.DE> writes: > i've a speed problem withe the following statement: > SELECT DISTINCT pz.l1_id, pz.l2_id, pz.l3_id, pz.l4_id > FROM ot_adresse AS a, ot_produkt AS p > LEFT OUTER JOIN ot_kat_prod AS pz ON ( p.p_id = pz.p_id ) > WHERE p.a_id = a.id AND a.id = '105391105424941' AND a.m_id = '37'; In 7.2 (and 7.3), this syntax forces the planner to join ot_produkt to ot_kat_prod first, which is terribly inefficient because the WHERE constraints don't constrain that join at all. You could work around this by writing instead FROM (ot_adresse AS a CROSS JOIN ot_produkt AS p) LEFT OUTER JOIN ot_kat_prod AS pz ON ( p.p_id = pz.p_id ) WHERE p.a_id = a.id AND a.id = '105391105424941' AND a.m_id = '37'; See http://www.postgresql.org/docs/7.3/static/explicit-joins.html for discussion. (Note: 7.4 will be less rigid about this issue.) regards, tom lane
On Thu, Aug 28, 2003 at 11:42:00AM -0400, Tom Lane wrote: > Thomas Beutin <tyrone@laokoon.IN-Berlin.DE> writes: > > i've a speed problem withe the following statement: > > SELECT DISTINCT pz.l1_id, pz.l2_id, pz.l3_id, pz.l4_id > > FROM ot_adresse AS a, ot_produkt AS p > > LEFT OUTER JOIN ot_kat_prod AS pz ON ( p.p_id = pz.p_id ) > > WHERE p.a_id = a.id AND a.id = '105391105424941' AND a.m_id = '37'; > > In 7.2 (and 7.3), this syntax forces the planner to join ot_produkt to > ot_kat_prod first, which is terribly inefficient because the WHERE > constraints don't constrain that join at all. You could work around > this by writing instead > > FROM (ot_adresse AS a CROSS JOIN ot_produkt AS p) > LEFT OUTER JOIN ot_kat_prod AS pz ON ( p.p_id = pz.p_id ) > WHERE p.a_id = a.id AND a.id = '105391105424941' AND a.m_id = '37'; > Thanks for the suggestion, but the result is close to the original outer join without the explicit cross join but far away from the speed of the inner join. This uses the index o_produkt_a_id_idx on o_produkt, but the index o_kat_prod_p_id_idx on o_kat_prod is still not used: EXPLAIN ANALYZE SELECT DISTINCT pz.l1_id, pz.l2_id, pz.l3_id, pz.l4_id FROM ( ot_adresse AS a CROSS JOIN ot_produkt AS p)LEFT OUTER JOIN ot_kat_prod AS pz ON ( p.p_id = pz.p_id ) WHERE p.a_id = a.id AND a.id = '105391105424941' AND a.m_id= '37'; NOTICE: QUERY PLAN: Unique (cost=2217.96..2217.98 rows=1 width=272) (actual time=6776.21..6777.17 rows=11 loops=1) -> Sort (cost=2217.96..2217.96 rows=2 width=272) (actual time=6776.20..6776.24 rows=46 loops=1) -> Nested Loop (cost=0.00..2217.95 rows=2 width=272) (actual time=721.82..6773.09 rows=46 loops=1) -> Nested Loop (cost=0.00..23.80 rows=1 width=102) (actual time=0.69..1.74 rows=11 loops=1) -> Index Scan using o_adresse_id_uidx on o_adresse (cost=0.00..5.96 rows=1 width=34) (actual time=0.29..0.31rows=1 loops=1) -> Index Scan using o_produkt_a_id_idx on o_produkt (cost=0.00..17.83 rows=1 width=68) (actual time=0.38..1.31rows=11 loops=1) -> Subquery Scan pz (cost=0.00..1683.51 rows=40851 width=170) (actual time=0.03..485.07 rows=40917 loops=11) -> Seq Scan on o_kat_prod (cost=0.00..1683.51 rows=40851 width=170) (actual time=0.02..281.77 rows=40917loops=11) Total runtime: 6777.55 msec Is there any chance to use an index on the joined table o_kat_prod? Thanks for any hints! -tb -- Thomas Beutin tb@laokoon.IN-Berlin.DE Beam me up, Scotty. There is no intelligent live down in Redmond.
Thomas Beutin <tyrone@laokoon.IN-Berlin.DE> writes: > Thanks for the suggestion, but the result is close to the original outer > join without the explicit cross join but far away from the speed of the > inner join. > EXPLAIN ANALYZE SELECT DISTINCT pz.l1_id, pz.l2_id, pz.l3_id, pz.l4_id FROM ( ot_adresse AS a CROSS JOIN ot_produkt ASp) LEFT OUTER JOIN ot_kat_prod AS pz ON ( p.p_id = pz.p_id ) WHERE p.a_id = a.id AND a.id = '105391105424941' AND a.m_id= '37'; > NOTICE: QUERY PLAN: > -> Subquery Scan pz (cost=0.00..1683.51 rows=40851 width=170) (actual time=0.03..485.07 rows=40917 loops=11) > -> Seq Scan on o_kat_prod (cost=0.00..1683.51 rows=40851 width=170) (actual time=0.02..281.77 rows=40917loops=11) Hmm, I don't understand why ot_kat_prod is being treated as a subquery here. It isn't a view or something is it? regards, tom lane
Actually, I was about to post some problems we have with large left outer joins as well we've discovered in a porting projectfrom NT/SQL Server -> Linux/Postgres. We have a particular query that is rather large, left outer joining across several tables. Under SQL Server, with identicaldata and schema, this particular query takes 2 seconds. Under PostgreSQL, this same query takes 90 seconds -- that's right, 90 seconds. 45x longer than SQL Server. This was quitea shock to us (we'd not seen such a performance deficit between the two dbs until this) and could, in fact, force usaway from Postgres. I'd be happy to forward the explain to anyone who'd care to look at it... cwl > -----Original Message----- > From: Tom Lane [mailto:tgl@sss.pgh.pa.us] > Sent: Thursday, August 28, 2003 1:10 PM > To: Thomas Beutin > Cc: pgsql-general@postgresql.org > Subject: Re: [GENERAL] left outer join terrible slow compared to inner > join > > > Thomas Beutin <tyrone@laokoon.IN-Berlin.DE> writes: > > Thanks for the suggestion, but the result is close to the > original outer > > join without the explicit cross join but far away from the > speed of the > > inner join. > > > EXPLAIN ANALYZE SELECT DISTINCT pz.l1_id, pz.l2_id, > pz.l3_id, pz.l4_id FROM ( ot_adresse AS a CROSS JOIN > ot_produkt AS p) LEFT OUTER JOIN ot_kat_prod AS pz ON ( > p.p_id = pz.p_id ) WHERE p.a_id = a.id AND a.id = > '105391105424941' AND a.m_id = '37'; > > NOTICE: QUERY PLAN: > > > -> Subquery Scan pz (cost=0.00..1683.51 > rows=40851 width=170) (actual time=0.03..485.07 rows=40917 loops=11) > > -> Seq Scan on o_kat_prod > (cost=0.00..1683.51 rows=40851 width=170) (actual > time=0.02..281.77 rows=40917 loops=11) > > Hmm, I don't understand why ot_kat_prod is being treated as a subquery > here. It isn't a view or something is it? > > regards, tom lane > > ---------------------------(end of > broadcast)--------------------------- > TIP 5: Have you checked our extensive FAQ? > > http://www.postgresql.org/docs/faqs/FAQ.html >
> Actually, I was about to post some problems we have with large left > outer joins as well we've discovered in a porting project from > NT/SQL Server -> Linux/Postgres. > > We have a particular query that is rather large, left outer joining > across several tables. Under SQL Server, with identical data and > schema, this particular query takes 2 seconds. > > Under PostgreSQL, this same query takes 90 seconds -- that's right, > 90 seconds. 45x longer than SQL Server. This was quite a shock to > us (we'd not seen such a performance deficit between the two dbs > until this) and could, in fact, force us away from Postgres. > > I'd be happy to forward the explain to anyone who'd care to look at > it... Post an EXPLAIN ANALYZE of the query... -sc -- Sean Chittenden
Interstingly enough, the EXPLAIN ANALYZE itself took 90+ seconds: claycle@gkar:.../PhoneInserter/PhoneInserter > time psql -f e_stationd.sql ccm > analyzed.txt real 1m46.770s user 0m0.010s sys 0m0.010s Here is the output of the above explain analyze execution: QUERY PLAN --------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- Sort (cost=17157472946.49..17157472949.56 rows=1229 width=2066) (actual time=106513.59..106515.67 rows=1780 loops=1) Sort Key: d.name -> Nested Loop (cost=1.15..17157472883.44 rows=1229 width=2066) (actual time=2724.18..106407.50 rows=1780 loops=1) Join Filter: ("outer".fkcallingsearchspace = "inner".pkid) -> Nested Loop (cost=1.15..17157470797.64 rows=1229 width=1976) (actual time=2723.56..104804.55 rows=1780 loops=1) Join Filter: (("inner".tkservice = 11) AND ("inner".paramname = 'DefaultNetworkLocale'::character varying)) -> Nested Loop (cost=1.15..17157452120.74 rows=1229 width=1941) (actual time=2721.32..97560.43 rows=1780loops=1) Join Filter: (("inner".tkservice = 11) AND ("inner".paramname = 'DefaultUserLocale'::character varying)) -> Nested Loop (cost=1.15..17157433443.83 rows=1229 width=1906) (actual time=2719.93..90288.85 rows=1780loops=1) Join Filter: (("inner".tkservice = 11) AND ("inner".paramname = 'MLPPDomainIdentifier'::charactervarying)) -> Nested Loop (cost=1.15..17157414766.93 rows=1229 width=1871) (actual time=2717.94..83141.71rows=1780 loops=1) Join Filter: (("inner".tkservice = 11) AND ("inner".paramname = 'MLPPPreemptionSetting'::charactervarying)) -> Nested Loop (cost=1.15..17157396090.02 rows=1229 width=1836) (actual time=2714.90..76002.08rows=1780 loops=1) Join Filter: (("inner".tkservice = 11) AND ("inner".paramname = 'MLPPIndicationStatus'::charactervarying)) -> Nested Loop (cost=1.15..17157377413.11 rows=1229 width=1801) (actual time=2712.05..68810.87rows=1780 loops=1) Join Filter: ((("outer".fkvoicemessagingpilot IS NULL) OR ("outer".fkvoicemessagingpilot= "inner".pkid)) AND (("inner".isdefault = true) OR ("outer".fkvoicemessagingpilot = "inner".pkid))) -> Nested Loop (cost=1.15..17157376083.00 rows=1229 width=1759) (actual time=2711.44..68020.45rows=1780 loops=1) Join Filter: ((("outer".fkvoicemessagingprofile IS NULL) OR ("outer".fkvoicemessagingprofile= "inner".pkid)) AND (("inner".isdefault = true) OR ("outer".fkvoicemessagingprofile = "inner".pkid))) -> Nested Loop (cost=1.15..17157375032.94 rows=946 width=1722) (actualtime=2710.83..67240.60 rows=1780 loops=1) Join Filter: ("outer".fkroutefilter = "inner".pkid) -> Nested Loop (cost=1.15..17157373703.81 rows=946 width=1623)(actual time=2710.21..66328.33 rows=1780 loops=1) Join Filter: ("outer".fkdialplan = "inner".pkid) -> Nested Loop (cost=1.15..17157372736.52 rows=946 width=1599)(actual time=2709.74..65695.13 rows=1780 loops=1) Join Filter: ("outer".fkdigitdiscardinstruction = "inner".pkid) -> Nested Loop (cost=1.15..17157370184.69 rows=946width=1579) (actual time=2709.01..64628.50 rows=1780 loops=1) Join Filter: ("outer".fkcallmanager = "inner".pkid) -> Nested Loop (cost=1.15..17157369196.12 rows=946width=1563) (actual time=2708.58..63948.35 rows=1780 loops=1) Join Filter: ("inner".fkcallmanagergroup= "outer".pkid) -> Nested Loop (cost=1.15..17157368659.26rows=473 width=1531) (actual time=2708.13..63271.44 rows=890 loops=1) Join Filter: ("outer".fkcallmanagergroup= "inner".pkid) -> Nested Loop (cost=1.15..17157368154.34rows=473 width=1515) (actual time=2707.69..62936.58 rows=890 loops=1) Join Filter: ("outer".fkmediaresourcelist= "inner".pkid) -> Nested Loop (cost=1.15..17157367660.05rows=473 width=1499) (actual time=2707.29..62613.05 rows=890 loops=1) Join Filter: ("outer".fkdevicepool= "inner".pkid) -> Nested Loop (cost=1.15..17157366235.14rows=473 width=1431) (actual time=2706.54..61664.68 rows=890 loops=1) Join Filter: ("outer".fkcallingsearchspace_translation= "inner".pkid) -> Nested Loop (cost=1.15..17157365432.22 rows=473 width=1341) (actual time=2705.92..61180.46 rows=890 loops=1) Join Filter:("outer".fkcallingsearchspace_sharedlineappear = "inner".pkid) -> NestedLoop (cost=1.15..17157364629.30 rows=473 width=1251) (actual time=2705.19..60567.86 rows=890 loops=1) JoinFilter: ("outer".fkroutepartition = "inner".pkid) -> Nested Loop (cost=1.15..17157363751.89 rows=473 width=1221) (actual time=2704.59..59883.81 rows=890 loops=1) Join Filter: ("outer".fknumplan = "inner".pkid) -> Nested Loop (cost=1.15..17157345954.08 rows=473 width=605) (actual time=2691.46..32912.02 rows=890 loops=1) Join Filter: ("outer".pkid = "inner".fkdevice) -> Nested Loop (cost=1.15..17157317358.59 rows=462 width=504) (actual time=2657.87..5239.00 rows=808 loops=1) Join Filter: ("outer".fkmediaresourcelist = "inner".pkid) -> Hash Join (cost=1.15..17157316875.80 rows=462 width=472) (actual time=2657.66..5139.98 rows=808 loops=1) Hash Cond: ("outer".tkdeviceprotocol = "inner".enum) -> Nested Loop (cost=0.00..17157316866.57 rows=462 width=452) (actual time=2656.96..5062.74 rows=808loops=1) Join Filter: ("outer".fklocation = "inner".pkid) -> Nested Loop (cost=0.00..17157316860.79 rows=462 width=432) (actual time=2656.88..4992.47 rows=808loops=1) Join Filter: ("outer".fkcallingsearchspace_aar = "inner".pkid) -> Nested Loop (cost=0.00..17157316076.55 rows=462 width=342) (actual time=2656.46..4706.90rows=808 loops=1) Join Filter: ("outer".fkcallingsearchspace = "inner".pkid) -> Seq Scan on device d (cost=0.00..17157315292.30 rows=462 width=252) (actual time=2655.97..4380.22rows=808 loops=1) Filter: ((tkdeviceprofile = 0) AND (subplan)) SubPlan -> Materialize (cost=12307973.61..12307973.61 rows=40 width=145) (actualtime=2.87..3.44 rows=455 loops=924) -> Unique (cost=0.00..12307973.61 rows=40 width=145) (actual time=803.51..2650.40rows=808 loops=1) -> Nested Loop (cost=0.00..12307972.62 rows=396 width=145) (actualtime=803.50..2645.21 rows=808 loops=1) Join Filter: ("outer".fkcallmanager = "inner".pkid) Filter: ("inner".name = 'DAL-CCM2'::character varying) -> Nested Loop (cost=0.00..12307556.90 rows=396 width=116)(actual time=803.38..2603.11 rows=808 loops=1) Join Filter: ("inner".fkcallmanagergroup = "outer".pkid) Filter: ("inner".priority = 0) -> Nested Loop (cost=0.00..12307101.59 rows=396 width=80)(actual time=803.23..2540.22 rows=808 loops=1) Join Filter: ("outer".fkcallmanagergroup = "inner".pkid) -> Nested Loop (cost=0.00..12306678.94 rows=396width=64) (actual time=803.12..2505.12 rows=808 loops=1) Join Filter: ("outer".fkdevicepool = "inner".pkid) -> Index Scan using device_pkey on deviced (cost=0.00..12305486.23 rows=396 width=32) (actual time=802.92..2160.98 rows=808 loops=1) Filter: (((tkclass = 1) OR (tkclass= 10)) AND (subplan)) SubPlan -> Materialize (cost=8827.26..8827.26rows=1394 width=145) (actual time=0.99..1.55 rows=451 loops=810) -> Nested Loop (cost=0.00..8827.26rows=1394 width=145) (actual time=1.44..799.37 rows=898 loops=1) Join Filter: ("outer".fkcallmanager= "inner".pkid) Filter: ("inner".name= 'DAL-SJ-CCM3'::character varying) -> Nested Loop (cost=0.00..7363.56rows=1394 width=116) (actual time=1.30..750.11 rows=898 loops=1) Join Filter:("inner".fkcallmanagergroup = "outer".pkid) Filter: ("inner".priority<> 0) -> Nested Loop (cost=0.00..5760.46 rows=1394 width=80) (actual time=0.76..655.61 rows=1394 loops=1) Join Filter:("outer".fkcallmanagergroup = "inner".pkid) -> NestedLoop (cost=0.00..4272.37 rows=1394 width=64) (actual time=0.62..597.83 rows=1394 loops=1) JoinFilter: ("outer".fkdevicepool = "inner".pkid) -> Seq Scan on device d (cost=0.00..72.94 rows=1394 width=32) (actual time=0.02..11.42 rows=1394 loops=1) -> Seq Scan on devicepool dp (cost=0.00..2.45 rows=45 width=32) (actual time=0.01..0.28 rows=45 loops=1394) -> SeqScan on callmanagergroup cmg (cost=0.00..1.03 rows=3 width=16) (actual time=0.01..0.02 rows=3 loops=1394) -> Seq Scanon callmanagergroupmember gm (cost=0.00..1.06 rows=6 width=36) (actual time=0.01..0.04 rows=6 loops=1394) -> Seq Scan on callmanagercm (cost=0.00..1.02 rows=2 width=29) (actual time=0.01..0.02 rows=2 loops=898) -> Seq Scan on devicepool dp (cost=0.00..2.45rows=45 width=32) (actual time=0.01..0.28 rows=45 loops=808) -> Seq Scan on callmanagergroup cmg (cost=0.00..1.03rows=3 width=16) (actual time=0.01..0.02 rows=3 loops=808) -> Seq Scan on callmanagergroupmember gm (cost=0.00..1.06rows=6 width=36) (actual time=0.01..0.04 rows=6 loops=808) -> Seq Scan on callmanager cm (cost=0.00..1.02 rows=2 width=29)(actual time=0.01..0.02 rows=2 loops=808) -> Seq Scan on callingsearchspace cs (cost=0.00..1.31 rows=31 width=90) (actual time=0.01..0.21rows=31 loops=808) -> Seq Scan on callingsearchspace aarcs (cost=0.00..1.31 rows=31 width=90) (actual time=0.01..0.18rows=31 loops=808) -> Seq Scan on "location" l (cost=0.00..0.00 rows=1 width=20) (actual time=0.00..0.00 rows=0loops=808) -> Hash (cost=1.12..1.12 rows=12 width=20) (actual time=0.16..0.16 rows=0 loops=1) -> Seq Scan on typedeviceprotocol p (cost=0.00..1.12 rows=12 width=20) (actual time=0.05..0.11rows=12 loops=1) -> Seq Scan on mediaresourcelist ol (cost=0.00..1.02 rows=2 width=32) (actual time=0.01..0.02 rows=2 loops=808) -> Seq Scan on devicenumplanmap dnp (cost=0.00..43.62 rows=1462 width=101) (actual time=0.03..27.03 rows=1462 loops=808) -> Seq Scan on numplan np (cost=0.00..28.39 rows=739 width=616) (actual time=0.05..23.73 rows=739 loops=890) -> Seq Scan on routepartition rp (cost=0.00..1.38 rows=38 width=30) (actual time=0.02..0.24 rows=38 loops=890) -> Seq Scanon callingsearchspace css (cost=0.00..1.31 rows=31 width=90) (actual time=0.01..0.21 rows=31 loops=890) -> Seq Scan oncallingsearchspace cst (cost=0.00..1.31 rows=31 width=90) (actual time=0.01..0.19 rows=31 loops=890) -> Seq Scan on devicepooldp (cost=0.00..2.45 rows=45 width=68) (actual time=0.03..0.61 rows=45 loops=890) -> Seq Scan on mediaresourcelistdl (cost=0.00..1.02 rows=2 width=16) (actual time=0.01..0.02 rows=2 loops=890) -> Seq Scan on callmanagergroup cmg (cost=0.00..1.03 rows=3 width=16) (actual time=0.01..0.03 rows=3 loops=890) -> Seq Scan on callmanagergroupmember gm (cost=0.00..1.06 rows=6 width=32) (actual time=0.01..0.05 rows=6 loops=890) -> Seq Scan on callmanager cm (cost=0.00..1.02rows=2 width=16) (actual time=0.01..0.02 rows=2 loops=1780) -> Seq Scan on digitdiscardinstruction ddi (cost=0.00..2.31rows=31 width=20) (actual time=0.01..0.18 rows=31 loops=1780) -> Seq Scan on dialplan dlp (cost=0.00..1.01 rows=1 width=24)(actual time=0.01..0.02 rows=1 loops=1780) -> Seq Scan on routefilter rf (cost=0.00..1.18 rows=18 width=99)(actual time=0.01..0.12 rows=18 loops=1780) -> Seq Scan on voicemessagingprofile vm (cost=0.00..1.04 rows=4 width=37)(actual time=0.02..0.05 rows=4 loops=1780) -> Seq Scan on voicemessagingpilot vmp (cost=0.00..1.03 rows=3 width=42) (actualtime=0.02..0.04 rows=3 loops=1780) -> Seq Scan on processconfig pc (cost=0.00..9.68 rows=368 width=35) (actual time=0.02..2.49rows=368 loops=1780) -> Seq Scan on processconfig pcc (cost=0.00..9.68 rows=368 width=35) (actual time=0.02..2.45rows=368 loops=1780) -> Seq Scan on processconfig pcd (cost=0.00..9.68 rows=368 width=35) (actual time=0.02..2.42rows=368 loops=1780) -> Seq Scan on processconfig pcf (cost=0.00..9.68 rows=368 width=35) (actual time=0.02..2.45 rows=368loops=1780) -> Seq Scan on processconfig pcg (cost=0.00..9.68 rows=368 width=35) (actual time=0.02..2.43 rows=368 loops=1780) -> Seq Scan on callingsearchspace cssv (cost=0.00..1.31 rows=31 width=90) (actual time=0.02..0.24 rows=31 loops=1780) Total runtime: 106530.61 msec (120 rows) > -----Original Message----- > From: Sean Chittenden [mailto:sean@chittenden.org] > Sent: Thursday, August 28, 2003 1:32 PM > To: Clay Luther > Cc: pgsql-general@postgresql.org > Subject: Re: [GENERAL] left outer join terrible slow compared to inner > join > > > > Actually, I was about to post some problems we have with large left > > outer joins as well we've discovered in a porting project from > > NT/SQL Server -> Linux/Postgres. > > > > We have a particular query that is rather large, left outer joining > > across several tables. Under SQL Server, with identical data and > > schema, this particular query takes 2 seconds. > > > > Under PostgreSQL, this same query takes 90 seconds -- that's right, > > 90 seconds. 45x longer than SQL Server. This was quite a shock to > > us (we'd not seen such a performance deficit between the two dbs > > until this) and could, in fact, force us away from Postgres. > > > > I'd be happy to forward the explain to anyone who'd care to look at > > it... > > Post an EXPLAIN ANALYZE of the query... > > -sc > > -- > Sean Chittenden >
* Clay Luther (claycle@cisco.com) wrote: > I'd be happy to forward the explain to anyone who'd care to look at it... Could you try 7.4b1? It sounds like maybe things are better there from what Tom was saying for left outer joins.. Stephen
Attachment
Stephen Frost <sfrost@snowman.net> writes: > * Clay Luther (claycle@cisco.com) wrote: >> I'd be happy to forward the explain to anyone who'd care to look at it... > Could you try 7.4b1? It sounds like maybe things are better there from > what Tom was saying for left outer joins.. If it's not better in 7.4, please send me the details. regards, tom lane
On Thu, Aug 28, 2003 at 02:10:15PM -0400, Tom Lane wrote: > Thomas Beutin <tyrone@laokoon.IN-Berlin.DE> writes: > > Thanks for the suggestion, but the result is close to the original outer > > join without the explicit cross join but far away from the speed of the > > inner join. > > > EXPLAIN ANALYZE SELECT DISTINCT pz.l1_id, pz.l2_id, pz.l3_id, pz.l4_id FROM ( ot_adresse AS a CROSS JOIN ot_produkt ASp) LEFT OUTER JOIN ot_kat_prod AS pz ON ( p.p_id = pz.p_id ) WHERE p.a_id = a.id AND a.id = '105391105424941' AND a.m_id= '37'; > > NOTICE: QUERY PLAN: > > > -> Subquery Scan pz (cost=0.00..1683.51 rows=40851 width=170) (actual time=0.03..485.07 rows=40917 loops=11) > > -> Seq Scan on o_kat_prod (cost=0.00..1683.51 rows=40851 width=170) (actual time=0.02..281.77 rows=40917loops=11) > > Hmm, I don't understand why ot_kat_prod is being treated as a subquery > here. It isn't a view or something is it? Sorry, You're correct, actually this is a view of an other table called o_kat_prod defined as SELECT * FROM o_kat_prod; but supposed to be in my next step something like SELECT * FROM o_kat_prod WHERE <a_single_field> IS NOT NULL; And doing the explicit cross join statement on o_kat_prod instead of ot_kat_prod gives the expected performance to me ( 7.42 msec instead of 7324.49 msec with EXPLAIN ANALYZE). Do i've any chance to get the same performance on the view? Thanks for any help! -tb -- Thomas Beutin tb@laokoon.IN-Berlin.DE Beam me up, Scotty. There is no intelligent live down in Redmond.
Thomas Beutin <tyrone@laokoon.IN-Berlin.DE> writes: >> Hmm, I don't understand why ot_kat_prod is being treated as a subquery >> here. It isn't a view or something is it? > Sorry, You're correct, actually this is a view of an other table called > o_kat_prod defined as > SELECT * FROM o_kat_prod; > but supposed to be in my next step something like > SELECT * FROM o_kat_prod WHERE <a_single_field> IS NOT NULL; Still curious, since I'd expect 7.2 to flatten a simple view into the upper query. > Do i've any chance to get the same performance on the view? Try 7.3, it may be better. Or consider 7.4 beta. regards, tom lane
"Clay Luther" <claycle@cisco.com> writes: > Interstingly enough, the EXPLAIN ANALYZE itself took 90+ seconds: "explain" would produce just the plan, normally in a few ms. This query might take a while though. "explain analyze" says to produce the plan and then actually run the query and annotate the plan with the actual timing results at each node. Note the "actual time" labels on each row. So it's not suprising that it took 90s. Now, uh, there are 37 tables involved in this query. That's kind of a lot. Like, really, a lot. It's possible this is a sane, if extremely normalized design, but well, still. 37 is a big number. Postgres has to consider 37 factorial different ways of combining these tables. or about 13,763,750,000,000,000,000,000,000,000,000,000,000,000,000 different combinations. That makes it harder for it to come up with the best combination. You might consider rewriting it to use the ANSI join syntax "LEFT JOIN" and "RIGHT JOIN" if you haven't already. That might help it out. That said. My first guess as to the real problem. Of the 37 tables 36 of them aren't being accessed using indexes. Do you have indexes on the join columns? Perhaps you should? Postgres performs better when it has indexes. I'm a bit puzzled how one could get up to 37 tables in a single query other than just having taken normalization a bit too far. But if that was the thinking then I would expect the joins to be on the primary keys of all the tables, which would presumably have indexes. So, well, I guess I'll just stay puzzled. -- greg
"Clay Luther" <claycle@cisco.com> writes: > Here is the output of the above explain analyze execution: Given the presence of this sort of thing: > -> Nested Loop (cost=1.15..17157470797.64 rows=1229 width=1976) (actual time=2723.56..104804.55 rows=1780 loops=1) > Join Filter: (("inner".tkservice = 11) AND ("inner".paramname = 'DefaultNetworkLocale'::character varying)) I think you are running into 7.3's inability to determine that certain outer joins can be simplified to regular joins (specifically, if there is a strict operator above the OUTER JOIN that will reject null-extended rows, then there's no need to generate null-extended rows at all). 7.4 should do better. regards, tom lane
You can easily get 37 tables if you use a lot of views, especially those involving unions and unions of views or joins of views. Jon
> That said. My first guess as to the real problem. Of the 37 > tables 36 of them > aren't being accessed using indexes. Do you have indexes on > the join columns? > Perhaps you should? Postgres performs better when it has indexe Heh...well, first let me say: 1) Our database is highly normalized. 2) All joins in the query are performed across indeces. 3) It IS a huge query. There is a reason to the madness. Namely, this query was driven by a client application requirement. In a particular operationalcase, the application needed to be able to say "give me all this information now!" without making round tripsto the database. The query itself has grown over the years (last time I looked at it, it was only 24 joins, not 37). But, as I said before, 1) It works 2) It works VERY fast (in SQLServer) 3) It works in production and has for years now When I looked at the query for the first time in years last week, here is an email I sent to other people on my team: --snip-- Here's the ugly one, the station-d statement: How or why we ever wrote such a monster is beyond me :-) --snip-- But, it exists, it works, and it currently not scheduled to be changed. Such is the nature of inertia. :-) cwl > -----Original Message----- > From: Greg Stark [mailto:gsstark@mit.edu] > Sent: Thursday, August 28, 2003 4:32 PM > To: pgsql-general@postgresql.org > Subject: Re: [GENERAL] left outer join terrible slow compared to inner > join > > > "Clay Luther" <claycle@cisco.com> writes: > > > Interstingly enough, the EXPLAIN ANALYZE itself took 90+ seconds: > > "explain" would produce just the plan, normally in a few ms. > This query might > take a while though. > > "explain analyze" says to produce the plan and then actually > run the query and > annotate the plan with the actual timing results at each > node. Note the > "actual time" labels on each row. So it's not suprising that > it took 90s. > > > Now, uh, there are 37 tables involved in this query. That's > kind of a lot. > Like, really, a lot. It's possible this is a sane, if > extremely normalized > design, but well, still. 37 is a big number. > > Postgres has to consider 37 factorial different ways of > combining these > tables. or about > 13,763,750,000,000,000,000,000,000,000,000,000,000,000,000 > different combinations. That makes it harder for it to come > up with the best > combination. You might consider rewriting it to use the ANSI > join syntax "LEFT > JOIN" and "RIGHT JOIN" if you haven't already. That might help it out. > > > That said. My first guess as to the real problem. Of the 37 > tables 36 of them > aren't being accessed using indexes. Do you have indexes on > the join columns? > Perhaps you should? Postgres performs better when it has indexes. > > > I'm a bit puzzled how one could get up to 37 tables in a > single query other > than just having taken normalization a bit too far. But if > that was the > thinking then I would expect the joins to be on the primary > keys of all the > tables, which would presumably have indexes. So, well, I > guess I'll just stay > puzzled. > > -- > greg > > > ---------------------------(end of > broadcast)--------------------------- > TIP 1: subscribe and unsubscribe commands go to > majordomo@postgresql.org >
Clay Luther wrote: > Heh...well, first let me say: > > 1) Our database is highly normalized. Excellent. When faced with the choice of ensuring integrity myself in the face of redundancy vs. Tom Lane's ability to improve the planner, optimizer, and executor, I always vote for the latter! > 2) All joins in the query are performed across indeces. > 3) It IS a huge query. > > There is a reason to the madness. Namely, this query was driven by a client application requirement. In a particularoperational case, the application needed to be able to say "give me all this information now!" without making roundtrips to the database. The query itself has grown over the years (last time I looked at it, it was only 24 joins, not37). But, as I said before, > > 1) It works > 2) It works VERY fast (in SQLServer) > 3) It works in production and has for years now I have faced these issues before in older versions of PostgreSQL: http://groups.google.com/groups?hl=en&lr=&ie=UTF-8&oe=UTF-8&frame=right&th=67a713f0107dc77a&seekm=01C0CF88.292AB320.mascarm%40mascari.com#link1 I'd suggest a few things: 1) How long does it take to execute just a plain EXPLAIN? I suspect it might be spending more time planning than actually executing 2) You might be able to play around with explicit join syntax in part of your queries: http://www.postgresql.org/docs/7.3/static/explicit-joins.html 3) I've found the GEQO threshold to be way too low: http://www.postgresql.org/docs/7.3/static/runtime-config.html#RUNTIME-CONFIG-OPTIMIZER 4) If you have any UDF's used in the WHERE clause, attempt to rewrite the query without them or use #2 to defer their evaluation if they are costly. I've found that PostgreSQL, when left to its own devices, can often choose to evaluate a UDF before a join, where the join would have been far less costly to evaluate first. I haven't tried 7.4beta though. It may solve all your problems and answer all your questions. For me, each release has reduced planning time by an order of magnitude. Hopefully, that trend will continue ad infinitum. :-) Mike Mascari mascarm@mascari.com
Mike Mascari <mascarm@mascari.com> writes: > > 1) Our database is highly normalized. If anything I was worried it was "excessively" normalized. Sometimes people go overboard, taking columns that really could be simple attributes and make them reference tables. But that usually doesn't cause performance problems, just programmer headaches. It was just a first impression, the simple number of tables isn't evidence. > > 2) All joins in the query are performed across indeces. Ok, well only one of the table accesses is actually using an index in that plan. I don't understand what's going on in enough detail to explain why. Perhaps the optimizer thinks the tables are just small enough to not make them worthwhile. Which columns actually have indexes, are they all the same data type as the value they're being compared against? -- greg
> Perhaps the optimizer thinks the tables are just small enough > to not make them > worthwhile. > > Which columns actually have indexes, are they all the same > data type as the > value they're being compared against? Yup, this was my guess. Many of the tables being joined in are not excessively large (10s or 100s of records), while tableslike device and numplan are VERY large (or can be). Most, if not all of the joins -- if memory serves -- are being made across foreign keys of either uuid type or simple ints. The uuid type I created (implements libuuid as a type for postgresql), and, yes, I defined all the operator classes for btree-ing(it does work nicely). I'm going to follow Tom's suggestion at try it against 7.4 next week. Until then... cwl > -----Original Message----- > From: Greg Stark [mailto:gsstark@mit.edu] > Sent: Thursday, August 28, 2003 8:20 PM > To: Mike Mascari > Cc: Clay Luther; Greg Stark; pgsql-general@postgresql.org > Subject: Re: [GENERAL] left outer join terrible slow compared to inner > join > > > > Mike Mascari <mascarm@mascari.com> writes: > > > > 1) Our database is highly normalized. > > If anything I was worried it was "excessively" normalized. > Sometimes people go > overboard, taking columns that really could be simple > attributes and make them > reference tables. But that usually doesn't cause performance > problems, just > programmer headaches. It was just a first impression, the > simple number of > tables isn't evidence. > > > > 2) All joins in the query are performed across indeces. > > Ok, well only one of the table accesses is actually using an > index in that > plan. I don't understand what's going on in enough detail to > explain why. > > Perhaps the optimizer thinks the tables are just small enough > to not make them > worthwhile. > > Which columns actually have indexes, are they all the same > data type as the > value they're being compared against? > > -- > greg > >
Greg Stark <gsstark@mit.edu> writes: > Now, uh, there are 37 tables involved in this query. That's kind of a lot. > Postgres has to consider 37 factorial different ways of combining these > tables. or about 13,763,750,000,000,000,000,000,000,000,000,000,000,000,000 > different combinations. Of course, we don't do that ... long before you get to 37 tables, the optimizer gives up on exhaustive search and goes for approximate answers. So part of the problem might be that the partial GEQO plan search isn't finding an especially good plan. But it looked to me like this is basically a star schema and the exact order in which we join the detail tables isn't much of an issue. I thought the problem is more likely to be that 7.3 isn't doing enough analysis of when it can simplify outer joins --- there are conditions in the join steps of Clay's example that really should have been pushed down to the scan steps. We'll have to await his report of whether 7.4 does better. regards, tom lane
I have a basic SQL call that takes forever because Postgresql seems to want to use a Seq row scan on the table Products which has around 41k rows. Below is the sql call and the explain.
explain select * from ChargeCodes where AccountID = '{58C215AA-2C71-446F-88F3-BC2C0D23EF28}' and ChargeCodeID IN (Select ChargeCodeID from Products where ProductID in (select ProductID from OrderRules where WebUserRoleID in (Select WebUserRoleID from WebUsers where WebUserID = '{3CD5D4F5-448B-11D5-83DB-0001023EA2FA}')))
Seq Scan on chargecodes (cost=0.00..19217292988.42 rows=36 width=108)
Filter: ((accountid = '{58C215AA-2C71-446F-88F3-BC2C0D23EF28}'::text) AND (subplan))
SubPlan
-> Materialize (cost=263250588.84..263250588.84 rows=20535 width=42)
-> Seq Scan on products (cost=0.00..263250588.84 rows=20535 width=42)
Filter: (subplan)
SubPlan
-> Materialize (cost=6409.75..6409.75 rows=554 width=42)
-> Seq Scan on orderrules (cost=0.00..6409.75 rows=554 width=42)
Filter: (subplan)
SubPlan
-> Materialize (cost=5.75..5.75 rows=1 width=42)
-> Index Scan using webusers_pkey on webusers (cost=0.00..5.75 rows=1 width=42)
Index Cond: (webuserid = '{3CD5D4F5-448B-11D5-83DB-0001023EA2FA}'::text)
All the above tables have proper index's, all of them excuse their individual calls quickly, but when put together it's slow as can be. In the cases of where it chose to do a Seq scan, all except the one for products are correct (tables with 10-100 rows at most).
Can anyone help and make a suggestion as to why it's doing a sequence scan, what gets even more interesting is that even if I turn force index scans on, it still seq scan's products yet when I make any individual calls to products outside of the above context it always uses the index.
Thanks,
Steven
explain select * from ChargeCodes where AccountID = '{58C215AA-2C71-446F-88F3-BC2C0D23EF28}' and ChargeCodeID IN (Select ChargeCodeID from Products where ProductID in (select ProductID from OrderRules where WebUserRoleID in (Select WebUserRoleID from WebUsers where WebUserID = '{3CD5D4F5-448B-11D5-83DB-0001023EA2FA}')))
Seq Scan on chargecodes (cost=0.00..19217292988.42 rows=36 width=108)
Filter: ((accountid = '{58C215AA-2C71-446F-88F3-BC2C0D23EF28}'::text) AND (subplan))
SubPlan
-> Materialize (cost=263250588.84..263250588.84 rows=20535 width=42)
-> Seq Scan on products (cost=0.00..263250588.84 rows=20535 width=42)
Filter: (subplan)
SubPlan
-> Materialize (cost=6409.75..6409.75 rows=554 width=42)
-> Seq Scan on orderrules (cost=0.00..6409.75 rows=554 width=42)
Filter: (subplan)
SubPlan
-> Materialize (cost=5.75..5.75 rows=1 width=42)
-> Index Scan using webusers_pkey on webusers (cost=0.00..5.75 rows=1 width=42)
Index Cond: (webuserid = '{3CD5D4F5-448B-11D5-83DB-0001023EA2FA}'::text)
All the above tables have proper index's, all of them excuse their individual calls quickly, but when put together it's slow as can be. In the cases of where it chose to do a Seq scan, all except the one for products are correct (tables with 10-100 rows at most).
Can anyone help and make a suggestion as to why it's doing a sequence scan, what gets even more interesting is that even if I turn force index scans on, it still seq scan's products yet when I make any individual calls to products outside of the above context it always uses the index.
Thanks,
Steven
On 29 Aug 2003 at 7:04, Steven Tower wrote: > > I have a basic SQL call that takes forever because Postgresql seems to want to > use a Seq row scan on the table Products which has around 41k rows. Below is > the sql call and the explain. > > explain select * from ChargeCodes where AccountID = '{58C215AA-2C71-446F-88F3- > BC2C0D23EF28}' and ChargeCodeID IN (Select ChargeCodeID from Products where > ProductID in (select ProductID from OrderRules where WebUserRoleID > in (Select WebUserRoleID from WebUsers where WebUserID = '{3CD5D4F5- > 448B-11D5-83DB-0001023EA2FA}'))) So many in's nested? I doubt a 7.3.x would survive that. Upgrade to 7.4 beta2. The 'in' problem is taken care of, in there.. Can you somehow rewrite that as nested joins? They could be much better.. Bye Shridhar -- There comes to all races an ultimate crisis which you have yet to face.... One day our minds became so powerful we daredthink of ourselves as gods. -- Sargon, "Return to Tomorrow", stardate 4768.3
I will give 7.4 a try. Actually I have played around with rewriting this, it's not my select statement, personally I would have designed the tables a little differently in order to facilitate better statements. I agree, the IN's are a little much.
Steven
On Fri, 2003-08-29 at 07:14, Shridhar Daithankar wrote:
Steven
On Fri, 2003-08-29 at 07:14, Shridhar Daithankar wrote:
On 29 Aug 2003 at 7:04, Steven Tower wrote: > > I have a basic SQL call that takes forever because Postgresql seems to want to > use a Seq row scan on the table Products which has around 41k rows. Below is > the sql call and the explain. > > explain select * from ChargeCodes where AccountID = '{58C215AA-2C71-446F-88F3- > BC2C0D23EF28}' and ChargeCodeID IN (Select ChargeCodeID from Products where > ProductID in (select ProductID from OrderRules where WebUserRoleID > in (Select WebUserRoleID from WebUsers where WebUserID = '{3CD5D4F5- > 448B-11D5-83DB-0001023EA2FA}'))) So many in's nested? I doubt a 7.3.x would survive that. Upgrade to 7.4 beta2. The 'in' problem is taken care of, in there.. Can you somehow rewrite that as nested joins? They could be much better.. Bye Shridhar -- There comes to all races an ultimate crisis which you have yet to face.... One day our minds became so powerful we dared think of ourselves as gods. -- Sargon, "Return to Tomorrow", stardate 4768.3 ---------------------------(end of broadcast)--------------------------- TIP 3: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to majordomo@postgresql.org so that your message can get through to the mailing list cleanly
Attachment
Steven Tower wrote: > I have a basic SQL call that takes forever because Postgresql seems to > want to use a Seq row scan on the table Products which has around 41k > rows. Below is the sql call and the explain. > > explain select * from ChargeCodes where AccountID = > '{58C215AA-2C71-446F-88F3-BC2C0D23EF28}' and ChargeCodeID IN (Select > ChargeCodeID from Products where ProductID in (select ProductID from > OrderRules where WebUserRoleID in (Select WebUserRoleID from > WebUsers where WebUserID = '{3CD5D4F5-448B-11D5-83DB-0001023EA2FA}'))) > [snip] Anything before version 7.4 does not handle the IN statement very well, and the recomendation from the people in the know is to use EXISTS. See http://www.postgresql.org/docs/7.3/static/functions-subquery.html#AEN10407 for more detials. So either upgrade to 7.4, although its still in beta at the moment, or change the query to something like: SELECT * FROM ChargeCodes t1 WHERE AccountID='{58C215AA-2C71-446F-88F3-BC2C0D23EF28}' AND EXISTS ( SELECT 1 FROM Products t2 WHERE t2.ChargeCodeID=t1.ChargeCodeID AND EXISTS ( SELECT 1 FROM OrderRules t3 WHERE t3.ProductID=t2.ProductID AND EXISTS ( SELECT 1 FROM WebUsers t4 WHERE t4.WebUserId='{3CD5D4F5-448B-11D5-83DB-0001023EA2FA}' AND t4.WebUserRoleID=t3.WebUserRoleID ) or perhaps simpler but you will have to compare outputs..... SELECT * FROM ChargeCodes t1 WHERE AccountID='{58C215AA-2C71-446F-88F3-BC2C0D23EF28}' AND EXISTS (SELECT 1 FROM Products t2, OrderRules t3, WebUsers t4 WHERE t1.ChargeCodeID=t2.ChargeCodeID AND t2.ProductID=t3.ProductID AND t3.WebUserRoleID=t4.WebUserRoleID AND t4.WebUserId='{3CD5D4F5-448B-11D5-83DB-0001023EA2FA}') Nick
Nick,
Thanks much. I haven't been in the SQL frame of mind and sure enough as soon as you said it, I laughed because I personally have never used in, I have always used exists.
I will give some revision on the below a try, uh, sometimes the brain just doesn't want to help you. Glad you are all here.
Steven
On Fri, 2003-08-29 at 08:17, Nick Barr wrote:
Thanks much. I haven't been in the SQL frame of mind and sure enough as soon as you said it, I laughed because I personally have never used in, I have always used exists.
I will give some revision on the below a try, uh, sometimes the brain just doesn't want to help you. Glad you are all here.
Steven
On Fri, 2003-08-29 at 08:17, Nick Barr wrote:
Steven Tower wrote: > I have a basic SQL call that takes forever because Postgresql seems to > want to use a Seq row scan on the table Products which has around 41k > rows. Below is the sql call and the explain. > > explain select * from ChargeCodes where AccountID = > '{58C215AA-2C71-446F-88F3-BC2C0D23EF28}' and ChargeCodeID IN (Select > ChargeCodeID from Products where ProductID in (select ProductID from > OrderRules where WebUserRoleID in (Select WebUserRoleID from > WebUsers where WebUserID = '{3CD5D4F5-448B-11D5-83DB-0001023EA2FA}'))) > [snip] Anything before version 7.4 does not handle the IN statement very well, and the recomendation from the people in the know is to use EXISTS. See http://www.postgresql.org/docs/7.3/static/functions-subquery.html#AEN10407 for more detials. So either upgrade to 7.4, although its still in beta at the moment, or change the query to something like: SELECT * FROM ChargeCodes t1 WHERE AccountID='{58C215AA-2C71-446F-88F3-BC2C0D23EF28}' AND EXISTS ( SELECT 1 FROM Products t2 WHERE t2.ChargeCodeID=t1.ChargeCodeID AND EXISTS ( SELECT 1 FROM OrderRules t3 WHERE t3.ProductID=t2.ProductID AND EXISTS ( SELECT 1 FROM WebUsers t4 WHERE t4.WebUserId='{3CD5D4F5-448B-11D5-83DB-0001023EA2FA}' AND t4.WebUserRoleID=t3.WebUserRoleID ) or perhaps simpler but you will have to compare outputs..... SELECT * FROM ChargeCodes t1 WHERE AccountID='{58C215AA-2C71-446F-88F3-BC2C0D23EF28}' AND EXISTS (SELECT 1 FROM Products t2, OrderRules t3, WebUsers t4 WHERE t1.ChargeCodeID=t2.ChargeCodeID AND t2.ProductID=t3.ProductID AND t3.WebUserRoleID=t4.WebUserRoleID AND t4.WebUserId='{3CD5D4F5-448B-11D5-83DB-0001023EA2FA}') Nick ---------------------------(end of broadcast)--------------------------- TIP 5: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faqs/FAQ.html
Attachment
> And doing the explicit cross join statement on o_kat_prod instead of > ot_kat_prod gives the expected performance to me ( 7.42 msec instead > of 7324.49 msec with EXPLAIN ANALYZE). > > Do i've any chance to get the same performance on the view? I've had this problem and it was due to improper typing - It was a join on an int8 column, which has to be explicitly cast if it's a parameter to the query. Jon
I had an opportunity to test this massive left outer join this with 7.4b2 today. It took <3 seconds on an untuned, new install...which is MUCH better. In fact, it performed as well as Oracle 9i. cwl > -----Original Message----- > From: Clay Luther > Sent: Thursday, August 28, 2003 1:26 PM > To: 'Tom Lane'; Thomas Beutin > Cc: pgsql-general@postgresql.org > Subject: RE: [GENERAL] left outer join terrible slow compared to inner > join > > > Actually, I was about to post some problems we have with > large left outer joins as well we've discovered in a porting > project from NT/SQL Server -> Linux/Postgres. > > We have a particular query that is rather large, left outer > joining across several tables. Under SQL Server, with > identical data and schema, this particular query takes 2 seconds. > > Under PostgreSQL, this same query takes 90 seconds -- that's > right, 90 seconds. 45x longer than SQL Server. This was > quite a shock to us (we'd not seen such a performance deficit > between the two dbs until this) and could, in fact, force us > away from Postgres. > > I'd be happy to forward the explain to anyone who'd care to > look at it... > > cwl > > > > -----Original Message----- > > From: Tom Lane [mailto:tgl@sss.pgh.pa.us] > > Sent: Thursday, August 28, 2003 1:10 PM > > To: Thomas Beutin > > Cc: pgsql-general@postgresql.org > > Subject: Re: [GENERAL] left outer join terrible slow > compared to inner > > join > > > > > > Thomas Beutin <tyrone@laokoon.IN-Berlin.DE> writes: > > > Thanks for the suggestion, but the result is close to the > > original outer > > > join without the explicit cross join but far away from the > > speed of the > > > inner join. > > > > > EXPLAIN ANALYZE SELECT DISTINCT pz.l1_id, pz.l2_id, > > pz.l3_id, pz.l4_id FROM ( ot_adresse AS a CROSS JOIN > > ot_produkt AS p) LEFT OUTER JOIN ot_kat_prod AS pz ON ( > > p.p_id = pz.p_id ) WHERE p.a_id = a.id AND a.id = > > '105391105424941' AND a.m_id = '37'; > > > NOTICE: QUERY PLAN: > > > > > -> Subquery Scan pz (cost=0.00..1683.51 > > rows=40851 width=170) (actual time=0.03..485.07 rows=40917 loops=11) > > > -> Seq Scan on o_kat_prod > > (cost=0.00..1683.51 rows=40851 width=170) (actual > > time=0.02..281.77 rows=40917 loops=11) > > > > Hmm, I don't understand why ot_kat_prod is being treated as > a subquery > > here. It isn't a view or something is it? > > > > regards, tom lane > > > > ---------------------------(end of > > broadcast)--------------------------- > > TIP 5: Have you checked our extensive FAQ? > > > > http://www.postgresql.org/docs/faqs/FAQ.html > > >