Thread: Strange query plan invloving a view
i have the following query involving a view that i really need to optimise: SELECT * FROM tokens.ta_tokenhist h INNER JOIN tokens.vw_tokens t ON h.token_id = t.token_id WHERE h.sarreport_id = 9 ; where vw_tokens is defined as CREATE VIEW tokens.vw_tokens AS SELECT -- too many columns to mention FROM tokens.ta_tokens t LEFT JOIN tokens.ta_tokenhist i ON t.token_id = i.token_id AND i.status = 'issued' LEFT JOIN tokens.ta_tokenhist s ON t.token_id = s.token_id AND s.status = 'sold' LEFT JOIN tokens.ta_tokenhist r ON t.token_id = r.token_id AND r.status = 'redeemed' ; this gives me the following query plan: Merge Join (cost=18276278.45..31793043.16 rows=55727 width=322) Merge Cond: (("outer".token_id)::integer = "inner"."?column23?") -> Merge Left Join (cost=18043163.64..31639175.71 rows=4228018 width=76) Merge Cond: (("outer".token_id)::integer = "inner"."?column3?") -> Merge Left Join (cost=13649584.94..27194793.37 rows=4228018 width=48) Merge Cond: (("outer".token_id)::integer = "inner"."?column3?") -> Merge Left Join (cost=7179372.62..20653326.29 rows=4228018 width=44) Merge Cond: (("outer".token_id)::integer = "inner"."?column3?") -> Index Scan using ta_tokens_pkey on ta_tokens t (cost=0.00..13400398.89 rows=4053805 width=27) -> Sort (cost=7179372.62..7189942.67 rows=4228018 width=21) Sort Key: (i.token_id)::integer -> Index Scan using fkx_tokenhist__status on ta_tokenhist i (cost=0.00..6315961.47 rows=4228018width=21) Index Cond: ((status)::text = 'issued'::text) -> Sort (cost=6470212.32..6479909.69 rows=3878949 width=8) Sort Key: (s.token_id)::integer -> Index Scan using fkx_tokenhist__status on ta_tokenhist s (cost=0.00..5794509.99 rows=3878949 width=8) Index Cond: ((status)::text = 'sold'::text) -> Sort (cost=4393578.70..4400008.00 rows=2571718 width=32) Sort Key: (r.token_id)::integer -> Index Scan using fkx_tokenhist__status on ta_tokenhist r (cost=0.00..3841724.02 rows=2571718 width=32) Index Cond: ((status)::text = 'redeemed'::text) -> Sort (cost=233114.81..233248.38 rows=53430 width=246) Sort Key: (h.token_id)::integer -> Index Scan using fkx_tokenhist__sarreports on ta_tokenhist h (cost=0.00..213909.12 rows=53430 width=246) Index Cond: ((sarreport_id)::integer = 9) However, the following query (which i believe should be equivalent) SELECT * FROM tokens.ta_tokenhist h INNER JOIN tokens.ta_tokens t ON h.token_id = t.token_id LEFT JOIN tokens.ta_tokenhist i ON t.token_id = i.token_id AND i.status = 'issued' LEFT JOIN tokens.ta_tokenhist s ON t.token_id = s.token_id AND s.status = 'sold' LEFT JOIN tokens.ta_tokenhist r ON t.token_id = r.token_id AND r.status = 'redeemed' WHERE h.sarreport_id = 9 ; gives the following query plan: Nested Loop Left Join (cost=0.00..3475785.52 rows=55727 width=1011) -> Nested Loop Left Join (cost=0.00..2474425.17 rows=55727 width=765) -> Nested Loop Left Join (cost=0.00..1472368.23 rows=55727 width=519) -> Nested Loop (cost=0.00..511614.87 rows=53430 width=273) -> Index Scan using fkx_tokenhist__sarreports on ta_tokenhist h (cost=0.00..213909.12 rows=53430width=246) Index Cond: ((sarreport_id)::integer = 9) -> Index Scan using ta_tokens_pkey on ta_tokens t (cost=0.00..5.56 rows=1 width=27) Index Cond: (("outer".token_id)::integer = (t.token_id)::integer) -> Index Scan using fkx_tokenhist__tokens on ta_tokenhist i (cost=0.00..17.96 rows=2 width=246) Index Cond: (("outer".token_id)::integer = (i.token_id)::integer) Filter: ((status)::text = 'issued'::text) -> Index Scan using fkx_tokenhist__tokens on ta_tokenhist s (cost=0.00..17.96 rows=2 width=246) Index Cond: (("outer".token_id)::integer = (s.token_id)::integer) Filter: ((status)::text = 'sold'::text) -> Index Scan using fkx_tokenhist__tokens on ta_tokenhist r (cost=0.00..17.96 rows=1 width=246) Index Cond: (("outer".token_id)::integer = (r.token_id)::integer) Filter: ((status)::text = 'redeemed'::text) This query returns a lot quicker than the plan would suggest, as the planner is over-estimating the amount of rows where ((sarreport_id)::integer = 9). it thinks there are 53430 when in fact there are only 7 (despite a vacuum and analyse). Can anyone give me any suggestions? are the index stats the cause of my problem, or is it the rewrite of the query? Cheers Version: PostgreSQL 8.0.3 on i486-pc-linux-gnu, compiled by GCC cc (GCC) 4.0.2 20050821 (prerelease) (Debian 4.0.1-6) -- - Rich Doughty
Rich Doughty <rich@opusvl.com> writes: > However, the following query (which i believe should be equivalent) > SELECT * > FROM > tokens.ta_tokenhist h INNER JOIN > tokens.ta_tokens t ON h.token_id = t.token_id LEFT JOIN > tokens.ta_tokenhist i ON t.token_id = i.token_id AND > i.status = 'issued' LEFT JOIN > tokens.ta_tokenhist s ON t.token_id = s.token_id AND > s.status = 'sold' LEFT JOIN > tokens.ta_tokenhist r ON t.token_id = r.token_id AND > r.status = 'redeemed' > WHERE > h.sarreport_id = 9 > ; No, that's not equivalent at all, because the implicit parenthesization is left-to-right; therefore you've injected the constraint to a few rows of ta_tokenhist (and therefore only a few rows of ta_tokens) into the bottom of the LEFT JOIN stack. In the other case the constraint is at the wrong end of the join stack, and so the full view output gets formed before anything gets thrown away. Some day the Postgres planner will probably be smart enough to rearrange the join order despite the presence of outer joins ... but today is not that day. regards, tom lane
Tom Lane wrote: > Rich Doughty <rich@opusvl.com> writes: > >>However, the following query (which i believe should be equivalent) > > >>SELECT * >>FROM >> tokens.ta_tokenhist h INNER JOIN >> tokens.ta_tokens t ON h.token_id = t.token_id LEFT JOIN >> tokens.ta_tokenhist i ON t.token_id = i.token_id AND >> i.status = 'issued' LEFT JOIN >> tokens.ta_tokenhist s ON t.token_id = s.token_id AND >> s.status = 'sold' LEFT JOIN >> tokens.ta_tokenhist r ON t.token_id = r.token_id AND >> r.status = 'redeemed' >>WHERE >> h.sarreport_id = 9 >>; > > > No, that's not equivalent at all, because the implicit parenthesization > is left-to-right; therefore you've injected the constraint to a few rows > of ta_tokenhist (and therefore only a few rows of ta_tokens) into the > bottom of the LEFT JOIN stack. In the other case the constraint is at > the wrong end of the join stack, and so the full view output gets formed > before anything gets thrown away. > > Some day the Postgres planner will probably be smart enough to rearrange > the join order despite the presence of outer joins ... but today is not > that day. thanks for the reply. is there any way i can achieve what i need to by using views, or should i just use a normal query? i'd prefer to use a view but i just can't get round the performance hit. -- - Rich Doughty