Thread: left join + case - how is it processed?
Hi all, I have a view that looks like this: SELECT CASE WHEN r.assetid IS NULL THEN p.assetid ELSE r.assetid END AS assetid, CASE WHEN r.userid IS NULL THEN p.userid ELSE r.userid END AS userid, p.permission, p."granted", p.cascades FROM sq_ast_perm p LEFT JOIN sq_vw_ast_role r ON r.roleid::text = p.userid::text AND r.assetid::text = p.assetid::text; It was pointed out to me that the first CASE is useless (since r.assetid will always be the same as p.assetid because of the left join condition) so I'm looking at that to see if it'll make much of a difference and it does. I won't post the whole lot but the first line is the most interesting. # explain analyze SELECT * from sq_vw_ast_perm where assetid='30748'; Merge Left Join (cost=9529.34..13823.76 rows=75721 width=102) (actual time=284.371..341.536 rows=1 loops=1) (The row count is right - it's the total # of rows from sq_ast_perm). When I change the view to be: SELECT p.assetid, CASE WHEN r.userid IS NULL THEN p.userid ELSE r.userid END AS userid, p.permission, p."granted", p.cascades FROM sq_ast_perm p LEFT JOIN sq_vw_ast_role r ON r.roleid::text = p.userid::text AND r.assetid::text = p.assetid::text; The Merge left join only returns 3 rows: # explain analyze SELECT * from sq_vw_ast_perm where assetid='30748'; Merge Left Join (cost=9507.18..9508.23 rows=3 width=70) (actual time=11.544..11.549 rows=1 loops=1) I thought the where condition would cut down on the rows returned, then the case statement would take effect to do the null check. It seems to be doing it in reverse ?? Recently analyzed, only just imported so free of bloat. Running 8.1.11. Thanks! -- Postgresql & php tutorials http://www.designmagick.com/
On Sun, Jan 18, 2009 at 10:30 PM, Chris <dmagick@gmail.com> wrote: > Hi all, > > I have a view that looks like this: > > SELECT > CASE > WHEN r.assetid IS NULL THEN p.assetid > ELSE r.assetid > END AS assetid, > CASE > WHEN r.userid IS NULL THEN p.userid > ELSE r.userid > END AS userid, p.permission, p."granted", p.cascades > FROM sq_ast_perm p > LEFT JOIN sq_vw_ast_role r ON r.roleid::text = p.userid::text AND > r.assetid::text = p.assetid::text; > > > It was pointed out to me that the first CASE is useless (since r.assetid > will always be the same as p.assetid because of the left join condition) so > I'm looking at that to see if it'll make much of a difference and it does. > > I won't post the whole lot but the first line is the most interesting. > > # explain analyze SELECT * from sq_vw_ast_perm where assetid='30748'; > > > Merge Left Join (cost=9529.34..13823.76 rows=75721 width=102) (actual > time=284.371..341.536 rows=1 loops=1) > > (The row count is right - it's the total # of rows from sq_ast_perm). The row count is VERY WRONG. Apparently the actual number of rows is 1 and the estimate is 75721: that's bad. > When I change the view to be: > > SELECT p.assetid, > CASE > WHEN r.userid IS NULL THEN p.userid > ELSE r.userid > END AS userid, p.permission, p."granted", p.cascades > FROM sq_ast_perm p > LEFT JOIN sq_vw_ast_role r ON r.roleid::text = p.userid::text AND > r.assetid::text = p.assetid::text; > > > The Merge left join only returns 3 rows: > > # explain analyze SELECT * from sq_vw_ast_perm where assetid='30748'; > > Merge Left Join (cost=9507.18..9508.23 rows=3 width=70) (actual > time=11.544..11.549 rows=1 loops=1) > > I thought the where condition would cut down on the rows returned, then the > case statement would take effect to do the null check. It seems to be doing > it in reverse ?? The ESTIMATE is 3 rows - the actual rows are 1, just as before. Notice this is a much more accurate estimate: that's good. The reason why the CASE is affecting your query planning is because you are using a query that compares assetid to a constant: SELECT * from sq_vw_ast_perm where assetid='30748'; When PostgreSQL evaluates this statement, assetid gets expanded either into a case statement (with your first view definition) or into sq_ast_perm.assetid (with your second view definition). The latter definition allows PostgreSQL to make use of the column statistics (which are pretty accurate) whereas the former is probably leading to a SWAG, because PostgreSQL isn't very good at estimating the selectivity of CASE. The bad selectivity estimate, in turn, is leading to a poor plan choice... ...Robert
On Sun, Jan 18, 2009 at 9:30 PM, Chris <dmagick@gmail.com> wrote: > Hi all, > > I have a view that looks like this: > > SELECT > CASE > WHEN r.assetid IS NULL THEN p.assetid > ELSE r.assetid > END AS assetid, > CASE > WHEN r.userid IS NULL THEN p.userid > ELSE r.userid > END AS userid, p.permission, p."granted", p.cascades > FROM sq_ast_perm p > LEFT JOIN sq_vw_ast_role r ON r.roleid::text = p.userid::text AND > r.assetid::text = p.assetid::text; > > > It was pointed out to me that the first CASE is useless (since r.assetid > will always be the same as p.assetid because of the left join condition) so > I'm looking at that to see if it'll make much of a difference and it does. Lets assume it wasn't useless because of that, it would still be obfuscated and probably slower because it is an explicit coalesce() SELECT coalesce( r.assetid, p.assetid ) , coalesce( r.userid , p.userid ) > I thought the where condition would cut down on the rows returned, then the > case statement would take effect to do the null check. It seems to be doing > it in reverse ?? # explain analyze SELECT * from sq_vw_ast_perm where assetid='30748'; # explain analyze SELECT * from sq_vw_ast_perm where assetid='30748'; It aperas to me that both of your statements have where clauses, but I believe where isn't that explicit. I'm not sure the nature of your problem permits the query optimizer to eliminate rows at all, even with the where statement. "assetid" is probably not known when the query optimizer hits, because it is computed based on the nullness of the columns. I'd assume that the optimizer *could* more easily optimize this if you had used coalesce rather than an ad-hoc method with CASE. My guess is you can exclude rows with WHERE if the the column used is an run-time computation involving an ad-hoc CASE. -- Evan Carroll System Lord of the Internets
> The reason why the CASE is affecting your query planning is because > you are using a query that compares assetid to a constant: > > SELECT * from sq_vw_ast_perm where assetid='30748'; > > When PostgreSQL evaluates this statement, assetid gets expanded either > into a case statement (with your first view definition) or into > sq_ast_perm.assetid (with your second view definition). The latter > definition allows PostgreSQL to make use of the column statistics > (which are pretty accurate) whereas the former is probably leading to > a SWAG, because PostgreSQL isn't very good at estimating the > selectivity of CASE. The bad selectivity estimate, in turn, is > leading to a poor plan choice... If I take it out of the view, it's fine: # SELECT # CASE # WHEN r.assetid IS NULL THEN p.assetid # ELSE r.assetid # END AS assetid, # CASE # WHEN r.userid IS NULL THEN p.userid # ELSE r.userid # END AS userid, p.permission, p."granted", p.cascades # FROM sq_ast_perm p # LEFT JOIN sq_vw_ast_role r ON r.roleid::text = p.userid::text AND r.assetid::text = p.assetid::text # where p.assetid='30748'; QUERY PLAN --------------------------------------------- Merge Left Join (cost=9459.89..9463.13 rows=3 width=102) (actual time=0.096..0.098 rows=1 loops=1) In this case I assume the planner is doing the 'WHERE' first to cut down the rows, then applying the CASE at the end. The view it seems to be the opposite - I still don't understand why that's the case. Though I do get the same behaviour as the view when I do it as a subselect. -- Postgresql & php tutorials http://www.designmagick.com/
> My guess is you can exclude rows with WHERE if the the > column used is an run-time computation involving an ad-hoc CASE. * that you can't -- Evan Carroll System Lord of the Internets
> >> I thought the where condition would cut down on the rows returned, then the >> case statement would take effect to do the null check. It seems to be doing >> it in reverse ?? > # explain analyze SELECT * from sq_vw_ast_perm where assetid='30748'; > > It aperas to me that both of your statements have where clauses, but I > believe where isn't that explicit. I'm not sure the nature of your > problem permits the query optimizer to eliminate rows at all, even > with the where statement. "assetid" is probably not known when the > query optimizer hits, because it is computed based on the nullness of > the columns. I'd assume that the optimizer *could* more easily > optimize this if you had used coalesce rather than an ad-hoc method > with CASE. My guess is you can exclude rows with WHERE if the the > column used is an run-time computation involving an ad-hoc CASE. No difference. Full explain plan here: http://explain-analyze.info/query_plans/2725-query-plan-1447 I can see it's doing the extra filter step at the start (4th line) which is not present without the coalesce/case statement. I just don't understand why it's being done at that stage. -- Postgresql & php tutorials http://www.designmagick.com/
Chris <dmagick@gmail.com> writes: > I can see it's doing the extra filter step at the start (4th line) which > is not present without the coalesce/case statement. I just don't > understand why it's being done at that stage. It's not that hard to understand. With the original view formulation (or the COALESCE version), the fully expanded form of the query looks like select ... from p left join r ... where expression_involving_both_p_and_r = constant If you make the view output be just p.assetid then you have select ... from p left join r ... where p.assetid = constant In the first case the planner cannot apply the WHERE restriction until it's formed the p+r join; so you see the condition applied as a filter on the join node's output. In the second case, the planner can push the WHERE restriction down into the scan of p, since the left join doesn't affect it. (If a p row doesn't pass the restriction, then no join row formed from it can either; ergo there is no need to form those join rows at all.) In general a WHERE or JOIN/ON clause cannot be applied below the point at which all the relations mentioned in it have been joined. There are a few special cases where the planner can transform clauses into some other form that's more optimizable, but you can pretty much bet that a CASE will never be one of them --- CASE is more or less *defined* to defeat optimization. regards, tom lane
On Sun, Jan 18, 2009 at 7:30 PM, Chris <dmagick@gmail.com> wrote: > I have a view that looks like this: > > SELECT > CASE > WHEN r.assetid IS NULL THEN p.assetid > ELSE r.assetid > END AS assetid, > CASE > WHEN r.userid IS NULL THEN p.userid > ELSE r.userid > END AS userid, p.permission, p."granted", p.cascades > FROM sq_ast_perm p > LEFT JOIN sq_vw_ast_role r ON r.roleid::text = p.userid::text AND > r.assetid::text = p.assetid::text; The effect that you are trying to achieve with CASE statements is better suited to the COALESCE(...) function. http://www.postgresql.org/docs/8.3/interactive/functions-conditional.html#AEN14484 -- Regards, Richard Broersma Jr. Visit the Los Angeles PostgreSQL Users Group (LAPUG) http://pugs.postgresql.org/lapug
Tom Lane wrote: > Chris <dmagick@gmail.com> writes: >> I can see it's doing the extra filter step at the start (4th line) which >> is not present without the coalesce/case statement. I just don't >> understand why it's being done at that stage. > > It's not that hard to understand. With the original view formulation > (or the COALESCE version), the fully expanded form of the query looks > like > > select ... from p left join r ... > where expression_involving_both_p_and_r = constant > > If you make the view output be just p.assetid then you have > > select ... from p left join r ... > where p.assetid = constant > > In the first case the planner cannot apply the WHERE restriction until > it's formed the p+r join; so you see the condition applied as a filter > on the join node's output. In the second case, the planner can push the > WHERE restriction down into the scan of p, since the left join doesn't > affect it. (If a p row doesn't pass the restriction, then no join row > formed from it can either; ergo there is no need to form those join rows > at all.) So because the CASE is on (some of) the fields I'm joining on, in effect it's made part of the join condition. If the fields are outside that (r.userid/p.userid), then it's evaluated after. Thanks! -- Postgresql & php tutorials http://www.designmagick.com/