Thread: left join + case - how is it processed?

left join + case - how is it processed?

From
Chris
Date:
Hi all,

I have a view that looks like this:

  SELECT
         CASE
             WHEN r.assetid IS NULL THEN p.assetid
             ELSE r.assetid
         END AS assetid,
         CASE
             WHEN r.userid IS NULL THEN p.userid
             ELSE r.userid
         END AS userid, p.permission, p."granted", p.cascades
    FROM sq_ast_perm p
    LEFT JOIN sq_vw_ast_role r ON r.roleid::text = p.userid::text AND
r.assetid::text = p.assetid::text;


It was pointed out to me that the first CASE is useless (since r.assetid
will always be the same as p.assetid because of the left join condition)
so I'm looking at that to see if it'll make much of a difference and it
does.

I won't post the whole lot but the first line is the most interesting.

# explain analyze SELECT * from sq_vw_ast_perm where assetid='30748';


  Merge Left Join  (cost=9529.34..13823.76 rows=75721 width=102) (actual
time=284.371..341.536 rows=1 loops=1)

(The row count is right - it's the total # of rows from sq_ast_perm).


When I change the view to be:

  SELECT p.assetid,
         CASE
             WHEN r.userid IS NULL THEN p.userid
             ELSE r.userid
         END AS userid, p.permission, p."granted", p.cascades
    FROM sq_ast_perm p
    LEFT JOIN sq_vw_ast_role r ON r.roleid::text = p.userid::text AND
r.assetid::text = p.assetid::text;


The Merge left join only returns 3 rows:

# explain analyze SELECT * from sq_vw_ast_perm where assetid='30748';

            Merge Left Join  (cost=9507.18..9508.23 rows=3 width=70)
(actual time=11.544..11.549 rows=1 loops=1)

I thought the where condition would cut down on the rows returned, then
the case statement would take effect to do the null check. It seems to
be doing it in reverse ??

Recently analyzed, only just imported so free of bloat. Running 8.1.11.

Thanks!
--
Postgresql & php tutorials
http://www.designmagick.com/


Re: left join + case - how is it processed?

From
Robert Haas
Date:
On Sun, Jan 18, 2009 at 10:30 PM, Chris <dmagick@gmail.com> wrote:
> Hi all,
>
> I have a view that looks like this:
>
>  SELECT
>        CASE
>            WHEN r.assetid IS NULL THEN p.assetid
>            ELSE r.assetid
>        END AS assetid,
>        CASE
>            WHEN r.userid IS NULL THEN p.userid
>            ELSE r.userid
>        END AS userid, p.permission, p."granted", p.cascades
>   FROM sq_ast_perm p
>   LEFT JOIN sq_vw_ast_role r ON r.roleid::text = p.userid::text AND
> r.assetid::text = p.assetid::text;
>
>
> It was pointed out to me that the first CASE is useless (since r.assetid
> will always be the same as p.assetid because of the left join condition) so
> I'm looking at that to see if it'll make much of a difference and it does.
>
> I won't post the whole lot but the first line is the most interesting.
>
> # explain analyze SELECT * from sq_vw_ast_perm where assetid='30748';
>
>
>  Merge Left Join  (cost=9529.34..13823.76 rows=75721 width=102) (actual
> time=284.371..341.536 rows=1 loops=1)
>
> (The row count is right - it's the total # of rows from sq_ast_perm).

The row count is VERY WRONG.   Apparently the actual number of rows is
1 and the estimate is 75721: that's bad.

> When I change the view to be:
>
>  SELECT p.assetid,
>        CASE
>            WHEN r.userid IS NULL THEN p.userid
>            ELSE r.userid
>        END AS userid, p.permission, p."granted", p.cascades
>   FROM sq_ast_perm p
>   LEFT JOIN sq_vw_ast_role r ON r.roleid::text = p.userid::text AND
> r.assetid::text = p.assetid::text;
>
>
> The Merge left join only returns 3 rows:
>
> # explain analyze SELECT * from sq_vw_ast_perm where assetid='30748';
>
>           Merge Left Join  (cost=9507.18..9508.23 rows=3 width=70) (actual
> time=11.544..11.549 rows=1 loops=1)
>
> I thought the where condition would cut down on the rows returned, then the
> case statement would take effect to do the null check. It seems to be doing
> it in reverse ??

The ESTIMATE is 3 rows - the actual rows are 1, just as before.
Notice this is a much more accurate estimate: that's good.

The reason why the CASE is affecting your query planning is because
you are using a query that compares assetid to a constant:

SELECT * from sq_vw_ast_perm where assetid='30748';

When PostgreSQL evaluates this statement, assetid gets expanded either
into a case statement (with your first view definition) or into
sq_ast_perm.assetid (with your second view definition).  The latter
definition allows PostgreSQL to make use of the column statistics
(which are pretty accurate) whereas the former is probably leading to
a SWAG, because PostgreSQL isn't very good at estimating the
selectivity of CASE.  The bad selectivity estimate, in turn, is
leading to a poor plan choice...

...Robert

Re: left join + case - how is it processed?

From
Evan Carroll
Date:
On Sun, Jan 18, 2009 at 9:30 PM, Chris <dmagick@gmail.com> wrote:
> Hi all,
>
> I have a view that looks like this:
>
>  SELECT
>        CASE
>            WHEN r.assetid IS NULL THEN p.assetid
>            ELSE r.assetid
>        END AS assetid,
>        CASE
>            WHEN r.userid IS NULL THEN p.userid
>            ELSE r.userid
>        END AS userid, p.permission, p."granted", p.cascades
>   FROM sq_ast_perm p
>   LEFT JOIN sq_vw_ast_role r ON r.roleid::text = p.userid::text AND
> r.assetid::text = p.assetid::text;
>
>
> It was pointed out to me that the first CASE is useless (since r.assetid
> will always be the same as p.assetid because of the left join condition) so
> I'm looking at that to see if it'll make much of a difference and it does.

Lets assume it wasn't useless because of that, it would still be
obfuscated and probably slower because it is an explicit coalesce()

SELECT coalesce( r.assetid, p.assetid ) , coalesce( r.userid , p.userid )


> I thought the where condition would cut down on the rows returned, then the
> case statement would take effect to do the null check. It seems to be doing
> it in reverse ??
# explain analyze SELECT * from sq_vw_ast_perm where assetid='30748';
# explain analyze SELECT * from sq_vw_ast_perm where assetid='30748';

It aperas to me that both of your statements have where clauses, but I
believe where isn't that explicit.  I'm not sure the nature of your
problem permits the query optimizer to eliminate rows at all, even
with the where statement. "assetid" is probably not known when the
query optimizer hits, because it is computed based on the nullness of
the columns.  I'd assume that the optimizer *could* more easily
optimize this if you had used coalesce rather than an ad-hoc method
with CASE. My guess is you can exclude rows with WHERE if the the
column used is an run-time computation involving an ad-hoc CASE.


--
Evan Carroll
System Lord of the Internets

Re: left join + case - how is it processed?

From
Chris
Date:
> The reason why the CASE is affecting your query planning is because
> you are using a query that compares assetid to a constant:
>
> SELECT * from sq_vw_ast_perm where assetid='30748';
>
> When PostgreSQL evaluates this statement, assetid gets expanded either
> into a case statement (with your first view definition) or into
> sq_ast_perm.assetid (with your second view definition).  The latter
> definition allows PostgreSQL to make use of the column statistics
> (which are pretty accurate) whereas the former is probably leading to
> a SWAG, because PostgreSQL isn't very good at estimating the
> selectivity of CASE.  The bad selectivity estimate, in turn, is
> leading to a poor plan choice...

If I take it out of the view, it's fine:

#  SELECT
#         CASE
#             WHEN r.assetid IS NULL THEN p.assetid
#             ELSE r.assetid
#         END AS assetid,
#         CASE
#             WHEN r.userid IS NULL THEN p.userid
#             ELSE r.userid
#         END AS userid, p.permission, p."granted", p.cascades
#    FROM sq_ast_perm p
#    LEFT JOIN sq_vw_ast_role r ON r.roleid::text = p.userid::text AND
r.assetid::text = p.assetid::text
# where p.assetid='30748';

           QUERY PLAN

---------------------------------------------
  Merge Left Join  (cost=9459.89..9463.13 rows=3 width=102) (actual
time=0.096..0.098 rows=1 loops=1)

In this case I assume the planner is doing the 'WHERE' first to cut down
the rows, then applying the CASE at the end.

The view it seems to be the opposite - I still don't understand why
that's the case.


Though I do get the same behaviour as the view when I do it as a subselect.

--
Postgresql & php tutorials
http://www.designmagick.com/


Re: left join + case - how is it processed?

From
Evan Carroll
Date:
> My guess is you can exclude rows with WHERE if the the
> column used is an run-time computation involving an ad-hoc CASE.


* that you can't


--
Evan Carroll
System Lord of the Internets

Re: left join + case - how is it processed?

From
Chris
Date:
>
>> I thought the where condition would cut down on the rows returned, then the
>> case statement would take effect to do the null check. It seems to be doing
>> it in reverse ??
> # explain analyze SELECT * from sq_vw_ast_perm where assetid='30748';
>
> It aperas to me that both of your statements have where clauses, but I
> believe where isn't that explicit.  I'm not sure the nature of your
> problem permits the query optimizer to eliminate rows at all, even
> with the where statement. "assetid" is probably not known when the
> query optimizer hits, because it is computed based on the nullness of
> the columns.  I'd assume that the optimizer *could* more easily
> optimize this if you had used coalesce rather than an ad-hoc method
> with CASE. My guess is you can exclude rows with WHERE if the the
> column used is an run-time computation involving an ad-hoc CASE.

No difference.

Full explain plan here:

http://explain-analyze.info/query_plans/2725-query-plan-1447

I can see it's doing the extra filter step at the start (4th line) which
is not present without the coalesce/case statement. I just don't
understand why it's being done at that stage.

--
Postgresql & php tutorials
http://www.designmagick.com/


Re: left join + case - how is it processed?

From
Tom Lane
Date:
Chris <dmagick@gmail.com> writes:
> I can see it's doing the extra filter step at the start (4th line) which
> is not present without the coalesce/case statement. I just don't
> understand why it's being done at that stage.

It's not that hard to understand.  With the original view formulation
(or the COALESCE version), the fully expanded form of the query looks
like

    select ... from p left join r ...
      where expression_involving_both_p_and_r = constant

If you make the view output be just p.assetid then you have

    select ... from p left join r ...
      where p.assetid = constant

In the first case the planner cannot apply the WHERE restriction until
it's formed the p+r join; so you see the condition applied as a filter
on the join node's output.  In the second case, the planner can push the
WHERE restriction down into the scan of p, since the left join doesn't
affect it.  (If a p row doesn't pass the restriction, then no join row
formed from it can either; ergo there is no need to form those join rows
at all.)

In general a WHERE or JOIN/ON clause cannot be applied below the point
at which all the relations mentioned in it have been joined.  There are
a few special cases where the planner can transform clauses into some
other form that's more optimizable, but you can pretty much bet that a
CASE will never be one of them --- CASE is more or less *defined* to
defeat optimization.

            regards, tom lane

Re: left join + case - how is it processed?

From
"Richard Broersma"
Date:
On Sun, Jan 18, 2009 at 7:30 PM, Chris <dmagick@gmail.com> wrote:

> I have a view that looks like this:
>
>  SELECT
>        CASE
>            WHEN r.assetid IS NULL THEN p.assetid
>            ELSE r.assetid
>        END AS assetid,
>        CASE
>            WHEN r.userid IS NULL THEN p.userid
>            ELSE r.userid
>        END AS userid, p.permission, p."granted", p.cascades
>   FROM sq_ast_perm p
>   LEFT JOIN sq_vw_ast_role r ON r.roleid::text = p.userid::text AND
> r.assetid::text = p.assetid::text;

The effect that you are trying to achieve with CASE statements is
better suited to the COALESCE(...) function.
http://www.postgresql.org/docs/8.3/interactive/functions-conditional.html#AEN14484

--
Regards,
Richard Broersma Jr.

Visit the Los Angeles PostgreSQL Users Group (LAPUG)
http://pugs.postgresql.org/lapug

Re: left join + case - how is it processed?

From
Chris
Date:
Tom Lane wrote:
> Chris <dmagick@gmail.com> writes:
>> I can see it's doing the extra filter step at the start (4th line) which
>> is not present without the coalesce/case statement. I just don't
>> understand why it's being done at that stage.
>
> It's not that hard to understand.  With the original view formulation
> (or the COALESCE version), the fully expanded form of the query looks
> like
>
>     select ... from p left join r ...
>       where expression_involving_both_p_and_r = constant
>
> If you make the view output be just p.assetid then you have
>
>     select ... from p left join r ...
>       where p.assetid = constant
>
> In the first case the planner cannot apply the WHERE restriction until
> it's formed the p+r join; so you see the condition applied as a filter
> on the join node's output.  In the second case, the planner can push the
> WHERE restriction down into the scan of p, since the left join doesn't
> affect it.  (If a p row doesn't pass the restriction, then no join row
> formed from it can either; ergo there is no need to form those join rows
> at all.)

So because the CASE is on (some of) the fields I'm joining on, in effect
it's made part of the join condition. If the fields are outside that
(r.userid/p.userid), then it's evaluated after.

Thanks!

--
Postgresql & php tutorials
http://www.designmagick.com/