Re: left join + case - how is it processed? - Mailing list pgsql-performance

From Robert Haas
Subject Re: left join + case - how is it processed?
Date
Msg-id 603c8f070901182017h788b5964y94ef9955e9db60b4@mail.gmail.com
Whole thread Raw
In response to left join + case - how is it processed?  (Chris <dmagick@gmail.com>)
Responses Re: left join + case - how is it processed?
List pgsql-performance
On Sun, Jan 18, 2009 at 10:30 PM, Chris <dmagick@gmail.com> wrote:
> Hi all,
>
> I have a view that looks like this:
>
>  SELECT
>        CASE
>            WHEN r.assetid IS NULL THEN p.assetid
>            ELSE r.assetid
>        END AS assetid,
>        CASE
>            WHEN r.userid IS NULL THEN p.userid
>            ELSE r.userid
>        END AS userid, p.permission, p."granted", p.cascades
>   FROM sq_ast_perm p
>   LEFT JOIN sq_vw_ast_role r ON r.roleid::text = p.userid::text AND
> r.assetid::text = p.assetid::text;
>
>
> It was pointed out to me that the first CASE is useless (since r.assetid
> will always be the same as p.assetid because of the left join condition) so
> I'm looking at that to see if it'll make much of a difference and it does.
>
> I won't post the whole lot but the first line is the most interesting.
>
> # explain analyze SELECT * from sq_vw_ast_perm where assetid='30748';
>
>
>  Merge Left Join  (cost=9529.34..13823.76 rows=75721 width=102) (actual
> time=284.371..341.536 rows=1 loops=1)
>
> (The row count is right - it's the total # of rows from sq_ast_perm).

The row count is VERY WRONG.   Apparently the actual number of rows is
1 and the estimate is 75721: that's bad.

> When I change the view to be:
>
>  SELECT p.assetid,
>        CASE
>            WHEN r.userid IS NULL THEN p.userid
>            ELSE r.userid
>        END AS userid, p.permission, p."granted", p.cascades
>   FROM sq_ast_perm p
>   LEFT JOIN sq_vw_ast_role r ON r.roleid::text = p.userid::text AND
> r.assetid::text = p.assetid::text;
>
>
> The Merge left join only returns 3 rows:
>
> # explain analyze SELECT * from sq_vw_ast_perm where assetid='30748';
>
>           Merge Left Join  (cost=9507.18..9508.23 rows=3 width=70) (actual
> time=11.544..11.549 rows=1 loops=1)
>
> I thought the where condition would cut down on the rows returned, then the
> case statement would take effect to do the null check. It seems to be doing
> it in reverse ??

The ESTIMATE is 3 rows - the actual rows are 1, just as before.
Notice this is a much more accurate estimate: that's good.

The reason why the CASE is affecting your query planning is because
you are using a query that compares assetid to a constant:

SELECT * from sq_vw_ast_perm where assetid='30748';

When PostgreSQL evaluates this statement, assetid gets expanded either
into a case statement (with your first view definition) or into
sq_ast_perm.assetid (with your second view definition).  The latter
definition allows PostgreSQL to make use of the column statistics
(which are pretty accurate) whereas the former is probably leading to
a SWAG, because PostgreSQL isn't very good at estimating the
selectivity of CASE.  The bad selectivity estimate, in turn, is
leading to a poor plan choice...

...Robert

pgsql-performance by date:

Previous
From: Chris
Date:
Subject: left join + case - how is it processed?
Next
From: Evan Carroll
Date:
Subject: Re: left join + case - how is it processed?