bad selectivity estimates for CASE - Mailing list pgsql-performance

From Robert Haas
Subject bad selectivity estimates for CASE
Date
Msg-id 603c8f070901051915p15b3d6a7jbe6f6fdd178143e@mail.gmail.com
Whole thread Raw
Responses Re: bad selectivity estimates for CASE  (Tom Lane <tgl@sss.pgh.pa.us>)
List pgsql-performance
While looking at a complex query that is being poorly planned by
PostgreSQL 8.2.9, I discovered that any non-trivial CASE...WHEN
expression seems to produce a selectivity estimate of 0.005.  This
also happens on HEAD.

psql (8.4devel)
Type "help" for help.

head=# create table tenk (c) as select generate_series(1,10000);
SELECT
head=# alter table tenk alter column c set statistics 100;
ALTER TABLE
head=# analyze tenk;
ANALYZE
head=# explain select * from tenk where c in (1,2,3,4);
                      QUERY PLAN
------------------------------------------------------
 Seq Scan on tenk  (cost=0.00..190.00 rows=4 width=4)
   Filter: (c = ANY ('{1,2,3,4}'::integer[]))
(2 rows)

head=# explain select * from tenk where case when c in (1,2,3,4) then 1 end = 1;
                                         QUERY PLAN

--------------------------------------------------------------------------------
------------
 Seq Scan on tenk  (cost=0.00..215.00 rows=50 width=4)
   Filter: (CASE WHEN (c = ANY ('{1,2,3,4}'::integer[])) THEN 1 ELSE NULL::integ
er END = 1)
(2 rows)

head=# explain select * from tenk where case when c in (1,2,3,4) then 2 end = 1;
                                         QUERY PLAN

--------------------------------------------------------------------------------
------------
 Seq Scan on tenk  (cost=0.00..215.00 rows=50 width=4)
   Filter: (CASE WHEN (c = ANY ('{1,2,3,4}'::integer[])) THEN 2 ELSE NULL::integ
er END = 1)
(2 rows)

head=# \q

The last example is particularly egregious, since it can never return
true, but the previous example is not much better, since in my actual
query the actual selectivity (against a CASE with multiple WHEN
branches) can be as high as ~0.8, so a value of 0.005 isn't close.  It
ends up causing a very expensive nested loop plan when something else
would be better.

Any suggestions would be appreciated.

...Robert

pgsql-performance by date:

Previous
From: Tom Lane
Date:
Subject: Re: Poor plan choice in prepared statement
Next
From: Tom Lane
Date:
Subject: Re: bad selectivity estimates for CASE