On Fri, 4 Apr 2003 08:16:01 -0800, Josh Berkus <josh@agliodbs.com>
wrote:
>That version is only more efficient for small data sets. I've generally
>found that case statements are slower than subselects for large data sets.
I'd be honestly interested in the circumstances where you made that
observation.
>YMMV.
Yes, it does :-) Out of curiosity I did a few tests with PG 7.2 on my
old notebook:
CREATE TABLE baz (event int, level int);
INSERT INTO baz SELECT (100*random()+0.5), (3*random()+0.5);
INSERT INTO baz SELECT (100*random()+0.5), (3*random()+0.5) FROM baz;
...
INSERT INTO baz SELECT (100*random()+0.5), (3*random()+0.5) FROM baz;
CREATE INDEX baz_event ON baz(event);
ANALYSE baz;
SELECT event,
SUM (CASE level WHEN 1 THEN 1 ELSE 0 END) AS ones,
SUM (CASE level WHEN 2 THEN 1 ELSE 0 END) AS twos,
SUM (CASE level WHEN 3 THEN 1 ELSE 0 END) AS threes
FROM baz GROUP BY event;
SELECT event,
(SELECT count(*) FROM baz a
WHERE level = 1 AND a.event=baz.event) AS ones,
(SELECT count(*) FROM baz a
WHERE level = 2 and a.event=baz.event) AS twos,
(SELECT count(*) FROM baz a
WHERE level = 3 and a.event=baz.event) AS threes
FROM baz GROUP BY event;
tuples case subselect
8K 718.48 msec 16199.88 msec
32K 6168.18 msec 74742.85 msec
128K 25072.34 msec 304585.61 msec
CLUSTER baz_event ON baz; ANALYSE baz;
This changes the subselect plan from seq scan to index scan.
128K 12116.07 msec 17530.85 msec
Add 128K more tuples, so that only the first half of the relation is
clustered.
256K 45663.35 msec 117748.23 msec
CLUSTER baz_event ON baz; ANALYSE baz;
256K 23691.81 msec 35138.26 msec
Maybe it is just the data distribution (100 events, 3 levels,
thousands of tuples) that makes CASE look faster than subselects ...
Servus
Manfred