Re: [SQL] can i make this sql query more efficiant? - Mailing list pgsql-performance

From Manfred Koizar
Subject Re: [SQL] can i make this sql query more efficiant?
Date
Msg-id d0ir8vgn2sp6uh2h61nkm4tdasqqn4o18l@4ax.com
Whole thread Raw
Responses Re: [SQL] can i make this sql query more efficiant?  (Josh Berkus <josh@agliodbs.com>)
List pgsql-performance
On Fri, 4 Apr 2003 08:16:01 -0800, Josh Berkus <josh@agliodbs.com>
wrote:
>That version is only more efficient for small data sets.    I've generally
>found that case statements are slower than subselects for large data sets.

I'd be honestly interested in the circumstances where you made that
observation.

>YMMV.

Yes, it does :-)  Out of curiosity I did a few tests with PG 7.2 on my
old notebook:

CREATE TABLE baz (event int, level int);
INSERT INTO baz SELECT (100*random()+0.5), (3*random()+0.5);
INSERT INTO baz SELECT (100*random()+0.5), (3*random()+0.5) FROM baz;
...
INSERT INTO baz SELECT (100*random()+0.5), (3*random()+0.5) FROM baz;
CREATE INDEX baz_event ON baz(event);
ANALYSE baz;

SELECT event,
       SUM (CASE level WHEN 1 THEN 1 ELSE 0 END) AS ones,
       SUM (CASE level WHEN 2 THEN 1 ELSE 0 END) AS twos,
       SUM (CASE level WHEN 3 THEN 1 ELSE 0 END) AS threes
  FROM baz GROUP BY event;

SELECT event,
  (SELECT count(*) FROM baz a
    WHERE level = 1 AND a.event=baz.event) AS ones,
  (SELECT count(*) FROM baz a
    WHERE level = 2 and a.event=baz.event) AS twos,
  (SELECT count(*) FROM baz a
    WHERE level = 3 and a.event=baz.event) AS threes
  FROM baz GROUP BY event;

tuples   case         subselect
  8K    718.48 msec   16199.88 msec
 32K   6168.18 msec   74742.85 msec
128K  25072.34 msec  304585.61 msec

CLUSTER baz_event ON baz; ANALYSE baz;
This changes the subselect plan from seq scan to index scan.

128K  12116.07 msec   17530.85 msec

Add 128K more tuples, so that only the first half of the relation is
clustered.

256K  45663.35 msec  117748.23 msec

CLUSTER baz_event ON baz; ANALYSE baz;

256K  23691.81 msec   35138.26 msec

Maybe it is just the data distribution (100 events, 3 levels,
thousands of tuples) that makes CASE look faster than subselects ...

Servus
 Manfred


pgsql-performance by date:

Previous
From: Josh Berkus
Date:
Subject: Re: OSS database needed for testing
Next
From: Josh Berkus
Date:
Subject: Re: [SQL] can i make this sql query more efficiant?