Thread: Re: [SQL] can i make this sql query more efficiant?

Re: [SQL] can i make this sql query more efficiant?

From

Manfred Koizar

Date:

04 April 2003, 14:03:18

On Fri, 4 Apr 2003 08:16:01 -0800, Josh Berkus <josh@agliodbs.com>
wrote:
>That version is only more efficient for small data sets.    I've generally
>found that case statements are slower than subselects for large data sets.

I'd be honestly interested in the circumstances where you made that
observation.

>YMMV.

Yes, it does :-)  Out of curiosity I did a few tests with PG 7.2 on my
old notebook:

CREATE TABLE baz (event int, level int);
INSERT INTO baz SELECT (100*random()+0.5), (3*random()+0.5);
INSERT INTO baz SELECT (100*random()+0.5), (3*random()+0.5) FROM baz;
...
INSERT INTO baz SELECT (100*random()+0.5), (3*random()+0.5) FROM baz;
CREATE INDEX baz_event ON baz(event);
ANALYSE baz;

SELECT event,
       SUM (CASE level WHEN 1 THEN 1 ELSE 0 END) AS ones,
       SUM (CASE level WHEN 2 THEN 1 ELSE 0 END) AS twos,
       SUM (CASE level WHEN 3 THEN 1 ELSE 0 END) AS threes
  FROM baz GROUP BY event;

SELECT event,
  (SELECT count(*) FROM baz a
    WHERE level = 1 AND a.event=baz.event) AS ones,
  (SELECT count(*) FROM baz a
    WHERE level = 2 and a.event=baz.event) AS twos,
  (SELECT count(*) FROM baz a
    WHERE level = 3 and a.event=baz.event) AS threes
  FROM baz GROUP BY event;

tuples   case         subselect
  8K    718.48 msec   16199.88 msec
 32K   6168.18 msec   74742.85 msec
128K  25072.34 msec  304585.61 msec

CLUSTER baz_event ON baz; ANALYSE baz;
This changes the subselect plan from seq scan to index scan.

128K  12116.07 msec   17530.85 msec

Add 128K more tuples, so that only the first half of the relation is
clustered.

256K  45663.35 msec  117748.23 msec

CLUSTER baz_event ON baz; ANALYSE baz;

256K  23691.81 msec   35138.26 msec

Maybe it is just the data distribution (100 events, 3 levels,
thousands of tuples) that makes CASE look faster than subselects ...

Servus
 Manfred

Re: [SQL] can i make this sql query more efficiant?

From

Josh Berkus

Date:

04 April 2003, 14:26:21

Manfred,

> I'd be honestly interested in the circumstances where you made that
> observation.

Hmmmm ... one of my database involves a "crosstab" converstion where there
were 13 possible values, and the converted table is heavily indexed.  For
that case, I found using CASE statements to be slower.

For your example, how do the statistics change if you increase the number of
levels to 15 and put an index on them?

--
-Josh Berkus
 Aglio Database Solutions
 San Francisco

Re: [SQL] can i make this sql query more efficiant?

From

Manfred Koizar

Date:

04 April 2003, 21:08:23

On Fri, 4 Apr 2003 11:26:14 -0800, Josh Berkus <josh@agliodbs.com>
wrote:
>For your example, how do the statistics change if you increase the number of
>levels to 15 and put an index on them?

CREATE TABLE baz (event int, level int);

INSERT INTO baz SELECT (100*random()+0.5), (15*random()+0.5);
INSERT INTO baz SELECT (100*random()+0.5), (15*random()+0.5) FROM baz;
...
INSERT INTO baz SELECT (100*random()+0.5), (15*random()+0.5) FROM baz;
ANALYSE baz;
CREATE INDEX baz_e ON baz(event);
CREATE INDEX baz_l ON baz(level);
CREATE INDEX baz_el ON baz(event, level);
CREATE INDEX baz_le ON baz(level, event);

tup  cluster     case          subsel
  8K    -     1219.90 msec   70605.93 msec (seq scan)
  8K    -                     3087.30 msec (seq scan off)

 16K    -     3861.87 msec  161902.36 msec (seq scan)
 16K    -                    31498.76 msec (seq scan off)
 16K  event   2407.72 msec    5773.12 msec
 16K  level   2298.08 msec   32752.43 msec
 16K  l, e    2318.60 msec    3184.84 msec

 32K    -     6571.57 msec    7381.22 msec
 32K  e, l    4584.97 msec    3429.94 msec
 32K  l, e    4552.00 msec   64782.59 msec
 32K  l, e    4552.98 msec    3544.32 msec (baz_l dropped)

 64K    -    17275.73 msec   26525.24 msec
 64K    -    17150.16 msec   26195.87 msec (baz_le dropped)
 64K    -    17286.29 msec  656046.24 msec (baz_el dropped)
 64K  e, l    9137.88 msec   21809.52 msec
 64K  e, l    9183.25 msec    6412.97 msec (baz_e dropped)
 64K  e, l   11690.28 msec   10022.44 msec (baz_el dropped)
 64K  e, l   11740.54 msec  643046.39 msec (baz_le dropped)
 64K  l, e    9437.65 msec  133368.20 msec
 64K  l, e    9119.48 msec    6722.00 msec (baz_l dropped)
 64K  l, e    9294.68 msec    6663.15 msec (baz_le dropped)
 64K  l, e    9259.35 msec  639754.27 msec (baz_el dropped)

256K    -    59809.69 msec  120755.78 msec
256K    -    59809.69 msec  114133.34 msec (baz_le dropped)
256K  e, l   38506.41 msec   88531.54 msec
256K  e, l   49427.43 msec   43544.03 msec (baz_e dropped)
256K  l, e   56821.23 msec  575850.14 msec
256K  l, e   57462.78 msec   67911.41 msec (baz_l dropped)

So yes, there are cases where subselect is faster than case, but case
is much more robust regarding correlation and indices.

Servus
 Manfred