Thread: Re: [SQL] can i make this sql query more efficiant?
On Fri, 4 Apr 2003 08:16:01 -0800, Josh Berkus <josh@agliodbs.com> wrote: >That version is only more efficient for small data sets. I've generally >found that case statements are slower than subselects for large data sets. I'd be honestly interested in the circumstances where you made that observation. >YMMV. Yes, it does :-) Out of curiosity I did a few tests with PG 7.2 on my old notebook: CREATE TABLE baz (event int, level int); INSERT INTO baz SELECT (100*random()+0.5), (3*random()+0.5); INSERT INTO baz SELECT (100*random()+0.5), (3*random()+0.5) FROM baz; ... INSERT INTO baz SELECT (100*random()+0.5), (3*random()+0.5) FROM baz; CREATE INDEX baz_event ON baz(event); ANALYSE baz; SELECT event, SUM (CASE level WHEN 1 THEN 1 ELSE 0 END) AS ones, SUM (CASE level WHEN 2 THEN 1 ELSE 0 END) AS twos, SUM (CASE level WHEN 3 THEN 1 ELSE 0 END) AS threes FROM baz GROUP BY event; SELECT event, (SELECT count(*) FROM baz a WHERE level = 1 AND a.event=baz.event) AS ones, (SELECT count(*) FROM baz a WHERE level = 2 and a.event=baz.event) AS twos, (SELECT count(*) FROM baz a WHERE level = 3 and a.event=baz.event) AS threes FROM baz GROUP BY event; tuples case subselect 8K 718.48 msec 16199.88 msec 32K 6168.18 msec 74742.85 msec 128K 25072.34 msec 304585.61 msec CLUSTER baz_event ON baz; ANALYSE baz; This changes the subselect plan from seq scan to index scan. 128K 12116.07 msec 17530.85 msec Add 128K more tuples, so that only the first half of the relation is clustered. 256K 45663.35 msec 117748.23 msec CLUSTER baz_event ON baz; ANALYSE baz; 256K 23691.81 msec 35138.26 msec Maybe it is just the data distribution (100 events, 3 levels, thousands of tuples) that makes CASE look faster than subselects ... Servus Manfred
Manfred, > I'd be honestly interested in the circumstances where you made that > observation. Hmmmm ... one of my database involves a "crosstab" converstion where there were 13 possible values, and the converted table is heavily indexed. For that case, I found using CASE statements to be slower. For your example, how do the statistics change if you increase the number of levels to 15 and put an index on them? -- -Josh Berkus Aglio Database Solutions San Francisco
On Fri, 4 Apr 2003 11:26:14 -0800, Josh Berkus <josh@agliodbs.com> wrote: >For your example, how do the statistics change if you increase the number of >levels to 15 and put an index on them? CREATE TABLE baz (event int, level int); INSERT INTO baz SELECT (100*random()+0.5), (15*random()+0.5); INSERT INTO baz SELECT (100*random()+0.5), (15*random()+0.5) FROM baz; ... INSERT INTO baz SELECT (100*random()+0.5), (15*random()+0.5) FROM baz; ANALYSE baz; CREATE INDEX baz_e ON baz(event); CREATE INDEX baz_l ON baz(level); CREATE INDEX baz_el ON baz(event, level); CREATE INDEX baz_le ON baz(level, event); tup cluster case subsel 8K - 1219.90 msec 70605.93 msec (seq scan) 8K - 3087.30 msec (seq scan off) 16K - 3861.87 msec 161902.36 msec (seq scan) 16K - 31498.76 msec (seq scan off) 16K event 2407.72 msec 5773.12 msec 16K level 2298.08 msec 32752.43 msec 16K l, e 2318.60 msec 3184.84 msec 32K - 6571.57 msec 7381.22 msec 32K e, l 4584.97 msec 3429.94 msec 32K l, e 4552.00 msec 64782.59 msec 32K l, e 4552.98 msec 3544.32 msec (baz_l dropped) 64K - 17275.73 msec 26525.24 msec 64K - 17150.16 msec 26195.87 msec (baz_le dropped) 64K - 17286.29 msec 656046.24 msec (baz_el dropped) 64K e, l 9137.88 msec 21809.52 msec 64K e, l 9183.25 msec 6412.97 msec (baz_e dropped) 64K e, l 11690.28 msec 10022.44 msec (baz_el dropped) 64K e, l 11740.54 msec 643046.39 msec (baz_le dropped) 64K l, e 9437.65 msec 133368.20 msec 64K l, e 9119.48 msec 6722.00 msec (baz_l dropped) 64K l, e 9294.68 msec 6663.15 msec (baz_le dropped) 64K l, e 9259.35 msec 639754.27 msec (baz_el dropped) 256K - 59809.69 msec 120755.78 msec 256K - 59809.69 msec 114133.34 msec (baz_le dropped) 256K e, l 38506.41 msec 88531.54 msec 256K e, l 49427.43 msec 43544.03 msec (baz_e dropped) 256K l, e 56821.23 msec 575850.14 msec 256K l, e 57462.78 msec 67911.41 msec (baz_l dropped) So yes, there are cases where subselect is faster than case, but case is much more robust regarding correlation and indices. Servus Manfred