Thread: can i make this sql query more efficiant?
create table baz (event text, level int); insert into baz values ('x',1); insert into baz values ('x',2); insert into baz values ('x',3); insert into baz values ('y',2); insert into baz values ('y',3); insert into baz values ('y',3); select * from baz; event | level -------+-------x | 1x | 2x | 3y | 2y | 3y | 3 (6 rows) I want to know how many ones, twos, and threes there are for each event: select event, (select count(*) from baz a where level = 1 and a.event=baz.event) as ones, (select count(*) from baz a where level = 2 and a.event=baz.event) as twos, (select count(*) from baz a where level = 3 and a.event=baz.event)as threes from baz group by event; which gives me: event | ones | twos | threes -------+------+------+--------x | 1 | 1 | 1y | 0 | 1 | 2 (2 rows) which is fine, but I am wondering if there is a better way to do this? I'd mainly like to reduce the number of subqueries involved. Another improvement would be to not have to explicitly query for each level, though this isn't as big since I know the range of levels in advance (famous last words for a dba :-) Thanks in advance, Robert Treat
if you're allowed to change the resultset structure, you could do: SELECT event, level, count(*) FROM baz GROUP BY event, level; event | level | count -------+-------+-------x | 1 | 1x | 2 | 1x | 3 | 1y | 2 | 1y | 3| 2 (5 rows) of course it doesn't show you the rows where the count is zero. if you need the zeros, do this SELECT EL.event, EL.level, count(baz.*) FROM ( SELECT DISTINCT B1.event, B2.level FROM baz B1 CROSS JOIN baz B2 ) EL LEFT JOINbaz ON (baz.event=EL.event AND baz.level=EL.level) GROUP BY EL.event, EL.level; event | level | count -------+-------+-------x | 1 | 1x | 2 | 1x | 3 | 1y | 1 | 0y | 2| 1y | 3 | 2 (6 rows) hope it helps. On Thursday 03 April 2003 18:02, Robert Treat wrote: > create table baz (event text, level int); > > insert into baz values ('x',1); > insert into baz values ('x',2); > insert into baz values ('x',3); > insert into baz values ('y',2); > insert into baz values ('y',3); > insert into baz values ('y',3); > > select * from baz; > > event | level > -------+------- > x | 1 > x | 2 > x | 3 > y | 2 > y | 3 > y | 3 > (6 rows) > > > I want to know how many ones, twos, and threes there are for each event: > > select > event, > (select count(*) from baz a > where level = 1 and a.event=baz.event) as ones, > (select count(*) from baz a > where level = 2 and a.event=baz.event) as twos, > (select count(*) from baz a > where level = 3 and a.event=baz.event) as threes > from > baz > group by > event; > > which gives me: > > event | ones | twos | threes > -------+------+------+-------- > x | 1 | 1 | 1 > y | 0 | 1 | 2 > (2 rows) > > > which is fine, but I am wondering if there is a better way to do this? > I'd mainly like to reduce the number of subqueries involved. Another > improvement would be to not have to explicitly query for each level, > though this isn't as big since I know the range of levels in advance > (famous last words for a dba :-) > > Thanks in advance, > > Robert Treat > > > ---------------------------(end of broadcast)--------------------------- > TIP 3: if posting/reading through Usenet, please send an appropriate > subscribe-nomail command to majordomo@postgresql.org so that your > message can get through to the mailing list cleanly
On 03 Apr 2003 16:02:04 -0500, Robert Treat <xzilla@users.sourceforge.net> wrote: >select > event, > (select count(*) from baz a > where level = 1 and a.event=baz.event) as ones, > (select count(*) from baz a > where level = 2 and a.event=baz.event) as twos, > (select count(*) from baz a > where level = 3 and a.event=baz.event) as threes >from > baz >group by > event; >which is fine, but I am wondering if there is a better way to do this? >I'd mainly like to reduce the number of subqueries involved. SELECT event, SUM (CASE level WHEN 1 THEN 1 ELSE 0 END) AS ones, SUM (CASE level WHEN 2 THEN 1 ELSE 0 END) AS twos, SUM (CASE level WHEN 3 THEN 1 ELSE 0 END) AS threes FROM bazGROUP BY event; > Another >improvement would be to not have to explicitly query for each level, This might be a case for a clever set returning function, but that's not my realm. Wait for Joe to jump in ;-) ServusManfred
<cut> > select > event, > (select count(*) from baz a > where level = 1 and a.event=baz.event) as ones, > (select count(*) from baz a > where level = 2 and a.event=baz.event) as twos, > (select count(*) from baz a > where level = 3 and a.event=baz.event) as threes > from > baz > group by > event; > > which gives me: > > event | ones | twos | threes > -------+------+------+-------- > x | 1 | 1 | 1 > y | 0 | 1 | 2 > (2 rows) <cut> What about this: select event, sum(case when level=1 then 1 else 0 end) as ones, sum(case when level=2 then 1 else 0 end) as twos, sum(casewhen level=3 then 1 else 0 end) as threes from baz group by event; Regards, Tomasz Myrta
Tomasz, > What about this: > select > event, > sum(case when level=1 then 1 else 0 end) as ones, > sum(case when level=2 then 1 else 0 end) as twos, > sum(case when level=3 then 1 else 0 end) as threes > from baz > group by event; That version is only more efficient for small data sets. I've generally found that case statements are slower than subselects for large data sets. YMMV. BTW, while it won't be faster, Joe Conway's crosstab function in /tablefunc does this kind of transformation. -- Josh Berkus Aglio Database Solutions San Francisco
On Fri, 4 Apr 2003 08:16:01 -0800, Josh Berkus <josh@agliodbs.com> wrote: >That version is only more efficient for small data sets. I've generally >found that case statements are slower than subselects for large data sets. I'd be honestly interested in the circumstances where you made that observation. >YMMV. Yes, it does :-) Out of curiosity I did a few tests with PG 7.2 on my old notebook: CREATE TABLE baz (event int, level int); INSERT INTO baz SELECT (100*random()+0.5), (3*random()+0.5); INSERT INTO baz SELECT (100*random()+0.5), (3*random()+0.5) FROM baz; ... INSERT INTO baz SELECT (100*random()+0.5), (3*random()+0.5) FROM baz; CREATE INDEX baz_event ON baz(event); ANALYSE baz; SELECT event, SUM (CASE level WHEN 1 THEN 1 ELSE 0 END) AS ones, SUM (CASE level WHEN 2 THEN 1 ELSE 0 END) AS twos, SUM (CASE level WHEN 3 THEN 1 ELSE 0 END) AS threes FROM baz GROUP BY event; SELECT event, (SELECT count(*) FROM baz a WHERE level = 1 AND a.event=baz.event) AS ones, (SELECT count(*) FROM baz a WHERE level = 2 and a.event=baz.event) AS twos, (SELECT count(*) FROM baz a WHERE level = 3 and a.event=baz.event) AS threes FROM baz GROUP BY event; tuples case subselect 8K 718.48 msec 16199.88 msec 32K 6168.18 msec 74742.85 msec 128K 25072.34 msec 304585.61 msec CLUSTER baz_event ON baz; ANALYSE baz; This changes the subselect plan from seq scan to index scan. 128K 12116.07 msec 17530.85 msec Add 128K more tuples, so that only the first half of the relation is clustered. 256K 45663.35 msec 117748.23 msec CLUSTER baz_event ON baz; ANALYSE baz; 256K 23691.81 msec 35138.26 msec Maybe it is just the data distribution (100 events, 3 levels, thousands of tuples) that makes CASE look faster than subselects ... Servus Manfred
Manfred, > I'd be honestly interested in the circumstances where you made that > observation. Hmmmm ... one of my database involves a "crosstab" converstion where there were 13 possible values, and the converted table is heavily indexed. For that case, I found using CASE statements to be slower. For your example, how do the statistics change if you increase the number of levels to 15 and put an index on them? -- -Josh Berkus Aglio Database Solutions San Francisco