Thread: can i make this sql query more efficiant?

can i make this sql query more efficiant?

From
Robert Treat
Date:
create table  baz (event text, level int);

insert into baz values ('x',1);
insert into baz values ('x',2);
insert into baz values ('x',3);
insert into baz values ('y',2);
insert into baz values ('y',3);
insert into baz values ('y',3);

select * from baz;
event | level 
-------+-------x     |     1x     |     2x     |     3y     |     2y     |     3y     |     3
(6 rows)


I want to know how many ones, twos, and threes there are for each event:

select event, (select count(*) from baz a     where level = 1 and a.event=baz.event) as ones, (select count(*) from baz
a    where level = 2 and a.event=baz.event) as twos, (select count(*) from baz a     where level = 3 and
a.event=baz.event)as threes
 
from baz
group by event;

which gives me:
event | ones | twos | threes 
-------+------+------+--------x     |    1 |    1 |      1y     |    0 |    1 |      2
(2 rows)


which is fine, but I am wondering if there is a better way to do this?
I'd mainly like to reduce the number of subqueries involved. Another
improvement would be to not have to explicitly query for each level,
though this isn't as big since I know the range of levels in advance
(famous last words for a dba :-) 

Thanks in advance,

Robert Treat



Re: can i make this sql query more efficiant?

From
Franco Bruno Borghesi
Date:
if you're allowed to change the resultset structure, you could do:
SELECT  event,  level,   count(*)
FROM  baz
GROUP BY  event,  level;
event | level | count
-------+-------+-------x     |     1 |     1x     |     2 |     1x     |     3 |     1y     |     2 |     1y     |
3|     2 
(5 rows)

of course it doesn't show you the rows where the count is zero.
if you need the zeros, do this

SELECT     EL.event,  EL.level,   count(baz.*)
FROM  (     SELECT DISTINCT        B1.event, B2.level      FROM         baz B1         CROSS JOIN baz B2  ) EL  LEFT
JOINbaz ON (baz.event=EL.event AND baz.level=EL.level)   
GROUP BY  EL.event,  EL.level;
event | level | count
-------+-------+-------x     |     1 |     1x     |     2 |     1x     |     3 |     1y     |     1 |     0y     |
2|     1y     |     3 |     2 
(6 rows)

hope it helps.

On Thursday 03 April 2003 18:02, Robert Treat wrote:
> create table  baz (event text, level int);
>
> insert into baz values ('x',1);
> insert into baz values ('x',2);
> insert into baz values ('x',3);
> insert into baz values ('y',2);
> insert into baz values ('y',3);
> insert into baz values ('y',3);
>
> select * from baz;
>
>  event | level
> -------+-------
>  x     |     1
>  x     |     2
>  x     |     3
>  y     |     2
>  y     |     3
>  y     |     3
> (6 rows)
>
>
> I want to know how many ones, twos, and threes there are for each event:
>
> select
>     event,
>     (select count(*) from baz a
>         where level = 1 and a.event=baz.event) as ones,
>     (select count(*) from baz a
>         where level = 2 and a.event=baz.event) as twos,
>     (select count(*) from baz a
>         where level = 3 and a.event=baz.event) as threes
> from
>      baz
> group by
>     event;
>
> which gives me:
>
>  event | ones | twos | threes
> -------+------+------+--------
>  x     |    1 |    1 |      1
>  y     |    0 |    1 |      2
> (2 rows)
>
>
> which is fine, but I am wondering if there is a better way to do this?
> I'd mainly like to reduce the number of subqueries involved. Another
> improvement would be to not have to explicitly query for each level,
> though this isn't as big since I know the range of levels in advance
> (famous last words for a dba :-)
>
> Thanks in advance,
>
> Robert Treat
>
>
> ---------------------------(end of broadcast)---------------------------
> TIP 3: if posting/reading through Usenet, please send an appropriate
> subscribe-nomail command to majordomo@postgresql.org so that your
> message can get through to the mailing list cleanly

Re: can i make this sql query more efficiant?

From
Manfred Koizar
Date:
On 03 Apr 2003 16:02:04 -0500, Robert Treat
<xzilla@users.sourceforge.net> wrote:
>select 
>    event, 
>    (select count(*) from baz a 
>        where level = 1 and a.event=baz.event) as ones, 
>    (select count(*) from baz a 
>        where level = 2 and a.event=baz.event) as twos, 
>    (select count(*) from baz a 
>        where level = 3 and a.event=baz.event) as threes
>from
>     baz
>group by 
>    event;

>which is fine, but I am wondering if there is a better way to do this?
>I'd mainly like to reduce the number of subqueries involved.

SELECT event,      SUM (CASE level WHEN 1 THEN 1 ELSE 0 END) AS ones,      SUM (CASE level WHEN 2 THEN 1 ELSE 0 END) AS
twos,     SUM (CASE level WHEN 3 THEN 1 ELSE 0 END) AS threes FROM bazGROUP BY event;
 

> Another
>improvement would be to not have to explicitly query for each level,

This might be a case for a clever set returning function, but that's
not my realm.  Wait for Joe to jump in  ;-)

ServusManfred



Re: can i make this sql query more efficiant?

From
"Tomasz Myrta"
Date:
<cut>
> select 
>     event, 
>     (select count(*) from baz a 
>         where level = 1 and a.event=baz.event) as ones, 
>     (select count(*) from baz a 
>         where level = 2 and a.event=baz.event) as twos, 
>     (select count(*) from baz a 
>         where level = 3 and a.event=baz.event) as threes
> from
>      baz
> group by 
>     event;
> 
> which gives me:
> 
>  event | ones | twos | threes 
> -------+------+------+--------
>  x     |    1 |    1 |      1
>  y     |    0 |    1 |      2
> (2 rows)
<cut>
What about this:
select event, sum(case when level=1 then 1 else 0 end) as ones, sum(case when level=2 then 1 else 0 end) as twos,
sum(casewhen level=3 then 1 else 0 end) as threes
 
from baz
group by event;

Regards,
Tomasz Myrta



Re: can i make this sql query more efficiant?

From
Josh Berkus
Date:
Tomasz,

> What about this:
> select
>   event,
>   sum(case when level=1 then 1 else 0 end) as ones,
>   sum(case when level=2 then 1 else 0 end) as twos,
>   sum(case when level=3 then 1 else 0 end) as threes
> from baz
> group by event;

That version is only more efficient for small data sets.    I've generally
found that case statements are slower than subselects for large data sets.
YMMV.

BTW, while it won't be faster, Joe Conway's crosstab function in /tablefunc
does this kind of transformation.

--
Josh Berkus
Aglio Database Solutions
San Francisco



Re: can i make this sql query more efficiant?

From
Manfred Koizar
Date:
On Fri, 4 Apr 2003 08:16:01 -0800, Josh Berkus <josh@agliodbs.com>
wrote:
>That version is only more efficient for small data sets.    I've generally
>found that case statements are slower than subselects for large data sets.

I'd be honestly interested in the circumstances where you made that
observation.

>YMMV.

Yes, it does :-)  Out of curiosity I did a few tests with PG 7.2 on my
old notebook:

CREATE TABLE baz (event int, level int);
INSERT INTO baz SELECT (100*random()+0.5), (3*random()+0.5);
INSERT INTO baz SELECT (100*random()+0.5), (3*random()+0.5) FROM baz;
...
INSERT INTO baz SELECT (100*random()+0.5), (3*random()+0.5) FROM baz;
CREATE INDEX baz_event ON baz(event);
ANALYSE baz;

SELECT event,
       SUM (CASE level WHEN 1 THEN 1 ELSE 0 END) AS ones,
       SUM (CASE level WHEN 2 THEN 1 ELSE 0 END) AS twos,
       SUM (CASE level WHEN 3 THEN 1 ELSE 0 END) AS threes
  FROM baz GROUP BY event;

SELECT event,
  (SELECT count(*) FROM baz a
    WHERE level = 1 AND a.event=baz.event) AS ones,
  (SELECT count(*) FROM baz a
    WHERE level = 2 and a.event=baz.event) AS twos,
  (SELECT count(*) FROM baz a
    WHERE level = 3 and a.event=baz.event) AS threes
  FROM baz GROUP BY event;

tuples   case         subselect
  8K    718.48 msec   16199.88 msec
 32K   6168.18 msec   74742.85 msec
128K  25072.34 msec  304585.61 msec

CLUSTER baz_event ON baz; ANALYSE baz;
This changes the subselect plan from seq scan to index scan.

128K  12116.07 msec   17530.85 msec

Add 128K more tuples, so that only the first half of the relation is
clustered.

256K  45663.35 msec  117748.23 msec

CLUSTER baz_event ON baz; ANALYSE baz;

256K  23691.81 msec   35138.26 msec

Maybe it is just the data distribution (100 events, 3 levels,
thousands of tuples) that makes CASE look faster than subselects ...

Servus
 Manfred


Re: [PERFORM] can i make this sql query more efficiant?

From
Josh Berkus
Date:
Manfred,

> I'd be honestly interested in the circumstances where you made that
> observation.

Hmmmm ... one of my database involves a "crosstab" converstion where there
were 13 possible values, and the converted table is heavily indexed.  For
that case, I found using CASE statements to be slower.

For your example, how do the statistics change if you increase the number of
levels to 15 and put an index on them?

--
-Josh Berkus
 Aglio Database Solutions
 San Francisco