Thread: Better way to write aggregates?

Better way to write aggregates?

From

Jan Dittmer

Date:

21 April 2006, 08:54:42

Hi,

I more or less often come about the problem of aggregating a
child table counting it's different states. The cleanest solution
I've come up with so far is:

BEGIN;
CREATE TABLE parent (
    id int not null,
      name text not null,
    UNIQUE(id)
);

CREATE TABLE child (
    name text not null,
    state int not null,
    parent int not null references parent(id)
);

CREATE VIEW parent_childs AS
SELECT
    c.parent,
    count(c.state) as childtotal,
    count(c.state) - count(nullif(c.state,1)) as childstate1,
    count(c.state) - count(nullif(c.state,2)) as childstate2,
    count(c.state) - count(nullif(c.state,3)) as childstate3
FROM child c
GROUP BY parent;

CREATE VIEW parent_view AS
SELECT p.*,
pc.*
FROM parent p
LEFT JOIN parent_childs pc ON (p.id = pc.parent);
COMMIT;

Is this the fastest way to build these aggregates (not considering
tricks with triggers, etc)? The count(state) - count(nullif(...)) looks
a bit clumsy.
I also experimented with a pgsql function to sum these up, but considered
it as not-so-nice and it also always forces a sequential scan on the
data.

Thanks for any advice,

Jan

Re: Better way to write aggregates?

From

"Jim Buttafuoco"

Date:

21 April 2006, 12:27:40

Jan,

I write queries like this

CREATE VIEW parent_childs AS
SELECT
     c.parent,
     count(c.state) as childtotal,
     sum(case when c.state = 1 then 1 else 0 end) as childstate1,
     sum(case when c.state = 2 then 1 else 0 end) as childstate2,
     sum(case when c.state = 3 then 1 else 0 end) as childstate3
 FROM child c
 GROUP BY parent;

---------- Original Message -----------
From: Jan Dittmer <jdi@l4x.org>
To: pgsql-performance@postgresql.org
Sent: Fri, 21 Apr 2006 10:37:10 +0200
Subject: [PERFORM] Better way to write aggregates?

> Hi,
>
> I more or less often come about the problem of aggregating a
> child table counting it's different states. The cleanest solution
> I've come up with so far is:
>
> BEGIN;
> CREATE TABLE parent (
>     id int not null,
>       name text not null,
>     UNIQUE(id)
> );
>
> CREATE TABLE child (
>     name text not null,
>     state int not null,
>     parent int not null references parent(id)
> );
>
> CREATE VIEW parent_childs AS
> SELECT
>     c.parent,
>     count(c.state) as childtotal,
>     count(c.state) - count(nullif(c.state,1)) as childstate1,
>     count(c.state) - count(nullif(c.state,2)) as childstate2,
>     count(c.state) - count(nullif(c.state,3)) as childstate3
> FROM child c
> GROUP BY parent;
>
> CREATE VIEW parent_view AS
> SELECT p.*,
> pc.*
> FROM parent p
> LEFT JOIN parent_childs pc ON (p.id = pc.parent);
> COMMIT;
>
> Is this the fastest way to build these aggregates (not considering
> tricks with triggers, etc)? The count(state) - count(nullif(...)) looks
> a bit clumsy.
> I also experimented with a pgsql function to sum these up, but considered
> it as not-so-nice and it also always forces a sequential scan on the
> data.
>
> Thanks for any advice,
>
> Jan
>
> ---------------------------(end of broadcast)---------------------------
> TIP 6: explain analyze is your friend
------- End of Original Message -------

Re: Better way to write aggregates?

From

Jan Dittmer

Date:

21 April 2006, 12:35:47

Jim Buttafuoco wrote:
> Jan,
>
> I write queries like this
>
> CREATE VIEW parent_childs AS
> SELECT
>      c.parent,
>      count(c.state) as childtotal,
>      sum(case when c.state = 1 then 1 else 0 end) as childstate1,
>      sum(case when c.state = 2 then 1 else 0 end) as childstate2,
>      sum(case when c.state = 3 then 1 else 0 end) as childstate3
>  FROM child c
>  GROUP BY parent;

It would help if booleans could be casted to integer 1/0 :-) But
performance wise it should be about the same? I think I'll
run some tests later today with real data.
Would an index on NULLIF(state,1) help count(NULLIF(state,1)) ?
Can one build an index on (case when c.state = 3 then 1 else 0 end)?

Thanks,

Jan

Re: Better way to write aggregates?

From

"Jim Buttafuoco"

Date:

21 April 2006, 12:39:06

I don't think an index will help you with this query.

---------- Original Message -----------
From: Jan Dittmer <jdi@l4x.org>
To: jim@contactbda.com
Cc: pgsql-performance@postgresql.org
Sent: Fri, 21 Apr 2006 14:35:33 +0200
Subject: Re: [PERFORM] Better way to write aggregates?

> Jim Buttafuoco wrote:
> > Jan,
> >
> > I write queries like this
> >
> > CREATE VIEW parent_childs AS
> > SELECT
> >      c.parent,
> >      count(c.state) as childtotal,
> >      sum(case when c.state = 1 then 1 else 0 end) as childstate1,
> >      sum(case when c.state = 2 then 1 else 0 end) as childstate2,
> >      sum(case when c.state = 3 then 1 else 0 end) as childstate3
> >  FROM child c
> >  GROUP BY parent;
>
> It would help if booleans could be casted to integer 1/0 :-) But
> performance wise it should be about the same? I think I'll
> run some tests later today with real data.
> Would an index on NULLIF(state,1) help count(NULLIF(state,1)) ?
> Can one build an index on (case when c.state = 3 then 1 else 0 end)?
>
> Thanks,
>
> Jan
------- End of Original Message -------

Re: Better way to write aggregates?

From

Tom Lane

Date:

21 April 2006, 14:26:27

Jan Dittmer <jdi@l4x.org> writes:
> It would help if booleans could be casted to integer 1/0 :-)

As of 8.1 there is such a cast in the system:

regression=# select 't'::bool::int;
 int4
------
    1
(1 row)

In earlier releases you can make your own.

As for the original question, though: have you looked at the crosstab
functions in contrib/tablefunc?

            regards, tom lane