Thread: Question About Aggregate Functions

Question About Aggregate Functions

From
"Don Parris"
Date:
Greetings,

I'm a DB novice as well as a pgsql novice.  I can manage to run a few basic queries, but that's all I've really done so far.  How do I create a query that (1) evaluates each boolean field for TRUE/FALSE and (2) counts the number of rows where each field is TRUE?  Also, one field is an integer, so I want the average from that field, rather than the count.  Something along the lines of:

SELECT COUNT (t1.fielda, t2.fielda, t2.fieldb) AVG(t2.fieldc) FROM t1, t2
WHERE t1.item_id=t2.item_id AND t1.fielda IS TRUE  AND t2.fielda IS TRUE AND t2.fieldb IS TRUE AND t2.fieldc IS NOT NULL

The result is intended to be something of a summary report.  t1 contains basic info about each item, with one field for whether the item is active or not.  The other table contains additional info about whether each item has particular attributes.  I want the query to tell me the average number of years active items have been active, and the number of items where each attribute is true.  I then want to turn the raw attribute counts into percentages, so I can say "for n% of the items these attributes are true".

Pointers to good examples/tutorials are welcome.  Most I have seen are rather simplistic, and what I am reading in the manual isn't coming together very well.

Thanks,
Don

Re: Question About Aggregate Functions

From
"Don Parris"
Date:
On 9/12/06, Brandon Aiken <BAiken@winemantech.com> wrote:

First, aggregate functions always have to have a GROUP BY clause.  If you want everything in a table or join, you use GROUP BY NULL. 


Thanks.  I did not realize that.

Next, IS TRUE statements will select anything that is not NULL, 0, or FALSE, so I'm not sure what you're trying to get because you're getting nearly everything, and count() already ignores NULL values.


I didn't see that in the manual's coverage, but could have overlooked it.  But count() will include the FALSE values along with the TRUE values - ignoring only those that are NULL.  At least, I think that's the case.  So, for each column I select, I need to be sure I am counting only the TRUE values.  I do have NULL, FALSE and TRUE values in each column, since I do not always know for sure whether an attribute is TRUE or FALSE when I record the item.  That may be determined later, but not in all cases.

Next, count(x, y, z) isn't a valid function.  Count() only has one parameter, so you'll have to call it several times.


I knew my syntax was wrong - but wasn't sure about calling multiple functions since I hadn't seen any examples of that in my hunting for info.  I was trying to make a little clearer what I wanted to do.
 

Depending on what you were hoping count(x, y, z) was returning, you do this:

 

SELECT count(t1.fielda), count(t2.fielda), count(t2.fieldb), AVG(t2.fieldc)

FROM t1 JOIN t2 ON ON t1.item_id = t2.item_id

GROUP BY NULL;


This one looks more like what I am attempting to do.  However, I do need to be sure my  count() functions are counting the values that are TRUE.  Is this a case where I should run a query to select the records where the values for the desired columns are true, insert that result into a temp table, and then perform the count() function as above on just those records?  Sure seems like that would be the simple route, now that I think about it.


 
<SNIP>

Greetings,

I'm a DB novice as well as a pgsql novice.  I can manage to run a few basic queries, but that's all I've really done so far.  How do I create a query that (1) evaluates each boolean field for TRUE/FALSE and (2) counts the number of rows where each field is TRUE?  Also, one field is an integer, so I want the average from that field, rather than the count.  Something along the lines of:

SELECT COUNT (t1.fielda, t2.fielda, t2.fieldb) AVG(t2.fieldc) FROM t1, t2
WHERE t1.item_id=t2.item_id AND t1.fielda IS TRUE  AND t2.fielda IS TRUE AND t2.fieldb IS TRUE AND t2.fieldc IS NOT NULL

The result is intended to be something of a summary report.  t1 contains basic info about each item, with one field for whether the item is active or not.  The other table contains additional info about whether each item has particular attributes.  I want the query to tell me the average number of years active items have been active, and the number of items where each attribute is true.  I then want to turn the raw attribute counts into percentages, so I can say "for n% of the items these attributes are true".

Pointers to good examples/tutorials are welcome.  Most I have seen are rather simplistic, and what I am reading in the manual isn't coming together very well.

Thanks,
Don


Re: Question About Aggregate Functions

From
Oscar Rodriguez Fonseca
Date:
El mié, 13-09-2006 a las 13:16 +0200, Oscar Rodriguez Fonseca escribió:

> A query (UNTESTED) that should solve your problem may be this one:
>
> SELECT     a.count AS "t1fielda",
>     b.count AS "t2fielda",
>     c.count AS "t2fieldb",
>     d.avg AS "t2avgc"
> FROM    (SELECT count(*) FROM t1 WHERE fielda = TRUE) AS a
> JOIN    (SELECT count(*) FROM t2 WHERE fielda = TRUE) AS b ON TRUE
> JOIN    (SELECT count(*) FROM t2 WHERE fieldb = TRUE) AS c ON TRUE
> JOIN     (SELECT avg(fieldc) FROM t2) AS d ON TRUE;
>
> I am also a DB novice so this is correct.

Sorry, I was trying to write:

"I am also a DB novice so I HOPE this is correct".

Best regards.



Re: Question About Aggregate Functions

From
Oscar Rodriguez Fonseca
Date:
El mar, 12-09-2006 a las 21:15 -0400, Don Parris escribió:

>         I'm a DB novice as well as a pgsql novice.  I can manage to
>         run a few basic queries, but that's all I've really done so
>         far.  How do I create a query that (1) evaluates each boolean
>         field for TRUE/FALSE and (2) counts the number of rows where
>         each field is TRUE?  Also, one field is an integer, so I want
>         the average from that field, rather than the count.  Something
>         along the lines of:
>
>         SELECT COUNT (t1.fielda, t2.fielda, t2.fieldb) AVG(t2.fieldc)
>         FROM t1, t2
>         WHERE t1.item_id=t2.item_id AND t1.fielda IS TRUE  AND
>         t2.fielda IS TRUE AND t2.fieldb IS TRUE AND t2.fieldc IS NOT
>         NULL
>
>         The result is intended to be something of a summary report.
>         t1
>
>         SELECT count(t1.fielda), count(t2.fielda), count(t2.fieldb),
>         AVG(t2.fieldc)
>         FROM t1 JOIN t2 ON t1.item_id = t2.item_id
>         GROUP BY NULL;

>
> This one looks more like what I am attempting to do.  However, I do
> need to be sure my  count() functions are counting the values that are
> TRUE.  Is this a case where I should run a query to select the records
> where the values for the desired columns are true, insert that result
> into a temp table, and then perform the count() function as above on
> just those records?  Sure seems like that would be the simple route,
> now that I think about it.

You should then use WHERE clause:

    SELECT count(t1.fielda) FROM t1 WHERE t1.fielda = TRUE;

this should report how many rows of t1 are TRUE.

BTW, I cannot understand your query's logic because you want to
aggregate both tables.

Joinning both tables you get a table like:

    item_id, t1.fielda, t2.fielda, t2.fieldb, t2.fieldc

Then you want to aggregate fields _independently_ so you may need to do
four subqueries.

At last, if you use aggregate functions without group by clause,
postgres use data from the whole table and returns only one row with
that result.

A query (UNTESTED) that should solve your problem may be this one:

SELECT     a.count AS "t1fielda",
    b.count AS "t2fielda",
    c.count AS "t2fieldb",
    d.avg AS "t2avgc"
FROM    (SELECT count(*) FROM t1 WHERE fielda = TRUE) AS a
JOIN    (SELECT count(*) FROM t2 WHERE fielda = TRUE) AS b ON TRUE
JOIN    (SELECT count(*) FROM t2 WHERE fieldb = TRUE) AS c ON TRUE
JOIN     (SELECT avg(fieldc) FROM t2) AS d ON TRUE;

I am also a DB novice so this is correct.

Regards

>
>


Re: Question About Aggregate Functions

From
"Brandon Aiken"
Date:

Ah, I did not know what was in your fields, so I did not assume they were Boolean values.  It looked to me like you were trying to use IS TRUE to substitute for the lack of a GROUP BY, so I didn’t know what to do.

 

Yes, count() will include all non-NULL values.  Sorry if I sounded unclear there. 

 

If you do typecasting the value zero is false (and non-zero is true).  NULL in an expression always returns NULL, and many programs will interpret that result as false.  So I’m not sure of what results you might get with a Boolean test against a non-Boolean field, especially if it’s an integer field.

 

postgres=# select 0::boolean = FALSE;

 ?column?

----------

 t

(1 row)

 

You should just be able to take the previous query and add in your WHERE clauses:

 

SELECT count(t1.fielda), count(t2.fielda), count(t2.fieldb), AVG(t2.fieldc)

FROM t1 JOIN t2 ON t1.item_id = t2.item_id

WHERE t1.fielda = TRUE AND t2.fielda = TRUE AND t2.fieldb = TRUE

GROUP BY NULL;

 

Now, the INNER JOIN you’re using is only selecting fields where both t1.item_id and t2.item_id exist and the respective fields are TRUE.  That is, it’s only going to run the count and average functions against the results of this query:

SELECT *

FROM t1 JOIN t2 ON ON t1.item_id = t2.item_id

WHERE t1.fielda = TRUE AND t2.fielda = TRUE AND t2.fieldb = TRUE;

 

If that’s what you want, that’s great.  

 

However, you might want a count of each field where that field is TRUE.  In that case, I would use either temporary tables, compound queries and derived tables, or multiple simple queries.

 

It’s also possible that you might want a count of fields where t1.item_id and t2.item_id exist, but where only each respective field is TRUE.  That is, you want a count of t1.fielda where it is TRUE no matter what t2.fielda and t2.fieldb are as long as t1.item_id matches t2.item_id.  In that case you have to do even more joins, and that could take a fair bit of time especially if you haven’t indexed your item_id fields.

 

You really have to look at your result sets.  Sometimes it is better to run multiple simple queries instead of one big complex query to be sure you’re getting the data you want and the query executes in a reasonable amount of time.

 

Also, consider that NULL values are generally considered bad to purposefully enter.  Logically, It would be better to create one table for each field and then create a record for each item_id as you need it so you never have NULLs.  The problem with that is one of performance if you end up doing large number of JOINs.  In that case, it might be better to use integers instead of Boolean fields, since you have three explicit states of TRUE, FALSE, and NOT YET DETERMINED.

 

--

Brandon Aiken

CS/IT Systems Engineer


From: pgsql-novice-owner@postgresql.org [mailto:pgsql-novice-owner@postgresql.org] On Behalf Of Don Parris
Sent: Tuesday, September 12, 2006 9:16 PM
To: pgsql-novice@postgresql.org
Subject: Re: [NOVICE] Question About Aggregate Functions

 

On 9/12/06, Brandon Aiken <BAiken@winemantech.com> wrote:

First, aggregate functions always have to have a GROUP BY clause.  If you want everything in a table or join, you use GROUP BY NULL. 


Thanks.  I did not realize that.

 

Next, IS TRUE statements will select anything that is not NULL, 0, or FALSE, so I'm not sure what you're trying to get because you're getting nearly everything, and count() already ignores NULL values.


I didn't see that in the manual's coverage, but could have overlooked it.  But count() will include the FALSE values along with the TRUE values - ignoring only those that are NULL.  At least, I think that's the case.  So, for each column I select, I need to be sure I am counting only the TRUE values.  I do have NULL, FALSE and TRUE values in each column, since I do not always know for sure whether an attribute is TRUE or FALSE when I record the item.  That may be determined later, but not in all cases.

 

Next, count(x, y, z) isn't a valid function.  Count() only has one parameter, so you'll have to call it several times.


I knew my syntax was wrong - but wasn't sure about calling multiple functions since I hadn't seen any examples of that in my hunting for info.  I was trying to make a little clearer what I wanted to do.
 

 

Depending on what you were hoping count(x, y, z) was returning, you do this:

 

SELECT count(t1.fielda), count(t2.fielda), count(t2.fieldb), AVG(t2.fieldc)

FROM t1 JOIN t2 ON ON t1.item_id = t2.item_id

GROUP BY NULL;


This one looks more like what I am attempting to do.  However, I do need to be sure my  count() functions are counting the values that are TRUE.  Is this a case where I should run a query to select the records where the values for the desired columns are true, insert that result into a temp table, and then perform the count() function as above on just those records?  Sure seems like that would be the simple route, now that I think about it.

 


 

<SNIP>

Greetings,

I'm a DB novice as well as a pgsql novice.  I can manage to run a few basic queries, but that's all I've really done so far.  How do I create a query that (1) evaluates each boolean field for TRUE/FALSE and (2) counts the number of rows where each field is TRUE?  Also, one field is an integer, so I want the average from that field, rather than the count.  Something along the lines of:

SELECT COUNT (t1.fielda, t2.fielda, t2.fieldb) AVG(t2.fieldc) FROM t1, t2
WHERE t1.item_id=t2.item_id AND t1.fielda IS TRUE  AND t2.fielda IS TRUE AND t2.fieldb IS TRUE AND t2.fieldc IS NOT NULL

The result is intended to be something of a summary report.  t1 contains basic info about each item, with one field for whether the item is active or not.  The other table contains additional info about whether each item has particular attributes.  I want the query to tell me the average number of years active items have been active, and the number of items where each attribute is true.  I then want to turn the raw attribute counts into percentages, so I can say "for n% of the items these attributes are true".

Pointers to good examples/tutorials are welcome.  Most I have seen are rather simplistic, and what I am reading in the manual isn't coming together very well.

Thanks,
Don