Thread: Conditional return of aggregated data

Conditional return of aggregated data

From
Laura Smith
Date:
Hi,

I have some aggregated statistics which are currently being queried as follows:

create view stats_age as
SELECT a.category,
       a.statcount,
       b.a_desc
FROM reg_aggregated_stats a,regs_select_age b where a.category='age' and a.datapoint::smallint=b.a_val order by
a.datapointasc; 

However, as these relate to event registrations, a suggestion has been made that the database really should be
returningnothing until a certain number of registrations has been met (in order to avoid privacy infringing inferrence
fromwhat should be an otherwise fully irreversibly aggregated dataset). 

Now, the queries are wrapped in PL/PGSQL functions anyway, so I could make a second call to Postgres to find out
sum(statcount)and then conditionally return based on that. 

But is there a smarter way to do this out of a single SQL query ?

My initial idea was something along the lines of :
 select (select sum(statcount) from stats_residence) as aggstat,statcount,short_name_en from stats_residence where
aggstat>some_number;

But as I soon discovered that's not valid syntax! Hence ideas welcome from those smarter than me.

Thanks !

Laura



Re: Conditional return of aggregated data

From
Wim Bertels
Date:
Hallo Laura,

i don't know if i understand your question fully,
but this might be helpfull?: FILTER

SELECT
    count(*) AS unfiltered,
    count(*) FILTER (WHERE i < 5) AS filtered
FROM generate_series(1,10) AS s(i);
 unfiltered | filtered
------------+----------
         10 |        4
(1 row)

https://www.postgresql.org/docs/current/sql-expressions.html

hth,
Wim

Laura Smith schreef op ma 02-12-2019 om 11:10 [+0000]:
> Hi,
> 
> I have some aggregated statistics which are currently being queried
> as follows:
> 
> create view stats_age as
> SELECT a.category,
>        a.statcount,
>        b.a_desc
> FROM reg_aggregated_stats a,regs_select_age b where a.category='age'
> and a.datapoint::smallint=b.a_val order by a.datapoint asc;
> 
> However, as these relate to event registrations, a suggestion has
> been made that the database really should be returning nothing until
> a certain number of registrations has been met (in order to avoid
> privacy infringing inferrence from what should be an otherwise fully
> irreversibly aggregated dataset).
> 
> Now, the queries are wrapped in PL/PGSQL functions anyway, so I could
> make a second call to Postgres to find out sum(statcount) and then
> conditionally return based on that.
> 
> But is there a smarter way to do this out of a single SQL query ?
> 
> My initial idea was something along the lines of :
>  select (select sum(statcount) from stats_residence) as
> aggstat,statcount,short_name_en from stats_residence where
> aggstat>some_number;
> 
> But as I soon discovered that's not valid syntax! Hence ideas welcome
> from those smarter than me.
> 
> Thanks !
> 
> Laura
> 
> 
-- 
mvg,
Wim Bertels
--
Lector
UC Leuven-Limburg
--
My only love sprung from my only hate!
Too early seen unknown, and known too late!
        -- William Shakespeare, "Romeo and Juliet"


Re: Conditional return of aggregated data

From
"Ravi Krishna"
Date:
> My initial idea was something along the lines of :
>  select (select sum(statcount) from stats_residence) as aggstat,statcount,short_name_en from stats_residence where
aggstat>some_number;

Am I missing something basic.  The above can be done using
GROUP BY and HAVING clause.



Re: Conditional return of aggregated data

From
"Ravi Krishna"
Date:
>
> > My initial idea was something along the lines of :
> >  select (select sum(statcount) from stats_residence) as aggstat,statcount,short_name_en from stats_residence where
aggstat>some_number;
>
> Am I missing something basic.  The above can be done using
> GROUP BY and HAVING clause.
or this
with t as
(select (select sum(statcount) from stats_residence) as aggstat,
         statcount,short_name_en from stats_residence
)
select * from t where aggstat > some_number

Apology if I did not understand the question correctly.



Re: Conditional return of aggregated data

From
Laura Smith
Date:
>
> or this
> with t as
> (select (select sum(statcount) from stats_residence) as aggstat,
> statcount,short_name_en from stats_residence
> )
> select * from t where aggstat > some_number
>
> Apology if I did not understand the question correctly.


Hi Ravi,

Thanks for helping show me the way.  You're quite right, a CTE did the trick

Laura



Re: Conditional return of aggregated data

From
Alban Hertroys
Date:

On Mon, 2 Dec 2019 at 12:11, Laura Smith <n5d9xq3ti233xiyif2vp@protonmail.ch> wrote:

My initial idea was something along the lines of :
 select (select sum(statcount) from stats_residence) as aggstat,statcount,short_name_en from stats_residence where aggstat>some_number;
 
One option is to move the aggregate to the where-clause. If you also need the value in your select-list, you can just repeat the subselect there, usually the planner is smart enough to figure out that it can just re-use the result.

select short_name_en from stats_residence where (select sum(statcount) from stats_residence) >some_number;

--
If you can't see the forest for the trees,
Cut the trees and you'll see there is no forest.