Thread: [SQL] Problems with PARTITION BY with count() in window-func
Hi all.
I'm trying to count() all log-entries per activity per month in a separate column using count() over(partition by ...) but get an error I don't understand.
Here's the schema:
CREATE TABLE log_entry( entity_id SERIAL PRIMARY KEY, start_date DATE NOT NULL, activity_id BIGINT NOT NULL ); INSERT INTO log_entry(start_date, activity_id) VALUES ('2017-01-01', 1) , ('2017-01-02', 1) , ('2017-01-03', 2) , ('2017-01-04', 2) , ('2017-02-01', 1) , ('2017-02-01', 2) , ('2017-02-01', 1) , ('2017-02-02', 1) , ('2017-02-02', 1) , ('2017-02-03', 1); When issuing this query: select date_trunc('month', log.start_date::TIMESTAMP WITHOUT TIME ZONE) AS month , log.activity_id , count(log.entity_id) AS num_logs , count(log.entity_id) OVER total_for_month_window AS total_for_month FROM log_entry log WHERE 1 = 1 AND date_trunc('month', log.start_date::TIMESTAMP WITHOUT TIME ZONE) BETWEEN '2017-01-01'::DATE AND '2017-05-01'::DATE GROUP BY month, log.activity_id, log.entity_id WINDOW total_for_month_window AS (PARTITION BY date_trunc('month', log.start_date::TIMESTAMP WITHOUT TIME ZONE)) ORDER BY date_trunc('month', log.start_date::TIMESTAMP WITHOUT TIME ZONE) ASC ;
I get this:
month | activity_id | num_logs | total_for_month |
---|---|---|---|
2017-01-01 00:00:00.000000 | 1 | 1 | 4 |
2017-01-01 00:00:00.000000 | 1 | 1 | 4 |
2017-01-01 00:00:00.000000 | 2 | 1 | 4 |
2017-01-01 00:00:00.000000 | 2 | 1 | 4 |
2017-02-01 00:00:00.000000 | 1 | 1 | 6 |
2017-02-01 00:00:00.000000 | 2 | 1 | 6 |
2017-02-01 00:00:00.000000 | 1 | 1 | 6 |
2017-02-01 00:00:00.000000 | 1 | 1 | 6 |
2017-02-01 00:00:00.000000 | 1 | 1 | 6 |
2017-02-01 00:00:00.000000 | 1 | 1 | 6 |
But what I really want is this result:
month | activity_id | num_logs | total_for_month |
2017-01-01 00:00:00.000000 | 1 | 2 | 4 |
2017-01-01 00:00:00.000000 | 2 | 2 | 4 |
2017-02-01 00:00:00.000000 | 1 | 5 | 6 |
2017-02-01 00:00:00.000000 | 2 | 1 | 6 |
that is, not having to group by log.entity_id, trying like this:
select date_trunc('month', log.start_date::TIMESTAMP WITHOUT TIME ZONE) AS month , log.activity_id , count(log.entity_id) AS num_logs , count(log.entity_id) OVER total_for_month_window AS total_for_month FROM log_entry log WHERE 1 = 1 AND date_trunc('month', log.start_date::TIMESTAMP WITHOUT TIME ZONE) BETWEEN '2017-01-01'::DATE AND '2017-05-01'::DATE GROUP BY month, log.activity_id WINDOW total_for_month_window AS (PARTITION BY date_trunc('month', log.start_date::TIMESTAMP WITHOUT TIME ZONE)) ORDER BY date_trunc('month', log.start_date::TIMESTAMP WITHOUT TIME ZONE) ASC ;
Note the missing log.entity_id in GROUP BY.
but this gives the error:
[42803] column "log.entity_id" must appear in the GROUP BY clause or be used in an aggregate function
Details
Details
Any idea how to solve this?
Thanks.
--
Andreas Joseph Krogh
CTO / Partner - Visena AS
Mobile: +47 909 56 963
På onsdag 08. november 2017 kl. 11:17:39, skrev Andreas Joseph Krogh <andreas@visena.com>:
Hi all.I'm trying to count() all log-entries per activity per month in a separate column using count() over(partition by ...) but get an error I don't understand.[snip]select date_trunc('month', log.start_date::TIMESTAMP WITHOUT TIME ZONE) AS month , log.activity_id , count(log.entity_id) AS num_logs , count(log.entity_id) OVER total_for_month_window AS total_for_month FROM log_entry log WHERE 1 = 1 AND date_trunc('month', log.start_date::TIMESTAMP WITHOUT TIME ZONE) BETWEEN '2017-01-01'::DATE AND '2017-05-01'::DATE GROUP BY month, log.activity_id WINDOW total_for_month_window AS (PARTITION BY date_trunc('month', log.start_date::TIMESTAMP WITHOUT TIME ZONE)) ORDER BY date_trunc('month', log.start_date::TIMESTAMP WITHOUT TIME ZONE) ASC ;Note the missing log.entity_id in GROUP BY.but this gives the error:[42803] column "log.entity_id" must appear in the GROUP BY clause or be used in an aggregate function
Details
This query gives the desired results:
SELECT DISTINCT date_trunc('month', log.start_date::TIMESTAMP WITHOUT TIME ZONE) AS month , log.activity_id , count(log.entity_id) OVER(partition by date_trunc('month', log.start_date::TIMESTAMP WITHOUT TIME ZONE), log.activity_id) AS num_logs_per_activity , count(log.entity_id) OVER (partition by date_trunc('month', log.start_date::TIMESTAMP WITHOUT TIME ZONE)) AS total_for_month FROM log_entry log ORDER BY date_trunc('month', log.start_date::TIMESTAMP WITHOUT TIME ZONE) ASC, log.activity_id ASC ;
month | activity_id | num_logs_per_activity | total_for_month |
---|---|---|---|
2017-01-01 00:00:00.000000 | 1 | 2 | 4 |
2017-01-01 00:00:00.000000 | 2 | 2 | 4 |
2017-02-01 00:00:00.000000 | 1 | 5 | 6 |
2017-02-01 00:00:00.000000 | 2 | 1 | 6 |
But I'd like a solution without the DISTINCT, if one exists?
If I introduce a new column, logged_for, and want to list number of logged entries per person per activity I can use this:
SELECT DISTINCT date_trunc('month', log.start_date::TIMESTAMP WITHOUT TIME ZONE) AS month , log.logged_for , log.activity_id , count(log.entity_id) OVER(partition by date_trunc('month', log.start_date::TIMESTAMP WITHOUT TIME ZONE), log.logged_for, log.activity_id) AS num_logs_per_person_for_activity , count(log.entity_id) OVER (partition by date_trunc('month', log.start_date::TIMESTAMP WITHOUT TIME ZONE), log.logged_for) AS total_for_person_for_month FROM log_entry log ORDER BY date_trunc('month', log.start_date::TIMESTAMP WITHOUT TIME ZONE) ASC ;
Which gives:
month | logged_for | activity_id | num_logs_per_person_for_activity | total_for_person_for_month |
---|---|---|---|---|
2017-01-01 00:00:00.000000 | 5 | 1 | 2 | 4 |
2017-01-01 00:00:00.000000 | 5 | 2 | 2 | 4 |
2017-01-01 00:00:00.000000 | 6 | 1 | 2 | 4 |
2017-01-01 00:00:00.000000 | 6 | 2 | 2 | 4 |
2017-02-01 00:00:00.000000 | 5 | 1 | 5 | 6 |
2017-02-01 00:00:00.000000 | 5 | 2 | 1 | 6 |
2017-02-01 00:00:00.000000 | 6 | 1 | 5 | 6 |
2017-02-01 00:00:00.000000 | 6 | 2 | 1 | 6 |
Is this the recommended way to do this, I mean - having "group by" in the "partition by" belonging to the OVER()-clause of the count-aggregates?
Thanks.
--
Andreas Joseph Krogh
CTO / Partner - Visena AS
Mobile: +47 909 56 963
a solution without distinct:
select * from (
select
date_trunc('month', log.start_date::TIMESTAMP WITHOUT TIME ZONE) AS month,
log.activity_id,
count(log.entity_id) over (partition by date_trunc('month', log.start_date::TIMESTAMP WITHOUT TIME ZONE), activity_id),
count(log.entity_id) over (partition by date_trunc('month', log.start_date::TIMESTAMP WITHOUT TIME ZONE))
FROM
log_entry log
order by 1,2) as x
group by 1,2,3,4
order by 1,2;
If one think of introducing new column i prefer using grouping sets and rollup.
If you do so, an enclosing query will help to filter just the subtotals you need.
Regards, Martin
select * from (
select
date_trunc('month', log.start_date::TIMESTAMP WITHOUT TIME ZONE) AS month,
log.activity_id,
count(log.entity_id) over (partition by date_trunc('month', log.start_date::TIMESTAMP WITHOUT TIME ZONE), activity_id),
count(log.entity_id) over (partition by date_trunc('month', log.start_date::TIMESTAMP WITHOUT TIME ZONE))
FROM
log_entry log
order by 1,2) as x
group by 1,2,3,4
order by 1,2;
If one think of introducing new column i prefer using grouping sets and rollup.
If you do so, an enclosing query will help to filter just the subtotals you need.
Regards, Martin
Am 08.11.2017 um 14:50 schrieb Andreas Joseph Krogh:
På onsdag 08. november 2017 kl. 11:17:39, skrev Andreas Joseph Krogh <andreas@visena.com>:Hi all.I'm trying to count() all log-entries per activity per month in a separate column using count() over(partition by ...) but get an error I don't understand.[snip]select date_trunc('month', log.start_date::TIMESTAMP WITHOUT TIME ZONE) AS month , log.activity_id , count(log.entity_id) AS num_logs , count(log.entity_id) OVER total_for_month_window AS total_for_month FROM log_entry log WHERE 1 = 1 AND date_trunc('month', log.start_date::TIMESTAMP WITHOUT TIME ZONE) BETWEEN '2017-01-01'::DATE AND '2017-05-01'::DATE GROUP BY month, log.activity_id WINDOW total_for_month_window AS (PARTITION BY date_trunc('month', log.start_date::TIMESTAMP WITHOUT TIME ZONE)) ORDER BY date_trunc('month', log.start_date::TIMESTAMP WITHOUT TIME ZONE) ASC ;Note the missing log.entity_id in GROUP BY.but this gives the error:[42803] column "log.entity_id" must appear in the GROUP BY clause or be used in an aggregate function
DetailsThis query gives the desired results:SELECT DISTINCT date_trunc('month', log.start_date::TIMESTAMP WITHOUT TIME ZONE) AS month , log.activity_id , count(log.entity_id) OVER(partition by date_trunc('month', log.start_date::TIMESTAMP WITHOUT TIME ZONE), log.activity_id) AS num_logs_per_activity , count(log.entity_id) OVER (partition by date_trunc('month', log.start_date::TIMESTAMP WITHOUT TIME ZONE)) AS total_for_month FROM log_entry log ORDER BY date_trunc('month', log.start_date::TIMESTAMP WITHOUT TIME ZONE) ASC, log.activity_id ASC ;
month activity_id num_logs_per_activity total_for_month 2017-01-01 00:00:00.000000 1 2 4 2017-01-01 00:00:00.000000 2 2 4 2017-02-01 00:00:00.000000 1 5 6 2017-02-01 00:00:00.000000 2 1 6 But I'd like a solution without the DISTINCT, if one exists?If I introduce a new column, logged_for, and want to list number of logged entries per person per activity I can use this:SELECT DISTINCT date_trunc('month', log.start_date::TIMESTAMP WITHOUT TIME ZONE) AS month , log.logged_for , log.activity_id , count(log.entity_id) OVER(partition by date_trunc('month', log.start_date::TIMESTAMP WITHOUT TIME ZONE), log.logged_for, log.activity_id) AS num_logs_per_person_for_activity , count(log.entity_id) OVER (partition by date_trunc('month', log.start_date::TIMESTAMP WITHOUT TIME ZONE), log.logged_for) AS total_for_person_for_month FROM log_entry log ORDER BY date_trunc('month', log.start_date::TIMESTAMP WITHOUT TIME ZONE) ASC ;Which gives:
month logged_for activity_id num_logs_per_person_for_activity total_for_person_for_month 2017-01-01 00:00:00.000000 5 1 2 4 2017-01-01 00:00:00.000000 5 2 2 4 2017-01-01 00:00:00.000000 6 1 2 4 2017-01-01 00:00:00.000000 6 2 2 4 2017-02-01 00:00:00.000000 5 1 5 6 2017-02-01 00:00:00.000000 5 2 1 6 2017-02-01 00:00:00.000000 6 1 5 6 2017-02-01 00:00:00.000000 6 2 1 6 Is this the recommended way to do this, I mean - having "group by" in the "partition by" belonging to the OVER()-clause of the count-aggregates?Thanks.--Andreas Joseph KroghCTO / Partner - Visena ASMobile: +47 909 56 963
-- Widdersdorfer Str. 415, 50933 Köln; Tel. +49 / 221 / 9544 010 HRB Köln HRB 75439, Geschäftsführer: S. Böhland, S. Rosenbauer