Thread: [SQL] Problems with PARTITION BY with count() in window-func

[SQL] Problems with PARTITION BY with count() in window-func

From

Andreas Joseph Krogh

Date:

08 November 2017, 13:17:39

Hi all.

I'm trying to count() all log-entries per activity per month in a separate column using count() over(partition by ...) but get an error I don't understand.

Here's the schema:

CREATE TABLE log_entry(   entity_id SERIAL PRIMARY KEY,   start_date DATE NOT NULL,   activity_id BIGINT NOT NULL
);

INSERT INTO log_entry(start_date, activity_id)
VALUES ('2017-01-01', 1)
, ('2017-01-02', 1)
, ('2017-01-03', 2)
, ('2017-01-04', 2)
, ('2017-02-01', 1)
, ('2017-02-01', 2)
, ('2017-02-01', 1)
, ('2017-02-02', 1)
, ('2017-02-02', 1)
, ('2017-02-03', 1);



When issuing this query:


select
      date_trunc('month', log.start_date::TIMESTAMP WITHOUT TIME ZONE) AS month
    , log.activity_id   , count(log.entity_id) AS num_logs   , count(log.entity_id) OVER total_for_month_window AS total_for_month
FROM
    log_entry log
WHERE 1 = 1
      AND date_trunc('month', log.start_date::TIMESTAMP WITHOUT TIME ZONE) BETWEEN '2017-01-01'::DATE AND '2017-05-01'::DATE
GROUP BY month, log.activity_id, log.entity_id
WINDOW total_for_month_window AS (PARTITION BY date_trunc('month', log.start_date::TIMESTAMP WITHOUT TIME ZONE))

ORDER BY date_trunc('month', log.start_date::TIMESTAMP WITHOUT TIME ZONE) ASC
;

I get this:

month	activity_id	num_logs	total_for_month
2017-01-01 00:00:00.000000	1	1	4
2017-01-01 00:00:00.000000	1	1	4
2017-01-01 00:00:00.000000	2	1	4
2017-01-01 00:00:00.000000	2	1	4
2017-02-01 00:00:00.000000	1	1	6
2017-02-01 00:00:00.000000	2	1	6
2017-02-01 00:00:00.000000	1	1	6
2017-02-01 00:00:00.000000	1	1	6
2017-02-01 00:00:00.000000	1	1	6
2017-02-01 00:00:00.000000	1	1	6

But what I really want is this result:

month	activity_id	num_logs	total_for_month
2017-01-01 00:00:00.000000	1	2	4
2017-01-01 00:00:00.000000	2	2	4
2017-02-01 00:00:00.000000	1	5	6
2017-02-01 00:00:00.000000	2	1	6

that is, not having to group by log.entity_id, trying like this:


select
      date_trunc('month', log.start_date::TIMESTAMP WITHOUT TIME ZONE) AS month
    , log.activity_id   , count(log.entity_id) AS num_logs   , count(log.entity_id) OVER total_for_month_window AS total_for_month
FROM
    log_entry log
WHERE 1 = 1
      AND date_trunc('month', log.start_date::TIMESTAMP WITHOUT TIME ZONE) BETWEEN '2017-01-01'::DATE AND '2017-05-01'::DATE
GROUP BY month, log.activity_id
WINDOW total_for_month_window AS (PARTITION BY date_trunc('month', log.start_date::TIMESTAMP WITHOUT TIME ZONE))

ORDER BY date_trunc('month', log.start_date::TIMESTAMP WITHOUT TIME ZONE) ASC
;

Note the missing log.entity_id in GROUP BY.

but this gives the error:

[42803] column "log.entity_id" must appear in the GROUP BY clause or be used in an aggregate function
Details

Any idea how to solve this?

Thanks.

Andreas Joseph Krogh

CTO / Partner - Visena AS

Mobile: +47 909 56 963

andreas@visena.com

www.visena.com

Re: [SQL] Problems with PARTITION BY with count() in window-func

From

Andreas Joseph Krogh

Date:

08 November 2017, 16:50:20

På onsdag 08. november 2017 kl. 11:17:39, skrev Andreas Joseph Krogh <andreas@visena.com>:

Hi all.

I'm trying to count() all log-entries per activity per month in a separate column using count() over(partition by ...) but get an error I don't understand.
[snip]
select
      date_trunc('month', log.start_date::TIMESTAMP WITHOUT TIME ZONE) AS month
    , log.activity_id   , count(log.entity_id) AS num_logs   , count(log.entity_id) OVER total_for_month_window AS total_for_month
FROM
    log_entry log
WHERE 1 = 1
      AND date_trunc('month', log.start_date::TIMESTAMP WITHOUT TIME ZONE) BETWEEN '2017-01-01'::DATE AND '2017-05-01'::DATE
GROUP BY month, log.activity_id
WINDOW total_for_month_window AS (PARTITION BY date_trunc('month', log.start_date::TIMESTAMP WITHOUT TIME ZONE))

ORDER BY date_trunc('month', log.start_date::TIMESTAMP WITHOUT TIME ZONE) ASC
;
Note the missing log.entity_id in GROUP BY.

but this gives the error:

[42803] column "log.entity_id" must appear in the GROUP BY clause or be used in an aggregate function
Details

This query gives the desired results:

SELECT DISTINCT
      date_trunc('month', log.start_date::TIMESTAMP WITHOUT TIME ZONE) AS month
    , log.activity_id   , count(log.entity_id) OVER(partition by date_trunc('month', log.start_date::TIMESTAMP WITHOUT TIME ZONE), log.activity_id) AS num_logs_per_activity   , count(log.entity_id) OVER (partition by date_trunc('month', log.start_date::TIMESTAMP WITHOUT TIME ZONE)) AS total_for_month
FROM
    log_entry log
ORDER BY date_trunc('month', log.start_date::TIMESTAMP WITHOUT TIME ZONE) ASC, log.activity_id ASC
;

month	activity_id	num_logs_per_activity	total_for_month
2017-01-01 00:00:00.000000	1	2	4
2017-01-01 00:00:00.000000	2	2	4
2017-02-01 00:00:00.000000	1	5	6
2017-02-01 00:00:00.000000	2	1	6

But I'd like a solution without the DISTINCT, if one exists?

If I introduce a new column, logged_for, and want to list number of logged entries per person per activity I can use this:

SELECT DISTINCT
    date_trunc('month', log.start_date::TIMESTAMP WITHOUT TIME ZONE) AS month
    , log.logged_for   , log.activity_id   , count(log.entity_id) OVER(partition by date_trunc('month', log.start_date::TIMESTAMP WITHOUT TIME ZONE), log.logged_for, log.activity_id) AS num_logs_per_person_for_activity   , count(log.entity_id) OVER (partition by date_trunc('month', log.start_date::TIMESTAMP WITHOUT TIME ZONE), log.logged_for) AS total_for_person_for_month
FROM
    log_entry log
ORDER BY date_trunc('month', log.start_date::TIMESTAMP WITHOUT TIME ZONE) ASC
;

Which gives:

month	logged_for	activity_id	num_logs_per_person_for_activity	total_for_person_for_month
2017-01-01 00:00:00.000000	5	1	2	4
2017-01-01 00:00:00.000000	5	2	2	4
2017-01-01 00:00:00.000000	6	1	2	4
2017-01-01 00:00:00.000000	6	2	2	4
2017-02-01 00:00:00.000000	5	1	5	6
2017-02-01 00:00:00.000000	5	2	1	6
2017-02-01 00:00:00.000000	6	1	5	6
2017-02-01 00:00:00.000000	6	2	1	6

Is this the recommended way to do this, I mean - having "group by" in the "partition by" belonging to the OVER()-clause of the count-aggregates?

Thanks.

Andreas Joseph Krogh

CTO / Partner - Visena AS

Mobile: +47 909 56 963

andreas@visena.com

www.visena.com

Re: [SQL] Problems with PARTITION BY with count() in window-func

From

"MS (direkt)"

Date:

08 November 2017, 17:51:05

a solution without distinct:

select * from (
select
      date_trunc('month', log.start_date::TIMESTAMP WITHOUT TIME ZONE) AS month,
    log.activity_id,
    count(log.entity_id) over (partition by date_trunc('month', log.start_date::TIMESTAMP WITHOUT TIME ZONE), activity_id),
    count(log.entity_id) over (partition by date_trunc('month', log.start_date::TIMESTAMP WITHOUT TIME ZONE))
FROM
    log_entry log
order by 1,2) as x
group by 1,2,3,4
order by 1,2;

If one think of introducing new column i prefer using grouping sets and rollup.
If you do so, an enclosing query will help to filter just the subtotals you need.

Regards, Martin

Am 08.11.2017 um 14:50 schrieb Andreas Joseph Krogh:

På onsdag 08. november 2017 kl. 11:17:39, skrev Andreas Joseph Krogh <andreas@visena.com>:
Hi all.

I'm trying to count() all log-entries per activity per month in a separate column using count() over(partition by ...) but get an error I don't understand.
[snip]
select
      date_trunc('month', log.start_date::TIMESTAMP WITHOUT TIME ZONE) AS month
    , log.activity_id   , count(log.entity_id) AS num_logs   , count(log.entity_id) OVER total_for_month_window AS total_for_month
FROM
    log_entry log
WHERE 1 = 1
      AND date_trunc('month', log.start_date::TIMESTAMP WITHOUT TIME ZONE) BETWEEN '2017-01-01'::DATE AND '2017-05-01'::DATE
GROUP BY month, log.activity_id
WINDOW total_for_month_window AS (PARTITION BY date_trunc('month', log.start_date::TIMESTAMP WITHOUT TIME ZONE))

ORDER BY date_trunc('month', log.start_date::TIMESTAMP WITHOUT TIME ZONE) ASC
;
Note the missing log.entity_id in GROUP BY.

but this gives the error:

[42803] column "log.entity_id" must appear in the GROUP BY clause or be used in an aggregate function
Details
This query gives the desired results:
SELECT DISTINCT
      date_trunc('month', log.start_date::TIMESTAMP WITHOUT TIME ZONE) AS month
    , log.activity_id   , count(log.entity_id) OVER(partition by date_trunc('month', log.start_date::TIMESTAMP WITHOUT TIME ZONE), log.activity_id) AS num_logs_per_activity   , count(log.entity_id) OVER (partition by date_trunc('month', log.start_date::TIMESTAMP WITHOUT TIME ZONE)) AS total_for_month
FROM
    log_entry log
ORDER BY date_trunc('month', log.start_date::TIMESTAMP WITHOUT TIME ZONE) ASC, log.activity_id ASC
;
month activity_id num_logs_per_activity total_for_month
2017-01-01 00:00:00.000000 1 2 4
2017-01-01 00:00:00.000000 2 2 4
2017-02-01 00:00:00.000000 1 5 6
2017-02-01 00:00:00.000000 2 1 6

But I'd like a solution without the DISTINCT, if one exists?

If I introduce a new column, logged_for, and want to list number of logged entries per person per activity I can use this:
SELECT DISTINCT
    date_trunc('month', log.start_date::TIMESTAMP WITHOUT TIME ZONE) AS month
    , log.logged_for   , log.activity_id   , count(log.entity_id) OVER(partition by date_trunc('month', log.start_date::TIMESTAMP WITHOUT TIME ZONE), log.logged_for, log.activity_id) AS num_logs_per_person_for_activity   , count(log.entity_id) OVER (partition by date_trunc('month', log.start_date::TIMESTAMP WITHOUT TIME ZONE), log.logged_for) AS total_for_person_for_month
FROM
    log_entry log
ORDER BY date_trunc('month', log.start_date::TIMESTAMP WITHOUT TIME ZONE) ASC
;
Which gives:
month logged_for activity_id num_logs_per_person_for_activity total_for_person_for_month
2017-01-01 00:00:00.000000 5 1 2 4
2017-01-01 00:00:00.000000 5 2 2 4
2017-01-01 00:00:00.000000 6 1 2 4
2017-01-01 00:00:00.000000 6 2 2 4
2017-02-01 00:00:00.000000 5 1 5 6
2017-02-01 00:00:00.000000 5 2 1 6
2017-02-01 00:00:00.000000 6 1 5 6
2017-02-01 00:00:00.000000 6 2 1 6

Is this the recommended way to do this, I mean - having "group by" in the "partition by" belonging to the OVER()-clause of the count-aggregates?

Thanks.

--
Andreas Joseph Krogh
CTO / Partner - Visena AS
Mobile: +47 909 56 963
andreas@visena.com
www.visena.com

-- 
Widdersdorfer Str. 415, 50933 Köln; Tel. +49 / 221 / 9544 010
HRB Köln HRB 75439, Geschäftsführer: S. Böhland, S. Rosenbauer