Thread: Not counting duplicates of declared pratition in OVER()-clause
Hi.
I'm trying to prevent duplicate values from being part of SUM().
(complete schema with INSERTs below)
I have this query to count all log-entries per activity per month in a sub-query, then adding a value from another table in the outer query, which I'd then like to sum but only count values from the same month once:
SELECT info.*, stuff.value + info.total_for_month AS new_value , SUM(stuff.value + info.total_for_month) OVER () AS total_new_value_sum FROM ( SELECT DISTINCT date_trunc('month', log.start_date::TIMESTAMP WITHOUT TIME ZONE) AS month , log.activity_id , count(log.entity_id) OVER(partition by date_trunc('month', log.start_date::TIMESTAMP WITHOUT TIME ZONE), log.activity_id) AS num_logs_per_activity , count(log.entity_id) OVER (partition by date_trunc('month', log.start_date::TIMESTAMP WITHOUT TIME ZONE)) AS total_for_month FROM log_entry log ORDER BY date_trunc('month', log.start_date::TIMESTAMP WITHOUT TIME ZONE) ASC, log.activity_id ASC ) AS info LEFT OUTER JOIN stuff ON info.month = stuff.month ;
What I want is this result:
month | activity_id | num_logs_per_activity | total_for_month | new_value | total_new_value_sum |
---|---|---|---|---|---|
2017-01-01 00:00:00.000000 | 1 | 4 | 8 | 30 | 141 |
2017-01-01 00:00:00.000000 | 2 | 4 | 8 | 30 | 141 |
2017-02-01 00:00:00.000000 | 1 | 10 | 12 | 111 | 141 |
2017-02-01 00:00:00.000000 | 2 | 2 | 12 | 111 | 141 |
2017-03-01 00:00:00.000000 | 1 | 1 | 1 | NULL | 141 |
But what I get is:
month | activity_id | num_logs_per_activity | total_for_month | new_value | total_new_value_sum |
---|---|---|---|---|---|
2017-01-01 00:00:00.000000 | 1 | 4 | 8 | 30 | 282 |
2017-01-01 00:00:00.000000 | 2 | 4 | 8 | 30 | 282 |
2017-02-01 00:00:00.000000 | 1 | 10 | 12 | 111 | 282 |
2017-02-01 00:00:00.000000 | 2 | 2 | 12 | 111 | 282 |
2017-03-01 00:00:00.000000 | 1 | 1 | 1 | NULL | 282 |
The problem is I don't know how to prevent every values in "new_value"-column from being included in the SUM().
I'd like something like this:
, SUM(stuff.value + info.total_for_month) <distinct by month> AS total_new_value_sum
Any hints on how to accomplish this?
Here is the complete schema:
DROP TABLE IF EXISTS stuff; DROP TABLE IF EXISTS log_entry; CREATE TABLE log_entry( entity_id SERIAL PRIMARY KEY, start_date DATE NOT NULL, activity_id BIGINT NOT NULL, logged_for BIGINT NOT NULL ); CREATE TABLE stuff( entity_id SERIAL PRIMARY KEY, month DATE NOT NULL UNIQUE, value INTEGER NOT NULL ); INSERT INTO log_entry(start_date, activity_id, logged_for) VALUES ('2017-01-01', 1, 5) , ('2017-01-02', 1, 5) , ('2017-01-03', 2, 5) , ('2017-01-04', 2, 5) , ('2017-02-01', 1, 5) , ('2017-02-01', 2, 5) , ('2017-02-01', 1, 5) , ('2017-02-02', 1, 5) , ('2017-02-02', 1, 5) , ('2017-02-03', 1, 5) , ('2017-01-01', 1, 6) , ('2017-01-02', 1, 6) , ('2017-01-03', 2, 6) , ('2017-01-04', 2, 6) , ('2017-02-01', 1, 6) , ('2017-02-01', 2, 6) , ('2017-02-01', 1, 6) , ('2017-02-02', 1, 6) , ('2017-02-02', 1, 6) , ('2017-02-03', 1, 6) , ('2017-03-01', 1, 6); INSERT INTO stuff(month, value) VALUES('2017-01-01', 22),('2017-02-01', 99);
Thanks in advance.
--
Andreas Joseph Krogh
CTO / Partner - Visena AS
Mobile: +47 909 56 963
Attachment
På tirsdag 28. november 2017 kl. 18:54:47, skrev Andreas Joseph Krogh <andreas@visena.com>:
Hi.I'm trying to prevent duplicate values from being part of SUM().(complete schema with INSERTs below)I have this query to count all log-entries per activity per month in a sub-query, then adding a value from another table in the outer query, which I'd then like to sum but only count values from the same month once:SELECT info.*, stuff.value + info.total_for_month AS new_value , SUM(stuff.value + info.total_for_month) OVER () AS total_new_value_sum FROM ( SELECT DISTINCT date_trunc('month', log.start_date::TIMESTAMP WITHOUT TIME ZONE) AS month , log.activity_id , count(log.entity_id) OVER(partition by date_trunc('month', log.start_date::TIMESTAMP WITHOUT TIME ZONE), log.activity_id) AS num_logs_per_activity , count(log.entity_id) OVER (partition by date_trunc('month', log.start_date::TIMESTAMP WITHOUT TIME ZONE)) AS total_for_month FROM log_entry log ORDER BY date_trunc('month', log.start_date::TIMESTAMP WITHOUT TIME ZONE) ASC, log.activity_id ASC ) AS info LEFT OUTER JOIN stuff ON info.month = stuff.month ;
It seems I forgot to turn my brain on, sorry.
This query gives me what I want (using row_number() and an outer query with FILTER on rownum=1):
SELECT q.* , SUM(q.new_value) FILTER (WHERE q.rownum = 1) OVER() AS total_new_value_sum FROM ( SELECT info.*, stuff.value + info.total_for_month AS new_value , row_number() OVER (partition by info.month) as rownum FROM ( SELECT DISTINCT date_trunc('month', log.start_date::TIMESTAMP WITHOUT TIME ZONE) AS month , log.activity_id , count(log.entity_id) OVER(partition by date_trunc('month', log.start_date::TIMESTAMP WITHOUT TIME ZONE), log.activity_id) AS num_logs_per_activity , count(log.entity_id) OVER (partition by date_trunc('month', log.start_date::TIMESTAMP WITHOUT TIME ZONE)) AS total_for_month FROM log_entry log ORDER BY date_trunc('month', log.start_date::TIMESTAMP WITHOUT TIME ZONE) ASC, log.activity_id ASC ) AS info LEFT OUTER JOIN stuff ON info.month = stuff.month ) q ;
Gives:
month | activity_id | num_logs_per_activity | total_for_month | new_value | rownum | total_new_value_sum |
---|---|---|---|---|---|---|
2017-01-01 00:00:00.000000 | 1 | 4 | 8 | 30 | 1 | 141 |
2017-01-01 00:00:00.000000 | 2 | 4 | 8 | 30 | 2 | 141 |
2017-02-01 00:00:00.000000 | 1 | 10 | 12 | 111 | 1 | 141 |
2017-02-01 00:00:00.000000 | 2 | 2 | 12 | 111 | 2 | 141 |
2017-03-01 00:00:00.000000 | 1 | 1 | 1 | NULL | 1 | 141 |
--
Andreas Joseph Krogh
CTO / Partner - Visena AS
Mobile: +47 909 56 963