Thread: Not counting duplicates of declared pratition in OVER()-clause

Not counting duplicates of declared pratition in OVER()-clause

From
Andreas Joseph Krogh
Date:
Hi.
 
I'm trying to prevent duplicate values from being part of SUM().
 
(complete schema with INSERTs below)
 
I have this query to count all log-entries per activity per month in a sub-query, then adding a value from another table in the outer query, which I'd then like to sum but only count values from the same month once:
 
SELECT info.*, stuff.value + info.total_for_month AS new_value    , SUM(stuff.value + info.total_for_month) OVER () AS total_new_value_sum
FROM (         SELECT DISTINCT
               date_trunc('month', log.start_date::TIMESTAMP WITHOUT TIME ZONE) AS month
             , log.activity_id             , count(log.entity_id) OVER(partition by date_trunc('month', log.start_date::TIMESTAMP WITHOUT TIME ZONE), log.activity_id) AS num_logs_per_activity             , count(log.entity_id) OVER (partition by date_trunc('month', log.start_date::TIMESTAMP WITHOUT TIME ZONE)) AS total_for_month         FROM
             log_entry log
         ORDER BY date_trunc('month', log.start_date::TIMESTAMP WITHOUT TIME ZONE) ASC, log.activity_id ASC
     ) AS info
    LEFT OUTER JOIN stuff ON info.month = stuff.month
;
 
What I want is this result:
 
monthactivity_idnum_logs_per_activitytotal_for_monthnew_valuetotal_new_value_sum
2017-01-01 00:00:00.00000014830141
2017-01-01 00:00:00.00000024830141
2017-02-01 00:00:00.00000011012111141
2017-02-01 00:00:00.0000002212111141
2017-03-01 00:00:00.000000111NULL141
 
But what I get is:
 
monthactivity_idnum_logs_per_activitytotal_for_monthnew_valuetotal_new_value_sum
2017-01-01 00:00:00.00000014830282
2017-01-01 00:00:00.00000024830282
2017-02-01 00:00:00.00000011012111282
2017-02-01 00:00:00.0000002212111282
2017-03-01 00:00:00.000000111NULL282
 
 
The problem is I don't know how to prevent every values in "new_value"-column from being included in the SUM().
 
I'd like something like this:
 
    , SUM(stuff.value + info.total_for_month) <distinct by month> AS total_new_value_sum
 
Any hints on how to accomplish this?
 
Here is the complete schema:
DROP TABLE IF EXISTS stuff;
DROP TABLE IF EXISTS log_entry;
CREATE TABLE log_entry(    entity_id SERIAL PRIMARY KEY,    start_date DATE NOT NULL,    activity_id BIGINT NOT NULL,    logged_for BIGINT NOT NULL
);

CREATE TABLE stuff(    entity_id SERIAL PRIMARY KEY,    month DATE NOT NULL UNIQUE,    value INTEGER NOT NULL
);

INSERT INTO log_entry(start_date, activity_id, logged_for)
VALUES ('2017-01-01', 1, 5)    , ('2017-01-02', 1, 5)    , ('2017-01-03', 2, 5)    , ('2017-01-04', 2, 5)    , ('2017-02-01', 1, 5)    , ('2017-02-01', 2, 5)    , ('2017-02-01', 1, 5)    , ('2017-02-02', 1, 5)    , ('2017-02-02', 1, 5)    , ('2017-02-03', 1, 5)    , ('2017-01-01', 1, 6)    , ('2017-01-02', 1, 6)    , ('2017-01-03', 2, 6)    , ('2017-01-04', 2, 6)    , ('2017-02-01', 1, 6)    , ('2017-02-01', 2, 6)    , ('2017-02-01', 1, 6)    , ('2017-02-02', 1, 6)    , ('2017-02-02', 1, 6)    , ('2017-02-03', 1, 6)    , ('2017-03-01', 1, 6);

INSERT INTO stuff(month, value) VALUES('2017-01-01', 22),('2017-02-01', 99);
 
Thanks in advance.
 
--
Andreas Joseph Krogh
CTO / Partner - Visena AS
Mobile: +47 909 56 963
Attachment

Sv: Not counting duplicates of declared pratition in OVER()-clause

From
Andreas Joseph Krogh
Date:
På tirsdag 28. november 2017 kl. 18:54:47, skrev Andreas Joseph Krogh <andreas@visena.com>:
Hi.
 
I'm trying to prevent duplicate values from being part of SUM().
 
(complete schema with INSERTs below)
 
I have this query to count all log-entries per activity per month in a sub-query, then adding a value from another table in the outer query, which I'd then like to sum but only count values from the same month once:
 
SELECT info.*, stuff.value + info.total_for_month AS new_value    , SUM(stuff.value + info.total_for_month) OVER () AS total_new_value_sum
FROM (         SELECT DISTINCT
               date_trunc('month', log.start_date::TIMESTAMP WITHOUT TIME ZONE) AS month
             , log.activity_id             , count(log.entity_id) OVER(partition by date_trunc('month', log.start_date::TIMESTAMP WITHOUT TIME ZONE), log.activity_id) AS num_logs_per_activity             , count(log.entity_id) OVER (partition by date_trunc('month', log.start_date::TIMESTAMP WITHOUT TIME ZONE)) AS total_for_month         FROM
             log_entry log
         ORDER BY date_trunc('month', log.start_date::TIMESTAMP WITHOUT TIME ZONE) ASC, log.activity_id ASC
     ) AS info
    LEFT OUTER JOIN stuff ON info.month = stuff.month
;
 
It seems I forgot to turn my brain on, sorry.
 
This query gives me what I want (using row_number() and an outer query with FILTER on rownum=1):
 
SELECT q.*    , SUM(q.new_value) FILTER (WHERE q.rownum = 1) OVER() AS total_new_value_sum
FROM (
SELECT info.*, stuff.value + info.total_for_month AS new_value    , row_number() OVER (partition by info.month) as rownum
FROM (         SELECT DISTINCT
               date_trunc('month', log.start_date::TIMESTAMP WITHOUT TIME ZONE) AS month
             , log.activity_id             , count(log.entity_id) OVER(partition by date_trunc('month', log.start_date::TIMESTAMP WITHOUT TIME ZONE), log.activity_id) AS num_logs_per_activity             , count(log.entity_id) OVER (partition by date_trunc('month', log.start_date::TIMESTAMP WITHOUT TIME ZONE)) AS total_for_month         FROM
             log_entry log
         ORDER BY date_trunc('month', log.start_date::TIMESTAMP WITHOUT TIME ZONE) ASC, log.activity_id ASC
     ) AS info
    LEFT OUTER JOIN stuff ON info.month = stuff.month
     ) q
;
 
Gives:
monthactivity_idnum_logs_per_activitytotal_for_monthnew_valuerownumtotal_new_value_sum
2017-01-01 00:00:00.000000148301141
2017-01-01 00:00:00.000000248302141
2017-02-01 00:00:00.000000110121111141
2017-02-01 00:00:00.00000022121112141
2017-03-01 00:00:00.000000111NULL1141
 
--
Andreas Joseph Krogh
CTO / Partner - Visena AS
Mobile: +47 909 56 963
 
Attachment