Thread: total and partial sums in the same query??
Is there a way to return total and partial sums (grouped by a third column) in the same query?
Total is an aggregate function i.e. COUNT(1), partial is some sort of conditional as in: CASE WHEN EXTRACT(MONTH FROM date) = 10 THEN COUNT(1) , ….
I've tried to Window functions https://www.postgresql.org/docs/9.1/tutorial-window.html however, it was not possible to recognize the partition
SELECT split_part(description, ' ', 25) AS type, COUNT(1), COUNT(1) OVER (PARTITION split_part(description, ' ', 25) WHERE EXTRACT(MONTH FROM creation_date::date) = 10 AS TotalOctober FROM qt_vehicle_ti GROUP BY type;
);
ERROR: syntax error at or near "split_part"
LINE 1: ... 25) AS type, COUNT(1), COUNT(1) OVER (PARTITION split_part...
The column “description" is manipulated with split_part to allow GROUP BY to sort and count by categories, which is one word among others within the description column, as in .
{id 7281 plate_number FRP380 first_seen {2020-07-15 14:50:26} last_seen {2020-07-15 14:50:26} probability 0.6 location_name Test camera_name LPR4 direction LEAVING class Car}
So, the result must be something like the result bellow
SELECT split_part(description, ' ', 25) AS type,
COUNT(1) AS total,
(
SELECT COUNT(1) as partial FROM qt_vehicle_ti v2 WHERE split_part(v2.description, ' ', 25) = split_part(description, ' ', 25) AND EXTRACT(MONTH FROM v2.creation_date::date) = 10
) AS partial
FROM qt_vehicle_ti GROUP BY type;
type | count | partial
------------+--------+--------------
Bus | 6702 | 8779
Car | 191761 | 8779
Motorbike | 3746 | 8779
SUV/Pickup | 22536 | 8779
Truck | 21801 | 8779
Unknown | 588341 | 8779
Van | 7951 | 8779
Best wishes,
I
On Fri, Oct 9, 2020 at 7:58 PM Iuri Sampaio <iuri.sampaio@gmail.com> wrote:
Is there a way to return total and partial sums (grouped by a third column) in the same query?
Yes.
Total is an aggregate function i.e. COUNT(1), partial is some sort of conditional as in: CASE WHEN EXTRACT(MONTH FROM date) = 10 THEN COUNT(1) , ….I've tried to Window functions https://www.postgresql.org/docs/9.1/tutorial-window.html however, it was not possible to recognize the partition
You should observe the version numbers when viewing documentation and try and use either the most current docs or the version you are coding against. If you are indeed coding against 9.1 be advised it is considerably out-of-date.
SELECT split_part(description, ' ', 25) AS type, COUNT(1), COUNT(1) OVER (PARTITION split_part(description, ' ', 25) WHERE EXTRACT(MONTH FROM creation_date::date) = 10 AS TotalOctober FROM qt_vehicle_ti GROUP BY type;);ERROR: syntax error at or near "split_part"LINE 1: ... 25) AS type, COUNT(1), COUNT(1) OVER (PARTITION split_part...
You forget the keyword "BY" in "PARTITION BY". That explains the immediate syntax error message. You would get many more errors due to having made up the entire contents of the PARTITION BY portion of the window definition (partitions are not specified using a full select-like statement, the trailing semicolon in there is also a problem). You need to consult the SQL Command reference documentation, in this case SELECT, to get the full syntax for stuff - tutorials are not necessarily comprehensive.
The column “description" is manipulated with split_part to allow GROUP BY to sort and count by categories, which is one word among others within the description column, as in .{id 7281 plate_number FRP380 first_seen {2020-07-15 14:50:26} last_seen {2020-07-15 14:50:26} probability 0.6 location_name Test camera_name LPR4 direction LEAVING class Car}
This detail seems immaterial to the immediate question at hand. A self-contained problem (see WITH/CTE) with fewer complex expressions generally makes learning, and asking for help, easier.
So, the result must be something like the result bellow
SELECT split_part(description, ' ', 25) AS type,COUNT(1) AS total,(SELECT COUNT(1) as partial FROM qt_vehicle_ti v2 WHERE split_part(v2.description, ' ', 25) = split_part(description, ' ', 25) AND EXTRACT(MONTH FROM v2.creation_date::date) = 10) AS partialFROM qt_vehicle_ti GROUP BY type;type | count | partial------------+--------+--------------Bus | 6702 | 8779Car | 191761 | 8779
Motorbike | 3746 | 8779SUV/Pickup | 22536 | 8779
Truck | 21801 | 8779
Unknown | 588341 | 8779
Van | 7951 | 8779
What about the above example, assuming it is indeed something that works, is wrong?
Between subqueries, window functions, and group by you've got the tools pretty well identified. If you want help putting them together you should construct a simplified self-contained example and, using the provided input data, describe exactly what the output table needs to show.
David J.
Hi David,
RIGHT OUTER JOIN is the key!
TOTAL
SELECT split_part(description, ' ', 25) AS type, COUNT(1) AS total FROM qt_vehicle_ti GROUP BY type
OCTOBER
SELECT split_part(description, ' ', 25) AS type, COUNT(1) AS total FROM qt_vehicle_ti WHERE EXTRACT(MONTH FROM creation_date) = 10 GROUP BY type
FINAL
SELECT split_part(v.description, ' ', 25) AS type, t.partial, COUNT(1) AS total FROM qt_vehicle_ti v RIGHT OUTER JOIN ( SELECT split_part(description, ' ', 25) AS type1, COUNT(1) AS partial FROM qt_vehicle_ti WHERE EXTRACT(MONTH FROM creation_date) = 10 GROUP BY type1) AS t ON t.type1 = split_part(v.description, ' ', 25) GROUP BY type, partial
Let me know if you would use a different approach
p.s. It’s very self rewarding when you ask a question and you yourself answer it!
Best wishes,
I
On Saf. 23, 1442 AH, at 00:40, David G. Johnston <david.g.johnston@gmail.com> wrote:On Fri, Oct 9, 2020 at 7:58 PM Iuri Sampaio <iuri.sampaio@gmail.com> wrote:Is there a way to return total and partial sums (grouped by a third column) in the same query?Yes.Total is an aggregate function i.e. COUNT(1), partial is some sort of conditional as in: CASE WHEN EXTRACT(MONTH FROM date) = 10 THEN COUNT(1) , ….I've tried to Window functions https://www.postgresql.org/docs/9.1/tutorial-window.html however, it was not possible to recognize the partitionYou should observe the version numbers when viewing documentation and try and use either the most current docs or the version you are coding against. If you are indeed coding against 9.1 be advised it is considerably out-of-date.SELECT split_part(description, ' ', 25) AS type, COUNT(1), COUNT(1) OVER (PARTITION split_part(description, ' ', 25) WHERE EXTRACT(MONTH FROM creation_date::date) = 10 AS TotalOctober FROM qt_vehicle_ti GROUP BY type;);ERROR: syntax error at or near "split_part"LINE 1: ... 25) AS type, COUNT(1), COUNT(1) OVER (PARTITION split_part...You forget the keyword "BY" in "PARTITION BY". That explains the immediate syntax error message. You would get many more errors due to having made up the entire contents of the PARTITION BY portion of the window definition (partitions are not specified using a full select-like statement, the trailing semicolon in there is also a problem). You need to consult the SQL Command reference documentation, in this case SELECT, to get the full syntax for stuff - tutorials are not necessarily comprehensive.The column “description" is manipulated with split_part to allow GROUP BY to sort and count by categories, which is one word among others within the description column, as in .{id 7281 plate_number FRP380 first_seen {2020-07-15 14:50:26} last_seen {2020-07-15 14:50:26} probability 0.6 location_name Test camera_name LPR4 direction LEAVING class Car}This detail seems immaterial to the immediate question at hand. A self-contained problem (see WITH/CTE) with fewer complex expressions generally makes learning, and asking for help, easier.So, the result must be something like the result bellowSELECT split_part(description, ' ', 25) AS type,COUNT(1) AS total,(SELECT COUNT(1) as partial FROM qt_vehicle_ti v2 WHERE split_part(v2.description, ' ', 25) = split_part(description, ' ', 25) AND EXTRACT(MONTH FROM v2.creation_date::date) = 10) AS partialFROM qt_vehicle_ti GROUP BY type;type | count | partial------------+--------+--------------Bus | 6702 | 8779Car | 191761 | 8779
Motorbike | 3746 | 8779SUV/Pickup | 22536 | 8779
Truck | 21801 | 8779
Unknown | 588341 | 8779
Van | 7951 | 8779What about the above example, assuming it is indeed something that works, is wrong?Between subqueries, window functions, and group by you've got the tools pretty well identified. If you want help putting them together you should construct a simplified self-contained example and, using the provided input data, describe exactly what the output table needs to show.David J.
On Friday, October 9, 2020, Iuri Sampaio <iuri.sampaio@gmail.com> wrote:
Hi David,RIGHT OUTER JOIN is the key!TOTALSELECT split_part(description, ' ', 25) AS type, COUNT(1) AS total FROM qt_vehicle_ti GROUP BY typeOCTOBERSELECT split_part(description, ' ', 25) AS type, COUNT(1) AS total FROM qt_vehicle_ti WHERE EXTRACT(MONTH FROM creation_date) = 10 GROUP BY typeFINALSELECT split_part(v.description, ' ', 25) AS type, t.partial, COUNT(1) AS total FROM qt_vehicle_ti v RIGHT OUTER JOIN ( SELECT split_part(description, ' ', 25) AS type1, COUNT(1) AS partial FROM qt_vehicle_ti WHERE EXTRACT(MONTH FROM creation_date) = 10 GROUP BY type1) AS t ON t.type1 = split_part(v.description, ' ', 25) GROUP BY type, partialLet me know if you would use a different approach
The convention I try to observe when using outer joins is to use left join, not right (outer is implied). That said, you seem to have written a left join query since the totals, a superset of october, are on the left. Also, count(*) is my learned convention instead of count(1).
A simple conditional (filter) count would be much easier to understand and should be much faster:
Select type, count(*) as total_count, count(*) filter (where month=10) as m10_count from v_normalized_data group by type;
I added v_normalized because the expressions the decompose your data tend to be better placed in a view and your main queries focus just on their purpose and not structural data manipulation. Especially something expensive like duplicating split_part.
David J.
David,
Going further in our conversation. I went to PG's documentation and did a better research in order to come out with better solution/approaches.
Furthermore, based on your previous email, your words were “key” in the process. Thanks a lot!
“ … you should construct a simplified self-contained example and,… “
I reviewed the original query:
SELECT split_part(v.description, ' ', 25) AS type, t.partial, COUNT(1) AS total
FROM qt_vehicle_ti v
RIGHT OUTER JOIN (
SELECT split_part(description, ' ', 25) AS type1, COUNT(1) AS partial
FROM qt_vehicle_ti
WHERE EXTRACT(MONTH FROM creation_date) = 10
GROUP BY type1) AS t
ON t.type1 = split_part(v.description, ' ', 25)
GROUP BY type, partial
and rewrote it to the following one:
WITH
cte1 AS (SELECT split_part(description, ' ', 25) AS type1, COUNT(1) AS partial FROM qt_vehicle_ti WHERE EXTRACT(MONTH FROM creation_date) = 10 GROUP BY type1),
cte2 AS (SELECT split_part(description, ' ', 25) AS type2, COUNT(1) AS total FROM qt_vehicle_ti GROUP BY type2)
SELECT type1, total, partial FROM cte1 JOIN cte2 ON cte1.type1 = cte2.type2;
Indeed! Performance is way better now. As well as readability, and less code written!
Nevertheless, I’m still reluctant to the necessity of using v_normalized. On the other hand my reluctancy comes from not entirely understanding this approach and the benefits of it.
Your words were:
“
A simple conditional (filter) count would be much easier to understand and should be much faster:
Select type, count(*) as total_count, count(*) filter (where month=10) as m10_count from v_normalized_data group by type;
I added v_normalized because the expressions the decompose your data tend to be better placed in a view and your main queries focus just on their purpose and not structural data manipulation. Especially something expensive like duplicating split_part.
"
So, feel free whether to go deeper continuing this discussion
Best wishes,
I
On Saf. 23, 1442 AH, at 00:58, David G. Johnston <david.g.johnston@gmail.com> wrote:On Friday, October 9, 2020, Iuri Sampaio <iuri.sampaio@gmail.com> wrote:Hi David,RIGHT OUTER JOIN is the key!TOTALSELECT split_part(description, ' ', 25) AS type, COUNT(1) AS total FROM qt_vehicle_ti GROUP BY typeOCTOBERSELECT split_part(description, ' ', 25) AS type, COUNT(1) AS total FROM qt_vehicle_ti WHERE EXTRACT(MONTH FROM creation_date) = 10 GROUP BY typeFINALSELECT split_part(v.description, ' ', 25) AS type, t.partial, COUNT(1) AS total FROM qt_vehicle_ti v RIGHT OUTER JOIN ( SELECT split_part(description, ' ', 25) AS type1, COUNT(1) AS partial FROM qt_vehicle_ti WHERE EXTRACT(MONTH FROM creation_date) = 10 GROUP BY type1) AS t ON t.type1 = split_part(v.description, ' ', 25) GROUP BY type, partialLet me know if you would use a different approachThe convention I try to observe when using outer joins is to use left join, not right (outer is implied). That said, you seem to have written a left join query since the totals, a superset of october, are on the left. Also, count(*) is my learned convention instead of count(1).A simple conditional (filter) count would be much easier to understand and should be much faster:Select type, count(*) as total_count, count(*) filter (where month=10) as m10_count from v_normalized_data group by type;I added v_normalized because the expressions the decompose your data tend to be better placed in a view and your main queries focus just on their purpose and not structural data manipulation. Especially something expensive like duplicating split_part.David J.
On Sat, Oct 10, 2020 at 8:28 AM Iuri Sampaio <iuri.sampaio@gmail.com> wrote:
So, feel free whether to go deeper continuing this discussion
Specifically:
WITH v_normalized AS (SELECT split_part(description, ' ', 25) AS type1, EXTRACT(MONTH FROM creation_date) AS month FROM qt_vehicle_ti)
SELECT type1, count(*) as count_total, count(*) filter (where month=10) AS count_m10 FROM v_normalized GROUP BY type1;There is no benefit to having a join here.
Using WITH to compute v_normalized in just this query works but I suspect you will find you are writing something like that a lot which suggests it, or rather something similar but a bit more general, should be computed using CREATE VIEW.
David J.