Thread: total and partial sums in the same query??

total and partial sums in the same query??

From
Iuri Sampaio
Date:
Is there a way to return total and partial sums (grouped by a third column) in the same query?  

Total is an aggregate function i.e. COUNT(1),  partial is some sort of conditional as in: CASE  WHEN EXTRACT(MONTH FROM date) = 10 THEN COUNT(1) , ….

I've tried to Window functions https://www.postgresql.org/docs/9.1/tutorial-window.html however, it was not possible to recognize the partition


SELECT split_part(description, ' ', 25) AS type, COUNT(1), COUNT(1) OVER (PARTITION split_part(description, ' ', 25) WHERE EXTRACT(MONTH FROM creation_date::date) = 10 AS TotalOctober FROM qt_vehicle_ti GROUP BY type;
);
ERROR:  syntax error at or near "split_part"
LINE 1: ... 25) AS type, COUNT(1), COUNT(1) OVER  (PARTITION split_part...



The column “description" is manipulated with split_part to allow GROUP BY to sort and count by categories, which is one word among others within the description column, as in .

{id 7281 plate_number FRP380 first_seen {2020-07-15 14:50:26} last_seen {2020-07-15 14:50:26} probability 0.6 location_name Test camera_name LPR4 direction LEAVING class Car}


So, the result must be something like the result bellow


SELECT split_part(description, ' ', 25) AS type, 
COUNT(1) AS total,   
     SELECT COUNT(1) as partial FROM qt_vehicle_ti v2 WHERE split_part(v2.description, ' ', 25) = split_part(description, ' ', 25) AND EXTRACT(MONTH FROM v2.creation_date::date) = 10
) AS partial 
FROM qt_vehicle_ti GROUP BY type;




   type    | count  | partial 
------------+--------+--------------
Bus        |   6702 |         8779
Car        | 191761 |         8779

Motorbike  |   3746 |         8779
SUV/Pickup |  22536 |         8779

Truck      |  21801 |         8779

Unknown    | 588341 |         8779

Van        |   7951 |         8779


Best wishes,
I

Re: total and partial sums in the same query??

From
"David G. Johnston"
Date:
On Fri, Oct 9, 2020 at 7:58 PM Iuri Sampaio <iuri.sampaio@gmail.com> wrote:
Is there a way to return total and partial sums (grouped by a third column) in the same query?  

Yes.

Total is an aggregate function i.e. COUNT(1),  partial is some sort of conditional as in: CASE  WHEN EXTRACT(MONTH FROM date) = 10 THEN COUNT(1) , ….

I've tried to Window functions https://www.postgresql.org/docs/9.1/tutorial-window.html however, it was not possible to recognize the partition

You should observe the version numbers when viewing documentation and try and use either the most current docs or the version you are coding against.  If you are indeed coding against 9.1 be advised it is considerably out-of-date.


SELECT split_part(description, ' ', 25) AS type, COUNT(1), COUNT(1) OVER (PARTITION split_part(description, ' ', 25) WHERE EXTRACT(MONTH FROM creation_date::date) = 10 AS TotalOctober FROM qt_vehicle_ti GROUP BY type;
);
ERROR:  syntax error at or near "split_part"
LINE 1: ... 25) AS type, COUNT(1), COUNT(1) OVER  (PARTITION split_part...


You forget the keyword "BY" in "PARTITION BY".  That explains the immediate syntax error message.  You would get many more errors due to having made up the entire contents of the PARTITION BY portion of the window definition (partitions are not specified using a full select-like statement, the trailing semicolon in there is also a problem).  You need to consult the SQL Command reference documentation, in this case SELECT, to get the full syntax for stuff - tutorials are not necessarily comprehensive.


The column “description" is manipulated with split_part to allow GROUP BY to sort and count by categories, which is one word among others within the description column, as in .

{id 7281 plate_number FRP380 first_seen {2020-07-15 14:50:26} last_seen {2020-07-15 14:50:26} probability 0.6 location_name Test camera_name LPR4 direction LEAVING class Car}

This detail seems immaterial to the immediate question at hand.  A self-contained problem (see WITH/CTE) with fewer complex expressions generally makes learning, and asking for help, easier.
 
So, the result must be something like the result bellow

SELECT split_part(description, ' ', 25) AS type, 
COUNT(1) AS total,   
     SELECT COUNT(1) as partial FROM qt_vehicle_ti v2 WHERE split_part(v2.description, ' ', 25) = split_part(description, ' ', 25) AND EXTRACT(MONTH FROM v2.creation_date::date) = 10
) AS partial 
FROM qt_vehicle_ti GROUP BY type;




   type    | count  | partial 
------------+--------+--------------
Bus        |   6702 |         8779
Car        | 191761 |         8779

Motorbike  |   3746 |         8779
SUV/Pickup |  22536 |         8779

Truck      |  21801 |         8779

Unknown    | 588341 |         8779

Van        |   7951 |         8779



What about the above example, assuming it is indeed something that works, is wrong?
Between subqueries, window functions, and group by you've got the tools pretty well identified.  If you want help putting them together you should construct a simplified self-contained example and, using the provided input data, describe exactly what the output table needs to show.

David J.

Re: total and partial sums in the same query??

From
Iuri Sampaio
Date:
Hi David, 

RIGHT OUTER JOIN is the key!

TOTAL
SELECT split_part(description, ' ', 25) AS type, COUNT(1) AS total FROM qt_vehicle_ti GROUP BY type

OCTOBER
SELECT split_part(description, ' ', 25) AS type, COUNT(1) AS total FROM qt_vehicle_ti WHERE EXTRACT(MONTH FROM creation_date) = 10 GROUP BY type


FINAL 
SELECT split_part(v.description, ' ', 25) AS type, t.partial, COUNT(1) AS total FROM qt_vehicle_ti v RIGHT OUTER JOIN ( SELECT split_part(description, ' ', 25) AS type1, COUNT(1) AS partial FROM qt_vehicle_ti WHERE EXTRACT(MONTH FROM creation_date) = 10 GROUP BY type1) AS t ON t.type1 = split_part(v.description, ' ', 25) GROUP BY type, partial
 


Let me know if you would use a different approach




p.s. It’s very self rewarding when you ask a question and you  yourself answer it!

Best wishes,
I


On Saf. 23, 1442 AH, at 00:40, David G. Johnston <david.g.johnston@gmail.com> wrote:

On Fri, Oct 9, 2020 at 7:58 PM Iuri Sampaio <iuri.sampaio@gmail.com> wrote:
Is there a way to return total and partial sums (grouped by a third column) in the same query?  

Yes.

Total is an aggregate function i.e. COUNT(1),  partial is some sort of conditional as in: CASE  WHEN EXTRACT(MONTH FROM date) = 10 THEN COUNT(1) , ….

I've tried to Window functions https://www.postgresql.org/docs/9.1/tutorial-window.html however, it was not possible to recognize the partition

You should observe the version numbers when viewing documentation and try and use either the most current docs or the version you are coding against.  If you are indeed coding against 9.1 be advised it is considerably out-of-date.


SELECT split_part(description, ' ', 25) AS type, COUNT(1), COUNT(1) OVER (PARTITION split_part(description, ' ', 25) WHERE EXTRACT(MONTH FROM creation_date::date) = 10 AS TotalOctober FROM qt_vehicle_ti GROUP BY type;
);
ERROR:  syntax error at or near "split_part"
LINE 1: ... 25) AS type, COUNT(1), COUNT(1) OVER  (PARTITION split_part...


You forget the keyword "BY" in "PARTITION BY".  That explains the immediate syntax error message.  You would get many more errors due to having made up the entire contents of the PARTITION BY portion of the window definition (partitions are not specified using a full select-like statement, the trailing semicolon in there is also a problem).  You need to consult the SQL Command reference documentation, in this case SELECT, to get the full syntax for stuff - tutorials are not necessarily comprehensive.


The column “description" is manipulated with split_part to allow GROUP BY to sort and count by categories, which is one word among others within the description column, as in .

{id 7281 plate_number FRP380 first_seen {2020-07-15 14:50:26} last_seen {2020-07-15 14:50:26} probability 0.6 location_name Test camera_name LPR4 direction LEAVING class Car}

This detail seems immaterial to the immediate question at hand.  A self-contained problem (see WITH/CTE) with fewer complex expressions generally makes learning, and asking for help, easier.
 
So, the result must be something like the result bellow

SELECT split_part(description, ' ', 25) AS type, 
COUNT(1) AS total,   
     SELECT COUNT(1) as partial FROM qt_vehicle_ti v2 WHERE split_part(v2.description, ' ', 25) = split_part(description, ' ', 25) AND EXTRACT(MONTH FROM v2.creation_date::date) = 10
) AS partial 
FROM qt_vehicle_ti GROUP BY type;




   type    | count  | partial 
------------+--------+--------------
Bus        |   6702 |         8779
Car        | 191761 |         8779

Motorbike  |   3746 |         8779
SUV/Pickup |  22536 |         8779

Truck      |  21801 |         8779

Unknown    | 588341 |         8779

Van        |   7951 |         8779



What about the above example, assuming it is indeed something that works, is wrong?
Between subqueries, window functions, and group by you've got the tools pretty well identified.  If you want help putting them together you should construct a simplified self-contained example and, using the provided input data, describe exactly what the output table needs to show.

David J.

Re: total and partial sums in the same query??

From
"David G. Johnston"
Date:
On Friday, October 9, 2020, Iuri Sampaio <iuri.sampaio@gmail.com> wrote:
Hi David, 

RIGHT OUTER JOIN is the key!

TOTAL
SELECT split_part(description, ' ', 25) AS type, COUNT(1) AS total FROM qt_vehicle_ti GROUP BY type

OCTOBER
SELECT split_part(description, ' ', 25) AS type, COUNT(1) AS total FROM qt_vehicle_ti WHERE EXTRACT(MONTH FROM creation_date) = 10 GROUP BY type


FINAL 
SELECT split_part(v.description, ' ', 25) AS type, t.partial, COUNT(1) AS total FROM qt_vehicle_ti v RIGHT OUTER JOIN ( SELECT split_part(description, ' ', 25) AS type1, COUNT(1) AS partial FROM qt_vehicle_ti WHERE EXTRACT(MONTH FROM creation_date) = 10 GROUP BY type1) AS t ON t.type1 = split_part(v.description, ' ', 25) GROUP BY type, partial
 


Let me know if you would use a different approach

The convention I try to observe when using outer joins is to use left join, not right (outer is implied).  That said, you seem to have written a left join query since the totals, a superset of october, are on the left.  Also, count(*) is my learned convention instead of count(1).

A simple conditional (filter) count would be much easier to understand and should be much faster:

Select type, count(*) as total_count, count(*) filter (where month=10) as m10_count from v_normalized_data group by type;


I added v_normalized because the expressions the decompose your data tend to be better placed in a view and your main queries focus just on their purpose and not structural data manipulation.  Especially something expensive like duplicating split_part.

David J.


Re: total and partial sums in the same query??

From
Iuri Sampaio
Date:
David,

Going further in our conversation. I went to PG's documentation and did a better research in order to come out with better solution/approaches. 

Furthermore, based on your previous email, your words were “key” in the process. Thanks a lot!

“ … you should construct a simplified self-contained example and,… 


reviewed the original query:

SELECT split_part(v.description, ' ', 25) AS type, t.partial, COUNT(1) AS total 
FROM qt_vehicle_ti v 
RIGHT OUTER JOIN ( 
SELECT split_part(description, ' ', 25) AS type1, COUNT(1) AS partial 
FROM qt_vehicle_ti 
WHERE EXTRACT(MONTH FROM creation_date) = 10
GROUP BY type1) AS t 
ON t.type1 = split_part(v.description, ' ', 25) 
GROUP BY type, partial
 



and rewrote it to the following one:


WITH 
cte1 AS (SELECT split_part(description, ' ', 25) AS type1, COUNT(1) AS partial FROM qt_vehicle_ti WHERE EXTRACT(MONTH FROM creation_date) = 10 GROUP BY type1), 
cte2 AS (SELECT split_part(description, ' ', 25) AS type2, COUNT(1) AS total FROM qt_vehicle_ti GROUP BY type2) 
SELECT type1, total, partial FROM cte1 JOIN cte2 ON cte1.type1 = cte2.type2;


Indeed! Performance is way better now. As well as readability, and less code written!

Nevertheless, I’m still reluctant to the necessity of using v_normalized. On the other hand my reluctancy comes from not entirely understanding this approach and the benefits of it.


Your words were:  
“ 
A simple conditional (filter) count would be much easier to understand and should be much faster:

Select type, count(*) as total_count, count(*) filter (where month=10) as m10_count from v_normalized_data group by type;


I added v_normalized because the expressions the decompose your data tend to be better placed in a view and your main queries focus just on their purpose and not structural data manipulation.  Especially something expensive like duplicating split_part.
"


So, feel free whether to go deeper continuing this discussion 

Best wishes,
I



On Saf. 23, 1442 AH, at 00:58, David G. Johnston <david.g.johnston@gmail.com> wrote:

On Friday, October 9, 2020, Iuri Sampaio <iuri.sampaio@gmail.com> wrote:
Hi David, 

RIGHT OUTER JOIN is the key!

TOTAL
SELECT split_part(description, ' ', 25) AS type, COUNT(1) AS total FROM qt_vehicle_ti GROUP BY type

OCTOBER
SELECT split_part(description, ' ', 25) AS type, COUNT(1) AS total FROM qt_vehicle_ti WHERE EXTRACT(MONTH FROM creation_date) = 10 GROUP BY type


FINAL 
SELECT split_part(v.description, ' ', 25) AS type, t.partial, COUNT(1) AS total FROM qt_vehicle_ti v RIGHT OUTER JOIN ( SELECT split_part(description, ' ', 25) AS type1, COUNT(1) AS partial FROM qt_vehicle_ti WHERE EXTRACT(MONTH FROM creation_date) = 10 GROUP BY type1) AS t ON t.type1 = split_part(v.description, ' ', 25) GROUP BY type, partial
 


Let me know if you would use a different approach

The convention I try to observe when using outer joins is to use left join, not right (outer is implied).  That said, you seem to have written a left join query since the totals, a superset of october, are on the left.  Also, count(*) is my learned convention instead of count(1).

A simple conditional (filter) count would be much easier to understand and should be much faster:

Select type, count(*) as total_count, count(*) filter (where month=10) as m10_count from v_normalized_data group by type;


I added v_normalized because the expressions the decompose your data tend to be better placed in a view and your main queries focus just on their purpose and not structural data manipulation.  Especially something expensive like duplicating split_part.

David J.



Re: total and partial sums in the same query??

From
"David G. Johnston"
Date:
On Sat, Oct 10, 2020 at 8:28 AM Iuri Sampaio <iuri.sampaio@gmail.com> wrote:
So, feel free whether to go deeper continuing this discussion 

Specifically:

WITH v_normalized AS (SELECT split_part(description, ' ', 25) AS type1, EXTRACT(MONTH FROM creation_date) AS month FROM qt_vehicle_ti)
SELECT type1, count(*) as count_total, count(*) filter (where month=10) AS count_m10 FROM v_normalized GROUP BY type1;

There is no benefit to having a join here.

Using WITH to compute v_normalized in just this query works but I suspect you will find you are writing something like that a lot which suggests it, or rather something similar but a bit more general, should be computed using CREATE VIEW.

David J.