Thread: Understanding aggregation and window functions.

Understanding aggregation and window functions.

From
Pól Ua Laoínecháin
Date:
Hi all,

I have three tables: theatre, show and ticket - all DDL and DML
available at the bottom of this post and on a fiddle here:

https://dbfiddle.uk/?rdbms=postgres_12&fiddle=94d04a7a08fd09022774f21c327dd529

I've kept things very simple:

theatre (theatre_id) is an FK in show (show_theatre_id)

show (show_id) is an FK in ticket (ticket_show_id)


So, I start off:

SELECT
  th.theatre_id, th.theatre_name AS th_name,
  s.show_id AS s_id, s.show_theatre_id AS s_th, s.show_name AS s_name,
  t.ticket_id AS t_id, t.ticket_show_id AS t_show, t.price AS t_p
FROM
  theatre th
JOIN show s ON th.theatre_id = s.show_theatre_id
JOIN ticket t ON s.show_id = t.ticket_show_id;

And I get 29 rows returned - (not all shown) - 1 for every ticket.

theatre_id th_name s_id s_th s_name t_id t_show t_p
1               Theatre_1 10      1 show_10   1         10 10
1               Theatre_1 10      1 show_10   2         10 10

That's all fine.

So, now I run:

SELECT
  th.theatre_name AS "Theatre name", s.show_name AS "Show name",
  SUM(t.price) AS "Total sales"

FROM
  theatre th
JOIN show s ON th.theatre_id = s.show_theatre_id
JOIN ticket t ON s.show_id = t.ticket_show_id
GROUP BY th.theatre_name, s.show_name
ORDER BY th.theatre_name, s.show_name;


And obtain:

Theatre name    Show name   Sales/show
      Theatre_1         show_10             2050
      Theatre_1          show_11                77
      Theatre_2         show_20               200
      Theatre_2          show_21                63

Again, fine! But what I want is (theatre_1 2127) and (theatre_2, 263).

I would also appreciate *_explanations_*, references, URLs... a bit
more than just the correct SQL would be great!

But, now I want to get the total sales/per theatre in the same query,
so I try adding the line (see fiddle):

  SUM(t.price) OVER (PARTITION BY th.theatre_id ORDER BY th.theatre_id),

and I receive the error:

ERROR: column "t.price" must appear in the GROUP BY clause or be used
in an aggregate function LINE 8: SUM(t.price) OVER (PARTITION BY
th.theatre_id ORDER BY th....

Now, I don't understand why I'm getting the error - I *_already_* have
t.price in an aggregate function - i.e. the simple SUM without OVER
()?

So, I ran another query to experiment:

SELECT
  th.theatre_name AS "T. name", s.show_id AS "Show name", t.price AS
"Tkt price",

  SUM(t.price) OVER (PARTITION BY th.theatre_id) AS "Sales/T.",

  SUM(t.price) OVER (PARTITION BY s.show_id ORDER BY th.theatre_id,
s.show_id) AS "Sales/show",

  COUNT(t.price) OVER (PARTITION BY th.theatre_id, show_id) "No. of
difft tkts/show",

  SUM(t.price) AS "Tot sales/ticket"

FROM
  theatre th
JOIN show s ON th.theatre_id = s.show_theatre_id
JOIN ticket t ON s.show_id = t.ticket_show_id
GROUP BY th.theatre_id, th.theatre_name, s.show_id, t.price
ORDER BY th.theatre_name, s.show_id, t.price DESC;


So, the result of this is:

      T. name   Show name   Tkt price    Sales/T.    Sales/show
tkts/show     Tot sales/ticket
   Theatre_1                 10           500          521
    510                2                     2000
   Theatre_1                 10             10          521
     510                2                         50
   Theatre_1                 11             11           521
        11                1                         77
   Theatre_2                 20             20            41
        20                1                       200
   Theatre_2                 21             21            41
        21                1                         63


I hope this "chart" comes out in the formatting - if not, check out
the fiddle - it's far more readable there!

So,

- t.price AS "Tkt price" gives me 5 prices - i.e. for show  10, there
are two types available, 10 and 500 € (or whatever...) and 5 overall
(1 each for the others) - OK!

- SUM(t.price) OVER (PARTITION BY th.theatre_id) AS "Sales/T." gives
me the total of the prices for the individual different tickets by
theatre - i.e. the sum of 500 + 10 + 11 for theatre 1 - that's OK

So, I'm aggregating over the theatres... but for individual tickets
and not the total?


 - SUM(t.price) OVER (PARTITION BY s.show_id ORDER BY th.theatre_id,
s.show_id) AS "Sales/show" give me the sum of the prices of the
individual tickets per show - 500 + 10 for theatre 1, show 1 - OK

 - COUNT(t.price) OVER (PARTITION BY th.theatre_id, show_id)
"Tkts/show", give 2 for shows 10 and 1 for the others - which is OK

 - We've already seen the final result above

What I want is

Theatre_1 2127
Theatre_2 263

i.e. the total sales overall for all shows by theatre - and I just
can't seem to understand why my various partitioning and ordering
attempts aren't working out!

** BONUS QUESTION **

I would also like to include the number of tickets sold by price point
and the number of tickets sold per theatre and per show - this might
involve a subselect in the SELECT clause?

I tried (see end of fiddle):

(https://dbfiddle.uk/?rdbms=postgres_12&fiddle=94d04a7a08fd09022774f21c327dd529)

  SELECT COUNT(t.ticket_show_id)
  FROM show s
  JOIN ticket t ON s.show_id = t.ticket_show_id
  GROUP BY s.show_id

Result

count
9
7
3
10

But when I tried to integrate it into another query as a SELECT in the
SELECT statement, I received the error:

  ERROR: more than one row returned by a subquery used as an expression

Which I can understand, but can't figure out how to solve. For each
row of my main query, the query above should only return 1 row -
but...

Any advice appreciated -


Pól...


================= DDL and DML =================

CREATE TABLE theatre
(
  theatre_id   INTEGER      NOT NULL PRIMARY KEY,
  theatre_name VARCHAR (25) NOT NULL
);

INSERT INTO theatre VALUES (1, 'Theatre_1'), (2, 'Theatre_2');

CREATE TABLE show
(
  show_id         INTEGER      NOT NULL PRIMARY KEY,
  show_theatre_id INTEGER      NOT NULL,
  show_name       VARCHAR (25) NOT NULL,
  CONSTRAINT show_theatre_fk FOREIGN KEY (show_theatre_id) REFERENCES
theatre (theatre_id)
);

INSERT INTO show (show_id, show_theatre_id, show_name) VALUES
(10, 1, 'show_10'), (11, 1, 'show_11'), (20, 2, 'show_20'), (21, 2, 'show_21');

CREATE TABLE ticket
(
  ticket_id SERIAL PRIMARY KEY,
  ticket_show_id INTEGER NOT NULL,
  CONSTRAINT ticket_show_fk FOREIGN KEY (ticket_show_id) REFERENCES
show (show_id),
  price INT NOT NULL
  --
  -- Various other fields - seat_no, is_sold, discount... whatever
);

INSERT INTO ticket (ticket_show_id, price) VALUES
(10, 10), (10, 10), (10, 10), (10, 10), (10, 10),
-- 5 tickets for show 10
(11, 11), (11, 11), (11, 11), (11, 11), (11, 11), (11, 11), (11, 11),
-- 7/show 11 &c
(20, 20), (20, 20), (20, 20), (20, 20), (20, 20),
(20, 20), (20, 20), (20, 20), (20, 20), (20, 20),                     -- 10
(21, 21), (21, 21), (21, 21),                                         -- 3
(10, 500), (10, 500), (10, 500), (10, 500);        -- 4 EXPENSIVE
tickets for show_10;



Re: Understanding aggregation and window functions.

From
"David G. Johnston"
Date:
On Wed, Jan 6, 2021 at 9:08 AM Pól Ua Laoínecháin <linehanp@tcd.ie> wrote:

SELECT
  th.theatre_name AS "Theatre name", s.show_name AS "Show name",
  SUM(t.price) AS "Total sales"

Theatre name    Show name   Sales/show
      Theatre_1         show_10             2050
      Theatre_1          show_11                77
      Theatre_2         show_20               200
      Theatre_2          show_21                63

Again, fine! But what I want is (theatre_1 2127) and (theatre_2, 263).


Then why are you grouping on "Show name" if the total you want is "Theatre name"?
 
  SUM(t.price) OVER (PARTITION BY th.theatre_id ORDER BY th.theatre_id),

and I receive the error:

ERROR: column "t.price" must appear in the GROUP BY clause or be used
in an aggregate function LINE 8: SUM(t.price) OVER (PARTITION BY
th.theatre_id ORDER BY th....


sum(sum(t.price)) over (...)

When using window function in a aggregate query you need to write one aggregate function calls to appease the group by (the inner sum) and one aggregate function call to appease the "over" (the outer sum).

Ignoring the rest of the section, re-ask if you still have questions after getting clarity for the above.
** BONUS QUESTION **

I would also like to include the number of tickets sold by price point
and the number of tickets sold per theatre and per show - this might
involve a subselect in the SELECT clause?

  SELECT COUNT(t.ticket_show_id)
  FROM show s
  JOIN ticket t ON s.show_id = t.ticket_show_id
  GROUP BY s.show_id

Result

count
9
7
3
10


I try to never write a group by query that doesn't include a column for each grouped field, then any aggregates.

 
But when I tried to integrate it into another query as a SELECT in the
SELECT statement, I received the error:

  ERROR: more than one row returned by a subquery used as an expression 
 
Which I can understand, but can't figure out how to solve. For each
row of my main query, the query above should only return 1 row -
but...

Then you probably need to correlate the outer and inner queries by referencing a value in the outer query inside of the subquery.

with subquery as (select id, val)
select mainquery.id, (select val from subquery where subquery.id = mainquery.id)
from mainquery

Though, as can be seen from the above, usually a join works better than a subquery in cases like this.

David J.