Thread: Why oh why is this join not working?
Hi all, it's late and I'm tired and I hope there's somebody out there who can get me out of this rut! It's probably something really basic and blindingly obvious, but I'm stumped. All DDL and DML and SQL is available at the fiddle here: https://dbfiddle.uk/?rdbms=postgres_12&fiddle=6194f16306c4ebff90f56c2dac781465 My table: CREATE TABLE resultdata ( class INTEGER NOT NULL, roll_number INTEGER NOT NULL, subjects VARCHAR (15) NOT NULL, marks INTEGER NOT NULL ); Data - sample lines - full data (28 records) in fiddle. INSERT INTO resultdata(class,roll_number,subjects,marks) VALUES (8, 1, 'math', 98), (8, 1,'english', 88), (8, 1,'science', 96), (8, 1,'computer', 94),... &c. I ran this nonsense CROSS JOIN query to prove that there were no bugs in dbfiddle.uk *_and_* that I'm not going mad! :-) SELECT t1.class, t1.roll_number -- Simple join - WORKS! FROM resultdata t1 JOIN ( SELECT t2.class, t2.roll_number FROM resultdata t2 ) AS t2 ON t1.class = t2.class; class roll_number 8 1 8 1 8 1 &c... 272 records - works fine Then I run Query 1: SELECT t1.class, t1.roll_number, -- Query 1 - works! SUM(CASE WHEN t1.subjects = 'math' THEN t1.marks ELSE 0 END) AS mathmark, SUM(CASE WHEN t1.subjects = 'computer' THEN t1.marks ELSE 0 END) AS compmark, SUM(CASE WHEN t1.subjects = 'english' THEN t1.marks ELSE 0 END) AS englmark, SUM(CASE WHEN t1.subjects = 'science' THEN t1.marks ELSE 0 END) AS sciemark FROM resultdata t1 GROUP BY t1.class, t1.roll_number; Works. Then Query 2: SELECT class, MAX(marks) AS maxmark -- Query 2 - works! FROM resultdata WHERE subjects = 'english' GROUP BY class; Works. BUT, when I try and run this (JOINING the two tables above): SELECT t1.class, t1.roll_number, SUM(CASE WHEN t1.subjects = 'math' THEN t1.marks ELSE 0 END) AS mathmark, SUM(CASE WHEN t1.subjects = 'computer' THEN t1.marks ELSE 0 END) AS compmark, SUM(CASE WHEN t1.subjects = 'english' THEN t1.marks ELSE 0 END) AS englmark, SUM(CASE WHEN t1.subjects = 'science' THEN t1.marks ELSE 0 END) AS sciemark FROM resultdata t1 GROUP BY t1.class, t1.roll_number JOIN <<<<<==== Fails here ( SELECT class, MAX(marks) AS maxmark FROM resultdata WHERE subjects = 'english' GROUP BY class ) AS t2 ON t1.class = t2.class AND t1.englmark = t2.maxmark; Result is: ERROR: syntax error at or near "JOIN" LINE 12: JOIN I've tried put every variation that I can think of to alias the first table - brackets... the whole chebang - I can get nothing to work! As a final note, when I use CTEs, it works fine. However, I have to get this code working on a MySQL 5.7 box also, but I'd be interested to know why I can't perform a simple join using PostgreSQL. Should you require any further information, please don't hesitate to contact me. Any helpful URLs or SQL references appreciated. TIA and rgs, Pól... WITH cte1 AS . -- <<<<<<<<<< This whole CTE with JOIN at end works fine also - produces correct result! SELECT t1.class, t1.roll_number, -- Query 1 - works! SUM(CASE WHEN t1.subjects = 'math' THEN t1.marks ELSE 0 END) AS mathmark, SUM(CASE WHEN t1.subjects = 'computer' THEN t1.marks ELSE 0 END) AS compmark, SUM(CASE WHEN t1.subjects = 'english' THEN t1.marks ELSE 0 END) AS englmark, SUM(CASE WHEN t1.subjects = 'science' THEN t1.marks ELSE 0 END) AS sciemark FROM resultdata t1 GROUP BY t1.class, t1.roll_number ), cte2 AS ( SELECT class, MAX(marks) AS maxmark FROM resultdata WHERE subjects = 'english' GROUP BY class ) SELECT t1.class, t1.roll_number, t1.mathmark, t1.englmark FROM cte1 t1 JOIN cte2 t2 ON t1.class = t2.class AND t1.englmark = t2.maxmark ORDER BY class ASC;
On Mon, 18 Nov 2019 01:15:36 +0000 Pól Ua Laoínecháin <linehanp@tcd.ie> wrote: Hi, What does: EXPLAIN ANALYZE <query that fails> returns? Jean-Yves > Hi all, it's late and I'm tired and I hope there's somebody out there > who can get me out of this rut! It's probably something really basic > and blindingly obvious, but I'm stumped. > > All DDL and DML and SQL is available at the fiddle here: > > https://dbfiddle.uk/?rdbms=postgres_12&fiddle=6194f16306c4ebff90f56c2dac781465 > > > My table: > > CREATE TABLE resultdata > ( > class INTEGER NOT NULL, > roll_number INTEGER NOT NULL, > subjects VARCHAR (15) NOT NULL, > marks INTEGER NOT NULL > ); > > Data - sample lines - full data (28 records) in fiddle. > > > INSERT INTO resultdata(class,roll_number,subjects,marks) VALUES > (8, 1, 'math', 98), > (8, 1,'english', 88), > (8, 1,'science', 96), > (8, 1,'computer', 94),... &c. > > I ran this nonsense CROSS JOIN query to prove that there were no bugs > in dbfiddle.uk *_and_* that I'm not going mad! :-) > > SELECT t1.class, t1.roll_number -- Simple join - WORKS! > FROM resultdata t1 > JOIN > ( > SELECT t2.class, t2.roll_number > FROM resultdata t2 > ) AS t2 > ON t1.class = t2.class; > > class roll_number > 8 1 > 8 1 > 8 1 &c... 272 records - works fine > > Then I run Query 1: > > SELECT t1.class, t1.roll_number, -- Query 1 - works! > SUM(CASE WHEN t1.subjects = 'math' > THEN t1.marks ELSE 0 END) AS mathmark, > SUM(CASE WHEN t1.subjects = 'computer' > THEN t1.marks ELSE 0 END) AS compmark, > SUM(CASE WHEN t1.subjects = 'english' > THEN t1.marks ELSE 0 END) AS englmark, > SUM(CASE WHEN t1.subjects = 'science' > THEN t1.marks ELSE 0 END) AS sciemark > FROM resultdata t1 > GROUP BY t1.class, t1.roll_number; > > Works. > > Then Query 2: > > SELECT class, MAX(marks) AS maxmark -- Query 2 - works! > FROM resultdata > WHERE subjects = 'english' > GROUP BY class; > > Works. > > BUT, when I try and run this (JOINING the two tables above): > > SELECT t1.class, t1.roll_number, > SUM(CASE WHEN t1.subjects = 'math' > THEN t1.marks ELSE 0 END) AS mathmark, > SUM(CASE WHEN t1.subjects = 'computer' > THEN t1.marks ELSE 0 END) AS compmark, > SUM(CASE WHEN t1.subjects = 'english' > THEN t1.marks ELSE 0 END) AS englmark, > SUM(CASE WHEN t1.subjects = 'science' > THEN t1.marks ELSE 0 END) AS sciemark > FROM resultdata t1 > GROUP BY t1.class, t1.roll_number > JOIN <<<<<==== Fails here > ( > SELECT class, MAX(marks) AS maxmark > FROM resultdata > WHERE subjects = 'english' > GROUP BY class > ) AS t2 > ON t1.class = t2.class AND > t1.englmark = t2.maxmark; > > Result is: ERROR: syntax error at or near "JOIN" LINE 12: JOIN > > I've tried put every variation that I can think of to alias the first > table - brackets... the whole chebang - I can get nothing to work! > > As a final note, when I use CTEs, it works fine. However, I have to > get this code working on a MySQL 5.7 box also, but I'd be interested > to know why I can't perform a simple join using PostgreSQL. > > Should you require any further information, please don't hesitate to > contact me. Any helpful URLs or SQL references appreciated. > > TIA and rgs, Pól... > > WITH cte1 AS . -- <<<<<<<<<< This whole CTE with JOIN at end works > fine also - produces correct result! > SELECT t1.class, t1.roll_number, -- Query 1 - works! > SUM(CASE WHEN t1.subjects = 'math' > THEN t1.marks ELSE 0 END) AS mathmark, > SUM(CASE WHEN t1.subjects = 'computer' > THEN t1.marks ELSE 0 END) AS compmark, > SUM(CASE WHEN t1.subjects = 'english' > THEN t1.marks ELSE 0 END) AS englmark, > SUM(CASE WHEN t1.subjects = 'science' > THEN t1.marks ELSE 0 END) AS sciemark > FROM resultdata t1 > GROUP BY t1.class, t1.roll_number > ), > cte2 AS > ( > SELECT class, MAX(marks) AS maxmark > FROM resultdata > WHERE subjects = 'english' > GROUP BY class > ) > SELECT t1.class, t1.roll_number, t1.mathmark, t1.englmark > FROM cte1 t1 > JOIN cte2 t2 > ON t1.class = t2.class AND > t1.englmark = t2.maxmark > ORDER BY class ASC; > >
On 11/17/19 8:15 PM, Pól Ua Laoínecháin wrote: > BUT, when I try and run this (JOINING the two tables above): > > SELECT t1.class, t1.roll_number, > SUM(CASE WHEN t1.subjects = 'math' > THEN t1.marks ELSE 0 END) AS mathmark, > SUM(CASE WHEN t1.subjects = 'computer' > THEN t1.marks ELSE 0 END) AS compmark, > SUM(CASE WHEN t1.subjects = 'english' > THEN t1.marks ELSE 0 END) AS englmark, > SUM(CASE WHEN t1.subjects = 'science' > THEN t1.marks ELSE 0 END) AS sciemark > FROM resultdata t1 > GROUP BY t1.class, t1.roll_number > JOIN <<<<<==== Fails here > ( > SELECT class, MAX(marks) AS maxmark > FROM resultdata > WHERE subjects = 'english' > GROUP BY class > ) AS t2 > ON t1.class = t2.class AND > t1.englmark = t2.maxmark; > > Result is: ERROR: syntax error at or near "JOIN" LINE 12: JOIN > Hello Pól, I think you just need parentheses around the first subquery, and to put both subqueries in the FROM list of an outer SELECT. When I run this, I get the same result as your CTE query: SELECT t1.class, t1.roll_number, t1.mathmark, t1.englmark FROM ( SELECT class, roll_number, SUM(CASE WHEN subjects = 'math' THEN marks ELSE 0 END) AS mathmark, SUM(CASE WHEN subjects = 'computer' THEN marks ELSE 0 END) AS compmark, SUM(CASE WHEN subjects = 'english' THEN marks ELSE 0 END) AS englmark, SUM(CASE WHEN subjects = 'science' THEN marks ELSE 0 END) AS sciemark FROM resultdata GROUP BY class, roll_number ) AS t1 JOIN ( SELECT class, MAX(marks) AS maxmark FROM resultdata WHERE subjects = 'english' GROUP BY class ) AS t2 ON t1.class = t2.class AND t1.englmark = t2.maxmark ORDER BY class ASC; Regards, Bryan -- Postdoctoral Researcher Georgia Cooperative Fish & Wildlife Research Unit Warnell School of Forestry & Natural Resources University of Georgia Athens, GA 30606-2152
On Mon, 18 Nov 2019 at 14:16, Pól Ua Laoínecháin <linehanp@tcd.ie> wrote: > SELECT t1.class, t1.roll_number, > SUM(CASE WHEN t1.subjects = 'math' > THEN t1.marks ELSE 0 END) AS mathmark, > SUM(CASE WHEN t1.subjects = 'computer' > THEN t1.marks ELSE 0 END) AS compmark, > SUM(CASE WHEN t1.subjects = 'english' > THEN t1.marks ELSE 0 END) AS englmark, > SUM(CASE WHEN t1.subjects = 'science' > THEN t1.marks ELSE 0 END) AS sciemark > FROM resultdata t1 > GROUP BY t1.class, t1.roll_number > JOIN <<<<<==== Fails here > ( > SELECT class, MAX(marks) AS maxmark > FROM resultdata > WHERE subjects = 'english' > GROUP BY class > ) AS t2 > ON t1.class = t2.class AND > t1.englmark = t2.maxmark; > > Result is: ERROR: syntax error at or near "JOIN" LINE 12: JOIN This isn't valid SQL. GROUP BY logically and syntactically comes after joins at each level. If you want to group before join then you can include that SQL in a derived table in the FROM clause, e.g: SELECT t1.class, t1.roll_number FROM (SELECT SUM(...) FROM resultdata t1 GROUP BY t1.class, t1.roll_number) t1 JOIN .... Note the additional SELECT and parenthesis around it. You could also look into the aggregate FILTER (WHERE ...) clause so that you don't have to have those not so nice CASE expressions inside the aggregate. However, I'm unsure what MySQL supports there. You might be stuck with them if you must support both using the same SQL syntax. -- David Rowley http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
Hi David and thanks for your interest and I am happy to report that some progress has been made! > > Result is: ERROR: syntax error at or near "JOIN" LINE 12: JOIN > This isn't valid SQL. GROUP BY logically and syntactically comes after > joins at each level. If you want to group before join then you can > include that SQL in a derived table in the FROM clause, e.g: > SELECT t1.class, t1.roll_number FROM (SELECT SUM(...) FROM resultdata > t1 GROUP BY t1.class, t1.roll_number) t1 JOIN .... > Note the additional SELECT and parenthesis around it. Indeed, and this hint has got me considerably further in this issue! I am now running this query: fiddle here - https://dbfiddle.uk/?rdbms=postgres_12&fiddle=97ef20c1981d2d1d4918f1b3599ead32 SELECT t1.class, t1.roll_number, SUM(CASE WHEN t1.subjects = 'math' THEN t1.marks ELSE 0 END) AS mathmark, SUM(CASE WHEN t1.subjects = 'computer' THEN t1.marks ELSE 0 END) AS compmark, SUM(CASE WHEN t1.subjects = 'english' THEN t1.marks ELSE 0 END) AS englmark, SUM(CASE WHEN t1.subjects = 'science' THEN t1.marks ELSE 0 END) AS sciemark FROM resultdata AS t1 JOIN ( SELECT class, MAX(marks) AS maxmark FROM resultdata WHERE subjects = 'english' GROUP BY class ) AS t2 ON t1.class = t2.class AND t1.englmark = t2.maxmark -- <<<<<====== Error here! I can refer to t1.class, but not t1.englmark - why? GROUP BY t1.class, t1.roll_number ORDER BY class, englmark DESC; but this gives the error: ERROR: column t1.englmark does not exist LINE 19: AND t1.englmark = t2.maxmark I don't seem to be able to refer to the result(s) of my SUM(CASE...) in the ON clause of the JOIN? I've tried various solutions to this, and again, nothing works! > You could also look into the aggregate FILTER (WHERE ...) clause so Thanks for the heads up - it would be a perfect use case and I haven't really looked at that one much. I've been trying to follow Markus Winand's " Modern SQL in Open Source and Commercial Databases" theme of not remaining stuck with SQL '92! But, I also have to deal with MySQL 5.7 (eyes raise to heaven! :-) ) Thanks again for your input! Regards, Pól... > David Rowley http://www.2ndQuadrant.com/
Salut/Hi Jean-Yves, and merci for your input. > What does: > EXPLAIN ANALYZE <query that fails> > returns? It returns exactly the same error as when the EXPLAIN ANALYZE is not there! fiddle: https://dbfiddle.uk/?rdbms=postgres_12&fiddle=043de4fa23fa9fdfa98e8f46daae8e8f Any other thoughts? Pól... > Jean-Yves
Hi Bryan, and thanks for your interest in my problem. > I think you just need parentheses around the first subquery, and to put > both subqueries in the FROM list of an outer SELECT. Yes, I could do that! However, if you check out my reply to David Rowley in the same thread, you'll see that I've got much further. Fiddle here: https://dbfiddle.uk/?rdbms=postgres_12&fiddle=5aaa2fde61da3b70521f5e1bc29a047e With the query: SELECT t1.class, t1.roll_number, SUM(CASE WHEN t1.subjects = 'math' THEN t1.marks ELSE 0 END) AS mathmark, SUM(CASE WHEN t1.subjects = 'computer' THEN t1.marks ELSE 0 END) AS compmark, SUM(CASE WHEN t1.subjects = 'english' THEN t1.marks ELSE 0 END) AS englmark, SUM(CASE WHEN t1.subjects = 'science' THEN t1.marks ELSE 0 END) AS sciemark FROM resultdata AS t1 JOIN ( SELECT class, MAX(marks) AS maxmark FROM resultdata WHERE subjects = 'english' GROUP BY class ) AS t2 ON t1.class = t2.class AND t1.englmark = t2.maxmark -- Error occurs here. GROUP BY t1.class, t1.roll_number ORDER BY class, englmark DESC; The error occurs way down the query at line 19. Error message: ERROR: column t1.englmark does not exist LINE 19: AND t1.englmark = t2.maxmark Now, I can refer to t1.class in the ON clause, but *_not_* to t1.englmark. To me this makes no sense - if I can refer to one, I should be able to refer to the other? I could have taken the nested subquery strategy but a) I think the query would be more elegant without it and b) (something I'll check up on), I'm wondering if more levels of subqueries might have performance implications? Finally, c) As mentioned above, if I can refer to t1.class, I should be able to refer to t1.englmark! Thanks again for your input, rgs, Pól... > Bryan
On Mon, 2019-11-18 at 07:56 +0000, Pól Ua Laoínecháin wrote: > SELECT t1.class, t1.roll_number, > SUM(CASE WHEN t1.subjects = 'math' > THEN t1.marks ELSE 0 END) AS mathmark, > SUM(CASE WHEN t1.subjects = 'computer' > THEN t1.marks ELSE 0 END) AS compmark, > SUM(CASE WHEN t1.subjects = 'english' > THEN t1.marks ELSE 0 END) AS englmark, > SUM(CASE WHEN t1.subjects = 'science' > THEN t1.marks ELSE 0 END) AS sciemark > FROM resultdata AS t1 > JOIN > ( > SELECT class, MAX(marks) AS maxmark > FROM resultdata > WHERE subjects = 'english' > GROUP BY class > ) AS t2 > ON t1.class = t2.class > AND t1.englmark = t2.maxmark -- <<<<<====== Error here! I can > refer to t1.class, but not t1.englmark - why? > GROUP BY t1.class, t1.roll_number > ORDER BY class, englmark DESC; > > but this gives the error: > > ERROR: column t1.englmark does not exist LINE 19: AND t1.englmark = t2.maxmark You cannot refer to a column alias in a WHERE condition, because grouping takes place *after* the WHERE condition has been evaluated. Try to use a subquery: SELECT t1.class, t1.roll_number, t1.mathmark, t1.compmark, t1.englmark, t1.sciemark FROM (SELECT class, roll_number, SUM(t1.marks) FILTER (WHERE t1.subjects = 'math') AS mathmark, SUM(t1.marks) FILTER (WHERE t1.subjects = 'computer') AS compmark, SUM(t1.marks) FILTER (WHERE t1.subjects = 'english') AS englmark, SUM(t1.marks) FILTER (WHERE t1.subjects = 'science') AS sciemark FROM resultdata GROUP BY t1.class, t1.roll_number ) AS t1 JOIN (SELECT class, MAX(marks) AS maxmark FROM resultdata WHERE subjects = 'english' GROUP BY class ) AS t2 ON t1.class = t2.class AND t1.englmark = t2.maxmark ORDER BY class, englmark DESC; Yours, Laurenz Albe -- Cybertec | https://www.cybertec-postgresql.com
Hi Laurenz, and thanks for your contribution, > > ERROR: column t1.englmark does not exist LINE 19: AND t1.englmark = t2.maxmark > You cannot refer to a column alias in a WHERE condition, > because grouping takes place *after* the WHERE condition > has been evaluated. Yes - of course! And nor can I refer to an aggregate! Thanks for pointing out what I should have known at this stage! I'm putting it down to late-night fatigue yesterday. I'm just wondering if there are significant performance implications associated with deeper levels of nesting? I was attempting to look at using Window/Analytic functions to get this done, but I just can't see it. I think that your solution (and explanation) is optimal for this particular use case - a PIVOT (pity it's not available by default in PostgreSQL!) is essential and there's no (what I see as) a more elegant way. Thanks again for your input, Regards, Pól... > Laurenz Albe
> Now, I can refer to t1.class in the ON clause, but *_not_* to > t1.englmark. To me this makes no sense - if I can refer to one, I > should be able to refer to the other? The error is occurring because t1.englmark doesn’t exist yet at the time that the join is being processed. It would onlyexists after the join when the sum(case...) statements are being evaluated. t1.class, however, is part of the resultdatatable. Your cte solution in your SQL fiddle is clean and works, so go with that. EXPLAIN (ANALYZE, BUFFERS) can be used to examine relative performance of different versions of the same query. Dan E-MAIL CONFIDENTIALITY NOTICE: The contents of this e-mail message and any attachments are intended solely for the addressee(s) and may contain confidentialand/or legally privileged information. If you are not the intended recipient of this message or if this messagehas been addressed to you in error, please immediately alert the sender by reply e-mail and then delete this messageand any attachments. If you are not the intended recipient, you are notified that any use, dissemination, distribution,copying, or storage of this message or any attachment is strictly prohibited.