Thread: Why oh why is this join not working?

Why oh why is this join not working?

From
Pól Ua Laoínecháin
Date:
Hi all, it's late and I'm tired and I hope there's somebody out there
who can get me out of this rut! It's probably something really basic
and blindingly obvious, but I'm stumped.

All DDL and DML and SQL is available at the fiddle here:

https://dbfiddle.uk/?rdbms=postgres_12&fiddle=6194f16306c4ebff90f56c2dac781465


My table:

CREATE TABLE resultdata
(
  class INTEGER NOT NULL,
  roll_number INTEGER NOT NULL,
  subjects VARCHAR (15) NOT NULL,
  marks INTEGER NOT NULL
);

Data - sample lines - full data (28 records) in fiddle.


INSERT INTO resultdata(class,roll_number,subjects,marks) VALUES
(8, 1, 'math', 98),
(8, 1,'english', 88),
(8, 1,'science', 96),
(8, 1,'computer', 94),... &c.

I ran this nonsense CROSS JOIN query to prove that there were no bugs
in dbfiddle.uk *_and_* that I'm not going mad! :-)

SELECT t1.class, t1.roll_number  -- Simple join - WORKS!
FROM resultdata t1
JOIN
(
  SELECT t2.class, t2.roll_number
  FROM resultdata t2
) AS t2
ON t1.class = t2.class;

class  roll_number
8        1
8        1
8        1 &c... 272 records - works fine

Then I run Query 1:

SELECT t1.class, t1.roll_number,      -- Query 1 - works!
  SUM(CASE WHEN t1.subjects = 'math'
    THEN t1.marks ELSE 0 END) AS mathmark,
  SUM(CASE WHEN t1.subjects = 'computer'
    THEN t1.marks ELSE 0 END) AS compmark,
  SUM(CASE WHEN t1.subjects = 'english'
    THEN t1.marks ELSE 0 END)  AS englmark,
  SUM(CASE WHEN t1.subjects = 'science'
    THEN t1.marks ELSE 0 END)  AS sciemark
FROM resultdata t1
GROUP BY t1.class, t1.roll_number;

Works.

Then Query 2:

SELECT class, MAX(marks) AS maxmark  -- Query 2 - works!
FROM resultdata
WHERE subjects = 'english'
GROUP BY class;

Works.

BUT, when I try and run this (JOINING the two tables above):

SELECT t1.class, t1.roll_number,
  SUM(CASE WHEN t1.subjects = 'math'
    THEN t1.marks ELSE 0 END) AS mathmark,
  SUM(CASE WHEN t1.subjects = 'computer'
    THEN t1.marks ELSE 0 END) AS compmark,
  SUM(CASE WHEN t1.subjects = 'english'
    THEN t1.marks ELSE 0 END)  AS englmark,
  SUM(CASE WHEN t1.subjects = 'science'
    THEN t1.marks ELSE 0 END)  AS sciemark
FROM resultdata t1
GROUP BY t1.class, t1.roll_number
JOIN    <<<<<==== Fails here
(
  SELECT class, MAX(marks) AS maxmark
  FROM resultdata
  WHERE subjects = 'english'
  GROUP BY class
) AS t2
ON t1.class = t2.class AND
   t1.englmark = t2.maxmark;

Result is: ERROR: syntax error at or near "JOIN" LINE 12: JOIN

I've tried put every variation that I can think of to alias the first
table - brackets... the whole chebang - I can get nothing to work!

As a final note, when I use CTEs, it works fine. However, I have to
get this code working on a MySQL 5.7 box also, but I'd be interested
to know why I can't perform a simple join using PostgreSQL.

Should you require any further information, please don't hesitate to
contact me. Any helpful URLs or SQL references appreciated.

TIA and rgs, Pól...

WITH cte1 AS .   -- <<<<<<<<<< This whole CTE with JOIN at end works
fine also - produces correct result!
  SELECT t1.class, t1.roll_number,      -- Query 1 - works!
  SUM(CASE WHEN t1.subjects = 'math'
    THEN t1.marks ELSE 0 END) AS mathmark,
  SUM(CASE WHEN t1.subjects = 'computer'
    THEN t1.marks ELSE 0 END) AS compmark,
  SUM(CASE WHEN t1.subjects = 'english'
    THEN t1.marks ELSE 0 END)  AS englmark,
  SUM(CASE WHEN t1.subjects = 'science'
    THEN t1.marks ELSE 0 END)  AS sciemark
  FROM resultdata t1
  GROUP BY t1.class, t1.roll_number
),
cte2 AS
(
  SELECT class, MAX(marks) AS maxmark
  FROM resultdata
  WHERE subjects = 'english'
  GROUP BY class
)
SELECT t1.class, t1.roll_number, t1.mathmark, t1.englmark
FROM cte1 t1
JOIN cte2 t2
  ON t1.class = t2.class AND
     t1.englmark = t2.maxmark
ORDER BY class ASC;



Re: Why oh why is this join not working?

From
Bzzzz
Date:
On Mon, 18 Nov 2019 01:15:36 +0000
Pól Ua Laoínecháin <linehanp@tcd.ie> wrote:

Hi,

What does:
    EXPLAIN ANALYZE <query that fails>
returns?

Jean-Yves

> Hi all, it's late and I'm tired and I hope there's somebody out there
> who can get me out of this rut! It's probably something really basic
> and blindingly obvious, but I'm stumped.
>
> All DDL and DML and SQL is available at the fiddle here:
>
> https://dbfiddle.uk/?rdbms=postgres_12&fiddle=6194f16306c4ebff90f56c2dac781465
>
>
> My table:
>
> CREATE TABLE resultdata
> (
>   class INTEGER NOT NULL,
>   roll_number INTEGER NOT NULL,
>   subjects VARCHAR (15) NOT NULL,
>   marks INTEGER NOT NULL
> );
>
> Data - sample lines - full data (28 records) in fiddle.
>
>
> INSERT INTO resultdata(class,roll_number,subjects,marks) VALUES
> (8, 1, 'math', 98),
> (8, 1,'english', 88),
> (8, 1,'science', 96),
> (8, 1,'computer', 94),... &c.
>
> I ran this nonsense CROSS JOIN query to prove that there were no bugs
> in dbfiddle.uk *_and_* that I'm not going mad! :-)
>
> SELECT t1.class, t1.roll_number  -- Simple join - WORKS!
> FROM resultdata t1
> JOIN
> (
>   SELECT t2.class, t2.roll_number
>   FROM resultdata t2
> ) AS t2
> ON t1.class = t2.class;
>
> class  roll_number
> 8        1
> 8        1
> 8        1 &c... 272 records - works fine
>
> Then I run Query 1:
>
> SELECT t1.class, t1.roll_number,      -- Query 1 - works!
>   SUM(CASE WHEN t1.subjects = 'math'
>     THEN t1.marks ELSE 0 END) AS mathmark,
>   SUM(CASE WHEN t1.subjects = 'computer'
>     THEN t1.marks ELSE 0 END) AS compmark,
>   SUM(CASE WHEN t1.subjects = 'english'
>     THEN t1.marks ELSE 0 END)  AS englmark,
>   SUM(CASE WHEN t1.subjects = 'science'
>     THEN t1.marks ELSE 0 END)  AS sciemark
> FROM resultdata t1
> GROUP BY t1.class, t1.roll_number;
>
> Works.
>
> Then Query 2:
>
> SELECT class, MAX(marks) AS maxmark  -- Query 2 - works!
> FROM resultdata
> WHERE subjects = 'english'
> GROUP BY class;
>
> Works.
>
> BUT, when I try and run this (JOINING the two tables above):
>
> SELECT t1.class, t1.roll_number,
>   SUM(CASE WHEN t1.subjects = 'math'
>     THEN t1.marks ELSE 0 END) AS mathmark,
>   SUM(CASE WHEN t1.subjects = 'computer'
>     THEN t1.marks ELSE 0 END) AS compmark,
>   SUM(CASE WHEN t1.subjects = 'english'
>     THEN t1.marks ELSE 0 END)  AS englmark,
>   SUM(CASE WHEN t1.subjects = 'science'
>     THEN t1.marks ELSE 0 END)  AS sciemark
> FROM resultdata t1
> GROUP BY t1.class, t1.roll_number
> JOIN    <<<<<==== Fails here
> (
>   SELECT class, MAX(marks) AS maxmark
>   FROM resultdata
>   WHERE subjects = 'english'
>   GROUP BY class
> ) AS t2
> ON t1.class = t2.class AND
>    t1.englmark = t2.maxmark;
>
> Result is: ERROR: syntax error at or near "JOIN" LINE 12: JOIN
>
> I've tried put every variation that I can think of to alias the first
> table - brackets... the whole chebang - I can get nothing to work!
>
> As a final note, when I use CTEs, it works fine. However, I have to
> get this code working on a MySQL 5.7 box also, but I'd be interested
> to know why I can't perform a simple join using PostgreSQL.
>
> Should you require any further information, please don't hesitate to
> contact me. Any helpful URLs or SQL references appreciated.
>
> TIA and rgs, Pól...
>
> WITH cte1 AS .   -- <<<<<<<<<< This whole CTE with JOIN at end works
> fine also - produces correct result!
>   SELECT t1.class, t1.roll_number,      -- Query 1 - works!
>   SUM(CASE WHEN t1.subjects = 'math'
>     THEN t1.marks ELSE 0 END) AS mathmark,
>   SUM(CASE WHEN t1.subjects = 'computer'
>     THEN t1.marks ELSE 0 END) AS compmark,
>   SUM(CASE WHEN t1.subjects = 'english'
>     THEN t1.marks ELSE 0 END)  AS englmark,
>   SUM(CASE WHEN t1.subjects = 'science'
>     THEN t1.marks ELSE 0 END)  AS sciemark
>   FROM resultdata t1
>   GROUP BY t1.class, t1.roll_number
> ),
> cte2 AS
> (
>   SELECT class, MAX(marks) AS maxmark
>   FROM resultdata
>   WHERE subjects = 'english'
>   GROUP BY class
> )
> SELECT t1.class, t1.roll_number, t1.mathmark, t1.englmark
> FROM cte1 t1
> JOIN cte2 t2
>   ON t1.class = t2.class AND
>      t1.englmark = t2.maxmark
> ORDER BY class ASC;
>
>




Re: Why oh why is this join not working?

From
Bryan Nuse
Date:
On 11/17/19 8:15 PM, Pól Ua Laoínecháin wrote:

> BUT, when I try and run this (JOINING the two tables above):
>
> SELECT t1.class, t1.roll_number,
>    SUM(CASE WHEN t1.subjects = 'math'
>      THEN t1.marks ELSE 0 END) AS mathmark,
>    SUM(CASE WHEN t1.subjects = 'computer'
>      THEN t1.marks ELSE 0 END) AS compmark,
>    SUM(CASE WHEN t1.subjects = 'english'
>      THEN t1.marks ELSE 0 END)  AS englmark,
>    SUM(CASE WHEN t1.subjects = 'science'
>      THEN t1.marks ELSE 0 END)  AS sciemark
> FROM resultdata t1
> GROUP BY t1.class, t1.roll_number
> JOIN    <<<<<==== Fails here
> (
>    SELECT class, MAX(marks) AS maxmark
>    FROM resultdata
>    WHERE subjects = 'english'
>    GROUP BY class
> ) AS t2
> ON t1.class = t2.class AND
>     t1.englmark = t2.maxmark;
>
> Result is: ERROR: syntax error at or near "JOIN" LINE 12: JOIN
>
Hello Pól,

I think you just need parentheses around the first subquery, and to put 
both subqueries in the FROM list of an outer SELECT.

When I run this, I get the same result as your CTE query:

SELECT t1.class, t1.roll_number, t1.mathmark, t1.englmark
FROM
(
   SELECT class, roll_number,
   SUM(CASE WHEN subjects = 'math'
     THEN marks ELSE 0 END) AS mathmark,
   SUM(CASE WHEN subjects = 'computer'
     THEN marks ELSE 0 END) AS compmark,
   SUM(CASE WHEN subjects = 'english'
     THEN marks ELSE 0 END)  AS englmark,
   SUM(CASE WHEN subjects = 'science'
     THEN marks ELSE 0 END)  AS sciemark
   FROM resultdata
   GROUP BY class, roll_number
) AS t1
JOIN
(
   SELECT class, MAX(marks) AS maxmark
   FROM resultdata
   WHERE subjects = 'english'
   GROUP BY class
) AS t2
ON t1.class = t2.class AND
    t1.englmark = t2.maxmark
ORDER BY class ASC;



Regards,
Bryan



-- 
Postdoctoral Researcher
Georgia Cooperative Fish & Wildlife Research Unit
Warnell School of Forestry & Natural Resources
University of Georgia
Athens, GA 30606-2152



Re: Why oh why is this join not working?

From
David Rowley
Date:
On Mon, 18 Nov 2019 at 14:16, Pól Ua Laoínecháin <linehanp@tcd.ie> wrote:
> SELECT t1.class, t1.roll_number,
>   SUM(CASE WHEN t1.subjects = 'math'
>     THEN t1.marks ELSE 0 END) AS mathmark,
>   SUM(CASE WHEN t1.subjects = 'computer'
>     THEN t1.marks ELSE 0 END) AS compmark,
>   SUM(CASE WHEN t1.subjects = 'english'
>     THEN t1.marks ELSE 0 END)  AS englmark,
>   SUM(CASE WHEN t1.subjects = 'science'
>     THEN t1.marks ELSE 0 END)  AS sciemark
> FROM resultdata t1
> GROUP BY t1.class, t1.roll_number
> JOIN    <<<<<==== Fails here
> (
>   SELECT class, MAX(marks) AS maxmark
>   FROM resultdata
>   WHERE subjects = 'english'
>   GROUP BY class
> ) AS t2
> ON t1.class = t2.class AND
>    t1.englmark = t2.maxmark;
>
> Result is: ERROR: syntax error at or near "JOIN" LINE 12: JOIN

This isn't valid SQL. GROUP BY logically and syntactically comes after
joins at each level. If you want to group before join then you can
include that SQL in a derived table in the FROM clause, e.g:

SELECT t1.class, t1.roll_number FROM (SELECT SUM(...) FROM resultdata
t1 GROUP BY t1.class, t1.roll_number) t1 JOIN ....

Note the additional SELECT and parenthesis around it.

You could also look into the aggregate FILTER (WHERE ...) clause so
that you don't have to have those not so nice CASE expressions inside
the aggregate. However, I'm unsure what MySQL supports there. You
might be stuck with them if you must support both using the same SQL
syntax.

--
 David Rowley                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services



Re: Why oh why is this join not working?

From
Pól Ua Laoínecháin
Date:
Hi David and thanks for your interest and I am happy to report that
some progress has been made!

> > Result is: ERROR: syntax error at or near "JOIN" LINE 12: JOIN

> This isn't valid SQL. GROUP BY logically and syntactically comes after
> joins at each level. If you want to group before join then you can
> include that SQL in a derived table in the FROM clause, e.g:
> SELECT t1.class, t1.roll_number FROM (SELECT SUM(...) FROM resultdata
> t1 GROUP BY t1.class, t1.roll_number) t1 JOIN ....

> Note the additional SELECT and parenthesis around it.

Indeed, and this hint has got me considerably further in this issue!

I am now running this query: fiddle here -
https://dbfiddle.uk/?rdbms=postgres_12&fiddle=97ef20c1981d2d1d4918f1b3599ead32

SELECT t1.class, t1.roll_number,
  SUM(CASE WHEN t1.subjects = 'math'
    THEN t1.marks ELSE 0 END) AS mathmark,
  SUM(CASE WHEN t1.subjects = 'computer'
    THEN t1.marks ELSE 0 END) AS compmark,
  SUM(CASE WHEN t1.subjects = 'english'
    THEN t1.marks ELSE 0 END)  AS englmark,
  SUM(CASE WHEN t1.subjects = 'science'
    THEN t1.marks ELSE 0 END)  AS sciemark
FROM resultdata AS t1
JOIN
(
  SELECT class, MAX(marks) AS maxmark
  FROM resultdata
  WHERE subjects = 'english'
  GROUP BY class
) AS t2
ON t1.class = t2.class
AND t1.englmark = t2.maxmark        -- <<<<<====== Error here! I can
refer to t1.class, but not t1.englmark - why?
GROUP BY t1.class, t1.roll_number
ORDER BY class, englmark DESC;

but this gives the error:

ERROR: column t1.englmark does not exist LINE 19: AND t1.englmark = t2.maxmark

I don't seem to be able to refer to the result(s) of my SUM(CASE...)
in the ON clause of the JOIN? I've tried various solutions to this,
and again, nothing works!


> You could also look into the aggregate FILTER (WHERE ...) clause so

Thanks for the heads up - it would be a perfect use case and I haven't
really looked at that one much. I've been trying to follow Markus
Winand's " Modern SQL in Open Source and Commercial Databases" theme
of not remaining stuck with SQL '92! But, I also have to deal with
MySQL 5.7 (eyes raise to heaven! :-) )


Thanks again for your input! Regards,

Pól...


>  David Rowley                   http://www.2ndQuadrant.com/



Re: Why oh why is this join not working?

From
Pól Ua Laoínecháin
Date:
Salut/Hi Jean-Yves, and merci for your input.

> What does:
>         EXPLAIN ANALYZE <query that fails>
> returns?

It returns exactly the same error as when the EXPLAIN ANALYZE is not there!

fiddle: https://dbfiddle.uk/?rdbms=postgres_12&fiddle=043de4fa23fa9fdfa98e8f46daae8e8f

Any other thoughts?


Pól...

> Jean-Yves



Re: Why oh why is this join not working?

From
Pól Ua Laoínecháin
Date:
Hi Bryan, and thanks for your interest in my problem.

> I think you just need parentheses around the first subquery, and to put
> both subqueries in the FROM list of an outer SELECT.

Yes, I could do that! However, if you check out my reply to David
Rowley in the same thread, you'll see that I've got much further.

Fiddle here:
https://dbfiddle.uk/?rdbms=postgres_12&fiddle=5aaa2fde61da3b70521f5e1bc29a047e

With the query:

SELECT t1.class, t1.roll_number,
  SUM(CASE WHEN t1.subjects = 'math'
    THEN t1.marks ELSE 0 END) AS mathmark,
  SUM(CASE WHEN t1.subjects = 'computer'
    THEN t1.marks ELSE 0 END) AS compmark,
  SUM(CASE WHEN t1.subjects = 'english'
    THEN t1.marks ELSE 0 END)  AS englmark,
  SUM(CASE WHEN t1.subjects = 'science'
    THEN t1.marks ELSE 0 END)  AS sciemark
FROM resultdata AS t1
JOIN
(
  SELECT class, MAX(marks) AS maxmark
  FROM resultdata
  WHERE subjects = 'english'
  GROUP BY class
) AS t2
ON t1.class = t2.class
AND t1.englmark = t2.maxmark        -- Error occurs here.
GROUP BY t1.class, t1.roll_number
ORDER BY class, englmark DESC;

The error occurs way down the query at line 19. Error message:

ERROR: column t1.englmark does not exist LINE 19: AND t1.englmark = t2.maxmark

Now, I can refer to t1.class in the ON clause, but *_not_* to
t1.englmark. To me this makes no sense - if I can refer to one, I
should be able to refer to the other?

I could have taken the nested subquery strategy but a) I think the
query would be more elegant without it and b) (something I'll check up
on), I'm wondering if more levels of subqueries might have performance
implications? Finally, c) As mentioned above, if I can refer to
t1.class, I should be able to refer to t1.englmark!

Thanks again for your input, rgs,

Pól...


> Bryan



Re: Why oh why is this join not working?

From
Laurenz Albe
Date:
On Mon, 2019-11-18 at 07:56 +0000, Pól Ua Laoínecháin wrote:
> SELECT t1.class, t1.roll_number,
>   SUM(CASE WHEN t1.subjects = 'math'
>     THEN t1.marks ELSE 0 END) AS mathmark,
>   SUM(CASE WHEN t1.subjects = 'computer'
>     THEN t1.marks ELSE 0 END) AS compmark,
>   SUM(CASE WHEN t1.subjects = 'english'
>     THEN t1.marks ELSE 0 END)  AS englmark,
>   SUM(CASE WHEN t1.subjects = 'science'
>     THEN t1.marks ELSE 0 END)  AS sciemark
> FROM resultdata AS t1
> JOIN
> (
>   SELECT class, MAX(marks) AS maxmark
>   FROM resultdata
>   WHERE subjects = 'english'
>   GROUP BY class
> ) AS t2
> ON t1.class = t2.class
> AND t1.englmark = t2.maxmark        -- <<<<<====== Error here! I can
> refer to t1.class, but not t1.englmark - why?
> GROUP BY t1.class, t1.roll_number
> ORDER BY class, englmark DESC;
> 
> but this gives the error:
> 
> ERROR: column t1.englmark does not exist LINE 19: AND t1.englmark = t2.maxmark

You cannot refer to a column alias in a WHERE condition,
because grouping takes place *after* the WHERE condition
has been evaluated.

Try to use a subquery:

SELECT t1.class, t1.roll_number,
       t1.mathmark,
       t1.compmark,
       t1.englmark,
       t1.sciemark
FROM (SELECT class, roll_number,
             SUM(t1.marks) FILTER (WHERE t1.subjects = 'math') AS mathmark,
             SUM(t1.marks) FILTER (WHERE t1.subjects = 'computer') AS compmark,
             SUM(t1.marks) FILTER (WHERE t1.subjects = 'english') AS englmark,
             SUM(t1.marks) FILTER (WHERE t1.subjects = 'science') AS sciemark
      FROM resultdata
      GROUP BY t1.class, t1.roll_number
     ) AS t1
   JOIN (SELECT class, MAX(marks) AS maxmark
         FROM resultdata
         WHERE subjects = 'english'
         GROUP BY class
        ) AS t2
      ON t1.class = t2.class AND t1.englmark = t2.maxmark
ORDER BY class, englmark DESC;

Yours,
Laurenz Albe
-- 
Cybertec | https://www.cybertec-postgresql.com




Re: Why oh why is this join not working?

From
Pól Ua Laoínecháin
Date:
Hi Laurenz, and thanks for your contribution,

> > ERROR: column t1.englmark does not exist LINE 19: AND t1.englmark = t2.maxmark

> You cannot refer to a column alias in a WHERE condition,
> because grouping takes place *after* the WHERE condition
> has been evaluated.

Yes - of course! And nor can I refer to an aggregate! Thanks for
pointing out what I should have known at this stage! I'm putting it
down to late-night fatigue yesterday.

I'm just wondering if there are significant performance implications
associated with deeper levels of nesting?

I was attempting to look at using Window/Analytic functions to get
this done, but I just can't see it. I think that your solution (and
explanation) is optimal for this particular use case - a PIVOT (pity
it's not available by default in PostgreSQL!) is essential and there's
no (what I see as) a more elegant way.

Thanks again for your input,

Regards,

Pól...


> Laurenz Albe



RE: Why oh why is this join not working?

From
"Vianello, Dan A"
Date:
> Now, I can refer to t1.class in the ON clause, but *_not_* to
> t1.englmark. To me this makes no sense - if I can refer to one, I
> should be able to refer to the other?

The error is occurring because t1.englmark doesn’t exist yet at the time that the join is being processed.  It would
onlyexists after the join when the sum(case...) statements are being evaluated.  t1.class, however, is part of the
resultdatatable.
 

Your cte solution in your SQL fiddle is clean and works, so go with that.  

EXPLAIN (ANALYZE, BUFFERS) can be used to examine relative performance of different versions of the same query.

Dan
E-MAIL CONFIDENTIALITY NOTICE: 
The contents of this e-mail message and any attachments are intended solely for the addressee(s) and may contain
confidentialand/or legally privileged information. If you are not the intended recipient of this message or if this
messagehas been addressed to you in error, please immediately alert the sender by reply e-mail and then delete this
messageand any attachments. If you are not the intended recipient, you are notified that any use, dissemination,
distribution,copying, or storage of this message or any attachment is strictly prohibited.