Thread: Change JOIN tutorial to focus more on explicit joins

Change JOIN tutorial to focus more on explicit joins

From
PG Doc comments form
Date:
The following documentation comment has been logged on the website:

Page: https://www.postgresql.org/docs/12/tutorial-join.html
Description:

The tutorial about joins makes the following statement about the explicit
JOIN operator:

> This syntax is not as commonly used as the one above

I think in 2020 this claim is no longer true, and I would love to see the
manual prefer the "modern" explicit JOIN operator rather than sticking to
the ancient implicit joins in the WHERE clause.

Re: Change JOIN tutorial to focus more on explicit joins

From
Thomas Munro
Date:
On Thu, May 21, 2020 at 1:37 AM PG Doc comments form
<noreply@postgresql.org> wrote:
> The following documentation comment has been logged on the website:
>
> Page: https://www.postgresql.org/docs/12/tutorial-join.html
> Description:
>
> The tutorial about joins makes the following statement about the explicit
> JOIN operator:
>
> > This syntax is not as commonly used as the one above
>
> I think in 2020 this claim is no longer true, and I would love to see the
> manual prefer the "modern" explicit JOIN operator rather than sticking to
> the ancient implicit joins in the WHERE clause.

+1

The "new" syntax is 28 years old, from SQL 92.  I don't see too many
SQL 86 joins.  Would you like to write a documentation patch?



Re: Change JOIN tutorial to focus more on explicit joins

From
Jürgen Purtz
Date:
On 20.05.20 23:56, Thomas Munro wrote:
> On Thu, May 21, 2020 at 1:37 AM PG Doc comments form
> <noreply@postgresql.org> wrote:
>> The following documentation comment has been logged on the website:
>>
>> Page: https://www.postgresql.org/docs/12/tutorial-join.html
>> Description:
>>
>> The tutorial about joins makes the following statement about the explicit
>> JOIN operator:
>>
>>> This syntax is not as commonly used as the one above
>> I think in 2020 this claim is no longer true, and I would love to see the
>> manual prefer the "modern" explicit JOIN operator rather than sticking to
>> the ancient implicit joins in the WHERE clause.
> +1
>
> The "new" syntax is 28 years old, from SQL 92.  I don't see too many
> SQL 86 joins.  Would you like to write a documentation patch?
>
>
The attached patch

- prefers the explicit join-syntax over the implicit one and explains 
the keywords of the explicit syntax

- uses a more accurate definition of 'join'

- separates <programlisting> and <screen> tags

- shifts <indexterm> definitions outside of <para> to get a better 
rendering in PDF

- adds a note concerning IDs and foreign keys


--

J. Purtz



Attachment

Re: Change JOIN tutorial to focus more on explicit joins

From
Thomas Munro
Date:
On Wed, May 27, 2020 at 8:29 PM Jürgen Purtz <juergen@purtz.de> wrote:
> > The "new" syntax is 28 years old, from SQL 92.  I don't see too many
> > SQL 86 joins.  Would you like to write a documentation patch?
> >
> >
> The attached patch
>
> - prefers the explicit join-syntax over the implicit one and explains
> the keywords of the explicit syntax
>
> - uses a more accurate definition of 'join'
>
> - separates <programlisting> and <screen> tags
>
> - shifts <indexterm> definitions outside of <para> to get a better
> rendering in PDF
>
> - adds a note concerning IDs and foreign keys

Hi Jürgen,

Please add to the commitfest app, so we don't lose track of it.



Re: Change JOIN tutorial to focus more on explicit joins

From
Peter Eisentraut
Date:
On 2020-05-27 10:29, Jürgen Purtz wrote:
> The attached patch
> 
> - prefers the explicit join-syntax over the implicit one and explains
> the keywords of the explicit syntax
> 
> - uses a more accurate definition of 'join'
> 
> - separates <programlisting> and <screen> tags
> 
> - shifts <indexterm> definitions outside of <para> to get a better
> rendering in PDF
> 
> - adds a note concerning IDs and foreign keys

I have committed some parts of this patch:

 > - separates <programlisting> and <screen> tags

 > - shifts <indexterm> definitions outside of <para> to get a better
 > rendering in PDF

as well as the change of W1/W2 to w1/w2.  (Note that there is also 
src/tutorial/basics.source that should be adjusted in the same way.)

For the remaining patch I have a couple of concerns:

 >      <para>
 >       Attempt to determine the semantics of this query when the
 > -     <literal>WHERE</literal> clause is omitted.
 > +     <literal>ON</literal> clause is omitted.
 >      </para>
 >     </formalpara>

This no longer works.

In general, I agree that some more emphasis on the JOIN syntax is okay. 
But I think the order in which the tutorial has taught it so far is 
okay: First you do it the manual way, then you learn the more abstract way.

 > +   <note>
 > +    <para>
 > +     The examples shown here combine rows via city names.
 > +     This should help to understand the concept. Professional
 > +     solutions prefer to use numerical IDs and foreign keys
 > +     to join tables.
 > +    </para>
 > +   </note>

While there are interesting debates to be had about natural vs. 
surrogate keys, I don't think we should imply that one of them is 
unprofessional and then leave it at that and give no further guidance. 
I think we should leave this out.

-- 
Peter Eisentraut              http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: Change JOIN tutorial to focus more on explicit joins

From
Jürgen Purtz
Date:
On 04.09.20 08:52, Peter Eisentraut wrote:
>
> For the remaining patch I have a couple of concerns:
>
> >      <para>
> >       Attempt to determine the semantics of this query when the
> > -     <literal>WHERE</literal> clause is omitted.
> > +     <literal>ON</literal> clause is omitted.
> >      </para>
> >     </formalpara>
>
> This no longer works.
>
Ok, but I don't have any better suggestion than to delete this para.
> In general, I agree that some more emphasis on the JOIN syntax is 
> okay. But I think the order in which the tutorial has taught it so far 
> is okay: First you do it the manual way, then you learn the more 
> abstract way.

In this context, I wouldn't use the terms 'manual' and 'abstract', it's 
more about 'implicit' and 'explicit' syntax. The 'explicit' syntax does 
not only emphasis the aspect of 'joining' tables, it also differentiates 
between the usage of following AND/OR/NOT key words as join conditions 
or as additional restrictions (the results are identical but not the 
semantic). Because the purpose of this patch is the preference of the 
explicit syntax, we shall show this syntax first.

>
> > +   <note>
> > +    <para>
> > +     The examples shown here combine rows via city names.
> > +     This should help to understand the concept. Professional
> > +     solutions prefer to use numerical IDs and foreign keys
> > +     to join tables.
> > +    </para>
> > +   </note>
>
> While there are interesting debates to be had about natural vs. 
> surrogate keys, I don't think we should imply that one of them is 
> unprofessional and then leave it at that and give no further guidance. 
> I think we should leave this out.
>
Ok, deleted.

--

Jürgen Purtz


Attachment

Re: Change JOIN tutorial to focus more on explicit joins

From
"David G. Johnston"
Date:
On Fri, Sep 4, 2020 at 2:36 AM Jürgen Purtz <juergen@purtz.de> wrote:
On 04.09.20 08:52, Peter Eisentraut wrote:
>
> For the remaining patch I have a couple of concerns:

This patch should not be changing the formatting choices for these queries, just the addition of a JOIN clause and modification of the WHERE clause.  Specifically, SELECT is left-aligned while all subsequent clauses indent under it.  Forced alignment by adding extra spaces isn't done here either.  I have not altered those in the attached.

Did some word-smithing on the first paragraph.  The part about the cross-join was hurt by "in some way" and "may be" is not needed.

Pointing out that values from both tables doesn't seem like an improvement when the second item covers that and it is more specific in noting that the city name that is joined on appears twice - once from each table.

ON expression is more precise and the reader should be ok with the term.

Removal of the exercise is good.  Not the time to discuss cross join anyway.  Given that "ON true" works the cross join form isn't even required.

In the FROM clause form I would not add table prefixes to the column names.  They are not part of the form changing.  If discussion about table prefixing is desired it should be done explicitly and by itself.  They are used later on, I didn't check to see whether that was covered or might be confusing.

I suggested a wording for why to use join syntax that doesn't involve legacy and points out its merit compared to sticking a join expression into the where clause.

The original patch missed having the syntax for the first left outer join conform to the multi-line query writing standard you introduced.  I did not change.

The "AND" ON clause should just go with (not changed):

ON (w1.temp_lo < w2.temp_lo
    AND w1.temp_hi > w2.temp_high);

Attaching my suggestions made on top of the attached original 0002-query.patch

David J.

Attachment

Re: Change JOIN tutorial to focus more on explicit joins

From
Jürgen Purtz
Date:
On 22.10.20 01:40, David G. Johnston wrote:
On Fri, Sep 4, 2020 at 2:36 AM Jürgen Purtz <juergen@purtz.de> wrote:
On 04.09.20 08:52, Peter Eisentraut wrote:
>
> For the remaining patch I have a couple of concerns:

This patch should not be changing the formatting choices for these queries, just the addition of a JOIN clause and modification of the WHERE clause.  Specifically, SELECT is left-aligned while all subsequent clauses indent under it.  Forced alignment by adding extra spaces isn't done here either.  I have not altered those in the attached.

Did some word-smithing on the first paragraph.  The part about the cross-join was hurt by "in some way" and "may be" is not needed.

Pointing out that values from both tables doesn't seem like an improvement when the second item covers that and it is more specific in noting that the city name that is joined on appears twice - once from each table.

ON expression is more precise and the reader should be ok with the term.

Removal of the exercise is good.  Not the time to discuss cross join anyway.  Given that "ON true" works the cross join form isn't even required.

In the FROM clause form I would not add table prefixes to the column names.  They are not part of the form changing.  If discussion about table prefixing is desired it should be done explicitly and by itself.  They are used later on, I didn't check to see whether that was covered or might be confusing.

I suggested a wording for why to use join syntax that doesn't involve legacy and points out its merit compared to sticking a join expression into the where clause.

The original patch missed having the syntax for the first left outer join conform to the multi-line query writing standard you introduced.  I did not change.

The "AND" ON clause should just go with (not changed):

ON (w1.temp_lo < w2.temp_lo
    AND w1.temp_hi > w2.temp_high);

Attaching my suggestions made on top of the attached original 0002-query.patch

David J.

(Hopefully) I have integrated all of David's suggestions as well as the following rules:

- Syntax formatting with the previously used 4 spaces plus newline for JOIN

- Table aliases only when necessary or explicitly discussed

The discussion about the explicit vs. implicit syntax is added to the "As join expressions serve a specific purpose ... " sentence and creates a paragraph of its own.

The patch is build on top of master.

-- J. Purtz


Attachment

Re: Change JOIN tutorial to focus more on explicit joins

From
Pavel Stehule
Date:


čt 22. 10. 2020 v 15:32 odesílatel Jürgen Purtz <juergen@purtz.de> napsal:
On 22.10.20 01:40, David G. Johnston wrote:
On Fri, Sep 4, 2020 at 2:36 AM Jürgen Purtz <juergen@purtz.de> wrote:
On 04.09.20 08:52, Peter Eisentraut wrote:
>
> For the remaining patch I have a couple of concerns:

This patch should not be changing the formatting choices for these queries, just the addition of a JOIN clause and modification of the WHERE clause.  Specifically, SELECT is left-aligned while all subsequent clauses indent under it.  Forced alignment by adding extra spaces isn't done here either.  I have not altered those in the attached.

Did some word-smithing on the first paragraph.  The part about the cross-join was hurt by "in some way" and "may be" is not needed.

Pointing out that values from both tables doesn't seem like an improvement when the second item covers that and it is more specific in noting that the city name that is joined on appears twice - once from each table.

ON expression is more precise and the reader should be ok with the term.

Removal of the exercise is good.  Not the time to discuss cross join anyway.  Given that "ON true" works the cross join form isn't even required.

In the FROM clause form I would not add table prefixes to the column names.  They are not part of the form changing.  If discussion about table prefixing is desired it should be done explicitly and by itself.  They are used later on, I didn't check to see whether that was covered or might be confusing.

I suggested a wording for why to use join syntax that doesn't involve legacy and points out its merit compared to sticking a join expression into the where clause.

The original patch missed having the syntax for the first left outer join conform to the multi-line query writing standard you introduced.  I did not change.

The "AND" ON clause should just go with (not changed):

ON (w1.temp_lo < w2.temp_lo
    AND w1.temp_hi > w2.temp_high);

Attaching my suggestions made on top of the attached original 0002-query.patch

David J.

(Hopefully) I have integrated all of David's suggestions as well as the following rules:

- Syntax formatting with the previously used 4 spaces plus newline for JOIN

- Table aliases only when necessary or explicitly discussed

The discussion about the explicit vs. implicit syntax is added to the "As join expressions serve a specific purpose ... " sentence and creates a paragraph of its own.

The patch is build on top of master.


Why do you use parenthesis for ON clause?  It is useless. SQL is not C or JAVA.

Regards

Pavel

-- J. Purtz


Re: Change JOIN tutorial to focus more on explicit joins

From
"David G. Johnston"
Date:
On Thu, Oct 22, 2020 at 8:14 AM Pavel Stehule <pavel.stehule@gmail.com> wrote:
Why do you use parenthesis for ON clause?  It is useless. SQL is not C or JAVA.


At this point in my career it's just a personal habit.  I never programmed C, done most of my development in Java so maybe that's a subconscious influence?

I suspect it is partly because I seldom need to use "ON" but instead join with "USING" which does require the parentheses, so when I need to use ON I just keep them.

I agree they are unnecessary in the example and should be removed to be consistent.

David J.

Re: Change JOIN tutorial to focus more on explicit joins

From
Pavel Stehule
Date:


čt 22. 10. 2020 v 18:27 odesílatel David G. Johnston <david.g.johnston@gmail.com> napsal:
On Thu, Oct 22, 2020 at 8:14 AM Pavel Stehule <pavel.stehule@gmail.com> wrote:
Why do you use parenthesis for ON clause?  It is useless. SQL is not C or JAVA.


At this point in my career it's just a personal habit.  I never programmed C, done most of my development in Java so maybe that's a subconscious influence?

I suspect it is partly because I seldom need to use "ON" but instead join with "USING" which does require the parentheses, so when I need to use ON I just keep them.

I agree they are unnecessary in the example and should be removed to be consistent.

:)



David J.

Re: Change JOIN tutorial to focus more on explicit joins

From
Jürgen Purtz
Date:
On 22.10.20 17:14, Pavel Stehule wrote:
>
> Why do you use parenthesis for ON clause?  It is useless. SQL is not C 
> or JAVA.


Two more general answers:
- Why do people use tabs, spaces, and newlines to format their code even 
though it's not necessary? SQL is a language to develop applications. 
And what are the main costs of an application? It's not the time which 
it takes to develop them. It's the time for their maintenance. During 
the course of one or more decades, different persons will have to read 
the code, add additional features, and fix bugs. They need some time to 
read and understand the existing code. This task can be accelerated if 
the code is easy to read. Therefore, it's a good habit of developers to 
sometimes spend some extra characters to the code than is required -  
not only comments. An example: there are clear precedence rules for 
Boolean operators NOT/AND/OR. In an extensive statement it may be 
helpful - for the developer himself as well as for anybody else -to use 
newlines and parentheses at places where they are not necessary to keep 
an overview of the intention of the statement. In such cases, 
code-optimization is the duty of the compiler, not of the developer.
- In my professional life as a software developer, I have seen about 15 
different languages. But only in rare cases, they have offered new 
features or concepts. To overcome this Babylonian linguistic diversity I 
tend to use such syntactical constructs which are common to many of them 
even, even if they are not necessary for the concrete language.

And the concrete answer: Omitting the parentheses for the join condition 
raises the danger that its Boolean operators are mixed with the Boolean 
operators of the WHERE condition. The result at runtime is the same, but 
a reader will understand the intention of the statement faster if the 
parentheses exists.

--

J. Purtz





Re: Change JOIN tutorial to focus more on explicit joins

From
Pavel Stehule
Date:


pá 23. 10. 2020 v 11:14 odesílatel Jürgen Purtz <juergen@purtz.de> napsal:
On 22.10.20 17:14, Pavel Stehule wrote:
>
> Why do you use parenthesis for ON clause?  It is useless. SQL is not C
> or JAVA.


Two more general answers:
- Why do people use tabs, spaces, and newlines to format their code even
though it's not necessary? SQL is a language to develop applications.
And what are the main costs of an application? It's not the time which
it takes to develop them. It's the time for their maintenance. During
the course of one or more decades, different persons will have to read
the code, add additional features, and fix bugs. They need some time to
read and understand the existing code. This task can be accelerated if
the code is easy to read. Therefore, it's a good habit of developers to
sometimes spend some extra characters to the code than is required - 
not only comments. An example: there are clear precedence rules for
Boolean operators NOT/AND/OR. In an extensive statement it may be
helpful - for the developer himself as well as for anybody else -to use
newlines and parentheses at places where they are not necessary to keep
an overview of the intention of the statement. In such cases,
code-optimization is the duty of the compiler, not of the developer.
- In my professional life as a software developer, I have seen about 15
different languages. But only in rare cases, they have offered new
features or concepts. To overcome this Babylonian linguistic diversity I
tend to use such syntactical constructs which are common to many of them
even, even if they are not necessary for the concrete language.

And the concrete answer: Omitting the parentheses for the join condition
raises the danger that its Boolean operators are mixed with the Boolean
operators of the WHERE condition. The result at runtime is the same, but
a reader will understand the intention of the statement faster if the
parentheses exists.

I strongly disagree.

If there are some boolean predicates, then parenthesis has sense. Without these predicates the parenthesis decrease readability. This is the sense of JOIN syntax to separate predicates.

I have a different problem - when I see parentheses where they should not be, I am searching for a reason, and It is unfriendly where there is not any reason. I can understand if somebody uses useless parentheses in their product, but we talk about official documentation, and then we should respect the character of language. 

Regards

Pavel



--

J. Purtz