Thread: Change JOIN tutorial to focus more on explicit joins
The following documentation comment has been logged on the website: Page: https://www.postgresql.org/docs/12/tutorial-join.html Description: The tutorial about joins makes the following statement about the explicit JOIN operator: > This syntax is not as commonly used as the one above I think in 2020 this claim is no longer true, and I would love to see the manual prefer the "modern" explicit JOIN operator rather than sticking to the ancient implicit joins in the WHERE clause.
On Thu, May 21, 2020 at 1:37 AM PG Doc comments form <noreply@postgresql.org> wrote: > The following documentation comment has been logged on the website: > > Page: https://www.postgresql.org/docs/12/tutorial-join.html > Description: > > The tutorial about joins makes the following statement about the explicit > JOIN operator: > > > This syntax is not as commonly used as the one above > > I think in 2020 this claim is no longer true, and I would love to see the > manual prefer the "modern" explicit JOIN operator rather than sticking to > the ancient implicit joins in the WHERE clause. +1 The "new" syntax is 28 years old, from SQL 92. I don't see too many SQL 86 joins. Would you like to write a documentation patch?
On 20.05.20 23:56, Thomas Munro wrote: > On Thu, May 21, 2020 at 1:37 AM PG Doc comments form > <noreply@postgresql.org> wrote: >> The following documentation comment has been logged on the website: >> >> Page: https://www.postgresql.org/docs/12/tutorial-join.html >> Description: >> >> The tutorial about joins makes the following statement about the explicit >> JOIN operator: >> >>> This syntax is not as commonly used as the one above >> I think in 2020 this claim is no longer true, and I would love to see the >> manual prefer the "modern" explicit JOIN operator rather than sticking to >> the ancient implicit joins in the WHERE clause. > +1 > > The "new" syntax is 28 years old, from SQL 92. I don't see too many > SQL 86 joins. Would you like to write a documentation patch? > > The attached patch - prefers the explicit join-syntax over the implicit one and explains the keywords of the explicit syntax - uses a more accurate definition of 'join' - separates <programlisting> and <screen> tags - shifts <indexterm> definitions outside of <para> to get a better rendering in PDF - adds a note concerning IDs and foreign keys -- J. Purtz
Attachment
On Wed, May 27, 2020 at 8:29 PM Jürgen Purtz <juergen@purtz.de> wrote: > > The "new" syntax is 28 years old, from SQL 92. I don't see too many > > SQL 86 joins. Would you like to write a documentation patch? > > > > > The attached patch > > - prefers the explicit join-syntax over the implicit one and explains > the keywords of the explicit syntax > > - uses a more accurate definition of 'join' > > - separates <programlisting> and <screen> tags > > - shifts <indexterm> definitions outside of <para> to get a better > rendering in PDF > > - adds a note concerning IDs and foreign keys Hi Jürgen, Please add to the commitfest app, so we don't lose track of it.
On 2020-05-27 10:29, Jürgen Purtz wrote: > The attached patch > > - prefers the explicit join-syntax over the implicit one and explains > the keywords of the explicit syntax > > - uses a more accurate definition of 'join' > > - separates <programlisting> and <screen> tags > > - shifts <indexterm> definitions outside of <para> to get a better > rendering in PDF > > - adds a note concerning IDs and foreign keys I have committed some parts of this patch: > - separates <programlisting> and <screen> tags > - shifts <indexterm> definitions outside of <para> to get a better > rendering in PDF as well as the change of W1/W2 to w1/w2. (Note that there is also src/tutorial/basics.source that should be adjusted in the same way.) For the remaining patch I have a couple of concerns: > <para> > Attempt to determine the semantics of this query when the > - <literal>WHERE</literal> clause is omitted. > + <literal>ON</literal> clause is omitted. > </para> > </formalpara> This no longer works. In general, I agree that some more emphasis on the JOIN syntax is okay. But I think the order in which the tutorial has taught it so far is okay: First you do it the manual way, then you learn the more abstract way. > + <note> > + <para> > + The examples shown here combine rows via city names. > + This should help to understand the concept. Professional > + solutions prefer to use numerical IDs and foreign keys > + to join tables. > + </para> > + </note> While there are interesting debates to be had about natural vs. surrogate keys, I don't think we should imply that one of them is unprofessional and then leave it at that and give no further guidance. I think we should leave this out. -- Peter Eisentraut http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On 04.09.20 08:52, Peter Eisentraut wrote: > > For the remaining patch I have a couple of concerns: > > > <para> > > Attempt to determine the semantics of this query when the > > - <literal>WHERE</literal> clause is omitted. > > + <literal>ON</literal> clause is omitted. > > </para> > > </formalpara> > > This no longer works. > Ok, but I don't have any better suggestion than to delete this para. > In general, I agree that some more emphasis on the JOIN syntax is > okay. But I think the order in which the tutorial has taught it so far > is okay: First you do it the manual way, then you learn the more > abstract way. In this context, I wouldn't use the terms 'manual' and 'abstract', it's more about 'implicit' and 'explicit' syntax. The 'explicit' syntax does not only emphasis the aspect of 'joining' tables, it also differentiates between the usage of following AND/OR/NOT key words as join conditions or as additional restrictions (the results are identical but not the semantic). Because the purpose of this patch is the preference of the explicit syntax, we shall show this syntax first. > > > + <note> > > + <para> > > + The examples shown here combine rows via city names. > > + This should help to understand the concept. Professional > > + solutions prefer to use numerical IDs and foreign keys > > + to join tables. > > + </para> > > + </note> > > While there are interesting debates to be had about natural vs. > surrogate keys, I don't think we should imply that one of them is > unprofessional and then leave it at that and give no further guidance. > I think we should leave this out. > Ok, deleted. -- Jürgen Purtz
Attachment
On 04.09.20 08:52, Peter Eisentraut wrote:
>
> For the remaining patch I have a couple of concerns:
Attachment
On Fri, Sep 4, 2020 at 2:36 AM Jürgen Purtz <juergen@purtz.de> wrote:On 04.09.20 08:52, Peter Eisentraut wrote:
>
> For the remaining patch I have a couple of concerns:This patch should not be changing the formatting choices for these queries, just the addition of a JOIN clause and modification of the WHERE clause. Specifically, SELECT is left-aligned while all subsequent clauses indent under it. Forced alignment by adding extra spaces isn't done here either. I have not altered those in the attached.Did some word-smithing on the first paragraph. The part about the cross-join was hurt by "in some way" and "may be" is not needed.Pointing out that values from both tables doesn't seem like an improvement when the second item covers that and it is more specific in noting that the city name that is joined on appears twice - once from each table.ON expression is more precise and the reader should be ok with the term.Removal of the exercise is good. Not the time to discuss cross join anyway. Given that "ON true" works the cross join form isn't even required.In the FROM clause form I would not add table prefixes to the column names. They are not part of the form changing. If discussion about table prefixing is desired it should be done explicitly and by itself. They are used later on, I didn't check to see whether that was covered or might be confusing.I suggested a wording for why to use join syntax that doesn't involve legacy and points out its merit compared to sticking a join expression into the where clause.The original patch missed having the syntax for the first left outer join conform to the multi-line query writing standard you introduced. I did not change.The "AND" ON clause should just go with (not changed):ON (w1.temp_lo < w2.temp_loAND w1.temp_hi > w2.temp_high);Attaching my suggestions made on top of the attached original 0002-query.patchDavid J.
(Hopefully) I have integrated all of David's suggestions as well as the following rules:
- Syntax formatting with the previously used 4 spaces plus newline for JOIN
- Table aliases only when necessary or explicitly discussed
The discussion about the explicit vs. implicit syntax is added to the "As join expressions serve a specific purpose ... " sentence and creates a paragraph of its own.
The patch is build on top of master.
-- J. Purtz
Attachment
On 22.10.20 01:40, David G. Johnston wrote:On Fri, Sep 4, 2020 at 2:36 AM Jürgen Purtz <juergen@purtz.de> wrote:On 04.09.20 08:52, Peter Eisentraut wrote:
>
> For the remaining patch I have a couple of concerns:This patch should not be changing the formatting choices for these queries, just the addition of a JOIN clause and modification of the WHERE clause. Specifically, SELECT is left-aligned while all subsequent clauses indent under it. Forced alignment by adding extra spaces isn't done here either. I have not altered those in the attached.Did some word-smithing on the first paragraph. The part about the cross-join was hurt by "in some way" and "may be" is not needed.Pointing out that values from both tables doesn't seem like an improvement when the second item covers that and it is more specific in noting that the city name that is joined on appears twice - once from each table.ON expression is more precise and the reader should be ok with the term.Removal of the exercise is good. Not the time to discuss cross join anyway. Given that "ON true" works the cross join form isn't even required.In the FROM clause form I would not add table prefixes to the column names. They are not part of the form changing. If discussion about table prefixing is desired it should be done explicitly and by itself. They are used later on, I didn't check to see whether that was covered or might be confusing.I suggested a wording for why to use join syntax that doesn't involve legacy and points out its merit compared to sticking a join expression into the where clause.The original patch missed having the syntax for the first left outer join conform to the multi-line query writing standard you introduced. I did not change.The "AND" ON clause should just go with (not changed):ON (w1.temp_lo < w2.temp_loAND w1.temp_hi > w2.temp_high);Attaching my suggestions made on top of the attached original 0002-query.patchDavid J.(Hopefully) I have integrated all of David's suggestions as well as the following rules:
- Syntax formatting with the previously used 4 spaces plus newline for JOIN
- Table aliases only when necessary or explicitly discussed
The discussion about the explicit vs. implicit syntax is added to the "As join expressions serve a specific purpose ... " sentence and creates a paragraph of its own.
The patch is build on top of master.
-- J. Purtz
Why do you use parenthesis for ON clause? It is useless. SQL is not C or JAVA.
On Thu, Oct 22, 2020 at 8:14 AM Pavel Stehule <pavel.stehule@gmail.com> wrote:Why do you use parenthesis for ON clause? It is useless. SQL is not C or JAVA.At this point in my career it's just a personal habit. I never programmed C, done most of my development in Java so maybe that's a subconscious influence?I suspect it is partly because I seldom need to use "ON" but instead join with "USING" which does require the parentheses, so when I need to use ON I just keep them.I agree they are unnecessary in the example and should be removed to be consistent.
David J.
On 22.10.20 17:14, Pavel Stehule wrote: > > Why do you use parenthesis for ON clause? It is useless. SQL is not C > or JAVA. Two more general answers: - Why do people use tabs, spaces, and newlines to format their code even though it's not necessary? SQL is a language to develop applications. And what are the main costs of an application? It's not the time which it takes to develop them. It's the time for their maintenance. During the course of one or more decades, different persons will have to read the code, add additional features, and fix bugs. They need some time to read and understand the existing code. This task can be accelerated if the code is easy to read. Therefore, it's a good habit of developers to sometimes spend some extra characters to the code than is required - not only comments. An example: there are clear precedence rules for Boolean operators NOT/AND/OR. In an extensive statement it may be helpful - for the developer himself as well as for anybody else -to use newlines and parentheses at places where they are not necessary to keep an overview of the intention of the statement. In such cases, code-optimization is the duty of the compiler, not of the developer. - In my professional life as a software developer, I have seen about 15 different languages. But only in rare cases, they have offered new features or concepts. To overcome this Babylonian linguistic diversity I tend to use such syntactical constructs which are common to many of them even, even if they are not necessary for the concrete language. And the concrete answer: Omitting the parentheses for the join condition raises the danger that its Boolean operators are mixed with the Boolean operators of the WHERE condition. The result at runtime is the same, but a reader will understand the intention of the statement faster if the parentheses exists. -- J. Purtz
On 22.10.20 17:14, Pavel Stehule wrote:
>
> Why do you use parenthesis for ON clause? It is useless. SQL is not C
> or JAVA.
Two more general answers:
- Why do people use tabs, spaces, and newlines to format their code even
though it's not necessary? SQL is a language to develop applications.
And what are the main costs of an application? It's not the time which
it takes to develop them. It's the time for their maintenance. During
the course of one or more decades, different persons will have to read
the code, add additional features, and fix bugs. They need some time to
read and understand the existing code. This task can be accelerated if
the code is easy to read. Therefore, it's a good habit of developers to
sometimes spend some extra characters to the code than is required -
not only comments. An example: there are clear precedence rules for
Boolean operators NOT/AND/OR. In an extensive statement it may be
helpful - for the developer himself as well as for anybody else -to use
newlines and parentheses at places where they are not necessary to keep
an overview of the intention of the statement. In such cases,
code-optimization is the duty of the compiler, not of the developer.
- In my professional life as a software developer, I have seen about 15
different languages. But only in rare cases, they have offered new
features or concepts. To overcome this Babylonian linguistic diversity I
tend to use such syntactical constructs which are common to many of them
even, even if they are not necessary for the concrete language.
And the concrete answer: Omitting the parentheses for the join condition
raises the danger that its Boolean operators are mixed with the Boolean
operators of the WHERE condition. The result at runtime is the same, but
a reader will understand the intention of the statement faster if the
parentheses exists.
--
J. Purtz