Thread: Possible documentation inaccuracy in optimizer README

Possible documentation inaccuracy in optimizer README

From
Zeyuan Hu
Date:
Hello,

In https://github.com/postgres/postgres/tree/master/src/backend/optimizer, there are two examples on the dynamic programming (DP) algorithm used in the optimizer, which I think 
have some inaccuracy:
SELECT  *    FROM    tab1, tab2, tab3, tab4    WHERE   tab1.col = tab2.col AND        tab2.col = tab3.col AND        tab3.col = tab4.col
    Tables 1, 2, 3, and 4 are joined as:    {1 2},{2 3},{3 4}    {1 2 3},{2 3 4}    {1 2 3 4}    (other possibilities will be excluded for lack of join clauses)
    SELECT  *    FROM    tab1, tab2, tab3, tab4    WHERE   tab1.col = tab2.col AND        tab1.col = tab3.col AND        tab1.col = tab4.col
    Tables 1, 2, 3, and 4 are joined as:    {1 2},{1 3},{1 4}    {1 2 3},{1 3 4},{1 2 4}    {1 2 3 4}
Both examples seem to ignore the transitivity imposed by the join clauses, which creates a gap between the documentation here and what the actual optimizer does. 
For the first example, in the actual optimizer, {1,3},{1,4},{2,4} are also considered. For the second example, {2,3}, {2,4}, {3,4} are also considered. 
The reason is that, for example 1, tab1.col = tab2.col AND tab2.col = tab3.col imply that tab1.col = tab3.col due to transitivity, which leads to the consideration of {1,3}. 
The rest of the missing entries follow the same reasoning.

best
regards,


Zeyuan

Re: Possible documentation inaccuracy in optimizer README

From
Tom Lane
Date:
Zeyuan Hu <ferrishu3886@gmail.com> writes:
> In https://github.com/postgres/postgres/tree/master/src/backend/optimizer,
> there are two examples on the dynamic programming (DP) algorithm used in
> the optimizer, which I think
> have some inaccuracy:

You're right that these examples do not consider the effects of
clauses generated by the EquivalenceClass machinery.  But I don't
think the exposition would be improved by mentioning that here.
The point of these examples is that we don't consider joining
rels that have no linking clauses at all.

We could possibly avoid the inaccuracy by making the examples use
some other operators that are not equijoins.  But I wonder if that
would not be more confusing rather than less so.

            regards, tom lane



Re: Possible documentation inaccuracy in optimizer README

From
Zeyuan Hu
Date:
Hello Tom,

Thanks for the clarification. I now understand the main goal of the examples. I was confused by the remarks in the example "(other possibilities will be excluded for lack of join clauses)" as this is not a true statement: some possibilities are not shown for lack of join clauses while others are not shown for the sake of simplicity. I think it would be nice to add what you explained somewhere in the text to indicate this is a partial example with the main goal of illustrating join rels that have no linking clauses are not considered by the optimizer; I got the impression that these two examples are to illustrate how DP works in the optimizer.

best
regards,


Zeyuan


On Mon, Apr 7, 2025 at 9:28 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
Zeyuan Hu <ferrishu3886@gmail.com> writes:
> In https://github.com/postgres/postgres/tree/master/src/backend/optimizer,
> there are two examples on the dynamic programming (DP) algorithm used in
> the optimizer, which I think
> have some inaccuracy:

You're right that these examples do not consider the effects of
clauses generated by the EquivalenceClass machinery.  But I don't
think the exposition would be improved by mentioning that here.
The point of these examples is that we don't consider joining
rels that have no linking clauses at all.

We could possibly avoid the inaccuracy by making the examples use
some other operators that are not equijoins.  But I wonder if that
would not be more confusing rather than less so.

                        regards, tom lane

Re: Possible documentation inaccuracy in optimizer README

From
David Rowley
Date:
On Tue, 8 Apr 2025 at 16:28, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> We could possibly avoid the inaccuracy by making the examples use
> some other operators that are not equijoins.  But I wonder if that
> would not be more confusing rather than less so.

I don't think it'd hurt to mention that we're just ignoring the
existence of ECs for this example. Something like:

--- a/src/backend/optimizer/README
+++ b/src/backend/optimizer/README
@@ -143,7 +143,10 @@ For example:
     {1 2},{2 3},{3 4}
     {1 2 3},{2 3 4}
     {1 2 3 4}
-    (other possibilities will be excluded for lack of join clauses)
+    (other possibilities will be excluded for lack of join clauses
+    (technically, EquivalenceClasses do allow us to determine derived join
+    clauses for this case, but we ignore that for the simplicity of this
+    example))

     SELECT  *
     FROM    tab1, tab2, tab3, tab4

If it'll stop a future question or someone from being confused, it
seems worthwhile.

David



Re: Possible documentation inaccuracy in optimizer README

From
Tom Lane
Date:
David Rowley <dgrowleyml@gmail.com> writes:
> I don't think it'd hurt to mention that we're just ignoring the
> existence of ECs for this example.

Seems like a reasonable approach.

> -    (other possibilities will be excluded for lack of join clauses)
> +    (other possibilities will be excluded for lack of join clauses
> +    (technically, EquivalenceClasses do allow us to determine derived join
> +    clauses for this case, but we ignore that for the simplicity of this
> +    example))

Maybe better:

    Other possibilities will be excluded for lack of join clauses.
    (In reality, use of EquivalenceClasses would allow us to
    deduce additional join clauses that allow more join
    combinations, but here we ignore that to preserve the
    simplicity of this example.)

            regards, tom lane



Re: Possible documentation inaccuracy in optimizer README

From
David Rowley
Date:
On Wed, 9 Apr 2025 at 14:33, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Maybe better:
>
>         Other possibilities will be excluded for lack of join clauses.
>         (In reality, use of EquivalenceClasses would allow us to
>         deduce additional join clauses that allow more join
>         combinations, but here we ignore that to preserve the
>         simplicity of this example.)

Looks good to me.

David



Re: Possible documentation inaccuracy in optimizer README

From
Tom Lane
Date:
David Rowley <dgrowleyml@gmail.com> writes:
> On Wed, 9 Apr 2025 at 14:33, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> Maybe better:
>> 
>> Other possibilities will be excluded for lack of join clauses.
>> (In reality, use of EquivalenceClasses would allow us to
>> deduce additional join clauses that allow more join
>> combinations, but here we ignore that to preserve the
>> simplicity of this example.)

> Looks good to me.

OK, done.  I moved the text after noticing that it really applies
to both of the examples here.

            regards, tom lane