Thread: Possible documentation inaccuracy in optimizer README
Hello,
In https://github.com/postgres/postgres/tree/master/src/backend/optimizer, there are two examples on the dynamic programming (DP) algorithm used in the optimizer, which I think
have some inaccuracy:
SELECT * FROM tab1, tab2, tab3, tab4 WHERE tab1.col = tab2.col AND tab2.col = tab3.col AND tab3.col = tab4.col Tables 1, 2, 3, and 4 are joined as: {1 2},{2 3},{3 4} {1 2 3},{2 3 4} {1 2 3 4} (other possibilities will be excluded for lack of join clauses) SELECT * FROM tab1, tab2, tab3, tab4 WHERE tab1.col = tab2.col AND tab1.col = tab3.col AND tab1.col = tab4.col Tables 1, 2, 3, and 4 are joined as: {1 2},{1 3},{1 4} {1 2 3},{1 3 4},{1 2 4} {1 2 3 4}Both examples seem to ignore the transitivity imposed by the join clauses, which creates a gap between the documentation here and what the actual optimizer does.
For the first example, in the actual optimizer, {1,3},{1,4},{2,4} are also considered. For the second example, {2,3}, {2,4}, {3,4} are also considered.
The reason is that, for example 1, tab1.col = tab2.col AND tab2.col = tab3.col imply that tab1.col = tab3.col due to transitivity, which leads to the consideration of {1,3}.
The rest of the missing entries follow the same reasoning.
best
Zeyuan
regards,
Zeyuan
Zeyuan Hu <ferrishu3886@gmail.com> writes: > In https://github.com/postgres/postgres/tree/master/src/backend/optimizer, > there are two examples on the dynamic programming (DP) algorithm used in > the optimizer, which I think > have some inaccuracy: You're right that these examples do not consider the effects of clauses generated by the EquivalenceClass machinery. But I don't think the exposition would be improved by mentioning that here. The point of these examples is that we don't consider joining rels that have no linking clauses at all. We could possibly avoid the inaccuracy by making the examples use some other operators that are not equijoins. But I wonder if that would not be more confusing rather than less so. regards, tom lane
Hello Tom,
Thanks for the clarification. I now understand the main goal of the examples. I was confused by the remarks in the example "(other possibilities will be excluded for lack of join clauses)" as this is not a true statement: some possibilities are not shown for lack of join clauses while others are not shown for the sake of simplicity. I think it would be nice to add what you explained somewhere in the text to indicate this is a partial example with the main goal of illustrating join rels that have no linking clauses are not considered by the optimizer; I got the impression that these two examples are to illustrate how DP works in the optimizer.
best
Zeyuan
regards,
Zeyuan
On Mon, Apr 7, 2025 at 9:28 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
Zeyuan Hu <ferrishu3886@gmail.com> writes:
> In https://github.com/postgres/postgres/tree/master/src/backend/optimizer,
> there are two examples on the dynamic programming (DP) algorithm used in
> the optimizer, which I think
> have some inaccuracy:
You're right that these examples do not consider the effects of
clauses generated by the EquivalenceClass machinery. But I don't
think the exposition would be improved by mentioning that here.
The point of these examples is that we don't consider joining
rels that have no linking clauses at all.
We could possibly avoid the inaccuracy by making the examples use
some other operators that are not equijoins. But I wonder if that
would not be more confusing rather than less so.
regards, tom lane
On Tue, 8 Apr 2025 at 16:28, Tom Lane <tgl@sss.pgh.pa.us> wrote: > We could possibly avoid the inaccuracy by making the examples use > some other operators that are not equijoins. But I wonder if that > would not be more confusing rather than less so. I don't think it'd hurt to mention that we're just ignoring the existence of ECs for this example. Something like: --- a/src/backend/optimizer/README +++ b/src/backend/optimizer/README @@ -143,7 +143,10 @@ For example: {1 2},{2 3},{3 4} {1 2 3},{2 3 4} {1 2 3 4} - (other possibilities will be excluded for lack of join clauses) + (other possibilities will be excluded for lack of join clauses + (technically, EquivalenceClasses do allow us to determine derived join + clauses for this case, but we ignore that for the simplicity of this + example)) SELECT * FROM tab1, tab2, tab3, tab4 If it'll stop a future question or someone from being confused, it seems worthwhile. David
David Rowley <dgrowleyml@gmail.com> writes: > I don't think it'd hurt to mention that we're just ignoring the > existence of ECs for this example. Seems like a reasonable approach. > - (other possibilities will be excluded for lack of join clauses) > + (other possibilities will be excluded for lack of join clauses > + (technically, EquivalenceClasses do allow us to determine derived join > + clauses for this case, but we ignore that for the simplicity of this > + example)) Maybe better: Other possibilities will be excluded for lack of join clauses. (In reality, use of EquivalenceClasses would allow us to deduce additional join clauses that allow more join combinations, but here we ignore that to preserve the simplicity of this example.) regards, tom lane
On Wed, 9 Apr 2025 at 14:33, Tom Lane <tgl@sss.pgh.pa.us> wrote: > Maybe better: > > Other possibilities will be excluded for lack of join clauses. > (In reality, use of EquivalenceClasses would allow us to > deduce additional join clauses that allow more join > combinations, but here we ignore that to preserve the > simplicity of this example.) Looks good to me. David
David Rowley <dgrowleyml@gmail.com> writes: > On Wed, 9 Apr 2025 at 14:33, Tom Lane <tgl@sss.pgh.pa.us> wrote: >> Maybe better: >> >> Other possibilities will be excluded for lack of join clauses. >> (In reality, use of EquivalenceClasses would allow us to >> deduce additional join clauses that allow more join >> combinations, but here we ignore that to preserve the >> simplicity of this example.) > Looks good to me. OK, done. I moved the text after noticing that it really applies to both of the examples here. regards, tom lane