Re: Reducing duplicativeness of EquivalenceClass-derived clauses - Mailing list pgsql-hackers

From Richard Guo
Subject Re: Reducing duplicativeness of EquivalenceClass-derived clauses
Date
Msg-id CAMbWs4-6-ywf8AoQJ9z48q+v1eL5Scsw-d_nsr9s+dm-Ge4VGA@mail.gmail.com
Whole thread Raw
In response to Reducing duplicativeness of EquivalenceClass-derived clauses  (Tom Lane <tgl@sss.pgh.pa.us>)
Responses Re: Reducing duplicativeness of EquivalenceClass-derived clauses  (Tom Lane <tgl@sss.pgh.pa.us>)
List pgsql-hackers

On Wed, Oct 26, 2022 at 6:09 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:
While fooling with my longstanding outer-join variables changes
(I am making progress on that, honest), I happened to notice that
equivclass.c is leaving some money on the table by generating
redundant RestrictInfo clauses.  It already attempts to not generate
the same clause twice, which can save a nontrivial amount of work
because we cache selectivity estimates and so on per-RestrictInfo.
I realized though that it will build and return a clause like
"a.x = b.y" even if it already has "b.y = a.x".  This is just
wasteful.  It's always been the case that equivclass.c will
produce clauses that are ordered according to its own whims.
Consumers that need the operands in a specific order, such as
index scans or hash joins, are required to commute the clause
to be the way they want it while building the finished plan.
Therefore, it shouldn't matter which order of the operands we
return, and giving back the commutator clause if available could
potentially save as much as half of the selectivity-estimation
work we do with these clauses subsequently.

Hence, PFA a patch that adjusts create_join_clause() to notice
commuted as well as exact matches among the EquivalenceClass's
existing clauses.  This results in a number of changes visible in
regression test cases, but they're all clearly inconsequential.
 
I think there is no problem with this idea, given the operands of
EC-derived clauses are commutative, and it seems no one would actually
rely on the order of the operands.  I can see hashjoin/mergejoin would
commute hash/merge joinclauses if needed with get_switched_clauses().
 

The only thing that I think might be controversial here is that
I dropped the check for matching operator OID.  To preserve that,
we'd have needed to use get_commutator() in the reverse-match cases,
which it seemed to me would be a completely unjustified expenditure
of cycles.  The operators we select for freshly-generated clauses
will certainly always match those of previously-generated clauses.
Maybe there's a chance that they'd not match those of ec_sources
clauses (that is, the user-written clauses we started from), but
if they don't and that actually makes any difference then surely
we are talking about a buggy opclass definition.
 
The operator is chosen according to the two given EC members's data
type.  Since we are dealing with the same pair of EC members, I think
the operator is always the same one. So it also seems no problem to drop
the check for operator. I wonder if we can even add an assertion if
we've found a RestrictInfo from ec_derives that the operator matches.

Thanks
Richard

pgsql-hackers by date:

Previous
From: David Rowley
Date:
Subject: Re: Have nodeSort.c use datum sorts single-value byref types
Next
From: John Naylor
Date:
Subject: Re: [PoC] Improve dead tuple storage for lazy vacuum