Re: Significant Execution Time Difference Between PG13.14 and PG16.4 for Query on information_schema Tables. - Mailing list pgsql-hackers

From Tom Lane
Subject Re: Significant Execution Time Difference Between PG13.14 and PG16.4 for Query on information_schema Tables.
Date
Msg-id 3478841.1724878067@sss.pgh.pa.us
Whole thread Raw
In response to Re: Significant Execution Time Difference Between PG13.14 and PG16.4 for Query on information_schema Tables.  (Richard Guo <guofenglinux@gmail.com>)
Responses Re: Significant Execution Time Difference Between PG13.14 and PG16.4 for Query on information_schema Tables.
List pgsql-hackers
Richard Guo <guofenglinux@gmail.com> writes:
> Exactly.  What Tom's patch does is that if the expression contains
> Vars/PHVs that belong to the subquery, and does not contain any
> non-strict constructs, then it can escape being wrapped.

> In expression 't1.two+t2.two', 't2.two' is a Var that belongs to the
> subquery, and '+' is strict, so it can escape being wrapped.

> The expression 't1.two+1' does not meet these conditions, so it is
> wrapped into a PHV, and the PHV contains lateral reference to t1,
> which results in a nestloop join with a parameterized inner path.
> That's why Memoize can work in this query.

Yeah.  (I'd missed that t1.two is a lateral reference and t2.two is
not; sorry for the noise.)

What happens as of HEAD is that, because we wrap this subquery output
in a PHV marked as due to be evaluated at t2, the entire clause

    (t1.two+t2.two) = t2.unique1

becomes a base restriction clause for t2, so that when we generate
a path for t2 it will include this as a path qual (forcing the path
to be laterally dependent on t1).  Without the PHV, it's just an
ordinary join clause and it will not be evaluated at scan level
unless it can be turned into an indexqual --- which it can't.

The preceding regression-test case with "t1.two+1 = t2.unique1"
can be made into a parameterized indexscan on t2.unique1, so it is,
and then memoize can trigger off that.

I'm inclined to think that treating such a clause as a join clause
is strictly better than what happens now, so I'm not going to
apologize for the PHV not being there.  If you wanted to cast
blame, you could look to set_plain_rel_pathlist, where it says

     * We don't support pushing join clauses into the quals of a seqscan, but
     * it could still have required parameterization due to LATERAL refs in
     * its tlist.

(This comment could stand some work, as it fails to note that
labeling the path with required parameterization can result in
"join clauses" being evaluated there anyway.)

In the normal course of things I'd be dubious about the value of
pushing join clauses into a seqscan, but maybe the possibility of a
memoize'd join has moved the goalposts enough that we should
consider that.  Alternatively, maybe get_memoized_path should take
more responsibility for devising plausible subpaths rather than
assuming they'll be handed to it on a platter.  (I don't remember
all the conditions checked in add_path, but I wonder if we are
missing some potential memoize applications because suitable paths
fail to survive the scan rel's add_path tournament.)

In the meantime, I think this test case is mighty artificial,
and it wouldn't bother me any to just take it out again for the
time being.

            regards, tom lane



pgsql-hackers by date:

Previous
From: Robert Haas
Date:
Subject: Re: allowing extensions to control planner behavior
Next
From: Thomas Munro
Date:
Subject: Re: Streaming read-ready sequential scan code