Re: Get rid of runtime handling of AlternativeSubPlan? - Mailing list pgsql-hackers

From Andy Fan
Subject Re: Get rid of runtime handling of AlternativeSubPlan?
Date
Msg-id CAKU4AWpZKKXG5wC8rfLYpRC9_+NRdiKVhEh4okJedoCEKs03Uw@mail.gmail.com
Whole thread Raw
In response to Re: Get rid of runtime handling of AlternativeSubPlan?  (Tom Lane <tgl@sss.pgh.pa.us>)
List pgsql-hackers


On Tue, Sep 1, 2020 at 1:42 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:
Andy Fan <zhihui.fan1213@gmail.com> writes:
> Thank you for this code!  I still have some confusion about when a SubPlan
> should be executed when a join is involved.  I care about this because this
> has an impact on when we can get the num_exec for a subplan.

> The subplan in a target list,  it is executed after the join in my case.
> The subplan
> can be execute after the scan of T1(see below example) and it can also be
> executed
> after the join. Which one is better depends on which methods make the
> num_exec
> smaller.  Is it something we already considered?

Uh, I'm not following your concern.  SubPlans appearing in the join
targetlist *must* be executed "after the join", ie only for valid
join rows.  Otherwise we could have cases where, say, they throw
errors that should not occur. 

I am feeling I'm wrong somewhere however I can't figure it out until now. 

Q1:  select (select t3.a from t3 where t3.c = t1.c) from t1, t2 where t1.b = t2.b;

should equals Q2: 

1. select (select t3.a from t3 where t3.c = t1.c) as a, b from t1  ==>  t13.  
2. select t13.a from t13, t2 where t13.b = t2.b; 

With the following data,  Q1 will execute the subplan twice (since we get 2 rows
after join t1, t2).  while Q2 executes the subplan once (since t1 has only 1 row).
Finally the result is the same. 

postgres=# select * from t1;
 a | b | c
---+---+---
 1 | 1 | 1
(1 row)

postgres=# select * from t2;
 a | b | c
---+---+---
 1 | 1 | 1
 1 | 1 | 2
(2 rows)

postgres=# select * from t3;
 a | b | c
---+---+---
 1 | 1 | 1
(1 row)

On the other hand, SubPlans appearing
in the join's qual conditions have to be executed "before the join",
although exactly what that means is fuzzy because we don't make any
promises about the relative ordering of different qual conditions.

> When the subplan is in a Qual, it is supposed to be executed as soon as
> possible,
> The current implementation matches the below cases.  So can we say we
> knows the num_execs of SubPlan just after we plan the dependent rels?

I wouldn't say so.  If the SubPlan's qual actually only depends on one
of the input rels, distribute_qual_to_rels would have pushed it down
further than the join. Among the quals that do have to be evaluated at the join,
a qual involving a SubPlan is best executed last on cost
grounds, or so I'd guess anyway.  So the number of executions is probably
less than the product of the input rel sizes.  That's what motivates
the choice of NUM_EXEC_QUAL in my patch.

Understand now.  Thank you!

--
Best Regards
Andy Fan

pgsql-hackers by date:

Previous
From: Peter Geoghegan
Date:
Subject: Re: Boundary value check in lazy_tid_reaped()
Next
From: Kyotaro Horiguchi
Date:
Subject: Re: Append with naive multiplexing of FDWs