Re: Consider parallel for lateral subqueries with limit - Mailing list pgsql-hackers

From Robert Haas
Subject Re: Consider parallel for lateral subqueries with limit
Date
Msg-id CA+TgmoY+zjkz9USjMMhY_QoQoW_Lsj1URwwcMHxApu1RvdoaEw@mail.gmail.com
Whole thread Raw
In response to Re: Consider parallel for lateral subqueries with limit  (James Coleman <jtc331@gmail.com>)
Responses Re: Consider parallel for lateral subqueries with limit
List pgsql-hackers
On Mon, Sep 19, 2022 at 3:58 PM James Coleman <jtc331@gmail.com> wrote:
> But in the case where there's correlation via LATERAL we already don't
> guarantee unique executions for a given set of params into the lateral
> subquery execution, right? For example, suppose we have:
>
>   select *
>   from foo
>   left join lateral (
>     select n
>     from bar
>     where bar.a = foo.a
>     limit 1
>   ) on true
>
> and suppose that foo.a (in the table scan) returns these values in
> order: 1, 2, 1. In that case we'll execute the lateral subquery 3
> separate times rather than attempting to order the results of foo.a
> such that we can re-execute the subquery only when the param changes
> to a new unique value (and we definitely don't cache the results to
> guarantee unique subquery executions).

I think this is true, but I don't really understand why we should
focus on LATERAL here. What we really need, and I feel like we've
talked about this before, is a way to reason about where parameters
are set and used. Your sample query gets a plan like this:

 Nested Loop Left Join  (cost=0.00..1700245.00 rows=10000 width=8)
   ->  Seq Scan on foo  (cost=0.00..145.00 rows=10000 width=4)
   ->  Limit  (cost=0.00..170.00 rows=1 width=4)
         ->  Seq Scan on bar  (cost=0.00..170.00 rows=1 width=4)
               Filter: (foo.a = a)

If this were to occur inside a larger plan tree someplace, it would be
OK to insert a Gather node above the Nested Loop node without doing
anything further, because then the parameter that stores foo.a would
be both set and used in the worker. If you wanted to insert a Gather
at any other place in this plan, things get more complicated. But just
because you have LATERAL doesn't mean that you have this problem,
because if you delete the "limit 1" then the subqueries get flattened
together and the parameter disappears, and if you delete the lateral
reference (i.e. WHERE foo.a = bar.a) then there's still a subquery but
it no longer refers to an outer parameter. And on the flip side just
because you don't have LATERAL doesn't mean that you don't have this
problem. e.g. the query could instead be:

select *, (select n from bar where bar.a = foo.a limit 1) from foo;

...which I think is pretty much equivalent to your formulation and has
the same problem as far as parallel query as your formulation but does
not involve the LATERAL keyword.

-- 
Robert Haas
EDB: http://www.enterprisedb.com



pgsql-hackers by date:

Previous
From: James Coleman
Date:
Subject: Re: Consider parallel for lateral subqueries with limit
Next
From: David Rowley
Date:
Subject: Re: Fix typos in code comments