Re: CTE push down - Mailing list pgsql-hackers

From Ashutosh Bapat
Subject Re: CTE push down
Msg-id CAExHW5vBx3RuEO2cAG0hhHY5MayrMcvdeGrapZUc5rGsS6sSTQ@mail.gmail.com
In response to CTE push down  (Alexander Pyhalov <a.pyhalov@postgrespro.ru>)
Responses Re: CTE push down  (Alexander Pyhalov <a.pyhalov@postgrespro.ru>)
List pgsql-hackers
On Tue, Apr 13, 2021 at 6:58 PM Alexander Pyhalov
<a.pyhalov@postgrespro.ru> wrote:
>
> Hi.
>
> Currently PostgreSQL supports CTE push down for SELECT statements, but
> it is implemented by turning each CTE reference into a subquery.
>
> When a CTE is referenced multiple times, we have a choice: materialize
> the CTE (and disable distribution of quals to the CTE query) or inline
> it (and so run the CTE query multiple times, which can be inefficient,
> for example, when the CTE references foreign tables).
>
> I was looking into whether it is possible to collect the quals
> referencing a CTE, combine them into an OR qual and add it to the CTE
> query.
>
> So far I am considering the following changes.
>
> 1) Modify SS_process_ctes() to add a list of RestrictInfo* to
> PlannerInfo, with one NULL RestrictInfo pointer per CTE (let's call this
> list cte_restrictinfos for now).
> 2) In distribute_restrictinfo_to_rels(), when we get a rel of RTE_CTE
> rtekind and are sure that we can safely push down the restrictinfo,
> preserve it in cte_restrictinfos, combining multiple restrictions into
> an "OR" RestrictInfo.
> 3) At the end of subquery_planner() (after inheritance_planner() or
> grouping_planner()) we can check whether cte_restrictinfos contains any
> non-NULL RestrictInfo pointers and re-create the plan for the
> corresponding CTEs, distributing the quals to relations inside the CTE
> queries.
>
> For now I'm not sure how to handle Var mapping when we push the
> restrictinfos to the level of the CTE root or when we push them down to
> the CTE plan, but properly mapping Vars seems to be doable.

I think similar mapping happens when we push quals that reference a
named JOIN down to join rels. I haven't looked at it, but I think it
happens before planning time. Still, some similar machinery might help
in this case.

I believe step 2 is needed to avoid materializing rows that will never
be selected. That would be a good improvement. However, care needs to
be taken with volatile quals. I think the quals on the CTE will be
evaluated twice: once when materializing the CTE result and a second
time when scanning the materialized result. Volatile quals may produce
different results when run multiple times.

>
> Is there something else I am missing?
> Is anybody working on an alternative solution, or does anyone see
> issues with such an approach?

IMO, a POC patch would help in understanding your idea.

-- 
Best Wishes,
Ashutosh Bapat


