Re: Add semi-join pushdown to postgres_fdw - Mailing list pgsql-hackers

From Alexander Pyhalov
Subject Re: Add semi-join pushdown to postgres_fdw
Date
Msg-id 816fa8b1bc2da09a87484d1ef239a332@postgrespro.ru
Whole thread Raw
In response to RE: Add semi-join pushdown to postgres_fdw  ("Fujii.Yuki@df.MitsubishiElectric.co.jp" <Fujii.Yuki@df.MitsubishiElectric.co.jp>)
Responses RE: Add semi-join pushdown to postgres_fdw  ("Fujii.Yuki@df.MitsubishiElectric.co.jp" <Fujii.Yuki@df.MitsubishiElectric.co.jp>)
Re: Add semi-join pushdown to postgres_fdw  (Tomas Vondra <tomas.vondra@enterprisedb.com>)
List pgsql-hackers
Hi, Yuki.

Thanks for looking at this patch.

Fujii.Yuki@df.MitsubishiElectric.co.jp писал 2022-12-03 06:02:

> question1)
>   > + if (jointype == JOIN_SEMI && bms_is_member(var->varno,
> innerrel->relids) && !bms_is_member(var->varno, outerrel->relids))
>   It takes time for me to find in what case this condition is true.
>   There is cases in which this condition is true for semi-join of two 
> baserels
>   when running query which joins more than two relations such as
> query2 and query3.
>   Running queries such as query2, you maybe want to pushdown of only
> semi-join path of
>   joinrel(outerrel) defined by (f_t1 a1 join f_t3 a2 on a1.c1 = a2.c1)
> and baserel(innerrel) f_t3
>   because of safety deparse. So you add this condition.
>   Becouase of this limitation, your patch can't push down subquery 
> expression
>   "exists (select null from f_t2 where c1 = a1.c1)" in query3.
>   I think, it is one of difficulty points for semi-join pushdown.
>   This is my understanding of the intent of this condition and the
> restrictions imposed by this condition.
>   Is my understanding right?

IIRC, planner can create semi-join, which targetlist references Vars 
from inner join relation. However, it's deparsed as exists and so we 
can't reference it from SQL. So, there's this check - if Var is 
referenced in semi-join target list, it can't be pushed down.
You can see this if comment out this check.

EXPLAIN (verbose, costs off)
     SELECT ft2.*, ft4.* FROM ft2 INNER JOIN
     (SELECT * FROM ft4 WHERE EXISTS (
         SELECT 1 FROM ft2 WHERE ft2.c2=ft4.c2)) ft4
     ON ft2.c2 = ft4.c1
     INNER JOIN
     (SELECT * FROM ft2 WHERE EXISTS (
         SELECT 1 FROM ft4 WHERE ft2.c2=ft4.c2)) ft21
     ON ft2.c2 = ft21.c2
     WHERE ft2.c1 > 900
     ORDER BY ft2.c1 LIMIT 10;

will fail with
EXPLAIN SELECT r8.c2, r9.c2 FROM "S 1"."T 1" r8 WHERE (EXISTS (SELECT 
NULL FROM "S 1"."T 3" r9 WHERE ((r8.c2 = r9.c2))))

Here you can see that
SELECT * FROM ft2 WHERE EXISTS (
         SELECT 1 FROM ft4 WHERE ft2.c2=ft4.c2)

was transformed to
SELECT r8.c2, r9.c2 FROM "S 1"."T 1" r8 WHERE (EXISTS (SELECT NULL FROM 
"S 1"."T 3" r9 WHERE ((r8.c2 = r9.c2))))

where our exists subquery is referenced from tlist. It's fine for plan 
(relations, participating in semi-join, can be referenced in tlist),
but is not going to work with EXISTS subquery.
BTW, there's a comment in joinrel_target_ok(). It tells exactly that -

5535             if (jointype == JOIN_SEMI && bms_is_member(var->varno, 
innerrel->relids) && !bms_is_member(var->varno, outerrel->relids))
5536             {
5537                 /* We deparse semi-join as exists() subquery, and 
so can't deparse references to inner rel in join target list. */
5538                 ok = false;
5539                 break;
5540             }

Expanded comment.

> question2) In foreign_join_ok
>   > * Constructing queries representing ANTI joins is hard, hence
>   Is this true? Is it hard to expand your approach to ANTI join 
> pushdown?

I haven't tried, so don't know.

> question3) You use variables whose name is "addl_condXXX" in the 
> following code.
>   > appendStringInfo(addl_conds, "EXISTS (SELECT NULL FROM %s",
> join_sql_i.data);
>   Does this naming mean additional literal?
>   Is there more complehensive naming, such as "subquery_exprXXX"?

The naming means additional conditions (for WHERE clause, by analogy 
with ignore_conds and remote_conds). Not sure if subquery_expr sounds 
better, but if you come with better idea, I'm fine with renaming them.

> question4) Although really detail, there is expression making space 
> such as
>   "ft4.c2 = ft2.c2" and one making no space such as "c1=ftupper.c1".
>   Is there reason for this difference? If not, need we use same policy
> for making space?
> 

Fixed.

-- 
Best regards,
Alexander Pyhalov,
Postgres Professional
Attachment

pgsql-hackers by date:

Previous
From: Richard Guo
Date:
Subject: A problem about ParamPathInfo for an AppendPath
Next
From: Etsuro Fujita
Date:
Subject: Re: Allow batched insert during cross-partition updates