Re: Asymmetric partition-wise JOIN - Mailing list pgsql-hackers

From Andrey V. Lepikhov
Subject Re: Asymmetric partition-wise JOIN
Date
Msg-id de20c214-1b26-b94a-081b-5b9aad3f490c@postgrespro.ru
Whole thread Raw
In response to Re: Asymmetric partition-wise JOIN  (Anastasia Lubennikova <a.lubennikova@postgrespro.ru>)
Responses Re: Asymmetric partition-wise JOIN
List pgsql-hackers
On 11/30/20 7:43 PM, Anastasia Lubennikova wrote:
> This entry was inactive during this CF, so I've marked it as returned 
> with feedback. Feel free to resubmit an updated version to a future 
> commitfest. 
I return the patch to commitfest. My current reason differs from reason 
of origin author.
This patch can open a door for more complex optimizations in the 
partitionwise join push-down technique.
I mean, we can push-down join not only of two partitioned tables with 
the same partition schema, but a partitioned (sharded) table with an 
arbitrary subplan that is provable independent of local resources.

Example:

CREATE TABLE p(a int) PARTITION BY HASH (a);
CREATE TABLE p1 PARTITION OF p FOR VALUES WITH (MODULUS 3, REMAINDER 0);
CREATE TABLE p2 PARTITION OF p FOR VALUES WITH (MODULUS 3, REMAINDER 1);
CREATE TABLE p3 PARTITION OF p FOR VALUES WITH (MODULUS 3, REMAINDER 2);

SELECT * FROM p, (SELECT * FROM generate_series(1,2) AS a) AS s
WHERE p.a=s.a;

  Hash Join
    Hash Cond: (p.a = a.a)
    ->  Append
          ->  Seq Scan on p1 p_1
          ->  Seq Scan on p2 p_2
          ->  Seq Scan on p3 p_3
    ->  Hash
          ->  Function Scan on generate_series a

But with asymmetric join feature we have the plan:

  Append
    ->  Hash Join
          Hash Cond: (p_1.a = a.a)
          ->  Seq Scan on p1 p_1
          ->  Hash
                ->  Function Scan on generate_series a
    ->  Hash Join
          Hash Cond: (p_2.a = a.a)
          ->  Seq Scan on p2 p_2
          ->  Hash
                ->  Function Scan on generate_series a
    ->  Hash Join
          Hash Cond: (p_3.a = a.a)
          ->  Seq Scan on p3 p_3
          ->  Hash
                ->  Function Scan on generate_series a

In the case of FDW-sharding it means that if we can prove that the inner 
relation is independent from the execution server, we can push-down 
these joins and execute it in parallel.

-- 
regards,
Andrey Lepikhov
Postgres Professional



pgsql-hackers by date:

Previous
From: David Rowley
Date:
Subject: Re: Result Cache node shows per-worker info even for workers not launched
Next
From: Amit Kapila
Date:
Subject: Re: Replication slot stats misgivings