Re: Creating foreign key on partitioned table is too slow - Mailing list pgsql-hackers

From Ashutosh Bapat
Subject Re: Creating foreign key on partitioned table is too slow
Date
Msg-id CAExHW5t4iwBfDnjH60n6Yz_WquOefBd5yvAgn4SecbW13rUwhA@mail.gmail.com
In response to Re: Creating foreign key on partitioned table is too slow  (Tom Lane <tgl@sss.pgh.pa.us>)
List pgsql-hackers
On Fri, Sep 4, 2020 at 12:05 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:
>
> Amit Langote <amitlangote09@gmail.com> writes:
> >> Fwiw, I am fine with applying the memory-leak fix in all branches down
> >> to v12 if we are satisfied with the implementation.
>
> > I have revised the above patch slightly to introduce a variable for
> > the condition whether to use a temporary memory context.
>
> This CF entry has been marked "ready for committer", which I find
> inappropriate since there doesn't seem to be any consensus about
> whether we need it.
>
> I tried running the original test case under HEAD.  I do not see
> any visible memory leak, which I think indicates that 5b9312378 or
> some other fix has taken care of the leak since the original report.
> However, after waiting awhile and noting that the ADD FOREIGN KEY
> wasn't finishing, I poked into its progress with a debugger and
> observed that each iteration of RI_Initial_Check() was taking about
> 15 seconds.  I presume we have to do that for each partition,
> which leads to the estimate that it'll take 34 hours to finish this
> ... and that's with no data in the partitions, god help me if
> there'd been a lot.
>
> Some quick "perf" work says that most of the time seems to be
> getting spent in the planner, in get_eclass_for_sort_expr().
> So this is likely just a variant of performance issues we've
> seen before.  (I do wonder why we're not able to prune the
> join to just the matching PK partition, though.)
>

Consider this example:
postgres=# create table t1 (a int, b int, CHECK (a between 100 and 150));
CREATE TABLE
postgres=# create table part(a int, b int) partition by range(a);
CREATE TABLE
postgres=# create table part_p1 partition of part for values from (0) to (50);
CREATE TABLE
postgres=# create table part_p2 partition of part for values from (50) to (100);
CREATE TABLE
postgres=# create table part_p3 partition of part for values from (100) to (150);
CREATE TABLE
postgres=# create table part_p4 partition of part for values from (150) to (200);
CREATE TABLE
postgres=# explain (costs off) select * from t1 r1, part r2 where r1.a = r2.a;
              QUERY PLAN
--------------------------------------
 Hash Join
   Hash Cond: (r2.a = r1.a)
   ->  Append
         ->  Seq Scan on part_p1 r2_1
         ->  Seq Scan on part_p2 r2_2
         ->  Seq Scan on part_p3 r2_3
         ->  Seq Scan on part_p4 r2_4
   ->  Hash
         ->  Seq Scan on t1 r1
(9 rows)

Given that t1.a cannot hold any value less than 100 or greater than
150, no row in t1 can have a joining partner in part_p1 or part_p2,
so those two partitions could be pruned. But I don't think we consider
the CHECK constraints on a table when joining it to another table,
either to render the join empty or to prune partitions. That would be
a good optimization, and it would improve this case as well.
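
For illustration, if the range implied by t1's CHECK constraint is
written out explicitly (which is roughly what deriving it from the
constraint would amount to), the planner should already prune the
non-matching partitions. On the tables above, something like

explain (costs off) select * from t1 r1, part r2
  where r1.a = r2.a and r2.a between 100 and 150;

should end up scanning only part_p3 and part_p4 under the Append
(the exact plan shape may differ).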

Further to that, when we add a constraint on the partitioned table
that translates to constraints on the individual partitions, I think
we should validate the entire partitioned relation rather than each
partition separately. If we do that, we won't need to plan the
validation query for every partition. If the foreign key happens to be
the partition key, e.g. in a star schema, that single query can use
partitionwise join to further improve performance. Somewhere in the
future, we may even be able to repartition the foreign key table by
the foreign key and perform a partitionwise join.
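
To make that concrete, IIRC the validation query built by
RI_Initial_Check() is roughly of the shape below, and today it is
issued once per leaf partition of the referencing table (the names
fk_part_p1 and pk_tab here are only illustrative, and details such as
ONLY, quoting and casts differ in the real query):

SELECT fk.a FROM ONLY fk_part_p1 fk
  LEFT OUTER JOIN pk_tab pk ON pk.a = fk.a
  WHERE pk.a IS NULL AND fk.a IS NOT NULL;

Running it once against the partitioned referencing table as a whole
(i.e. without ONLY, letting it cover all partitions) would mean
planning it only once, and with matching partition keys the planner
could execute it as a partitionwise join.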

-- 
Best Wishes,
Ashutosh Bapat


