Re: Parallel Inserts in CREATE TABLE AS - Mailing list pgsql-hackers

From Bharath Rupireddy
Subject Re: Parallel Inserts in CREATE TABLE AS
Date
Msg-id CALj2ACU8CHmE20dJhnr125s1oCugtgUwJSUbsLkJi3=ymnMKyQ@mail.gmail.com
Whole thread Raw
In response to Re: Parallel Inserts in CREATE TABLE AS  (Amit Kapila <amit.kapila16@gmail.com>)
Responses Re: Parallel Inserts in CREATE TABLE AS
List pgsql-hackers
On Thu, Oct 15, 2020 at 3:18 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
>
> > > > 1. How to represent the parallel insert for CTAS in explain plans? The
> > > > explain CTAS shows the plan for only the SELECT part. How about having
> > > > some textual info along with the Gather node? I'm not quite sure on
> > > > this point, any suggestions are welcome.
> > >
> > > I am also not sure about this point because we don't display anything
> > > for the DDL part in explain. Can you propose by showing some example
> > > of what you have in mind?
> >
> > I thought we could have something like this.
> >  -----------------------------------------------------------------------------
> >      Gather  (cost=1000.00..108738.90 rows=0 width=8)
> >      Workers Planned: 2 Parallel Insert on t_test1
> >         ->  Parallel Seq Scan on t_test  (cost=0.00..106748.00 rows=4954 width=8)
> >              Filter: (many < 10000)
> >  -----------------------------------------------------------------------------
>
> maybe something like below:
> Gather  (cost=1000.00..108738.90 rows=0 width=8)
>    -> Create t_test1
>        ->  Parallel Seq Scan on t_test
>
> I don't know what is the best thing to do here. I think for the
> temporary purpose you can keep something like above then once the
> patch is matured then we can take a separate opinion for this.
>

Agreed. Here's a snapshot of explain with the change suggested.

postgres=# EXPLAIN (ANALYZE, COSTS OFF) CREATE TABLE t1_test AS SELECT * FROM t1;
                                   QUERY PLAN                            
---------------------------------------------------------------------------------
 Gather (actual time=970.524..972.913 rows=0 loops=1)
   ->  Create t1_test
     Workers Planned: 2
     Workers Launched: 2
     ->  Parallel Seq Scan on t1 (actual time=0.028..86.623 rows=333333 loops=3)
 Planning Time: 0.049 ms
 Execution Time: 973.733 ms

>
> I think there is no reason why one can't use ORDER BY in the
> statements we are talking about here. But, I think we can't enable
> parallelism for GatherMerge is because for that node we always need to
> fetch the data in the leader backend to perform the final merge phase.
> So, I was expecting a small comment saying something on those lines.
>

Added comments.

>
> 2. Addition of new test cases.
>

Added new test cases.

>
> Analysis on the 2 mismatches in write_parallel.sql regression test.
>

Done. It needed a small code change in costsize.c. Now, both make check and make check-world passes.

Apart from above, a couple of other things I have finished with the v3 patch.

1. Both make check and make check-world with force_parallel_mode = regress passes.
2. I enabled parallel inserts in case of materialized views. Hope that's fine.

Attaching v3 patch herewith.

I'm done with all the open points in my list. Please review the v3 patch and provide comments.

With Regards,
Bharath Rupireddy.
Attachment

pgsql-hackers by date:

Previous
From: Justin Pryzby
Date:
Subject: Re: Probable documentation errors or improvements
Next
From: Tom Lane
Date:
Subject: Re: Probable documentation errors or improvements