Home > mailing lists

Re: Parallel Inserts in CREATE TABLE AS - Mailing list pgsql-hackers

From	Bharath Rupireddy
Subject	Re: Parallel Inserts in CREATE TABLE AS
Date	October 19, 2020 17:17:01
Msg-id	CALj2ACU8CHmE20dJhnr125s1oCugtgUwJSUbsLkJi3=ymnMKyQ@mail.gmail.com Whole thread
In response to	Re: Parallel Inserts in CREATE TABLE AS (Amit Kapila <amit.kapila16@gmail.com>)
Responses	Re: Parallel Inserts in CREATE TABLE AS
List	pgsql-hackers

Tree view

On Thu, Oct 15, 2020 at 3:18 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
>
> > > > 1. How to represent the parallel insert for CTAS in explain plans? The
> > > > explain CTAS shows the plan for only the SELECT part. How about having
> > > > some textual info along with the Gather node? I'm not quite sure on
> > > > this point, any suggestions are welcome.
> > >
> > > I am also not sure about this point because we don't display anything
> > > for the DDL part in explain. Can you propose by showing some example
> > > of what you have in mind?
> >
> > I thought we could have something like this.
> > -----------------------------------------------------------------------------
> > Gather (cost=1000.00..108738.90 rows=0 width=8)
> > Workers Planned: 2 Parallel Insert on t_test1
> > -> Parallel Seq Scan on t_test (cost=0.00..106748.00 rows=4954 width=8)
> > Filter: (many < 10000)
> > -----------------------------------------------------------------------------
>
> maybe something like below:
> Gather (cost=1000.00..108738.90 rows=0 width=8)
> -> Create t_test1
> -> Parallel Seq Scan on t_test
>
> I don't know what is the best thing to do here. I think for the
> temporary purpose you can keep something like above then once the
> patch is matured then we can take a separate opinion for this.
>

Agreed. Here's a snapshot of explain with the change suggested.

postgres=# EXPLAIN (ANALYZE, COSTS OFF) CREATE TABLE t1_test AS SELECT * FROM t1;
QUERY PLAN
---------------------------------------------------------------------------------
Gather (actual time=970.524..972.913 rows=0 loops=1)
-> Create t1_test
Workers Planned: 2
Workers Launched: 2
-> Parallel Seq Scan on t1 (actual time=0.028..86.623 rows=333333 loops=3)
Planning Time: 0.049 ms
Execution Time: 973.733 ms

>
> I think there is no reason why one can't use ORDER BY in the
> statements we are talking about here. But, I think we can't enable
> parallelism for GatherMerge is because for that node we always need to
> fetch the data in the leader backend to perform the final merge phase.
> So, I was expecting a small comment saying something on those lines.
>

Added comments.

> 2. Addition of new test cases.

Added new test cases.

> Analysis on the 2 mismatches in write_parallel.sql regression test.

Done. It needed a small code change in costsize.c. Now, both make check and make check-world passes.

Apart from above, a couple of other things I have finished with the v3 patch.

1. Both make check and make check-world with force_parallel_mode = regress passes.

2. I enabled parallel inserts in case of materialized views. Hope that's fine.

Attaching v3 patch herewith.

I'm done with all the open points in my list. Please review the v3 patch and provide comments.

With Regards,

Bharath Rupireddy.

EnterpriseDB: http://www.enterprisedb.com

Attachment

v3-0001-Parallel-Inserts-in-CREATE-TABLE-AS.patch

pgsql-hackers by date:

From: Justin Pryzby
Date: 19 October 2020, 16:45:29
Subject: Re: Probable documentation errors or improvements

From: Tom Lane
Date: 19 October 2020, 18:37:02
Subject: Re: Probable documentation errors or improvements

Re: Parallel Inserts in CREATE TABLE AS - Mailing list pgsql-hackers

Attachment

Previous

Next