Home > mailing lists

Re: Parallel Inserts in CREATE TABLE AS - Mailing list pgsql-hackers

From	Bharath Rupireddy
Subject	Re: Parallel Inserts in CREATE TABLE AS
Date	October 15, 2020 03:44:25
Msg-id	CALj2ACVZqKsbx+wb=ccWiLRQ2ziNCswT9j0dH6d3bk0ktotYng@mail.gmail.com Whole thread
In response to	Re: Parallel Inserts in CREATE TABLE AS (Amit Kapila <amit.kapila16@gmail.com>)
Responses	Re: Parallel Inserts in CREATE TABLE AS
List	pgsql-hackers

Tree view

On Wed, Oct 14, 2020 at 6:16 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
>
> > If somebody expects to preserve the order of the tuples that are
> > coming from GatherMerge node of the select part in CTAS or SELECT INTO
> > while inserting, now if parallelism is allowed, that may not be the
> > case i.e. the order of insertion of tuples may vary. I'm not quite
> > sure, if someone wants to use order by in the select parts of CTAS or
> > SELECT INTO in a real world use case. Thoughts?
> >
>
> I think there is no reason why one can't use ORDER BY in the
> statements we are talking about here. But, I think we can't enable
> parallelism for GatherMerge is because for that node we always need to
> fetch the data in the leader backend to perform the final merge phase.
> So, I was expecting a small comment saying something on those lines.
>

Sure, I will add comments in the upcoming patch.

>
> > For prepared statements, the parallelism will not be picked and so is
> > parallel insertion.
>
> Hmm, I am not sure what makes you say this statement. The parallelism
> is enabled for prepared statements since commit 57a6a72b6b.
>

Thanks for letting me know this. I misunderstood the parallelism for prepared statements. Now, I verified with a proper use case(see below), where I had a prepared statement, CTAS having EXECUTE, in this case too parallelism is picked and parallel insertion happened with the patch proposed in this thread. Do we have any problems if we allow parallel insertion for these cases?

PREPARE myselect AS SELECT * FROM t1;
EXPLAIN ANALYZE CREATE TABLE t1_test AS EXECUTE myselect;

I think the commit 57a6a72b6b has not added any test cases, isn't it good to add one in prepare.sql or select_parallel.sql?

>
> > 1. How to represent the parallel insert for CTAS in explain plans? The
> > explain CTAS shows the plan for only the SELECT part. How about having
> > some textual info along with the Gather node? I'm not quite sure on
> > this point, any suggestions are welcome.
>
> I am also not sure about this point because we don't display anything
> for the DDL part in explain. Can you propose by showing some example
> of what you have in mind?
>

I thought we could have something like this.
-----------------------------------------------------------------------------
Gather (cost=1000.00..108738.90 rows=0 width=8)
Workers Planned: 2 Parallel Insert on t_test1
-> Parallel Seq Scan on t_test (cost=0.00..106748.00 rows=4954 width=8)
Filter: (many < 10000)
-----------------------------------------------------------------------------

With Regards,

Bharath Rupireddy.

EnterpriseDB: http://www.enterprisedb.com

pgsql-hackers by date:

From: Julien Rouhaud
Date: 15 October 2020, 03:41:23
Subject: Re: Feature improvement: can we add queryId for pg_catalog.pg_stat_activity view?

From: Greg Nancarrow
Date: 15 October 2020, 04:25:14
Subject: Re: Parallel INSERT (INTO ... SELECT ...)

Re: Parallel Inserts in CREATE TABLE AS - Mailing list pgsql-hackers

Previous

Next