Re: INSERT INTO SELECT, Why Parallelism is not selected? - Mailing list pgsql-hackers

From Amit Kapila
Subject Re: INSERT INTO SELECT, Why Parallelism is not selected?
Date
Msg-id CAA4eK1+J28VHR2bGBhQDgR4Qdf59yCy-d67MgazBwZGDD_MDUg@mail.gmail.com
Whole thread Raw
In response to Re: INSERT INTO SELECT, Why Parallelism is not selected?  (Robert Haas <robertmhaas@gmail.com>)
Responses Re: INSERT INTO SELECT, Why Parallelism is not selected?  (Amit Kapila <amit.kapila16@gmail.com>)
List pgsql-hackers
On Thu, Jul 16, 2020 at 6:43 PM Robert Haas <robertmhaas@gmail.com> wrote:
>
> On Wed, Jul 15, 2020 at 11:14 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
> > The attached patch fixes the comments.  Let me know if you think I
> > have missed anything or any other comments.
>
> If it's safe now, why not remove the error check?
>

I think it is not safe for all kind of Inserts (see my response later
in email), so we need some smarts to identify un-safe inserts before
we can open this check.

> (Is it safe? Could there be other problems?)
>

I think we need to be careful of two things: (a) Do we want to enable
parallel inserts where tuple locks are involved, forex. in statements
like "Insert into primary_tbl Select * from secondary_tbl Where col <
10 For Update;"?  In such statements, I don't see any problem because
each worker will operate on a separate page and even if the leader
already has a lock on the tuple, it will be granted to the worker as
it is taken in the same transaction.  (b) The insert statements that
can generate 'CommandIds' which can happen while insert into tables
with foreign keys, see below test:

CREATE TABLE primary_tbl(index INTEGER PRIMARY KEY, height real, weight real);
insert into primary_tbl values(1, 1.1, 100);
insert into primary_tbl values(2, 1.2, 100);
insert into primary_tbl values(3, 1.3, 100);

CREATE TABLE secondary_tbl(index INTEGER REFERENCES
primary_tbl(index), height real, weight real);

insert into secondary_tbl values(generate_series(1,3),1.2,200);

Here we can't parallelise statements like "insert into secondary_tbl
values(generate_series(1,3),1.2,200);" as they will generate
'CommandIds' for each row insert into table with foreign key.  The new
command id is generated while performing a foreign key check.  Now, it
is a separate question whether generating a command id for each row
insert is required or not but as of now we can't parallelize such
statements.

Do you have something else in mind?

-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com



pgsql-hackers by date:

Previous
From: Michael Paquier
Date:
Subject: Re: Added tab completion for the missing options in copy statement
Next
From: Amit Kapila
Date:
Subject: Re: Have SIGHUP instead of SIGTERM for config reload in logical replication launcher