Re: INSERT INTO SELECT, Why Parallelism is not selected? - Mailing list pgsql-hackers

From Amit Kapila
Subject Re: INSERT INTO SELECT, Why Parallelism is not selected?
Msg-id CAA4eK1L3Dca1zmPptcYzqZUa8qfARxYkpZrRfas19vpyEaHBFQ@mail.gmail.com
In response to Re: INSERT INTO SELECT, Why Parallelism is not selected?  (Robert Haas <robertmhaas@gmail.com>)
Responses Re: INSERT INTO SELECT, Why Parallelism is not selected?  (Amit Kapila <amit.kapila16@gmail.com>)
List pgsql-hackers
On Wed, Jul 29, 2020 at 7:18 PM Robert Haas <robertmhaas@gmail.com> wrote:
>
> I still don't agree with this as proposed.
>
> + * For now, we don't allow parallel inserts of any form not even where the
> + * leader can perform the insert.  This restriction can be uplifted once
> + * we allow the planner to generate parallel plans for inserts.  We can
>
> If I'm understanding this correctly, this logic is completely
> backwards. We don't prohibit inserts here because we know the planner
> can't generate them. We prohibit inserts here because, if the planner
> somehow did generate them, it wouldn't be safe. You're saying that
> it's not allowed because we don't try to do it yet, but actually it's
> not allowed because we want to make sure that we don't accidentally
> try to do it. That's very different.
>

Right, so how about something like: "To allow parallel inserts, we
need to ensure that they are safe to perform in workers.  We have
the infrastructure to allow parallel inserts in general, except for the
case where an insert generates a new commandid (e.g., an insert into a
table having a foreign key column)."  We can extend this to cover tuple
locking if required, as per the discussion below.  Kindly suggest if
you prefer a different wording here.

>
> + * We should be able to parallelize
> + * the later case if we can ensure that no two parallel processes can ever
> + * operate on the same page.
>
> I don't know whether this is talking about two processes operating on
> the same page at the same time, or ever within a single query
> execution. If it's the former, perhaps we need to explain why that's a
> concern for parallel query but not otherwise;
>

I am talking about the former case.  I know that, as per the current
design, it is not possible for two worker processes to operate on the
same page, but I was being pessimistic so that we can enforce that via
some form of Assert.  I don't know whether it is important to mention
this case, but I have mentioned it for the sake of extra safety.

-- 
With Regards,
Amit Kapila.


