Re: Parallel INSERT (INTO ... SELECT ...) - Mailing list pgsql-hackers

From Bharath Rupireddy
Subject Re: Parallel INSERT (INTO ... SELECT ...)
Date
Msg-id CALj2ACUQ_9-zimTkxNQu=NZYqbfzJkmkiS3bkD1xNDER9WHvrA@mail.gmail.com
Whole thread Raw
In response to Re: Parallel INSERT (INTO ... SELECT ...)  (Greg Nancarrow <gregn4422@gmail.com>)
Responses Re: Parallel INSERT (INTO ... SELECT ...)
List pgsql-hackers
On Mon, Sep 28, 2020 at 8:45 AM Greg Nancarrow <gregn4422@gmail.com> wrote:
>
> On Sat, Sep 26, 2020 at 3:30 PM Bharath Rupireddy
> <bharath.rupireddyforpostgres@gmail.com> wrote:
>
> > I further checked on full txn id and command id. Yes, these are
> > getting passed to workers  via InitializeParallelDSM() ->
> > SerializeTransactionState(). I tried to summarize what we need to do
> > in case of parallel inserts in general i.e. parallel COPY, parallel
> > inserts in INSERT INTO and parallel inserts in CTAS.
> >
> > In the leader:
> >     GetCurrentFullTransactionId()
> >     GetCurrentCommandId(true)
> >     EnterParallelMode();
> >     InitializeParallelDSM() --> calls SerializeTransactionState()
> > (both full txn id and command id are serialized into parallel DSM)
> >
> > In the workers:
> > ParallelWorkerMain() -->  calls StartParallelWorkerTransaction() (both
> > full txn id and command id are restored into workers'
> > CurrentTransactionState->fullTransactionId and currentCommandId)
> > If the parallel workers are meant for insertions, then we need to set
> > currentCommandIdUsed = true; Maybe we can lift the assert in
> > GetCurrentCommandId(), if we don't want to touch that function, then
> > we can have a new function GetCurrentCommandidInWorker() whose
> > functionality will be same as GetCurrentCommandId() without the
> > Assert(!IsParallelWorker());.
> >
> > Am I missing something?
> >
> > If the above points are true, we might have to update the parallel
> > copy patch set, test the use cases and post separately in the parallel
> > copy thread in coming days.
> >
>
> Hi Bharath,
>
> I pretty much agree with your above points.
>
> I've attached an updated Parallel INSERT...SELECT patch, that:
> - Only uses existing transaction state serialization support for
> transfer of xid and cid.
> - Adds a "SetCurrentCommandIdUsedForWorker" function, for setting
> currentCommandIdUsed=true at the start of a parallel operation (used
> for Parallel INSERT case, where we know the currentCommandId has been
> assigned to the worker at the start of the parallel operation).
> - Tweaks the Assert condition within "used=true" parameter case in
> GetCurrentCommandId(), so that it only fires if in a parallel worker
> and currentCommandId is false - refer to the updated comment in that
> function.
> - Does not modify any existing GetCurrentCommandId() calls.
> - Does not remove any existing parallel-related asserts/checks, except
> for the "cannot insert tuples in a parallel worker" error in
> heap_prepare_insert(). I am still considering what to do with the
> original error-check here.
> [- Does not yet cater for other exclusion cases that you and Vignesh
> have pointed out]
>
> This patch is mostly a lot cleaner, but does contain a possible ugly
> hack, in that where it needs to call GetCurrentFullTransactionId(), it
> must temporarily escape parallel-mode (recalling that parallel-mode is
> set true right at the top of ExectePlan() in the cases of Parallel
> INSERT/SELECT).
>

Thanks Greg.

In general, see a few things common to all parallel insert
cases(CTAS[1], COPY[2], INSERT INTO SELECTs):
1. Removal of "cannot insert tuples in a parallel worker" restriction
from heap_prepare_insert()
2. Each worker should be able to set currentCommandIdUsed to true.
3. The change you proposed to make in GetCurrentCommandId()'s assert condition.

Please add if I miss any other common point.

Common solutions to each of the above points would be beneficial to
all the parallel insert cases. How about having a common thread,
discussion and a common patch for all the 3 points?

@Amit Kapila  @Greg Nancarrow @vignesh C Thoughts?

[1] https://www.postgresql.org/message-id/CALj2ACWj%2B3H5TQqwxANZmdePEnSNxk-YAeT1c5WE184Gf75XUw%40mail.gmail.com
[2] https://www.postgresql.org/message-id/CAA4eK1%2BkpddvvLxWm4BuG_AhVvYz8mKAEa7osxp_X0d4ZEiV%3Dg%40mail.gmail.com

With Regards,
Bharath Rupireddy.
EnterpriseDB: http://www.enterprisedb.com



pgsql-hackers by date:

Previous
From: Bharath Rupireddy
Date:
Subject: Re: Parallel Inserts in CREATE TABLE AS
Next
From: Dilip Kumar
Date:
Subject: Re: Re: [HACKERS] Custom compression methods