Re: Parallel INSERT (INTO ... SELECT ...) - Mailing list pgsql-hackers
From | Bharath Rupireddy |
---|---|
Subject | Re: Parallel INSERT (INTO ... SELECT ...) |
Date | |
Msg-id | CALj2ACUQ_9-zimTkxNQu=NZYqbfzJkmkiS3bkD1xNDER9WHvrA@mail.gmail.com Whole thread Raw |
In response to | Re: Parallel INSERT (INTO ... SELECT ...) (Greg Nancarrow <gregn4422@gmail.com>) |
Responses |
Re: Parallel INSERT (INTO ... SELECT ...)
|
List | pgsql-hackers |
On Mon, Sep 28, 2020 at 8:45 AM Greg Nancarrow <gregn4422@gmail.com> wrote: > > On Sat, Sep 26, 2020 at 3:30 PM Bharath Rupireddy > <bharath.rupireddyforpostgres@gmail.com> wrote: > > > I further checked on full txn id and command id. Yes, these are > > getting passed to workers via InitializeParallelDSM() -> > > SerializeTransactionState(). I tried to summarize what we need to do > > in case of parallel inserts in general i.e. parallel COPY, parallel > > inserts in INSERT INTO and parallel inserts in CTAS. > > > > In the leader: > > GetCurrentFullTransactionId() > > GetCurrentCommandId(true) > > EnterParallelMode(); > > InitializeParallelDSM() --> calls SerializeTransactionState() > > (both full txn id and command id are serialized into parallel DSM) > > > > In the workers: > > ParallelWorkerMain() --> calls StartParallelWorkerTransaction() (both > > full txn id and command id are restored into workers' > > CurrentTransactionState->fullTransactionId and currentCommandId) > > If the parallel workers are meant for insertions, then we need to set > > currentCommandIdUsed = true; Maybe we can lift the assert in > > GetCurrentCommandId(), if we don't want to touch that function, then > > we can have a new function GetCurrentCommandidInWorker() whose > > functionality will be same as GetCurrentCommandId() without the > > Assert(!IsParallelWorker());. > > > > Am I missing something? > > > > If the above points are true, we might have to update the parallel > > copy patch set, test the use cases and post separately in the parallel > > copy thread in coming days. > > > > Hi Bharath, > > I pretty much agree with your above points. > > I've attached an updated Parallel INSERT...SELECT patch, that: > - Only uses existing transaction state serialization support for > transfer of xid and cid. > - Adds a "SetCurrentCommandIdUsedForWorker" function, for setting > currentCommandIdUsed=true at the start of a parallel operation (used > for Parallel INSERT case, where we know the currentCommandId has been > assigned to the worker at the start of the parallel operation). > - Tweaks the Assert condition within "used=true" parameter case in > GetCurrentCommandId(), so that it only fires if in a parallel worker > and currentCommandId is false - refer to the updated comment in that > function. > - Does not modify any existing GetCurrentCommandId() calls. > - Does not remove any existing parallel-related asserts/checks, except > for the "cannot insert tuples in a parallel worker" error in > heap_prepare_insert(). I am still considering what to do with the > original error-check here. > [- Does not yet cater for other exclusion cases that you and Vignesh > have pointed out] > > This patch is mostly a lot cleaner, but does contain a possible ugly > hack, in that where it needs to call GetCurrentFullTransactionId(), it > must temporarily escape parallel-mode (recalling that parallel-mode is > set true right at the top of ExectePlan() in the cases of Parallel > INSERT/SELECT). > Thanks Greg. In general, see a few things common to all parallel insert cases(CTAS[1], COPY[2], INSERT INTO SELECTs): 1. Removal of "cannot insert tuples in a parallel worker" restriction from heap_prepare_insert() 2. Each worker should be able to set currentCommandIdUsed to true. 3. The change you proposed to make in GetCurrentCommandId()'s assert condition. Please add if I miss any other common point. Common solutions to each of the above points would be beneficial to all the parallel insert cases. How about having a common thread, discussion and a common patch for all the 3 points? @Amit Kapila @Greg Nancarrow @vignesh C Thoughts? [1] https://www.postgresql.org/message-id/CALj2ACWj%2B3H5TQqwxANZmdePEnSNxk-YAeT1c5WE184Gf75XUw%40mail.gmail.com [2] https://www.postgresql.org/message-id/CAA4eK1%2BkpddvvLxWm4BuG_AhVvYz8mKAEa7osxp_X0d4ZEiV%3Dg%40mail.gmail.com With Regards, Bharath Rupireddy. EnterpriseDB: http://www.enterprisedb.com
pgsql-hackers by date: