Re: Parallel INSERT (INTO ... SELECT ...) - Mailing list pgsql-hackers
From | Greg Nancarrow |
---|---|
Subject | Re: Parallel INSERT (INTO ... SELECT ...) |
Date | |
Msg-id | CAJcOf-fn1nhEtaU91NvRuA3EbvbJGACMd4_c+Uu3XU5VMv37Aw@mail.gmail.com Whole thread Raw |
In response to | Re: Parallel INSERT (INTO ... SELECT ...) (Bharath Rupireddy <bharath.rupireddyforpostgres@gmail.com>) |
Responses |
Re: Parallel INSERT (INTO ... SELECT ...)
Re: Parallel INSERT (INTO ... SELECT ...) |
List | pgsql-hackers |
On Sat, Sep 26, 2020 at 3:30 PM Bharath Rupireddy <bharath.rupireddyforpostgres@gmail.com> wrote: > I further checked on full txn id and command id. Yes, these are > getting passed to workers via InitializeParallelDSM() -> > SerializeTransactionState(). I tried to summarize what we need to do > in case of parallel inserts in general i.e. parallel COPY, parallel > inserts in INSERT INTO and parallel inserts in CTAS. > > In the leader: > GetCurrentFullTransactionId() > GetCurrentCommandId(true) > EnterParallelMode(); > InitializeParallelDSM() --> calls SerializeTransactionState() > (both full txn id and command id are serialized into parallel DSM) > > In the workers: > ParallelWorkerMain() --> calls StartParallelWorkerTransaction() (both > full txn id and command id are restored into workers' > CurrentTransactionState->fullTransactionId and currentCommandId) > If the parallel workers are meant for insertions, then we need to set > currentCommandIdUsed = true; Maybe we can lift the assert in > GetCurrentCommandId(), if we don't want to touch that function, then > we can have a new function GetCurrentCommandidInWorker() whose > functionality will be same as GetCurrentCommandId() without the > Assert(!IsParallelWorker());. > > Am I missing something? > > If the above points are true, we might have to update the parallel > copy patch set, test the use cases and post separately in the parallel > copy thread in coming days. > Hi Bharath, I pretty much agree with your above points. I've attached an updated Parallel INSERT...SELECT patch, that: - Only uses existing transaction state serialization support for transfer of xid and cid. - Adds a "SetCurrentCommandIdUsedForWorker" function, for setting currentCommandIdUsed=true at the start of a parallel operation (used for Parallel INSERT case, where we know the currentCommandId has been assigned to the worker at the start of the parallel operation). - Tweaks the Assert condition within "used=true" parameter case in GetCurrentCommandId(), so that it only fires if in a parallel worker and currentCommandId is false - refer to the updated comment in that function. - Does not modify any existing GetCurrentCommandId() calls. - Does not remove any existing parallel-related asserts/checks, except for the "cannot insert tuples in a parallel worker" error in heap_prepare_insert(). I am still considering what to do with the original error-check here. [- Does not yet cater for other exclusion cases that you and Vignesh have pointed out] This patch is mostly a lot cleaner, but does contain a possible ugly hack, in that where it needs to call GetCurrentFullTransactionId(), it must temporarily escape parallel-mode (recalling that parallel-mode is set true right at the top of ExectePlan() in the cases of Parallel INSERT/SELECT). Regards, Greg Nancarrow Fujitsu Australia
Attachment
pgsql-hackers by date: