Re: Parallel INSERT (INTO ... SELECT ...) - Mailing list pgsql-hackers

From Greg Nancarrow
Subject Re: Parallel INSERT (INTO ... SELECT ...)
Date
Msg-id CAJcOf-fn1nhEtaU91NvRuA3EbvbJGACMd4_c+Uu3XU5VMv37Aw@mail.gmail.com
Whole thread Raw
In response to Re: Parallel INSERT (INTO ... SELECT ...)  (Bharath Rupireddy <bharath.rupireddyforpostgres@gmail.com>)
Responses Re: Parallel INSERT (INTO ... SELECT ...)  (Bharath Rupireddy <bharath.rupireddyforpostgres@gmail.com>)
Re: Parallel INSERT (INTO ... SELECT ...)  (Dilip Kumar <dilipbalaut@gmail.com>)
List pgsql-hackers
On Sat, Sep 26, 2020 at 3:30 PM Bharath Rupireddy
<bharath.rupireddyforpostgres@gmail.com> wrote:

> I further checked on full txn id and command id. Yes, these are
> getting passed to workers  via InitializeParallelDSM() ->
> SerializeTransactionState(). I tried to summarize what we need to do
> in case of parallel inserts in general i.e. parallel COPY, parallel
> inserts in INSERT INTO and parallel inserts in CTAS.
>
> In the leader:
>     GetCurrentFullTransactionId()
>     GetCurrentCommandId(true)
>     EnterParallelMode();
>     InitializeParallelDSM() --> calls SerializeTransactionState()
> (both full txn id and command id are serialized into parallel DSM)
>
> In the workers:
> ParallelWorkerMain() -->  calls StartParallelWorkerTransaction() (both
> full txn id and command id are restored into workers'
> CurrentTransactionState->fullTransactionId and currentCommandId)
> If the parallel workers are meant for insertions, then we need to set
> currentCommandIdUsed = true; Maybe we can lift the assert in
> GetCurrentCommandId(), if we don't want to touch that function, then
> we can have a new function GetCurrentCommandidInWorker() whose
> functionality will be same as GetCurrentCommandId() without the
> Assert(!IsParallelWorker());.
>
> Am I missing something?
>
> If the above points are true, we might have to update the parallel
> copy patch set, test the use cases and post separately in the parallel
> copy thread in coming days.
>

Hi Bharath,

I pretty much agree with your above points.

I've attached an updated Parallel INSERT...SELECT patch, that:
- Only uses existing transaction state serialization support for
transfer of xid and cid.
- Adds a "SetCurrentCommandIdUsedForWorker" function, for setting
currentCommandIdUsed=true at the start of a parallel operation (used
for Parallel INSERT case, where we know the currentCommandId has been
assigned to the worker at the start of the parallel operation).
- Tweaks the Assert condition within "used=true" parameter case in
GetCurrentCommandId(), so that it only fires if in a parallel worker
and currentCommandId is false - refer to the updated comment in that
function.
- Does not modify any existing GetCurrentCommandId() calls.
- Does not remove any existing parallel-related asserts/checks, except
for the "cannot insert tuples in a parallel worker" error in
heap_prepare_insert(). I am still considering what to do with the
original error-check here.
[- Does not yet cater for other exclusion cases that you and Vignesh
have pointed out]

This patch is mostly a lot cleaner, but does contain a possible ugly
hack, in that where it needs to call GetCurrentFullTransactionId(), it
must temporarily escape parallel-mode (recalling that parallel-mode is
set true right at the top of ExectePlan() in the cases of Parallel
INSERT/SELECT).

Regards,
Greg Nancarrow
Fujitsu Australia

Attachment

pgsql-hackers by date:

Previous
From: Kyotaro Horiguchi
Date:
Subject: Re: New statistics for tuning WAL buffer size
Next
From: Abhijit Menon-Sen
Date:
Subject: [PATCH] SET search_path += octopus