Re: Parallel INSERT (INTO ... SELECT ...) - Mailing list pgsql-hackers
From | Dilip Kumar |
---|---|
Subject | Re: Parallel INSERT (INTO ... SELECT ...) |
Date | |
Msg-id | CAFiTN-ucbmiL=3FU-QP4izHa-nd3KOoWjCLnJ1qc9hh=u3=QKA@mail.gmail.com Whole thread Raw |
In response to | Re: Parallel INSERT (INTO ... SELECT ...) (Greg Nancarrow <gregn4422@gmail.com>) |
List | pgsql-hackers |
On Thu, Dec 10, 2020 at 1:50 PM Greg Nancarrow <gregn4422@gmail.com> wrote: > > On Thu, Dec 10, 2020 at 5:25 PM Dilip Kumar <dilipbalaut@gmail.com> wrote: > > > > > > /* > > + * PrepareParallelMode > > + * > > + * Prepare for entering parallel mode, based on command-type. > > + */ > > +void > > +PrepareParallelMode(CmdType commandType) > > +{ > > + Assert(!IsInParallelMode() || force_parallel_mode != FORCE_PARALLEL_OFF); > > + > > + if (IsModifySupportedInParallelMode(commandType)) > > + { > > + /* > > + * Prepare for entering parallel mode by assigning a > > + * FullTransactionId, to be included in the transaction state that is > > + * serialized in the parallel DSM. > > + */ > > + (void) GetCurrentTransactionId(); > > + } > > +} > > > > Why do we need to serialize the transaction ID for 0001? I mean in > > 0001 we are just allowing the SELECT to be executed in parallel so why > > we would need the transaction Id for that. I agree that we would need > > this once we try to perform the Insert also from the worker in the > > remaining patches. > > > > There's a very good reason. It's related to parallel-mode checks for > Insert and how the XID is lazily acquired if required. > When allowing SELECT to be executed in parallel, we're in > parallel-mode and the leader interleaves Inserts with retrieval of the > tuple data from the workers. > You will notice that heap_insert() calls GetTransactionId() as the > very first thing it does. If the FullTransactionId is not valid, > AssignTransactionId() is then called, which then executes this code: > > /* > * Workers synchronize transaction state at the beginning of each parallel > * operation, so we can't account for new XIDs at this point. > */ > if (IsInParallelMode() || IsParallelWorker()) > elog(ERROR, "cannot assign XIDs during a parallel operation"); > > So that code (currently) has no way of knowing that a XID is being > (lazily) assigned at the beginning, or somewhere in the middle of, a > parallel operation. > This is the reason why PrepareParallelMode() is calling > GetTransactionId() up-front, to ensure a FullTransactionId is assigned > up-front, prior to parallel-mode (so then there won't be an attempted > XID assignment). > > If you remove the GetTransactionId() call from PrepareParallelMode() > and run "make installcheck-world" with "force_parallel_mode=regress" > in effect, many tests will fail with: > ERROR: cannot assign XIDs during a parallel operation Yeah got it, I missed that point that the goal is the avoid assigning the Transaction Id when we are in parallel mode. But IIUC at least for the first patch we don't want to serialize the XID in the transaction state right because workers don't need the xid as they are only doing select. So maybe we can readjust the comment slightly in the below code > > + * Prepare for entering parallel mode by assigning a > > + * FullTransactionId, to be included in the transaction state that is > > + * serialized in the parallel DSM. > > + */ > > + (void) GetCurrentTransactionId(); -- Regards, Dilip Kumar EnterpriseDB: http://www.enterprisedb.com
pgsql-hackers by date: