Re: Parallel INSERT (INTO ... SELECT ...) - Mailing list pgsql-hackers

From Dilip Kumar
Subject Re: Parallel INSERT (INTO ... SELECT ...)
Date
Msg-id CAFiTN-ucbmiL=3FU-QP4izHa-nd3KOoWjCLnJ1qc9hh=u3=QKA@mail.gmail.com
Whole thread Raw
In response to Re: Parallel INSERT (INTO ... SELECT ...)  (Greg Nancarrow <gregn4422@gmail.com>)
List pgsql-hackers
On Thu, Dec 10, 2020 at 1:50 PM Greg Nancarrow <gregn4422@gmail.com> wrote:
>
> On Thu, Dec 10, 2020 at 5:25 PM Dilip Kumar <dilipbalaut@gmail.com> wrote:
> >
> >
> >  /*
> > + * PrepareParallelMode
> > + *
> > + * Prepare for entering parallel mode, based on command-type.
> > + */
> > +void
> > +PrepareParallelMode(CmdType commandType)
> > +{
> > + Assert(!IsInParallelMode() || force_parallel_mode != FORCE_PARALLEL_OFF);
> > +
> > + if (IsModifySupportedInParallelMode(commandType))
> > + {
> > + /*
> > + * Prepare for entering parallel mode by assigning a
> > + * FullTransactionId, to be included in the transaction state that is
> > + * serialized in the parallel DSM.
> > + */
> > + (void) GetCurrentTransactionId();
> > + }
> > +}
> >
> > Why do we need to serialize the transaction ID for 0001?  I mean in
> > 0001 we are just allowing the SELECT to be executed in parallel so why
> > we would need the transaction Id for that.  I agree that we would need
> > this once we try to perform the Insert also from the worker in the
> > remaining patches.
> >
>
> There's a very good reason. It's related to parallel-mode checks for
> Insert and how the XID is lazily acquired if required.
> When allowing SELECT to be executed in parallel, we're in
> parallel-mode and the leader interleaves Inserts with retrieval of the
> tuple data from the workers.
> You will notice that heap_insert() calls GetTransactionId() as the
> very first thing it does. If the FullTransactionId is not valid,
> AssignTransactionId() is then called, which then executes this code:
>
>     /*
>      * Workers synchronize transaction state at the beginning of each parallel
>      * operation, so we can't account for new XIDs at this point.
>      */
>     if (IsInParallelMode() || IsParallelWorker())
>         elog(ERROR, "cannot assign XIDs during a parallel operation");
>
> So that code (currently) has no way of knowing that a XID is being
> (lazily) assigned at the beginning, or somewhere in the middle of, a
> parallel operation.
> This is the reason why PrepareParallelMode() is calling
> GetTransactionId() up-front, to ensure a FullTransactionId is assigned
> up-front, prior to parallel-mode (so then there won't be an attempted
> XID assignment).
>
> If you remove the GetTransactionId() call from PrepareParallelMode()
> and run "make installcheck-world" with "force_parallel_mode=regress"
> in effect, many tests will fail with:
>     ERROR:  cannot assign XIDs during a parallel operation

Yeah got it, I missed that point that the goal is the avoid assigning
the Transaction Id when we are in parallel mode.  But IIUC at least
for the first patch we don't want to serialize the XID in the
transaction state right because workers don't need the xid as they are
only doing select.  So maybe we can readjust the comment slightly in
the below code

> > + * Prepare for entering parallel mode by assigning a
> > + * FullTransactionId, to be included in the transaction state that is
> > + * serialized in the parallel DSM.
> > + */
> > + (void) GetCurrentTransactionId();

-- 
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com



pgsql-hackers by date:

Previous
From: Peter Smith
Date:
Subject: Re: Single transaction in the tablesync worker?
Next
From: Benoit Lobréau
Date:
Subject: pg_shmem_allocations & documentation