Re: Parallel Inserts in CREATE TABLE AS - Mailing list pgsql-hackers

From vignesh C
Subject Re: Parallel Inserts in CREATE TABLE AS
Date
Msg-id CALDaNm0ppjsA_EefjK3VFX6DsfJf9r-6_Z+QbS3jgViHp4CQjg@mail.gmail.com
Whole thread Raw
In response to Re: Parallel Inserts in CREATE TABLE AS  (Bharath Rupireddy <bharath.rupireddyforpostgres@gmail.com>)
Responses Re: Parallel Inserts in CREATE TABLE AS
List pgsql-hackers
On Mon, Jan 4, 2021 at 3:07 PM Bharath Rupireddy
<bharath.rupireddyforpostgres@gmail.com> wrote:
>
> On Wed, Dec 30, 2020 at 5:28 PM vignesh C <vignesh21@gmail.com> wrote:
> > Few comments:
> > -       /*
> > -        * To allow parallel inserts, we need to ensure that they are safe to be
> > -        * performed in workers. We have the infrastructure to allow parallel
> > -        * inserts in general except for the cases where inserts generate a new
> > -        * CommandId (eg. inserts into a table having a foreign key column).
> > -        */
> > -       if (IsParallelWorker())
> > -               ereport(ERROR,
> > -                               (errcode(ERRCODE_INVALID_TRANSACTION_STATE),
> > -                                errmsg("cannot insert tuples in a
> > parallel worker")));
> >
> > Is it possible to add a check if it is a CTAS insert here as we do not
> > support insert in parallel workers from others as of now.
>
> Currently, there's no global variable in which we can selectively skip
> this in case of parallel insertion in CTAS. How about having a
> variable in any of the worker global contexts, set that when parallel
> insertion is chosen for CTAS and use that in heap_prepare_insert() to
> skip the above error? Eventually, we can remove this restriction
> entirely in case we fully allow parallelism for INSERT INTO SELECT,
> CTAS, and COPY.
>
> Thoughts?

Yes, I felt that the leader can store the command as CTAS and the
leader/worker can use it to check and throw an error. The similar
change can be used for the parallel insert patches and once all the
patches are committed, we can remove it eventually.

>
> > +       Oid                     objectid;               /* workers to
> > open relation/table.  */
> > +       /* Number of tuples inserted by all the workers. */
> > +       pg_atomic_uint64        processed;
> >
> > We can just mention relation instead of relation/table.
>
> I will modify it in the next patch set.
>
> > +select explain_pictas(
> > +'create table parallel_write as select length(stringu1) from tenk1;');
> > +                      explain_pictas
> > +----------------------------------------------------------
> > + Gather (actual rows=N loops=N)
> > +   Workers Planned: 4
> > +   Workers Launched: N
> > + ->  Create parallel_write
> > +   ->  Parallel Seq Scan on tenk1 (actual rows=N loops=N)
> > +(5 rows)
> > +
> > +select count(*) from parallel_write;
> >
> > Can we include selection of cmin, xmin for one of the test to verify
> > that it uses the same transaction id  in the parallel workers
> > something like:
> > select distinct(cmin,xmin) from parallel_write;
>
> This is not possible since cmin and xmin are dynamic, we can not use
> them in test cases. I think it's not necessary to check whether the
> leader and workers are in the same txn or not, since we are not
> creating a new txn. All the txn state from the leader is serialized in
> SerializeTransactionState and restored in
> StartParallelWorkerTransaction.
>

I had seen in your patch that you serialize and use the same
transaction, but it will be good if you can have at least one test
case to validate that the leader and worker both use the same
transaction. To solve the problem that you are facing where cmin and
xmin are dynamic, you can check the distinct count by using something
like below:
SELECT COUNT(*) FROM (SELECT DISTINCT cmin,xmin FROM  t1) as dt;

Regards,
Vignesh
EnterpriseDB: http://www.enterprisedb.com



pgsql-hackers by date:

Previous
From: vignesh C
Date:
Subject: Re: Added missing copy related data structures to typedefs.list
Next
From: Thomas Munro
Date:
Subject: Re: Reducing WaitEventSet syscall churn