Re: Parallel Inserts in CREATE TABLE AS - Mailing list pgsql-hackers

From Bharath Rupireddy
Subject Re: Parallel Inserts in CREATE TABLE AS
Date
Msg-id CALj2ACWsRC9O+bpyEEgAg6NGRU7e7-c2jPE8vgZ5iW9TKfEVDw@mail.gmail.com
In response to Re: Parallel Inserts in CREATE TABLE AS  (vignesh C <vignesh21@gmail.com>)
Responses Re: Parallel Inserts in CREATE TABLE AS  (vignesh C <vignesh21@gmail.com>)
List pgsql-hackers
On Wed, Dec 30, 2020 at 5:28 PM vignesh C <vignesh21@gmail.com> wrote:
> Few comments:
> -       /*
> -        * To allow parallel inserts, we need to ensure that they are safe to be
> -        * performed in workers. We have the infrastructure to allow parallel
> -        * inserts in general except for the cases where inserts generate a new
> -        * CommandId (eg. inserts into a table having a foreign key column).
> -        */
> -       if (IsParallelWorker())
> -               ereport(ERROR,
> -                               (errcode(ERRCODE_INVALID_TRANSACTION_STATE),
> -                                errmsg("cannot insert tuples in a
> parallel worker")));
>
> Is it possible to add a check if it is a CTAS insert here as we do not
> support insert in parallel workers from others as of now.

Currently, there is no global variable we could use to selectively
skip this check for parallel insertion in CTAS. How about adding a
variable to one of the worker's global contexts, setting it when
parallel insertion is chosen for CTAS, and using it in
heap_prepare_insert() to skip the above error? Eventually, we can
remove this restriction entirely once we fully allow parallelism for
INSERT INTO SELECT, CTAS, and COPY.
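
To make the idea concrete, here is a minimal standalone sketch. It is
not PostgreSQL code and not part of the patch: in_parallel_worker
stands in for IsParallelWorker(), and both
ctas_parallel_insert_allowed and insert_allowed_in_this_process() are
hypothetical names invented for illustration. It only models the
shape of the proposed check.

```c
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical stand-in for IsParallelWorker(). */
static bool in_parallel_worker = false;

/*
 * Hypothetical per-worker flag (assumption, not an existing variable):
 * set when parallel insertion has been chosen for CTAS.
 */
static bool ctas_parallel_insert_allowed = false;

/*
 * Models the check that would sit in heap_prepare_insert().
 * Returns true if the insert may proceed in this process, false if
 * the "cannot insert tuples in a parallel worker" error would fire.
 */
static bool
insert_allowed_in_this_process(void)
{
    if (in_parallel_worker && !ctas_parallel_insert_allowed)
    {
        fprintf(stderr,
                "ERROR: cannot insert tuples in a parallel worker\n");
        return false;
    }
    return true;
}
```

With this shape, a worker doing anything other than a CTAS parallel
insert still hits the error, so the existing protection is kept for
every other code path.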

Thoughts?

> +       Oid                     objectid;               /* workers to
> open relation/table.  */
> +       /* Number of tuples inserted by all the workers. */
> +       pg_atomic_uint64        processed;
>
> We can just mention relation instead of relation/table.

I will modify it in the next patch set.

> +select explain_pictas(
> +'create table parallel_write as select length(stringu1) from tenk1;');
> +                      explain_pictas
> +----------------------------------------------------------
> + Gather (actual rows=N loops=N)
> +   Workers Planned: 4
> +   Workers Launched: N
> + ->  Create parallel_write
> +   ->  Parallel Seq Scan on tenk1 (actual rows=N loops=N)
> +(5 rows)
> +
> +select count(*) from parallel_write;
>
> Can we include selection of cmin, xmin for one of the test to verify
> that it uses the same transaction id  in the parallel workers
> something like:
> select distinct(cmin,xmin) from parallel_write;

This is not possible: cmin and xmin are dynamic, so we cannot use them
in test cases. I also think it's unnecessary to check whether the
leader and workers are in the same txn, since we are not creating a
new txn. All of the leader's txn state is serialized in
SerializeTransactionState and restored in
StartParallelWorkerTransaction.
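
As a toy illustration of that serialize/restore flow (a standalone
sketch, not the actual implementation: ToyXactState,
toy_serialize_xact() and toy_restore_xact() are hypothetical, and the
real transaction state carries far more than two fields), the worker
simply adopts whatever the leader wrote into the shared buffer:

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical, heavily simplified transaction state. */
typedef struct ToyXactState
{
    uint32_t xid;           /* transaction id */
    uint32_t command_id;    /* current command id */
} ToyXactState;

/*
 * Leader side: copy the state into a shared buffer. Loosely models
 * the role of SerializeTransactionState. Returns the number of bytes
 * written, or 0 if the buffer is too small.
 */
static size_t
toy_serialize_xact(const ToyXactState *leader,
                   unsigned char *buf, size_t buflen)
{
    if (buflen < sizeof(*leader))
        return 0;
    memcpy(buf, leader, sizeof(*leader));
    return sizeof(*leader);
}

/*
 * Worker side: adopt the leader's state instead of starting a new
 * transaction. Loosely models the role of
 * StartParallelWorkerTransaction. Returns 0 on success, -1 if the
 * buffer length does not match.
 */
static int
toy_restore_xact(ToyXactState *worker,
                 const unsigned char *buf, size_t len)
{
    if (len != sizeof(*worker))
        return -1;
    memcpy(worker, buf, sizeof(*worker));
    return 0;
}
```

Because the worker never allocates its own ids in this model, any
tuple it writes would carry the leader's xid and command id.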

With Regards,
Bharath Rupireddy.
EnterpriseDB: http://www.enterprisedb.com


