Re: Determine parallel-safety of partition relations for Inserts - Mailing list pgsql-hackers

From Amit Kapila
Subject Re: Determine parallel-safety of partition relations for Inserts
Date
Msg-id CAA4eK1KN7hbAKB7S58HcKATE9faYtC80RSDyz0_jFJPUP+XtLA@mail.gmail.com
Whole thread Raw
In response to RE: Determine parallel-safety of partition relations for Inserts  ("tsunakawa.takay@fujitsu.com" <tsunakawa.takay@fujitsu.com>)
Responses RE: Determine parallel-safety of partition relations for Inserts  ("tsunakawa.takay@fujitsu.com" <tsunakawa.takay@fujitsu.com>)
List pgsql-hackers
On Fri, Jan 15, 2021 at 7:35 PM tsunakawa.takay@fujitsu.com
<tsunakawa.takay@fujitsu.com> wrote:
>
> From: Amit Kapila <amit.kapila16@gmail.com>
> > This will surely increase planning time but the execution is reduced
> > to an extent due to parallelism that it won't matter for either of the
> > cases if we see just total time. For example, see the latest results
> > for parallel inserts posted by Haiying Tang [3]. There might be an
> > impact when Selects can't be parallelized due to the small size of the
> > Select-table but we still have to traverse all the partitions to
> > determine parallel-safety but not sure how much it is compared to
> > overall time. I guess we need to find the same but apart from that can
> > anyone think of a better way to determine parallel-safety of
> > partitioned relation for Inserts?
>
> Three solutions(?) quickly come to my mind:
>
>
> (1) Have the user specify whether they want to parallelize DML
> Oracle [1] and SQL Server [2] take this approach.  Oracle disables parallel DML execution by default.  The reason is
describedas "This mode is required because parallel DML and serial DML have different locking, transaction, and disk
spacerequirements and parallel DML is disabled for a session by default."  To enable parallel DML in a session or in a
specificstatement, you need to run either of the following: 
>
>   ALTER SESSION ENABLE PARALLEL DML;
>   INSERT /*+ ENABLE_PARALLEL_DML */ …
>
> Besides, the user has to specify a parallel hint in a DML statement, or specify the parallel attribute in CREATE or
ALTERTABLE. 
>
> SQL Server requires a TABLOCK hint to be specified in the INSERT SELECT statement like this:
>
>   INSERT INTO Sales.SalesHistory WITH (TABLOCK)  (target columns...) SELECT ...;
>

I think it would be good if the parallelism works by default when
required but I guess if we want to use something on these lines then
we can always check if the parallel_workers option is non-zero for a
relation (with RelationGetParallelWorkers). So users can always say
Alter Table <tbl_name> Set (parallel_workers = 0) if they don't want
to enable write parallelism for tbl and if someone is bothered that
this might impact Selects as well because the same option is used to
compute the number of workers for it then we can invent a second
option parallel_dml_workers or something like that.

>
> (2) Postpone the parallel safety check after the planner finds a worthwhile parallel query plan
> I'm not sure if the current planner code allows this easily...
>

I think it is possible but it has a bit of disadvantage as well as
mentioned in response to Ashutosh's email [1].

>
> (3) Record the parallel safety in system catalog
> Add a column like relparallel in pg_class that indicates the parallel safety of the relation.  planner just checks
thevalue instead of doing heavy work for every SQL statement.  That column's value is modified whenever a relation
alterationis made that affects the parallel safety, such as adding a domain column and CHECK constraint.  In case of a
partitionedrelation, the parallel safety attributes of all its descendant relations are merged.  For example, if a
partitionbecomes parallel-unsafe, the ascendant partitioned tables also become parallel-unsafe. 
>
> But... developing such code would be burdonsome and bug-prone?
>
>
> I'm inclined to propose (1).  Parallel DML would be something that a limited people run in limited circumstances
(dataloading in data warehouse and batch processing in OLTP systems by the DBA or data administrator), so I think it's
legitimateto require explicit specification of parallelism. 
>
> As an aside, (1) and (2) has a potential problem with memory consumption.
>

I can see the memory consumption argument for (2) because we might end
up generating parallel paths (partial paths) for reading the table but
don't see how it applies to (1)?

[1] - https://www.postgresql.org/message-id/CAA4eK1J80Rzn4M-A5sfkmJ8NjgTxbaC8UWVaNHK6%2B2BCYYv2Nw%40mail.gmail.com

--
With Regards,
Amit Kapila.



pgsql-hackers by date:

Previous
From: Pavel Stehule
Date:
Subject: new release pspg
Next
From: Bharath Rupireddy
Date:
Subject: Re: Logical Replication - behavior of ALTER PUBLICATION .. DROP TABLE and ALTER SUBSCRIPTION .. REFRESH PUBLICATION