RE: Parallel INSERT (INTO ... SELECT ...) - Mailing list pgsql-hackers

From tsunakawa.takay@fujitsu.com
Subject RE: Parallel INSERT (INTO ... SELECT ...)
Date
Msg-id OSBPR01MB298274BCB8959932191EA9A6FECB0@OSBPR01MB2982.jpnprd01.prod.outlook.com
In response to Re: Parallel INSERT (INTO ... SELECT ...)  (Greg Nancarrow <gregn4422@gmail.com>)
Responses Re: Parallel INSERT (INTO ... SELECT ...)
List pgsql-hackers
From: Greg Nancarrow <gregn4422@gmail.com>
> Firstly, in order to perform parallel-safety checks in the case of partitions, the
> patch currently recursively locks/unlocks
> (AccessShareLock) each partition during such checks (as each partition may
> itself be a partitioned table). Is there a better way of performing the
> parallel-safety checks and reducing the locking requirements?

First of all, as you demonstrated with the planning and execution times of parallel insert, I think the increased
planning time is negligible when parallel insert is intentionally used for loading a large amount of data.  However,
it's a problem if the overhead is imposed on OLTP transactions.  Does the overhead occur with the default values of
max_parallel_workers_per_gather = 2 and max_parallel_workers = 8?
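
(As a side note on measuring this: the planning-time cost should be directly visible in EXPLAIN's summary
output, without having to execute the insert.  Just a sketch; t and s below are placeholder table names.)

    SHOW max_parallel_workers_per_gather;   -- 2 by default
    SHOW max_parallel_workers;              -- 8 by default
    -- SUMMARY ON prints the "Planning Time" line without running the INSERT,
    -- so the parallel-safety check overhead can be compared with/without partitions
    EXPLAIN (SUMMARY ON) INSERT INTO t SELECT * FROM s;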
 

To avoid this heavy checking during planning, I'm wondering if we can have an attribute in pg_class, something like
relhasindex and relhastriggers.  The concern is that we would have to maintain the accuracy of the value when
dropping ancillary objects around the table/partition.
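
For illustration, the existing flags can already be read straight from pg_class, and a new flag could be
consulted the same way during planning instead of locking and scanning every partition.  (A sketch only;
the partition name is a placeholder and the new column name is hypothetical.)

    -- the existing per-relation flags this proposal would mirror
    SELECT relname, relhasindex, relhastriggers
    FROM pg_class
    WHERE relname = 'orders_2021_01';   -- placeholder partition name
    -- a hypothetical new flag, say relparallelsafe, would be set when the
    -- relation's ancillary objects are known parallel-safe, and cleared or
    -- revalidated when such objects are created or dropped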
 


> Secondly, I found that when running "make check-world", the
> "partition-concurrent-attach" test fails, because it is expecting a partition
> constraint to be violated on insert, while an "alter table attach partition ..." is
> concurrently being executed in another transaction. Because of the partition
> locking done by the patch's parallel-safety checking code, the insert blocks on
> the exclusive lock held by the "alter table" in the other transaction until the
> transaction ends, so the insert ends up successfully completing (and thus fails
> the test) when the other transaction ends. To overcome this test failure, the
> patch code was updated to instead perform a conditional lock on the partition,
> and on failure (i.e. because of an exclusive lock held somewhere else), just
> assume it's parallel-unsafe because the parallel-safety can't be determined
> without blocking on the lock. This is not ideal, but I'm not sure of what other
> approach could be used and I am somewhat reluctant to change that test. If
> anybody is familiar with the "partition-concurrent-attach" test, any ideas or
> insights would be appreciated.

That test looks sane.  I think what we should do is disable parallel operation during that test.  It looks like some
of the other existing test cases disable parallel query by setting max_parallel_workers_per_gather to 0.  It's not
strange that some tests fail under certain configurations; autovacuum is disabled in many places in the regression tests.
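
For example (a sketch only, assuming the test stays an isolation spec; the session and step names are made up),
the session that performs the insert could pin the GUC in its setup block:

    # in partition-concurrent-attach's spec, force the insert to plan serially
    session "s1"
    setup   { SET max_parallel_workers_per_gather = 0; }
    step "s1ins"    { INSERT INTO tpart VALUES (1); }   # placeholder statement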
 

Rather, I don't think we should introduce the trick of using ConditionalLockAcquire().  Otherwise, the insert would be
executed in a serial fashion without the user knowing it -- "What?  The insert suddenly became several times slower
today, and it didn't finish within the planned maintenance window.  What's wrong?"
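
To make the objection concrete, the fallback pattern in question would look roughly like this (my sketch, not
the patch's actual code; the variable name and the surrounding function are assumptions, though
ConditionalLockRelationOid() and PROPARALLEL_UNSAFE are the real backend interfaces):

    /* sketch of the conditional-lock fallback being discussed */
    if (!ConditionalLockRelationOid(partrelid, AccessShareLock))
    {
        /*
         * Someone else holds a conflicting lock, so we cannot inspect this
         * partition's parallel safety without blocking.  Give up and silently
         * fall back to a serial insert -- which is exactly the surprise the
         * user would hit.
         */
        return PROPARALLEL_UNSAFE;
    }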
 


Regards
Takayuki Tsunakawa

