Re: Determine parallel-safety of partition relations for Inserts - Mailing list pgsql-hackers
From | Amit Kapila |
---|---|
Subject | Re: Determine parallel-safety of partition relations for Inserts |
Date | |
Msg-id | CAA4eK1JjT2AOMNUHHBVTeMBbvFObSveZWe5jE8bcvV5cd7yzoQ@mail.gmail.com |
In response to | Re: Determine parallel-safety of partition relations for Inserts (Amit Langote <amitlangote09@gmail.com>) |
List | pgsql-hackers |
On Sun, Jan 17, 2021 at 4:45 PM Amit Langote <amitlangote09@gmail.com> wrote:
>
> On Sat, Jan 16, 2021 at 2:02 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
> > On Fri, Jan 15, 2021 at 6:45 PM Amit Langote <amitlangote09@gmail.com> wrote:
> > > On Fri, Jan 15, 2021 at 9:59 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
> > > > We want to do this for Inserts where only Select can be parallel and
> > > > Inserts will always be done by the leader backend. This is actually
> > > > the case we first want to implement.
> > >
> > > Sorry, I haven't looked at the linked threads and the latest patches
> > > there closely enough yet, so I may be misreading this, but if the
> > > inserts will always be done by the leader backend as you say, then why
> > > does the planner need to be checking the parallel safety of the
> > > *target* table's expressions?
> > >
> >
> > The reason is that once we enter parallel-mode we can't allow
> > parallel-unsafe things (like allocation of new CIDs, XIDs, etc.). We
> > enter the parallel-mode at the beginning of the statement execution,
> > see ExecutePlan(). So, the Insert will be performed in parallel-mode
> > even though it happens in the leader backend. It is not possible that
> > we finish getting all the tuples from the gather node first and then
> > start inserting. Even, if we somehow find something to make this work
> > anyway the checks being discussed will be required to make inserts
> > parallel (where inserts will be performed by workers) which is
> > actually the next patch in the thread I mentioned in the previous
> > email.
> >
> > Does this answer your question?
>
> Yes, thanks for the explanation. I kind of figured that doing the
> insert part itself in parallel using workers would be a part of the
> end goal of this work, although that didn't come across immediately.
>
> It's a bit unfortunate that the parallel safety check of the
> individual partitions cannot be deferred until it's known that a given
> partition will be affected by the command at all. Will we need
> fundamental changes to how parallel query works to make that possible?
> If so, have such options been considered in these projects?
>

I think it is quite fundamental to how parallel query works, and we
might not be able to change it for various reasons: (a) it will end up
generating a lot of paths in the optimizer when it is not safe to do so,
and in the end we won't use them; (b) if, after inserting into a few
partitions, we find out that the next partition we are going to insert
into has some parallel-unsafe constraints, then we need to give up the
execution and restart the statement by planning it again with only
non-parallel paths. Now, we can optimize this by retaining both parallel
and non-parallel plans, such that if we fail to execute the parallel
plan we can use the non-parallel plan to execute the statement, but
that still doesn't seem like an advisable approach.

The extra time spent in the optimizer will pay off well through the
parallel execution. As pointed out earlier, you can see one of the
results shared on the other thread [1]. The cases where it might not get
the benefit (say, when the underlying plan is non-parallel) can see some
impact, but we have not tested that in detail. The ideas we have
discussed so far to address that are: (a) postpone parallel-safety
checks for partitions until there are some partial paths underneath
(from which parallel paths can be generated), but that has the downside
that we will end up generating partial paths when that is really not
required; (b) have a rel option like parallel_dml_workers, or use the
existing option parallel_workers, to allow considering parallel insert
for a relation. Any better ideas?
> If such
> changes are not possible in the short term, like for v14, we should at
> least try to make sure that the eager checking of all partitions is
> only performed if using parallelism is possible at all.
>

As of now, we first check whether it is safe to generate a parallel plan
for the underlying select (in Insert into .... Select ..) and then
perform the parallel-safety checks for the DML. We can postpone it
further as suggested above in (a).

> I will try to take a look at the patches themselves to see if there's
> something I know that will help.
>

Thank you. It will be really helpful if you can do that.

[1] - https://www.postgresql.org/message-id/b54f2e306780449093c311118cd8a04e%40G08CNEXMBPEKD05.g08.fujitsu.local

--
With Regards,
Amit Kapila.
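
The check ordering described in the message (first decide whether the underlying select can be parallel at all, and only then walk every target partition) can be sketched as a small, self-contained C model. This is not PostgreSQL source and not code from the patch: ToyPartition, has_parallel_unsafe_constraint, ToyCheckResult, and toy_check_insert_select are hypothetical names made up for illustration, and the model assumes that a single parallel-unsafe partition is enough to rule out a parallel plan.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for one partition of the INSERT target table. */
typedef struct ToyPartition
{
    const char *name;
    /* true if any constraint, default, trigger, etc. is parallel-unsafe */
    bool        has_parallel_unsafe_constraint;
} ToyPartition;

/* Result of the toy check, plus a counter showing how much work was done. */
typedef struct ToyCheckResult
{
    bool        parallel_ok;         /* may a parallel plan be considered? */
    size_t      partitions_checked;  /* partitions inspected before deciding */
} ToyCheckResult;

/*
 * Toy model of the ordering: test the select side first, so the eager walk
 * over all partitions happens only when parallelism is possible at all.
 */
ToyCheckResult
toy_check_insert_select(bool select_parallel_ok,
                        const ToyPartition *parts, size_t nparts)
{
    ToyCheckResult res = { false, 0 };
    size_t      i;

    assert(parts != NULL || nparts == 0);

    /* Select cannot be parallel: skip the partition checks entirely. */
    if (!select_parallel_ok)
        return res;

    /* Eager check: every partition is a potential insert target. */
    for (i = 0; i < nparts; i++)
    {
        res.partitions_checked++;
        if (parts[i].has_parallel_unsafe_constraint)
            return res;         /* one unsafe partition rules it out */
    }

    res.parallel_ok = true;
    return res;
}
```

In this model, a call with select_parallel_ok set to false returns without inspecting any partition, which corresponds to the request that the eager checking only be performed if using parallelism is possible at all. The partition walk itself still runs at plan time, before it is known which partitions the command will actually touch.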