Re: Parallel INSERT (INTO ... SELECT ...) - Mailing list pgsql-hackers
From | Greg Nancarrow |
---|---|
Subject | Re: Parallel INSERT (INTO ... SELECT ...) |
Date | |
Msg-id | CAJcOf-dWaCqhMRP=E2mWntjX5T55h7-5PKnyqnuhckACyOQYMw@mail.gmail.com Whole thread Raw |
In response to | Re: Parallel INSERT (INTO ... SELECT ...) (Amit Kapila <amit.kapila16@gmail.com>) |
Responses |
Re: Parallel INSERT (INTO ... SELECT ...)
Re: Parallel INSERT (INTO ... SELECT ...) |
List | pgsql-hackers |
On Fri, Sep 25, 2020 at 10:17 PM Amit Kapila <amit.kapila16@gmail.com> wrote: > But we can tighten the condition in GetCurrentCommandId() such that it > Asserts for parallel worker only when currentCommandIdUsed is not set > before start of parallel operation. I also find these changes in the > callers of GetCurrentCommandId() quite adhoc and ugly even if they are > correct. Also, why we don't face a similar problems for parallel copy? > For Parallel Insert, as part of query plan execution, GetCurrentCommandId(true) is being called as part of INSERT statement execution. Parallel Copy of course doesn't have to deal with this. That's why there's a difference. And also, it has its own parallel entry point (ParallelCopyMain), so it's in full control, it's not trying to fit in with the infrastructure for plan execution. > So are you facing problems in this area because we EnterParallelMode > before even assigning the xid in the leader? Because I don't think we > should ever reach this code in the worker. If so, there are two > possibilities that come to my mind (a) assign xid in leader before > entering parallel mode or (b) change the check so that we don't assign > the new xid in workers. In this case, I am again wondering how does > parallel copy dealing this? > Again, there's a fundamental difference in the Parallel Insert case. Right at the top of ExecutePlan it calls EnterParallelMode(). For ParallelCopy(), there is no such problem. EnterParallelMode() is only called just before ParallelCopyMain() is called. So it can easily acquire the xid before this, because then parallel mode is not set. As it turns out, I think I have solved the commandId issue (and almost the xid issue) by realising that both the xid and cid are ALREADY being included as part of the serialized transaction state in the Parallel DSM. So actually I don't believe that there is any need for separately passing them in the DSM, and having to use those AssignXXXXForWorker() functions in the worker code - not even in the Parallel Copy case (? - need to check). GetCurrentCommandId(true) and GetFullTransactionId() need to be called prior to Parallel DSM initialization, so they are included in the serialized transaction state. I just needed to add a function to set currentCommandIdUsed=true in the worker initialization (for INSERT case) and make a small tweak to the Assert in GetCurrentCommandId() to ensure that currentCommandIdUsed, in a parallel worker, never gets set to true when it is false. This is in line with the comment in that function, because we know that "currentCommandId was already true at the start of the parallel operation". With this in place, I don't need to change any of the original calls to GetCurrentCommandId(), so this addresses that issue raised by Andres. I am not sure yet how to get past the issue of the parallel mode being set at the top of ExecutePlan(). With that in place, it doesn't allow a xid to be acquired for the leader, without removing/changing that parallel-mode check in GetNewTransactionId(). Regards, Greg Nancarrow Fujitsu Australia
pgsql-hackers by date: