Re: Parallel copy - Mailing list pgsql-hackers
From | Robert Haas |
---|---|
Subject | Re: Parallel copy |
Date | |
Msg-id | CA+TgmobokGVq3xN2aWnO0tNYpK6TAnLg9ii9EBcSQQDfVB3PjA@mail.gmail.com |
In response to | Re: Parallel copy (Amit Kapila <amit.kapila16@gmail.com>) |
Responses |
Re: Parallel copy
Re: Parallel copy |
List | pgsql-hackers |
I wonder why you're still looking at this instead of just speeding up the current code, especially the line splitting, per previous discussion, and then coming back to study this issue more after that's done.

On Mon, May 11, 2020 at 8:12 AM Amit Kapila <amit.kapila16@gmail.com> wrote:
> Apart from this, we have analyzed the other cases as mentioned below
> where we need to decide whether we can allow parallelism for the copy
> command.
> Case-1:
> Do we want to enable parallelism for a copy when transition tables are
> involved?

I think it would be OK not to support this.

> Case-2:
> a. When there are BEFORE/INSTEAD OF triggers on the table.
> b. For partitioned tables, we can't support multi-inserts when there
> are any statement-level insert triggers.
> c. For inserts into foreign tables.
> d. If there are volatile default expressions or the where clause
> contains a volatile expression. Here, we can check if the expression
> is parallel-safe, then we can allow parallelism.

This all sounds fine.

> Case-3:
> In copy command, for performing foreign key checks, we take KEY SHARE
> lock on primary key table rows which in turn will increment the command
> counter and update the snapshot. Now, as we share the snapshots at
> the beginning of the command, we can't allow it to be changed later.
> So, unless we do something special for it, I think we can't allow
> parallelism in such cases.

This sounds like much more of a problem to me; it'd be a significant restriction that would kick in for routine cases where the user isn't doing anything particularly exciting. The command counter presumably only needs to be updated once per command, so maybe we could do that before we start parallelism. However, I think we would need to have some kind of dynamic memory structure to which new combo CIDs can be added by any member of the group, and then discovered by other members of the group later. At the end of the parallel operation, the leader must discover any combo CIDs added by others to that table before destroying it, even if it has no immediate use for the information. We can't allow a situation where the group members have inconsistent notions of which combo CIDs exist or what their mappings are, and if KEY SHARE locks are being taken, new combo CIDs could be created.

> Case-4:
> For Deferred Triggers, it seems we record CTIDs of tuples (via
> ExecARInsertTriggers->AfterTriggerSaveEvent) and then execute deferred
> triggers at transaction end using AfterTriggerFireDeferred or at end
> of the statement.

I think this could be left for the future.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
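To make the Case-3 suggestion a little more concrete, here is a minimal sketch of a combo CID table that any group member can add to and that the leader drains before destroying it. This is not PostgreSQL code: `SharedComboCidTable`, the `shared_combocid_*` functions, and the fixed capacity are all hypothetical names chosen for illustration, a pthread mutex stands in for whatever locking the real shared-memory structure would use, and the linear search is only there to keep the example short.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stdint.h>

#define MAX_COMBO_CIDS 1024     /* hypothetical fixed capacity for this sketch */

typedef uint32_t CommandId;

/* One mapping: a combo CID stands for a (cmin, cmax) pair. */
typedef struct
{
    CommandId   cmin;
    CommandId   cmax;
} ComboCidEntry;

/* Hypothetical table shared by the leader and all workers. */
typedef struct
{
    pthread_mutex_t lock;       /* stand-in for a real shared-memory lock */
    uint32_t        count;      /* number of combo CIDs assigned so far */
    ComboCidEntry   entries[MAX_COMBO_CIDS];
} SharedComboCidTable;

void
shared_combocid_init(SharedComboCidTable *t)
{
    pthread_mutex_init(&t->lock, NULL);
    t->count = 0;
}

/*
 * Return the combo CID for (cmin, cmax), creating it if no group member
 * has registered that pair yet.  Returns false if the table is full.
 */
bool
shared_combocid_get(SharedComboCidTable *t, CommandId cmin, CommandId cmax,
                    uint32_t *combo)
{
    bool        ok = true;
    uint32_t    i;

    pthread_mutex_lock(&t->lock);
    for (i = 0; i < t->count; i++)
    {
        if (t->entries[i].cmin == cmin && t->entries[i].cmax == cmax)
            break;              /* someone already created this mapping */
    }
    if (i == t->count)
    {
        if (t->count == MAX_COMBO_CIDS)
            ok = false;
        else
        {
            t->entries[i].cmin = cmin;
            t->entries[i].cmax = cmax;
            t->count++;
        }
    }
    if (ok)
        *combo = i;
    pthread_mutex_unlock(&t->lock);
    return ok;
}

/* Resolve a combo CID created by any member of the group. */
ComboCidEntry
shared_combocid_lookup(SharedComboCidTable *t, uint32_t combo)
{
    ComboCidEntry e;

    pthread_mutex_lock(&t->lock);
    assert(combo < t->count);
    e = t->entries[combo];
    pthread_mutex_unlock(&t->lock);
    return e;
}

/*
 * Leader-side step: copy out every mapping (including ones added by
 * workers) before the shared table is destroyed.  Returns the number of
 * entries copied, at most out_cap.
 */
uint32_t
shared_combocid_drain(SharedComboCidTable *t, ComboCidEntry *out,
                      uint32_t out_cap)
{
    uint32_t    n;

    pthread_mutex_lock(&t->lock);
    n = (t->count < out_cap) ? t->count : out_cap;
    for (uint32_t i = 0; i < n; i++)
        out[i] = t->entries[i];
    pthread_mutex_unlock(&t->lock);
    return n;
}
```

Because every lookup and insert goes through the same locked table, two members asking for the same (cmin, cmax) pair get the same combo CID, and the drain step gives the leader the full set of mappings even if it never needed them during the parallel phase.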