Re: Parallel copy - Mailing list pgsql-hackers

From Robert Haas
Subject Re: Parallel copy
Msg-id CA+TgmobokGVq3xN2aWnO0tNYpK6TAnLg9ii9EBcSQQDfVB3PjA@mail.gmail.com
In response to Re: Parallel copy  (Amit Kapila <amit.kapila16@gmail.com>)
Responses Re: Parallel copy
Re: Parallel copy
List pgsql-hackers
I wonder why you're still looking at this instead of just speeding up
the current code, especially the line splitting, per previous
discussion, and then coming back to study this issue more after
that's done.

On Mon, May 11, 2020 at 8:12 AM Amit Kapila <amit.kapila16@gmail.com> wrote:
> Apart from this, we have analyzed the other cases as mentioned below
> where we need to decide whether we can allow parallelism for the copy
> command.
> Case-1:
> Do we want to enable parallelism for a copy when transition tables are
> involved?

I think it would be OK not to support this.

> Case-2:
> a. When there are BEFORE/INSTEAD OF triggers on the table.
> b. For partitioned tables, we can't support multi-inserts when there
> are any statement-level insert triggers.
> c. For inserts into foreign tables.
> d. If there are volatile default expressions or the WHERE clause
> contains a volatile expression.  Here, we can check whether the
> expression is parallel-safe and, if so, allow parallelism.

This all sounds fine.
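
To make the Case-1/Case-2 restrictions concrete, here is a minimal
sketch of an eligibility check. It assumes each listed condition simply
disables parallel COPY, except Case-2d, where a volatile expression is
tolerated as long as it is parallel-safe. `CopyTarget` and
`parallel_copy_allowed` are hypothetical names for illustration only,
not anything that exists in copy.c.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class CopyTarget:
    """Hypothetical summary of a COPY FROM target (not a PostgreSQL struct)."""
    has_transition_tables: bool = False                  # Case-1
    has_before_or_instead_of_triggers: bool = False      # Case-2a
    partitioned_with_stmt_insert_triggers: bool = False  # Case-2b
    is_foreign_table: bool = False                       # Case-2c
    # Case-2d: None means no volatile default/WHERE expressions at all;
    # True/False means such expressions exist and are / are not parallel-safe.
    volatile_exprs_parallel_safe: Optional[bool] = None


def parallel_copy_allowed(t: CopyTarget) -> bool:
    """Return True if, under the assumptions above, COPY may run in parallel."""
    if t.has_transition_tables:
        return False
    if (t.has_before_or_instead_of_triggers
            or t.partitioned_with_stmt_insert_triggers
            or t.is_foreign_table):
        return False
    # A volatile expression only blocks parallelism if it is not parallel-safe.
    if t.volatile_exprs_parallel_safe is False:
        return False
    return True


# Usage: a plain table qualifies; a foreign table does not.
print(parallel_copy_allowed(CopyTarget()))                       # True
print(parallel_copy_allowed(CopyTarget(is_foreign_table=True)))  # False
```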

> Case-3:
> In the COPY command, to perform foreign key checks, we take a KEY
> SHARE lock on the rows of the primary key table, which in turn
> increments the command counter and updates the snapshot.  Now, as we
> share the snapshots at the beginning of the command, we can't allow
> them to be changed later.  So, unless we do something special for
> it, I think we can't allow parallelism in such cases.

This sounds like much more of a problem to me; it'd be a significant
restriction that would kick in in routine cases where the user isn't
doing anything particularly exciting. The command counter presumably
only needs to be updated once per command, so maybe we could do that
before we start parallelism. However, I think we would need to have
some kind of dynamic memory structure to which new combo CIDs can be
added by any member of the group, and then discovered by other members
of the group later. At the end of the parallel operation, the leader
must discover any combo CIDs added by others to that table before
destroying it, even if it has no immediate use for the information. We
can't allow a situation where the group members have inconsistent
notions of which combo CIDs exist or what their mappings are, and if
KEY SHARE locks are being taken, new combo CIDs could be created.
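
A toy model of the shared structure described above, assuming a single
lock-protected, append-only table that every group member both writes
to and reads from. `SharedComboCidTable`, `get_or_create`, `lookup`,
`drain_for_leader` and `worker` are hypothetical names; today each
backend keeps its own combo CID mappings (combocid.c), and a real
version would presumably have to live in dynamic shared memory rather
than in a Python dict. Threads stand in for parallel workers.

```python
import threading
from typing import Dict, List, Tuple

Pair = Tuple[int, int]  # (cmin, cmax)


class SharedComboCidTable:
    """Hypothetical shared combo-CID map: any member may add, all see one mapping."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self._by_pair: Dict[Pair, int] = {}  # (cmin, cmax) -> combo CID
        self._pairs: List[Pair] = []         # combo CID -> (cmin, cmax)

    def get_or_create(self, cmin: int, cmax: int) -> int:
        # Check-and-insert under one lock, so two workers asking for the same
        # pair can never be handed different combo CIDs.
        with self._lock:
            key = (cmin, cmax)
            combo = self._by_pair.get(key)
            if combo is None:
                combo = len(self._pairs)
                self._pairs.append(key)
                self._by_pair[key] = combo
            return combo

    def lookup(self, combo: int) -> Pair:
        # The combo CID may have been created by another member; it still resolves.
        with self._lock:
            return self._pairs[combo]

    def drain_for_leader(self) -> List[Pair]:
        # The leader copies every mapping, used or not, before the table is destroyed.
        with self._lock:
            return list(self._pairs)


def worker(table: SharedComboCidTable, pairs: List[Pair]) -> None:
    for cmin, cmax in pairs:
        table.get_or_create(cmin, cmax)


if __name__ == "__main__":
    table = SharedComboCidTable()
    jobs = [[(0, 1), (0, 2)], [(0, 2), (1, 3)]]
    threads = [threading.Thread(target=worker, args=(table, j)) for j in jobs]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    print(len(table.drain_for_leader()))  # 3 distinct (cmin, cmax) pairs
```

The single lock is the simplest way to rule out inconsistent mappings
between group members; a shared-memory implementation might well pick a
finer-grained scheme.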

> Case-4:
> For deferred triggers, it seems we record the CTIDs of tuples (via
> ExecARInsertTriggers->AfterTriggerSaveEvent) and then execute the
> deferred triggers at transaction end using AfterTriggerFireDeferred,
> or at the end of the statement.

I think this could be left for the future.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company