Re: Allow parallel plan for referential integrity checks? - Mailing list pgsql-hackers

From Juan José Santamaría Flecha
Subject Re: Allow parallel plan for referential integrity checks?
Date
Msg-id CAC+AXB0QTc3V+019Y3QvEViuu7H9PaYgqMA39JYmR62cxovf9A@mail.gmail.com
Whole thread Raw
In response to Re: Allow parallel plan for referential integrity checks?  (Juan José Santamaría Flecha <juanjo.santamaria@gmail.com>)
List pgsql-hackers

On Thu, Aug 10, 2023 at 5:06 PM Juan José Santamaría Flecha <juanjo.santamaria@gmail.com> wrote:
On Tue, Jul 4, 2023 at 9:45 AM Daniel Gustafsson <daniel@yesql.se> wrote:
As there is no new patch submitted I will go ahead and do that, please feel
free to resubmit when there is renewed interest in working on this.

Recently I restored a database from a directory format backup and having this feature would have been quite useful. So, I would like to resume the work on this patch, unless OP or someone else is already on it.

Hearing no one advising me otherwise I'll pick up the work on the patch. First, I'll try to address all previous comments.

On Mon, Feb 14, 2022 at 3:34 PM Robert Haas <robertmhaas@gmail.com> wrote:
On Mon, Feb 7, 2022 at 5:26 AM Frédéric Yhuel <frederic.yhuel@dalibo.com> wrote:
> I noticed that referential integrity checks aren't currently
> parallelized. Is it on purpose?

It's not 100% clear to me that it is safe. But on the other hand, it's
also not 100% clear to me that it is unsafe.

Generally, I only added CURSOR_OPT_PARALLEL_OK in places where I was
confident that nothing bad would happen, and this wasn't one of those
places. It's something of a nested-query environment -- your criterion
#6. How do we know that the surrounding query isn't already parallel?
Perhaps because it must be DML, but I think it must be more important
to support parallel DML than to support this.

Currently I don't see a scenario where the validation might run inside another parallel query, but I could be missing something. Suggestions on some edge cases to test are more than welcome.

We already have a case where parallel operations can occur when adding a table constraint, and that's the index creation for a primary key. Take for example:

postgres=# SET max_parallel_maintenance_workers TO 4;
postgres=# SET parallel_setup_cost TO 0;
postgres=# SET parallel_tuple_cost TO 0;
postgres=# SET parallel_leader_participation TO 0;
postgres=# SET min_parallel_table_scan_size TO 0;
postgres=# SET client_min_messages TO debug1;
postgres=# CREATE TABLE parallel_pk_table (a int) WITH (autovacuum_enabled = off);
postgres=# ALTER TABLE parallel_pk_table ADD PRIMARY KEY (a);
DEBUG:  ALTER TABLE / ADD PRIMARY KEY will create implicit index "parallel_pk_table_pkey" for table "parallel_pk_table"
DEBUG:  building index "parallel_pk_table_pkey" on table "parallel_pk_table" with request for 1 parallel workers
...
DEBUG:  index "parallel_pk_table_pkey" can safely use deduplication
DEBUG:  verifying table "parallel_pk_table"

This should be better documented. Maybe "parallel index build" [1] could have its own section, so it can be cited from other sections?
 
I'm not sure what the right thing to do here is, but I think it would
be good if your patch included a test case.

I have included a basic test for parallel ADD PRIMARY KEY and ADD FOREIGN KEY.

On Sat, Mar 19, 2022 at 1:57 AM Imseih (AWS), Sami <simseih@amazon.com> wrote:
It would be valuable to add logging to ensure that the ActiveSnapshot and TransactionSnapshot
is the same for the leader and the workers. This logging could be tested in the TAP test.

That machinery is used in any parallel query, I'm not sure this patch should be responsible for carrying that requirement.

Also, inside RI_Initial_Check you may want to set max_parallel_workers to
max_parallel_maintenance_workers.

Currently the work_mem is set to maintenance_work_mem. This will also require
a doc change to call out.

Done.

PFA the patch.


Regards,

Juan José Santamaría Flecha
Attachment

pgsql-hackers by date:

Previous
From: Peter Smith
Date:
Subject: Add 'worker_type' to pg_stat_subscription
Next
From: Etsuro Fujita
Date:
Subject: Re: postgres_fdw: wrong results with self join + enable_nestloop off