Re: simplifying foreign key/RI checks - Mailing list pgsql-hackers

From Corey Huinker
Subject Re: simplifying foreign key/RI checks
Date
Msg-id CADkLM=fAojeyK0tgg3RqfJhjAcJWxF8kS1uxObGkQXjTuSQTig@mail.gmail.com
Whole thread Raw
In response to Re: simplifying foreign key/RI checks  (Amit Langote <amitlangote09@gmail.com>)
Responses Re: simplifying foreign key/RI checks  (Amit Langote <amitlangote09@gmail.com>)
List pgsql-hackers
On Mon, Jan 18, 2021 at 9:45 PM Amit Langote <amitlangote09@gmail.com> wrote:
On Tue, Jan 19, 2021 at 2:47 AM Zhihong Yu <zyu@yugabyte.com> wrote:
>
> Hi,
> I was looking at this statement:
>
> insert into f select generate_series(1, 2000000, 2);
>
> Since certain generated values (the second half) are not in table p, wouldn't insertion for those values fail ?
> I tried a scaled down version (1000th) of your example:
>
> yugabyte=# insert into f select generate_series(1, 2000, 2);
> ERROR:  insert or update on table "f" violates foreign key constraint "f_a_fkey"
> DETAIL:  Key (a)=(1001) is not present in table "p".

Sorry, a wrong copy-paste by me.  Try this:

create table p (a numeric primary key);
insert into p select generate_series(1, 2000000);
create table f (a bigint references p);

-- Unpatched
insert into f select generate_series(1, 2000000, 2);
INSERT 0 1000000
Time: 6527.652 ms (00:06.528)

update f set a = a + 1;
UPDATE 1000000
Time: 8108.310 ms (00:08.108)

-- Patched:
insert into f select generate_series(1, 2000000, 2);
INSERT 0 1000000
Time: 3312.193 ms (00:03.312)

update f set a = a + 1;
UPDATE 1000000
Time: 4292.807 ms (00:04.293)

> For v1-0002-Avoid-using-SPI-for-some-RI-checks.patch :
>
> +        * Collect partition key values from the unique key.
>
> At the end of the nested loop, should there be an assertion that partkey->partnatts partition key values have been found ?
> This can be done by using a counter (initialized to 0) which is incremented when a match is found by the inner loop.

I've updated the patch to add the Assert.  Thanks for taking a look.

--
Amit Langote
EDB: http://www.enterprisedb.com

v2 patch applies and passes make check and make check-world. Perhaps, given the missing break at line 418 without any tests failing, we could add another regression test if we're into 100% code path coverage. As it is, I think the compiler warning was a sufficient alert.

The code is easy to read, and the comments touch on the major points of what complexities arise from partitioned tables.

A somewhat pedantic complaint I have brought up off-list is that this patch continues the pattern of the variable and function names making the assumption that the foreign key is referencing the primary key of the referenced table. Foreign key constraints need only reference a unique index, it doesn't have to be the primary key. Granted, that unique index is behaving exactly as a primary key would, so conceptually it is very similar, but keeping with the existing naming (pk_rel, pk_type, etc) can lead a developer to think that it would be just as correct to find the referenced relation and get the primary key index from there, which would not always be correct. This patch correctly grabs the index from the constraint itself, so no problem there.

I like that this patch changes the absolute minimum of the code in order to get a very significant performance benefit. It does so in a way that should reduce resource pressure found in other places [1]. This will in turn reduce the performance penalty of "doing the right thing" in terms of defining enforced foreign keys. It seems to get a clearer performance boost than was achieved with previous efforts at statement level triggers.

This patch completely sidesteps the DELETE case, which has more insidious performance implications, but is also far less common, and whose solution will likely be very different.

pgsql-hackers by date:

Previous
From: Craig Ringer
Date:
Subject: Re: [PATCH] ProcessInterrupts_hook
Next
From: Michael Paquier
Date:
Subject: Re: ResourceOwner refactoring