Thread: simplifying foreign key/RI checks

simplifying foreign key/RI checks

From
Amit Langote
Date:
While discussing the topic of foreign key performance off-list with
Robert and Corey (also came up briefly on the list recently [1], [2]),
a few ideas were thrown around to simplify our current system of RI
checks to enforce foreign keys with the aim of reducing some of its
overheads.  The two main aspects of  how we do these checks that
seemingly cause the most overhead are:

* Using row-level triggers that are fired during the modification of
the referencing and the referenced relations to perform them

* Using plain SQL queries issued over SPI

There is a discussion nearby titled "More efficient RI checks - take
2" [2] to address this problem from the viewpoint that it is using
row-level triggers that causes the most overhead, although there are
some posts mentioning that SQL-over-SPI is not without blame here.  I
decided to focus on the latter aspect and tried reimplementing some
checks such that SPI can be skipped altogether.

I started with the check that's performed when inserting into or
updating the referencing table to confirm that the new row points to a
valid row in the referenced relation.  The corresponding SQL is this:

SELECT 1 FROM pk_rel x WHERE x.pkey = $1 FOR KEY SHARE OF x

$1 is the value of the foreign key of the new row.  If the query
returns a row, all good.  Thanks to SPI, or its use of plan caching,
the query is re-planned only a handful of times before making a
generic plan that is then saved and reused, which looks like this:

              QUERY PLAN
--------------------------------------
 LockRows
   ->  Index Scan using pk_pkey on pk x
         Index Cond: (a = $1)
(3 rows)

So in most cases, the trigger's function would only execute the plan
that's already there, at least in a given session.  That's good but
what we realized would be even better is if we didn't have to
"execute" a full-fledged "plan" for this, that is, to simply find out
whether a row containing the key we're looking for exists in the
referenced relation and if found lock it.  Directly scanning the index
and locking it directly with table_tuple_lock() like ExecLockRows()
does gives us exactly that behavior, which seems simple enough to be
done in a not-so-long local function in ri_trigger.c.  I gave that a
try and came up with the attached.  It also takes care of the case
where the referenced relation is partitioned in which case its
appropriate leaf partition's index is scanned.

The patch results in ~2x improvement in the performance of inserts and
updates on referencing tables:

create table p (a numeric primary key);
insert into p select generate_series(1, 1000000);
create table f (a bigint references p);

-- unpatched
insert into f select generate_series(1, 2000000, 2);
INSERT 0 1000000
Time: 6340.733 ms (00:06.341)

update f set a = a + 1;
UPDATE 1000000
Time: 7490.906 ms (00:07.491)

-- patched
insert into f select generate_series(1, 2000000, 2);
INSERT 0 1000000
Time: 3340.808 ms (00:03.341)

update f set a = a + 1;
UPDATE 1000000
Time: 4178.171 ms (00:04.178)

The improvement is even more dramatic when the referenced table (that
we're no longer querying over SPI) is partitioned.  Here are the
numbers when the PK relation has 1000 hash partitions.

Unpatched:

insert into f select generate_series(1, 2000000, 2);
INSERT 0 1000000
Time: 35898.783 ms (00:35.899)

update f set a = a + 1;
UPDATE 1000000
Time: 37736.294 ms (00:37.736)

Patched:

insert into f select generate_series(1, 2000000, 2);
INSERT 0 1000000
Time: 5633.377 ms (00:05.633)

update f set a = a + 1;
UPDATE 1000000
Time: 6345.029 ms (00:06.345)

That's over ~5x improvement!

While the above case seemed straightforward enough for skipping SPI,
it seems a bit hard to do the same for other cases where we query the
*referencing* relation during an operation on the referenced table
(for example, checking if the row being deleted is still referenced),
because the plan in those cases is not predictably an index scan.
Also, the filters in those queries are more than likely to not match
the partition key of a partitioned referencing relation, so all
partitions will have to scanned.  I have left those cases as future
work.

The patch seems simple enough to consider for inclusion in v14 unless
of course we stumble into some dealbreaker(s).  I will add this to
March CF.

-- 
Amit Langote
EDB: http://www.enterprisedb.com

[1] https://www.postgresql.org/message-id/CADkLM%3DcTt_8Fg1Jtij5j%2BQEBOxz9Cuu4DiMDYOwdtktDAKzuLw%40mail.gmail.com

[2] https://www.postgresql.org/message-id/1813.1586363881%40antos

Attachment

Re: simplifying foreign key/RI checks

From
Zhihong Yu
Date:
Hi,
I was looking at this statement:

insert into f select generate_series(1, 2000000, 2);

Since certain generated values (the second half) are not in table p, wouldn't insertion for those values fail ?
I tried a scaled down version (1000th) of your example:

yugabyte=# insert into f select generate_series(1, 2000, 2);
ERROR:  insert or update on table "f" violates foreign key constraint "f_a_fkey"
DETAIL:  Key (a)=(1001) is not present in table "p".

For v1-0002-Avoid-using-SPI-for-some-RI-checks.patch :

+        * Collect partition key values from the unique key.

At the end of the nested loop, should there be an assertion that partkey->partnatts partition key values have been found ?
This can be done by using a counter (initialized to 0) which is incremented when a match is found by the inner loop.

Cheers

On Mon, Jan 18, 2021 at 4:40 AM Amit Langote <amitlangote09@gmail.com> wrote:
While discussing the topic of foreign key performance off-list with
Robert and Corey (also came up briefly on the list recently [1], [2]),
a few ideas were thrown around to simplify our current system of RI
checks to enforce foreign keys with the aim of reducing some of its
overheads.  The two main aspects of  how we do these checks that
seemingly cause the most overhead are:

* Using row-level triggers that are fired during the modification of
the referencing and the referenced relations to perform them

* Using plain SQL queries issued over SPI

There is a discussion nearby titled "More efficient RI checks - take
2" [2] to address this problem from the viewpoint that it is using
row-level triggers that causes the most overhead, although there are
some posts mentioning that SQL-over-SPI is not without blame here.  I
decided to focus on the latter aspect and tried reimplementing some
checks such that SPI can be skipped altogether.

I started with the check that's performed when inserting into or
updating the referencing table to confirm that the new row points to a
valid row in the referenced relation.  The corresponding SQL is this:

SELECT 1 FROM pk_rel x WHERE x.pkey = $1 FOR KEY SHARE OF x

$1 is the value of the foreign key of the new row.  If the query
returns a row, all good.  Thanks to SPI, or its use of plan caching,
the query is re-planned only a handful of times before making a
generic plan that is then saved and reused, which looks like this:

              QUERY PLAN
--------------------------------------
 LockRows
   ->  Index Scan using pk_pkey on pk x
         Index Cond: (a = $1)
(3 rows)

So in most cases, the trigger's function would only execute the plan
that's already there, at least in a given session.  That's good but
what we realized would be even better is if we didn't have to
"execute" a full-fledged "plan" for this, that is, to simply find out
whether a row containing the key we're looking for exists in the
referenced relation and if found lock it.  Directly scanning the index
and locking it directly with table_tuple_lock() like ExecLockRows()
does gives us exactly that behavior, which seems simple enough to be
done in a not-so-long local function in ri_trigger.c.  I gave that a
try and came up with the attached.  It also takes care of the case
where the referenced relation is partitioned in which case its
appropriate leaf partition's index is scanned.

The patch results in ~2x improvement in the performance of inserts and
updates on referencing tables:

create table p (a numeric primary key);
insert into p select generate_series(1, 1000000);
create table f (a bigint references p);

-- unpatched
insert into f select generate_series(1, 2000000, 2);
INSERT 0 1000000
Time: 6340.733 ms (00:06.341)

update f set a = a + 1;
UPDATE 1000000
Time: 7490.906 ms (00:07.491)

-- patched
insert into f select generate_series(1, 2000000, 2);
INSERT 0 1000000
Time: 3340.808 ms (00:03.341)

update f set a = a + 1;
UPDATE 1000000
Time: 4178.171 ms (00:04.178)

The improvement is even more dramatic when the referenced table (that
we're no longer querying over SPI) is partitioned.  Here are the
numbers when the PK relation has 1000 hash partitions.

Unpatched:

insert into f select generate_series(1, 2000000, 2);
INSERT 0 1000000
Time: 35898.783 ms (00:35.899)

update f set a = a + 1;
UPDATE 1000000
Time: 37736.294 ms (00:37.736)

Patched:

insert into f select generate_series(1, 2000000, 2);
INSERT 0 1000000
Time: 5633.377 ms (00:05.633)

update f set a = a + 1;
UPDATE 1000000
Time: 6345.029 ms (00:06.345)

That's over ~5x improvement!

While the above case seemed straightforward enough for skipping SPI,
it seems a bit hard to do the same for other cases where we query the
*referencing* relation during an operation on the referenced table
(for example, checking if the row being deleted is still referenced),
because the plan in those cases is not predictably an index scan.
Also, the filters in those queries are more than likely to not match
the partition key of a partitioned referencing relation, so all
partitions will have to scanned.  I have left those cases as future
work.

The patch seems simple enough to consider for inclusion in v14 unless
of course we stumble into some dealbreaker(s).  I will add this to
March CF.

--
Amit Langote
EDB: http://www.enterprisedb.com

[1] https://www.postgresql.org/message-id/CADkLM%3DcTt_8Fg1Jtij5j%2BQEBOxz9Cuu4DiMDYOwdtktDAKzuLw%40mail.gmail.com

[2] https://www.postgresql.org/message-id/1813.1586363881%40antos

Re: simplifying foreign key/RI checks

From
Pavel Stehule
Date:


po 18. 1. 2021 v 13:40 odesílatel Amit Langote <amitlangote09@gmail.com> napsal:
While discussing the topic of foreign key performance off-list with
Robert and Corey (also came up briefly on the list recently [1], [2]),
a few ideas were thrown around to simplify our current system of RI
checks to enforce foreign keys with the aim of reducing some of its
overheads.  The two main aspects of  how we do these checks that
seemingly cause the most overhead are:

* Using row-level triggers that are fired during the modification of
the referencing and the referenced relations to perform them

* Using plain SQL queries issued over SPI

There is a discussion nearby titled "More efficient RI checks - take
2" [2] to address this problem from the viewpoint that it is using
row-level triggers that causes the most overhead, although there are
some posts mentioning that SQL-over-SPI is not without blame here.  I
decided to focus on the latter aspect and tried reimplementing some
checks such that SPI can be skipped altogether.

I started with the check that's performed when inserting into or
updating the referencing table to confirm that the new row points to a
valid row in the referenced relation.  The corresponding SQL is this:

SELECT 1 FROM pk_rel x WHERE x.pkey = $1 FOR KEY SHARE OF x

$1 is the value of the foreign key of the new row.  If the query
returns a row, all good.  Thanks to SPI, or its use of plan caching,
the query is re-planned only a handful of times before making a
generic plan that is then saved and reused, which looks like this:

              QUERY PLAN
--------------------------------------
 LockRows
   ->  Index Scan using pk_pkey on pk x
         Index Cond: (a = $1)
(3 rows)




What is performance when the referenced table is small? - a lot of codebooks are small between 1000 to 10K rows.


Re: simplifying foreign key/RI checks

From
Amit Langote
Date:
On Tue, Jan 19, 2021 at 3:01 AM Pavel Stehule <pavel.stehule@gmail.com> wrote:
> po 18. 1. 2021 v 13:40 odesílatel Amit Langote <amitlangote09@gmail.com> napsal:
>> I started with the check that's performed when inserting into or
>> updating the referencing table to confirm that the new row points to a
>> valid row in the referenced relation.  The corresponding SQL is this:
>>
>> SELECT 1 FROM pk_rel x WHERE x.pkey = $1 FOR KEY SHARE OF x
>>
>> $1 is the value of the foreign key of the new row.  If the query
>> returns a row, all good.  Thanks to SPI, or its use of plan caching,
>> the query is re-planned only a handful of times before making a
>> generic plan that is then saved and reused, which looks like this:
>>
>>               QUERY PLAN
>> --------------------------------------
>>  LockRows
>>    ->  Index Scan using pk_pkey on pk x
>>          Index Cond: (a = $1)
>> (3 rows)
>
>
> What is performance when the referenced table is small? - a lot of codebooks are small between 1000 to 10K rows.

I see the same ~2x improvement.

create table p (a numeric primary key);
insert into p select generate_series(1, 1000);
create table f (a bigint references p);

Unpatched:

insert into f select i%1000+1 from generate_series(1, 1000000) i;
INSERT 0 1000000
Time: 5461.377 ms (00:05.461)


Patched:

insert into f select i%1000+1 from generate_series(1, 1000000) i;
INSERT 0 1000000
Time: 2357.440 ms (00:02.357)

That's expected because the overhead of using SPI to check the PK
table, which the patch gets rid of, is the same no matter the size of
the index to be scanned.

--
Amit Langote
EDB: http://www.enterprisedb.com



Re: simplifying foreign key/RI checks

From
Amit Langote
Date:
On Tue, Jan 19, 2021 at 2:47 AM Zhihong Yu <zyu@yugabyte.com> wrote:
>
> Hi,
> I was looking at this statement:
>
> insert into f select generate_series(1, 2000000, 2);
>
> Since certain generated values (the second half) are not in table p, wouldn't insertion for those values fail ?
> I tried a scaled down version (1000th) of your example:
>
> yugabyte=# insert into f select generate_series(1, 2000, 2);
> ERROR:  insert or update on table "f" violates foreign key constraint "f_a_fkey"
> DETAIL:  Key (a)=(1001) is not present in table "p".

Sorry, a wrong copy-paste by me.  Try this:

create table p (a numeric primary key);
insert into p select generate_series(1, 2000000);
create table f (a bigint references p);

-- Unpatched
insert into f select generate_series(1, 2000000, 2);
INSERT 0 1000000
Time: 6527.652 ms (00:06.528)

update f set a = a + 1;
UPDATE 1000000
Time: 8108.310 ms (00:08.108)

-- Patched:
insert into f select generate_series(1, 2000000, 2);
INSERT 0 1000000
Time: 3312.193 ms (00:03.312)

update f set a = a + 1;
UPDATE 1000000
Time: 4292.807 ms (00:04.293)

> For v1-0002-Avoid-using-SPI-for-some-RI-checks.patch :
>
> +        * Collect partition key values from the unique key.
>
> At the end of the nested loop, should there be an assertion that partkey->partnatts partition key values have been
found?
 
> This can be done by using a counter (initialized to 0) which is incremented when a match is found by the inner loop.

I've updated the patch to add the Assert.  Thanks for taking a look.

-- 
Amit Langote
EDB: http://www.enterprisedb.com

Attachment

Re: simplifying foreign key/RI checks

From
Zhihong Yu
Date:
Thanks for the quick response.

+               if (mapped_partkey_attnums[i] == pk_attnums[j])
+               {
+                   partkey_vals[i] = pk_vals[j];
+                   partkey_isnull[i] = pk_nulls[j] == 'n' ? true : false;
+                   i++;
+                   break;

The way counter (i) is incremented is out of my expectation.
In the rare case, where some i doesn't have corresponding pk_attnums[j], wouldn't there be a dead loop ?

I think the goal of adding the assertion should be not loop infinitely even if the invariant is not satisfied.

I guess a counter other than i would be better for this purpose.

Cheers

On Mon, Jan 18, 2021 at 6:45 PM Amit Langote <amitlangote09@gmail.com> wrote:
On Tue, Jan 19, 2021 at 2:47 AM Zhihong Yu <zyu@yugabyte.com> wrote:
>
> Hi,
> I was looking at this statement:
>
> insert into f select generate_series(1, 2000000, 2);
>
> Since certain generated values (the second half) are not in table p, wouldn't insertion for those values fail ?
> I tried a scaled down version (1000th) of your example:
>
> yugabyte=# insert into f select generate_series(1, 2000, 2);
> ERROR:  insert or update on table "f" violates foreign key constraint "f_a_fkey"
> DETAIL:  Key (a)=(1001) is not present in table "p".

Sorry, a wrong copy-paste by me.  Try this:

create table p (a numeric primary key);
insert into p select generate_series(1, 2000000);
create table f (a bigint references p);

-- Unpatched
insert into f select generate_series(1, 2000000, 2);
INSERT 0 1000000
Time: 6527.652 ms (00:06.528)

update f set a = a + 1;
UPDATE 1000000
Time: 8108.310 ms (00:08.108)

-- Patched:
insert into f select generate_series(1, 2000000, 2);
INSERT 0 1000000
Time: 3312.193 ms (00:03.312)

update f set a = a + 1;
UPDATE 1000000
Time: 4292.807 ms (00:04.293)

> For v1-0002-Avoid-using-SPI-for-some-RI-checks.patch :
>
> +        * Collect partition key values from the unique key.
>
> At the end of the nested loop, should there be an assertion that partkey->partnatts partition key values have been found ?
> This can be done by using a counter (initialized to 0) which is incremented when a match is found by the inner loop.

I've updated the patch to add the Assert.  Thanks for taking a look.

--
Amit Langote
EDB: http://www.enterprisedb.com

Re: simplifying foreign key/RI checks

From
japin
Date:
On Tue, 19 Jan 2021 at 10:45, Amit Langote <amitlangote09@gmail.com> wrote:
> On Tue, Jan 19, 2021 at 2:47 AM Zhihong Yu <zyu@yugabyte.com> wrote:
>>
>> Hi,
>> I was looking at this statement:
>>
>> insert into f select generate_series(1, 2000000, 2);
>>
>> Since certain generated values (the second half) are not in table p, wouldn't insertion for those values fail ?
>> I tried a scaled down version (1000th) of your example:
>>
>> yugabyte=# insert into f select generate_series(1, 2000, 2);
>> ERROR:  insert or update on table "f" violates foreign key constraint "f_a_fkey"
>> DETAIL:  Key (a)=(1001) is not present in table "p".
>
> Sorry, a wrong copy-paste by me.  Try this:
>
> create table p (a numeric primary key);
> insert into p select generate_series(1, 2000000);
> create table f (a bigint references p);
>
> -- Unpatched
> insert into f select generate_series(1, 2000000, 2);
> INSERT 0 1000000
> Time: 6527.652 ms (00:06.528)
>
> update f set a = a + 1;
> UPDATE 1000000
> Time: 8108.310 ms (00:08.108)
>
> -- Patched:
> insert into f select generate_series(1, 2000000, 2);
> INSERT 0 1000000
> Time: 3312.193 ms (00:03.312)
>
> update f set a = a + 1;
> UPDATE 1000000
> Time: 4292.807 ms (00:04.293)
>
>> For v1-0002-Avoid-using-SPI-for-some-RI-checks.patch :
>>
>> +        * Collect partition key values from the unique key.
>>
>> At the end of the nested loop, should there be an assertion that partkey->partnatts partition key values have been
found? 
>> This can be done by using a counter (initialized to 0) which is incremented when a match is found by the inner loop.
>
> I've updated the patch to add the Assert.  Thanks for taking a look.

After apply the v2 patches, here are some warnings:

In file included from /home/japin/Codes/postgresql/Debug/../src/include/postgres.h:47:0,
                 from /home/japin/Codes/postgresql/Debug/../src/backend/utils/adt/ri_triggers.c:24:
/home/japin/Codes/postgresql/Debug/../src/backend/utils/adt/ri_triggers.c: In function ‘ri_PrimaryKeyExists’:
/home/japin/Codes/postgresql/Debug/../src/include/utils/elog.h:134:5: warning: this statement may fall through
[-Wimplicit-fallthrough=]
  do { \
     ^
/home/japin/Codes/postgresql/Debug/../src/include/utils/elog.h:156:2: note: in expansion of macro ‘ereport_domain’
  ereport_domain(elevel, TEXTDOMAIN, __VA_ARGS__)
  ^~~~~~~~~~~~~~
/home/japin/Codes/postgresql/Debug/../src/include/utils/elog.h:229:2: note: in expansion of macro ‘ereport’
  ereport(elevel, errmsg_internal(__VA_ARGS__))
  ^~~~~~~
/home/japin/Codes/postgresql/Debug/../src/backend/utils/adt/ri_triggers.c:417:5: note: in expansion of macro ‘elog’
     elog(ERROR, "unexpected table_tuple_lock status: %u", res);
     ^~~~
/home/japin/Codes/postgresql/Debug/../src/backend/utils/adt/ri_triggers.c:419:4: note: here
    default:
    ^~~~~~~

--
Regrads,
Japin Li.
ChengDu WenWu Information Technology Co.,Ltd.



Re: simplifying foreign key/RI checks

From
Pavel Stehule
Date:


út 19. 1. 2021 v 3:08 odesílatel Amit Langote <amitlangote09@gmail.com> napsal:
On Tue, Jan 19, 2021 at 3:01 AM Pavel Stehule <pavel.stehule@gmail.com> wrote:
> po 18. 1. 2021 v 13:40 odesílatel Amit Langote <amitlangote09@gmail.com> napsal:
>> I started with the check that's performed when inserting into or
>> updating the referencing table to confirm that the new row points to a
>> valid row in the referenced relation.  The corresponding SQL is this:
>>
>> SELECT 1 FROM pk_rel x WHERE x.pkey = $1 FOR KEY SHARE OF x
>>
>> $1 is the value of the foreign key of the new row.  If the query
>> returns a row, all good.  Thanks to SPI, or its use of plan caching,
>> the query is re-planned only a handful of times before making a
>> generic plan that is then saved and reused, which looks like this:
>>
>>               QUERY PLAN
>> --------------------------------------
>>  LockRows
>>    ->  Index Scan using pk_pkey on pk x
>>          Index Cond: (a = $1)
>> (3 rows)
>
>
> What is performance when the referenced table is small? - a lot of codebooks are small between 1000 to 10K rows.

I see the same ~2x improvement.

create table p (a numeric primary key);
insert into p select generate_series(1, 1000);
create table f (a bigint references p);

Unpatched:

insert into f select i%1000+1 from generate_series(1, 1000000) i;
INSERT 0 1000000
Time: 5461.377 ms (00:05.461)


Patched:

insert into f select i%1000+1 from generate_series(1, 1000000) i;
INSERT 0 1000000
Time: 2357.440 ms (00:02.357)

That's expected because the overhead of using SPI to check the PK
table, which the patch gets rid of, is the same no matter the size of
the index to be scanned.

It looks very well.

Regards

Pavel


--
Amit Langote
EDB: http://www.enterprisedb.com

Re: simplifying foreign key/RI checks

From
Corey Huinker
Date:

In file included from /home/japin/Codes/postgresql/Debug/../src/include/postgres.h:47:0,
                 from /home/japin/Codes/postgresql/Debug/../src/backend/utils/adt/ri_triggers.c:24:
/home/japin/Codes/postgresql/Debug/../src/backend/utils/adt/ri_triggers.c: In function ‘ri_PrimaryKeyExists’:
/home/japin/Codes/postgresql/Debug/../src/include/utils/elog.h:134:5: warning: this statement may fall through [-Wimplicit-fallthrough=]
  do { \
     ^
/home/japin/Codes/postgresql/Debug/../src/include/utils/elog.h:156:2: note: in expansion of macro ‘ereport_domain’
  ereport_domain(elevel, TEXTDOMAIN, __VA_ARGS__)
  ^~~~~~~~~~~~~~
/home/japin/Codes/postgresql/Debug/../src/include/utils/elog.h:229:2: note: in expansion of macro ‘ereport’
  ereport(elevel, errmsg_internal(__VA_ARGS__))
  ^~~~~~~
/home/japin/Codes/postgresql/Debug/../src/backend/utils/adt/ri_triggers.c:417:5: note: in expansion of macro ‘elog’
     elog(ERROR, "unexpected table_tuple_lock status: %u", res);
     ^~~~
/home/japin/Codes/postgresql/Debug/../src/backend/utils/adt/ri_triggers.c:419:4: note: here
    default:
    ^~~~~~~

--
Regrads,
Japin Li.
ChengDu WenWu Information Technology Co.,Ltd.

I also get this warning. Adding a "break;" at line 418 resolves the warning.
 

Re: simplifying foreign key/RI checks

From
Amit Langote
Date:
On Tue, Jan 19, 2021 at 2:56 PM Corey Huinker <corey.huinker@gmail.com> wrote:
>> In file included from /home/japin/Codes/postgresql/Debug/../src/include/postgres.h:47:0,
>>                  from /home/japin/Codes/postgresql/Debug/../src/backend/utils/adt/ri_triggers.c:24:
>> /home/japin/Codes/postgresql/Debug/../src/backend/utils/adt/ri_triggers.c: In function ‘ri_PrimaryKeyExists’:
>> /home/japin/Codes/postgresql/Debug/../src/include/utils/elog.h:134:5: warning: this statement may fall through
[-Wimplicit-fallthrough=]
>>   do { \
>>      ^
>> /home/japin/Codes/postgresql/Debug/../src/include/utils/elog.h:156:2: note: in expansion of macro ‘ereport_domain’
>>   ereport_domain(elevel, TEXTDOMAIN, __VA_ARGS__)
>>   ^~~~~~~~~~~~~~
>> /home/japin/Codes/postgresql/Debug/../src/include/utils/elog.h:229:2: note: in expansion of macro ‘ereport’
>>   ereport(elevel, errmsg_internal(__VA_ARGS__))
>>   ^~~~~~~
>> /home/japin/Codes/postgresql/Debug/../src/backend/utils/adt/ri_triggers.c:417:5: note: in expansion of macro ‘elog’
>>      elog(ERROR, "unexpected table_tuple_lock status: %u", res);
>>      ^~~~
>> /home/japin/Codes/postgresql/Debug/../src/backend/utils/adt/ri_triggers.c:419:4: note: here
>>     default:
>>     ^~~~~~~

Thanks, will fix it.

--
Amit Langote
EDB: http://www.enterprisedb.com



Re: simplifying foreign key/RI checks

From
Corey Huinker
Date:
On Mon, Jan 18, 2021 at 9:45 PM Amit Langote <amitlangote09@gmail.com> wrote:
On Tue, Jan 19, 2021 at 2:47 AM Zhihong Yu <zyu@yugabyte.com> wrote:
>
> Hi,
> I was looking at this statement:
>
> insert into f select generate_series(1, 2000000, 2);
>
> Since certain generated values (the second half) are not in table p, wouldn't insertion for those values fail ?
> I tried a scaled down version (1000th) of your example:
>
> yugabyte=# insert into f select generate_series(1, 2000, 2);
> ERROR:  insert or update on table "f" violates foreign key constraint "f_a_fkey"
> DETAIL:  Key (a)=(1001) is not present in table "p".

Sorry, a wrong copy-paste by me.  Try this:

create table p (a numeric primary key);
insert into p select generate_series(1, 2000000);
create table f (a bigint references p);

-- Unpatched
insert into f select generate_series(1, 2000000, 2);
INSERT 0 1000000
Time: 6527.652 ms (00:06.528)

update f set a = a + 1;
UPDATE 1000000
Time: 8108.310 ms (00:08.108)

-- Patched:
insert into f select generate_series(1, 2000000, 2);
INSERT 0 1000000
Time: 3312.193 ms (00:03.312)

update f set a = a + 1;
UPDATE 1000000
Time: 4292.807 ms (00:04.293)

> For v1-0002-Avoid-using-SPI-for-some-RI-checks.patch :
>
> +        * Collect partition key values from the unique key.
>
> At the end of the nested loop, should there be an assertion that partkey->partnatts partition key values have been found ?
> This can be done by using a counter (initialized to 0) which is incremented when a match is found by the inner loop.

I've updated the patch to add the Assert.  Thanks for taking a look.

--
Amit Langote
EDB: http://www.enterprisedb.com

v2 patch applies and passes make check and make check-world. Perhaps, given the missing break at line 418 without any tests failing, we could add another regression test if we're into 100% code path coverage. As it is, I think the compiler warning was a sufficient alert.

The code is easy to read, and the comments touch on the major points of what complexities arise from partitioned tables.

A somewhat pedantic complaint I have brought up off-list is that this patch continues the pattern of the variable and function names making the assumption that the foreign key is referencing the primary key of the referenced table. Foreign key constraints need only reference a unique index, it doesn't have to be the primary key. Granted, that unique index is behaving exactly as a primary key would, so conceptually it is very similar, but keeping with the existing naming (pk_rel, pk_type, etc) can lead a developer to think that it would be just as correct to find the referenced relation and get the primary key index from there, which would not always be correct. This patch correctly grabs the index from the constraint itself, so no problem there.

I like that this patch changes the absolute minimum of the code in order to get a very significant performance benefit. It does so in a way that should reduce resource pressure found in other places [1]. This will in turn reduce the performance penalty of "doing the right thing" in terms of defining enforced foreign keys. It seems to get a clearer performance boost than was achieved with previous efforts at statement level triggers.

This patch completely sidesteps the DELETE case, which has more insidious performance implications, but is also far less common, and whose solution will likely be very different.

Re: simplifying foreign key/RI checks

From
Amit Langote
Date:
On Tue, Jan 19, 2021 at 3:46 PM Corey Huinker <corey.huinker@gmail.com> wrote:
> v2 patch applies and passes make check and make check-world. Perhaps, given the missing break at line 418 without any
testsfailing, we could add another regression test if we're into 100% code path coverage. As it is, I think the
compilerwarning was a sufficient alert. 

Thanks for the review.  I will look into checking the coverage.

> The code is easy to read, and the comments touch on the major points of what complexities arise from partitioned
tables.
>
> A somewhat pedantic complaint I have brought up off-list is that this patch continues the pattern of the variable and
functionnames making the assumption that the foreign key is referencing the primary key of the referenced table.
Foreignkey constraints need only reference a unique index, it doesn't have to be the primary key. Granted, that unique
indexis behaving exactly as a primary key would, so conceptually it is very similar, but keeping with the existing
naming(pk_rel, pk_type, etc) can lead a developer to think that it would be just as correct to find the referenced
relationand get the primary key index from there, which would not always be correct. This patch correctly grabs the
indexfrom the constraint itself, so no problem there. 

I decided not to deviate from pk_ terminology so that the new code
doesn't look too different from the other code in the file.  Although,
I guess we can at least call the main function
ri_ReferencedKeyExists() instead of ri_PrimaryKeyExists(), so I've
changed that.

> I like that this patch changes the absolute minimum of the code in order to get a very significant performance
benefit.It does so in a way that should reduce resource pressure found in other places [1]. This will in turn reduce
theperformance penalty of "doing the right thing" in terms of defining enforced foreign keys. It seems to get a clearer
performanceboost than was achieved with previous efforts at statement level triggers. 
>
> [1] https://www.postgresql.org/message-id/CAKkQ508Z6r5e3jdqhfPWSzSajLpHo3OYYOAmfeSAuPTo5VGfgw@mail.gmail.com

Thanks.  I hadn't noticed [1] before today, but after looking it over,
it seems that what is being proposed there can still be of use.  As
long as SPI is used in ri_trigger.c, it makes sense to consider any
tweaks addressing its negative impact, especially if they are not very
invasive.  There's this patch too from the last month:
https://commitfest.postgresql.org/32/2930/

> This patch completely sidesteps the DELETE case, which has more insidious performance implications, but is also far
lesscommon, and whose solution will likely be very different. 

Yeah, we should continue looking into the ways to make referenced-side
RI checks be less bloated.

I've attached the updated patch.

--
Amit Langote
EDB: http://www.enterprisedb.com

Attachment

Re: simplifying foreign key/RI checks

From
Amit Langote
Date:
On Tue, Jan 19, 2021 at 12:00 PM Zhihong Yu <zyu@yugabyte.com> wrote:
> +               if (mapped_partkey_attnums[i] == pk_attnums[j])
> +               {
> +                   partkey_vals[i] = pk_vals[j];
> +                   partkey_isnull[i] = pk_nulls[j] == 'n' ? true : false;
> +                   i++;
> +                   break;
>
> The way counter (i) is incremented is out of my expectation.
> In the rare case, where some i doesn't have corresponding pk_attnums[j], wouldn't there be a dead loop ?
>
> I think the goal of adding the assertion should be not loop infinitely even if the invariant is not satisfied.
>
> I guess a counter other than i would be better for this purpose.

I have done that in v3.  Thanks.

-- 
Amit Langote
EDB: http://www.enterprisedb.com



Re: simplifying foreign key/RI checks

From
Corey Huinker
Date:
I decided not to deviate from pk_ terminology so that the new code
doesn't look too different from the other code in the file.  Although,
I guess we can at least call the main function
ri_ReferencedKeyExists() instead of ri_PrimaryKeyExists(), so I've
changed that.

I agree with leaving the existing terminology where it is for this patch. Changing the function name is probably enough to alert the reader that the things that are called pks may not be precisely that.


Re: simplifying foreign key/RI checks

From
Corey Huinker
Date:


I decided not to deviate from pk_ terminology so that the new code
doesn't look too different from the other code in the file.  Although,
I guess we can at least call the main function
ri_ReferencedKeyExists() instead of ri_PrimaryKeyExists(), so I've
changed that.

I think that's a nice compromise, it makes the reader aware of the concept.
 

I've attached the updated patch.

Missing "break" added. Check.
Comment updated. Check.
Function renamed. Check.
Attribute mapping matching test (and assertion) added. Check.
Patch applies to an as-of-today master, passes make check and check world.
No additional regression tests required, as no new functionality is introduced.
No docs required, as there is nothing user-facing.

Questions:
1. There's a palloc for mapped_partkey_attnums, which is never freed, is the prevailing memory context short lived enough that we don't care?
2. Same question for the AtrrMap map, should there be a free_attrmap().

Re: simplifying foreign key/RI checks

From
Amit Langote
Date:
On Fri, Jan 22, 2021 at 3:22 PM Corey Huinker <corey.huinker@gmail.com> wrote:
>> I decided not to deviate from pk_ terminology so that the new code
>> doesn't look too different from the other code in the file.  Although,
>> I guess we can at least call the main function
>> ri_ReferencedKeyExists() instead of ri_PrimaryKeyExists(), so I've
>> changed that.
>
> I think that's a nice compromise, it makes the reader aware of the concept.
>>
>> I've attached the updated patch.
>
> Missing "break" added. Check.
> Comment updated. Check.
> Function renamed. Check.
> Attribute mapping matching test (and assertion) added. Check.
> Patch applies to an as-of-today master, passes make check and check world.
> No additional regression tests required, as no new functionality is introduced.
> No docs required, as there is nothing user-facing.

Thanks for the review.

> Questions:
> 1. There's a palloc for mapped_partkey_attnums, which is never freed, is the prevailing memory context short lived
enoughthat we don't care?
 
> 2. Same question for the AtrrMap map, should there be a free_attrmap().

I hadn't checked, but yes, the prevailing context is
AfterTriggerTupleContext, a short-lived one that is reset for every
trigger event tuple.  I'm still inclined to explicitly free those
objects, so changed like that.  While at it, I also changed
mapped_partkey_attnums to root_partattrs for readability.

Attached v4.
--
Amit Langote
EDB: http://www.enterprisedb.com

Attachment

Re: simplifying foreign key/RI checks

From
Zhihong Yu
Date:
Hi,

+       for (i = 0; i < riinfo->nkeys; i++)
+       {
+           Oid     eq_opr = eq_oprs[i];
+           Oid     typeid = RIAttType(fk_rel, riinfo->fk_attnums[i]);
+           RI_CompareHashEntry *entry = ri_HashCompareOp(eq_opr, typeid);
+
+           if (pk_nulls[i] != 'n' && OidIsValid(entry->cast_func_finfo.fn_oid))

It seems the pk_nulls[i] != 'n' check can be lifted ahead of the assignment to the three local variables. That way, ri_HashCompareOp wouldn't be called when pk_nulls[i] == 'n'.

+           case TM_Updated:
+               if (IsolationUsesXactSnapshot())
...
+           case TM_Deleted:
+               if (IsolationUsesXactSnapshot())

It seems the handling for TM_Updated and TM_Deleted is the same. The cases for these two values can be put next to each other (saving one block of code).

Cheers

On Fri, Jan 22, 2021 at 11:10 PM Amit Langote <amitlangote09@gmail.com> wrote:
On Fri, Jan 22, 2021 at 3:22 PM Corey Huinker <corey.huinker@gmail.com> wrote:
>> I decided not to deviate from pk_ terminology so that the new code
>> doesn't look too different from the other code in the file.  Although,
>> I guess we can at least call the main function
>> ri_ReferencedKeyExists() instead of ri_PrimaryKeyExists(), so I've
>> changed that.
>
> I think that's a nice compromise, it makes the reader aware of the concept.
>>
>> I've attached the updated patch.
>
> Missing "break" added. Check.
> Comment updated. Check.
> Function renamed. Check.
> Attribute mapping matching test (and assertion) added. Check.
> Patch applies to an as-of-today master, passes make check and check world.
> No additional regression tests required, as no new functionality is introduced.
> No docs required, as there is nothing user-facing.

Thanks for the review.

> Questions:
> 1. There's a palloc for mapped_partkey_attnums, which is never freed, is the prevailing memory context short lived enough that we don't care?
> 2. Same question for the AtrrMap map, should there be a free_attrmap().

I hadn't checked, but yes, the prevailing context is
AfterTriggerTupleContext, a short-lived one that is reset for every
trigger event tuple.  I'm still inclined to explicitly free those
objects, so changed like that.  While at it, I also changed
mapped_partkey_attnums to root_partattrs for readability.

Attached v4.
--
Amit Langote
EDB: http://www.enterprisedb.com

Re: simplifying foreign key/RI checks

From
Corey Huinker
Date:


On Sat, Jan 23, 2021 at 12:52 PM Zhihong Yu <zyu@yugabyte.com> wrote:
Hi,

+       for (i = 0; i < riinfo->nkeys; i++)
+       {
+           Oid     eq_opr = eq_oprs[i];
+           Oid     typeid = RIAttType(fk_rel, riinfo->fk_attnums[i]);
+           RI_CompareHashEntry *entry = ri_HashCompareOp(eq_opr, typeid);
+
+           if (pk_nulls[i] != 'n' && OidIsValid(entry->cast_func_finfo.fn_oid))

It seems the pk_nulls[i] != 'n' check can be lifted ahead of the assignment to the three local variables. That way, ri_HashCompareOp wouldn't be called when pk_nulls[i] == 'n'.

+           case TM_Updated:
+               if (IsolationUsesXactSnapshot())
...
+           case TM_Deleted:
+               if (IsolationUsesXactSnapshot())

It seems the handling for TM_Updated and TM_Deleted is the same. The cases for these two values can be put next to each other (saving one block of code).

Cheers

I'll pause on reviewing v4 until you've addressed the suggestions above.

 

Re: simplifying foreign key/RI checks

From
Amit Langote
Date:
On Sun, Jan 24, 2021 at 11:26 AM Corey Huinker <corey.huinker@gmail.com> wrote:
> On Sat, Jan 23, 2021 at 12:52 PM Zhihong Yu <zyu@yugabyte.com> wrote:
>>
>> Hi,

Thanks for the review.

>> +       for (i = 0; i < riinfo->nkeys; i++)
>> +       {
>> +           Oid     eq_opr = eq_oprs[i];
>> +           Oid     typeid = RIAttType(fk_rel, riinfo->fk_attnums[i]);
>> +           RI_CompareHashEntry *entry = ri_HashCompareOp(eq_opr, typeid);
>> +
>> +           if (pk_nulls[i] != 'n' && OidIsValid(entry->cast_func_finfo.fn_oid))
>>
>> It seems the pk_nulls[i] != 'n' check can be lifted ahead of the assignment to the three local variables. That way,
ri_HashCompareOpwouldn't be called when pk_nulls[i] == 'n'.
 

Good idea, so done.  Although, there can't be nulls right now.

>> +           case TM_Updated:
>> +               if (IsolationUsesXactSnapshot())
>> ...
>> +           case TM_Deleted:
>> +               if (IsolationUsesXactSnapshot())
>>
>> It seems the handling for TM_Updated and TM_Deleted is the same. The cases for these two values can be put next to
eachother (saving one block of code).
 

Ah, yes.  The TM_Updated case used to be handled a bit differently in
earlier unposted versions of the patch, though at some point I
concluded that the special handling was unnecessary, but didn't
realize what you just pointed out.  Fixed.

> I'll pause on reviewing v4 until you've addressed the suggestions above.

Here's v5.

-- 
Amit Langote
EDB: http://www.enterprisedb.com

Attachment

Re: simplifying foreign key/RI checks

From
Corey Huinker
Date:


On Sun, Jan 24, 2021 at 6:51 AM Amit Langote <amitlangote09@gmail.com> wrote:
On Sun, Jan 24, 2021 at 11:26 AM Corey Huinker <corey.huinker@gmail.com> wrote:
> On Sat, Jan 23, 2021 at 12:52 PM Zhihong Yu <zyu@yugabyte.com> wrote:
>>
>> Hi,

Thanks for the review.

>> +       for (i = 0; i < riinfo->nkeys; i++)
>> +       {
>> +           Oid     eq_opr = eq_oprs[i];
>> +           Oid     typeid = RIAttType(fk_rel, riinfo->fk_attnums[i]);
>> +           RI_CompareHashEntry *entry = ri_HashCompareOp(eq_opr, typeid);
>> +
>> +           if (pk_nulls[i] != 'n' && OidIsValid(entry->cast_func_finfo.fn_oid))
>>
>> It seems the pk_nulls[i] != 'n' check can be lifted ahead of the assignment to the three local variables. That way, ri_HashCompareOp wouldn't be called when pk_nulls[i] == 'n'.

Good idea, so done.  Although, there can't be nulls right now.

>> +           case TM_Updated:
>> +               if (IsolationUsesXactSnapshot())
>> ...
>> +           case TM_Deleted:
>> +               if (IsolationUsesXactSnapshot())
>>
>> It seems the handling for TM_Updated and TM_Deleted is the same. The cases for these two values can be put next to each other (saving one block of code).

Ah, yes.  The TM_Updated case used to be handled a bit differently in
earlier unposted versions of the patch, though at some point I
concluded that the special handling was unnecessary, but didn't
realize what you just pointed out.  Fixed.

> I'll pause on reviewing v4 until you've addressed the suggestions above.

Here's v5.

v5 patches apply to master.
Suggested If/then optimization is implemented.
Suggested case merging is implemented.
Passes make check and make check-world yet again.
Just to confirm, we don't free the RI_CompareHashEntry because it points to an entry in a hash table which is TopMemoryContext aka lifetime of the session, correct?

Anybody else want to look this patch over before I mark it Ready For Committer? 

Re: simplifying foreign key/RI checks

From
Keisuke Kuroda
Date:
Hi, Amit-san,

Nice patch. I have confirmed that this solves the problem in [1] with
INSERT/UPDATE.

HEAD + patch
       name       | bytes | pg_size_pretty
------------------+-------+----------------
 CachedPlanQuery  | 10280 | 10 kB
 CachedPlanSource | 14616 | 14 kB
 CachedPlan       | 13168 | 13 kB ★ 710MB -> 13kB
(3 rows)

> > This patch completely sidesteps the DELETE case, which has more insidious performance implications, but is also far
lesscommon, and whose solution will likely be very different. 
>
> Yeah, we should continue looking into the ways to make referenced-side
> RI checks be less bloated.

However, as already mentioned, the problem of memory bloat on DELETE remains.
This can be solved by the patch in [1], but I think it is too much to apply
this patch only for DELETE. What do you think?

[1] https://www.postgresql.org/message-id/flat/cab4b85d-9292-967d-adf2-be0d803c3e23%40nttcom.co.jp_1

--
Keisuke Kuroda
NTT Software Innovation Center
keisuke.kuroda.3862@gmail.com



Re: simplifying foreign key/RI checks

From
Amit Langote
Date:
On Mon, Jan 25, 2021 at 9:24 AM Corey Huinker <corey.huinker@gmail.com> wrote:
> On Sun, Jan 24, 2021 at 6:51 AM Amit Langote <amitlangote09@gmail.com> wrote:
>> Here's v5.
>
> v5 patches apply to master.
> Suggested If/then optimization is implemented.
> Suggested case merging is implemented.
> Passes make check and make check-world yet again.
> Just to confirm, we don't free the RI_CompareHashEntry because it points to an entry in a hash table which is
TopMemoryContextaka lifetime of the session, correct?
 

Right.

> Anybody else want to look this patch over before I mark it Ready For Committer?

Would be nice to have others look it over.  Thanks.

-- 
Amit Langote
EDB: http://www.enterprisedb.com



Re: simplifying foreign key/RI checks

From
Amit Langote
Date:
Kuroda-san,

On Mon, Jan 25, 2021 at 6:06 PM Keisuke Kuroda
<keisuke.kuroda.3862@gmail.com> wrote:
> Hi, Amit-san,
>
> Nice patch. I have confirmed that this solves the problem in [1] with
> INSERT/UPDATE.

Thanks for testing.

> HEAD + patch
>        name       | bytes | pg_size_pretty
> ------------------+-------+----------------
>  CachedPlanQuery  | 10280 | 10 kB
>  CachedPlanSource | 14616 | 14 kB
>  CachedPlan       | 13168 | 13 kB ★ 710MB -> 13kB
> (3 rows)

If you only tested insert/update on the referencing table, I would've
expected to see nothing in the result of that query, because the patch
eliminates all use of SPI in that case.  I suspect the CachedPlan*
memory contexts you are seeing belong to some early activity in the
session.  So if you try the insert/update in a freshly started
session, you would see 0 rows in the result of that query.

> > > This patch completely sidesteps the DELETE case, which has more insidious performance implications, but is also
farless common, and whose solution will likely be very different. 
> >
> > Yeah, we should continue looking into the ways to make referenced-side
> > RI checks be less bloated.
>
> However, as already mentioned, the problem of memory bloat on DELETE remains.
> This can be solved by the patch in [1], but I think it is too much to apply
> this patch only for DELETE. What do you think?
>
> [1] https://www.postgresql.org/message-id/flat/cab4b85d-9292-967d-adf2-be0d803c3e23%40nttcom.co.jp_1

Hmm, the patch tries to solve a general problem that SPI plans are not
being shared among partitions whereas they should be.   So I don't
think that it's necessarily specific to DELETE.  Until we have a
solution like the patch on this thread for DELETE, it seems fine to
consider the other patch as a stopgap solution.

--
Amit Langote
EDB: http://www.enterprisedb.com



Re: simplifying foreign key/RI checks

From
Amit Langote
Date:
On Mon, Jan 25, 2021 at 7:01 PM Amit Langote <amitlangote09@gmail.com> wrote:
> On Mon, Jan 25, 2021 at 6:06 PM Keisuke Kuroda
> <keisuke.kuroda.3862@gmail.com> wrote:
> > However, as already mentioned, the problem of memory bloat on DELETE remains.
> > This can be solved by the patch in [1], but I think it is too much to apply
> > this patch only for DELETE. What do you think?
> >
> > [1] https://www.postgresql.org/message-id/flat/cab4b85d-9292-967d-adf2-be0d803c3e23%40nttcom.co.jp_1
>
> Hmm, the patch tries to solve a general problem that SPI plans are not
> being shared among partitions whereas they should be.   So I don't
> think that it's necessarily specific to DELETE.  Until we have a
> solution like the patch on this thread for DELETE, it seems fine to
> consider the other patch as a stopgap solution.

Forgot to mention one thing.  Alvaro, in his last email on that
thread, characterized that patch as fixing a bug, although I may have
misread that.

-- 
Amit Langote
EDB: http://www.enterprisedb.com



Re: simplifying foreign key/RI checks

From
Tatsuro Yamada
Date:
Hi Amit-san,

On 2021/01/25 18:19, Amit Langote wrote:
> On Mon, Jan 25, 2021 at 9:24 AM Corey Huinker <corey.huinker@gmail.com> wrote:
>> On Sun, Jan 24, 2021 at 6:51 AM Amit Langote <amitlangote09@gmail.com> wrote:
>>> Here's v5.
>>
>> v5 patches apply to master.
>> Suggested If/then optimization is implemented.
>> Suggested case merging is implemented.
>> Passes make check and make check-world yet again.
>> Just to confirm, we don't free the RI_CompareHashEntry because it points to an entry in a hash table which is
TopMemoryContextaka lifetime of the session, correct?
 
> 
> Right.
> 
>> Anybody else want to look this patch over before I mark it Ready For Committer?
> 
> Would be nice to have others look it over.  Thanks.


Thanks for creating the patch!

I tried to review the patch. Here is my comment.

* According to this thread [1], it might be better to replace elog() with
   ereport() in the patch.

[1]: https://www.postgresql.org/message-id/flat/92d6f545-5102-65d8-3c87-489f71ea0a37%40enterprisedb.com

Thanks,
Tatsuro Yamada




Re: simplifying foreign key/RI checks

From
Amit Langote
Date:
Yamada-san,

On Wed, Jan 27, 2021 at 8:51 AM Tatsuro Yamada
<tatsuro.yamada.tf@nttcom.co.jp> wrote:
> On 2021/01/25 18:19, Amit Langote wrote:
> > On Mon, Jan 25, 2021 at 9:24 AM Corey Huinker <corey.huinker@gmail.com> wrote:
> >> Anybody else want to look this patch over before I mark it Ready For Committer?
> >
> > Would be nice to have others look it over.  Thanks.
>
> Thanks for creating the patch!
>
> I tried to review the patch. Here is my comment.

Thanks for the comment.

> * According to this thread [1], it might be better to replace elog() with
>    ereport() in the patch.
>
> [1]: https://www.postgresql.org/message-id/flat/92d6f545-5102-65d8-3c87-489f71ea0a37%40enterprisedb.com

Could you please tell which elog() of the following added by the patch
you are concerned about?

+           case TM_Invisible:
+               elog(ERROR, "attempted to lock invisible tuple");
+               break;
+
+           case TM_SelfModified:
+           case TM_BeingModified:
+           case TM_WouldBlock:
+               elog(ERROR, "unexpected table_tuple_lock status: %u", res);
+               break;

+           default:
+               elog(ERROR, "unrecognized table_tuple_lock status: %u", res);

All of these are meant as debugging elog()s for cases that won't
normally occur.  IIUC, the discussion at the linked thread excludes
those from consideration.


--
Amit Langote
EDB: http://www.enterprisedb.com



Re: simplifying foreign key/RI checks

From
Tatsuro Yamada
Date:
Hi Amit-san,

> +           case TM_Invisible:
> +               elog(ERROR, "attempted to lock invisible tuple");
> +               break;
> +
> +           case TM_SelfModified:
> +           case TM_BeingModified:
> +           case TM_WouldBlock:
> +               elog(ERROR, "unexpected table_tuple_lock status: %u", res);
> +               break;
> 
> +           default:
> +               elog(ERROR, "unrecognized table_tuple_lock status: %u", res);
> 
> All of these are meant as debugging elog()s for cases that won't
> normally occur.  IIUC, the discussion at the linked thread excludes
> those from consideration.

Thanks for your explanation.
Ah, I reread the thread, and I now realized that user visible log messages
are the target to replace. I understood that that elog() for the cases won't
normally occur. Sorry for the noise.

Regards,
Tatsuro Yamada




Re: simplifying foreign key/RI checks

From
Keisuke Kuroda
Date:
Hi Amit-san,

Thanks for the answer!

> If you only tested insert/update on the referencing table, I would've
> expected to see nothing in the result of that query, because the patch
> eliminates all use of SPI in that case.  I suspect the CachedPlan*
> memory contexts you are seeing belong to some early activity in the
> session.  So if you try the insert/update in a freshly started
> session, you would see 0 rows in the result of that query.

That's right.
CREATE PARTITION TABLE included in the test script(rep.sql) was using SPI.
In a new session, I confirmed that CachedPlan is not generated when only
execute INSERT.

# only execute INSERT

postgres=# INSERT INTO ps SELECT generate_series(1,4999);
INSERT 0 4999
postgres=#
postgres=# INSERT INTO pr SELECT i, i from generate_series(1,4999)i;
INSERT 0 4999

postgres=# SELECT name, sum(used_bytes) as bytes,
pg_size_pretty(sum(used_bytes)) FROM pg_backend_memory_contexts
WHERE name LIKE 'Cached%' GROUP BY name;

 name | bytes | pg_size_pretty
------+-------+----------------
(0 rows) ★ No CachedPlan

> Hmm, the patch tries to solve a general problem that SPI plans are not
> being shared among partitions whereas they should be.   So I don't
> think that it's necessarily specific to DELETE.  Until we have a
> solution like the patch on this thread for DELETE, it seems fine to
> consider the other patch as a stopgap solution.

I see.
So this is a solution to the problem of using SPI plans in partitions,
not just DELETE.
I agree with you, I think this is a solution to the current problem.

Best Regards,



--
Keisuke Kuroda
NTT Software Innovation Center
keisuke.kuroda.3862@gmail.com



Re: simplifying foreign key/RI checks

From
Kyotaro Horiguchi
Date:
At Sun, 24 Jan 2021 20:51:39 +0900, Amit Langote <amitlangote09@gmail.com> wrote in 
> Here's v5.

At Mon, 25 Jan 2021 18:19:56 +0900, Amit Langote <amitlangote09@gmail.com> wrote in 
> > Anybody else want to look this patch over before I mark it Ready For Committer?
> 
> Would be nice to have others look it over.  Thanks.

This nice improvement.

0001 just looks fine.

0002:

 /* RI query type codes */
-/* these queries are executed against the PK (referenced) table: */
+/*
+ * 1 and 2  are no longer used, because PK (referenced) table is looked up
+ * directly using ri_ReferencedKeyExists().
 #define RI_PLAN_CHECK_LOOKUPPK            1
 #define RI_PLAN_CHECK_LOOKUPPK_FROM_PK    2
 #define RI_PLAN_LAST_ON_PK                RI_PLAN_CHECK_LOOKUPPK_FROM_PK

However, this patch does.

+    if (!ri_ReferencedKeyExists(pk_rel, fk_rel, newslot, riinfo))
+        ri_ReportViolation(riinfo,
+                           pk_rel, fk_rel,
+                           newslot,
+                           NULL,
+                           RI_PLAN_CHECK_LOOKUPPK, false);

It seems to me 1 (RI_PLAN_CHECK_LOOKUPPK) is still alive. (Yeah, I
know that doesn't mean the usefulness of the macro but the mechanism
the macro suggests, but it is confusing.) On the other hand,
RI_PLAN_CHECK_LOOKUPPK_FROM_PK and RI_PLAN_LAST_ON_PK seem to be no
longer used.  (Couldn't we remove them?)

(about the latter, we can rewrite the only use of it "if
(qkey->constr_queryno <= RI_PLAN_LAST_ON_PK)" not to use the macro.)

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center



Re: simplifying foreign key/RI checks

From
Amit Langote
Date:
On Wed, Jan 27, 2021 at 5:32 PM Kyotaro Horiguchi
<horikyota.ntt@gmail.com> wrote:
> At Sun, 24 Jan 2021 20:51:39 +0900, Amit Langote <amitlangote09@gmail.com> wrote in
> > Here's v5.
>
> At Mon, 25 Jan 2021 18:19:56 +0900, Amit Langote <amitlangote09@gmail.com> wrote in
> > > Anybody else want to look this patch over before I mark it Ready For Committer?
> >
> > Would be nice to have others look it over.  Thanks.
>
> This nice improvement.
>
> 0001 just looks fine.
>
> 0002:
>
>  /* RI query type codes */
> -/* these queries are executed against the PK (referenced) table: */
> +/*
> + * 1 and 2  are no longer used, because PK (referenced) table is looked up
> + * directly using ri_ReferencedKeyExists().
>  #define RI_PLAN_CHECK_LOOKUPPK                 1
>  #define RI_PLAN_CHECK_LOOKUPPK_FROM_PK 2
>  #define RI_PLAN_LAST_ON_PK                             RI_PLAN_CHECK_LOOKUPPK_FROM_PK
>
> However, this patch does.
>
> +       if (!ri_ReferencedKeyExists(pk_rel, fk_rel, newslot, riinfo))
> +               ri_ReportViolation(riinfo,
> +                                                  pk_rel, fk_rel,
> +                                                  newslot,
> +                                                  NULL,
> +                                                  RI_PLAN_CHECK_LOOKUPPK, false);
>
> It seems to me 1 (RI_PLAN_CHECK_LOOKUPPK) is still alive. (Yeah, I
> know that doesn't mean the usefulness of the macro but the mechanism
> the macro suggests, but it is confusing.) On the other hand,
> RI_PLAN_CHECK_LOOKUPPK_FROM_PK and RI_PLAN_LAST_ON_PK seem to be no
> longer used.  (Couldn't we remove them?)

Yeah, better to just remove those _PK macros and say this module no
longer runs any queries on the PK table.

How about the attached?

-- 
Amit Langote
EDB: http://www.enterprisedb.com

Attachment

Re: simplifying foreign key/RI checks

From
Amit Langote
Date:
On Mon, Mar 8, 2021 at 11:41 PM Amit Langote <amitlangote09@gmail.com> wrote:
> On Thu, Mar 4, 2021 at 5:15 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:
> >  I guess I'm disturbed by the idea
> > that we'd totally replace the implementation technology for only one
> > variant of foreign key checks.  That means that there'll be a lot
> > of minor details that don't act the same depending on context.  One
> > point I was just reminded of by [1] is that the SPI approach enforces
> > permissions checks on the table access, which I do not see being done
> > anywhere in your patch.  Now, maybe it's fine not to have such checks,
> > on the grounds that the existence of the RI constraint is sufficient
> > permission (the creator had to have REFERENCES permission to make it).
> > But I'm not sure about that.  Should we add SELECT permissions checks
> > to this code path to make it less different?
> >
> > In the same vein, the existing code actually runs the query as the
> > table owner (cf. SetUserIdAndSecContext in ri_PerformCheck), another
> > nicety you haven't bothered with.  Maybe that is invisible for a
> > pure SELECT query but I'm not sure I would bet on it.  At the very
> > least you're betting that the index-related operators you invoke
> > aren't going to care, and that nobody is going to try to use this
> > difference to create a security exploit via a trojan-horse index.
>
> How about we do at the top of ri_ReferencedKeyExists() what
> ri_PerformCheck() always does before executing a query, which is this:
>
>    /* Switch to proper UID to perform check as */
>    GetUserIdAndSecContext(&save_userid, &save_sec_context);
>    SetUserIdAndSecContext(RelationGetForm(query_rel)->relowner,
>                           save_sec_context | SECURITY_LOCAL_USERID_CHANGE |
>                           SECURITY_NOFORCE_RLS);
>
> And then also check the permissions of the switched user on the scan
> target relation's schema (ACL_USAGE) and the relation itself
> (ACL_SELECT).
>
> IOW, this:
>
> +   Oid         save_userid;
> +   int         save_sec_context;
> +   AclResult   aclresult;
> +
> +   /* Switch to proper UID to perform check as */
> +   GetUserIdAndSecContext(&save_userid, &save_sec_context);
> +   SetUserIdAndSecContext(RelationGetForm(pk_rel)->relowner,
> +                          save_sec_context | SECURITY_LOCAL_USERID_CHANGE |
> +                          SECURITY_NOFORCE_RLS);
> +
> +   /* Check namespace permissions. */
> +   aclresult = pg_namespace_aclcheck(RelationGetNamespace(pk_rel),
> +                                     GetUserId(), ACL_USAGE);
> +   if (aclresult != ACLCHECK_OK)
> +       aclcheck_error(aclresult, OBJECT_SCHEMA,
> +                      get_namespace_name(RelationGetNamespace(pk_rel)));
> +   /* Check the user has SELECT permissions on the referenced relation. */
> +   aclresult = pg_class_aclcheck(RelationGetRelid(pk_rel), GetUserId(),
> +                                 ACL_SELECT);
> +   if (aclresult != ACLCHECK_OK)
> +       aclcheck_error(aclresult, OBJECT_TABLE,
> +                      RelationGetRelationName(pk_rel));
>
>     /*
>      * Extract the unique key from the provided slot and choose the equality
> @@ -414,6 +436,9 @@ ri_ReferencedKeyExists(Relation pk_rel, Relation fk_rel,
>     index_endscan(scan);
>     ExecDropSingleTupleTableSlot(outslot);
>
> +   /* Restore UID and security context */
> +   SetUserIdAndSecContext(save_userid, save_sec_context);
> +
>     /* Don't release lock until commit. */
>     index_close(idxrel, NoLock);

I've included these changes in the updated patch.

> > Shall we mention RLS restrictions?  If we don't worry about that,
> > I think REFERENCES privilege becomes a full bypass of RLS, at
> > least for unique-key columns.
>
> Seeing what check_enable_rls() does when running under the security
> context set by ri_PerformCheck(), it indeed seems that RLS is bypassed
> when executing these RI queries.  The following comment in
> check_enable_rls() seems to say so:
>
>          * InNoForceRLSOperation indicates that we should not apply RLS even
>          * if the table has FORCE RLS set - IF the current user is the owner.
>          * This is specifically to ensure that referential integrity checks
>          * are able to still run correctly.

I've added a comment to note that the new way of "selecting" the
referenced tuple effectively bypasses RLS, as is the case when
selecting via SPI.

> > I wonder also what happens if the referenced table isn't a plain
> > heap with a plain btree index.  Maybe you're accessing it at the
> > right level of abstraction so things will just work with some
> > other access methods, but I'm not sure about that.
>
> I believe that I've made ri_ReferencedKeyExists() use the appropriate
> APIs to scan the index, lock the returned table tuple, etc., but do
> you think we might be better served by introducing a new set of APIs
> for this use case?

I concur that by using the interfaces defined in genam.h and
tableam.h, patch accounts for cases involving other access methods.

That said, I had overlooked one bit in the new code that is specific
to btree AM, which is the hard-coding of BTEqualStrategyNumber in the
following:

        /* Initialize the scankey. */
        ScanKeyInit(&skey[i],
                    pkattno,
                    BTEqualStrategyNumber,
                    regop,
                    pk_vals[i]);

In the updated patch, I've added code to look up the index-specific
strategy number to pass here.

> > Lastly, ri_PerformCheck is pretty careful about not only which
> > snapshot it uses, but which *pair* of snapshots it uses, because
> > sometimes it needs to worry about data changes since the start
> > of the transaction.  You've ignored all of that complexity AFAICS.
> > That's okay (I think) for RI_FKey_check which was passing
> > detectNewRows = false, but for sure it's not okay for
> > ri_Check_Pk_Match.  (I kind of thought we had isolation tests
> > that would catch that, but apparently not.)
>
> Okay, let me closely check the ri_Check_Pk_Match() case and see if
> there's any live bug.

As mentioned in my earlier reply, there doesn't seem to be a need for
ri_Check_Pk_Match() to set the crosscheck snapshot as it is basically
unused.

> > So, this is a cute idea, and the speedup is pretty impressive,
> > but I don't think it's anywhere near committable.  I also wonder
> > whether we really want ri_triggers.c having its own copy of
> > low-level stuff like the tuple-locking code you copied.  Seems
> > like a likely maintenance hazard, so maybe some more refactoring
> > is needed.
>
> Okay, I will see if there's a way to avoid copying too much code.

I thought sharing the tuple-locking code with ExecLockRows(), which
seemed closest in semantics to what the new code is doing, might not
be such a bad idea, but not sure I came up with a great interface for
the shared function.  Actually, there are other places having their
own copies of tuple-locking logic, but they deal with the locking
result in their own unique ways, so I didn't get excited about finding
a way to make the new function accommodate their needs.  I also admit
that I may have totally misunderstood what refactoring you were
referring to in your comment.

Updated patches attached.  Sorry about the delay.

--
Amit Langote
EDB: http://www.enterprisedb.com

Attachment

Re: simplifying foreign key/RI checks

From
Amit Langote
Date:
On Sat, Mar 20, 2021 at 10:21 PM Amit Langote <amitlangote09@gmail.com> wrote:
> Updated patches attached.  Sorry about the delay.

Rebased over the recent DETACH PARTITION CONCURRENTLY work.
Apparently, ri_ReferencedKeyExists() was using the wrong snapshot.

-- 
Amit Langote
EDB: http://www.enterprisedb.com

Attachment

Re: simplifying foreign key/RI checks

From
Zhihong Yu
Date:
Hi,

+       skip = !ExecLockTableTuple(erm->relation, &tid, markSlot,
+                                  estate->es_snapshot, estate->es_output_cid,
+                                  lockmode, erm->waitPolicy, &epq_needed);
+       if (skip)

It seems the variable skip is only used above. The variable is not needed - if statement can directly check the return value.

+ *         Locks tuple with given TID with given lockmode following given wait

given appears three times in the above sentence. Maybe the following is bit easier to read:

Locks tuple with the specified TID, lockmode following given wait policy

+ * Checks whether a tuple containing the same unique key as extracted from the
+ * tuple provided in 'slot' exists in 'pk_rel'.

I think 'same' is not needed here since the remaining part of the sentence has adequately identified the key.

+       if (leaf_pk_rel == NULL)
+           goto done;

It would be better to avoid goto by including the cleanup statements in the if block and return.

+   if (index_getnext_slot(scan, ForwardScanDirection, outslot))
+       found = true;
+
+   /* Found tuple, try to lock it in key share mode. */
+   if (found)

Since found is only assigned in one place, the two if statements can be combined into one.

Cheers

On Fri, Apr 2, 2021 at 5:46 AM Amit Langote <amitlangote09@gmail.com> wrote:
On Sat, Mar 20, 2021 at 10:21 PM Amit Langote <amitlangote09@gmail.com> wrote:
> Updated patches attached.  Sorry about the delay.

Rebased over the recent DETACH PARTITION CONCURRENTLY work.
Apparently, ri_ReferencedKeyExists() was using the wrong snapshot.

--
Amit Langote
EDB: http://www.enterprisedb.com

Re: simplifying foreign key/RI checks

From
Alvaro Herrera
Date:
On 2021-Apr-02, Amit Langote wrote:

> On Sat, Mar 20, 2021 at 10:21 PM Amit Langote <amitlangote09@gmail.com> wrote:
> > Updated patches attached.  Sorry about the delay.
> 
> Rebased over the recent DETACH PARTITION CONCURRENTLY work.
> Apparently, ri_ReferencedKeyExists() was using the wrong snapshot.

Hmm, I wonder if that stuff should be using a PartitionDirectory?  (I
didn't actually understand what your code is doing, so please forgive if
this is a silly question.)

-- 
Álvaro Herrera                            39°49'30"S 73°17'W
"After a quick R of TFM, all I can say is HOLY CR** THAT IS COOL! PostgreSQL was
amazing when I first started using it at 7.2, and I'm continually astounded by
learning new features and techniques made available by the continuing work of
the development team."
Berend Tober, http://archives.postgresql.org/pgsql-hackers/2007-08/msg01009.php



Re: simplifying foreign key/RI checks

From
Amit Langote
Date:
Hi Alvaro,

On Sat, Apr 3, 2021 at 12:01 AM Alvaro Herrera <alvherre@alvh.no-ip.org> wrote:
> On 2021-Apr-02, Amit Langote wrote:
>
> > On Sat, Mar 20, 2021 at 10:21 PM Amit Langote <amitlangote09@gmail.com> wrote:
> > > Updated patches attached.  Sorry about the delay.
> >
> > Rebased over the recent DETACH PARTITION CONCURRENTLY work.
> > Apparently, ri_ReferencedKeyExists() was using the wrong snapshot.
>
> Hmm, I wonder if that stuff should be using a PartitionDirectory?  (I
> didn't actually understand what your code is doing, so please forgive if
> this is a silly question.)

No problem, I wondered about that too when rebasing.

My instinct *was* that maybe there's no need for it, because
find_leaf_pk_rel()'s use of a PartitionDesc is pretty limited in
duration and scope of the kind of things it calls that there's no need
to worry about it getting invalidated while in use.  But I may be
wrong about that, because get_partition_for_tuple() can call arbitrary
user-defined functions, which may result in invalidation messages
being processed and an unguarded PartitionDesc getting wiped out under
us.

So, I've added PartitionDirectory protection in find_leaf_pk_rel() in
the attached updated version.

--
Amit Langote
EDB: http://www.enterprisedb.com

Attachment

Re: simplifying foreign key/RI checks

From
Amit Langote
Date:
On Fri, Apr 2, 2021 at 11:55 PM Zhihong Yu <zyu@yugabyte.com> wrote:
>
> Hi,
>
> +       skip = !ExecLockTableTuple(erm->relation, &tid, markSlot,
> +                                  estate->es_snapshot, estate->es_output_cid,
> +                                  lockmode, erm->waitPolicy, &epq_needed);
> +       if (skip)
>
> It seems the variable skip is only used above. The variable is not needed - if statement can directly check the
returnvalue.
 
>
> + *         Locks tuple with given TID with given lockmode following given wait
>
> given appears three times in the above sentence. Maybe the following is bit easier to read:
>
> Locks tuple with the specified TID, lockmode following given wait policy
>
> + * Checks whether a tuple containing the same unique key as extracted from the
> + * tuple provided in 'slot' exists in 'pk_rel'.
>
> I think 'same' is not needed here since the remaining part of the sentence has adequately identified the key.
>
> +       if (leaf_pk_rel == NULL)
> +           goto done;
>
> It would be better to avoid goto by including the cleanup statements in the if block and return.
>
> +   if (index_getnext_slot(scan, ForwardScanDirection, outslot))
> +       found = true;
> +
> +   /* Found tuple, try to lock it in key share mode. */
> +   if (found)
>
> Since found is only assigned in one place, the two if statements can be combined into one.

Thanks for taking a look.  I agree with most of your suggestions and
have incorporated them in the v8 just posted.

-- 
Amit Langote
EDB: http://www.enterprisedb.com



Re: simplifying foreign key/RI checks

From
vignesh C
Date:
On Sun, Apr 4, 2021 at 1:51 PM Amit Langote <amitlangote09@gmail.com> wrote:
>
> On Fri, Apr 2, 2021 at 11:55 PM Zhihong Yu <zyu@yugabyte.com> wrote:
> >
> > Hi,
> >
> > +       skip = !ExecLockTableTuple(erm->relation, &tid, markSlot,
> > +                                  estate->es_snapshot, estate->es_output_cid,
> > +                                  lockmode, erm->waitPolicy, &epq_needed);
> > +       if (skip)
> >
> > It seems the variable skip is only used above. The variable is not needed - if statement can directly check the
returnvalue.
 
> >
> > + *         Locks tuple with given TID with given lockmode following given wait
> >
> > given appears three times in the above sentence. Maybe the following is bit easier to read:
> >
> > Locks tuple with the specified TID, lockmode following given wait policy
> >
> > + * Checks whether a tuple containing the same unique key as extracted from the
> > + * tuple provided in 'slot' exists in 'pk_rel'.
> >
> > I think 'same' is not needed here since the remaining part of the sentence has adequately identified the key.
> >
> > +       if (leaf_pk_rel == NULL)
> > +           goto done;
> >
> > It would be better to avoid goto by including the cleanup statements in the if block and return.
> >
> > +   if (index_getnext_slot(scan, ForwardScanDirection, outslot))
> > +       found = true;
> > +
> > +   /* Found tuple, try to lock it in key share mode. */
> > +   if (found)
> >
> > Since found is only assigned in one place, the two if statements can be combined into one.
>
> Thanks for taking a look.  I agree with most of your suggestions and
> have incorporated them in the v8 just posted.

The 2nd patch does not apply on Head, please post a rebased version:
error: patch failed: src/backend/utils/adt/ri_triggers.c:337
error: src/backend/utils/adt/ri_triggers.c: patch does not apply

Regards,
Vignesh



Re: simplifying foreign key/RI checks

From
Amit Langote
Date:
On Tue, Jul 6, 2021 at 1:56 AM vignesh C <vignesh21@gmail.com> wrote:
> The 2nd patch does not apply on Head, please post a rebased version:
> error: patch failed: src/backend/utils/adt/ri_triggers.c:337
> error: src/backend/utils/adt/ri_triggers.c: patch does not apply

Thanks for the heads up.

Rebased patches attached.

-- 
Amit Langote
EDB: http://www.enterprisedb.com

Attachment

Re: simplifying foreign key/RI checks

From
Corey Huinker
Date:
Rebased patches attached.

I'm reviewing the changes since v6, which was my last review.

Making ExecLockTableTuple() it's own function makes sense.
Snapshots are now accounted for.
The changes that account for n-level partitioning makes sense as well.

Passes make check-world.
Not user facing, so no user documentation required.
Marking as ready for committer again.


Re: simplifying foreign key/RI checks

From
Tom Lane
Date:
Amit Langote <amitlangote09@gmail.com> writes:
> Rebased patches attached.

I've spent some more time digging into the snapshot-management angle.
I think you are right that the crosscheck_snapshot isn't really an
issue because the executor pays no attention to it for SELECT, but
that doesn't mean that there's no problem, because the test_snapshot
behavior is different too.  By my reading of it, the intention of the
existing code is to insist that when IsolationUsesXactSnapshot()
is true and we *weren't* saying detectNewRows, the query should be
restricted to only see rows visible to the transaction snapshot.
Which I think is proper: an RR transaction shouldn't be allowed to
insert referencing rows that depend on a referenced row it can't see.
On the other hand, it's reasonable for ri_Check_Pk_Match to use
detectNewRows=true, because in that case what we're doing is allowing
an RR transaction to depend on the continued existence of a PK value
that was deleted and replaced since the start of its transaction.

It appears to me that commit 71f4c8c6f (DETACH PARTITION CONCURRENTLY)
broke the semantics here, because now things work differently with a
partitioned PK table than with a plain table, thanks to not bothering
to distinguish questions of how to handle partition detachment from
questions of visibility of individual data tuples.  We evidently
haven't got test coverage for this :-(, which is perhaps not so
surprising because all this behavior long predates the isolationtester
infrastructure that would've allowed us to test it mechanically.

Anyway, I think that (1) we should write some more test cases around
this behavior, (2) you need to establish the snapshot to use in two
different ways for the RI_FKey_check and ri_Check_Pk_Match cases,
and (3) something's going to have to be done to repair the behavior
in v14 (unless we want to back-patch this into v14, which seems a
bit scary).

It looks like you've addressed the other complaints I raised back in
March, so that's forward progress anyway.  I do still find myself a
bit dissatisfied with the code factorization, because it seems like
find_leaf_pk_rel() doesn't belong here but rather in some partitioning
module.  OTOH, if that means exposing RI_ConstraintInfo to the world,
that wouldn't be nice either.

            regards, tom lane



Re: simplifying foreign key/RI checks

From
Tom Lane
Date:
Alvaro Herrera <alvherre@2ndquadrant.com> writes:
> I think we (I) should definitely pursue fixing whatever was broken by
> DETACH CONCURRENTLY, back to pg14, independently of this patch ...  but
> I would appreciate some insight into what the problem is.

Here's what I'm on about:

regression=# create table pk (f1 int primary key);
CREATE TABLE
regression=# insert into pk values(1);
INSERT 0 1
regression=# create table fk (f1 int references pk);
CREATE TABLE
regression=# begin isolation level repeatable read ;
BEGIN
regression=*# select * from pk;  -- to establish xact snapshot
 f1
----
  1
(1 row)

now, in another session, do:

regression=# insert into pk values(2);
INSERT 0 1

back at the RR transaction, we can't see that:

regression=*# select * from pk;  -- still no row 2
 f1
----
  1
(1 row)

so we get:

regression=*# insert into fk values(1);
INSERT 0 1
regression=*# insert into fk values(2);
ERROR:  insert or update on table "fk" violates foreign key constraint "fk_f1_fkey"
DETAIL:  Key (f1)=(2) is not present in table "pk".

IMO that behavior is correct.  If you use READ COMMITTED, then
SELECT can see row 2 as soon as it's committed, and so can the
FK check, and again that's correct.

In v13, the behavior is the same if "pk" is a partitioned table instead
of a plain one.  In HEAD, it's not:

regression=# drop table pk, fk;
DROP TABLE
regression=# create table pk (f1 int primary key) partition by list(f1);
CREATE TABLE
regression=# create table pk1 partition of pk for values in (1,2);
CREATE TABLE
regression=# insert into pk values(1);
INSERT 0 1
regression=# create table fk (f1 int references pk);
CREATE TABLE
regression=# begin isolation level repeatable read ;
BEGIN
regression=*# select * from pk;  -- to establish xact snapshot
 f1
----
  1
(1 row)

--- now insert row 2 in another session

regression=*# select * from pk;  -- still no row 2
 f1
----
  1
(1 row)

regression=*# insert into fk values(1);
INSERT 0 1
regression=*# insert into fk values(2);
INSERT 0 1
regression=*#

So I say that's busted, and the cause is this hunk from 71f4c8c6f:

@@ -392,11 +392,15 @@ RI_FKey_check(TriggerData *trigdata)

     /*
      * Now check that foreign key exists in PK table
+     *
+     * XXX detectNewRows must be true when a partitioned table is on the
+     * referenced side.  The reason is that our snapshot must be fresh
+     * in order for the hack in find_inheritance_children() to work.
      */
     ri_PerformCheck(riinfo, &qkey, qplan,
                     fk_rel, pk_rel,
                     NULL, newslot,
-                    false,
+                    pk_rel->rd_rel->relkind == RELKIND_PARTITIONED_TABLE,
                     SPI_OK_SELECT);

     if (SPI_finish() != SPI_OK_FINISH)

I think you need some signalling mechanism that's less global than
ActiveSnapshot to tell the partition-lookup machinery what to do
in this context.

            regards, tom lane



Re: simplifying foreign key/RI checks

From
Tom Lane
Date:
I wrote:
> Anyway, I think that (1) we should write some more test cases around
> this behavior, (2) you need to establish the snapshot to use in two
> different ways for the RI_FKey_check and ri_Check_Pk_Match cases,
> and (3) something's going to have to be done to repair the behavior
> in v14 (unless we want to back-patch this into v14, which seems a
> bit scary).

I wrote that thinking that point (2), ie fix the choice of snapshots for
these RI queries, would solve the brokenness in partitioned tables,
so that (3) would potentially only require hacking up v14.

However after thinking more I realize that (2) will break the desired
behavior for concurrent partition detaches, because that's being driven
off ActiveSnapshot.  So we really need a solution that decouples the
partition detachment logic from ActiveSnapshot, in both branches.

            regards, tom lane



Re: simplifying foreign key/RI checks

From
Amit Langote
Date:
On Fri, Nov 12, 2021 at 8:19 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Amit Langote <amitlangote09@gmail.com> writes:
> > Rebased patches attached.
>
> I've spent some more time digging into the snapshot-management angle.

Thanks for looking at this.

> I think you are right that the crosscheck_snapshot isn't really an
> issue because the executor pays no attention to it for SELECT, but
> that doesn't mean that there's no problem, because the test_snapshot
> behavior is different too.  By my reading of it, the intention of the
> existing code is to insist that when IsolationUsesXactSnapshot()
> is true and we *weren't* saying detectNewRows, the query should be
> restricted to only see rows visible to the transaction snapshot.
> Which I think is proper: an RR transaction shouldn't be allowed to
> insert referencing rows that depend on a referenced row it can't see.
> On the other hand, it's reasonable for ri_Check_Pk_Match to use
> detectNewRows=true, because in that case what we're doing is allowing
> an RR transaction to depend on the continued existence of a PK value
> that was deleted and replaced since the start of its transaction.
>
> It appears to me that commit 71f4c8c6f (DETACH PARTITION CONCURRENTLY)
> broke the semantics here, because now things work differently with a
> partitioned PK table than with a plain table, thanks to not bothering
> to distinguish questions of how to handle partition detachment from
> questions of visibility of individual data tuples.  We evidently
> haven't got test coverage for this :-(, which is perhaps not so
> surprising because all this behavior long predates the isolationtester
> infrastructure that would've allowed us to test it mechanically.
>
> Anyway, I think that (1) we should write some more test cases around
> this behavior, (2) you need to establish the snapshot to use in two
> different ways for the RI_FKey_check and ri_Check_Pk_Match cases,
> and (3) something's going to have to be done to repair the behavior
> in v14 (unless we want to back-patch this into v14, which seems a
> bit scary).

Okay, I'll look into getting 1 and 2 done for this patch and I guess
work with Alvaro on 3.

> It looks like you've addressed the other complaints I raised back in
> March, so that's forward progress anyway.  I do still find myself a
> bit dissatisfied with the code factorization, because it seems like
> find_leaf_pk_rel() doesn't belong here but rather in some partitioning
> module.  OTOH, if that means exposing RI_ConstraintInfo to the world,
> that wouldn't be nice either.

Hm yeah, fair point about the undesirability of putting partitioning
details into ri_triggers.c, so will look into refactoring to avoid
that.

-- 
Amit Langote
EDB: http://www.enterprisedb.com



Re: simplifying foreign key/RI checks

From
Amit Langote
Date:
On Fri, Nov 12, 2021 at 10:58 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:
> I wrote:
> > Anyway, I think that (1) we should write some more test cases around
> > this behavior, (2) you need to establish the snapshot to use in two
> > different ways for the RI_FKey_check and ri_Check_Pk_Match cases,
> > and (3) something's going to have to be done to repair the behavior
> > in v14 (unless we want to back-patch this into v14, which seems a
> > bit scary).
>
> I wrote that thinking that point (2), ie fix the choice of snapshots for
> these RI queries, would solve the brokenness in partitioned tables,
> so that (3) would potentially only require hacking up v14.
>
> However after thinking more I realize that (2) will break the desired
> behavior for concurrent partition detaches, because that's being driven
> off ActiveSnapshot.  So we really need a solution that decouples the
> partition detachment logic from ActiveSnapshot, in both branches.

ISTM that the latest snapshot would still have to be passed to the
find_inheritance_children_extended() *somehow* by ri_trigger.c.  IIUC
the problem with using the ActiveSnapshot mechanism to do that is that
it causes the SPI query to see even user table rows that it shouldn't
be able to, so that is why you say it is too global a mechanism for
this hack.

Whatever mechanism we will use would still need to involve setting a
global Snapshot variable though, right?

-- 
Amit Langote
EDB: http://www.enterprisedb.com



Re: simplifying foreign key/RI checks

From
Tom Lane
Date:
Amit Langote <amitlangote09@gmail.com> writes:
> Whatever mechanism we will use would still need to involve setting a
> global Snapshot variable though, right?

In v14 we'll certainly still be passing the snapshot(s) to SPI, which will
eventually make the snapshot active.  With your patch, since we're just
handing the snapshot to the scan mechanism, it seems at least
theoretically possible that we'd not have to do PushActiveSnapshot on it.
Not doing so might be a bad idea however; if there is any user-defined
code getting called, it might have expectations about ActiveSnapshot being
relevant.  On the whole I'd be inclined to say that we still want the
RI test_snapshot to be the ActiveSnapshot while performing the test. 

            regards, tom lane



Re: simplifying foreign key/RI checks

From
Tom Lane
Date:
Amit Langote <amitlangote09@gmail.com> writes:
> On Fri, Nov 12, 2021 at 8:19 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> Anyway, I think that (1) we should write some more test cases around
>> this behavior, (2) you need to establish the snapshot to use in two
>> different ways for the RI_FKey_check and ri_Check_Pk_Match cases,
>> and (3) something's going to have to be done to repair the behavior
>> in v14 (unless we want to back-patch this into v14, which seems a
>> bit scary).

> Okay, I'll look into getting 1 and 2 done for this patch and I guess
> work with Alvaro on 3.

Actually, it seems that DETACH PARTITION is broken for concurrent
serializable/repeatable-read transactions quite independently of
whether they attempt to make any FK checks [1].  If we do what
I speculated about there, namely wait out all such xacts before
detaching, it might be possible to fix (3) just by reverting the
problematic change in ri_triggers.c.  I'm thinking the wait would
render it unnecessary to get FK checks to do anything weird about
partition lookup.  But I might well be missing something.

            regards, tom lane

[1] https://www.postgresql.org/message-id/1849918.1636748862%40sss.pgh.pa.us



Re: simplifying foreign key/RI checks

From
Amit Langote
Date:
On Sat, Nov 13, 2021 at 5:43 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Amit Langote <amitlangote09@gmail.com> writes:
> > On Fri, Nov 12, 2021 at 8:19 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:
> >> Anyway, I think that (1) we should write some more test cases around
> >> this behavior, (2) you need to establish the snapshot to use in two
> >> different ways for the RI_FKey_check and ri_Check_Pk_Match cases,
> >> and (3) something's going to have to be done to repair the behavior
> >> in v14 (unless we want to back-patch this into v14, which seems a
> >> bit scary).
>
> > Okay, I'll look into getting 1 and 2 done for this patch and I guess
> > work with Alvaro on 3.
>
> Actually, it seems that DETACH PARTITION is broken for concurrent
> serializable/repeatable-read transactions quite independently of
> whether they attempt to make any FK checks [1].  If we do what
> I speculated about there, namely wait out all such xacts before
> detaching, it might be possible to fix (3) just by reverting the
> problematic change in ri_triggers.c.  I'm thinking the wait would
> render it unnecessary to get FK checks to do anything weird about
> partition lookup.  But I might well be missing something.

I wasn't able to make much inroads into how we might be able to get
rid of the DETACH-related partition descriptor hacks, the item (3),
though I made some progress on items (1) and (2).

For (1), the attached 0001 patch adds a new isolation suite
fk-snapshot.spec to exercise snapshot behaviors in the cases where we
no longer go through SPI.  It helped find some problems with the
snapshot handling in the earlier versions of the patch, mainly with
partitioned PK tables.  It also contains a test along the lines of the
example you showed upthread, which shows that the partition descriptor
hack requiring ActiveSnapshot to be set results in wrong results.
Patch includes the buggy output for that test case and marked as such
in a comment above the test.

In updated 0002, I fixed things such that the snapshot-setting
required by the partition descriptor hack is independent of
snapshot-setting of the RI query such that it no longer causes the PK
index scan to return rows that the RI query mustn't see.  That fixes
the visibility bug illustrated in your example, and as mentioned, also
exercised in the new test suite.

I also moved find_leaf_pk_rel() into execPartition.c with a new name
and a new set of parameters.

--
Amit Langote
EDB: http://www.enterprisedb.com

Attachment

Re: simplifying foreign key/RI checks

From
Corey Huinker
Date:


I wasn't able to make much inroads into how we might be able to get
rid of the DETACH-related partition descriptor hacks, the item (3),
though I made some progress on items (1) and (2).

For (1), the attached 0001 patch adds a new isolation suite
fk-snapshot.spec to exercise snapshot behaviors in the cases where we
no longer go through SPI.  It helped find some problems with the
snapshot handling in the earlier versions of the patch, mainly with
partitioned PK tables.  It also contains a test along the lines of the
example you showed upthread, which shows that the partition descriptor
hack requiring ActiveSnapshot to be set results in wrong results.
Patch includes the buggy output for that test case and marked as such
in a comment above the test.

In updated 0002, I fixed things such that the snapshot-setting
required by the partition descriptor hack is independent of
snapshot-setting of the RI query such that it no longer causes the PK
index scan to return rows that the RI query mustn't see.  That fixes
the visibility bug illustrated in your example, and as mentioned, also
exercised in the new test suite.

I also moved find_leaf_pk_rel() into execPartition.c with a new name
and a new set of parameters.

--
Amit Langote
EDB: http://www.enterprisedb.com

Sorry for the delay. This patch no longer applies, it has some conflict with d6f96ed94e73052f99a2e545ed17a8b2fdc1fb8a

Re: simplifying foreign key/RI checks

From
Amit Langote
Date:
On Mon, Dec 20, 2021 at 2:00 PM Corey Huinker <corey.huinker@gmail.com> wrote:
> Sorry for the delay. This patch no longer applies, it has some conflict with
d6f96ed94e73052f99a2e545ed17a8b2fdc1fb8a

Thanks Corey for the heads up.  Rebased with some cosmetic adjustments.


--
Amit Langote
EDB: http://www.enterprisedb.com

Attachment

Re: simplifying foreign key/RI checks

From
Zhihong Yu
Date:


On Sun, Dec 19, 2021 at 10:20 PM Amit Langote <amitlangote09@gmail.com> wrote:
On Mon, Dec 20, 2021 at 2:00 PM Corey Huinker <corey.huinker@gmail.com> wrote:
> Sorry for the delay. This patch no longer applies, it has some conflict with d6f96ed94e73052f99a2e545ed17a8b2fdc1fb8a

Thanks Corey for the heads up.  Rebased with some cosmetic adjustments.

Hi, 

+       Assert(partidx < 0 || partidx < partdesc->nparts);
+       partoid = partdesc->oids[partidx];

If partidx < 0, do we still need to fill out partoid and is_leaf ? It seems we can return early based on (should call table_close(rel) first):

+       /* No partition found. */
+       if (partidx < 0)
+           return NULL;

Cheers

Re: simplifying foreign key/RI checks

From
Amit Langote
Date:
On Mon, Dec 20, 2021 at 6:19 PM Zhihong Yu <zyu@yugabyte.com> wrote:
> On Sun, Dec 19, 2021 at 10:20 PM Amit Langote <amitlangote09@gmail.com> wrote:
>>
>> On Mon, Dec 20, 2021 at 2:00 PM Corey Huinker <corey.huinker@gmail.com> wrote:
>> > Sorry for the delay. This patch no longer applies, it has some conflict with
d6f96ed94e73052f99a2e545ed17a8b2fdc1fb8a
>>
>> Thanks Corey for the heads up.  Rebased with some cosmetic adjustments.
>>
> Hi,
>
> +       Assert(partidx < 0 || partidx < partdesc->nparts);
> +       partoid = partdesc->oids[partidx];
>
> If partidx < 0, do we still need to fill out partoid and is_leaf ? It seems we can return early based on (should call
table_close(rel)first):
 
>
> +       /* No partition found. */
> +       if (partidx < 0)
> +           return NULL;

Good catch, thanks.  Patch updated.

-- 
Amit Langote
EDB: http://www.enterprisedb.com

Attachment

Re: simplifying foreign key/RI checks

From
Corey Huinker
Date:


Good catch, thanks.  Patch updated.



Applies clean. Passes check-world.
 

Re: simplifying foreign key/RI checks

From
Amit Langote
Date:
Thanks for the review.

On Tue, Dec 21, 2021 at 5:54 PM Zhihong Yu <zyu@yugabyte.com> wrote:
> Hi,
>
> +   int         lockflags = 0;
> +   TM_Result   test;
> +
> +   lockflags = TUPLE_LOCK_FLAG_LOCK_UPDATE_IN_PROGRESS;
>
> The above assignment can be meged with the line where variable lockflags is declared.

Sure.

> +   GetUserIdAndSecContext(&save_userid, &save_sec_context);
>
> save_userid -> saved_userid
> save_sec_context -> saved_sec_context

I agree that's better though I guess I had kept the names as they were
in other functions.

Fixed nevertheless.

> +        * the transaction-snapshot mode.  If we didn't push one already, do
>
> didn't push -> haven't pushed

Done.

> For ri_PerformCheck():
>
> +   bool        source_is_pk = true;
>
> It seems the value of source_is_pk doesn't change - the value true can be plugged into ri_ExtractValues() calls
directly.

OK, done.

v13 is attached.

-- 
Amit Langote
EDB: http://www.enterprisedb.com

Attachment

Re: simplifying foreign key/RI checks

From
Amit Langote
Date:
On Tue, Jan 18, 2022 at 3:30 PM Amit Langote <amitlangote09@gmail.com> wrote:
> v13 is attached.

I noticed that the recent 641f3dffcdf's changes to
get_constraint_index() made it basically unusable for this patch's
purposes.

Reading in the thread that led to 641f3dffcdf why
get_constraint_index() was changed the way it was, I invented in the
attached updated patch a get_fkey_constraint_index() that is local to
ri_triggers.c for use by the new ri_ReferencedKeyExists(), replacing
get_constraint_index() that no longer gives it the index it's looking
for.

-- 
Amit Langote
EDB: http://www.enterprisedb.com

Attachment

Re: simplifying foreign key/RI checks

From
Zhihong Yu
Date:


On Mon, Mar 14, 2022 at 1:33 AM Amit Langote <amitlangote09@gmail.com> wrote:
On Tue, Jan 18, 2022 at 3:30 PM Amit Langote <amitlangote09@gmail.com> wrote:
> v13 is attached.

I noticed that the recent 641f3dffcdf's changes to
get_constraint_index() made it basically unusable for this patch's
purposes.

Reading in the thread that led to 641f3dffcdf why
get_constraint_index() was changed the way it was, I invented in the
attached updated patch a get_fkey_constraint_index() that is local to
ri_triggers.c for use by the new ri_ReferencedKeyExists(), replacing
get_constraint_index() that no longer gives it the index it's looking
for.

--
Amit Langote
EDB: http://www.enterprisedb.com
Hi,
+                   partkey_isnull[j] = (key_nulls[k] == 'n' ? true : false); 

The above can be shortened as:

  partkey_isnull[j] = key_nulls[k] == 'n';

+        * May neeed to cast each of the individual values of the foreign key

neeed -> need

Cheers

Re: simplifying foreign key/RI checks

From
Amit Langote
Date:
On Mon, Mar 14, 2022 at 6:28 PM Zhihong Yu <zyu@yugabyte.com> wrote:
> On Mon, Mar 14, 2022 at 1:33 AM Amit Langote <amitlangote09@gmail.com> wrote:
>> On Tue, Jan 18, 2022 at 3:30 PM Amit Langote <amitlangote09@gmail.com> wrote:
>> > v13 is attached.
>>
>> I noticed that the recent 641f3dffcdf's changes to
>> get_constraint_index() made it basically unusable for this patch's
>> purposes.
>>
>> Reading in the thread that led to 641f3dffcdf why
>> get_constraint_index() was changed the way it was, I invented in the
>> attached updated patch a get_fkey_constraint_index() that is local to
>> ri_triggers.c for use by the new ri_ReferencedKeyExists(), replacing
>> get_constraint_index() that no longer gives it the index it's looking
>> for.
>>
>
> Hi,
> +                   partkey_isnull[j] = (key_nulls[k] == 'n' ? true : false);
>
> The above can be shortened as:
>
>   partkey_isnull[j] = key_nulls[k] == 'n';
>
> +        * May neeed to cast each of the individual values of the foreign key
>
> neeed -> need

Both fixed, thanks.

-- 
Amit Langote
EDB: http://www.enterprisedb.com

Attachment

Re: simplifying foreign key/RI checks

From
Amit Langote
Date:
There were rebase conflicts with the recently committed
execPartition.c/h changes.  While fixing them, I thought maybe
find_leaf_part_for_key() doesn't quite match in style with its
neighbors in execPartition.h, so changed it to
ExecGetLeafPartitionForKey().

-- 
Amit Langote
EDB: http://www.enterprisedb.com

Attachment

Re: simplifying foreign key/RI checks

From
Amit Langote
Date:
On Thu, Apr 7, 2022 at 10:05 AM Amit Langote <amitlangote09@gmail.com> wrote:
> There were rebase conflicts with the recently committed
> execPartition.c/h changes.  While fixing them, I thought maybe
> find_leaf_part_for_key() doesn't quite match in style with its
> neighbors in execPartition.h, so changed it to
> ExecGetLeafPartitionForKey().

This one has been marked Returned with Feedback in the CF app, which
makes sense given the discussion on -committers [1].

Agree with the feedback given that it would be better to address *all*
RI trigger check/action functions in the project of sidestepping SPI
when doing those checks/actions, not only RI_FKey_check_ins / upd() as
the current patch does.  I guess that will require thinking a little
bit harder about how to modularize the new implementation so that the
various trigger functions don't end up with their own bespoke
check/action implementations.

I'll think about that, also consider what Corey proposed in [2], and
try to reformulate this for v16.

--
Amit Langote
EDB: http://www.enterprisedb.com

[1] https://www.postgresql.org/message-id/flat/E1ncXX2-000mFt-Pe%40gemulon.postgresql.org
[2]
https://www.postgresql.org/message-id/flat/CADkLM%3DeZJddpx6RDop-oCrQ%2BJ9R-wfbf6MoLxUUGjbpwTkoUXQ%40mail.gmail.com



Re: simplifying foreign key/RI checks

From
Amit Langote
Date:
On Mon, Apr 11, 2022 at 4:47 PM Amit Langote <amitlangote09@gmail.com> wrote:
> This one has been marked Returned with Feedback in the CF app, which
> makes sense given the discussion on -committers [1].
>
> Agree with the feedback given that it would be better to address *all*
> RI trigger check/action functions in the project of sidestepping SPI
> when doing those checks/actions, not only RI_FKey_check_ins / upd() as
> the current patch does.  I guess that will require thinking a little
> bit harder about how to modularize the new implementation so that the
> various trigger functions don't end up with their own bespoke
> check/action implementations.
>
> I'll think about that, also consider what Corey proposed in [2], and
> try to reformulate this for v16.

I've been thinking about this and wondering if the SPI overhead is too
big in the other cases (cases where it is the FK table that is to be
scanned) that it makes sense to replace the actual planner (invoked
via SPI) by a hard-coded mini-planner for the task of figuring out the
best way to scan the FK table for a given PK row affected by the main
query.  Planner's involvement seems necessary in those cases, because
the choice of how to scan the FK table is not as clear cut as how to
scan the PK table.

ISTM, the SPI overhead consists mainly of performing GetCachedPlan()
and executor setup/shutdown, which can seem substantial when compared
to the core task of scanning the PK/FK table, and does add up over
many rows affected by the main query, as seen by the over 2x speedup
for the PK table case gained by shaving it off with the proposed patch
[1].  In the other cases, the mini-planner will need some cycles of
its own, even though maybe not as many as by the use of SPI, so the
speedup might be less impressive.

Other than coming up with an acceptable implementation for the
mini-planner (maybe we have an example in plan_cluster_use_sort() to
ape), one more challenge is to figure out a way to implement the
CASCADE/SET trigger routines.  For those, we might need to introduce
restricted forms of ExecUpdate(), ExecDelete() that can be called
directly, that is, without a full-fledged plan.  Not having to worry
about those things does seem like a benefit of just continuing to use
the SPI in those cases.

--
Thanks, Amit Langote
EDB: http://www.enterprisedb.com

[1]
drop table pk, fk;
create table pk (a int primary key);
create table fk (a int references pk);
insert into pk select generate_series(1, 1000000);
insert into fk select i%1000000+1 from generate_series(1, 10000000) i;

Time for the last statement:

HEAD: 67566.845 ms (01:07.567)

Patched: 26759.627 ms (00:26.760)