Thread: RE: Foreign Key locking / deadlock issue.... v2
OK,
Let's try again, with a full script, and including the bit that makes the difference…
Hi,
I’m trying to understand why I’m getting a deadlock issue, and how to work around it.
At base, I think the problem is:
1. Updates to a parent table are creating row level write locks.
2. updates to a child table set the foreign key value to the parent table, which are then blocked – because there is Row Level Security on the parent table.
-- The setup
------------
drop table if exists ELN;
drop table if exists PL;
Create table PL
(pl_id integer,
m_number text
);
alter table PL ENABLE row level security;
alter table PL FORCE row level security;
-- A dummy security policy, that allows everybody access to everything
drop policy if exists security_policy on PL ;
create policy security_policy on PL TO public using (true); -- (1)
Alter table PL add constraint PL_PK primary key (pl_id);
Insert into PL values (1, null);
Insert into PL values (2, null);
Insert into PL values (3, null);
Create table ELN
(event_id integer,
pl_id integer
);
Alter table ELN add constraint ELN_PK primary key (event_id);
Alter table ELN add constraint ELN_PL_FK foreign key (pl_id) references PL (pl_id);
Insert into ELN values (301, null);
Insert into ELN values (302, null);
Insert into ELN values (303, null);
-- Parent table processing – long running process
------------------------------
start transaction;
update pl set m_number = '234' where pl_id = 2; -- (2)
update pl set m_number = '345' where pl_id = 3; -- (3)
-- …. Pause while other processing happens …..
(commit;)
-- Child table processing – occurs often & quickly. Starts after parent update.
----------------------------
start transaction;
update eln set pl_id = 3 where event_id = 303; -- (4)
update eln set pl_id = 2 where event_id = 302; -- (5)
-- blocked by parent Update.
(commit;)
I think that the RLS on the parent table is upgrading the row lock, even if the RLS policy doesn’t do anything.
I can understand that the RLS will normally want access to fields in the row to validate the user request – but it has access to those anyway, for the update.
The RLS documentation says that Foreign Keys bypass RLS, so RLS on the parent shouldn’t make any difference to it.
If the RLS policy at (1) is dropped, or RLS disabled on table PL, the problem goes away.
With RLS, statement (4) is blocked by statement (3).
If the parent updates interleave with child updates, deadlock occurs, as statement (2) is also blocked by (4).
Is the RLS responsible for a lock upgrade?
Can I get around it, without disabling RLS for the update process on PL?
Thanks.
Phil Horder
Database Mechanic
Thales
Land and Air Systems
Horizon House, Throop Road, Templecombe, Somerset, BA8 0DH, UK
Hello Phil, On Wed, 2018-03-21 at 15:26 +0000, HORDER Phil wrote: > OK, > Let's try again, with a full script, and including the bit that makes > the difference… > > Hi, > I’m trying to understand why I’m getting a deadlock issue, and how to > work around it. > > At base, I think the problem is: > 1. Updates to a parent table are creating row level write > locks. > 2. updates to a child table set the foreign key value to > the parent table, which are then blocked – because there is Row Level > Security on the parent table. > > > > -- The setup > ------------ > > drop table if exists ELN; > drop table if exists PL; > > Create table PL > (pl_id integer, > m_number text > ); > > > alter table PL ENABLE row level security; > alter table PL FORCE row level security; > > -- A dummy security policy, that allows everybody access to > everything > drop policy if exists security_policy on PL ; I don't use row level security but after reading the documentation, I'd alter this:- > create policy security_policy on PL TO public using (true); -- (1) to CREATE POLICY security_policy on PL FOR ALL TO PUBLIC USING (true) WITH CHECK (true); and trying again. HTH, Rob
Thanks Rob, The security policy in the example was reduced to the bare minimum, to demonstrate the issue was with the existence of thepolicy, not what it did. Obviously my real-application policy does much more, and uses other columns to do both read & write checks. Adding the write check as suggested makes no difference to the demo - the blocking lock still occurs. Phil Horder Database Mechanic -----Original Message----- From: rob stone [mailto:floriparob@gmail.com] Sent: 22 March 2018 11:11 To: HORDER Phil; Adrian Klaver Cc: pgsql-general Subject: Re: Foreign Key locking / deadlock issue.... v2 Hello Phil, On Wed, 2018-03-21 at 15:26 +0000, HORDER Phil wrote: > OK, > Let's try again, with a full script, and including the bit that makes > the difference… > > Hi, > I’m trying to understand why I’m getting a deadlock issue, and how to > work around it. > > At base, I think the problem is: > 1. Updates to a parent table are creating row level write > locks. > 2. updates to a child table set the foreign key value to > the parent table, which are then blocked – because there is Row Level > Security on the parent table. > > > > -- The setup > ------------ > > drop table if exists ELN; > drop table if exists PL; > > Create table PL > (pl_id integer, > m_number text > ); > > > alter table PL ENABLE row level security; alter table PL FORCE row > level security; > > -- A dummy security policy, that allows everybody access to everything > drop policy if exists security_policy on PL ; I don't use row level security but after reading the documentation, I'd alter this:- > create policy security_policy on PL TO public using (true); -- (1) to CREATE POLICY security_policy on PL FOR ALL TO PUBLIC USING (true) WITH CHECK (true); and trying again. HTH, Rob
Hello Phil, I've run your sample script on 9.6.5 and 10.3. The only thing that I added was a commit; after the initial inserts just to ensure the rows were saved. No errors were reported for either version. The output of \dp after running was:- Access privileges Schema | Name | Type | Access privileges | Column privileges | Policies --------+------+-------+-------------------+-------------------+------- ----------- public | eln | table | | | public | pl | table | | | security_policy:+ | | | | | (u): true --> including the FOR ALL in the create policy statement as well as WITH CHECK(true). Access privileges Schema | Name | Type | Access privileges | Column privileges | Policies --------+------+-------+-------------------+-------------------+------- ----------- public | eln | table | | | public | pl | table | | | security_policy:+ | | | | | (u): true + | | | | | (c): true The only mystery is what happens here:- <snip> -- …. Pause while other processing happens ….. (commit;) -- Child table processing – occurs often & quickly. Starts after parent update. <\snip> I'd like to know more about RLS and trying to de-bug your script. On a production application you'd be testing for errors and raising exceptions so as to inform users that a problem occurred. So, without knowing what occurs during "Pause while other processing happens" I can't help any further. Cheers, Rob
Rob, thanks for looking.
The "pause" is only to not-do-the-commit yet, so that the child process can then try and access the record - I've not left anything out.
This code is my own demo, not a cut from our production code.
Did you run this as the 'postgres' superuser? That would bypass the RLS, and probably avoid the problem.
I checked by creating a new user, and ran my code in that:
Sql> create user test password 'password';
After running my test script, psql \dp shows:
Schema | Name | Type | Access privileges | Column privileges | Policies
--------+-------------------+-------+---------------------------------+-------------------+------------------
public | eln | table | | |
public | pl | table | | | security_policy:+
| | | | | (u): true +
| | | | | (c): true
(plus some other stuff for postGIS)
Here’s my code again:
drop table if exists ELN;
drop table if exists PL;
Create table PL
(pl_id integer,
m_number text
);
alter table PL ENABLE row level security;
alter table PL FORCE row level security;
drop policy if exists security_policy on PL ;
CREATE POLICY security_policy on PL FOR ALL TO PUBLIC USING (true) WITH CHECK (true);
Alter table PL add constraint PL_PK primary key (pl_id);
Insert into PL values (1, null);
Insert into PL values (2, null);
Insert into PL values (3, null);
Create table ELN
(event_id integer,
pl_id integer
);
Alter table ELN add constraint ELN_PK primary key (event_id);
Alter table ELN add constraint ELN_PL_FK foreign key (pl_id) references PL (pl_id);
Insert into ELN values (301, null);
Insert into ELN values (302, null);
Insert into ELN values (303, null);
commit;
-- process 1:
start transaction;
update pl set m_number = '234' where pl_id = 2;
update pl set m_number = '345' where pl_id = 3;
-- wait here, and run process 2
commit;
-- process 2:
start transaction;
update eln set pl_id = 3 where event_id = 303;
update eln set pl_id = 2 where event_id = 302;
-- Gets blocked by process 1
commit;
Phil Horder
Database Mechanic
-----Original Message-----
From: rob stone [mailto:floriparob@gmail.com]
Sent: 23 March 2018 11:43
To: HORDER Phil; pgsql-general
Subject: Re: Foreign Key locking / deadlock issue.... v2
Hello Phil,
I've run your sample script on 9.6.5 and 10.3.
The only thing that I added was a commit; after the initial inserts just to ensure the rows were saved.
No errors were reported for either version.
The output of \dp after running was:-
Access privileges Schema | Name | Type | Access privileges | Column privileges
| Policies
--------+------+-------+-------------------+-------------------+-------
-----------
public | eln | table | | |
public | pl | table | | |
security_policy:+
| | | | | (u):
true
--> including the FOR ALL in the create policy statement as well as
WITH CHECK(true).
Access privileges Schema | Name | Type | Access privileges | Column privileges
| Policies
--------+------+-------+-------------------+-------------------+-------
-----------
public | eln | table | | |
public | pl | table | | |
security_policy:+
| | | | | (u):
true +
| | | | | (c):
true
The only mystery is what happens here:-
<snip>
-- …. Pause while other processing happens …..
(commit;)
-- Child table processing – occurs often & quickly. Starts after parent update.
<\snip>
I'd like to know more about RLS and trying to de-bug your script.
On a production application you'd be testing for errors and raising exceptions so as to inform users that a problem occurred.
So, without knowing what occurs during "Pause while other processing happens" I can't help any further.
Cheers,
Rob
Hello Phil, On Fri, 2018-03-23 at 15:42 +0000, HORDER Phil wrote: > Rob, thanks for looking. > > The "pause" is only to not-do-the-commit yet, so that the child > process can then try and access the record - I've not left anything > out. > This code is my own demo, not a cut from our production code. > > Did you run this as the 'postgres' superuser? That would bypass the > RLS, and probably avoid the problem. No. Never use super user to test scripts. > > I checked by creating a new user, and ran my code in that: > > Sql> create user test password 'password'; > > After running my test script, psql \dp shows: > > Schema | Name | Type | Access privileges > | Column privileges | Policies > --------+-------------------+-------+------------------------------ > ---+-------------------+------------------ > public | eln | table | > | | > public | pl | table | > | | security_policy:+ > | | | > | | (u): true + > | | | > | | (c): true > > (plus some other stuff for postGIS) > > Here’s my code again: > > drop table if exists ELN; > drop table if exists PL; > > Create table PL > (pl_id integer, > m_number text > ); > > alter table PL ENABLE row level security; > alter table PL FORCE row level security; > > drop policy if exists security_policy on PL ; > CREATE POLICY security_policy on PL FOR ALL TO PUBLIC USING (true) > WITH CHECK (true); > > Alter table PL add constraint PL_PK primary key (pl_id); > Insert into PL values (1, null); > Insert into PL values (2, null); > Insert into PL values (3, null); > > Create table ELN > (event_id integer, > pl_id integer > ); > > > Alter table ELN add constraint ELN_PK primary key (event_id); > Alter table ELN add constraint ELN_PL_FK foreign key (pl_id) > references PL (pl_id); > > Insert into ELN values (301, null); > Insert into ELN values (302, null); > Insert into ELN values (303, null); > > commit; > > -- process 1: > start transaction; > update pl set m_number = '234' where pl_id = 2; > update pl set m_number = '345' where pl_id = 3; > > -- wait here, and run process 2 This was never explained. You are running "process 2" in an entirely different session???? If that is true, why don't you commit the updates to table pl and release the locks? The fact that eln.pl_id is a foreign key is irrelevant. If you tried to alter eln.pl_id to a value that doesn't exist in pl.pl_id will cause an exception to occur due to the constraint. Cheers, Rob > commit; > > > -- process 2: > start transaction; > update eln set pl_id = 3 where event_id = 303; > update eln set pl_id = 2 where event_id = 302; > -- Gets blocked by process 1 > commit; > > > Phil Horder > Database Mechanic > > -----Original Message----- > From: rob stone [mailto:floriparob@gmail.com] > Sent: 23 March 2018 11:43 > To: HORDER Phil; pgsql-general > Subject: Re: Foreign Key locking / deadlock issue.... v2 > > Hello Phil, > > I've run your sample script on 9.6.5 and 10.3. > The only thing that I added was a commit; after the initial inserts > just to ensure the rows were saved. > No errors were reported for either version. > > The output of \dp after running was:- > > Access privileges Schema | Name | > Type | Access privileges | Column privileges > | Policies > --------+------+-------+-------------------+-------------------+----- > -- > ----------- > public | eln | table | | | > public | pl | table | | | > security_policy:+ > | | | | | > (u): > true > > > --> including the FOR ALL in the create policy statement as well as > WITH CHECK(true). > > Access privileges Schema | Name | > Type | Access privileges | Column privileges > | Policies > --------+------+-------+-------------------+-------------------+----- > -- > ----------- > public | eln | table | | | > public | pl | table | | | > security_policy:+ > | | | | | > (u): > true + > | | | | | > (c): > true > > > The only mystery is what happens here:- > > <snip> > > -- …. Pause while other processing happens ….. > (commit;) > > -- Child table processing – occurs often & quickly. Starts after > parent update. > > <\snip> > > > I'd like to know more about RLS and trying to de-bug your script. > > On a production application you'd be testing for errors and raising > exceptions so as to inform users that a problem occurred. > > So, without knowing what occurs during "Pause while other processing > happens" I can't help any further. > > Cheers, > Rob > >
> This was never explained. You are running "process 2" in an entirely different session???? Yes, two processes are required to get a deadlock. > If that is true, why don't you commit the updates to table pl and release the locks? It's a long story... but I can't change it at the moment, the PL update is a single long-running batch process, which shouldn'tget in the way of the ELN process. > The fact that eln.pl_id is a foreign key is irrelevant. If you tried to alter eln.pl_id to a value that doesn't exist inpl.pl_id will cause an exception to occur due to the constraint. I don't see how this is irrelevant, it's the FK check against the PL table that is causing the lock contention. We're not getting FK violations - the data is being constructed correctly. The issue I have is that the FK check is being blocked by an apparently upgraded lock on the PL table, caused by the RLSpolicy. I'm running Postgres 9.6.1 (I think) Phil Horder Database Mechanic -----Original Message----- From: rob stone [mailto:floriparob@gmail.com] Sent: 24 March 2018 11:30 To: HORDER Phil; pgsql-general Subject: Re: Foreign Key locking / deadlock issue.... v2 Hello Phil, On Fri, 2018-03-23 at 15:42 +0000, HORDER Phil wrote: > Rob, thanks for looking. > > The "pause" is only to not-do-the-commit yet, so that the child > process can then try and access the record - I've not left anything > out. > This code is my own demo, not a cut from our production code. > > Did you run this as the 'postgres' superuser? That would bypass the > RLS, and probably avoid the problem. No. Never use super user to test scripts. > > I checked by creating a new user, and ran my code in that: > > Sql> create user test password 'password'; > > After running my test script, psql \dp shows: > > Schema | Name | Type | Access privileges > | Column privileges | Policies > --------+-------------------+-------+------------------------------ > ---+-------------------+------------------ > public | eln | table | > | | > public | pl | table | > | | security_policy:+ > | | | > | | (u): true + > | | | > | | (c): true > > (plus some other stuff for postGIS) > > Here’s my code again: > > drop table if exists ELN; > drop table if exists PL; > > Create table PL > (pl_id integer, > m_number text > ); > > alter table PL ENABLE row level security; alter table PL FORCE row > level security; > > drop policy if exists security_policy on PL ; CREATE POLICY > security_policy on PL FOR ALL TO PUBLIC USING (true) WITH CHECK > (true); > > Alter table PL add constraint PL_PK primary key (pl_id); Insert into > PL values (1, null); Insert into PL values (2, null); Insert into PL > values (3, null); > > Create table ELN > (event_id integer, > pl_id integer > ); > > > Alter table ELN add constraint ELN_PK primary key (event_id); Alter > table ELN add constraint ELN_PL_FK foreign key (pl_id) references PL > (pl_id); > > Insert into ELN values (301, null); > Insert into ELN values (302, null); > Insert into ELN values (303, null); > > commit; > > -- process 1: > start transaction; > update pl set m_number = '234' where pl_id = 2; > update pl set m_number = '345' where pl_id = 3; > > -- wait here, and run process 2 This was never explained. You are running "process 2" in an entirely different session???? If that is true, why don't you commit the updates to table pl and release the locks? The fact that eln.pl_id is a foreign key is irrelevant. If you tried to alter eln.pl_id to a value that doesn't exist inpl.pl_id will cause an exception to occur due to the constraint. Cheers, Rob > commit; > > > -- process 2: > start transaction; > update eln set pl_id = 3 where event_id = 303; > update eln set pl_id = 2 where event_id = 302; > -- Gets blocked by process 1 > commit; > > > Phil Horder > Database Mechanic > > -----Original Message----- > From: rob stone [mailto:floriparob@gmail.com] > Sent: 23 March 2018 11:43 > To: HORDER Phil; pgsql-general > Subject: Re: Foreign Key locking / deadlock issue.... v2 > > Hello Phil, > > I've run your sample script on 9.6.5 and 10.3. > The only thing that I added was a commit; after the initial inserts > just to ensure the rows were saved. > No errors were reported for either version. > > The output of \dp after running was:- > > Access privileges Schema | Name | > Type | Access privileges | Column privileges > | Policies > --------+------+-------+-------------------+-------------------+----- > -- > ----------- > public | eln | table | | | > public | pl | table | | | > security_policy:+ > | | | | | > (u): > true > > > --> including the FOR ALL in the create policy statement as well as > WITH CHECK(true). > > Access privileges Schema | Name | Type > | Access privileges | Column privileges > | Policies > --------+------+-------+-------------------+-------------------+----- > -- > ----------- > public | eln | table | | | > public | pl | table | | | > security_policy:+ > | | | | | > (u): > true + > | | | | | > (c): > true > > > The only mystery is what happens here:- > > <snip> > > -- …. Pause while other processing happens ….. > (commit;) > > -- Child table processing – occurs often & quickly. Starts after > parent update. > > <\snip> > > > I'd like to know more about RLS and trying to de-bug your script. > > On a production application you'd be testing for errors and raising > exceptions so as to inform users that a problem occurred. > > So, without knowing what occurs during "Pause while other processing > happens" I can't help any further. > > Cheers, > Rob > >
On 03/24/2018 11:03 AM, HORDER Phil wrote: > I'm running Postgres 9.6.1 (I think) To find out for sure do: psql> select version(); > > > Phil Horder > Database Mechanic -- Adrian Klaver adrian.klaver@aklaver.com
psql> select version(); PostgreSQL 9.6.1, compiled by Visual C++ build 1800, 64-bit