Thread: Partitioning/inherited tables vs FKs

Partitioning/inherited tables vs FKs

From
Boszormenyi Zoltan
Date:
Hi,

we came across an interesting problem.

=# create table parent (id serial primary key, t text);
NOTICE:  CREATE TABLE will create implicit sequence "parent_id_seq" for
serial column "parent.id"
NOTICE:  CREATE TABLE / PRIMARY KEY will create implicit index
"parent_pkey" for table "parent"
CREATE TABLE
=# create table child () inherits (parent);
CREATE TABLE
=# create table refer (id serial primary key, parent_id integer
references parent (id));
NOTICE:  CREATE TABLE will create implicit sequence "refer_id_seq" for
serial column "refer.id"
NOTICE:  CREATE TABLE / PRIMARY KEY will create implicit index
"refer_pkey" for table "refer"
CREATE TABLE
=# begin;
BEGIN
=# insert into child (t) values ('a') returning id;id
---- 1
(1 sor)

INSERT 0 1
=# select * from parent;id | t
----+--- 1 | a
(1 sor)

=# insert into refer (parent_id) values (1);
ERROR:  insert or update on table "refer" violates foreign key
constraint "refer_parent_id_fkey"
DETAIL:  Key (parent_id)=(1) is not present in table "parent".

The use case for this was there were different news items,
and there were another table for summaries, that could point
to any of the news items table. Another use case could be
a large partitioned table with an FK to the main table where
the referring table might only contain very few "interesting" data.

No matter what are the semantics, the parent table in the
inheritance chain cannot be used as and endpoint for FKs.

Is it a bug, or intentional?

The only solution currently is that the referring table has to be
partitioned the same way as the referred table in the FK, and
its parent table has to be queried.

Best regards,
Zoltán Böszörményi

-- 
Bible has answers for everything. Proof:
"But let your communication be, Yea, yea; Nay, nay: for whatsoever is more
than these cometh of evil." (Matthew 5:37) - basics of digital technology.
"May your kingdom come" - superficial description of plate tectonics

----------------------------------
Zoltán Böszörményi
Cybertec Schönig & Schönig GmbH
http://www.postgresql.at/



Re: Partitioning/inherited tables vs FKs

From
Florian Pflug
Date:
On May 6, 2010, at 10:52 , Boszormenyi Zoltan wrote:
> =# create table parent (id serial primary key, t text);
> ...
> =# create table child () inherits (parent);
> ...
> =# create table refer (id serial primary key, parent_id integer
> ...
> =# insert into child (t) values ('a') returning id;
> ...
> =# select * from parent;
> id | t
> ----+---
>  1 | a
> (1 sor)
>
> =# insert into refer (parent_id) values (1);
> ERROR:  insert or update on table "refer" violates foreign key
> constraint "refer_parent_id_fkey"
> DETAIL:  Key (parent_id)=(1) is not present in table "parent".
>
> The use case for this was there were different news items,
> and there were another table for summaries, that could point
> to any of the news items table. Another use case could be
> a large partitioned table with an FK to the main table where
> the referring table might only contain very few "interesting" data.

Yeah, this is a long-standing issue with inheritance. Table inheritance in postgres isn't much more than an implicit
UNIONdone on selects plus some logic in ALTER TABLE to keep propagate structural changes. Indices and constraints
basicallyalways behave as if ONLY had been specified. I'm not even sure if the ids are globally unique in your example
-it might be that each child's "id serial" column gets its very own sequence. 

One possible workaround is no create a table, say referred_ids, that contains all the ids from parent and all of its
children,kept up-to-date via triggers, and point the FK constraint to that table. That also allows for a global unique
constrainton the ids by definition a suitable unique or primary key constraint on referred_ids. 

What lies at the heart of this problem is the lack of multi-table indices and hence multi-table unique constraints in
postgres.AFAIK with those in place the rest amounts to the removal of ONLY from the constraint check queries plus some
codeto propagate constraint triggers to child tables. 

best regards,
Florian Pflug


Re: Partitioning/inherited tables vs FKs

From
Jaime Casanova
Date:
2010/5/6 Boszormenyi Zoltan <zb@cybertec.at>:
>
> =# insert into refer (parent_id) values (1);
> ERROR:  insert or update on table "refer" violates foreign key
> constraint "refer_parent_id_fkey"
> DETAIL:  Key (parent_id)=(1) is not present in table "parent".
>
> The use case for this was there were different news items,
> and there were another table for summaries, that could point
> to any of the news items table. Another use case could be
> a large partitioned table with an FK to the main table where
> the referring table might only contain very few "interesting" data.
>
> No matter what are the semantics, the parent table in the
> inheritance chain cannot be used as and endpoint for FKs.
>
> Is it a bug, or intentional?

i would call it a bug, but this is a known issue

>
> The only solution currently is that the referring table has to be
> partitioned the same way as the referred table in the FK, and
> its parent table has to be queried.
>

no, you can install a trigger on the child table that verifies the
existence of the id on your partitioned parent table, the SELECT
you'll use inside that trigger will look at the entire set of tables
(as long as you don't use FROM ONLY)

also could be useful to put an index (even a PK) on every child to
ensure uniqueness and make the SELECT more efficient, and of course a
check constraint in every child emulating a partition key

--
Jaime Casanova         www.2ndQuadrant.com
Soporte y capacitación de PostgreSQL


Re: Partitioning/inherited tables vs FKs

From
Robert Haas
Date:
On Thu, May 6, 2010 at 6:37 AM, Jaime Casanova <jaime@2ndquadrant.com> wrote:
> i would call it a bug, but this is a known issue
>
>>
>> The only solution currently is that the referring table has to be
>> partitioned the same way as the referred table in the FK, and
>> its parent table has to be queried.
>>
>
> no, you can install a trigger on the child table that verifies the
> existence of the id on your partitioned parent table, the SELECT
> you'll use inside that trigger will look at the entire set of tables
> (as long as you don't use FROM ONLY)
>
> also could be useful to put an index (even a PK) on every child to
> ensure uniqueness and make the SELECT more efficient, and of course a
> check constraint in every child emulating a partition key

The referential integrity triggers contain some extra magic that isn't
easily simulatable in userland, and that is necessary to make the
foreign key constraints airtight.  We've discussed this previously but
I don't remember which thread it was or the details of when things
blow up.  I think it's something like this: the parent has a tuple
that is not referenced by any child.  Transaction 1 begins, deletes
the parent tuple (checking that it has no children), and pauses.
Transaction 2 begins, adds a child tuple that references the parent
tuple (checking that the parent exists, which it does), and commits.
Transaction 1 commits.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise Postgres Company


Re: Partitioning/inherited tables vs FKs

From
Tom Lane
Date:
Florian Pflug <fgp@phlo.org> writes:
> What lies at the heart of this problem is the lack of multi-table
> indices and hence multi-table unique constraints in postgres. AFAIK
> with those in place the rest amounts to the removal of ONLY from the
> constraint check queries plus some code to propagate constraint
> triggers to child tables.

Well, the lack of multi-table indexes certainly is the heart of the
problem, but I'm not sure that inventing such a thing is the solution.
Quite aside from the implementation difficulties involved in it,
doing things that way would destroy some of the major reasons to
partition tables at all:

* the index grows as the size of the total data set, it's not limited
by partition size

* can't cheaply drop one partition any more, you have to vacuum the
(big) index first

* probably some other things I'm not thinking of at the moment.

I think the real solution is to upgrade the partitioning infrastructure
so that we can understand that columns are unique across the whole
partitioned table, when the partitioning is done on that column and each
partition has a unique index.
        regards, tom lane


Re: Partitioning/inherited tables vs FKs

From
Florian Pflug
Date:
On May 6, 2010, at 16:38 , Tom Lane wrote:
> Florian Pflug <fgp@phlo.org> writes:
>> What lies at the heart of this problem is the lack of multi-table
>> indices and hence multi-table unique constraints in postgres. AFAIK
>> with those in place the rest amounts to the removal of ONLY from the
>> constraint check queries plus some code to propagate constraint
>> triggers to child tables.
>
> Well, the lack of multi-table indexes certainly is the heart of the
> problem, but I'm not sure that inventing such a thing is the solution.
> Quite aside from the implementation difficulties involved in it,
> doing things that way would destroy some of the major reasons to
> partition tables at all:
>
> * the index grows as the size of the total data set, it's not limited
> by partition size
>
> * can't cheaply drop one partition any more, you have to vacuum the
> (big) index first
>
> * probably some other things I'm not thinking of at the moment.
>
> I think the real solution is to upgrade the partitioning infrastructure
> so that we can understand that columns are unique across the whole
> partitioned table, when the partitioning is done on that column and each
> partition has a unique index.

True, for partitioned tables multi-table indices reintroduce some of the performance problems that partitioning is
supposedto avoid. 

But OTOH if you use table inheritance as a means to map data models (e.g. EER) more naturally to SQL, then multi-table
indiceshave advantages over the partitioning-friendly solution you sketched above. 

With a multi-table index, SELECT * FROM PARENT WHERE ID=?? has complexity LOG(N*M) where M is the number of tables
inheritingfrom PARENT (including PARENT itself), and N the average number of rows in these tables. With one index per
child,the complexity is M*LOG(N) which is significantly higher if M is large. Constraint exclusion could reduce that to
LOG(N),but only if each child is has it's own private ID range which precludes ID assignment from a global sequence and
hencemakes ID assignment much more complex and error-prone. 

Anyway, I was wondering why we need guaranteed uniqueness for FK relationships anyway. Because if we don't (which I
didn'tcheck prior to posting this I must admit), then why can't we simply remove the "ONLY" from the RI queries and let
ALTERTABLE attach the RI triggers not only to the parent but also to all children. What am I missing? 

best regards,
Florian Pflug


Re: Partitioning/inherited tables vs FKs

From
Tom Lane
Date:
Florian Pflug <fgp@phlo.org> writes:
> Anyway, I was wondering why we need guaranteed uniqueness for FK
> relationships anyway.

It's required by spec, and the semantics aren't terribly sensible
without it.
        regards, tom lane


Re: Partitioning/inherited tables vs FKs

From
Dmitry Fefelov
Date:
> The referential integrity triggers contain some extra magic that isn't
> easily simulatable in userland, and that is necessary to make the
> foreign key constraints airtight.  We've discussed this previously but
> I don't remember which thread it was or the details of when things
> blow up.  I think it's something like this: the parent has a tuple
> that is not referenced by any child.  Transaction 1 begins, deletes
> the parent tuple (checking that it has no children), and pauses.
> Transaction 2 begins, adds a child tuple that references the parent
> tuple (checking that the parent exists, which it does), and commits.
> Transaction 1 commits.

Will SELECT ... FOR SHARE not help?

Regargs, 
Dmitry


Re: Partitioning/inherited tables vs FKs

From
Robert Haas
Date:
On Tue, May 11, 2010 at 2:16 AM, Dmitry Fefelov <fozzy@ac-sw.com> wrote:
>> The referential integrity triggers contain some extra magic that isn't
>> easily simulatable in userland, and that is necessary to make the
>> foreign key constraints airtight.  We've discussed this previously but
>> I don't remember which thread it was or the details of when things
>> blow up.  I think it's something like this: the parent has a tuple
>> that is not referenced by any child.  Transaction 1 begins, deletes
>> the parent tuple (checking that it has no children), and pauses.
>> Transaction 2 begins, adds a child tuple that references the parent
>> tuple (checking that the parent exists, which it does), and commits.
>> Transaction 1 commits.
>
> Will SELECT ... FOR SHARE not help?

Try it, with the example above.  I think you'll find that it doesn't.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise Postgres Company


Re: Partitioning/inherited tables vs FKs

From
Marko Tiikkaja
Date:
On 2010-05-11 14:29 +0200, Robert Haas wrote:
> On Tue, May 11, 2010 at 2:16 AM, Dmitry Fefelov <fozzy@ac-sw.com> wrote:
>>> The referential integrity triggers contain some extra magic that isn't
>>> easily simulatable in userland, and that is necessary to make the
>>> foreign key constraints airtight.  We've discussed this previously but
>>> I don't remember which thread it was or the details of when things
>>> blow up.  I think it's something like this: the parent has a tuple
>>> that is not referenced by any child.  Transaction 1 begins, deletes
>>> the parent tuple (checking that it has no children), and pauses.
>>> Transaction 2 begins, adds a child tuple that references the parent
>>> tuple (checking that the parent exists, which it does), and commits.
>>> Transaction 1 commits.
>>
>> Will SELECT ... FOR SHARE not help?
> 
> Try it, with the example above.  I think you'll find that it doesn't.

TXA => delete from foo;
DELETE 1

TXB => select a from foo for share; -- waits

What am I missing?


Regards,
Marko Tiikkaja


Re: Partitioning/inherited tables vs FKs

From
Nicolas Barbier
Date:
2010/5/11 Marko Tiikkaja <marko.tiikkaja@cs.helsinki.fi>:

> On 2010-05-11 14:29 +0200, Robert Haas wrote:
>
>> On Tue, May 11, 2010 at 2:16 AM, Dmitry Fefelov <fozzy@ac-sw.com> wrote:
>>
>>>> The referential integrity triggers contain some extra magic that isn't
>>>> easily simulatable in userland, and that is necessary to make the
>>>> foreign key constraints airtight.  We've discussed this previously but
>>>> I don't remember which thread it was or the details of when things
>>>> blow up.  I think it's something like this: the parent has a tuple
>>>> that is not referenced by any child.  Transaction 1 begins, deletes
>>>> the parent tuple (checking that it has no children), and pauses.
>>>> Transaction 2 begins, adds a child tuple that references the parent
>>>> tuple (checking that the parent exists, which it does), and commits.
>>>> Transaction 1 commits.
>>>
>>> Will SELECT ... FOR SHARE not help?
>>
>> Try it, with the example above.  I think you'll find that it doesn't.
>
> TXA => delete from foo;
> DELETE 1
>
> TXB => select a from foo for share; -- waits
>
> What am I missing?

Slightly verbose example of what can go wrong:

CREATE TABLE a (i int PRIMARY KEY);
INSERT INTO a VALUES (1);

CREATE TABLE b (a_id int);

>>>>>> Start with T1:

T1> BEGIN TRANSACTION ISOLATION LEVEL SERIALIZABLE;
BEGIN
T1> SELECT i FROM a WHERE i = 1 FOR SHARE; -- Does a with i = 1 exist?i
---1
(1 Zeile)

T1> INSERT INTO b VALUES (1); -- Great, it existed, insert row
pointing to it in b.
INSERT 0 1

>>>>>> Switch to T2:

T2> BEGIN TRANSACTION ISOLATION LEVEL SERIALIZABLE; -- Evil
transaction T2 is intervening!
BEGIN
T2> SELECT i FROM a WHERE i = 1 FOR SHARE; -- Lock a with i = 1 FOR SHARE.i
---1
(1 Zeile)

T2> SELECT a_id FROM b WHERE a_id = 1; -- Check whether it's got
anything pointing to it.a_id
------
(0 Zeilen)

T2> DELETE FROM a WHERE i = 1; -- Nope, so delete a with i = 1 (this
blocks, because T1 is still holding the lock).

>>>>>> Switch to T1:

1> COMMIT; -- Commit the insertion of a row pointing to a with i = 1
(this releases all locks that T1 is holding).
COMMIT

>>>>>> T2 continues:

DELETE 1
T2> COMMIT; -- Commit the deletion of a with i = 1.
COMMIT
T2> SELECT * FROM b EXCEPT SELECT * FROM a; -- Check for inconsistencies.a_id
------   1
(1 Zeile)

Woops.

Nicolas


Re: Partitioning/inherited tables vs FKs

From
Marko Tiikkaja
Date:
This is getting way off topic, but:

On 5/11/10 3:55 PM +0300, Nicolas Barbier wrote:
> T2>  SELECT i FROM a WHERE i = 1 FOR SHARE; -- Lock a with i = 1 FOR SHARE.
>   i
> ---
>   1
> (1 Zeile)
>
> T2>  SELECT a_id FROM b WHERE a_id = 1; -- Check whether it's got
> anything pointing to it.
>   a_id
> ------
> (0 Zeilen)
>
> T2>  DELETE FROM a WHERE i = 1; -- Nope, so delete a with i = 1 (this
> blocks, because T1 is still holding the lock).

Obviously you wouldn't delete anything with a SHARE lock.


Regards,
Marko Tiikkaja


Re: Partitioning/inherited tables vs FKs

From
Nicolas Barbier
Date:
2010/5/11 Marko Tiikkaja <marko.tiikkaja@cs.helsinki.fi>:

> This is getting way off topic, but:
>
> On 5/11/10 3:55 PM +0300, Nicolas Barbier wrote:
>>
>> T2>  SELECT i FROM a WHERE i = 1 FOR SHARE; -- Lock a with i = 1 FOR
>> SHARE.
>>  i
>> ---
>>  1
>> (1 Zeile)
>>
>> T2>  SELECT a_id FROM b WHERE a_id = 1; -- Check whether it's got
>> anything pointing to it.
>>  a_id
>> ------
>> (0 Zeilen)
>>
>> T2>  DELETE FROM a WHERE i = 1; -- Nope, so delete a with i = 1 (this
>> blocks, because T1 is still holding the lock).
>
> Obviously you wouldn't delete anything with a SHARE lock.

So where would you put a SELECT ... FOR SHARE to fix the problem? (Per
"Will SELECT ... FOR SHARE not help?".) I agree that my second FOR
SHARE doesn't really make a lot of sense, but that doesn't disprove
the fact that the first FOR SHARE fails to ensure consistency.

Nicolas


Re: Partitioning/inherited tables vs FKs

From
Marko Tiikkaja
Date:
On 5/11/10 4:07 PM +0300, Nicolas Barbier wrote:
> 2010/5/11 Marko Tiikkaja<marko.tiikkaja@cs.helsinki.fi>:
>
>> This is getting way off topic, but:
>>
>> On 5/11/10 3:55 PM +0300, Nicolas Barbier wrote:
>>>
>>> T2>    SELECT i FROM a WHERE i = 1 FOR SHARE; -- Lock a with i = 1 FOR
>>> SHARE.
>>>   i
>>> ---
>>>   1
>>> (1 Zeile)
>>>
>>> T2>    SELECT a_id FROM b WHERE a_id = 1; -- Check whether it's got
>>> anything pointing to it.
>>>   a_id
>>> ------
>>> (0 Zeilen)
>>>
>>> T2>    DELETE FROM a WHERE i = 1; -- Nope, so delete a with i = 1 (this
>>> blocks, because T1 is still holding the lock).
>>
>> Obviously you wouldn't delete anything with a SHARE lock.
>
> So where would you put a SELECT ... FOR SHARE to fix the problem? (Per
> "Will SELECT ... FOR SHARE not help?".) I agree that my second FOR
> SHARE doesn't really make a lot of sense, but that doesn't disprove
> the fact that the first FOR SHARE fails to ensure consistency.

I took the "SELECT ... FOR SHARE" suggestion in a more general way, 
suggesting the use of row-level locks.  T2 should be holding an 
exclusive row-level lock (SELECT ... FOR UPDATE) when checking for 
references.


Regards,
Marko Tiikkaja


Re: Partitioning/inherited tables vs FKs

From
Marko Tiikkaja
Date:
On 5/11/10 4:11 PM +0300, I wrote:
> I took the "SELECT ... FOR SHARE" suggestion in a more general way,
> suggesting the use of row-level locks.  T2 should be holding an
> exclusive row-level lock (SELECT ... FOR UPDATE) when checking for
> references.

Hmm.  Right, that transaction wouldn't see the rows in a serializable 
transaction so this doesn't solve the problem.


Regards,
Marko Tiikkaja


Re: Partitioning/inherited tables vs FKs

From
Tom Lane
Date:
Marko Tiikkaja <marko.tiikkaja@cs.helsinki.fi> writes:
> On 5/11/10 4:11 PM +0300, I wrote:
>> I took the "SELECT ... FOR SHARE" suggestion in a more general way,
>> suggesting the use of row-level locks.  T2 should be holding an
>> exclusive row-level lock (SELECT ... FOR UPDATE) when checking for
>> references.

> Hmm.  Right, that transaction wouldn't see the rows in a serializable 
> transaction so this doesn't solve the problem.

Yeah.  The hidden "magic" in the built-in FK code is not locking
(it does actually use SELECT FOR SHARE to lock rows).  Rather, it's
about doing tuple liveness checks using snapshots that aren't available
at the SQL level, particularly in serializable transactions.
        regards, tom lane


Re: Partitioning/inherited tables vs FKs

From
"Kevin Grittner"
Date:
Nicolas Barbier <nicolas.barbier@gmail.com> wrote:
>>>>>>> Switch to T1:
> 
> 1> COMMIT; -- Commit the insertion...
> COMMIT
> 
>>>>>>> T2 continues:
> 
> DELETE 1
> T2> COMMIT; -- Commit the deletion of a with i = 1.
> COMMIT
> T2> SELECT * FROM b EXCEPT SELECT * FROM a;
>  a_id
> ------
>     1
> (1 Zeile)
> 
> Woops.
This is exactly the sort of issue for which true serializable
behavior will provide a solution.  I will be offering a patch to
implement that for 9.1 once 9.0 settles down.  FWIW when you commit
T1, the patched code rolls back T2 with this message:
T2> DELETE FROM a WHERE i = 1;
ERROR:  could not serialize access due to read/write dependencies
among transactions
HINT:  The transaction might succeed if retried.
Thanks for the example; I will it to the others.
-Kevin


On May 11, 2010, at 13:29 , Robert Haas wrote:
> On Tue, May 11, 2010 at 2:16 AM, Dmitry Fefelov <fozzy@ac-sw.com> wrote:
>>> The referential integrity triggers contain some extra magic that isn't
>>> easily simulatable in userland, and that is necessary to make the
>>> foreign key constraints airtight.  We've discussed this previously but
>>> I don't remember which thread it was or the details of when things
>>> blow up.  I think it's something like this: the parent has a tuple
>>> that is not referenced by any child.  Transaction 1 begins, deletes
>>> the parent tuple (checking that it has no children), and pauses.
>>> Transaction 2 begins, adds a child tuple that references the parent
>>> tuple (checking that the parent exists, which it does), and commits.
>>> Transaction 1 commits.
>>
>> Will SELECT ... FOR SHARE not help?
>
> Try it, with the example above.  I think you'll find that it doesn't.

That example does in fact work. Here is the precise sequence of commands I tested with constraint checking triggers
implementedin PL/PGSQL. 

C1: BEGIN
C1: DELETE FROM parent WHERE parent_id = 0
C2: BEGIN
C2: SET TRANSACTION ISOLATION LEVEL SERIALIZABLE -- Optional
C2: INSERT INTO child (parent_id) VALUES (0) -- Waits for C1 to commit
C1: COMMIT -- Now C2 fails either with a constraint_violation or serialization_error

The reason this works is that C2's attempt to SHARE-lock the parent row blocks until C1 commits. In READ COMMITTED mode
C2will then realize that the parent row is now gone. In SERIALIZABLE mode it won't get that far, because the
SHARE-lockingattempt throws a serialization error since the parent row was concurrently modified. 

The serialization error, however, disappears if the two transactions are swapped. The following sequence of commands
succeeds,even though the FK constraint is not satisfied. 

C1: BEGIN
C1: INSERT INTO child (parent_id) VALUES (0)
C2: BEGIN
C2: SET TRANSACTION ISOLATION LEVEL SERIALIZABLE
C2: SELECT TRUE -- Take snapshot *before* C1 commits
C1: COMMIT
C2: DELETE FROM parent WHERE parent_id = 0 -- Works!
C2: COMMIT

It seems that while SHARE-locking a concurrently deleted row causes a serialization error, deleting a concurrently
SHARE-lockedis allowed. I do wonder if this shouldn't be considered a bug - whether locks conflict or not does not
usuallydepend on the other in which they are taken. 

The build-in constraint triggers avoid the second case by checking not only for rows visible under the transaction's
snapshotbut also for rows visible under a freshly taken snapshot in the ri_parent PERFORM statement. I do wonder if the
recheckwas still needed if the DELETE in the second case threw a serialization_error also. Does anyone have an example
thatproves it necessary? 

best regards,
Florian Pflug

Here are the table definitions and trigger functions I used:

CREATE TABLE parent (parent_id SERIAL NOT NULL PRIMARY KEY);
CREATE TABLE child (child_id SERIAL NOT NULL PRIMARY KEY, parent_id INTEGER NOT NULL);

CREATE FUNCTION ri_parent() RETURNS TRIGGER AS $body$
BEGIN PERFORM TRUE FROM child WHERE parent_id = OLD.parent_id; IF FOUND THEN   RAISE SQLSTATE '23503' USING MESSAGE =
'Parent' || OLD.parent_id || ' still referenced during ' || TG_OP; END IF; RETURN NULL; 
END;
$body$ LANGUAGE PLPGSQL VOLATILE;
CREATE TRIGGER ri_parent AFTER UPDATE OR DELETE ON parent FOR EACH ROW EXECUTE PROCEDURE ri_parent();

CREATE FUNCTION ri_child() RETURNS TRIGGER AS $body$
BEGIN PERFORM TRUE FROM parent WHERE parent_id = NEW.parent_id FOR SHARE OF parent; IF NOT FOUND THEN   RAISE SQLSTATE
'23503'USING MESSAGE = 'Parent ' || NEW.parent_id || ' does not exist during ' || TG_OP; END IF; RETURN NULL; 
END;
$body$ LANGUAGE PLPGSQL VOLATILE;
CREATE TRIGGER ri_child AFTER INSERT OR UPDATE ON child FOR EACH ROW EXECUTE PROCEDURE ri_child();



2010/5/11 Florian Pflug <fgp@phlo.org>:
> On May 11, 2010, at 13:29 , Robert Haas wrote:
>> On Tue, May 11, 2010 at 2:16 AM, Dmitry Fefelov <fozzy@ac-sw.com> wrote:
>>>> The referential integrity triggers contain some extra magic that isn't
>>>> easily simulatable in userland, and that is necessary to make the
>>>> foreign key constraints airtight.  We've discussed this previously but
>>>> I don't remember which thread it was or the details of when things
>>>> blow up.  I think it's something like this: the parent has a tuple
>>>> that is not referenced by any child.  Transaction 1 begins, deletes
>>>> the parent tuple (checking that it has no children), and pauses.
>>>> Transaction 2 begins, adds a child tuple that references the parent
>>>> tuple (checking that the parent exists, which it does), and commits.
>>>> Transaction 1 commits.
>>>
>>> Will SELECT ... FOR SHARE not help?
>>
>> Try it, with the example above.  I think you'll find that it doesn't.
>
> That example does in fact work. Here is the precise sequence of commands I tested with constraint checking triggers
implementedin PL/PGSQL. 
[...]
> The serialization error, however, disappears if the two transactions are swapped. The following sequence of commands
succeeds,even though the FK constraint is not satisfied. 

Thanks for figuring this out.  I thought there was a case like this
but I couldn't remember exactly how to reproduce it.

> C1: BEGIN
> C1: INSERT INTO child (parent_id) VALUES (0)
> C2: BEGIN
> C2: SET TRANSACTION ISOLATION LEVEL SERIALIZABLE
> C2: SELECT TRUE -- Take snapshot *before* C1 commits
> C1: COMMIT
> C2: DELETE FROM parent WHERE parent_id = 0 -- Works!
> C2: COMMIT
>
> It seems that while SHARE-locking a concurrently deleted row causes a serialization error, deleting a concurrently
SHARE-lockedis allowed. I do wonder if this shouldn't be considered a bug - whether locks conflict or not does not
usuallydepend on the other in which they are taken. 

Wait - I'm confused.  The DELETE in your example happens after C1
commits, so C1 can't still be holding any locks (nor does C2 take any
locks prior to the commit of C1).

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise Postgres Company


Florian Pflug <fgp@phlo.org> wrote:
> The serialization error, however, disappears if the two
> transactions are swapped. The following sequence of commands
> succeeds, even though the FK constraint is not satisfied.
> 
> C1: BEGIN
> C1: INSERT INTO child (parent_id) VALUES (0)
> C2: BEGIN
> C2: SET TRANSACTION ISOLATION LEVEL SERIALIZABLE
> C2: SELECT TRUE -- Take snapshot *before* C1 commits
> C1: COMMIT
> C2: DELETE FROM parent WHERE parent_id = 0 -- Works!
> C2: COMMIT
Thanks for another good example.  Added to serializable test suite.
C2> DELETE FROM parent WHERE parent_id = 0;
ERROR:  could not serialize access due to read/write dependencies
among transactions
HINT:  The transaction might succeed if retried.
CONTEXT:  SQL statement "SELECT TRUE FROM child WHERE parent_id =
OLD.parent_id"
PL/pgSQL function "ri_parent" line 2 at PERFORM
By the way, when adding these, I'm taking off the "FOR SHARE" or
"FOR UPDATE" clauses; they're not needed with true serializable
transactions.  Otherwise, examples used as presented.
-Kevin


On May 11, 2010, at 17:04 , Robert Haas wrote:
> 2010/5/11 Florian Pflug <fgp@phlo.org>:
>> C1: BEGIN
>> C1: INSERT INTO child (parent_id) VALUES (0)
>> C2: BEGIN
>> C2: SET TRANSACTION ISOLATION LEVEL SERIALIZABLE
>> C2: SELECT TRUE -- Take snapshot *before* C1 commits
>> C1: COMMIT
>> C2: DELETE FROM parent WHERE parent_id = 0 -- Works!
>> C2: COMMIT
>>
>> It seems that while SHARE-locking a concurrently deleted row causes a serialization error, deleting a concurrently
SHARE-lockedis allowed. I do wonder if this shouldn't be considered a bug - whether locks conflict or not does not
usuallydepend on the other in which they are taken. 
>
> Wait - I'm confused.  The DELETE in your example happens after C1
> commits, so C1 can't still be holding any locks (nor does C2 take any
> locks prior to the commit of C1).

I used the word "lock" a bit sloppy there.

What I did want to point out is that any UPDATE by a SERIALIZABLE transaction to a row that has been concurrently
updatedcauses a serialization error. The same happens when it instead SHARE- or UPDATE-locks the concurrently updated
row.This is also independent from the commit-time of the concurrent transaction, as long as it is deemed invisible by
theUPDATE/LOCK-ing transaction. In other words, any attempt to UPDATE, SHARE-lock or UPDATE-lock a row from within a
SERIALIZABLEtransaction fails if the visible row version isn't the latest row version. If, however, the order of the
eventsis the other way around, such that the SHARE-locking or UPDATE-locking happens first, and the UPDATE afterwards,
thenno serialization error occurs! 

That might seem sensible if you view SHARE-locks and UPDATE-locks as locks, and the "taint" that marks a row (the
existenceof a newer row version) after it has been updated by a transaction as "something else". After all, as you
pointedout, the lock is gone as soon as the transaction commits. If, however, you view that "taint" as a slightly
strangekind of lock that a transaction holds on the rows it updated even *after* the transaction committed, then it
stopsmaking sense. You now have a "locking" behavior with order-dependent conflicts. 

Viewing those "taints" as locks is consistent with how that true serializability algorithm Kevin Grittner is working on
dealswith those things, I believe - or at least it's probably that paper in the back of my mind that made me call it
"lock"in the first place. 

It would be interesting to formalize this in the language of that paper - unfortunately, I probably lack the time to do
thisin the near future :-(  

To avoid more confusion, here are the sequences of commands I have in mind:

No serialization error (and neither with "FOR UPDATE" instead of "FOR SHARE")
C1: BEGIN
C1: SELECT t.* FROM t WHERE id = 1 FOR SHARE
C2: BEGIN
C2: SET TRANSACTION ISOLATION LEVEL SERIALIZABLE
C2: SELECT TRUE --Take snapshot before c1 commits
C1: COMMIT
C2: UPDATE t SET id = 2 WHERE id = 1
C2: COMMIT

Serialization error (and also with "FOR UPDATE" instead of "FOR SHARE")
C1: BEGIN
C1: UPDATE t SET id = 2 WHERE id = 1
C2: BEGIN
C2: SET TRANSACTION ISOLATION LEVEL SERIALIZABLE
C2: SELECT TRUE --Take snapshot before c1 commits
C1: COMMIT
C2: SELECT t.* FROM t WHERE id = 1 FOR SHARE --serialization error
C2: COMMIT

best regards,
Florian Pflug



On 5/11/2010 12:39 PM, Florian Pflug wrote:
> On May 11, 2010, at 17:04 , Robert Haas wrote:
>> 2010/5/11 Florian Pflug <fgp@phlo.org>:
>>> C1: BEGIN
>>> C1: INSERT INTO child (parent_id) VALUES (0)
>>> C2: BEGIN
>>> C2: SET TRANSACTION ISOLATION LEVEL SERIALIZABLE
>>> C2: SELECT TRUE -- Take snapshot *before* C1 commits
>>> C1: COMMIT
>>> C2: DELETE FROM parent WHERE parent_id = 0 -- Works!
>>> C2: COMMIT
>>> 
>>> It seems that while SHARE-locking a concurrently deleted row causes a serialization error, deleting a concurrently
SHARE-lockedis allowed. I do wonder if this shouldn't be considered a bug - whether locks conflict or not does not
usuallydepend on the other in which they are taken.
 
>> 
>> Wait - I'm confused.  The DELETE in your example happens after C1
>> commits, so C1 can't still be holding any locks (nor does C2 take any
>> locks prior to the commit of C1).
> 
> I used the word "lock" a bit sloppy there.
> 
> What I did want to point out is that any UPDATE by a SERIALIZABLE transaction to a row that has been concurrently
updatedcauses a serialization error. The same happens when it instead SHARE- or UPDATE-locks the concurrently updated
row.This is also independent from the commit-time of the concurrent transaction, as long as it is deemed invisible by
theUPDATE/LOCK-ing transaction. In other words, any attempt to UPDATE, SHARE-lock or UPDATE-lock a row from within a
SERIALIZABLEtransaction fails if the visible row version isn't the latest row version. If, however, the order of the
eventsis the other way around, such that the SHARE-locking or UPDATE-locking happens first, and the UPDATE afterwards,
thenno serialization error occurs!
 
> 
> That might seem sensible if you view SHARE-locks and UPDATE-locks as locks, and the "taint" that marks a row (the
existenceof a newer row version) after it has been updated by a transaction as "something else". After all, as you
pointedout, the lock is gone as soon as the transaction commits. If, however, you view that "taint" as a slightly
strangekind of lock that a transaction holds on the rows it updated even *after* the transaction committed, then it
stopsmaking sense. You now have a "locking" behavior with order-dependent conflicts.
 
> 
> Viewing those "taints" as locks is consistent with how that true serializability algorithm Kevin Grittner is working
ondeals with those things, I believe - or at least it's probably that paper in the back of my mind that made me call it
"lock"in the first place.
 
> 
> It would be interesting to formalize this in the language of that paper - unfortunately, I probably lack the time to
dothis in the near future :-( 
 
> 
> To avoid more confusion, here are the sequences of commands I have in mind:
> 
> No serialization error (and neither with "FOR UPDATE" instead of "FOR SHARE")
> C1: BEGIN
> C1: SELECT t.* FROM t WHERE id = 1 FOR SHARE
> C2: BEGIN
> C2: SET TRANSACTION ISOLATION LEVEL SERIALIZABLE
> C2: SELECT TRUE --Take snapshot before c1 commits
> C1: COMMIT
> C2: UPDATE t SET id = 2 WHERE id = 1
> C2: COMMIT
> 
> Serialization error (and also with "FOR UPDATE" instead of "FOR SHARE")
> C1: BEGIN
> C1: UPDATE t SET id = 2 WHERE id = 1
> C2: BEGIN
> C2: SET TRANSACTION ISOLATION LEVEL SERIALIZABLE
> C2: SELECT TRUE --Take snapshot before c1 commits
> C1: COMMIT
> C2: SELECT t.* FROM t WHERE id = 1 FOR SHARE --serialization error
> C2: COMMIT

The problem really is that in the case of deleting a PK row while a 
concurrent transaction creates such a reference cannot be solved with 
user level visibility rules in case of a serializable transacton, unless 
you go really expensive routes.

One corner case is that the transaction doing the FK INSERT commits 
after the serializable transaction doing the PK DELETE got its snapshot 
and also does the PK check before the PK DELETE got the lock on it. No 
user level visibility allows it to see that newly created reference. And 
unless the FK INSERTer actually UPDATE's the PK row (expensive), the PK 
DELETE will not throw anything. It will wait to get the lock and go 
ahead with the delete.

The PK DELETE needs to be able to do some sort of dirty scan in order to 
see those new references. That is what I think Tom was referring to.


Jan

-- 
Anyone who trades liberty for security deserves neither
liberty nor security. -- Benjamin Franklin


On May 11, 2010, at 20:05 , Jan Wieck wrote:
> The problem really is that in the case of deleting a PK row while a concurrent transaction creates such a reference
cannotbe solved with user level visibility rules in case of a serializable transacton, unless you go really expensive
routes.

Yeah. The information to detect this is there, though - the xmax of the PK row will be a multixact in this case, and
onemember of that set won't be deemed visible by the deleting transaction.  

> One corner case is that the transaction doing the FK INSERT commits after the serializable transaction doing the PK
DELETEgot its snapshot and also does the PK check before the PK DELETE got the lock on it. No user level visibility
allowsit to see that newly created reference. And unless the FK INSERTer actually UPDATE's the PK row (expensive), the
PKDELETE will not throw anything. It will wait to get the lock and go ahead with the delete. 

Exactly. It consciously waits for the lock (knowing that it was held by a concurrent transaction *not* visible to the
deletingtransaction), and after obtaining the lock goes on to delete the row. If the concurrent transaction hadn't held
amere lock, but had instead UPDATEd the row, this would cause a serialization error. 

> The PK DELETE needs to be able to do some sort of dirty scan in order to see those new references. That is what I
thinkTom was referring to. 

Yeah. Though the need for that "dirty scan" (it's not actually a scan with DIRTY READ semantics, but rather one with
READCOMMITTED semantics) might vanish if a SHARE lock had the same effect (causing a serialization error) on concurrent
transactionsthat an UPDATE has. 

I'm not yet convinced that this is true, nor do I necessarily think that making all SHARE locks behave that way would
bea good idea. But if my assertion is in fact true it would allow for robust user-level referential constraints by
eithermodifying SHARE-lock behavior or adding a new row-lock type.  

best regards,
Florian Pflug



Re: Partitioning/inherited tables vs FKs

From
Jim Nasby
Date:
On May 6, 2010, at 4:31 AM, Florian Pflug wrote:
>> The use case for this was there were different news items,
>> and there were another table for summaries, that could point
>> to any of the news items table. Another use case could be
>> a large partitioned table with an FK to the main table where
>> the referring table might only contain very few "interesting" data.
>
> Yeah, this is a long-standing issue with inheritance. Table inheritance in postgres isn't much more than an implicit
UNIONdone on selects plus some logic in ALTER TABLE to keep propagate structural changes. Indices and constraints
basicallyalways behave as if ONLY had been specified. I'm not even sure if the ids are globally unique in your example
-it might be that each child's "id serial" column gets its very own sequence. 
>
> One possible workaround is no create a table, say referred_ids, that contains all the ids from parent and all of its
children,kept up-to-date via triggers, and point the FK constraint to that table. That also allows for a global unique
constrainton the ids by definition a suitable unique or primary key constraint on referred_ids. 
>
> What lies at the heart of this problem is the lack of multi-table indices and hence multi-table unique constraints in
postgres.AFAIK with those in place the rest amounts to the removal of ONLY from the constraint check queries plus some
codeto propagate constraint triggers to child tables. 

FWIW, we use inheritance for something other than partitioning, and I created a trigger that provides a crude form of a
foreignkey constraint, as well as one that provides a crude global unique constraint on the PK. Both probably have
holesand race conditions, but I figure they're better than just hoping no one screws something up. 

BTW, my intention is to release all the generic tools we've developed to pgFoundry, it just hasn't happened yet. If
enoughpeople find this stuff interesting I can try and up the priority on getting that done. (And if you're *really*
wantingthis stuff you could pay 2nd Quadrant or CMD to get it for you.) 

test_us@workbook.local=# \df+ payment_instruments.tg_payment_instruments_unique
List of functions
-[ RECORD 1
]-------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Schema              | payment_instruments
Name                | tg_payment_instruments_unique
Result data type    | trigger
Argument data types |
Volatility          | volatile
Owner               | cnuadmin
Language            | plpgsql
Source code         |                    :                    : DECLARE                   : name CONSTANT text :=
'payment_instruments.tg_payment_instruments_unique';                  : c_full_table_name CONSTANT text :=
TG_TABLE_SCHEMA|| '.' || TG_TABLE_NAME;                   : BEGIN                   :     PERFORM tools.assert( TG_WHEN
='BEFORE', TG_NAME || ' ON ' || c_full_table_name ||' must be an BEFORE trigger' );                   :     PERFORM
tools.assert(TG_LEVEL = 'ROW', TG_NAME || ' ON ' || c_full_table_name ||' must be a row-level trigger' );
   :                        :     -- Deleting would break RI, so don't allow it. Granted, this should probably be a
separatetrigger, but...                   :     PERFORM tools.assert(
'payment_instruments__payment_instruments__inherit__no_delete'                  :         , TG_OP != 'DELETE'
       :         , 'DELETEs are not allowed on ' || c_full_table_name || ' (they would break inheritance RI)'
       :     );                   :                    :     RAISE DEBUG '%:                   :         TG_OP = %
            :         TG_TABLE_NAME = %                   :         NEW.payment_instrument_id = %'                   :
      , name                   :         , TG_OP                   :         , TG_TABLE_NAME                   :
, NEW.payment_instrument_id                    :     ;                   :                    :     -- Changing the PK
wouldbreak RI, so we shouldn't allow it. Granted, this should probably be a separate trigger, but...
:    IF TG_OP = 'UPDATE' THEN                   :         PERFORM tools.assert(
'payment_instruments__payment_instruments__inherit__pk_no_change'                  :             ,
NEW.payment_instrument_idIS NOT DISTINCT FROM OLD.payment_instrument_id                   :             , 'Changing
payment_instrument_idon ' || c_full_table_name || ' is not allowed (it would break inheritance RI)'                   :
       );                   :     ELSE                   :         -- Only check for dupes on insert, otherwise we'll
seeour own ID                   :         PERFORM tools.assert(
'payment_instruments__payment_instruments__inherit__unique'                  :             , NOT EXISTS( SELECT * FROM
payment_instruments.payment_instrumentsWHERE payment_instrument_id = NEW.payment_instrument_id )                   :
        , 'duplicate row violation, payment_instrument_id ' || coalesce( NEW.payment_instrument_id::text, '<NULL>' ) ||
'already exists'                   :         );                   :     END IF;                   :
:    RETURN NEW;                   : END;                   :                    :  
Description         | Trigger to try and prevent duplicated payment_instrument_ids. This trigger is in no way perfect
andhas a huge race condition, but generally these IDs should be getting assigned by a sequence, so we should not
normallyhave an issue with duped IDs anyway. 


test_us@workbook.local=# \df+ payment_instruments.tg_payment_instruments_ri
List of functions
-[ RECORD 1
]-------+------------------------------------------------------------------------------------------------------------------------------------------
Schema              | payment_instruments
Name                | tg_payment_instruments_ri
Result data type    | trigger
Argument data types |
Volatility          | volatile
Owner               | cnuadmin
Language            | plpgsql
Source code         |                    :                    : DECLARE                   :     name CONSTANT text :=
'payment_instruments.tg_payment_instruments_ri';                  :     c_full_table_name CONSTANT text :=
TG_TABLE_SCHEMA|| '.' || TG_TABLE_NAME;                   :     v_payment_instrument_type
payment_instruments.payment_instrument_types.payment_instrument_type%TYPE;                  :     v_table_name text;
              :     v_only text := '';                   :     v_result int;                   :     sql text;
        : BEGIN                   :     PERFORM tools.assert( TG_WHEN = 'AFTER', TG_NAME || ' ON ' || c_full_table_name
||'must be an AFTER trigger' );                   :     PERFORM tools.assert( TG_LEVEL = 'ROW', TG_NAME || ' ON ' ||
c_full_table_name||' must be a row-level trigger' );                   :     PERFORM tools.assert( TG_OP IN ( 'INSERT',
'UPDATE'), TG_NAME || ' ON ' || c_full_table_name ||' must be on INSERT or UPDATE' );                   :
    :     -- Th generally won't be allowed, but the trigger should still support it                   :     IF
NEW.payment_instrument_idIS NULL THEN                   :         RAISE DEBUG '%: payment_instrument_id is NULL,
skippingcheck', name;                   :         RETURN NULL;                   :     END IF;                   :
             :     -- If we're updating and we haven't modified payment_instrument_id, just bail                   :
IF TG_OP = 'UPDATE' THEN                   :         IF NEW.payment_instrument_id IS NOT DISTINCT FROM
OLD.payment_instrument_idTHEN                   :             RAISE DEBUG '%: payment_instrument_id ( = % ) is
unchanged,skipping check', name, NEW.payment_instrument_id;                   :             RETURN NULL;
  :         END IF;                   :     END IF;                   :                    :     /*                   :
    * We want to not only check for existence of the desired row, we also want                   :      * to share-lock
it.Unfortunately, sharelocks aren't implemented for                   :      * inherited tables, so we need to find the
recordin the correct table. We                   :      * can do this fairly easily if we can find out what "type" of
recordit is.                   :      */                   :                    :     -- Figure out what type of record
thisis (of course, record might not exist here)                   :     SELECT INTO v_payment_instrument_type
'payment_instruments.'|| payment_instrument_type || 's'                   :         FROM
payment_instruments.payment_instrument_types__get((                   :                     SELECT
payment_instrument_type_id                  :                         FROM payment_instruments.payment_instruments pi
               :                         WHERE pi.payment_instrument_id = NEW.payment_instrument_id                   :
               ) )                   :     ;                   :     IF NOT FOUND OR v_payment_instrument_type IS NULL
THEN                  :         -- There wasn't a record at all                   :         v_only := 'ONLY '; -- FOR
SHAREwon't work if we select from both the parent and the children                   :         v_table_name :=
'payment_instruments.payment_instruments';                  :     ELSE                   :         -- We figured out
whattype of record this is, now try and lock the row in the right table                   :         v_table_name :=
v_payment_instrument_type;                  :     END IF;                   :                    :     sql := 'SELECT 1
FROM' || v_only || v_table_name || '                   :         WHERE payment_instrument_id = ' ||
NEW.payment_instrument_id|| '                   :         FOR SHARE'                   :     ;                   :
RAISEDEBUG '%: Executing SQL %', name, sql;                   :     -- Note that simply using a PERFORM here might get
optimizedout.                   :     EXECUTE sql INTO v_result;                   :                    :     /*
          :      * Normally we should always find an record, but if it was somehow                   :      * put in
thewrong table, or if it was deleted after our initial                   :      * select then we wouldn't have one.
             :      */                   :     IF v_result = 1 THEN                   :         RAISE DEBUG '%: record
forpayment_instrument_id = % found in table %', name, NEW.payment_instrument_id, v_table_name;                   :
  RETURN NULL;                   :     END IF;                   :                    :     RAISE EXCEPTION 'insert on
updateon table "%.%" violates foreign key; payment_instrument_id=(%) is not present in table "%"'                   :
     , TG_TABLE_SCHEMA                   :         , TG_TABLE_NAME                   :         ,
NEW.payment_instrument_id                  :         , v_table_name                   :     ;                   : END;
                :                    :  
Description         | Enables Refferential Integrity at the payment_instruments level (normal RI on a table that is an
inheritanceparent does not really work) 


--
Jim C. Nasby, Database Architect                   jim@nasby.net
512.569.9461 (cell)                         http://jim.nasby.net




Re: Partitioning/inherited tables vs FKs

From
Greg Stark
Date:
On Thu, May 6, 2010 at 10:38 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> * the index grows as the size of the total data set, it's not limited
> by partition size
>
> * can't cheaply drop one partition any more, you have to vacuum the
> (big) index first

So I wholeheartedly agree with the general sentiment that if you need
global indexes then partitioning just isn't really the right tool for
you.

But it occurs to me that we could defer the vacuum safely. I'm
assuming a index heap-tid pointer for a global index would include a
relid or some other identifier to specify which partition the tuple is
in. If you drop that partition those can all just be left as dangling
pointers as long as we don't reuse that id. So all we would need is
some way to leave a catalog entry reserving that id. The data files
can be truncated and deleted normally and whenever vacuum does run
against the index it can clean up the catalog entries for the deleted
partitions.

But I would rather work on having unique and foreign key constraints
that work on keys which include the partition key than work on global
indexes.
-- 
greg