Guarenteeing complex referencial integrity through custom triggers - Mailing list pgsql-hackers

From Joris Dobbelsteen
Subject Guarenteeing complex referencial integrity through custom triggers
Date
Msg-id 73427AD314CC364C8DF0FFF9C4D693FF037A3A@nehemiah.joris2k.local
Whole thread Raw
Responses Re: Guarenteeing complex referencial integrity through custom triggers
Re: Guarenteeing complex referencial integrity through custom triggers
Re: Guarenteeing complex referencial integrity through custom triggers
List pgsql-hackers
At this time Postgresql is unable to guarentee that complex* referencial
integrity (RI) constraints can be enforced using standard (read non-C)
triggers or SQL. This is an unfortunate side-effect of the MVCC model,
as it allows multiple transactions to modify a snapshot of the data at a
certain point in time. Later these modifications are 'merged' together,
resuling in a violation of a constraint that was not in the individual
snapshots. (You might notice that the same happens with most version
constrol software).

Some previous discussion was in
<http://archives.postgresql.org/pgsql-general/2007-02/msg01234.php>.

As a real-world example where the constraint cannot be enforced in
postgresql.
"For every tuple t in cartridge_change, there must exists a tuple t' in
printers with t.id = t'.id, and a tuple t'' in cartridge_types with
t.color = t''.color and t'.printertype = t''.printertype"

Under serializable isolation its impossible to enforce this constraint
in all cases (without triggers written in C). For read committed its
probably possible. The failure lies in scenarios where the MVCC rules
hide newly inserted rows (by a transaction that starts later). This is
also a concern in the referencial integrity (RI) constraints. Here the
problem is solved by cross-checking against the 'latest' snapshot.
However this functionality is not exposed to the user and can only be
used from functions written in C.

My intention is to expose the functionality to the outside world for
general use. This provides means to ensure custom complex constraints
can be enforced properly. I hope to push it into 8.3 if possible.


Now lets get down to the details:

The problem boils down to:
* Newly inserted child (dependent on existance of others) tuples are not
detected, while we should be aware of their presence.

This happens under the following situations:
* Checking the existance of child tuples.
* Deleting the child tuples.
* Updating the child tuples.

We can exclude these situations:
* Inserting a child, since the parent must exist for the current
transaction.
* Doing something to a parent, it's the 'propagation' that is a possible
problem.

One of the commercial competition, Oracle, which also has an
implementation of the MVCC model, allows the "FOR UPDATE" clause to
function as a way to ensure referencial integrity. See
<http://download-uk.oracle.com/docs/cd/B19306_01/appdev.102/b14251/adfns
_sqlproc.htm#sthref294> (and beyond) for details. It seems Oracle treats
SELECT FOR UPDATE as something special.
Unfortunally the manual is not too in-depth and slight confusing on the
details, so I could be wrong.

Oracle has choosen to allow constraint enforcement by locking on the
parent tuple. In contrast postgres has chosen (historically, see RI
triggers) to fail on detecting conflicting newly inserted rows (the
cross-check).
I can't debate which is better, since the situations where the problem
can happen are restricted and transactions are not normally in
serializable isolation, its doesn't seem to be a problem in practice.
Nevertheless continuing as currently is probably the way to go, as the
infrastructure is available and working properly.

A few things remain:
* Language extension? "for referencial" seems not too intrusive.
* SPI_execute_snapshot interoperation, should give a failure.
* Doing parallel tests?

I think about hacking it in... Any opinions?

- Joris Dobbelsteen



pgsql-hackers by date:

Previous
From: Tom Lane
Date:
Subject: Re: Server-side support of all encodings
Next
From: Oleg Bartunov
Date:
Subject: Re: tsearch_core for inclusion