Thread: Question about RI checks

Question about RI checks

From
Nick Barnes
Date:
One of the queries in ri_triggers.c has me a little baffled.

For (relatively) obvious reasons, a FK insert triggers a SELECT 1 FROM pk_rel ... FOR KEY SHARE.
For not-so-obvious reasons, a PK delete triggers a SELECT 1 FROM fk_rel ... FOR KEY SHARE.

I can't see what the lock on fk_rel achieves. Both operations are already
contending for the lock on the PK row, which seems like enough to cover every eventuality.

And even if the lock serves a purpose, KEY SHARE is an odd choice, since the referencing field is, in general,
not a "key" in this sense.

Can anyone provide an explanation / counterexample?
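For reference, the two checks in question can be sketched like this (paraphrased, with illustrative table and column names; the queries actually generated by ri_triggers.c differ in detail):

```sql
-- Illustrative schema (pk_rel / fk_rel as in the post)
CREATE TABLE pk_rel (id int PRIMARY KEY);
CREATE TABLE fk_rel (id int PRIMARY KEY,
                     pk_id int REFERENCES pk_rel (id));

-- On INSERT into fk_rel, the RI trigger runs, roughly:
--   SELECT 1 FROM pk_rel WHERE id = $1 FOR KEY SHARE;
-- On DELETE from pk_rel (RESTRICT / NO ACTION), roughly:
--   SELECT 1 FROM fk_rel WHERE pk_id = $1 FOR KEY SHARE;
```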

Re: Question about RI checks

From
Florian Pflug
Date:
(CCing Alvaro, since he implemented KEY SHARE locks)

On Oct16, 2014, at 15:51 , Nick Barnes <nickbarnes01@gmail.com> wrote:
> One of the queries in ri_triggers.c has me a little baffled.
>
> For (relatively) obvious reasons, a FK insert triggers a SELECT 1 FROM pk_rel ... FOR KEY SHARE.
> For not-so-obvious reasons, a PK delete triggers a SELECT 1 FROM fk_rel ... FOR KEY SHARE.
>
> I can't see what the lock on fk_rel achieves. Both operations are already
> contending for the lock on the PK row, which seems like enough to cover
> every eventuality.

I remember wondering about this too, quite a while ago. That was before we
had KEY locks, i.e. it simply read "FOR SHARE" back then.

I don't think I ever figured out if and why that lock is necessary - so here's
an attempt at unraveling this.

The lock certainly isn't required to ensure that we see all the child rows in
the fk_rel -- since we're doing this in an AFTER trigger, we're already holding
the equivalent of an UPDATE lock on the parent row, so no new fk_rel rows can
appear concurrently. Note that "seeing" here refers to an up-to-date snapshot
-- in REPEATABLE READ mode and above, our transaction's snapshot doesn't
necessarily see those rows, but ri_PerformCheck ensures that we error out if our
transaction's snapshot prevents us from seeing any newly added child rows
(c.f. detectNewRows).

However, deleting from fk_rel isn't restricted by any of the constraint triggers,
so child rows may still be deleted concurrently. So without the lock, it might
happen that we report a constraint violation, yet the offending row is already
gone by the time we report the error. Note that ri_PerformCheck's detectNewRows
flag doesn't enter here -- AFAIK, it only raises an error if the transaction's
snapshot sees *fewer* rows than a current snapshot.

So AFAICS, the current behaviour (sans the KEY SHARE / SHARE confusion, see
below) is this:

In READ COMMITTED mode, we'll ignore rows that are deleted or reparented
concurrently (after waiting for the concurrent transaction to commit, of course)

In REPEATABLE READ mode and above, we'll report a serialization error if a child
row is deleted or reparented concurrently (if the concurrent transaction committed,
of course)

Without the lock, we'd report a constraint violation in both cases.
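A minimal two-session sketch of the READ COMMITTED case (my example, reusing the illustrative pk_rel/fk_rel names; assumes one parent row with id 1 and one child referencing it):

```sql
-- Session A                         -- Session B
                                     BEGIN;
                                     DELETE FROM fk_rel WHERE pk_id = 1;
BEGIN;
DELETE FROM pk_rel WHERE id = 1;
-- The RI trigger's SELECT ... FOR KEY SHARE on fk_rel blocks on the
-- row B is deleting instead of reporting a violation outright.
                                     COMMIT;
-- After B commits, A's check re-evaluates under a fresh snapshot, finds
-- no surviving child, and the DELETE succeeds. Without the lock, A would
-- have reported a violation for a row that is already gone.
COMMIT;
```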

So in conclusion, the lock avoids raising constraint violation errors in
a few cases in READ COMMITTED mode. In REPEATABLE READ mode, it converts some
constraint violation errors into serialization failures. Or at least that's
how it looks to me.

> And even if the lock serves a purpose, KEY SHARE is an odd choice, since
> the referencing field is, in general, not a "key" in this sense.

Hm, yeah, that's certainly weird. AFAICS, what this will do is prevent any
concurrent update of any columns mentioned in any unique index on the *fk_rel*
- but as you say, that set doesn't necessarily match the set of columns in the
FK constraint at all.

So currently, it seems that the lock only prevents concurrent DELETEs, but
not necessarily concurrent UPDATEs, even if such an UPDATE changes the parent
that a child row refers to.

Independent of whether the lock is actually desirable or not, that
inconsistency certainly looks like a bug to me...
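If I read the row-lock conflict rules right, the hole can be sketched like this (my example; it assumes pk_id is not part of any unique index on fk_rel, so a plain UPDATE of it takes only FOR NO KEY UPDATE, which does not conflict with FOR KEY SHARE):

```sql
-- Session B                          -- Session A
BEGIN;
-- Reparent the child away from
-- parent 1; pk_id is not a "key"
-- column in the KEY SHARE sense.
UPDATE fk_rel SET pk_id = 2
 WHERE id = 1;
                                      BEGIN;
                                      DELETE FROM pk_rel WHERE id = 1;
                                      -- The RI check's FOR KEY SHARE does not
                                      -- conflict with B's non-key update, so A
                                      -- doesn't wait for B; it sees the old
                                      -- child version and raises a spurious
                                      -- foreign-key violation.
COMMIT;
```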

best regards,
Florian Pflug




Re: Question about RI checks

From
Kevin Grittner
Date:
Florian Pflug <fgp@phlo.org> wrote:

> So in conclusion, the lock avoids raising constraint violation errors in
> a few cases in READ COMMITTED mode. In REPEATABLE READ mode, it converts some
> constraint violation errors into serialization failures. Or at least that's
> how it looks to me.

It doesn't seem like this analysis considers all of the available ON
DELETE and ON UPDATE behaviors.  Besides RESTRICT there is
CASCADE, SET NULL, SET DEFAULT, and NO ACTION.  Some of those
require updating the referencing rows.

>> And even if the lock serves a purpose, KEY SHARE is an odd choice, since
>> the referencing field is, in general, not a "key" in this sense.
>
> Hm, yeah, that's certainly weird.

I don't think I understand that either.

--
Kevin Grittner
EDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: Question about RI checks

From
Nick Barnes
Date:
Thanks! I've been mulling this over for weeks; nice to know it wasn't just staring me in the face...

> So in conclusion, the lock avoids raising constraint violation errors in
> a few cases in READ COMMITTED mode. In REPEATABLE READ mode, it converts some
> constraint violation errors into serialization failures. Or at least that's
> how it looks to me.

Yeah, it had occurred to me that this is one place you might see some benefit. But waiting around on a potentially irrelevant update, just in case the RI violation resolves itself, didn't really sound like a net win. Not to mention the possibility of a deadlock, if the other transaction updates our PK or adds another reference to it.
 
Thanks again,
Nick Barnes

Re: Question about RI checks

From
Nick Barnes
Date:
On Wed, Oct 22, 2014 at 3:19 AM, Kevin Grittner <kgrittn@ymail.com> wrote:

> It doesn't seem like this analysis considers all of the available ON
> DELETE and ON UPDATE behaviors.  Besides RESTRICT there is
> CASCADE, SET NULL, SET DEFAULT, and NO ACTION.  Some of those
> require updating the referencing rows.
 
I think the logic in question is specific to RESTRICT and NO ACTION. The other cases don't look like they need to explicitly lock anything; the UPDATE / DELETE itself should take care of that.


Re: Question about RI checks

From
Florian Pflug
Date:
Florian Pflug <fgp@phlo.org> wrote:
> So in conclusion, the lock avoids raising constraint violation errors in
> a few cases in READ COMMITTED mode. In REPEATABLE READ mode, it converts some
> constraint violation errors into serialization failures. Or at least that's
> how it looks to me.

I got the REPEATABLE READ case wrong there, and that case is actually broken!
I initially assumed that trying to lock a row whose latest version is invisible
to a transaction in mode REPEATABLE READ or above will result in a serialization
error. But for the queries run by ri_triggers.c, this is not the case.

AFAICS, the only executor node which takes the crosscheck snapshot into account
is nodeModifyTable. So both a plain SELECT and a SELECT FOR [KEY] SHARE | UPDATE
will simply run with the snapshot passed to the SPI, which for the query in
question is always a *current* snapshot (we pass the original snapshot as the
crosscheck snapshot in REPEATABLE READ). Thus, if there's a child row that
is visible to the transaction deleting the parent row, but that child was meanwhile
removed, it seems that we currently *won't* complain. The same holds, of course,
for mode SERIALIZABLE -- we don't distinguish the two in ri_triggers.c.

But that's wrong. The transaction's snapshot *would* see that row, so we
ought to raise an error. Note that this applies also to mode SERIALIZABLE, and
breaks true serializability in some cases, since we don't do conflict detection
for RI enforcement queries.

Here's a test case, involving two transaction A and B. I tried this on
REL9_4_STABLE.

Setup
-----
CREATE TABLE parent (id int NOT NULL PRIMARY KEY);
CREATE TABLE child (id int NOT NULL PRIMARY KEY,
                    parent_id int NOT NULL REFERENCES parent (id));
INSERT INTO parent (id) VALUES (1);
INSERT INTO child (id, parent_id) VALUES (1, 1);

Failure Case
------------
A:: set default_transaction_isolation='serializable';
A:: begin;
A:: select * from child;
->  id | parent_id
    ----+-----------
      1 |        1
B:: set default_transaction_isolation='serializable';
B:: begin;
B:: delete from child;
-> DELETE 1
B:: commit;
A:: delete from parent;
-> DELETE 1
A:: commit;

A can neither come before B in any serializable history (since the DELETE
should fail then), nor after it (since it shouldn't see the row in the child
table then). Yet we don't complain, even though both transaction run in
SERIALIZABLE mode.

On Oct21, 2014, at 19:27 , Nick Barnes <nickbarnes01@gmail.com> wrote:
>> On Wed, Oct 22, 2014 at 3:19 AM, Kevin Grittner <kgrittn@ymail.com> wrote:
>>
>> It doesn't seem like this analysis considers all of the available ON
>> DELETE and ON UPDATE behaviors available.  Besides RESTRICT there is
>> CASCADE, SET NULL, SET DEFAULT, and NO ACTION.  Some of those
>> require updating the referencing rows.
>
> I think the logic in question is specific to RESTRICT and NO ACTION.
> The other cases don't look like they need to explicitly lock anything;
> the UPDATE / DELETE itself should take care of that.

Exactly. We run the dubious SELECT FROM <fkrel> ... FOR KEY SHARE query *only*
for RESTRICT and NO ACTION.

I've traced through the revisions of ri_triggers.c a bit. Locking the
conflicting child rows in the parent table's DELETE and UPDATE triggers was
added in commit 0882951b0cdb4c6686e57121d56216cb2044f7eb which is dated
1999-12-08. (This was before we even had row-level SHARE locks, so the code
did a SELECT .. FOR UPDATE back then). This is also the commit which added
FOR UPDATE locking to RI_FKey_check, i.e. which ensured that parent rows are
locked when adding new child rows. Its commit message unfortunately doesn't
offer much in terms of explanations - it just says "Fixed concurrent visibility
bug."

When SHARE locks were implemented in commit bedb78d386a47fd66b6cda2040e0a5fb545ee371,
dating from 2005-04-28, it seems that "FOR UPDATE" was simply replaced by
"FOR SHARE" throughout ri_triggers.c. The same seems to have happened when
KEY SHARE locks were introduced on 2013-01-13 with commit
0ac5ad5134f2769ccbaefec73844f8504c4d6182.

best regards,
Florian Pflug




Re: Question about RI checks

From
Kevin Grittner
Date:
Florian Pflug <fgp@phlo.org> wrote:
> Florian Pflug <fgp@phlo.org> wrote:

> But that's wrong. The transaction's snapshot *would* see that row, so we
> ought to raise an error. Note that this applies also to mode SERIALIZABLE, and
> breaks true serializability in some cases, since we don't do conflict detection
> for RI enforcement queries.
>
> Here's a test case, involving two transaction A and B. I tried this on
> REL9_4_STABLE.
>
> Setup
> -----
> CREATE TABLE parent (id int NOT NULL PRIMARY KEY);
> CREATE TABLE child (id int NOT NULL PRIMARY KEY,
>                     parent_id int NOT NULL REFERENCES parent (id));
> INSERT INTO parent (id) VALUES (1);
> INSERT INTO child (id, parent_id) VALUES (1, 1);
>
> Failure Case
> ------------
> A:: set default_transaction_isolation='serializable';
> A:: begin;
> A:: select * from child;
> ->  id | parent_id
>     ----+-----------
>       1 |        1
> B:: set default_transaction_isolation='serializable';
> B:: begin;
> B:: delete from child;
> -> DELETE 1
> B:: commit;
> A:: delete from parent;
> -> DELETE 1
> A:: commit;
>
> A can neither come before B in any serializable history (since the DELETE
> should fail then), nor after it (since it shouldn't see the row in the child
> table then). Yet we don't complain, even though both transaction run in
> SERIALIZABLE mode.

Simplifying the display of this a bit:
Tables parent and child each have one row.

Transaction A               Transaction B
=============               =============
select * from child;
[it sees the child row]
                            delete from child;
                            [child row is deleted]
delete from parent;
[parent row deleted]

TA sees the row that TB deletes, creating a rw-dependency that
implies that TA ran first.  On the other hand, the FK trigger
fired by the delete from parent *doesn't* see the row deleted by
TB, which implies that TB ran first.  I agree, this is a problem
for serializable transactions.

This should not be considered a problem for repeatable read
transactions because the change in visible rows meets the definition
of phantom reads, which are allowed in repeatable read: "A
transaction re-executes a query returning a set of rows that
satisfy a search condition and finds that the set of rows
satisfying the condition has changed due to another
recently-committed transaction."  Phantom reads are not *required*
to occur in repeatable read transactions, and in PostgreSQL they
generally don't, so we *might* want to change this behavior; I'm
just saying that we are conforming to requirements of the standard
even if we don't.  Leaving this alone for repeatable read
transactions would require a change to our documentation, though,
since we currently assert that we don't allow phantom reads in our
repeatable read implementation.

Either SSI needs to include the RI checking queries in its tracking
*or* TA needs to throw an error if there are child rows visible
according to its transaction snapshot -- at least in serializable
transactions.

--
Kevin Grittner
EDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: Question about RI checks

From
Florian Pflug
Date:
> This should not be considered a problem for repeatable read
> transactions because the change in visible rows meets the definition
> of phantom reads, which are allowed in repeatable read: "A
> transaction re-executes a query returning a set of rows that
> satisfy a search condition and finds that the set of rows
> satisfying the condition has changed due to another
> recently-committed transaction."

Now I'm confused. Isn't the whole point of REPEATABLE READ to provide, well, repeatable reads? Also, note that after
the DELETE FROM parent, further SELECTs in the same transaction will use the original snapshot again, and thus will see
the conflicting child rows again that were ignored by the RI trigger. But they won't, of course, see the parent row.

IOW, transaction A will, after the delete, see a state of the database in which the PK constraint is broken. I don't
think that's acceptable in any isolation level.

Best regards,
Florian Pflug


Re: Question about RI checks

From
Kevin Grittner
Date:
Florian Pflug <fgp@phlo.org> wrote:

>> This should not be considered a problem for repeatable read
>> transactions because the change in visible rows meets the
>> definition of phantom reads, which are allowed in repeatable
>> read: "A transaction re-executes a query returning a set of rows
>> that satisfy a search condition and finds that the set of rows
>> satisfying the condition has changed due to another
>> recently-committed transaction."
>
> Now I'm confused. Isn't the whole point of REPEATABLE READ to
> provide, well, repeatable reads?

What the standard requires for REPEATABLE READ is that if you
re-read the same row later in a transaction it will contain the
same values -- the read of any particular *row* is repeatable (if
it exists at both times); it does not guarantee that the same
selection criteria will return the same set of rows every time.
The standard only requires that of SERIALIZABLE transactions.

PostgreSQL has historically provided more rigorous protections at
REPEATABLE READ than the standard requires.  Our docs claim:

| When you select the level Read Uncommitted you really get Read
| Committed, and phantom reads are not possible in the PostgreSQL
| implementation of Repeatable Read, so the actual isolation level
| might be stricter than what you select. This is permitted by the
| SQL standard: the four isolation levels only define which
| phenomena must not happen, they do not define which phenomena
| must happen.

As you have shown, our FK/RI implementation exposes phantom reads
to clients, so at a minimum our docs are wrong.

> Also, note that after the DELETE FROM parent, further SELECTS in
> the same transaction will use the original snapshot again, and
> thus will see the conflicting child rows again that were ignored
> by the RI trigger. But they won't, of course, see the parent row.
>
> IOW, transaction A will, after the delete, see a state of the
> database in which the PK constraint is broken. I don't think
> that's acceptable in any isolation level.

Good point.  Based on that observation, I agree that our RI is
broken at both the REPEATABLE READ and SERIALIZABLE isolation
levels.  I think that READ COMMITTED is OK, because it will see the
child row as deleted in time to prevent problems.

--
Kevin Grittner
EDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: Question about RI checks

From
Kevin Grittner
Date:
Kevin Grittner <kgrittn@ymail.com> wrote:
> Florian Pflug <fgp@phlo.org> wrote:

>> Also, note that after the DELETE FROM parent, further SELECTS in
>> the same transaction will use the original snapshot again, and
>> thus will see the conflicting child rows again that were ignored
>> by the RI trigger. But they won't, of course, see the parent
>> row.
>>
>> IOW, transaction A will, after the delete, see a state of the
>> database in which the PK constraint is broken. I don't think
>> that's acceptable in any isolation level.
>
> Good point.  Based on that observation, I agree that our RI is
> broken at both the REPEATABLE READ and SERIALIZABLE isolation
> levels.  I think that READ COMMITTED is OK, because it will see
> the child row as deleted in time to prevent problems.

Every way I look at it, inside a REPEATABLE READ or SERIALIZABLE
transaction a check for child rows when validating a parent DELETE
should consider both rows which exist according to the transaction
snapshot and according to a "current" snapshot.  Interestingly, the
run of the query passes both snapshots through to the executor, but
for this query the estate->es_crosscheck_snapshot field (which
contains the transaction snapshot) doesn't seem to be consulted.
It makes me wonder whether we were at some point doing this right
and it later got broken.

Before I write a patch to fix this, does anyone feel that we should
not use that -- in other words, does anyone consider that it is OK
for a REPEATABLE READ or SERIALIZABLE transaction to delete a
referenced row if that transaction can see referencing rows but a
concurrent transaction has deleted them?  (This currently allows
subsequent queries in the transaction to see orphaned "child" rows
when they can no longer see the parent.)

--
Kevin Grittner
EDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: Question about RI checks

From
Florian Pflug
Date:
On Oct23, 2014, at 17:45 , Kevin Grittner <kgrittn@ymail.com> wrote:
> Every way I look at it, inside a REPEATABLE READ or SERIALIZABLE
> transaction a check for child rows when validating a parent DELETE
> should consider both rows which exist according to the transaction
> snapshot and according to a "current" snapshot.  Interestingly, the
> run of the query passes both snapshots through to the executor, but
> for this query the estate->es_crosscheck_snapshot field (which
> contains the transaction snapshot) doesn't seem to be consulted.
> It makes me wonder whether we were at some point doing this right
> and it later got broken.

I've been pondering a completely different way to fix this. Many years
ago I tried to get rid of the crosscheck snapshot completely by changing
the way locking conflicts are treated for REPEATABLE READ transactions
and above.

The basic idea is that taking a share lock on a row implies that you're
going to apply further changes whose correctness depends on existence
of the row you lock. That, in particular, applies to the locks taken
by RI triggers -- we lock the parent row before we add children, because
the children's existence necessitates the existence of the parent. If
you take an exclusive lock, OTOH, that implies a modification of the row
itself (we never explicitly take that lock during an UPDATE or DELETE,
but we do so implicitly, because UPDATEs and DELETEs conflict with SHARE
locks). So after obtaining such a lock, it's the lock holder's responsibility
to check that the desired update doesn't break anything, i.e. in the case
of RI that it doesn't create any orphaned children.

The only reason we need the crosscheck snapshot to do that is because
children may have been added (and the change committed) *after* the
transaction which removed the parent has taken its snapshot, but *before*
that transaction locks the parent row.

My proposal is to instead extend the locking protocol to prevent that.
Essentially, we have to raise a serialization error whenever
 1) We attempt to exclusively lock a row (this includes updating or
    deleting it), and
 2) somebody else did hold a share lock on that row recently, and
 3) That somebody is invisible to us according to our snapshot.
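A schedule this extended protocol would reject (my sketch, again with the illustrative pk_rel/fk_rel names):

```sql
-- Session B (adds a child)          -- Session A (deletes the parent)
                                     BEGIN ISOLATION LEVEL REPEATABLE READ;
                                     SELECT 1;  -- fixes A's snapshot
BEGIN;
INSERT INTO fk_rel VALUES (2, 1);
-- the RI trigger share-locks
-- parent row 1
COMMIT;  -- B is now invisible to A's snapshot
                                     DELETE FROM pk_rel WHERE id = 1;
                                     -- implicit exclusive lock on row 1:
                                     -- (1) holds, (2) B recently held a
                                     -- share lock, (3) B is invisible to
                                     -- A's snapshot => serialization error
```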

My initial attempt to do that failed, because we used to have very little
means of storing the locking history - the only source of information was
the xmax field, so any update of a tuple removed information about previous
lock holders - even if that update was later aborted. I pondered using
multi-xids for this, but at the time I deemed that too risky - back at the
time, they had a few wraparound issues and the like which were OK for share
locks, but not for what I intended.

But now that we have KEY SHARE locks, the situation changes. We now rely on
multi-xids to a much greater extent, and AFAIK multi-xid wraparound is now
correctly dealt with. We also already ensure that the information contained
in multi-xids are preserved across tuple upgrades (otherwise, updating a row
on which someone holds a KEY SHARE lock would be broken).

So all that is missing, I think, is
 1) To make sure we only remove a multi-xid if none of the xids are invisible
    to any snapshot (i.e. lie before GlobalXmin or something like that).
 2) When we acquire a lock (either explicitly or implicitly by doing an
    UPDATE or DELETE) check if all previous committed lock holders are
    visible according to our snapshot, and raise a serialization error
    if not.

The big advantage of doing that over fixing the crosscheck logic would be
that it'd make it possible to write concurrency-safe FK triggers in any
procedural language.

best regards,
Florian Pflug




Re: Question about RI checks

From
Florian Pflug
Date:
On Oct23, 2014, at 17:45 , Kevin Grittner <kgrittn@ymail.com> wrote:
> Every way I look at it, inside a REPEATABLE READ or SERIALIZABLE
> transaction a check for child rows when validating a parent DELETE
> should consider both rows which exist according to the transaction
> snapshot and according to a "current" snapshot.

I've pondered this further, and unfortunately it seems that this
isn't sufficient to guarantee true serializability :-(

Verifying that both snapshots contain exactly the same rows does not
prevent a child row from being inserted and immediately deleted again,
not if both actions happen *after* the parent-updating transaction
took its snapshot, but *before* it takes the crosscheck snapshot.

Let parent(id) and child(id, parent_id) again be two tables with a
FK constraint between them, let <child> be initially empty, and let
<parent> contain a single row (1).

Assume PD is a transaction which deletes all rows from <parent>,
CI a transaction which inserts the row (1, 1) into <child>, and
CD a transaction which deletes that row again.

Even with the extended cross-checking you propose, we'd still allow
the following concurrent schedule

 1. PD takes snapshot
 2. CI starts and completes
 3. CD starts and completes
 4. PD deletes from <parent> without complaining, since there were
    no conflicting rows at time (1), and none at time (4).

So far, all is well. But add two more tables, called <ci_before_pd>
and <pd_before_cd>, both initially containing one row. Let CI scan
<ci_before_pd>, let PD delete from <ci_before_pd> and scan <pd_before_cd>,
and let CD delete from <pd_before_cd>. In the concurrent schedule from
above, CI will see the row in <ci_before_pd>, and PD will delete it, and
PD will see the row in <pd_before_cd> that CD deletes. Note that SSI *will*
allow that schedule to occur without raising a serialization error.
The only serial schedule which yields the same results for the various
queries pertaining <ci_before_pd> and <pd_before_cd> is

  CI -> PD -> CD,

i.e. PD has to run *between* CI and CD. But in that serial schedule,
PD *should* see a FK key violation, since CI has inserted a child which
CD hasn't yet deleted.

There is thus *no* serial schedule which yields the same results as the
concurrent schedule above for the queries pertaining <parent> and <child>,
*and* for the queries pertaining <ci_before_pd> and <pd_before_cd>, i.e.
the concurrent schedule is *not* serializable. Yet even the extended
cross-check won't detect this.

Attached is an isolationtester spec file which implements this example,
and the corresponding out-file which shows that SSI permits the concurrent
schedule. Since SSI doesn't concern itself with RI enforcement queries,
it would also permit that schedule if we extended the cross-check, I think.

(I used REL9_4_STABLE as of today to try this, commit
 1cf54b00ba2100083390223a8244430643c1ec07)

best regards,
Florian Pflug

Attachment

Re: Question about RI checks

From
Robert Haas
Date:
On Thu, Oct 23, 2014 at 2:41 PM, Florian Pflug <fgp@phlo.org> wrote:
> The only reason we need the crosscheck snapshot to do that is because
> children may have been added (and the change committed) *after* the
> transaction which removed the parent has taken its snapshot, but *before*
> that transaction locks the parent row.
>
> My proposal is to instead extend the locking protocol to prevent that.
> Essentially, we have to raise a serialization error whenever
>
>   1) We attempt to exclusively lock a row (this includes updating or deleting
>      it), and
>
>   2) somebody else did hold a share lock on that row recently, and
>
>   3) That somebody is invisible to us according to our snapshot.
>
> My initial attempt to do that failed, because we used to have very little
> means of storing the locking history - the only source of information was
> the xmax field, so any update of a tuple removed information about previous
> lock holders - even if that update was later aborted. I pondered using
> multi-xids for this, but at the time I deemed that too risky - back at the
> time, they had a few wraparound issues and the like which were OK for share
> locks, but not for what I intended.
>
> But now that we have KEY SHARE locks, the situation changes. We now rely on
> multi-xids to a much greater extent, and AFAIK multi-xid wraparound is now
> correctly dealt with. We also already ensure that the information contained
> in multi-xids are preserved across tuple upgrades (otherwise, updating a row
> on which someone holds a KEY SHARE lock would be broken).
>
> So all that is missing, I think, is
>
>   1) To make sure we only remove a multi-xid if none of the xids are invisible
>      to any snapshot (i.e. lie before GlobalXmin or something like that).
>
>   2) When we acquire a lock (either explicitly or implicitly by doing an
>      UPDATE or DELETE) check if all previous committed lock holders are
>      visible according to our snapshot, and raise a serialization error
>      if not.

I don't think you can count on being able to figure out all of the
recent lockers by looking at xmax; it can get overwritten.  For
example, suppose transaction A locks the row and then commits.  Then
transaction B comes along and again locks the row and commits.  My
understanding is that B won't create a multi-xact since there's just
one locker; it'll just stomp on the previous xmax.

Now, you could change that, but I suspect it would have pretty drastic
implications on systems that have both foreign keys and long-running
transactions.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: Question about RI checks

From
Florian Pflug
Date:
On Oct24, 2014, at 18:42 , Robert Haas <robertmhaas@gmail.com> wrote:
> I don't think you can count on being able to figure out all of the
> recent lockers by looking at xmax; it can get overwritten.  For
> example, suppose transaction A locks the row and then commits.  Then
> transaction B comes along and again locks the row and commits.  My
> understanding is that B won't create a multi-xact since there's just
> one locker; it'll just stomp on the previous xmax.
> 
> Now, you could change that, but I suspect it would have pretty drastic
> implications on systems that have both foreign keys and long-running
> transactions.

Yes, we'd have to create multi-xids in some cases where we don't do that
today. How big the effect would be, I honestly don't know. I guess it
depends on how costly multi-xid creation is, compared to updating the
parent row (to change its xmax) and updating or deleting child rows.
My hope would be that it's cheap compared to that, but maybe that's not
true, especially since multi-xids are xlog'ed nowadays.

The only other option I see would be to teach the executor to check
whether *any* snapshot between the transaction's snapshot and a current
snapshot would see a different set of rows. Simply checking whether both
the current snapshot and the transaction's snapshot see the same set isn't
sufficient, per the counter-example in my other mail :-(

An advantage of the latter is that I'd but the burden on the session
that attempts to remove a parent row, instead of on the sessions which
add or remove children.

best regards,
Florian Pflug






Re: Question about RI checks

From
Robert Haas
Date:
On Fri, Oct 24, 2014 at 1:28 PM, Florian Pflug <fgp@phlo.org> wrote:
> The only other option I see would be to teach the executor to check
> whether *any* snapshot between the transaction's snapshot and a current
> snapshot would see a different set of rows. Simply checking whether both
> the current snapshot and the transaction's snapshot see the same set isn't
> sufficient, per the counter-example in my other mail :-(

What about doing one scan using SnapshotAny and then testing each
returned row for visibility under both relevant snapshots?  See
whether there is any tuple for which they disagree.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: Question about RI checks

From
Florian Pflug
Date:
On Oct24, 2014, at 19:32 , Robert Haas <robertmhaas@gmail.com> wrote:
> On Fri, Oct 24, 2014 at 1:28 PM, Florian Pflug <fgp@phlo.org> wrote:
>> The only other option I see would be to teach the executor to check
>> whether *any* snapshot between the transaction's snapshot and a current
>> snapshot would see a different set of rows. Simply checking whether both
>> the current snapshot and the transaction's snapshot see the same set isn't
>> sufficient, per the counter-example in my other mail :-(
> 
> What about doing one scan using SnapshotAny and then testing each
> returned row for visibility under both relevant snapshots?  See
> whether there is any tuple for which they disagree.

See my other mail - testing whether the snapshots agree isn't enough,
you'd have to check whether there could have been *any* snapshot taken
between the two which would see a different result. Or at least that's
what I currently think - I don't yet have an actual proof at hand that doing
this always preserves serializability. But it seems plausible that it
does, because it ensures that there is a rather large time frame during
which the enforcement query can be placed in a serial schedule without
changing its result.
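
To illustrate why agreement of the two endpoint snapshots isn't enough
(a hypothetical sketch using a toy visibility model, not actual
PostgreSQL snapshot code): a row inserted by one transaction and deleted
by another, both committing between the two snapshots, is invisible to
both endpoints, yet a snapshot taken in between would have seen it.

```python
from typing import NamedTuple, Optional, FrozenSet

class HeapTuple(NamedTuple):
    xmin: int            # inserting xid
    xmax: Optional[int]  # deleting xid, or None

def visible(t: HeapTuple, committed: FrozenSet[int]) -> bool:
    # Simplified visibility: inserter committed, deleter (if any) not.
    return t.xmin in committed and (t.xmax is None or t.xmax not in committed)

# Row inserted by xid 7 and deleted by xid 8; both commit in the window
# between the transaction snapshot and the current snapshot.
row = HeapTuple(xmin=7, xmax=8)

txn_snap  = frozenset()        # neither 7 nor 8 committed yet
curr_snap = frozenset({7, 8})  # both committed by now
mid_snap  = frozenset({7})     # taken after 7 committed, before 8 did

# The endpoint snapshots agree (both see nothing) ...
assert visible(row, txn_snap) == visible(row, curr_snap) == False
# ... but the intermediate snapshot saw the row.
assert visible(row, mid_snap)
```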

This applies only to SERIALIZABLE transactions, though. For REPEATABLE
READ, it might be that checking whether the two snapshots agree is
sufficient.

So a third way forward would be to not skip the SSI logic for RI enforcement
queries. Though I fear that, due to the imprecise nature of SIREAD locks,
doing so would drive the false positive rate way up for workloads involving 
lots of parent and child row modifications.

best regards,
Florian Pflug




Re: Question about RI checks

From
Robert Haas
Date:
On Fri, Oct 24, 2014 at 2:12 PM, Florian Pflug <fgp@phlo.org> wrote:
>> What about doing one scan using SnapshotAny and then testing each
>> returned row for visibility under both relevant snapshots?  See
>> whether there is any tuple for which they disagree.
>
> See my other mail - testing whether the snapshots agree isn't enough,
> you'd have to check whether there could have been *any* snapshot taken
> between the two which would see a different result.

Oh, hmm.  I had thought what I was proposing was strong enough to
handle that case, but now I see that it isn't.  However, I'm not
entirely sure that it's the RI code's job to prevent such cases, or at
least not when the transaction isolation level is less than
serializable.  Is there an argument that the anomaly that results is
unacceptable at REPEATABLE READ?

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: Question about RI checks

From
Florian Pflug
Date:
On Oct24, 2014, at 20:24 , Robert Haas <robertmhaas@gmail.com> wrote:
> On Fri, Oct 24, 2014 at 2:12 PM, Florian Pflug <fgp@phlo.org> wrote:
>>> What about doing one scan using SnapshotAny and then testing each
>>> returned row for visibility under both relevant snapshots?  See
>>> whether there is any tuple for which they disagree.
>> 
>> See my other mail - testing whether the snapshots agree isn't enough,
>> you'd have to check whether there could have been *any* snapshot taken
>> between the two which would see a different result.
> 
> Oh, hmm.  I had thought what I was proposing was strong enough to
> handle that case, but now I see that it isn't.  However, I'm not
> entirely sure that it's the RI code's job to prevent such cases, or at
> least not when the transaction isolation level is less than
> serializable.

Well, the responsibility was laid onto the RI code by the decision to
not do SIREAD locking for RI enforcement queries. Simply enabling SIREAD
locking without doing anything else is not a good solution, I think -
it will drive up the false-positive rate. And it would make workloads
consisting purely of serializable transactions pay the price for the
row-level locking done by the RI triggers, without getting anything in
return. 

> Is there an argument that the anomaly that results is
> unacceptable at REPEATABLE READ?

I'm not aware of any case that is clearly unacceptable for REPEATABLE
READ, but neither am I convinced that no such case exists. REPEATABLE READ
mode is hard because there doesn't seem to be any formal definition of
what we do and do not guarantee. The guarantees provided by the SQL
standard seem to be so much weaker than what we offer that they're not
really helpful :-(

I believe the best way forward is to first find a solution for SERIALIZABLE
transactions, and then check if it can be applied to REPEATABLE READ
mode too. For SERIALIZABLE mode, it's at least clear what we're aiming
for -- offering true serializability.

best regards,
Florian Pflug






Re: Question about RI checks

From
Robert Haas
Date:
On Fri, Oct 24, 2014 at 2:58 PM, Florian Pflug <fgp@phlo.org> wrote:
> I believe the best way forward is to first find a solution for SERIALIZABLE
> transactions, and then check if it can be applied to REPEATABLE READ
> mode too. For SERIALIZABLE mode, it's at least clear what we're aiming
> for -- offering true serializability.

That sounds like a reasonable plan.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: Question about RI checks

From
Kevin Grittner
Date:
Florian Pflug <fgp@phlo.org> wrote:
> On Oct24, 2014, at 20:24 , Robert Haas <robertmhaas@gmail.com> wrote:
>> On Fri, Oct 24, 2014 at 2:12 PM, Florian Pflug <fgp@phlo.org> wrote:
>>>> What about doing one scan using SnapshotAny and then testing each
>>>> returned row for visibility under both relevant snapshots?  See
>>>> whether there is any tuple for which they disagree.
>>>
>>> See my other mail - testing whether the snapshots agree isn't enough,
>>> you'd have to check whether there could have been *any* snapshot taken
>>> between the two which would see a different result.
>>
>> Oh, hmm.  I had thought what I was proposing was strong enough to
>> handle that case, but now I see that it isn't.  However, I'm not
>> entirely sure that it's the RI code's job to prevent such cases, or at
>> least not when the transaction isolation level is less than
>> serializable.
>
> Well, the responsibility was laid onto the RI code by the decision to
> not do SIREAD locking for RI enforcement queries. Simply enabling SIREAD
> locking without doing anything else is not a good solution, I think -
> it will drive up the false-positive rate. And it would make workloads
> consisting purely of serializable transactions pay the price for the
> row-level locking done by the RI triggers, without getting anything in
> return.

Yeah, it would be far better to fix RI behavior if we can;
otherwise those using serializable transactions to protect data
integrity essentially pay the price for concurrency control around
foreign keys twice -- once using blocking locks and again using
non-blocking SIRead locks just to cover some odd corner cases.

I need to spend some more time looking at it, and I have another
couple things in front of this on my personal TODO list, but I
think that if we had a row lock which was stronger than current
SELECT FOR UPDATE behavior, and the delete of a parent row (or
update of its referenced key) read the children using it, this
would be solved.  Essentially I'm thinking of something just like
FOR UPDATE except that a transaction which attempted a concurrent
UPDATE or DELETE of the row (before the locking transaction ended)
would error out with some form of serialization failure.  I believe
this would be consistent with the behavior of SELECT FOR UPDATE in
Oracle, so it would allow a path for those migrating from Oracle to
maintain their current logic (with a slight change to the FOR
strength clause).
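
A toy model of the proposed lock semantics (hypothetical Python, purely
illustrative; real lock state would of course live in the tuple header
and lock manager):

```python
class SerializationFailure(Exception):
    pass

class Row:
    def __init__(self) -> None:
        self.strong_locker = None  # txn id holding the proposed strong lock

    def lock_strong(self, txn: int) -> None:
        # Taken by the parent delete/update when it reads the child rows,
        # analogous to FOR UPDATE but with stricter conflict behavior.
        self.strong_locker = txn

    def unlock(self) -> None:
        # The locking transaction ended.
        self.strong_locker = None

    def write(self, txn: int) -> None:
        # Under plain FOR UPDATE a concurrent writer would block here;
        # under the proposed lock it errors out with a serialization
        # failure instead.
        if self.strong_locker is not None and self.strong_locker != txn:
            raise SerializationFailure(
                "row locked by concurrent transaction %d" % self.strong_locker)
```

So while transaction 1 holds the lock, a write by transaction 2 raises
SerializationFailure instead of waiting; once the lock is released, the
write proceeds.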

>> Is there an argument that the anomaly that results is
>> unacceptable at REPEATABLE READ?
>
> I'm not aware of any case that is clearly unacceptable for REPEATABLE
> READ, but neither am I convinced that no such case exists. REPEATABLE READ
> mode is hard because there doesn't seem to be any formal definition of
> what we do and do not guarantee. The guarantees provided by the SQL
> standard seem to be so much weaker than what we offer that they're not
> really helpful :-(
>
> I believe the best way forward is to first find a solution for SERIALIZABLE
> transactions, and then check if it can be applied to REPEATABLE READ
> mode too. For SERIALIZABLE mode, it's at least clear what we're aiming
> for -- offering true serializability.

What I outline above would clearly work for both.  Arguably we
could just use FOR UPDATE in REPEATABLE READ and only use the new,
stronger lock for SERIALIZABLE.  In fact, that seems to me to be
the appropriate course.

--
Kevin Grittner
EDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: Question about RI checks

From
Florian Pflug
Date:
On Oct24, 2014, at 22:50 , Kevin Grittner <kgrittn@ymail.com> wrote:
> I need to spend some more time looking at it, and I have another
> couple things in front of this on my personal TODO list, but I
> think that if we had a row lock which was stronger than current
> SELECT FOR UPDATE behavior, and the delete of a parent row (or
> update of its referenced key) read the children using it, this
> would be solved.  Essentially I'm thinking of something just like
> FOR UPDATE except that a transaction which attempted a concurrent
> UPDATE or DELETE of the row (before the locking transaction ended)
> would error out with some form of serialization failure.  I believe
> this would be consistent with the behavior of SELECT FOR UPDATE in
> Oracle, so it would allow a path for those migrating from Oracle to
> maintain their current logic (with a slight change to the FOR
> strength clause).

Hm, I don't believe any form of traditional row-level lock on the
child row can fix this. If you look at the second failure case
(an updated isolation tester spec file which includes comments is
attached), you see that it fails even if you define the FK constraint
as CASCADE instead of RESTRICT. So even a full UPDATE or DELETE
of the child rows doesn't help.

But maybe I misunderstood what you proposed.

best regards,
Florian Pflug


Attachment

Re: Question about RI checks

From
Alvaro Herrera
Date:
Kevin, are you handling this?


-- 
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services