Thread: 9.3 reference constraint regression

9.3 reference constraint regression

From
Daniel Wood
Date:
In 9.3 I can delete the parent of a parent-child relation if the child row is an uncommitted insert and I first update the parent.

USER1:
drop table child;
drop table parent;
create table parent (i int, c char(3));
create unique index parent_idx on parent (i);
insert into parent values (1, 'AAA');
create table child (i int references parent(i));


USER2:
BEGIN;
insert into child values (1);


USER1:
BEGIN;
update parent set c=lower(c);
delete from parent;
COMMIT;

USER2:
COMMIT;

Note that the problem also happens if the update is "set i=i".  I was expecting this update to block as the UPDATE is on a "unique index" "that can be used in a foreign key".  The "i=i" update should get a UPDATE lock and not a "NO KEY UPDATE" lock as I believe the c=... update does.

Re: 9.3 reference constraint regression

From
Alvaro Herrera
Date:
Daniel Wood wrote:
> In 9.3 I can delete the parent of a parent-child relation if the child row
> is an uncommitted insert and I first update the parent.

Interesting test case, thanks.  I traced through it and promptly noticed
that the problem is that HeapTupleSatisfiesUpdate is failing to consider
the case that there are foreign lockers when a tuple was created by our
own transaction.  In the past, a tuple created by our own transaction
can't possibly be seen by other transactions -- so it wouldn't be
possible for them to lock them.  But this is no longer the case, because
an UPDATE can carry forward the locks from the previous version of the
tuple.

The fix for the immediate bug is to add some code to HTSU so that it
checks for locks by other transactions even when the tuple was created
by us.  I haven't looked at the other tqual routines yet, but I imagine
they will need equivalent fixes.

One thought on the whole HTSU issue is that it's a bit strange that it
is returning HeapTupleBeingUpdated when the tuple is actually only
locked; and thus the callers can update the tuple in that case anyway.
While I haven't explored the idea fully, perhaps we should consider
adding a new return code to HTSU, with the meaning "this tuple is not
updated, but it is locked by someone".  Then it's the caller's
responsibility to check the lock for conflicts, and perhaps add more
locks.  When BeingUpdate is returned, then a tuple cannot possibly be
updated.  This may help disentangling the mess in the heapam.c routines
a bit ... or it may not (at least heap_lock_tuple will still need to
consider doing some stuff in the BeingUpdated case, so it's unlikely
that we will see much simplification there).  Another consideration is
that perhaps it's too late for the 9.3 series for a change this large.

I will start working on a patch for this tomorrow.

-- 
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services



Re: 9.3 reference constraint regression

From
Alvaro Herrera
Date:
Alvaro Herrera wrote:

> The fix for the immediate bug is to add some code to HTSU so that it
> checks for locks by other transactions even when the tuple was created
> by us.  I haven't looked at the other tqual routines yet, but I imagine
> they will need equivalent fixes.

This POC patch changes the two places in HeapTupleSatisfiesUpdate that
need to be touched for this to work.  This is probably too simplistic,
in that I make the involved cases return HeapTupleBeingUpdated without
checking that there actually are remote lockers, which is the case of
concern.  I'm not yet sure if this is the final form of the fix, or
instead we should expand the Multi (in the cases where there is a multi)
and verify whether any lockers are transactions other than the current
one.  As is, nothing seems to break, but I think that's probably just
chance and should not be relied upon.

Attached are two isolation specs which illustrate both the exact issue
reported by Dan, and a similar one which involves an aborted
subtransaction having updated the second version of the row.  (This
takes a slightly different code path.)

As far as I can tell, no other routine in tqual.c needs to change other
than HeapTupleSatisfiesUpdate.  The ones that test for visibility
(Dirty, Any, Self) are only concerned with whether the tuple is visible,
and of course that won't be affected by the tuple being locked; and
HeapTupleSatisfiesVacuum is only concerned with the tuple being dead,
which similarly won't.

--
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

Attachment

Re: 9.3 reference constraint regression

From
Alvaro Herrera
Date:
Alvaro Herrera wrote:

> This POC patch changes the two places in HeapTupleSatisfiesUpdate that
> need to be touched for this to work.  This is probably too simplistic,
> in that I make the involved cases return HeapTupleBeingUpdated without
> checking that there actually are remote lockers, which is the case of
> concern.  I'm not yet sure if this is the final form of the fix, or
> instead we should expand the Multi (in the cases where there is a multi)
> and verify whether any lockers are transactions other than the current
> one.  As is, nothing seems to break, but I think that's probably just
> chance and should not be relied upon.

After playing with this, I think the reason this seems to work without
fail is that all callers of HeapTupleSatisfiesUpdate are already
prepared to deal with the case where HeapTupleBeingUpdated is returned
but there is no actual transaction that would block the operation.
So I think the proposed patch is okay, barring a few more comments.

-- 
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services



Re: 9.3 reference constraint regression

From
Andres Freund
Date:
On 2013-12-16 17:43:37 -0300, Alvaro Herrera wrote:
> Alvaro Herrera wrote:
> 
> > This POC patch changes the two places in HeapTupleSatisfiesUpdate that
> > need to be touched for this to work.  This is probably too simplistic,
> > in that I make the involved cases return HeapTupleBeingUpdated without
> > checking that there actually are remote lockers, which is the case of
> > concern.  I'm not yet sure if this is the final form of the fix, or
> > instead we should expand the Multi (in the cases where there is a multi)
> > and verify whether any lockers are transactions other than the current
> > one.  As is, nothing seems to break, but I think that's probably just
> > chance and should not be relied upon.
> 
> After playing with this, I think the reason this seems to work without
> fail is that all callers of HeapTupleSatisfiesUpdate are already
> prepared to deal with the case where HeapTupleBeingUpdated is returned
> but there is no actual transaction that would block the operation.
> So I think the proposed patch is okay, barring a few more comments.

Are you sure? the various wait/nowait cases don't seem to handle that
correctly.

Greetings,

Andres Freund

-- Andres Freund                       http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training &
Services



Re: 9.3 reference constraint regression

From
Alvaro Herrera
Date:
Andres Freund wrote:
> On 2013-12-16 17:43:37 -0300, Alvaro Herrera wrote:
> > Alvaro Herrera wrote:
> > 
> > > This POC patch changes the two places in HeapTupleSatisfiesUpdate that
> > > need to be touched for this to work.  This is probably too simplistic,
> > > in that I make the involved cases return HeapTupleBeingUpdated without
> > > checking that there actually are remote lockers, which is the case of
> > > concern.  I'm not yet sure if this is the final form of the fix, or
> > > instead we should expand the Multi (in the cases where there is a multi)
> > > and verify whether any lockers are transactions other than the current
> > > one.  As is, nothing seems to break, but I think that's probably just
> > > chance and should not be relied upon.
> > 
> > After playing with this, I think the reason this seems to work without
> > fail is that all callers of HeapTupleSatisfiesUpdate are already
> > prepared to deal with the case where HeapTupleBeingUpdated is returned
> > but there is no actual transaction that would block the operation.
> > So I think the proposed patch is okay, barring a few more comments.
> 
> Are you sure? the various wait/nowait cases don't seem to handle that
> correctly.

Well, it would help if those cases weren't dead code.  Neither
heap_update nor heap_delete are ever called in the "no wait" case at
all.  Only heap_lock_tuple is, and I can't see any misbehavior there
either, even with HeapTupleBeingUpdated returned when there's a
non-local locker, or when there's a MultiXact as xmax, regardless of its
status.

Don't get me wrong --- it's not like this case is all that difficult to
handle.  All that's required is something like this in
HeapTupleSatisfiesUpdate:
       if (TransactionIdIsCurrentTransactionId(HeapTupleHeaderGetXmin(tuple)))       {           ...           if
(HEAP_XMAX_IS_LOCKED_ONLY(tuple->t_infomask))   /* not deleter */           {               if (tuple->t_infomask &
HEAP_XMAX_IS_MULTI)              {                   int        nmembers;                   bool    remote;
     int        i;                   MultiXactMember *members;
 
                   nmembers =                       GetMultiXactIdMembers(HeapTupleHeaderGetRawXmax(tuple),
                               &members, false);                   remote = false;                   for (i = 0; i <
nmembers;i++)                   {                       if (!TransactionIdIsCurrentTransactionId(members[i].xid))
               {                           remote = true;                           break;                       }
            }                   if (nmembers > 0)                       pfree(members);
 
                   if (remote)                       return HeapTupleBeingUpdated;                   else
       return HeapTupleMayBeUpdated;               }               else if
(!TransactionIdIsCurrentTransactionId(HeapTupleHeaderGetRawXmax(tuple)))                  return HeapTupleBeingUpdated;
             return HeapTupleMayBeUpdated;           }       }
 

The simpler code just does away with the GetMultiXactIdMembers() and
returns HeapTupleBeingUpdated always.  In absence of a test case that
misbehaves with that, it's hard to see that it is a good idea to go all
this effort there.

-- 
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services



Re: 9.3 reference constraint regression

From
Alvaro Herrera
Date:
Alvaro Herrera wrote:

> Well, it would help if those cases weren't dead code.  Neither
> heap_update nor heap_delete are ever called in the "no wait" case at
> all.  Only heap_lock_tuple is, and I can't see any misbehavior there
> either, even with HeapTupleBeingUpdated returned when there's a
> non-local locker, or when there's a MultiXact as xmax, regardless of its
> status.

I spent some more time trying to generate a test case that would show a
problem with the changed return values here, and was unable to.

I intend to apply this patch soon.

--
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

Attachment

Re: 9.3 reference constraint regression

From
Andres Freund
Date:
Hi,

On 2013-12-17 18:27:26 -0300, Alvaro Herrera wrote:
> > Well, it would help if those cases weren't dead code.  Neither
> > heap_update nor heap_delete are ever called in the "no wait" case at
> > all.

Yea, that sucks, maybe we ought to change that in HEAD. But there very
well might be external callers that use it, I don't think we can just
break the API in a stable release.

> > Only heap_lock_tuple is, and I can't see any misbehavior there
> > either, even with HeapTupleBeingUpdated returned when there's a
> > non-local locker, or when there's a MultiXact as xmax, regardless of its
> > status.
> 
> I spent some more time trying to generate a test case that would show a
> problem with the changed return values here, and was unable to.
> 
> I intend to apply this patch soon.

I have to say, it makes me really uncomfortable to take such
shortcuts. In other branches we are doing liveliness checks with
MultiXactIdIsRunning() et al. Why isn't that necessary here? And how
likely is that this won't cause breakage somewhere down the line because
somebody doesn't know of that subtlety?

Greetings,

Andres Freund

-- Andres Freund                       http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training &
Services



Re: 9.3 reference constraint regression

From
Alvaro Herrera
Date:
Andres Freund wrote:

> I have to say, it makes me really uncomfortable to take such
> shortcuts. In other branches we are doing liveliness checks with
> MultiXactIdIsRunning() et al. Why isn't that necessary here? And how
> likely is that this won't cause breakage somewhere down the line because
> somebody doesn't know of that subtlety?

I came up with the attached last night, which should do the right thing
in both cases.

--
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

Attachment

Re: 9.3 reference constraint regression

From
Alvaro Herrera
Date:
Alvaro Herrera wrote:
> Andres Freund wrote:
> 
> > I have to say, it makes me really uncomfortable to take such
> > shortcuts. In other branches we are doing liveliness checks with
> > MultiXactIdIsRunning() et al. Why isn't that necessary here? And how
> > likely is that this won't cause breakage somewhere down the line because
> > somebody doesn't know of that subtlety?
> 
> I came up with the attached last night, which should do the right thing
> in both cases.

That one had a silly bug, which I fixed and pushed now.

Thanks for the feedback,

-- 
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services