FOR KEY LOCK foreign keys - Mailing list pgsql-hackers

Hello,

As previously commented, here's a proposal with patch to turn foreign
key checks into something less intrusive.

The basic idea, as proposed by Simon Riggs, was discussed in a previous
pgsql-hackers thread here:
http://archives.postgresql.org/message-id/AANLkTimo9XVcEzfiBR-ut3KVNDkjm2Vxh+t8kAmWjPuv@mail.gmail.com

It goes like this: instead of acquiring a shared lock on the involved
tuple, we only acquire a "key lock", that is, something that prevents
the tuple from going away entirely but not from updating fields that are
not covered by any unique index.

As discussed, this is still more restrictive than necessary (we could
lock only those columns that are involved in the foreign key being
checked), but that has all sorts of implementation level problems, so we
settled for this, which is still much better than the current state of
affairs.

I published about this here:
http://commandprompt.com/blogs/alvaro_herrera/2010/11/fixing_foreign_key_deadlocks_part_2/

So, as a rough design,

1. Create a new SELECT locking clause. For now, we're calling it SELECT FOR KEY LOCK
2. This will acquire a new type of lock in the tuple, dubbed a "keylock".
3. This lock will conflict with DELETE, SELECT FOR UPDATE, and SELECT FOR SHARE.
4. It also conflicts with UPDATE if the UPDATE modifies an attribute
   indexed by a unique index.

Here's a patch for this, on which I need to do some more testing and
update docs.

Some patch details:

1. We use a new bit in t_infomask for HEAP_XMAX_KEY_LOCK, 0x0010.
2. Key-locking a tuple means setting the XMAX_KEY_LOCK bit, and setting the
   Xmax to the locker (just like the other lock marks). If the tuple is
   already key-locked, a MultiXactId needs to be created from the
   original locker(s) and the new transaction.
3. The original tuple needs to be marked with the Cmax of the locking
   command, to prevent it from being seen in the same transaction.
4. A non-conflicting update to the tuple must carry forward some fields
   from the original tuple into the updated copy. Those include Xmax,
   XMAX_IS_MULTI, XMAX_KEY_LOCK, and the CommandId and COMBO_CID flag.
5. We check for the is-indexed condition early in heap_update.  This
   check is independent of the HOT check, which occurs later in the
   routine.
6. The relcache entry now keeps two lists of indexed attributes; the new
   one only covers unique indexes.  Both lists are built in a single
   pass over the index list and saved in the relcache entry, so a
   heap_update call only does this once.  The main difference between
   the two checks is that the one for HOT is done after the tuple has
   been toasted.  This cannot be done for this check, because the
   toaster runs too late.  This means some work is duplicated.  We
   could optimize this further.


Something else that might be of interest: the patch as presented here
does NOT solve the deadlock problem originally presented by Joel
Jacobson.  It does solve the second, simpler example I presented in my
blog article referenced above, however.  I need to have a closer look at
that problem to figure out if we could fix the deadlock too.

I need to thank Simon Riggs for the original idea, and Robert Haas for
some thoughtful discussion on IM that helped me figure out some
roadblocks.  Of course, without the pgsql-hackers discussion there
wouldn't be any patch at all.

I also have to apologize to everyone for the lateness in this.  Some
severe illness brought me down, then the holiday season slowed
everything almost to a halt, then a rushed but very much welcome move to
a larger house prevented me from dedicating the time I originally
intended.  All those things are settled now, hopefully.

--
Álvaro Herrera

Attachment

pgsql-hackers by date:

Previous
From: Itagaki Takahiro
Date:
Subject: Re: SQL/MED - file_fdw
Next
From: Alvaro Herrera
Date:
Subject: Re: Bug in pg_dump