Re: foreign key locks, 2nd attempt - Mailing list pgsql-hackers

From Alvaro Herrera
Subject Re: foreign key locks, 2nd attempt
Date
Msg-id 1323808901-sup-7243@alvh.no-ip.org
Whole thread Raw
In response to Re: foreign key locks, 2nd attempt  (Noah Misch <noah@leadboat.com>)
Responses Re: foreign key locks, 2nd attempt
Re: foreign key locks, 2nd attempt
List pgsql-hackers
Excerpts from Noah Misch's message of dom dic 04 09:20:27 -0300 2011:

> > +    /*
> > +     * If the tuple we're updating is locked, we need to preserve this in the
> > +     * new tuple's Xmax as well as in the old tuple.  Prepare the new xmax
> > +     * value for these uses.
> > +     *
> > +     * Note there cannot be an xmax to save if we're changing key columns; in
> > +     * this case, the wait above should have only returned when the locking
> > +     * transactions finished.
> > +     */
> > +    if (TransactionIdIsValid(keep_xmax))
> > +    {
> > +        if (keep_xmax_multi)
> > +        {
> > +            keep_xmax_old = MultiXactIdExpand(keep_xmax,
> > +                                              xid, MultiXactStatusUpdate);
> > +            keep_xmax_infomask = HEAP_XMAX_KEYSHR_LOCK | HEAP_XMAX_IS_MULTI;
>
> Not directly related to this line, but is the HEAP_IS_NOT_UPDATE bit getting
> cleared where needed?

AFAICS it's reset along the rest of the HEAP_LOCK_BITS when the tuple is
modified.

> > @@ -2725,11 +2884,20 @@ l2:
> >          oldtup.t_data->t_infomask &= ~(HEAP_XMAX_COMMITTED |
> >                                         HEAP_XMAX_INVALID |
> >                                         HEAP_XMAX_IS_MULTI |
> > -                                       HEAP_IS_LOCKED |
> > +                                       HEAP_LOCK_BITS |
> >                                         HEAP_MOVED);
> > +        oldtup.t_data->t_infomask2 &= ~HEAP_UPDATE_KEY_INTACT;
> >          HeapTupleClearHotUpdated(&oldtup);
> >          /* ... and store info about transaction updating this tuple */
> > -        HeapTupleHeaderSetXmax(oldtup.t_data, xid);
> > +        if (TransactionIdIsValid(keep_xmax_old))
> > +        {
> > +            HeapTupleHeaderSetXmax(oldtup.t_data, keep_xmax_old);
> > +            oldtup.t_data->t_infomask |= keep_xmax_old_infomask;
> > +        }
> > +        else
> > +            HeapTupleHeaderSetXmax(oldtup.t_data, xid);
> > +        if (key_intact)
> > +            oldtup.t_data->t_infomask2 |= HEAP_UPDATE_KEY_INTACT;
> >          HeapTupleHeaderSetCmax(oldtup.t_data, cid, iscombo);
> >          /* temporarily make it look not-updated */
> >          oldtup.t_data->t_ctid = oldtup.t_self;
>
> Shortly after this, we release the content lock and go off toasting the tuple
> and finding free space.  When we come back, could the old tuple have
> accumulated additional KEY SHARE locks that we need to re-copy?

Yeah, I've been wondering about this: do we have a problem already with
exclusion constraints?  I mean, if a concurrent inserter doesn't see the
tuple that we've marked here as deleted while we toast it, it could
result in a violated constraint, right?  I haven't built a test case to
prove it.


> > @@ -3231,30 +3462,70 @@ l3:
> >      {
> >          TransactionId xwait;
> >          uint16        infomask;
> > +        uint16        infomask2;
> > +        bool        require_sleep;
> >
> >          /* must copy state data before unlocking buffer */
> >          xwait = HeapTupleHeaderGetXmax(tuple->t_data);
> >          infomask = tuple->t_data->t_infomask;
> > +        infomask2 = tuple->t_data->t_infomask2;
> >
> >          LockBuffer(*buffer, BUFFER_LOCK_UNLOCK);
> >
> >          /*
> > -         * If we wish to acquire share lock, and the tuple is already
> > -         * share-locked by a multixact that includes any subtransaction of the
> > -         * current top transaction, then we effectively hold the desired lock
> > -         * already.  We *must* succeed without trying to take the tuple lock,
> > -         * else we will deadlock against anyone waiting to acquire exclusive
> > -         * lock.  We don't need to make any state changes in this case.
> > +         * If we wish to acquire share or key lock, and the tuple is already
> > +         * key or share locked by a multixact that includes any subtransaction
> > +         * of the current top transaction, then we effectively hold the desired
> > +         * lock already (except if we own key share lock and now desire share
> > +         * lock).  We *must* succeed without trying to take the tuple lock,
>
> This can now apply to FOR UPDATE as well.
>
> For the first sentence, I suggest the wording "If any subtransaction of the
> current top transaction already holds a stronger lock, we effectively hold the
> desired lock already."

I settled on this:
    /*     * If any subtransaction of the current top transaction already holds a     * lock as strong or stronger than
whatwe're requesting, we     * effectively hold the desired lock already.  We *must* succeed     * without trying to
takethe tuple lock, else we will deadlock against     * anyone wanting to acquire a stronger lock.     */    if
(infomask& HEAP_XMAX_IS_MULTI)    {        int        i;        int        nmembers;        MultiXactMember *members;
    MultiXactStatus cutoff = get_mxact_status_for_tuplelock(mode); 
        nmembers = GetMultiXactIdMembers(xwait, &members);
        for (i = 0; i < nmembers; i++)        {            if (TransactionIdIsCurrentTransactionId(members[i].xid))
      {                if (members[i].status >= cutoff)                {                    if (have_tuple_lock)
               UnlockTupleTuplock(relation, tid, mode);                        pfree(members);
returnHeapTupleMayBeUpdated;                }            }        } 
        pfree(members);    }

Now, I can't see the reason that we didn't previously consider locks "as
strong as what we're requesting" ... but surely it's the same case?

> > +         * else we will deadlock against anyone wanting to acquire a stronger
> > +         * lock.
>
> > +         *
> > +         * FIXME -- we don't do the below currently, but I think we should:
> > +         *
> > +         * We update the Xmax with a new MultiXactId to include the new lock
> > +         * mode in this case.
> > +         *
> > +         * Note that since we want to alter the Xmax, we need to re-acquire the
> > +         * buffer lock.  The xmax could have changed in the meantime, so we
> > +         * recheck it in that case, but we keep the buffer lock while doing it
> > +         * to prevent starvation.  The second time around we know we must be
> > +         * part of the MultiXactId in any case, which is why we don't need to
> > +         * go back to recheck HeapTupleSatisfiesUpdate.  Also, after we
> > +         * re-acquire lock, the MultiXact is likely to (but not necessarily) be
> > +         * the same that we see here, so it should be in multixact's cache and
> > +         * thus quick to obtain.
>
> What is the benefit of doing so?

After thinking more about it, I think it's bogus.  I've removed it.


> Incidentally, why is this level of xlog detail needed for tuple locks?  We
> need an FPI of the page before the lock-related changes start scribbling on
> it, and we need to log any xid, even that of a locker, that could land in the
> heap on disk.  But, why do we actually need to replay each lock?

Uhm.  I remember thinking that a hot standby replica needed it ...

> > + * slightly incorrect, because lockers whose status did not conflict with ours
> > + * are not even considered and so might have gone away anyway.
> >   *
> >   * But by the time we finish sleeping, someone else may have changed the Xmax
> >   * of the containing tuple, so the caller needs to iterate on us somehow.
> >   */
> >  void
> > -MultiXactIdWait(MultiXactId multi)
> > +MultiXactIdWait(MultiXactId multi, MultiXactStatus status, int *remaining)
>
> This function should probably move (with a new name) to heapam.c (or maybe
> lmgr.c, in part).  It's an abstraction violation to have multixact.c knowing
> about lock conflict tables.  multixact.c should be marshalling those two bits
> alongside each xid without any deep knowledge of their meaning.

Interesting thought.


> >      /*
> > -     * Also truncate MultiXactMember at the previously determined offset.
> > +     * FIXME there's a race condition here: somebody might have created a new
> > +     * segment after we finished scanning the dir.  That scenario would leave
> > +     * us with an invalid truncateXid in shared memory, which is not an easy
> > +     * situation to get out of.  Needs more thought.
>
> Agreed.  Not sure.
>
> Broadly, this feels like a lot of code to handle truncating the segments, but
> I don't know how to simplify it.

It is a lot of code.  And it took me quite a while to even figure out
how to do it.  I don't see any other way to go about it.


> > +        xmax = HeapTupleGetUpdateXid(tuple);
> > +        if (TransactionIdIsCurrentTransactionId(xmax))
> > +        {
> > +            if (HeapTupleHeaderGetCmax(tuple) >= snapshot->curcid)
> > +                return true;    /* deleted after scan started */
> > +            else
> > +                return false;    /* deleted before scan started */
> > +        }
> > +        if (TransactionIdIsInProgress(xmax))
> > +            return true;
> > +        if (TransactionIdDidCommit(xmax))
> > +        {
> > +            SetHintBits(tuple, buffer, HEAP_XMAX_COMMITTED, xmax);
> > +            /* updating transaction committed, but when? */
> > +            if (XidInMVCCSnapshot(xmax, snapshot))
> > +                return true;    /* treat as still in progress */
> > +            return false;
> > +        }
>
> In both HEAP_XMAX_MULTI conditional blocks, you do not set HEAP_XMAX_INVALID
> for an aborted updater.  What is the new meaning of HEAP_XMAX_INVALID for
> multixacts?  What implications would arise if we instead had it mean that the
> updating xid is aborted?  That would allow us to get the mid-term performance
> benefit of the hint bit when the updating xid spills into a multixact, and it
> would reduce code duplication in this function.

Well, HEAP_XMAX_INVALID means the Xmax is not valid, period.  If there's
a multi whose updater is aborted, there's still a multi that needs to be
checked in various places, so we cannot set that bit.


> I did not review the other tqual.c changes.  Could you summarize how the
> changes to the other functions must differ from the changes to
> HeapTupleSatisfiesMVCC()?

I don't think they should differ in any significant way ... if they do,
it's probably bogus.  Only HeapTupleSatisfiesVacuum should differ
significantly, because it's a world on its own.


> > --- a/src/bin/pg_resetxlog/pg_resetxlog.c
> > +++ b/src/bin/pg_resetxlog/pg_resetxlog.c
> > @@ -332,6 +350,11 @@ main(int argc, char *argv[])
> >      if (set_mxoff != -1)
> >          ControlFile.checkPointCopy.nextMultiOffset = set_mxoff;
> >
> > +    /*
> > +    if (set_mxfreeze != -1)
> > +        ControlFile.checkPointCopy.mxactFreezeXid = set_mxfreeze;
> > +        */
> > +
> >      if (minXlogTli > ControlFile.checkPointCopy.ThisTimeLineID)
> >          ControlFile.checkPointCopy.ThisTimeLineID = minXlogTli;
> >
> > @@ -578,6 +601,10 @@ PrintControlValues(bool guessed)
> >             ControlFile.checkPointCopy.nextMulti);
> >      printf(_("Latest checkpoint's NextMultiOffset:  %u\n"),
> >             ControlFile.checkPointCopy.nextMultiOffset);
> > +    /*
> > +    printf(_("Latest checkpoint's MultiXact freezeXid: %u\n"),
> > +           ControlFile.checkPointCopy.mxactFreezeXid);
> > +           */
>
> Should these changes be live?  They look reasonable at first glance.

Oh, I forgot about these.  Yeah, these need to be live, but not in the
exact for they have here; there were some tweaks I needed to do IIRC.

> >  /*
> > + * A tuple is only locked (i.e. not updated by its Xmax) if it the Xmax is not
> > + * a multixact and it has either the EXCL_LOCK or KEYSHR_LOCK bits set, or if
> > + * the xmax is a multi that doesn't contain an update.
> > + *
> > + * Beware of multiple evaluation of arguments.
> > + */
> > +#define HeapTupleHeaderInfomaskIsLocked(infomask) \
> > +    ((!((infomask) & HEAP_XMAX_IS_MULTI) && \
> > +      (infomask) & (HEAP_XMAX_EXCL_LOCK | HEAP_XMAX_KEYSHR_LOCK)) || \
> > +     (((infomask) & HEAP_XMAX_IS_MULTI) && ((infomask) & HEAP_XMAX_IS_NOT_UPDATE)))
> > +
> > +#define HeapTupleHeaderIsLocked(tup) \
> > +    HeapTupleHeaderInfomaskIsLocked((tup)->t_infomask)
>
> I'm uneasy having a HeapTupleHeaderIsLocked() that returns false when a tuple
> is both updated and KEY SHARE-locked.  Perhaps HeapTupleHeaderIsUpdated() with
> the opposite meaning, or HeapTupleHeaderIsOnlyLocked()?

I had the IsOnlyLocked thought too.  I will go that route.

(I changed HeapTupleHeaderGetXmax to GetRawXmax, thanks for that
suggestion)

--
Álvaro Herrera <alvherre@commandprompt.com>
The PostgreSQL Company - Command Prompt, Inc.
PostgreSQL Replication, Consulting, Custom Development, 24x7 support


pgsql-hackers by date:

Previous
From: Robert Haas
Date:
Subject: Re: WIP: URI connection string support for libpq
Next
From: Greg Smith
Date:
Subject: Re: Configuration include directory