Thread: Hot Standby: subxid cache changes

Hot Standby: subxid cache changes

From
Heikki Linnakangas
Date:
One independent change included in the Hot Standby patch is the change
to the way subtransaction cache works. With the patch, only
subtransactions that don't fit in the subxid cache in PGPROC are marked
in pg_subtrans. To make that work, XidInMVCCSnapshot() always scans the
subxid array in the snapshot, while currently it's only used if none of
the subxid caches have overflowed. Attached is a patch for that,
extracted from the latest hot standby patch.

So far so good, but what about all the other callers of
SubTransGetParent()? For example, XactLockTableWait will fail an
assertion if asked to wait on a subtransaction which is then released.

It occurs to me that we don't need this patch for hot standby if we
abuse the main xid array (SnapshotData.xip) to store the unobserved xids
instead of the subxid array. That one is always scanned in
XidInMVCCSnapshot. I think we should do that rather than try to salvage
this patch.

--
   Heikki Linnakangas
   EnterpriseDB   http://www.enterprisedb.com
*** a/src/backend/access/transam/README
--- b/src/backend/access/transam/README
***************
*** 198,204 **** parent.  This maintains the invariant that child transactions have XIDs later
  than their parents, which is assumed in a number of places.

  The subsidiary actions of obtaining a lock on the XID and and entering it into
! pg_subtrans and PG_PROC are done at the time it is assigned.

  A transaction that has no XID still needs to be identified for various
  purposes, notably holding locks.  For this purpose we assign a "virtual
--- 198,204 ----
  than their parents, which is assumed in a number of places.

  The subsidiary actions of obtaining a lock on the XID and and entering it into
! PG_PROC and, in some cases, pg_subtrans are done at the time it is assigned.

  A transaction that has no XID still needs to be identified for various
  purposes, notably holding locks.  For this purpose we assign a "virtual
***************
*** 376,382 **** but since we allow arbitrary nesting of subtransactions, we can't fit all Xids
  in shared memory, so we have to store them on disk.  Note, however, that for
  each transaction we keep a "cache" of Xids that are known to be part of the
  transaction tree, so we can skip looking at pg_subtrans unless we know the
! cache has been overflowed.  See storage/ipc/procarray.c for the gory details.

  slru.c is the supporting mechanism for both pg_clog and pg_subtrans.  It
  implements the LRU policy for in-memory buffer pages.  The high-level routines
--- 376,384 ----
  in shared memory, so we have to store them on disk.  Note, however, that for
  each transaction we keep a "cache" of Xids that are known to be part of the
  transaction tree, so we can skip looking at pg_subtrans unless we know the
! cache has been overflowed.  In 8.4 we skip updating pg_subtrans unless the
! cache has overflowed for that transaction, considerably reducing pg_subtrans
! activity. See storage/ipc/procarray.c for the gory details.

  slru.c is the supporting mechanism for both pg_clog and pg_subtrans.  It
  implements the LRU policy for in-memory buffer pages.  The high-level routines
*** a/src/backend/access/transam/xact.c
--- b/src/backend/access/transam/xact.c
***************
*** 415,421 **** AssignTransactionId(TransactionState s)
       */
      s->transactionId = GetNewTransactionId(isSubXact);

!     if (isSubXact)
          SubTransSetParent(s->transactionId, s->parent->transactionId);

      /*
--- 415,428 ----
       */
      s->transactionId = GetNewTransactionId(isSubXact);

!     /*
!      * If we have overflowed the subxid cache then we must mark subtrans
!      * with the parent xid. Prior to 8.4 we marked subtrans for each
!      * subtransaction, though that is no longer necessary because the
!      * way snapshots are searched in XidInMVCCSnapshot() has changed to
!      * allow searching of both subxid cache and subtrans, not either/or.
!      */
!     if (isSubXact && MyProc->subxids.overflowed)
          SubTransSetParent(s->transactionId, s->parent->transactionId);

      /*
*** a/src/backend/storage/ipc/procarray.c
--- b/src/backend/storage/ipc/procarray.c
***************
*** 680,685 **** GetSnapshotData(Snapshot snapshot)
--- 680,686 ----
      int            index;
      int            count = 0;
      int            subcount = 0;
+     bool        suboverflowed = false;

      Assert(snapshot != NULL);

***************
*** 771,781 **** GetSnapshotData(Snapshot snapshot)
          }

          /*
!          * Save subtransaction XIDs if possible (if we've already overflowed,
!          * there's no point).  Note that the subxact XIDs must be later than
!          * their parent, so no need to check them against xmin.  We could
!          * filter against xmax, but it seems better not to do that much work
!          * while holding the ProcArrayLock.
           *
           * The other backend can add more subxids concurrently, but cannot
           * remove any.    Hence it's important to fetch nxids just once. Should
--- 772,782 ----
          }

          /*
!          * Save subtransaction XIDs, whether or not we have overflowed.
!          * Note that the subxact XIDs must be later than their parent, so no
!          * need to check them against xmin.  We could filter against xmax,
!          * but it seems better not to do that much work while holding the
!          * ProcArrayLock.
           *
           * The other backend can add more subxids concurrently, but cannot
           * remove any.    Hence it's important to fetch nxids just once. Should
***************
*** 784,807 **** GetSnapshotData(Snapshot snapshot)
           *
           * Again, our own XIDs are not included in the snapshot.
           */
!         if (subcount >= 0 && proc != MyProc)
!         {
!             if (proc->subxids.overflowed)
!                 subcount = -1;    /* overflowed */
!             else
              {
                  int            nxids = proc->subxids.nxids;

                  if (nxids > 0)
                  {
                      memcpy(snapshot->subxip + subcount,
                             (void *) proc->subxids.xids,
                             nxids * sizeof(TransactionId));
                      subcount += nxids;
                  }
              }
          }
-     }

      if (!TransactionIdIsValid(MyProc->xmin))
          MyProc->xmin = TransactionXmin = xmin;
--- 785,807 ----
           *
           * Again, our own XIDs are not included in the snapshot.
           */
!         if (proc != MyProc)
              {
                  int            nxids = proc->subxids.nxids;

                  if (nxids > 0)
                  {
+                     if (proc->subxids.overflowed)
+                         suboverflowed = true;
+
                      memcpy(snapshot->subxip + subcount,
                             (void *) proc->subxids.xids,
                             nxids * sizeof(TransactionId));
                      subcount += nxids;
                  }
+
              }
          }

      if (!TransactionIdIsValid(MyProc->xmin))
          MyProc->xmin = TransactionXmin = xmin;
***************
*** 824,829 **** GetSnapshotData(Snapshot snapshot)
--- 824,830 ----
      snapshot->xmax = xmax;
      snapshot->xcnt = count;
      snapshot->subxcnt = subcount;
+     snapshot->suboverflowed = suboverflowed;

      snapshot->curcid = GetCurrentCommandId(false);

*** a/src/backend/utils/time/tqual.c
--- b/src/backend/utils/time/tqual.c
***************
*** 1238,1263 **** XidInMVCCSnapshot(TransactionId xid, Snapshot snapshot)
          return true;

      /*
!      * If the snapshot contains full subxact data, the fastest way to check
!      * things is just to compare the given XID against both subxact XIDs and
!      * top-level XIDs.    If the snapshot overflowed, we have to use pg_subtrans
!      * to convert a subxact XID to its parent XID, but then we need only look
!      * at top-level XIDs not subxacts.
       */
-     if (snapshot->subxcnt >= 0)
-     {
-         /* full data, so search subxip */
-         int32        j;
-
-         for (j = 0; j < snapshot->subxcnt; j++)
-         {
-             if (TransactionIdEquals(xid, snapshot->subxip[j]))
-                 return true;
-         }

!         /* not there, fall through to search xip[] */
      }
!     else
      {
          /* overflowed, so convert xid to top-level */
          xid = SubTransGetTopmostTransaction(xid);
--- 1238,1283 ----
          return true;

      /*
!      * Our strategy for checking xids changed in 8.4. Prior to 8.4
!      * we either checked the subxid cache on the snapshot or we
!      * checked subtrans. That was much more efficient than just using
!      * subtrans but it has some problems. First, as soon as *any*
!      * transaction had more than 64 transactions we forced *all*
!      * snapshots to check against subtrans, giving a sharp modal
!      * change in behaviour. Second because we either checked subtrans
!      * or the snapshot, we were forced to place entries in subtrans
!      * in case the snapshot later overflowed, even if we never
!      * actually checked subtrans.
!      *
!      * In 8.4 we improve on that scheme in a number of ways. As before
!      * we check subtrans if the snapshot has overflowed. We *also*
!      * check the subxid cache. This has two benefits: first the
!      * behaviour degrades gracefully when the cache overflows, so we
!      * retain much of its benefit if it has only just overflowed.
!      * Second, a transaction doesn't need to insert entries into
!      * subtrans until its own personal subxid cache overflows. This
!      * means entries into subtrans become significantly rarer,
!      * perhaps less than 1% of the previous insert rate, giving
!      * considerable benefit for transactions using only a few
!      * subtransactions.
       */

!     /*
!      * First, compare the given XID against cached subxact XIDs.
!      */
!     for (i = 0; i < snapshot->subxcnt; i++)
!     {
!         if (TransactionIdEquals(xid, snapshot->subxip[i]))
!             return true;
      }
!
!     /*
!      * If the snapshot overflowed and we haven't already located the xid
!      * we also have to consult pg_subtrans. We use subtrans to convert a
!      * subxact XID to its parent XID, so that we can then check the status
!      * of the top-level TransactionId.
!      */
!     if (snapshot->suboverflowed)
      {
          /* overflowed, so convert xid to top-level */
          xid = SubTransGetTopmostTransaction(xid);
***************
*** 1270,1275 **** XidInMVCCSnapshot(TransactionId xid, Snapshot snapshot)
--- 1290,1299 ----
              return false;
      }

+     /*
+      * By now xid is either not present, or a top-level xid. So now
+      * we just need to check the main transaction ids.
+      */
      for (i = 0; i < snapshot->xcnt; i++)
      {
          if (TransactionIdEquals(xid, snapshot->xip[i]))
*** a/src/include/utils/snapshot.h
--- b/src/include/utils/snapshot.h
***************
*** 49,55 **** typedef struct SnapshotData
      uint32        xcnt;            /* # of xact ids in xip[] */
      TransactionId *xip;            /* array of xact IDs in progress */
      /* note: all ids in xip[] satisfy xmin <= xip[i] < xmax */
!     int32        subxcnt;        /* # of xact ids in subxip[], -1 if overflow */
      TransactionId *subxip;        /* array of subxact IDs in progress */

      /*
--- 49,65 ----
      uint32        xcnt;            /* # of xact ids in xip[] */
      TransactionId *xip;            /* array of xact IDs in progress */
      /* note: all ids in xip[] satisfy xmin <= xip[i] < xmax */
!
!     /*
!      * Prior to 8.4 we represented an overflowed subxid cache with subxcnt -1.
!      * In 8.4+ we separate the two concepts because when checking the xids
!      * in the snapshot we check *both* subxid cache and subtrans, if subxid
!      * cache has overflowed. So we still need the count, even if overflowed.
!      * We do this to allow unobserved xids to be placed into the snapshot
!      * even when snapshot overflows. It is also a performance gain.
!      */
!     uint32        subxcnt;        /* # of xact ids in subxip[] */
!     bool        suboverflowed;    /* true means at least one subxid cache overflowed */
      TransactionId *subxip;        /* array of subxact IDs in progress */

      /*

Re: Hot Standby: subxid cache changes

From
Simon Riggs
Date:
On Thu, 2009-02-12 at 09:50 +0200, Heikki Linnakangas wrote:

> It occurs to me that we don't need this patch for hot standby if we 
> abuse the main xid array (SnapshotData.xip) to store the unobserved xids 
> instead of the subxid array. That one is always scanned in 
> XidInMVCCSnapshot. I think we should do that rather than try to salvage 
> this patch.

At this stage, yes.

> So far so good, but what about all the other callers of 
> SubTransGetParent()? For example, XactLockTableWait will fail an 
> assertion if asked to wait on a subtransaction which is then released.

I agree that it could fail the assertion, though it is clear that the
assertion should now be removed.

The logic is: if there is no lock table entry for that xid *and* it is
not in progress *and* it is not in pg_subtrans, then it must have been
an aborted subtransaction of a currently active xact or it has otherwise
completed.

I think we can rework the other aspects also.

-- Simon Riggs           www.2ndQuadrant.comPostgreSQL Training, Services and Support



Re: Hot Standby: subxid cache changes

From
Heikki Linnakangas
Date:
Simon Riggs wrote:
> On Thu, 2009-02-12 at 09:50 +0200, Heikki Linnakangas wrote:
>> So far so good, but what about all the other callers of 
>> SubTransGetParent()? For example, XactLockTableWait will fail an 
>> assertion if asked to wait on a subtransaction which is then released.
> 
> I agree that it could fail the assertion, though it is clear that the
> assertion should now be removed.

No, then you just get an infinite loop instead, trying to get the parent 
of 0 over and over again.

> The logic is: if there is no lock table entry for that xid *and* it is
> not in progress *and* it is not in pg_subtrans, then it must have been
> an aborted subtransaction of a currently active xact or it has otherwise
> completed.

Right, we got it right that far. But after the subtransaction has 
completed, the question is: what's its parent? That's what the patch got 
wrong.

--   Heikki Linnakangas  EnterpriseDB   http://www.enterprisedb.com


Re: Hot Standby: subxid cache changes

From
Simon Riggs
Date:
On Thu, 2009-02-12 at 14:23 +0200, Heikki Linnakangas wrote: 
> Simon Riggs wrote:
> > On Thu, 2009-02-12 at 09:50 +0200, Heikki Linnakangas wrote:
> >> So far so good, but what about all the other callers of 
> >> SubTransGetParent()? For example, XactLockTableWait will fail an 
> >> assertion if asked to wait on a subtransaction which is then released.
> > 
> > I agree that it could fail the assertion, though it is clear that the
> > assertion should now be removed.
> 
> No, then you just get an infinite loop instead, trying to get the parent 
> of 0 over and over again.

There is no infinite loop. Try it, or read TransactionIdIsInProgress().

> > The logic is: if there is no lock table entry for that xid *and* it is
> > not in progress *and* it is not in pg_subtrans, then it must have been
> > an aborted subtransaction of a currently active xact or it has otherwise
> > completed.
> 
> Right, we got it right that far. But after the subtransaction has 
> completed, the question is: what's its parent? That's what the patch got 
> wrong.

We can find that out from procarray, since a subcommitted xid will still
be present in the subxid cache of its parent (by definition, otherwise
it will be marked in pg_subtrans). 

It will be quicker to fix that than to make other changes.

-- Simon Riggs           www.2ndQuadrant.comPostgreSQL Training, Services and Support



Re: Hot Standby: subxid cache changes

From
Heikki Linnakangas
Date:
Simon Riggs wrote:
> On Thu, 2009-02-12 at 14:23 +0200, Heikki Linnakangas wrote: 
>> Simon Riggs wrote:
>>> On Thu, 2009-02-12 at 09:50 +0200, Heikki Linnakangas wrote:
>>>> So far so good, but what about all the other callers of 
>>>> SubTransGetParent()? For example, XactLockTableWait will fail an 
>>>> assertion if asked to wait on a subtransaction which is then released.
>>> I agree that it could fail the assertion, though it is clear that the
>>> assertion should now be removed.
>> No, then you just get an infinite loop instead, trying to get the parent 
>> of 0 over and over again.
> 
> There is no infinite loop. Try it, or read TransactionIdIsInProgress().

I did, my CPU was pegged at 100%. Hmm, attaching with a debugger shows 
that it's not looping within XactLockTableWait as I assumed. Instead, 
XactLockTableWait returns without waiting on the parent, so we get into 
an busy loop in _bt_do_insert, trying to wait on the transaction over 
and over again.

>>> The logic is: if there is no lock table entry for that xid *and* it is
>>> not in progress *and* it is not in pg_subtrans, then it must have been
>>> an aborted subtransaction of a currently active xact or it has otherwise
>>> completed.
>> Right, we got it right that far. But after the subtransaction has 
>> completed, the question is: what's its parent? That's what the patch got 
>> wrong.
> 
> We can find that out from procarray, since a subcommitted xid will still
> be present in the subxid cache of its parent (by definition, otherwise
> it will be marked in pg_subtrans). 

Unless the top transaction just committed. Looking at the other callers 
of SubTransGetParent, I think it would introduce a race condition to 
TransactionIdDidAbort and TransactionIdDidCommit.

--   Heikki Linnakangas  EnterpriseDB   http://www.enterprisedb.com


Re: Hot Standby: subxid cache changes

From
Simon Riggs
Date:
On Fri, 2009-02-13 at 10:55 +0200, Heikki Linnakangas wrote:

> >>> The logic is: if there is no lock table entry for that xid *and* it is
> >>> not in progress *and* it is not in pg_subtrans, then it must have been
> >>> an aborted subtransaction of a currently active xact or it has otherwise
> >>> completed.
> >> Right, we got it right that far. But after the subtransaction has 
> >> completed, the question is: what's its parent? That's what the patch got 
> >> wrong.
> > 
> > We can find that out from procarray, since a subcommitted xid will still
> > be present in the subxid cache of its parent (by definition, otherwise
> > it will be marked in pg_subtrans). 
> 
> Unless the top transaction just committed. Looking at the other callers 
> of SubTransGetParent, I think it would introduce a race condition to 
> TransactionIdDidAbort and TransactionIdDidCommit.

I don't see a race condition, but we would need to add a clog recheck if
the xid was not found in procarray in the TransactionIdDid...()
functions.

SubTransGetParent() is just a way of getting the next xid to wait on. If
the xid has been removed from procarray, we know we can recheck clog to
find an authoritative answer because clog is always fully marked before
we remove from procarray.

We need three changes:
* in DidCommit/Abort
if (xidstatus == TRANSACTION_STATUS_SUB_COMMITTED){    ...other code...
    parentXid = SubTransGetParent(transactionId);    if (!TransactionIdIsValid(parentXid))
TransactionIdDidCommit(transactionId);   return TransactionIdDidCommit(parentXid);}
 


* bottom of XactLockTableWait becomes
/* we may find xid has completed just before we check */xid = SubTransGetParent(xid);if (!TransactionIdIsValid(xid))
break;

* SubTransGetParent() has some extra code to check procarray if
pg_subtrans returns 0. It would be better to refactor that

-- Simon Riggs           www.2ndQuadrant.comPostgreSQL Training, Services and Support



Re: Hot Standby: subxid cache changes

From
Simon Riggs
Date:
On Thu, 2009-02-12 at 09:50 +0200, Heikki Linnakangas wrote:

> It occurs to me that we don't need this patch for hot standby if we 
> abuse the main xid array (SnapshotData.xip) to store the unobserved xids 
> instead of the subxid array. That one is always scanned in 
> XidInMVCCSnapshot. I think we should do that rather than try to salvage 
> this patch.

I think my proposal still holds water, but I also think it is probably
time to say OK, let's make this simpler and take the subxid tuning off
line.

We would need to increase the max size of the xip array by
2*max_connections. So an increase of 80kB on normal running, which I can
accept. 

Is that the only change you are suggesting to resolve this? I hope so.

-- Simon Riggs           www.2ndQuadrant.comPostgreSQL Training, Services and Support



Re: Hot Standby: subxid cache changes

From
Heikki Linnakangas
Date:
Simon Riggs wrote:
> I think my proposal still holds water, but I also think it is probably
> time to say OK, let's make this simpler and take the subxid tuning off
> line.

Agreed.

> We would need to increase the max size of the xip array by
> 2*max_connections. So an increase of 80kB on normal running, which I can
> accept. 

You only need the bigger xip array while in hot standby mode. Backends 
starting after the recovery is done can use just max_connections. And 
you were already allocating a bigger subxip array, so the net effect is nil.

> Is that the only change you are suggesting to resolve this? I hope so.

Yes.

--   Heikki Linnakangas  EnterpriseDB   http://www.enterprisedb.com