Re: [PERFORM] Very slow (2 tuples/second) sequential scan after bulk insert; speed returns to ~500 tuples/second after commit - Mailing list pgsql-patches

(moved to pgsql-patches, as there's a patch attached)

Tom Lane wrote:
> Getting rid of the linked-list representation would be a win in a couple
> of ways --- we'd not need the bogus "list of XIDs" support in pg_list.h,
> and xactGetCommittedChildren would go away.  OTOH AtSubCommit_childXids
> would get more expensive.

I couldn't let this case go, so I wrote a patch. I replaced the linked
list with an array that's enlarged at AtSubCommit_childXids when necessary.

I couldn't measure any performance hit from the extra work of enlarging
and memcpying the array. I tried two test cases:

1. Insert one row per subtransaction, and commit the subtransaction
after each insert. This is like the OP's case.

         printf("CREATE TABLE foo (id int4);\n");
         printf("BEGIN;\n");
         for(i=1; i <= 100000; i++)
         {
                 printf("SAVEPOINT sp%d;\n", i);
                 printf("INSERT INTO foo VALUES (1);\n");
                 printf("RELEASE SAVEPOINT sp%d;\n", i);
         }
         printf("COMMIT;\n");

2. Insert one row per subtransaction, but leave the subtransaction in
not-committed state

         printf("CREATE TABLE foo (id int4, t text);\n");
         printf("BEGIN;\n");
         for(i=1; i <= 100000; i++)
         {
                 printf("SAVEPOINT sp%d;\n", i);
                 printf("INSERT INTO foo VALUES (1, 'f');\n");
         }
         printf("COMMIT;\n");

Test case 1 is not bad, because we just keep appending to the parent
array one at a time. Test case 2 might become slower, as the number of
subtransactions increases, as at the commit of each subtransaction you
need to enlarge the parent array and copy all the already-committed
childxids to it. However, you hit the limit on amount of shared mem
required for holding the per-xid locks before that becomes a problem,
and the performance becomes dominated by other things anyway (notably
LockReassignCurrentOwner).

I initially thought that using a single palloc'd array to hold all the
XIDs would introduce a new limit on the number committed
subtransactions, thanks to MaxAllocSize, but that's not the case.
Without patch, we actually allocate an array like that anyway in
xactGetCommittedChildren.

Elsewhere in our codebase where we use arrays that are enlarged as
needed, we keep track of the "allocated" size and the "used" size of the
array separately, and only call repalloc when the array fills up, and
repalloc a larger than necessary array when it does. I chose to just
call repalloc every time instead, as repalloc is smart enough to fall
out quickly if the chunk the allocation was made in is already larger
than the new size. There might be some gain avoiding the repeated
repalloc calls, but I doubt it's worth the code complexity, and calling
repalloc with a larger than necessary size can actually force it to
unnecessarily allocate a new, larger chunk instead of reusing the old
one. Thoughts on that?

--
   Heikki Linnakangas
   EnterpriseDB   http://www.enterprisedb.com
Index: src/backend/access/transam/xact.c
===================================================================
RCS file: /home/hlinnaka/pgcvsrepository/pgsql/src/backend/access/transam/xact.c,v
retrieving revision 1.258
diff -c -r1.258 xact.c
*** src/backend/access/transam/xact.c    4 Mar 2008 19:54:06 -0000    1.258
--- src/backend/access/transam/xact.c    11 Mar 2008 12:31:28 -0000
***************
*** 130,136 ****
      int            gucNestLevel;    /* GUC context nesting depth */
      MemoryContext curTransactionContext;        /* my xact-lifetime context */
      ResourceOwner curTransactionOwner;    /* my query resources */
!     List       *childXids;        /* subcommitted child XIDs */
      Oid            prevUser;        /* previous CurrentUserId setting */
      bool        prevSecDefCxt;    /* previous SecurityDefinerContext setting */
      bool        prevXactReadOnly;        /* entry-time xact r/o state */
--- 130,137 ----
      int            gucNestLevel;    /* GUC context nesting depth */
      MemoryContext curTransactionContext;        /* my xact-lifetime context */
      ResourceOwner curTransactionOwner;    /* my query resources */
!     TransactionId    *childXids;    /* subcommitted child XIDs, in XID order */
!     int            nChildXids;        /* # of subcommitted child XIDs */
      Oid            prevUser;        /* previous CurrentUserId setting */
      bool        prevSecDefCxt;    /* previous SecurityDefinerContext setting */
      bool        prevXactReadOnly;        /* entry-time xact r/o state */
***************
*** 156,162 ****
      0,                            /* GUC context nesting depth */
      NULL,                        /* cur transaction context */
      NULL,                        /* cur transaction resource owner */
!     NIL,                        /* subcommitted child Xids */
      InvalidOid,                    /* previous CurrentUserId setting */
      false,                        /* previous SecurityDefinerContext setting */
      false,                        /* entry-time xact r/o state */
--- 157,164 ----
      0,                            /* GUC context nesting depth */
      NULL,                        /* cur transaction context */
      NULL,                        /* cur transaction resource owner */
!     NULL,                        /* subcommitted child Xids */
!     0,                            /* # of subcommitted child Xids */
      InvalidOid,                    /* previous CurrentUserId setting */
      false,                        /* previous SecurityDefinerContext setting */
      false,                        /* entry-time xact r/o state */
***************
*** 559,565 ****
       */
      for (s = CurrentTransactionState; s != NULL; s = s->parent)
      {
!         ListCell   *cell;

          if (s->state == TRANS_ABORT)
              continue;
--- 561,567 ----
       */
      for (s = CurrentTransactionState; s != NULL; s = s->parent)
      {
!         int low, high;

          if (s->state == TRANS_ABORT)
              continue;
***************
*** 567,576 ****
              continue;            /* it can't have any child XIDs either */
          if (TransactionIdEquals(xid, s->transactionId))
              return true;
!         foreach(cell, s->childXids)
          {
!             if (TransactionIdEquals(xid, lfirst_xid(cell)))
                  return true;
          }
      }

--- 569,594 ----
              continue;            /* it can't have any child XIDs either */
          if (TransactionIdEquals(xid, s->transactionId))
              return true;
!
!         /*
!          * Return true for any committed child of this subtransaction. As
!          * the childXids array is ordered, we can binary search.
!          */
!         low = 0;
!         high = s->nChildXids - 1;
!         while (low <= high)
          {
!             int                middle;
!             TransactionId    probe;
!
!             middle = low + (high - low) / 2;
!             probe = s->childXids[middle];
!             if (TransactionIdEquals(probe, xid))
                  return true;
+             else if (TransactionIdPrecedes(probe, xid))
+                 low = middle + 1;
+             else
+                 high = middle - 1;
          }
      }

***************
*** 1068,1073 ****
--- 1086,1093 ----
  {
      TransactionState s = CurrentTransactionState;
      MemoryContext old_cxt;
+     TransactionId *new_childXids;
+     TransactionId new_nChildXids;

      Assert(s->parent != NULL);

***************
*** 1078,1098 ****
       */
      old_cxt = MemoryContextSwitchTo(TopTransactionContext);

!     s->parent->childXids = lappend_xid(s->parent->childXids,
!                                        s->transactionId);

!     if (s->childXids != NIL)
!     {
!         s->parent->childXids = list_concat(s->parent->childXids,
!                                            s->childXids);

!         /*
!          * list_concat doesn't free the list header for the second list; do so
!          * here to avoid memory leakage (kluge)
!          */
          pfree(s->childXids);
-         s->childXids = NIL;
-     }

      MemoryContextSwitchTo(old_cxt);
  }
--- 1098,1132 ----
       */
      old_cxt = MemoryContextSwitchTo(TopTransactionContext);

!     /* Enlarge the parent array so that it can hold all the new xids */
!     new_nChildXids = s->parent->nChildXids + s->nChildXids + 1;

!     if (s->parent->childXids == NULL)
!         new_childXids = palloc((new_nChildXids + 10) * sizeof(TransactionId));
!     else
!         new_childXids = repalloc(s->parent->childXids,
!                                  new_nChildXids * sizeof(TransactionId));

!     /*
!      * Copy all my XIDs to parent's array.
!      *
!      * Note: We rely on the fact that the XID of a child always follows that
!      * of its parent's. By copying the XID of this subtransaction before the
!      * XIDs of its children, we ensure that the array stays ordered. Likewise,
!      * all XIDs already in the array belong to our parent, or subtransactions
!      * started and committed before us, so their XIDs must precede ours.
!      */
!     new_childXids[s->parent->nChildXids] = s->transactionId;
!
!     if (s->nChildXids > 0)
!         memcpy(&new_childXids[s->parent->nChildXids + 1],
!                s->childXids, s->nChildXids * sizeof(TransactionId));
!
!     s->parent->childXids  = new_childXids;
!     s->parent->nChildXids = new_nChildXids;
!
!     if (s->childXids != NULL)
          pfree(s->childXids);

      MemoryContextSwitchTo(old_cxt);
  }
***************
*** 1336,1343 ****
       * AtSubCommit_childXids).    This means we'd better free the list
       * explicitly at abort to avoid leakage.
       */
!     list_free(s->childXids);
!     s->childXids = NIL;
  }

  /* ----------------------------------------------------------------
--- 1370,1378 ----
       * AtSubCommit_childXids).    This means we'd better free the list
       * explicitly at abort to avoid leakage.
       */
!     if (s->childXids != NULL)
!         pfree(s->childXids);
!     s->nChildXids = 0;
  }

  /* ----------------------------------------------------------------
***************
*** 1506,1512 ****
       */
      s->nestingLevel = 1;
      s->gucNestLevel = 1;
!     s->childXids = NIL;
      GetUserIdAndContext(&s->prevUser, &s->prevSecDefCxt);
      /* SecurityDefinerContext should never be set outside a transaction */
      Assert(!s->prevSecDefCxt);
--- 1541,1548 ----
       */
      s->nestingLevel = 1;
      s->gucNestLevel = 1;
!     s->nChildXids = 0;
!     s->childXids = NULL;
      GetUserIdAndContext(&s->prevUser, &s->prevSecDefCxt);
      /* SecurityDefinerContext should never be set outside a transaction */
      Assert(!s->prevSecDefCxt);
***************
*** 1702,1708 ****
      s->subTransactionId = InvalidSubTransactionId;
      s->nestingLevel = 0;
      s->gucNestLevel = 0;
!     s->childXids = NIL;

      /*
       * done with commit processing, set current transaction state back to
--- 1738,1745 ----
      s->subTransactionId = InvalidSubTransactionId;
      s->nestingLevel = 0;
      s->gucNestLevel = 0;
!     s->nChildXids = 0;
!     s->childXids = NULL;

      /*
       * done with commit processing, set current transaction state back to
***************
*** 1931,1937 ****
      s->subTransactionId = InvalidSubTransactionId;
      s->nestingLevel = 0;
      s->gucNestLevel = 0;
!     s->childXids = NIL;

      /*
       * done with 1st phase commit processing, set current transaction state
--- 1968,1975 ----
      s->subTransactionId = InvalidSubTransactionId;
      s->nestingLevel = 0;
      s->gucNestLevel = 0;
!     s->nChildXids = 0;
!     s->childXids = NULL;

      /*
       * done with 1st phase commit processing, set current transaction state
***************
*** 2101,2107 ****
      s->subTransactionId = InvalidSubTransactionId;
      s->nestingLevel = 0;
      s->gucNestLevel = 0;
!     s->childXids = NIL;

      /*
       * done with abort processing, set current transaction state back to
--- 2139,2146 ----
      s->subTransactionId = InvalidSubTransactionId;
      s->nestingLevel = 0;
      s->gucNestLevel = 0;
!     s->nChildXids = 0;
!     s->childXids = NULL;

      /*
       * done with abort processing, set current transaction state back to
***************
*** 4051,4056 ****
--- 4090,4108 ----
  static void
  ShowTransactionStateRec(TransactionState s)
  {
+     StringInfoData buf;
+
+     initStringInfo(&buf);
+
+     if (s->nChildXids > 0)
+     {
+         int i;
+
+         appendStringInfo(&buf, "%u", s->childXids[0]);
+         for (i = 1; i < s->nChildXids; i++)
+             appendStringInfo(&buf, " %u", s->childXids[i]);
+     }
+
      if (s->parent)
          ShowTransactionStateRec(s->parent);

***************
*** 4064,4071 ****
                               (unsigned int) s->subTransactionId,
                               (unsigned int) currentCommandId,
                               currentCommandIdUsed ? " (used)" : "",
!                              s->nestingLevel,
!                              nodeToString(s->childXids))));
  }

  /*
--- 4116,4122 ----
                               (unsigned int) s->subTransactionId,
                               (unsigned int) currentCommandId,
                               currentCommandIdUsed ? " (used)" : "",
!                              s->nestingLevel, buf.data)));
  }

  /*
***************
*** 4154,4162 ****
      TransactionState s = CurrentTransactionState;
      int            nchildren;
      TransactionId *children;
-     ListCell   *p;

!     nchildren = list_length(s->childXids);
      if (nchildren == 0)
      {
          *ptr = NULL;
--- 4205,4212 ----
      TransactionState s = CurrentTransactionState;
      int            nchildren;
      TransactionId *children;

!     nchildren = s->nChildXids;
      if (nchildren == 0)
      {
          *ptr = NULL;
***************
*** 4166,4177 ****
      children = (TransactionId *) palloc(nchildren * sizeof(TransactionId));
      *ptr = children;

!     foreach(p, s->childXids)
!     {
!         TransactionId child = lfirst_xid(p);
!
!         *children++ = child;
!     }

      return nchildren;
  }
--- 4216,4222 ----
      children = (TransactionId *) palloc(nchildren * sizeof(TransactionId));
      *ptr = children;

!     memcpy(children, s->childXids, nchildren * sizeof(TransactionId));

      return nchildren;
  }
Index: src/include/nodes/pg_list.h
===================================================================
RCS file: /home/hlinnaka/pgcvsrepository/pgsql/src/include/nodes/pg_list.h,v
retrieving revision 1.57
diff -c -r1.57 pg_list.h
*** src/include/nodes/pg_list.h    1 Jan 2008 19:45:58 -0000    1.57
--- src/include/nodes/pg_list.h    11 Mar 2008 11:56:21 -0000
***************
*** 26,36 ****
   * (At the moment, ints and Oids are the same size, but they may not
   * always be so; try to be careful to maintain the distinction.)
   *
-  * There is also limited support for lists of TransactionIds; since these
-  * are used in only one or two places, we don't provide a full implementation,
-  * but map them onto Oid lists.  This effectively assumes that TransactionId
-  * is no wider than Oid and both are unsigned types.
-  *
   *
   * Portions Copyright (c) 1996-2008, PostgreSQL Global Development Group
   * Portions Copyright (c) 1994, Regents of the University of California
--- 26,31 ----
***************
*** 160,171 ****
  #define list_make4_oid(x1,x2,x3,x4) lcons_oid(x1, list_make3_oid(x2, x3, x4))

  /*
-  * Limited support for lists of TransactionIds, mapped onto lists of Oids
-  */
- #define lfirst_xid(lc)                ((TransactionId) lfirst_oid(lc))
- #define lappend_xid(list, datum)    lappend_oid(list, (Oid) (datum))
-
- /*
   * foreach -
   *      a convenience macro which loops through the list
   */
--- 155,160 ----

pgsql-patches by date:

Previous
From: Zoltan Boszormenyi
Date:
Subject: Fix HAVE_LONG[_LONG]_INT_64 to really define to 1
Next
From: "Heikki Linnakangas"
Date:
Subject: Re: TransactionIdIsInProgress() cache