Thread: Improvement of procArray.xmin for VACUUM

Improvement of procArray.xmin for VACUUM

From
Bruce Momjian
Date:
VACUUM treats dead tuple with an xmax less than all procArray.xmin's as
invisible with space ready to be reused.

Right now, we set procArray.xmin at transaction start to the required
SERIALIZABLE value.  We currently don't update procArray.xmin when we
know we are in a READ COMMITTED transaction, even though we already call
GetSnapshotData().

This means that if a transaction completes that was active when our
transaction started, we don't update our procArray.xmin during the next
multi-statement transaction command to indicate that we don't care about
the completed transaction anymore.

I have been thinking we could improve how quickly VACUUM can expire rows
if we update procArray.xmin more frequently for non-SERIALIZABLE
transactions.

The attached patch updates procArray.xmin in this manner.  Here is an
example of how the patch improves dead row reuse:

Session #:
        1                               2               3

        CREATE TABLE test(x int);
        INSERT INTO test VALUES (1);
        BEGIN;
        DELETE FROM test;
                                        BEGIN;
                                        SELECT 1;
        COMMIT;
                                                        VACUUM VERBOSE test;
                                                        (row not reused)
                                        SELECT 1;

                                        (At this point #2 doesn't see
                                         the test row anymore.  Patch
                                         updates procArray.xmin.)

                                                        VACUUM VERBOSE test;
                                                        (row reused with patch)
                                        COMMIT;
                                                        VACUUM VERBOSE test;
                                                        (normal row reuse)

What the patch does is to recompute procArray.xmin during the second
SELECT, which is then used by the next VACUUM.

The patch has debug code that prints the procArray.xmin assignments.  I
am not sure if I have all the boolean calls to GetTransactionSnapshot()
correct.  The set_procArray_xmin parameter should be true only when we
are sure we aren't going to be resetting the session back to an earlier
snapshot.

The major affect of the patch is to set the minimum procArray.xmin to be
the earliest in-process transaction, rather than the earliest in-process
transaction at transaction start.

--
  Bruce Momjian  <bruce@momjian.us>          http://momjian.us
  EnterpriseDB                               http://www.enterprisedb.com

  + If your life is a hard drive, Christ can be your backup. +
Index: src/backend/catalog/index.c
===================================================================
RCS file: /cvsroot/pgsql/src/backend/catalog/index.c,v
retrieving revision 1.280
diff -c -c -r1.280 index.c
*** src/backend/catalog/index.c    3 Mar 2007 20:08:41 -0000    1.280
--- src/backend/catalog/index.c    23 Mar 2007 19:20:44 -0000
***************
*** 1394,1400 ****
      }
      else if (indexInfo->ii_Concurrent)
      {
!         snapshot = CopySnapshot(GetTransactionSnapshot());
          OldestXmin = InvalidTransactionId;        /* not used */
      }
      else
--- 1394,1400 ----
      }
      else if (indexInfo->ii_Concurrent)
      {
!         snapshot = CopySnapshot(GetTransactionSnapshot(false));
          OldestXmin = InvalidTransactionId;        /* not used */
      }
      else
Index: src/backend/commands/cluster.c
===================================================================
RCS file: /cvsroot/pgsql/src/backend/commands/cluster.c,v
retrieving revision 1.157
diff -c -c -r1.157 cluster.c
*** src/backend/commands/cluster.c    13 Mar 2007 00:33:39 -0000    1.157
--- src/backend/commands/cluster.c    23 Mar 2007 19:20:45 -0000
***************
*** 204,210 ****
              /* Start a new transaction for each relation. */
              StartTransactionCommand();
              /* functions in indexes may want a snapshot set */
!             ActiveSnapshot = CopySnapshot(GetTransactionSnapshot());
              cluster_rel(rvtc, true);
              CommitTransactionCommand();
          }
--- 204,210 ----
              /* Start a new transaction for each relation. */
              StartTransactionCommand();
              /* functions in indexes may want a snapshot set */
!             ActiveSnapshot = CopySnapshot(GetTransactionSnapshot(true));
              cluster_rel(rvtc, true);
              CommitTransactionCommand();
          }
Index: src/backend/commands/indexcmds.c
===================================================================
RCS file: /cvsroot/pgsql/src/backend/commands/indexcmds.c,v
retrieving revision 1.157
diff -c -c -r1.157 indexcmds.c
*** src/backend/commands/indexcmds.c    13 Mar 2007 00:33:39 -0000    1.157
--- src/backend/commands/indexcmds.c    23 Mar 2007 19:20:45 -0000
***************
*** 493,499 ****
       * We also set ActiveSnapshot to this snap, since functions in indexes may
       * need a snapshot.
       */
!     snapshot = CopySnapshot(GetTransactionSnapshot());
      ActiveSnapshot = snapshot;

      /*
--- 493,499 ----
       * We also set ActiveSnapshot to this snap, since functions in indexes may
       * need a snapshot.
       */
!     snapshot = CopySnapshot(GetTransactionSnapshot(false));
      ActiveSnapshot = snapshot;

      /*
***************
*** 1304,1310 ****

          StartTransactionCommand();
          /* functions in indexes may want a snapshot set */
!         ActiveSnapshot = CopySnapshot(GetTransactionSnapshot());
          if (reindex_relation(relid, true))
              ereport(NOTICE,
                      (errmsg("table \"%s\" was reindexed",
--- 1304,1310 ----

          StartTransactionCommand();
          /* functions in indexes may want a snapshot set */
!         ActiveSnapshot = CopySnapshot(GetTransactionSnapshot(true));
          if (reindex_relation(relid, true))
              ereport(NOTICE,
                      (errmsg("table \"%s\" was reindexed",
Index: src/backend/commands/trigger.c
===================================================================
RCS file: /cvsroot/pgsql/src/backend/commands/trigger.c,v
retrieving revision 1.214
diff -c -c -r1.214 trigger.c
*** src/backend/commands/trigger.c    19 Mar 2007 23:38:29 -0000    1.214
--- src/backend/commands/trigger.c    23 Mar 2007 19:20:46 -0000
***************
*** 2645,2651 ****
       */
      events = &afterTriggers->events;
      if (events->head != NULL)
!         ActiveSnapshot = CopySnapshot(GetTransactionSnapshot());

      /*
       * Run all the remaining triggers.    Loop until they are all gone, just in
--- 2645,2651 ----
       */
      events = &afterTriggers->events;
      if (events->head != NULL)
!         ActiveSnapshot = CopySnapshot(GetTransactionSnapshot(false));

      /*
       * Run all the remaining triggers.    Loop until they are all gone, just in
Index: src/backend/commands/vacuum.c
===================================================================
RCS file: /cvsroot/pgsql/src/backend/commands/vacuum.c,v
retrieving revision 1.349
diff -c -c -r1.349 vacuum.c
*** src/backend/commands/vacuum.c    14 Mar 2007 18:48:55 -0000    1.349
--- src/backend/commands/vacuum.c    23 Mar 2007 19:20:47 -0000
***************
*** 412,418 ****
                  {
                      StartTransactionCommand();
                      /* functions in indexes may want a snapshot set */
!                     ActiveSnapshot = CopySnapshot(GetTransactionSnapshot());
                  }
                  else
                      old_context = MemoryContextSwitchTo(anl_context);
--- 412,418 ----
                  {
                      StartTransactionCommand();
                      /* functions in indexes may want a snapshot set */
!                     ActiveSnapshot = CopySnapshot(GetTransactionSnapshot(false));
                  }
                  else
                      old_context = MemoryContextSwitchTo(anl_context);
***************
*** 470,476 ****
           * necessary if we are called from autovacuum because autovacuum might
           * continue on to do an ANALYZE-only call.
           */
!         ActiveSnapshot = CopySnapshot(GetTransactionSnapshot());
      }

      if (vacstmt->vacuum && !IsAutoVacuumWorkerProcess())
--- 470,476 ----
           * necessary if we are called from autovacuum because autovacuum might
           * continue on to do an ANALYZE-only call.
           */
!         ActiveSnapshot = CopySnapshot(GetTransactionSnapshot(false));
      }

      if (vacstmt->vacuum && !IsAutoVacuumWorkerProcess())
***************
*** 964,970 ****
      if (vacstmt->full)
      {
          /* functions in indexes may want a snapshot set */
!         ActiveSnapshot = CopySnapshot(GetTransactionSnapshot());
      }
      else
      {
--- 964,970 ----
      if (vacstmt->full)
      {
          /* functions in indexes may want a snapshot set */
!         ActiveSnapshot = CopySnapshot(GetTransactionSnapshot(false));
      }
      else
      {
Index: src/backend/executor/functions.c
===================================================================
RCS file: /cvsroot/pgsql/src/backend/executor/functions.c,v
retrieving revision 1.112
diff -c -c -r1.112 functions.c
*** src/backend/executor/functions.c    13 Mar 2007 00:33:40 -0000    1.112
--- src/backend/executor/functions.c    23 Mar 2007 19:20:48 -0000
***************
*** 300,306 ****
      else
      {
          CommandCounterIncrement();
!         snapshot = CopySnapshot(GetTransactionSnapshot());
      }

      if (IsA(es->stmt, PlannedStmt))
--- 300,306 ----
      else
      {
          CommandCounterIncrement();
!         snapshot = CopySnapshot(GetTransactionSnapshot(false));
      }

      if (IsA(es->stmt, PlannedStmt))
Index: src/backend/executor/spi.c
===================================================================
RCS file: /cvsroot/pgsql/src/backend/executor/spi.c,v
retrieving revision 1.173
diff -c -c -r1.173 spi.c
*** src/backend/executor/spi.c    17 Mar 2007 03:15:38 -0000    1.173
--- src/backend/executor/spi.c    23 Mar 2007 19:20:48 -0000
***************
*** 997,1003 ****
      else
      {
          CommandCounterIncrement();
!         snapshot = GetTransactionSnapshot();
      }

      /*
--- 997,1003 ----
      else
      {
          CommandCounterIncrement();
!         snapshot = GetTransactionSnapshot(false);
      }

      /*
***************
*** 1530,1536 ****
                      if (read_only)
                          ActiveSnapshot = CopySnapshot(saveActiveSnapshot);
                      else
!                         ActiveSnapshot = CopySnapshot(GetTransactionSnapshot());
                  }
                  else
                  {
--- 1530,1536 ----
                      if (read_only)
                          ActiveSnapshot = CopySnapshot(saveActiveSnapshot);
                      else
!                         ActiveSnapshot = CopySnapshot(GetTransactionSnapshot(false));
                  }
                  else
                  {
Index: src/backend/postmaster/autovacuum.c
===================================================================
RCS file: /cvsroot/pgsql/src/backend/postmaster/autovacuum.c,v
retrieving revision 1.34
diff -c -c -r1.34 autovacuum.c
*** src/backend/postmaster/autovacuum.c    13 Mar 2007 00:33:41 -0000    1.34
--- src/backend/postmaster/autovacuum.c    23 Mar 2007 19:20:49 -0000
***************
*** 868,874 ****
      StartTransactionCommand();

      /* functions in indexes may want a snapshot set */
!     ActiveSnapshot = CopySnapshot(GetTransactionSnapshot());

      /*
       * Clean up any dead statistics collector entries for this DB. We always
--- 868,874 ----
      StartTransactionCommand();

      /* functions in indexes may want a snapshot set */
!     ActiveSnapshot = CopySnapshot(GetTransactionSnapshot(false));

      /*
       * Clean up any dead statistics collector entries for this DB. We always
Index: src/backend/storage/ipc/procarray.c
===================================================================
RCS file: /cvsroot/pgsql/src/backend/storage/ipc/procarray.c,v
retrieving revision 1.22
diff -c -c -r1.22 procarray.c
*** src/backend/storage/ipc/procarray.c    23 Mar 2007 03:16:39 -0000    1.22
--- src/backend/storage/ipc/procarray.c    23 Mar 2007 19:20:49 -0000
***************
*** 498,504 ****
   *----------
   */
  Snapshot
! GetSnapshotData(Snapshot snapshot, bool serializable)
  {
      ProcArrayStruct *arrayP = procArray;
      TransactionId xmin;
--- 498,504 ----
   *----------
   */
  Snapshot
! GetSnapshotData(Snapshot snapshot, bool set_procArray_xmin)
  {
      ProcArrayStruct *arrayP = procArray;
      TransactionId xmin;
***************
*** 510,519 ****

      Assert(snapshot != NULL);

!     /* Serializable snapshot must be computed before any other... */
!     Assert(serializable ?
!            !TransactionIdIsValid(MyProc->xmin) :
!            TransactionIdIsValid(MyProc->xmin));

      /*
       * Allocating space for maxProcs xids is usually overkill; numProcs would
--- 510,517 ----

      Assert(snapshot != NULL);

!     /* Either we set an xmin, or we already have one.  */
!     Assert(set_procArray_xmin || TransactionIdIsValid(MyProc->xmin));

      /*
       * Allocating space for maxProcs xids is usually overkill; numProcs would
***************
*** 656,664 ****
          }
      }

!     if (serializable)
          MyProc->xmin = TransactionXmin = xmin;
!
      LWLockRelease(ProcArrayLock);

      /*
--- 654,668 ----
          }
      }

!     if (set_procArray_xmin)
!     {
!         if (xmin != MyProc->xmin)
!             fprintf(stderr, "%x:%x: set new xmin old=%x new=%x\n", MyProcPid, MyProc->xid, MyProc->xmin, xmin);
!         else
!             fprintf(stderr, "%x:%x: not set\n", MyProcPid, MyProc->xid);
          MyProc->xmin = TransactionXmin = xmin;
!     }
!
      LWLockRelease(ProcArrayLock);

      /*
Index: src/backend/tcop/fastpath.c
===================================================================
RCS file: /cvsroot/pgsql/src/backend/tcop/fastpath.c,v
retrieving revision 1.96
diff -c -c -r1.96 fastpath.c
*** src/backend/tcop/fastpath.c    3 Mar 2007 19:32:54 -0000    1.96
--- src/backend/tcop/fastpath.c    23 Mar 2007 19:20:49 -0000
***************
*** 308,314 ****
       * Now that we know we are in a valid transaction, set snapshot in case
       * needed by function itself or one of the datatype I/O routines.
       */
!     ActiveSnapshot = CopySnapshot(GetTransactionSnapshot());

      /*
       * Begin parsing the buffer contents.
--- 308,314 ----
       * Now that we know we are in a valid transaction, set snapshot in case
       * needed by function itself or one of the datatype I/O routines.
       */
!     ActiveSnapshot = CopySnapshot(GetTransactionSnapshot(false));

      /*
       * Begin parsing the buffer contents.
Index: src/backend/tcop/postgres.c
===================================================================
RCS file: /cvsroot/pgsql/src/backend/tcop/postgres.c,v
retrieving revision 1.529
diff -c -c -r1.529 postgres.c
*** src/backend/tcop/postgres.c    22 Mar 2007 19:55:04 -0000    1.529
--- src/backend/tcop/postgres.c    23 Mar 2007 19:20:51 -0000
***************
*** 738,744 ****
          {
              if (needSnapshot)
              {
!                 ActiveSnapshot = CopySnapshot(GetTransactionSnapshot());
                  needSnapshot = false;
              }
              stmt = (Node *) pg_plan_query(query, boundParams);
--- 738,744 ----
          {
              if (needSnapshot)
              {
!                 ActiveSnapshot = CopySnapshot(GetTransactionSnapshot(true));
                  needSnapshot = false;
              }
              stmt = (Node *) pg_plan_query(query, boundParams);
Index: src/backend/tcop/pquery.c
===================================================================
RCS file: /cvsroot/pgsql/src/backend/tcop/pquery.c,v
retrieving revision 1.115
diff -c -c -r1.115 pquery.c
*** src/backend/tcop/pquery.c    13 Mar 2007 00:33:42 -0000    1.115
--- src/backend/tcop/pquery.c    23 Mar 2007 19:20:51 -0000
***************
*** 154,160 ****
       * Must always set snapshot for plannable queries.    Note we assume that
       * caller will take care of restoring ActiveSnapshot on exit/error.
       */
!     ActiveSnapshot = CopySnapshot(GetTransactionSnapshot());

      /*
       * Create the QueryDesc object
--- 154,160 ----
       * Must always set snapshot for plannable queries.    Note we assume that
       * caller will take care of restoring ActiveSnapshot on exit/error.
       */
!     ActiveSnapshot = CopySnapshot(GetTransactionSnapshot(false));

      /*
       * Create the QueryDesc object
***************
*** 489,495 ****
                  if (snapshot)
                      ActiveSnapshot = CopySnapshot(snapshot);
                  else
!                     ActiveSnapshot = CopySnapshot(GetTransactionSnapshot());

                  /*
                   * Create QueryDesc in portal's context; for the moment, set
--- 489,495 ----
                  if (snapshot)
                      ActiveSnapshot = CopySnapshot(snapshot);
                  else
!                     ActiveSnapshot = CopySnapshot(GetTransactionSnapshot(false));

                  /*
                   * Create QueryDesc in portal's context; for the moment, set
***************
*** 1163,1169 ****
            IsA(utilityStmt, NotifyStmt) ||
            IsA(utilityStmt, UnlistenStmt) ||
            IsA(utilityStmt, CheckPointStmt)))
!         ActiveSnapshot = CopySnapshot(GetTransactionSnapshot());
      else
          ActiveSnapshot = NULL;

--- 1163,1169 ----
            IsA(utilityStmt, NotifyStmt) ||
            IsA(utilityStmt, UnlistenStmt) ||
            IsA(utilityStmt, CheckPointStmt)))
!         ActiveSnapshot = CopySnapshot(GetTransactionSnapshot(false));
      else
          ActiveSnapshot = NULL;

Index: src/backend/utils/adt/ri_triggers.c
===================================================================
RCS file: /cvsroot/pgsql/src/backend/utils/adt/ri_triggers.c,v
retrieving revision 1.92
diff -c -c -r1.92 ri_triggers.c
*** src/backend/utils/adt/ri_triggers.c    15 Mar 2007 23:12:06 -0000    1.92
--- src/backend/utils/adt/ri_triggers.c    23 Mar 2007 19:20:52 -0000
***************
*** 3283,3289 ****
      {
          CommandCounterIncrement();        /* be sure all my own work is visible */
          test_snapshot = CopySnapshot(GetLatestSnapshot());
!         crosscheck_snapshot = CopySnapshot(GetTransactionSnapshot());
      }
      else
      {
--- 3283,3289 ----
      {
          CommandCounterIncrement();        /* be sure all my own work is visible */
          test_snapshot = CopySnapshot(GetLatestSnapshot());
!         crosscheck_snapshot = CopySnapshot(GetTransactionSnapshot(false));
      }
      else
      {
Index: src/backend/utils/time/tqual.c
===================================================================
RCS file: /cvsroot/pgsql/src/backend/utils/time/tqual.c,v
retrieving revision 1.101
diff -c -c -r1.101 tqual.c
*** src/backend/utils/time/tqual.c    5 Jan 2007 22:19:47 -0000    1.101
--- src/backend/utils/time/tqual.c    23 Mar 2007 19:20:53 -0000
***************
*** 1202,1208 ****
   * the result with CopySnapshot() if it is to be used very long.
   */
  Snapshot
! GetTransactionSnapshot(void)
  {
      /* First call in transaction? */
      if (SerializableSnapshot == NULL)
--- 1202,1208 ----
   * the result with CopySnapshot() if it is to be used very long.
   */
  Snapshot
! GetTransactionSnapshot(bool set_procArray_xmin)
  {
      /* First call in transaction? */
      if (SerializableSnapshot == NULL)
***************
*** 1214,1220 ****
      if (IsXactIsoLevelSerializable)
          return SerializableSnapshot;

!     LatestSnapshot = GetSnapshotData(&LatestSnapshotData, false);

      return LatestSnapshot;
  }
--- 1214,1221 ----
      if (IsXactIsoLevelSerializable)
          return SerializableSnapshot;

!     LatestSnapshot = GetSnapshotData(&LatestSnapshotData,
!     /* we know we are not Serializable */ set_procArray_xmin);

      return LatestSnapshot;
  }
Index: src/include/utils/tqual.h
===================================================================
RCS file: /cvsroot/pgsql/src/include/utils/tqual.h,v
retrieving revision 1.65
diff -c -c -r1.65 tqual.h
*** src/include/utils/tqual.h    5 Jan 2007 22:19:59 -0000    1.65
--- src/include/utils/tqual.h    23 Mar 2007 19:20:54 -0000
***************
*** 142,154 ****
  extern HTSV_Result HeapTupleSatisfiesVacuum(HeapTupleHeader tuple,
                           TransactionId OldestXmin, Buffer buffer);

! extern Snapshot GetTransactionSnapshot(void);
  extern Snapshot GetLatestSnapshot(void);
  extern Snapshot CopySnapshot(Snapshot snapshot);
  extern void FreeSnapshot(Snapshot snapshot);
  extern void FreeXactSnapshot(void);

  /* in procarray.c; declared here to avoid including tqual.h in procarray.h: */
! extern Snapshot GetSnapshotData(Snapshot snapshot, bool serializable);

  #endif   /* TQUAL_H */
--- 142,154 ----
  extern HTSV_Result HeapTupleSatisfiesVacuum(HeapTupleHeader tuple,
                           TransactionId OldestXmin, Buffer buffer);

! extern Snapshot GetTransactionSnapshot(bool set_procArray_xmin);
  extern Snapshot GetLatestSnapshot(void);
  extern Snapshot CopySnapshot(Snapshot snapshot);
  extern void FreeSnapshot(Snapshot snapshot);
  extern void FreeXactSnapshot(void);

  /* in procarray.c; declared here to avoid including tqual.h in procarray.h: */
! extern Snapshot GetSnapshotData(Snapshot snapshot, bool set_procArray_xmin);

  #endif   /* TQUAL_H */
Index: src/pl/plpgsql/src/pl_exec.c
===================================================================
RCS file: /cvsroot/pgsql/src/pl/plpgsql/src/pl_exec.c,v
retrieving revision 1.190
diff -c -c -r1.190 pl_exec.c
*** src/pl/plpgsql/src/pl_exec.c    15 Mar 2007 23:12:07 -0000    1.190
--- src/pl/plpgsql/src/pl_exec.c    23 Mar 2007 19:20:55 -0000
***************
*** 4124,4130 ****
          if (!estate->readonly_func)
          {
              CommandCounterIncrement();
!             ActiveSnapshot = CopySnapshot(GetTransactionSnapshot());
          }

          /*
--- 4124,4130 ----
          if (!estate->readonly_func)
          {
              CommandCounterIncrement();
!             ActiveSnapshot = CopySnapshot(GetTransactionSnapshot(false));
          }

          /*

Re: Improvement of procArray.xmin for VACUUM

From
Alvaro Herrera
Date:
Bruce Momjian wrote:

> The attached patch updates procArray.xmin in this manner.  Here is an
> example of how the patch improves dead row reuse:

I don't think this really works.  Consider what happens if I change
session 2 this way:

> Session #:
>         1                               2               3
>
>         CREATE TABLE test(x int);
>         INSERT INTO test VALUES (1);
>         BEGIN;
>         DELETE FROM test;
>                                         BEGIN;
                                          DECLARE foo CURSOR FOR
                      SELECT * FROM test;
>                                         SELECT 1;
>         COMMIT;
>                                                         VACUUM VERBOSE test;
>                                                         (row not reused)
>                                         SELECT 1;
                                          FETCH * FROM foo;
>
>                                         (At this point #2 doesn't see
>                                          the test row anymore.  Patch
>                                          updates procArray.xmin.)
>
>                                                         VACUUM VERBOSE test;
>                                                         (row reused with patch)
>                                         COMMIT;
>                                                         VACUUM VERBOSE test;
>                                                         (normal row reuse)


--
Alvaro Herrera                                http://www.CommandPrompt.com/
The PostgreSQL Company - Command Prompt, Inc.

Re: Improvement of procArray.xmin for VACUUM

From
Tom Lane
Date:
Bruce Momjian <bruce@momjian.us> writes:
> I have been thinking we could improve how quickly VACUUM can expire rows
> if we update procArray.xmin more frequently for non-SERIALIZABLE
> transactions.
> The attached patch updates procArray.xmin in this manner.

This patch is incredibly broken.  Didn't you understand what I said
about how we don't track which snapshots are still alive?  You can't
advance procArray.xmin past the xmin of the oldest live snapshot in the
backend, and you can't assume that there are no live snapshots at the
places where this patch assumes that.  (Open cursors are one obvious
counterexample, but I think there are more.)

To make intra-transaction advancing of xmin possible, we'd need to
explicitly track all of the backend's live snapshots, probably by
introducing a "snapshot cache" manager that gives out tracked refcounts
as we do for some other structures like catcache entries.  This might
have some other advantages (I think most of the current CopySnapshot
operations could be replaced by refcount increments) but it's a whole
lot more complicated and invasive than what you've got here.

            regards, tom lane

Re: Improvement of procArray.xmin for VACUUM

From
Bruce Momjian
Date:
Tom Lane wrote:
> Bruce Momjian <bruce@momjian.us> writes:
> > I have been thinking we could improve how quickly VACUUM can expire rows
> > if we update procArray.xmin more frequently for non-SERIALIZABLE
> > transactions.
> > The attached patch updates procArray.xmin in this manner.
>
> This patch is incredibly broken.  Didn't you understand what I said
> about how we don't track which snapshots are still alive?  You can't
> advance procArray.xmin past the xmin of the oldest live snapshot in the
> backend, and you can't assume that there are no live snapshots at the
> places where this patch assumes that.  (Open cursors are one obvious
> counterexample, but I think there are more.)
>
> To make intra-transaction advancing of xmin possible, we'd need to
> explicitly track all of the backend's live snapshots, probably by
> introducing a "snapshot cache" manager that gives out tracked refcounts
> as we do for some other structures like catcache entries.  This might
> have some other advantages (I think most of the current CopySnapshot
> operations could be replaced by refcount increments) but it's a whole
> lot more complicated and invasive than what you've got here.

I updated the patch to save the MyProc->xid at the time the first cursor
is created, and not allow the MyProc->xid to be set lower than that
saved value in the current transaction.  It added only a few more lines
to the patch.

--
  Bruce Momjian  <bruce@momjian.us>          http://momjian.us
  EnterpriseDB                               http://www.enterprisedb.com

  + If your life is a hard drive, Christ can be your backup. +
Index: src/backend/access/transam/xact.c
===================================================================
RCS file: /cvsroot/pgsql/src/backend/access/transam/xact.c,v
retrieving revision 1.238
diff -c -c -r1.238 xact.c
*** src/backend/access/transam/xact.c    22 Mar 2007 19:55:04 -0000    1.238
--- src/backend/access/transam/xact.c    24 Mar 2007 02:18:41 -0000
***************
*** 1563,1569 ****

          LWLockRelease(ProcArrayLock);
      }
!
      PG_TRACE1(transaction__commit, s->transactionId);

      /*
--- 1563,1570 ----

          LWLockRelease(ProcArrayLock);
      }
!     set_portal_min_xid(InvalidTransactionId);
!
      PG_TRACE1(transaction__commit, s->transactionId);

      /*
***************
*** 1802,1807 ****
--- 1803,1810 ----

      LWLockRelease(ProcArrayLock);

+     set_portal_min_xid(InvalidTransactionId);
+
      /*
       * This is all post-transaction cleanup.  Note that if an error is raised
       * here, it's too late to abort the transaction.  This should be just
***************
*** 1969,1974 ****
--- 1972,1979 ----
          LWLockRelease(ProcArrayLock);
      }

+     set_portal_min_xid(InvalidTransactionId);
+
      PG_TRACE1(transaction__abort, s->transactionId);

      /*
Index: src/backend/catalog/index.c
===================================================================
RCS file: /cvsroot/pgsql/src/backend/catalog/index.c,v
retrieving revision 1.280
diff -c -c -r1.280 index.c
*** src/backend/catalog/index.c    3 Mar 2007 20:08:41 -0000    1.280
--- src/backend/catalog/index.c    24 Mar 2007 02:18:42 -0000
***************
*** 1394,1400 ****
      }
      else if (indexInfo->ii_Concurrent)
      {
!         snapshot = CopySnapshot(GetTransactionSnapshot());
          OldestXmin = InvalidTransactionId;        /* not used */
      }
      else
--- 1394,1400 ----
      }
      else if (indexInfo->ii_Concurrent)
      {
!         snapshot = CopySnapshot(GetTransactionSnapshot(false));
          OldestXmin = InvalidTransactionId;        /* not used */
      }
      else
Index: src/backend/commands/cluster.c
===================================================================
RCS file: /cvsroot/pgsql/src/backend/commands/cluster.c,v
retrieving revision 1.157
diff -c -c -r1.157 cluster.c
*** src/backend/commands/cluster.c    13 Mar 2007 00:33:39 -0000    1.157
--- src/backend/commands/cluster.c    24 Mar 2007 02:18:42 -0000
***************
*** 204,210 ****
              /* Start a new transaction for each relation. */
              StartTransactionCommand();
              /* functions in indexes may want a snapshot set */
!             ActiveSnapshot = CopySnapshot(GetTransactionSnapshot());
              cluster_rel(rvtc, true);
              CommitTransactionCommand();
          }
--- 204,210 ----
              /* Start a new transaction for each relation. */
              StartTransactionCommand();
              /* functions in indexes may want a snapshot set */
!             ActiveSnapshot = CopySnapshot(GetTransactionSnapshot(true));
              cluster_rel(rvtc, true);
              CommitTransactionCommand();
          }
Index: src/backend/commands/indexcmds.c
===================================================================
RCS file: /cvsroot/pgsql/src/backend/commands/indexcmds.c,v
retrieving revision 1.157
diff -c -c -r1.157 indexcmds.c
*** src/backend/commands/indexcmds.c    13 Mar 2007 00:33:39 -0000    1.157
--- src/backend/commands/indexcmds.c    24 Mar 2007 02:18:42 -0000
***************
*** 493,499 ****
       * We also set ActiveSnapshot to this snap, since functions in indexes may
       * need a snapshot.
       */
!     snapshot = CopySnapshot(GetTransactionSnapshot());
      ActiveSnapshot = snapshot;

      /*
--- 493,499 ----
       * We also set ActiveSnapshot to this snap, since functions in indexes may
       * need a snapshot.
       */
!     snapshot = CopySnapshot(GetTransactionSnapshot(false));
      ActiveSnapshot = snapshot;

      /*
***************
*** 1304,1310 ****

          StartTransactionCommand();
          /* functions in indexes may want a snapshot set */
!         ActiveSnapshot = CopySnapshot(GetTransactionSnapshot());
          if (reindex_relation(relid, true))
              ereport(NOTICE,
                      (errmsg("table \"%s\" was reindexed",
--- 1304,1310 ----

          StartTransactionCommand();
          /* functions in indexes may want a snapshot set */
!         ActiveSnapshot = CopySnapshot(GetTransactionSnapshot(true));
          if (reindex_relation(relid, true))
              ereport(NOTICE,
                      (errmsg("table \"%s\" was reindexed",
Index: src/backend/commands/trigger.c
===================================================================
RCS file: /cvsroot/pgsql/src/backend/commands/trigger.c,v
retrieving revision 1.214
diff -c -c -r1.214 trigger.c
*** src/backend/commands/trigger.c    19 Mar 2007 23:38:29 -0000    1.214
--- src/backend/commands/trigger.c    24 Mar 2007 02:18:43 -0000
***************
*** 2645,2651 ****
       */
      events = &afterTriggers->events;
      if (events->head != NULL)
!         ActiveSnapshot = CopySnapshot(GetTransactionSnapshot());

      /*
       * Run all the remaining triggers.    Loop until they are all gone, just in
--- 2645,2651 ----
       */
      events = &afterTriggers->events;
      if (events->head != NULL)
!         ActiveSnapshot = CopySnapshot(GetTransactionSnapshot(false));

      /*
       * Run all the remaining triggers.    Loop until they are all gone, just in
Index: src/backend/commands/vacuum.c
===================================================================
RCS file: /cvsroot/pgsql/src/backend/commands/vacuum.c,v
retrieving revision 1.349
diff -c -c -r1.349 vacuum.c
*** src/backend/commands/vacuum.c    14 Mar 2007 18:48:55 -0000    1.349
--- src/backend/commands/vacuum.c    24 Mar 2007 02:18:45 -0000
***************
*** 412,418 ****
                  {
                      StartTransactionCommand();
                      /* functions in indexes may want a snapshot set */
!                     ActiveSnapshot = CopySnapshot(GetTransactionSnapshot());
                  }
                  else
                      old_context = MemoryContextSwitchTo(anl_context);
--- 412,418 ----
                  {
                      StartTransactionCommand();
                      /* functions in indexes may want a snapshot set */
!                     ActiveSnapshot = CopySnapshot(GetTransactionSnapshot(false));
                  }
                  else
                      old_context = MemoryContextSwitchTo(anl_context);
***************
*** 470,476 ****
           * necessary if we are called from autovacuum because autovacuum might
           * continue on to do an ANALYZE-only call.
           */
!         ActiveSnapshot = CopySnapshot(GetTransactionSnapshot());
      }

      if (vacstmt->vacuum && !IsAutoVacuumWorkerProcess())
--- 470,476 ----
           * necessary if we are called from autovacuum because autovacuum might
           * continue on to do an ANALYZE-only call.
           */
!         ActiveSnapshot = CopySnapshot(GetTransactionSnapshot(false));
      }

      if (vacstmt->vacuum && !IsAutoVacuumWorkerProcess())
***************
*** 964,970 ****
      if (vacstmt->full)
      {
          /* functions in indexes may want a snapshot set */
!         ActiveSnapshot = CopySnapshot(GetTransactionSnapshot());
      }
      else
      {
--- 964,970 ----
      if (vacstmt->full)
      {
          /* functions in indexes may want a snapshot set */
!         ActiveSnapshot = CopySnapshot(GetTransactionSnapshot(false));
      }
      else
      {
Index: src/backend/executor/functions.c
===================================================================
RCS file: /cvsroot/pgsql/src/backend/executor/functions.c,v
retrieving revision 1.112
diff -c -c -r1.112 functions.c
*** src/backend/executor/functions.c    13 Mar 2007 00:33:40 -0000    1.112
--- src/backend/executor/functions.c    24 Mar 2007 02:18:45 -0000
***************
*** 300,306 ****
      else
      {
          CommandCounterIncrement();
!         snapshot = CopySnapshot(GetTransactionSnapshot());
      }

      if (IsA(es->stmt, PlannedStmt))
--- 300,306 ----
      else
      {
          CommandCounterIncrement();
!         snapshot = CopySnapshot(GetTransactionSnapshot(false));
      }

      if (IsA(es->stmt, PlannedStmt))
Index: src/backend/executor/spi.c
===================================================================
RCS file: /cvsroot/pgsql/src/backend/executor/spi.c,v
retrieving revision 1.173
diff -c -c -r1.173 spi.c
*** src/backend/executor/spi.c    17 Mar 2007 03:15:38 -0000    1.173
--- src/backend/executor/spi.c    24 Mar 2007 02:18:46 -0000
***************
*** 15,23 ****
--- 15,25 ----
  #include "postgres.h"

  #include "access/printtup.h"
+ #include "access/transam.h"
  #include "catalog/heap.h"
  #include "commands/trigger.h"
  #include "executor/spi_priv.h"
+ #include "storage/procarray.h"
  #include "utils/lsyscache.h"
  #include "utils/memutils.h"
  #include "utils/typcache.h"
***************
*** 57,63 ****
  static MemoryContext _SPI_procmem(void);
  static bool _SPI_checktuples(void);

-
  /* =================== interface functions =================== */

  int
--- 59,64 ----
***************
*** 997,1004 ****
      else
      {
          CommandCounterIncrement();
!         snapshot = GetTransactionSnapshot();
      }

      /*
       * Start portal execution.
--- 998,1006 ----
      else
      {
          CommandCounterIncrement();
!         snapshot = GetTransactionSnapshot(false);
      }
+     set_portal_min_xid(TransactionXmin);

      /*
       * Start portal execution.
***************
*** 1530,1536 ****
                      if (read_only)
                          ActiveSnapshot = CopySnapshot(saveActiveSnapshot);
                      else
!                         ActiveSnapshot = CopySnapshot(GetTransactionSnapshot());
                  }
                  else
                  {
--- 1532,1538 ----
                      if (read_only)
                          ActiveSnapshot = CopySnapshot(saveActiveSnapshot);
                      else
!                         ActiveSnapshot = CopySnapshot(GetTransactionSnapshot(false));
                  }
                  else
                  {
Index: src/backend/postmaster/autovacuum.c
===================================================================
RCS file: /cvsroot/pgsql/src/backend/postmaster/autovacuum.c,v
retrieving revision 1.34
diff -c -c -r1.34 autovacuum.c
*** src/backend/postmaster/autovacuum.c    13 Mar 2007 00:33:41 -0000    1.34
--- src/backend/postmaster/autovacuum.c    24 Mar 2007 02:18:46 -0000
***************
*** 868,874 ****
      StartTransactionCommand();

      /* functions in indexes may want a snapshot set */
!     ActiveSnapshot = CopySnapshot(GetTransactionSnapshot());

      /*
       * Clean up any dead statistics collector entries for this DB. We always
--- 868,874 ----
      StartTransactionCommand();

      /* functions in indexes may want a snapshot set */
!     ActiveSnapshot = CopySnapshot(GetTransactionSnapshot(false));

      /*
       * Clean up any dead statistics collector entries for this DB. We always
Index: src/backend/storage/ipc/procarray.c
===================================================================
RCS file: /cvsroot/pgsql/src/backend/storage/ipc/procarray.c,v
retrieving revision 1.22
diff -c -c -r1.22 procarray.c
*** src/backend/storage/ipc/procarray.c    23 Mar 2007 03:16:39 -0000    1.22
--- src/backend/storage/ipc/procarray.c    24 Mar 2007 02:18:47 -0000
***************
*** 54,60 ****
  } ProcArrayStruct;

  static ProcArrayStruct *procArray;
!

  #ifdef XIDCACHE_DEBUG

--- 54,60 ----
  } ProcArrayStruct;

  static ProcArrayStruct *procArray;
! static TransactionId portal_min_xid = InvalidTransactionId;

  #ifdef XIDCACHE_DEBUG

***************
*** 498,504 ****
   *----------
   */
  Snapshot
! GetSnapshotData(Snapshot snapshot, bool serializable)
  {
      ProcArrayStruct *arrayP = procArray;
      TransactionId xmin;
--- 498,504 ----
   *----------
   */
  Snapshot
! GetSnapshotData(Snapshot snapshot, bool set_procArray_xmin)
  {
      ProcArrayStruct *arrayP = procArray;
      TransactionId xmin;
***************
*** 510,519 ****

      Assert(snapshot != NULL);

!     /* Serializable snapshot must be computed before any other... */
!     Assert(serializable ?
!            !TransactionIdIsValid(MyProc->xmin) :
!            TransactionIdIsValid(MyProc->xmin));

      /*
       * Allocating space for maxProcs xids is usually overkill; numProcs would
--- 510,517 ----

      Assert(snapshot != NULL);

!     /* Either we set an xmin, or we already have one.  */
!     Assert(set_procArray_xmin || TransactionIdIsValid(MyProc->xmin));

      /*
       * Allocating space for maxProcs xids is usually overkill; numProcs would
***************
*** 656,664 ****
          }
      }

!     if (serializable)
!         MyProc->xmin = TransactionXmin = xmin;
!
      LWLockRelease(ProcArrayLock);

      /*
--- 654,684 ----
          }
      }

!     if (set_procArray_xmin)
!     {
!         if (xmin != MyProc->xmin)
!             fprintf(stderr, "%x:%x: set new xmin old=%x new=%x\n", MyProcPid, MyProc->xid, MyProc->xmin, xmin);
!         else
!             fprintf(stderr, "%x:%x: not set\n", MyProcPid, MyProc->xid);
!         /*
!          * ---
!          * We need to set MyProc->xmin in a few cases:
!          *    o  set at transaction start for SERIALIZABLE visibility
!          *    o  set for new command in READ COMMITTED mode so
!          *       VACUUM knows we don't care about completed transactions
!          *
!          *    The problem is that we might have old cursors/portals
!          *    that need to have old transaction visibility, so we use
!          *    portal_min_xid to record MyProc->xmin during the first
!          *    cursors/portals creation, and then don't allow MyProc->xmin
!          *    to be set lower.
!          */
!         if (!TransactionIdIsValid(MyProc->xmin) ||
!             !TransactionIdIsValid(portal_min_xid) ||
!             TransactionIdPrecedesOrEquals(xmin, portal_min_xid))
!             MyProc->xmin = TransactionXmin = xmin;
!     }
!
      LWLockRelease(ProcArrayLock);

      /*
***************
*** 987,992 ****
--- 1007,1023 ----
      LWLockRelease(ProcArrayLock);
  }

+ void set_portal_min_xid(TransactionId xid)
+ {
+     /*
+      *    We only want portal_min_xid to be the first
+      *    portal's MyProc->xmin, or set to invalid.
+      */
+     if (portal_min_xid == InvalidTransactionId ||
+         xid == InvalidTransactionId)
+         portal_min_xid = xid;
+ }
+
  #ifdef XIDCACHE_DEBUG

  /*
Index: src/backend/tcop/fastpath.c
===================================================================
RCS file: /cvsroot/pgsql/src/backend/tcop/fastpath.c,v
retrieving revision 1.96
diff -c -c -r1.96 fastpath.c
*** src/backend/tcop/fastpath.c    3 Mar 2007 19:32:54 -0000    1.96
--- src/backend/tcop/fastpath.c    24 Mar 2007 02:18:47 -0000
***************
*** 308,314 ****
       * Now that we know we are in a valid transaction, set snapshot in case
       * needed by function itself or one of the datatype I/O routines.
       */
!     ActiveSnapshot = CopySnapshot(GetTransactionSnapshot());

      /*
       * Begin parsing the buffer contents.
--- 308,314 ----
       * Now that we know we are in a valid transaction, set snapshot in case
       * needed by function itself or one of the datatype I/O routines.
       */
!     ActiveSnapshot = CopySnapshot(GetTransactionSnapshot(false));

      /*
       * Begin parsing the buffer contents.
Index: src/backend/tcop/postgres.c
===================================================================
RCS file: /cvsroot/pgsql/src/backend/tcop/postgres.c,v
retrieving revision 1.529
diff -c -c -r1.529 postgres.c
*** src/backend/tcop/postgres.c    22 Mar 2007 19:55:04 -0000    1.529
--- src/backend/tcop/postgres.c    24 Mar 2007 02:18:48 -0000
***************
*** 738,744 ****
          {
              if (needSnapshot)
              {
!                 ActiveSnapshot = CopySnapshot(GetTransactionSnapshot());
                  needSnapshot = false;
              }
              stmt = (Node *) pg_plan_query(query, boundParams);
--- 738,744 ----
          {
              if (needSnapshot)
              {
!                 ActiveSnapshot = CopySnapshot(GetTransactionSnapshot(true));
                  needSnapshot = false;
              }
              stmt = (Node *) pg_plan_query(query, boundParams);
Index: src/backend/tcop/pquery.c
===================================================================
RCS file: /cvsroot/pgsql/src/backend/tcop/pquery.c,v
retrieving revision 1.115
diff -c -c -r1.115 pquery.c
*** src/backend/tcop/pquery.c    13 Mar 2007 00:33:42 -0000    1.115
--- src/backend/tcop/pquery.c    24 Mar 2007 02:18:48 -0000
***************
*** 154,160 ****
       * Must always set snapshot for plannable queries.    Note we assume that
       * caller will take care of restoring ActiveSnapshot on exit/error.
       */
!     ActiveSnapshot = CopySnapshot(GetTransactionSnapshot());

      /*
       * Create the QueryDesc object
--- 154,160 ----
       * Must always set snapshot for plannable queries.    Note we assume that
       * caller will take care of restoring ActiveSnapshot on exit/error.
       */
!     ActiveSnapshot = CopySnapshot(GetTransactionSnapshot(false));

      /*
       * Create the QueryDesc object
***************
*** 489,495 ****
                  if (snapshot)
                      ActiveSnapshot = CopySnapshot(snapshot);
                  else
!                     ActiveSnapshot = CopySnapshot(GetTransactionSnapshot());

                  /*
                   * Create QueryDesc in portal's context; for the moment, set
--- 489,495 ----
                  if (snapshot)
                      ActiveSnapshot = CopySnapshot(snapshot);
                  else
!                     ActiveSnapshot = CopySnapshot(GetTransactionSnapshot(false));

                  /*
                   * Create QueryDesc in portal's context; for the moment, set
***************
*** 1163,1169 ****
            IsA(utilityStmt, NotifyStmt) ||
            IsA(utilityStmt, UnlistenStmt) ||
            IsA(utilityStmt, CheckPointStmt)))
!         ActiveSnapshot = CopySnapshot(GetTransactionSnapshot());
      else
          ActiveSnapshot = NULL;

--- 1163,1169 ----
            IsA(utilityStmt, NotifyStmt) ||
            IsA(utilityStmt, UnlistenStmt) ||
            IsA(utilityStmt, CheckPointStmt)))
!         ActiveSnapshot = CopySnapshot(GetTransactionSnapshot(false));
      else
          ActiveSnapshot = NULL;

Index: src/backend/utils/adt/ri_triggers.c
===================================================================
RCS file: /cvsroot/pgsql/src/backend/utils/adt/ri_triggers.c,v
retrieving revision 1.92
diff -c -c -r1.92 ri_triggers.c
*** src/backend/utils/adt/ri_triggers.c    15 Mar 2007 23:12:06 -0000    1.92
--- src/backend/utils/adt/ri_triggers.c    24 Mar 2007 02:18:50 -0000
***************
*** 3283,3289 ****
      {
          CommandCounterIncrement();        /* be sure all my own work is visible */
          test_snapshot = CopySnapshot(GetLatestSnapshot());
!         crosscheck_snapshot = CopySnapshot(GetTransactionSnapshot());
      }
      else
      {
--- 3283,3289 ----
      {
          CommandCounterIncrement();        /* be sure all my own work is visible */
          test_snapshot = CopySnapshot(GetLatestSnapshot());
!         crosscheck_snapshot = CopySnapshot(GetTransactionSnapshot(false));
      }
      else
      {
Index: src/backend/utils/time/tqual.c
===================================================================
RCS file: /cvsroot/pgsql/src/backend/utils/time/tqual.c,v
retrieving revision 1.101
diff -c -c -r1.101 tqual.c
*** src/backend/utils/time/tqual.c    5 Jan 2007 22:19:47 -0000    1.101
--- src/backend/utils/time/tqual.c    24 Mar 2007 02:18:50 -0000
***************
*** 1202,1208 ****
   * the result with CopySnapshot() if it is to be used very long.
   */
  Snapshot
! GetTransactionSnapshot(void)
  {
      /* First call in transaction? */
      if (SerializableSnapshot == NULL)
--- 1202,1208 ----
   * the result with CopySnapshot() if it is to be used very long.
   */
  Snapshot
! GetTransactionSnapshot(bool set_procArray_xmin)
  {
      /* First call in transaction? */
      if (SerializableSnapshot == NULL)
***************
*** 1214,1220 ****
      if (IsXactIsoLevelSerializable)
          return SerializableSnapshot;

!     LatestSnapshot = GetSnapshotData(&LatestSnapshotData, false);

      return LatestSnapshot;
  }
--- 1214,1221 ----
      if (IsXactIsoLevelSerializable)
          return SerializableSnapshot;

!     LatestSnapshot = GetSnapshotData(&LatestSnapshotData,
!     /* we know we are not Serializable */ set_procArray_xmin);

      return LatestSnapshot;
  }
Index: src/include/storage/procarray.h
===================================================================
RCS file: /cvsroot/pgsql/src/include/storage/procarray.h,v
retrieving revision 1.12
diff -c -c -r1.12 procarray.h
*** src/include/storage/procarray.h    16 Jan 2007 13:28:57 -0000    1.12
--- src/include/storage/procarray.h    24 Mar 2007 02:18:51 -0000
***************
*** 37,41 ****
--- 37,42 ----

  extern void XidCacheRemoveRunningXids(TransactionId xid,
                            int nxids, TransactionId *xids);
+ extern void set_portal_min_xid(TransactionId xid);

  #endif   /* PROCARRAY_H */
Index: src/include/utils/tqual.h
===================================================================
RCS file: /cvsroot/pgsql/src/include/utils/tqual.h,v
retrieving revision 1.65
diff -c -c -r1.65 tqual.h
*** src/include/utils/tqual.h    5 Jan 2007 22:19:59 -0000    1.65
--- src/include/utils/tqual.h    24 Mar 2007 02:18:51 -0000
***************
*** 142,154 ****
  extern HTSV_Result HeapTupleSatisfiesVacuum(HeapTupleHeader tuple,
                           TransactionId OldestXmin, Buffer buffer);

! extern Snapshot GetTransactionSnapshot(void);
  extern Snapshot GetLatestSnapshot(void);
  extern Snapshot CopySnapshot(Snapshot snapshot);
  extern void FreeSnapshot(Snapshot snapshot);
  extern void FreeXactSnapshot(void);

  /* in procarray.c; declared here to avoid including tqual.h in procarray.h: */
! extern Snapshot GetSnapshotData(Snapshot snapshot, bool serializable);

  #endif   /* TQUAL_H */
--- 142,154 ----
  extern HTSV_Result HeapTupleSatisfiesVacuum(HeapTupleHeader tuple,
                           TransactionId OldestXmin, Buffer buffer);

! extern Snapshot GetTransactionSnapshot(bool set_procArray_xmin);
  extern Snapshot GetLatestSnapshot(void);
  extern Snapshot CopySnapshot(Snapshot snapshot);
  extern void FreeSnapshot(Snapshot snapshot);
  extern void FreeXactSnapshot(void);

  /* in procarray.c; declared here to avoid including tqual.h in procarray.h: */
! extern Snapshot GetSnapshotData(Snapshot snapshot, bool set_procArray_xmin);

  #endif   /* TQUAL_H */
Index: src/pl/plpgsql/src/pl_exec.c
===================================================================
RCS file: /cvsroot/pgsql/src/pl/plpgsql/src/pl_exec.c,v
retrieving revision 1.190
diff -c -c -r1.190 pl_exec.c
*** src/pl/plpgsql/src/pl_exec.c    15 Mar 2007 23:12:07 -0000    1.190
--- src/pl/plpgsql/src/pl_exec.c    24 Mar 2007 02:18:53 -0000
***************
*** 4124,4130 ****
          if (!estate->readonly_func)
          {
              CommandCounterIncrement();
!             ActiveSnapshot = CopySnapshot(GetTransactionSnapshot());
          }

          /*
--- 4124,4130 ----
          if (!estate->readonly_func)
          {
              CommandCounterIncrement();
!             ActiveSnapshot = CopySnapshot(GetTransactionSnapshot(false));
          }

          /*

Re: Improvement of procArray.xmin for VACUUM

From
Heikki Linnakangas
Date:
Bruce Momjian wrote:
> Tom Lane wrote:
>> Bruce Momjian <bruce@momjian.us> writes:
>>> I have been thinking we could improve how quickly VACUUM can expire rows
>>> if we update procArray.xmin more frequently for non-SERIALIZABLE
>>> transactions.
>>> The attached patch updates procArray.xmin in this manner.
>> This patch is incredibly broken.  Didn't you understand what I said
>> about how we don't track which snapshots are still alive?  You can't
>> advance procArray.xmin past the xmin of the oldest live snapshot in the
>> backend, and you can't assume that there are no live snapshots at the
>> places where this patch assumes that.  (Open cursors are one obvious
>> counterexample, but I think there are more.)
>>
>> To make intra-transaction advancing of xmin possible, we'd need to
>> explicitly track all of the backend's live snapshots, probably by
>> introducing a "snapshot cache" manager that gives out tracked refcounts
>> as we do for some other structures like catcache entries.  This might
>> have some other advantages (I think most of the current CopySnapshot
>> operations could be replaced by refcount increments) but it's a whole
>> lot more complicated and invasive than what you've got here.
>
> I updated the patch to save the MyProc->xid at the time the first cursor
> is created, and not allow the MyProc->xid to be set lower than that
> saved value in the current transaction.  It added only a few more lines
> to the patch.

It seems to me a lot cleaner to do the reference counting like Tom
suggested. Increase the refcount on CopySnapshot, and decrease it on
FreeSnapshot. Assuming that all callers of CopySnapshot free the
snapshot with FreeSnapshot when they're done with it.

BTW: I really like the idea of doing this. It's a relatively simple
thing to do and gives some immediate benefit. And it opens the door for
more tricks to vacuum more aggressively in the presence of long-running
transactions. And it allows us to vacuum tuples that were inserted and
deleted in the same transactions even while the transaction is still
running, which helps some pathological cases where a transaction updates
a counter column many times within a transaction. We could also use it
to free entries in the new combocids hash table earlier (not that it's a
problem as it is, though).

--
   Heikki Linnakangas
   EnterpriseDB   http://www.enterprisedb.com

Re: Improvement of procArray.xmin for VACUUM

From
Tom Lane
Date:
Heikki Linnakangas <heikki@enterprisedb.com> writes:
> It seems to me a lot cleaner to do the reference counting like Tom
> suggested. Increase the refcount on CopySnapshot, and decrease it on
> FreeSnapshot. Assuming that all callers of CopySnapshot free the
> snapshot with FreeSnapshot when they're done with it.

I don't believe we bother at the moment; which is one of the reasons
it'd be a nontrivial patch.  I do think it might be worth doing though.
In the simple case where you're just issuing successive non-cursor
commands within a READ COMMITTED transaction, a refcounted
implementation would be able to recognize that there are *no* live
snapshots between commands and therefore reset MyProc->xmin to 0
whenever the backend is idle.

OTOH, do we have any evidence that this is worth bothering with at all?
I fear that the cases of long-running transactions that are problems
in the real world wouldn't be helped much --- for instance, pg_dump
wouldn't change behavior because it uses a serializable transaction.
Also, at some point a long-running transaction becomes a bottleneck
simply because its XID is itself the oldest thing visible in the
ProcArray and is determining everyone's xmin.  How much daylight is
there really between "your xmin is old" and "your xid is old"?

            regards, tom lane

Re: Improvement of procArray.xmin for VACUUM

From
Bruce Momjian
Date:
Tom Lane wrote:
> Heikki Linnakangas <heikki@enterprisedb.com> writes:
> > It seems to me a lot cleaner to do the reference counting like Tom
> > suggested. Increase the refcount on CopySnapshot, and decrease it on
> > FreeSnapshot. Assuming that all callers of CopySnapshot free the
> > snapshot with FreeSnapshot when they're done with it.
>
> I don't believe we bother at the moment; which is one of the reasons
> it'd be a nontrivial patch.  I do think it might be worth doing though.
> In the simple case where you're just issuing successive non-cursor
> commands within a READ COMMITTED transaction, a refcounted
> implementation would be able to recognize that there are *no* live
> snapshots between commands and therefore reset MyProc->xmin to 0
> whenever the backend is idle.

Attached is my current version of the patch.  It doesn't work now that I
tried to do reference count for Snapshots, but will stop now that Tom is
considering redesigning the snapshot mechanism.

> OTOH, do we have any evidence that this is worth bothering with at all?
> I fear that the cases of long-running transactions that are problems
> in the real world wouldn't be helped much --- for instance, pg_dump
> wouldn't change behavior because it uses a serializable transaction.
> Also, at some point a long-running transaction becomes a bottleneck
> simply because its XID is itself the oldest thing visible in the
> ProcArray and is determining everyone's xmin.  How much daylight is
> there really between "your xmin is old" and "your xid is old"?

Well, interesting you mention that, because I have a second idea on how
to improve things.  We start with MyProc->xmin equal to our own xid, and
then look for earlier transactions.  It should be possible to skip
considering our own xid for MyProc->xmin.  This would obviously help
VACUUM during long-running transactions.  While our transaction is
running, our xid isn't committed, so VACUUM isn't going to touch any of
our rows, and if other transactions complete before our
multi-transaction _statement_ starts, we can't see deleted rows from
them transaction, so why keep the deleted rows around?  Consider this
case:

Session #:
        1                               2               3
                                        BEGIN;
                                        SELECT 1;
        CREATE TABLE test(x int);
        INSERT INTO test VALUES (1);
        DELETE FROM test;
                                        SELECT 1;
                                                        VACUUM VERBOSE test;
                                                        (row can be reused)
                                        COMMIT;
                                                        VACUUM VERBOSE test;
                                                        (normal row reuse)

As I understand it, in READ COMMITTED mode, we have to skip
transactions in progress when our _statement_ starts, but anything
committed before that we see and we don't see dead rows created by them.

--
  Bruce Momjian  <bruce@momjian.us>          http://momjian.us
  EnterpriseDB                               http://www.enterprisedb.com

  + If your life is a hard drive, Christ can be your backup. +
Index: src/backend/catalog/index.c
===================================================================
RCS file: /cvsroot/pgsql/src/backend/catalog/index.c,v
retrieving revision 1.280
diff -c -c -r1.280 index.c
*** src/backend/catalog/index.c    3 Mar 2007 20:08:41 -0000    1.280
--- src/backend/catalog/index.c    24 Mar 2007 19:17:56 -0000
***************
*** 1358,1364 ****
      ExprContext *econtext;
      Snapshot    snapshot;
      TransactionId OldestXmin;
!
      /*
       * sanity checks
       */
--- 1358,1364 ----
      ExprContext *econtext;
      Snapshot    snapshot;
      TransactionId OldestXmin;
!
      /*
       * sanity checks
       */
***************
*** 1555,1561 ****
      ExecDropSingleTupleTableSlot(slot);

      FreeExecutorState(estate);
!
      /* These may have been pointing to the now-gone estate */
      indexInfo->ii_ExpressionsState = NIL;
      indexInfo->ii_PredicateState = NIL;
--- 1555,1563 ----
      ExecDropSingleTupleTableSlot(slot);

      FreeExecutorState(estate);
!     if (snapshot != SnapshotNow)
!         FreeSnapshot(snapshot);
!
      /* These may have been pointing to the now-gone estate */
      indexInfo->ii_ExpressionsState = NIL;
      indexInfo->ii_PredicateState = NIL;
Index: src/backend/commands/cluster.c
===================================================================
RCS file: /cvsroot/pgsql/src/backend/commands/cluster.c,v
retrieving revision 1.157
diff -c -c -r1.157 cluster.c
*** src/backend/commands/cluster.c    13 Mar 2007 00:33:39 -0000    1.157
--- src/backend/commands/cluster.c    24 Mar 2007 19:17:56 -0000
***************
*** 204,209 ****
--- 204,210 ----
              /* Start a new transaction for each relation. */
              StartTransactionCommand();
              /* functions in indexes may want a snapshot set */
+             FreeSnapshot(ActiveSnapshot);
              ActiveSnapshot = CopySnapshot(GetTransactionSnapshot());
              cluster_rel(rvtc, true);
              CommitTransactionCommand();
Index: src/backend/commands/indexcmds.c
===================================================================
RCS file: /cvsroot/pgsql/src/backend/commands/indexcmds.c,v
retrieving revision 1.157
diff -c -c -r1.157 indexcmds.c
*** src/backend/commands/indexcmds.c    13 Mar 2007 00:33:39 -0000    1.157
--- src/backend/commands/indexcmds.c    24 Mar 2007 19:17:56 -0000
***************
*** 493,500 ****
       * We also set ActiveSnapshot to this snap, since functions in indexes may
       * need a snapshot.
       */
!     snapshot = CopySnapshot(GetTransactionSnapshot());
!     ActiveSnapshot = snapshot;

      /*
       * Scan the index and the heap, insert any missing index entries.
--- 493,500 ----
       * We also set ActiveSnapshot to this snap, since functions in indexes may
       * need a snapshot.
       */
!     FreeSnapshot(ActiveSnapshot);
!     ActiveSnapshot = snapshot = CopySnapshot(GetTransactionSnapshot());

      /*
       * Scan the index and the heap, insert any missing index entries.
***************
*** 1304,1309 ****
--- 1304,1310 ----

          StartTransactionCommand();
          /* functions in indexes may want a snapshot set */
+         FreeSnapshot(ActiveSnapshot);
          ActiveSnapshot = CopySnapshot(GetTransactionSnapshot());
          if (reindex_relation(relid, true))
              ereport(NOTICE,
Index: src/backend/commands/trigger.c
===================================================================
RCS file: /cvsroot/pgsql/src/backend/commands/trigger.c,v
retrieving revision 1.214
diff -c -c -r1.214 trigger.c
*** src/backend/commands/trigger.c    19 Mar 2007 23:38:29 -0000    1.214
--- src/backend/commands/trigger.c    24 Mar 2007 19:17:57 -0000
***************
*** 2645,2651 ****
--- 2645,2654 ----
       */
      events = &afterTriggers->events;
      if (events->head != NULL)
+     {
+         FreeSnapshot(ActiveSnapshot);
          ActiveSnapshot = CopySnapshot(GetTransactionSnapshot());
+     }

      /*
       * Run all the remaining triggers.    Loop until they are all gone, just in
Index: src/backend/commands/vacuum.c
===================================================================
RCS file: /cvsroot/pgsql/src/backend/commands/vacuum.c,v
retrieving revision 1.349
diff -c -c -r1.349 vacuum.c
*** src/backend/commands/vacuum.c    14 Mar 2007 18:48:55 -0000    1.349
--- src/backend/commands/vacuum.c    24 Mar 2007 19:17:59 -0000
***************
*** 412,417 ****
--- 412,418 ----
                  {
                      StartTransactionCommand();
                      /* functions in indexes may want a snapshot set */
+                     FreeSnapshot(ActiveSnapshot);
                      ActiveSnapshot = CopySnapshot(GetTransactionSnapshot());
                  }
                  else
***************
*** 470,475 ****
--- 471,477 ----
           * necessary if we are called from autovacuum because autovacuum might
           * continue on to do an ANALYZE-only call.
           */
+         FreeSnapshot(ActiveSnapshot);
          ActiveSnapshot = CopySnapshot(GetTransactionSnapshot());
      }

***************
*** 964,969 ****
--- 966,972 ----
      if (vacstmt->full)
      {
          /* functions in indexes may want a snapshot set */
+         FreeSnapshot(ActiveSnapshot);
          ActiveSnapshot = CopySnapshot(GetTransactionSnapshot());
      }
      else
Index: src/backend/executor/functions.c
===================================================================
RCS file: /cvsroot/pgsql/src/backend/executor/functions.c,v
retrieving revision 1.112
diff -c -c -r1.112 functions.c
*** src/backend/executor/functions.c    13 Mar 2007 00:33:40 -0000    1.112
--- src/backend/executor/functions.c    24 Mar 2007 19:17:59 -0000
***************
*** 315,321 ****
                                          fcache->paramLI);

      /* We assume we don't need to set up ActiveSnapshot for ExecutorStart */
!
      /* Utility commands don't need Executor. */
      if (es->qd->operation != CMD_UTILITY)
      {
--- 315,321 ----
                                          fcache->paramLI);

      /* We assume we don't need to set up ActiveSnapshot for ExecutorStart */
!
      /* Utility commands don't need Executor. */
      if (es->qd->operation != CMD_UTILITY)
      {
Index: src/backend/postmaster/autovacuum.c
===================================================================
RCS file: /cvsroot/pgsql/src/backend/postmaster/autovacuum.c,v
retrieving revision 1.38
diff -c -c -r1.38 autovacuum.c
*** src/backend/postmaster/autovacuum.c    23 Mar 2007 21:57:10 -0000    1.38
--- src/backend/postmaster/autovacuum.c    24 Mar 2007 19:18:00 -0000
***************
*** 884,889 ****
--- 884,890 ----
      StartTransactionCommand();

      /* functions in indexes may want a snapshot set */
+     FreeSnapshot(ActiveSnapshot);
      ActiveSnapshot = CopySnapshot(GetTransactionSnapshot());

      /*
Index: src/backend/storage/ipc/procarray.c
===================================================================
RCS file: /cvsroot/pgsql/src/backend/storage/ipc/procarray.c,v
retrieving revision 1.22
diff -c -c -r1.22 procarray.c
*** src/backend/storage/ipc/procarray.c    23 Mar 2007 03:16:39 -0000    1.22
--- src/backend/storage/ipc/procarray.c    24 Mar 2007 19:18:01 -0000
***************
*** 498,504 ****
   *----------
   */
  Snapshot
! GetSnapshotData(Snapshot snapshot, bool serializable)
  {
      ProcArrayStruct *arrayP = procArray;
      TransactionId xmin;
--- 498,504 ----
   *----------
   */
  Snapshot
! GetSnapshotData(Snapshot snapshot)
  {
      ProcArrayStruct *arrayP = procArray;
      TransactionId xmin;
***************
*** 510,519 ****

      Assert(snapshot != NULL);

!     /* Serializable snapshot must be computed before any other... */
!     Assert(serializable ?
!            !TransactionIdIsValid(MyProc->xmin) :
!            TransactionIdIsValid(MyProc->xmin));

      /*
       * Allocating space for maxProcs xids is usually overkill; numProcs would
--- 510,517 ----

      Assert(snapshot != NULL);

!     /* Either we set an xmin, or we already have one. */
!     Assert(TransactionIdIsValid(MyProc->xmin) || GetOpenSnapshotCount() == 0);

      /*
       * Allocating space for maxProcs xids is usually overkill; numProcs would
***************
*** 656,664 ****
          }
      }

!     if (serializable)
!         MyProc->xmin = TransactionXmin = xmin;

      LWLockRelease(ProcArrayLock);

      /*
--- 654,681 ----
          }
      }

!     /* DEBUG */
!     if (!TransactionIdIsValid(MyProc->xmin) || GetOpenSnapshotCount() == 0)
!         fprintf(stderr, "%x:%x: set new xmin old=%x new=%x\n", MyProcPid, MyProc->xid, MyProc->xmin, xmin);
!     else
!         fprintf(stderr, "%x:%x: count=%d, not set\n", MyProcPid, MyProc->xid, GetOpenSnapshotCount());

+     /*
+      * ---
+      * We need to set MyProc->xmin in a few cases:
+      *    o  set at transaction start for SERIALIZABLE visibility
+      *    o  set for new command in READ COMMITTED mode so
+      *       VACUUM knows we don't care about completed transactions
+      *
+      *    The problem is that we might have old cursors/portals
+      *    that need to have older transaction visibility, so we
+      *    check to see we have no open Snapshots before making
+      *    the change.
+      * ---
+      */
+     if (!TransactionIdIsValid(MyProc->xmin) || GetOpenSnapshotCount() == 0)
+         MyProc->xmin = TransactionXmin = xmin;
+
      LWLockRelease(ProcArrayLock);

      /*
Index: src/backend/tcop/fastpath.c
===================================================================
RCS file: /cvsroot/pgsql/src/backend/tcop/fastpath.c,v
retrieving revision 1.96
diff -c -c -r1.96 fastpath.c
*** src/backend/tcop/fastpath.c    3 Mar 2007 19:32:54 -0000    1.96
--- src/backend/tcop/fastpath.c    24 Mar 2007 19:18:01 -0000
***************
*** 308,313 ****
--- 308,314 ----
       * Now that we know we are in a valid transaction, set snapshot in case
       * needed by function itself or one of the datatype I/O routines.
       */
+     FreeSnapshot(ActiveSnapshot);
      ActiveSnapshot = CopySnapshot(GetTransactionSnapshot());

      /*
Index: src/backend/tcop/postgres.c
===================================================================
RCS file: /cvsroot/pgsql/src/backend/tcop/postgres.c,v
retrieving revision 1.529
diff -c -c -r1.529 postgres.c
*** src/backend/tcop/postgres.c    22 Mar 2007 19:55:04 -0000    1.529
--- src/backend/tcop/postgres.c    24 Mar 2007 19:18:02 -0000
***************
*** 738,743 ****
--- 738,744 ----
          {
              if (needSnapshot)
              {
+                 FreeSnapshot(ActiveSnapshot);
                  ActiveSnapshot = CopySnapshot(GetTransactionSnapshot());
                  needSnapshot = false;
              }
Index: src/backend/tcop/pquery.c
===================================================================
RCS file: /cvsroot/pgsql/src/backend/tcop/pquery.c,v
retrieving revision 1.115
diff -c -c -r1.115 pquery.c
*** src/backend/tcop/pquery.c    13 Mar 2007 00:33:42 -0000    1.115
--- src/backend/tcop/pquery.c    24 Mar 2007 19:18:02 -0000
***************
*** 154,159 ****
--- 154,160 ----
       * Must always set snapshot for plannable queries.    Note we assume that
       * caller will take care of restoring ActiveSnapshot on exit/error.
       */
+     FreeSnapshot(ActiveSnapshot);
      ActiveSnapshot = CopySnapshot(GetTransactionSnapshot());

      /*
***************
*** 486,491 ****
--- 487,493 ----
                   * Must set snapshot before starting executor.    Be sure to
                   * copy it into the portal's context.
                   */
+                 FreeSnapshot(ActiveSnapshot);
                  if (snapshot)
                      ActiveSnapshot = CopySnapshot(snapshot);
                  else
***************
*** 1163,1169 ****
--- 1165,1174 ----
            IsA(utilityStmt, NotifyStmt) ||
            IsA(utilityStmt, UnlistenStmt) ||
            IsA(utilityStmt, CheckPointStmt)))
+     {
+         FreeSnapshot(ActiveSnapshot);
          ActiveSnapshot = CopySnapshot(GetTransactionSnapshot());
+     }
      else
          ActiveSnapshot = NULL;

***************
*** 1177,1184 ****
      /* Some utility statements may change context on us */
      MemoryContextSwitchTo(PortalGetHeapMemory(portal));

!     if (ActiveSnapshot)
!         FreeSnapshot(ActiveSnapshot);
      ActiveSnapshot = NULL;
  }

--- 1182,1188 ----
      /* Some utility statements may change context on us */
      MemoryContextSwitchTo(PortalGetHeapMemory(portal));

!     FreeSnapshot(ActiveSnapshot);
      ActiveSnapshot = NULL;
  }

Index: src/backend/utils/adt/ri_triggers.c
===================================================================
RCS file: /cvsroot/pgsql/src/backend/utils/adt/ri_triggers.c,v
retrieving revision 1.92
diff -c -c -r1.92 ri_triggers.c
*** src/backend/utils/adt/ri_triggers.c    15 Mar 2007 23:12:06 -0000    1.92
--- src/backend/utils/adt/ri_triggers.c    24 Mar 2007 19:18:04 -0000
***************
*** 2636,2642 ****
      char        workmembuf[32];
      int            spi_result;
      SPIPlanPtr    qplan;
!
      /*
       * Check to make sure current user has enough permissions to do the test
       * query.  (If not, caller can fall back to the trigger method, which
--- 2636,2643 ----
      char        workmembuf[32];
      int            spi_result;
      SPIPlanPtr    qplan;
!     Snapshot    snapshot;
!
      /*
       * Check to make sure current user has enough permissions to do the test
       * query.  (If not, caller can fall back to the trigger method, which
***************
*** 2770,2780 ****
       * really haven't got much choice.)  We need at most one tuple returned,
       * so pass limit = 1.
       */
      spi_result = SPI_execute_snapshot(qplan,
                                        NULL, NULL,
!                                       CopySnapshot(GetLatestSnapshot()),
                                        InvalidSnapshot,
                                        true, 1);

      /* Check result */
      if (spi_result != SPI_OK_SELECT)
--- 2771,2783 ----
       * really haven't got much choice.)  We need at most one tuple returned,
       * so pass limit = 1.
       */
+     snapshot = CopySnapshot(GetLatestSnapshot());
      spi_result = SPI_execute_snapshot(qplan,
                                        NULL, NULL,
!                                       snapshot,
                                        InvalidSnapshot,
                                        true, 1);
+     FreeSnapshot(snapshot);

      /* Check result */
      if (spi_result != SPI_OK_SELECT)
***************
*** 3310,3315 ****
--- 3313,3322 ----
                                        test_snapshot, crosscheck_snapshot,
                                        false, limit);

+     if (test_snapshot != InvalidSnapshot)
+         FreeSnapshot(test_snapshot);
+     if (crosscheck_snapshot != InvalidSnapshot)
+         FreeSnapshot(crosscheck_snapshot);
      /* Restore UID */
      SetUserId(save_uid);

Index: src/backend/utils/time/tqual.c
===================================================================
RCS file: /cvsroot/pgsql/src/backend/utils/time/tqual.c,v
retrieving revision 1.101
diff -c -c -r1.101 tqual.c
*** src/backend/utils/time/tqual.c    5 Jan 2007 22:19:47 -0000    1.101
--- src/backend/utils/time/tqual.c    24 Mar 2007 19:18:04 -0000
***************
*** 54,59 ****
--- 54,67 ----
  static SnapshotData SnapshotDirtyData;
  static SnapshotData SerializableSnapshotData;
  static SnapshotData LatestSnapshotData;
+ /*
+  * We use OpenSnapshotCount to record if there are any open snapshots.  If
+  * there are not, we can update MyProc->xmin. A more sophisticated solution
+  * would be to keep track of the oldest open snapshot and prevent
+  * MyProc->xmin from going below that value, but the complexity doesn't
+  * seem worth it.
+  */
+ static int OpenSnapshotCount = 0;

  /* Externally visible pointers to valid snapshots: */
  Snapshot    SnapshotDirty = &SnapshotDirtyData;
***************
*** 1207,1220 ****
      /* First call in transaction? */
      if (SerializableSnapshot == NULL)
      {
!         SerializableSnapshot = GetSnapshotData(&SerializableSnapshotData, true);
          return SerializableSnapshot;
      }

      if (IsXactIsoLevelSerializable)
          return SerializableSnapshot;

!     LatestSnapshot = GetSnapshotData(&LatestSnapshotData, false);

      return LatestSnapshot;
  }
--- 1215,1228 ----
      /* First call in transaction? */
      if (SerializableSnapshot == NULL)
      {
!         SerializableSnapshot = GetSnapshotData(&SerializableSnapshotData);
          return SerializableSnapshot;
      }

      if (IsXactIsoLevelSerializable)
          return SerializableSnapshot;

!     LatestSnapshot = GetSnapshotData(&LatestSnapshotData);

      return LatestSnapshot;
  }
***************
*** 1231,1237 ****
      if (SerializableSnapshot == NULL)
          elog(ERROR, "no snapshot has been set");

!     LatestSnapshot = GetSnapshotData(&LatestSnapshotData, false);

      return LatestSnapshot;
  }
--- 1239,1245 ----
      if (SerializableSnapshot == NULL)
          elog(ERROR, "no snapshot has been set");

!     LatestSnapshot = GetSnapshotData(&LatestSnapshotData);

      return LatestSnapshot;
  }
***************
*** 1251,1256 ****
--- 1259,1266 ----
      Size        subxipoff;
      Size        size;

+     OpenSnapshotCount++;
+
      /* We allocate any XID arrays needed in the same palloc block. */
      size = subxipoff = sizeof(SnapshotData) +
          snapshot->xcnt * sizeof(TransactionId);
***************
*** 1295,1301 ****
  void
  FreeSnapshot(Snapshot snapshot)
  {
!     pfree(snapshot);
  }

  /*
--- 1305,1320 ----
  void
  FreeSnapshot(Snapshot snapshot)
  {
!     if (snapshot != NULL)
!     {
!         OpenSnapshotCount--;
!         pfree(snapshot);
!     }
! }
!
! int GetOpenSnapshotCount(void)
! {
!     return OpenSnapshotCount;
  }

  /*
***************
*** 1312,1318 ****
--- 1331,1339 ----
       */
      SerializableSnapshot = NULL;
      LatestSnapshot = NULL;
+     FreeSnapshot(ActiveSnapshot);
      ActiveSnapshot = NULL;        /* just for cleanliness */
+     Assert(GetOpenSnapshotCount() == 0);
  }

  /*
Index: src/include/utils/tqual.h
===================================================================
RCS file: /cvsroot/pgsql/src/include/utils/tqual.h,v
retrieving revision 1.65
diff -c -c -r1.65 tqual.h
*** src/include/utils/tqual.h    5 Jan 2007 22:19:59 -0000    1.65
--- src/include/utils/tqual.h    24 Mar 2007 19:18:05 -0000
***************
*** 147,154 ****
  extern Snapshot CopySnapshot(Snapshot snapshot);
  extern void FreeSnapshot(Snapshot snapshot);
  extern void FreeXactSnapshot(void);

  /* in procarray.c; declared here to avoid including tqual.h in procarray.h: */
! extern Snapshot GetSnapshotData(Snapshot snapshot, bool serializable);

  #endif   /* TQUAL_H */
--- 147,155 ----
  extern Snapshot CopySnapshot(Snapshot snapshot);
  extern void FreeSnapshot(Snapshot snapshot);
  extern void FreeXactSnapshot(void);
+ extern int GetOpenSnapshotCount(void);

  /* in procarray.c; declared here to avoid including tqual.h in procarray.h: */
! extern Snapshot GetSnapshotData(Snapshot snapshot);

  #endif   /* TQUAL_H */
Index: src/pl/plpgsql/src/pl_exec.c
===================================================================
RCS file: /cvsroot/pgsql/src/pl/plpgsql/src/pl_exec.c,v
retrieving revision 1.190
diff -c -c -r1.190 pl_exec.c
*** src/pl/plpgsql/src/pl_exec.c    15 Mar 2007 23:12:07 -0000    1.190
--- src/pl/plpgsql/src/pl_exec.c    24 Mar 2007 19:18:06 -0000
***************
*** 4139,4149 ****
--- 4139,4153 ----
      PG_CATCH();
      {
          /* Restore global vars and propagate error */
+         if (!estate->readonly_func)
+             FreeSnapshot(ActiveSnapshot);
          ActiveSnapshot = saveActiveSnapshot;
          PG_RE_THROW();
      }
      PG_END_TRY();

+     if (!estate->readonly_func)
+         FreeSnapshot(ActiveSnapshot);
      ActiveSnapshot = saveActiveSnapshot;
      SPI_pop();


Re: Improvement of procArray.xmin for VACUUM

From
Gregory Stark
Date:
> OTOH, do we have any evidence that this is worth bothering with at all?
> I fear that the cases of long-running transactions that are problems
> in the real world wouldn't be helped much --- for instance, pg_dump
> wouldn't change behavior because it uses a serializable transaction.
Well I think this would be the same infrastructure we would need to do the
other discussed improvement to address pg_dump's impact. That would require us
to publish the youngest xmax of the live snapshots. Vacuum could deduce that
that xid cannot possibly see any transactions between the youngest extant xmax
and the oldest in-progress transaction.
--
  Gregory Stark
  EnterpriseDB          http://www.enterprisedb.com


Re: Improvement of procArray.xmin for VACUUM

From
Tom Lane
Date:
Gregory Stark <stark@enterprisedb.com> writes:
> Well I think this would be the same infrastructure we would need to do
> the other discussed improvement to address pg_dump's impact. That
> would require us to publish the youngest xmax of the live
> snapshots. Vacuum could deduce that that xid cannot possibly see any
> transactions between the youngest extant xmax and the oldest
> in-progress transaction.

... and do what with the knowledge?  Not remove tuples, because any such
tuples would be part of RECENTLY_DEAD update chains that that xact might
be following now or in the future.

            regards, tom lane

Re: Improvement of procArray.xmin for VACUUM

From
Gregory Stark
Date:
"Tom Lane" <tgl@sss.pgh.pa.us> writes:

> Gregory Stark <stark@enterprisedb.com> writes:
>> Well I think this would be the same infrastructure we would need to do
>> the other discussed improvement to address pg_dump's impact. That
>> would require us to publish the youngest xmax of the live
>> snapshots. Vacuum could deduce that that xid cannot possibly see any
>> transactions between the youngest extant xmax and the oldest
>> in-progress transaction.
>
> ... and do what with the knowledge?  Not remove tuples, because any such
> tuples would be part of RECENTLY_DEAD update chains that that xact might
> be following now or in the future.

Well that just means it might require extra work, not that it would be
impossible.

Firstly, some tuples would not be part of a chain and could be cleaned up
anyways. Others would be part of a HOT chain which might make it easier to
clean them up.

But even for tuples that are part of a chain there may be solutions. We could
truncate the tuple to just the MVCC information so subsequent transactions can
find the head. Or we might be able to go back and edit the forward link to
skip the dead intermediate tuples (and somehow deal with the race
conditions...)

--
  Gregory Stark
  EnterpriseDB          http://www.enterprisedb.com


Re: Improvement of procArray.xmin for VACUUM

From
"Simon Riggs"
Date:
On Fri, 2007-03-23 at 17:35 -0400, Tom Lane wrote:
> To make intra-transaction advancing of xmin possible, we'd need to
> explicitly track all of the backend's live snapshots, probably by
> introducing a "snapshot cache" manager that gives out tracked
> refcounts
> as we do for some other structures like catcache entries.  This might
> have some other advantages (I think most of the current CopySnapshot
> operations could be replaced by refcount increments)

Seems like we should do this for many reasons, whether or not this is
one of them. I seem to have butted heads and lost more than once with
not being able to tell which Snapshots exist.

--
  Simon Riggs
  EnterpriseDB   http://www.enterprisedb.com



Re: Improvement of procArray.xmin for VACUUM

From
"Simon Riggs"
Date:
On Sat, 2007-03-24 at 11:48 -0400, Tom Lane wrote:

> Also, at some point a long-running transaction becomes a bottleneck
> simply because its XID is itself the oldest thing visible in the
> ProcArray and is determining everyone's xmin.  How much daylight is
> there really between "your xmin is old" and "your xid is old"?

Hmm, yes. How often do we have an LRT that consists of multiple
statements of significant duration? Not often, I'd wager.

How much does it cost to optimise for this case?

Did Heikki's patch to move the xmin forward during VACUUM get rejected?
That seems like it has a much wider use case.

--
  Simon Riggs
  EnterpriseDB   http://www.enterprisedb.com



Re: Improvement of procArray.xmin for VACUUM

From
Tom Lane
Date:
"Simon Riggs" <simon@2ndquadrant.com> writes:
> Did Heikki's patch to move the xmin forward during VACUUM get rejected?
> That seems like it has a much wider use case.

It's still in the queue (which I have finally started to review in
earnest).

            regards, tom lane

Re: Improvement of procArray.xmin for VACUUM

From
Bruce Momjian
Date:
Simon Riggs wrote:
> On Sat, 2007-03-24 at 11:48 -0400, Tom Lane wrote:
>
> > Also, at some point a long-running transaction becomes a bottleneck
> > simply because its XID is itself the oldest thing visible in the
> > ProcArray and is determining everyone's xmin.  How much daylight is
> > there really between "your xmin is old" and "your xid is old"?
>
> Hmm, yes. How often do we have an LRT that consists of multiple
> statements of significant duration? Not often, I'd wager.
>
> How much does it cost to optimise for this case?

Zero cost.  It involves just tracking if there are any old snapshots,
and if not, update the proc xmin rather than skipping the asignment,
like we do now.

--
  Bruce Momjian  <bruce@momjian.us>          http://momjian.us
  EnterpriseDB                               http://www.enterprisedb.com

  + If your life is a hard drive, Christ can be your backup. +

Re: Improvement of procArray.xmin for VACUUM

From
Tom Lane
Date:
Bruce Momjian <bruce@momjian.us> writes:
> Simon Riggs wrote:
>> How much does it cost to optimise for this case?

> Zero cost.  It involves just tracking if there are any old snapshots,

I will be fascinated to hear how you define that as zero cost.  It might
be relatively low, but it's not zero, and for many simple cases (eg,
all single-statement transactions) the benefit will indeed be zero.
We need some serious consideration of the costs and benefits, not airy
dismissals.

I had originally thought that we could avoid CopySnapshot copying,
which might buy back more than enough to cover the cost of tracking
reference counts ... but in a quick look yesterday it seemed that the
high-use code paths would still need a copy, because they are always
copying off the static variables filled by GetTransactionSnapshot.
Changing that would come with a tremendous memory penalty, or else
a time penalty to scan the ProcArray twice to see how big the arrays
need to be.

[ thinks a bit... ]  The other way we might possibly tackle that is
to avoid copying by the expedient of just using those static snapshot
variables in-place.  SerializableSnapshot surely need never be copied
since it remains unchanged till end of xact, and no use of the snap
will survive longer than that.  In simple cases LatestSnapshot is not
overwritten until the prior value is no longer needed, either.  If
we could arrange to copy it only when setting up a cursor, or other
long-lived usage, we'd be ahead of the game.  But I'd certainly want
a management layer in there to ensure that we don't overwrite a value
that *is* still in use, and offhand I'm not sure what that layer would
need to look like.  Possibly some sort of double indirection so that
callers would have a Snapshot pointer that remained valid after we'd
copied off arrays that need to be updated?  We already do have double
indirection in the form of the Snapshot and xip[] pointers ... maybe
attach refcounts to the xip arrays not the Snapshots per se?

            regards, tom lane

Re: Improvement of procArray.xmin for VACUUM

From
Bruce Momjian
Date:
This is an optimization so I was thinking of something simpler, like
incrementing/decrementing a counter every time we allocate/free a
snapshot (like in the patch), and updating MyProc->xmin only if there
are no open snapshots.  I don't think it is worth tracking anything more
complicated than that.  Is that now possible to do cleanly?  I am having
trouble getting this to work in the attached patch.

---------------------------------------------------------------------------

Tom Lane wrote:
> Bruce Momjian <bruce@momjian.us> writes:
> > Simon Riggs wrote:
> >> How much does it cost to optimise for this case?
>
> > Zero cost.  It involves just tracking if there are any old snapshots,
>
> I will be fascinated to hear how you define that as zero cost.  It might
> be relatively low, but it's not zero, and for many simple cases (eg,
> all single-statement transactions) the benefit will indeed be zero.
> We need some serious consideration of the costs and benefits, not airy
> dismissals.
>
> I had originally thought that we could avoid CopySnapshot copying,
> which might buy back more than enough to cover the cost of tracking
> reference counts ... but in a quick look yesterday it seemed that the
> high-use code paths would still need a copy, because they are always
> copying off the static variables filled by GetTransactionSnapshot.
> Changing that would come with a tremendous memory penalty, or else
> a time penalty to scan the ProcArray twice to see how big the arrays
> need to be.
>
> [ thinks a bit... ]  The other way we might possibly tackle that is
> to avoid copying by the expedient of just using those static snapshot
> variables in-place.  SerializableSnapshot surely need never be copied
> since it remains unchanged till end of xact, and no use of the snap
> will survive longer than that.  In simple cases LatestSnapshot is not
> overwritten until the prior value is no longer needed, either.  If
> we could arrange to copy it only when setting up a cursor, or other
> long-lived usage, we'd be ahead of the game.  But I'd certainly want
> a management layer in there to ensure that we don't overwrite a value
> that *is* still in use, and offhand I'm not sure what that layer would
> need to look like.  Possibly some sort of double indirection so that
> callers would have a Snapshot pointer that remained valid after we'd
> copied off arrays that need to be updated?  We already do have double
> indirection in the form of the Snapshot and xip[] pointers ... maybe
> attach refcounts to the xip arrays not the Snapshots per se?
>
>             regards, tom lane
>
> ---------------------------(end of broadcast)---------------------------
> TIP 1: if posting/reading through Usenet, please send an appropriate
>        subscribe-nomail command to majordomo@postgresql.org so that your
>        message can get through to the mailing list cleanly

--
  Bruce Momjian  <bruce@momjian.us>          http://momjian.us
  EnterpriseDB                               http://www.enterprisedb.com

  + If your life is a hard drive, Christ can be your backup. +
Index: src/backend/catalog/index.c
===================================================================
RCS file: /cvsroot/pgsql/src/backend/catalog/index.c,v
retrieving revision 1.281
diff -c -c -r1.281 index.c
*** src/backend/catalog/index.c    25 Mar 2007 19:45:14 -0000    1.281
--- src/backend/catalog/index.c    26 Mar 2007 21:06:30 -0000
***************
*** 1358,1364 ****
      ExprContext *econtext;
      Snapshot    snapshot;
      TransactionId OldestXmin;
!
      /*
       * sanity checks
       */
--- 1358,1364 ----
      ExprContext *econtext;
      Snapshot    snapshot;
      TransactionId OldestXmin;
!
      /*
       * sanity checks
       */
***************
*** 1555,1561 ****
      ExecDropSingleTupleTableSlot(slot);

      FreeExecutorState(estate);
!
      /* These may have been pointing to the now-gone estate */
      indexInfo->ii_ExpressionsState = NIL;
      indexInfo->ii_PredicateState = NIL;
--- 1555,1563 ----
      ExecDropSingleTupleTableSlot(slot);

      FreeExecutorState(estate);
!     if (snapshot != SnapshotNow)
!         FreeSnapshot(snapshot);
!
      /* These may have been pointing to the now-gone estate */
      indexInfo->ii_ExpressionsState = NIL;
      indexInfo->ii_PredicateState = NIL;
Index: src/backend/commands/cluster.c
===================================================================
RCS file: /cvsroot/pgsql/src/backend/commands/cluster.c,v
retrieving revision 1.157
diff -c -c -r1.157 cluster.c
*** src/backend/commands/cluster.c    13 Mar 2007 00:33:39 -0000    1.157
--- src/backend/commands/cluster.c    26 Mar 2007 21:06:30 -0000
***************
*** 204,209 ****
--- 204,210 ----
              /* Start a new transaction for each relation. */
              StartTransactionCommand();
              /* functions in indexes may want a snapshot set */
+             FreeSnapshot(ActiveSnapshot);
              ActiveSnapshot = CopySnapshot(GetTransactionSnapshot());
              cluster_rel(rvtc, true);
              CommitTransactionCommand();
Index: src/backend/commands/indexcmds.c
===================================================================
RCS file: /cvsroot/pgsql/src/backend/commands/indexcmds.c,v
retrieving revision 1.157
diff -c -c -r1.157 indexcmds.c
*** src/backend/commands/indexcmds.c    13 Mar 2007 00:33:39 -0000    1.157
--- src/backend/commands/indexcmds.c    26 Mar 2007 21:06:31 -0000
***************
*** 493,500 ****
       * We also set ActiveSnapshot to this snap, since functions in indexes may
       * need a snapshot.
       */
!     snapshot = CopySnapshot(GetTransactionSnapshot());
!     ActiveSnapshot = snapshot;

      /*
       * Scan the index and the heap, insert any missing index entries.
--- 493,500 ----
       * We also set ActiveSnapshot to this snap, since functions in indexes may
       * need a snapshot.
       */
!     FreeSnapshot(ActiveSnapshot);
!     ActiveSnapshot = snapshot = CopySnapshot(GetTransactionSnapshot());

      /*
       * Scan the index and the heap, insert any missing index entries.
***************
*** 1304,1309 ****
--- 1304,1310 ----

          StartTransactionCommand();
          /* functions in indexes may want a snapshot set */
+         FreeSnapshot(ActiveSnapshot);
          ActiveSnapshot = CopySnapshot(GetTransactionSnapshot());
          if (reindex_relation(relid, true))
              ereport(NOTICE,
Index: src/backend/commands/trigger.c
===================================================================
RCS file: /cvsroot/pgsql/src/backend/commands/trigger.c,v
retrieving revision 1.214
diff -c -c -r1.214 trigger.c
*** src/backend/commands/trigger.c    19 Mar 2007 23:38:29 -0000    1.214
--- src/backend/commands/trigger.c    26 Mar 2007 21:06:32 -0000
***************
*** 2645,2651 ****
--- 2645,2654 ----
       */
      events = &afterTriggers->events;
      if (events->head != NULL)
+     {
+         FreeSnapshot(ActiveSnapshot);
          ActiveSnapshot = CopySnapshot(GetTransactionSnapshot());
+     }

      /*
       * Run all the remaining triggers.    Loop until they are all gone, just in
Index: src/backend/commands/vacuum.c
===================================================================
RCS file: /cvsroot/pgsql/src/backend/commands/vacuum.c,v
retrieving revision 1.349
diff -c -c -r1.349 vacuum.c
*** src/backend/commands/vacuum.c    14 Mar 2007 18:48:55 -0000    1.349
--- src/backend/commands/vacuum.c    26 Mar 2007 21:06:33 -0000
***************
*** 412,417 ****
--- 412,418 ----
                  {
                      StartTransactionCommand();
                      /* functions in indexes may want a snapshot set */
+                     FreeSnapshot(ActiveSnapshot);
                      ActiveSnapshot = CopySnapshot(GetTransactionSnapshot());
                  }
                  else
***************
*** 470,475 ****
--- 471,477 ----
           * necessary if we are called from autovacuum because autovacuum might
           * continue on to do an ANALYZE-only call.
           */
+         FreeSnapshot(ActiveSnapshot);
          ActiveSnapshot = CopySnapshot(GetTransactionSnapshot());
      }

***************
*** 964,969 ****
--- 966,972 ----
      if (vacstmt->full)
      {
          /* functions in indexes may want a snapshot set */
+         FreeSnapshot(ActiveSnapshot);
          ActiveSnapshot = CopySnapshot(GetTransactionSnapshot());
      }
      else
Index: src/backend/executor/functions.c
===================================================================
RCS file: /cvsroot/pgsql/src/backend/executor/functions.c,v
retrieving revision 1.112
diff -c -c -r1.112 functions.c
*** src/backend/executor/functions.c    13 Mar 2007 00:33:40 -0000    1.112
--- src/backend/executor/functions.c    26 Mar 2007 21:06:33 -0000
***************
*** 315,321 ****
                                          fcache->paramLI);

      /* We assume we don't need to set up ActiveSnapshot for ExecutorStart */
!
      /* Utility commands don't need Executor. */
      if (es->qd->operation != CMD_UTILITY)
      {
--- 315,321 ----
                                          fcache->paramLI);

      /* We assume we don't need to set up ActiveSnapshot for ExecutorStart */
!
      /* Utility commands don't need Executor. */
      if (es->qd->operation != CMD_UTILITY)
      {
Index: src/backend/postmaster/autovacuum.c
===================================================================
RCS file: /cvsroot/pgsql/src/backend/postmaster/autovacuum.c,v
retrieving revision 1.38
diff -c -c -r1.38 autovacuum.c
*** src/backend/postmaster/autovacuum.c    23 Mar 2007 21:57:10 -0000    1.38
--- src/backend/postmaster/autovacuum.c    26 Mar 2007 21:06:34 -0000
***************
*** 884,889 ****
--- 884,890 ----
      StartTransactionCommand();

      /* functions in indexes may want a snapshot set */
+     FreeSnapshot(ActiveSnapshot);
      ActiveSnapshot = CopySnapshot(GetTransactionSnapshot());

      /*
Index: src/backend/storage/ipc/procarray.c
===================================================================
RCS file: /cvsroot/pgsql/src/backend/storage/ipc/procarray.c,v
retrieving revision 1.23
diff -c -c -r1.23 procarray.c
*** src/backend/storage/ipc/procarray.c    25 Mar 2007 19:45:14 -0000    1.23
--- src/backend/storage/ipc/procarray.c    26 Mar 2007 21:06:35 -0000
***************
*** 498,504 ****
   *----------
   */
  Snapshot
! GetSnapshotData(Snapshot snapshot, bool serializable)
  {
      ProcArrayStruct *arrayP = procArray;
      TransactionId xmin;
--- 498,504 ----
   *----------
   */
  Snapshot
! GetSnapshotData(Snapshot snapshot)
  {
      ProcArrayStruct *arrayP = procArray;
      TransactionId xmin;
***************
*** 510,519 ****

      Assert(snapshot != NULL);

!     /* Serializable snapshot must be computed before any other... */
!     Assert(serializable ?
!            !TransactionIdIsValid(MyProc->xmin) :
!            TransactionIdIsValid(MyProc->xmin));

      /*
       * Allocating space for maxProcs xids is usually overkill; numProcs would
--- 510,517 ----

      Assert(snapshot != NULL);

!     /* Either we set an xmin, or we already have one. */
!     Assert(TransactionIdIsValid(MyProc->xmin) || GetOpenSnapshotCount() == 0);

      /*
       * Allocating space for maxProcs xids is usually overkill; numProcs would
***************
*** 656,664 ****
          }
      }

!     if (serializable)
!         MyProc->xmin = TransactionXmin = xmin;

      LWLockRelease(ProcArrayLock);

      /*
--- 654,681 ----
          }
      }

!     /* DEBUG */
!     if (!TransactionIdIsValid(MyProc->xmin) || GetOpenSnapshotCount() == 0)
!         fprintf(stderr, "%x:%x: set new xmin old=%x new=%x\n", MyProcPid, MyProc->xid, MyProc->xmin, xmin);
!     else
!         fprintf(stderr, "%x:%x: count=%d, not set\n", MyProcPid, MyProc->xid, GetOpenSnapshotCount());

+     /*
+      * ---
+      * We need to set MyProc->xmin in a few cases:
+      *    o  set at transaction start for SERIALIZABLE visibility
+      *    o  set for new command in READ COMMITTED mode so
+      *       VACUUM knows we don't care about completed transactions
+      *
+      *    The problem is that we might have old cursors/portals
+      *    that need to have older transaction visibility, so we
+      *    check to see we have no open Snapshots before making
+      *    the change.
+      * ---
+      */
+     if (!TransactionIdIsValid(MyProc->xmin) || GetOpenSnapshotCount() == 0)
+         MyProc->xmin = TransactionXmin = xmin;
+
      LWLockRelease(ProcArrayLock);

      /*
Index: src/backend/tcop/fastpath.c
===================================================================
RCS file: /cvsroot/pgsql/src/backend/tcop/fastpath.c,v
retrieving revision 1.96
diff -c -c -r1.96 fastpath.c
*** src/backend/tcop/fastpath.c    3 Mar 2007 19:32:54 -0000    1.96
--- src/backend/tcop/fastpath.c    26 Mar 2007 21:06:35 -0000
***************
*** 308,313 ****
--- 308,314 ----
       * Now that we know we are in a valid transaction, set snapshot in case
       * needed by function itself or one of the datatype I/O routines.
       */
+     FreeSnapshot(ActiveSnapshot);
      ActiveSnapshot = CopySnapshot(GetTransactionSnapshot());

      /*
Index: src/backend/tcop/postgres.c
===================================================================
RCS file: /cvsroot/pgsql/src/backend/tcop/postgres.c,v
retrieving revision 1.529
diff -c -c -r1.529 postgres.c
*** src/backend/tcop/postgres.c    22 Mar 2007 19:55:04 -0000    1.529
--- src/backend/tcop/postgres.c    26 Mar 2007 21:06:36 -0000
***************
*** 738,743 ****
--- 738,744 ----
          {
              if (needSnapshot)
              {
+                 FreeSnapshot(ActiveSnapshot);
                  ActiveSnapshot = CopySnapshot(GetTransactionSnapshot());
                  needSnapshot = false;
              }
Index: src/backend/tcop/pquery.c
===================================================================
RCS file: /cvsroot/pgsql/src/backend/tcop/pquery.c,v
retrieving revision 1.115
diff -c -c -r1.115 pquery.c
*** src/backend/tcop/pquery.c    13 Mar 2007 00:33:42 -0000    1.115
--- src/backend/tcop/pquery.c    26 Mar 2007 21:06:36 -0000
***************
*** 154,159 ****
--- 154,160 ----
       * Must always set snapshot for plannable queries.    Note we assume that
       * caller will take care of restoring ActiveSnapshot on exit/error.
       */
+     FreeSnapshot(ActiveSnapshot);
      ActiveSnapshot = CopySnapshot(GetTransactionSnapshot());

      /*
***************
*** 603,608 ****
--- 604,611 ----

          /* Restore global vars and propagate error */
          ActivePortal = saveActivePortal;
+         if (portal->strategy == PORTAL_ONE_SELECT)
+             FreeSnapshot(ActiveSnapshot);
          ActiveSnapshot = saveActiveSnapshot;
          CurrentResourceOwner = saveResourceOwner;
          PortalContext = savePortalContext;
***************
*** 614,619 ****
--- 617,624 ----
      MemoryContextSwitchTo(oldContext);

      ActivePortal = saveActivePortal;
+     if (portal->strategy == PORTAL_ONE_SELECT)
+         FreeSnapshot(ActiveSnapshot);
      ActiveSnapshot = saveActiveSnapshot;
      CurrentResourceOwner = saveResourceOwner;
      PortalContext = savePortalContext;
***************
*** 1163,1169 ****
--- 1168,1177 ----
            IsA(utilityStmt, NotifyStmt) ||
            IsA(utilityStmt, UnlistenStmt) ||
            IsA(utilityStmt, CheckPointStmt)))
+     {
+         FreeSnapshot(ActiveSnapshot);
          ActiveSnapshot = CopySnapshot(GetTransactionSnapshot());
+     }
      else
          ActiveSnapshot = NULL;

***************
*** 1177,1184 ****
      /* Some utility statements may change context on us */
      MemoryContextSwitchTo(PortalGetHeapMemory(portal));

!     if (ActiveSnapshot)
!         FreeSnapshot(ActiveSnapshot);
      ActiveSnapshot = NULL;
  }

--- 1185,1191 ----
      /* Some utility statements may change context on us */
      MemoryContextSwitchTo(PortalGetHeapMemory(portal));

!     FreeSnapshot(ActiveSnapshot);
      ActiveSnapshot = NULL;
  }

Index: src/backend/utils/adt/ri_triggers.c
===================================================================
RCS file: /cvsroot/pgsql/src/backend/utils/adt/ri_triggers.c,v
retrieving revision 1.93
diff -c -c -r1.93 ri_triggers.c
*** src/backend/utils/adt/ri_triggers.c    25 Mar 2007 19:45:14 -0000    1.93
--- src/backend/utils/adt/ri_triggers.c    26 Mar 2007 21:06:38 -0000
***************
*** 2636,2642 ****
      char        workmembuf[32];
      int            spi_result;
      SPIPlanPtr    qplan;
!
      /*
       * Check to make sure current user has enough permissions to do the test
       * query.  (If not, caller can fall back to the trigger method, which
--- 2636,2643 ----
      char        workmembuf[32];
      int            spi_result;
      SPIPlanPtr    qplan;
!     Snapshot    snapshot;
!
      /*
       * Check to make sure current user has enough permissions to do the test
       * query.  (If not, caller can fall back to the trigger method, which
***************
*** 2770,2780 ****
       * really haven't got much choice.)  We need at most one tuple returned,
       * so pass limit = 1.
       */
      spi_result = SPI_execute_snapshot(qplan,
                                        NULL, NULL,
!                                       CopySnapshot(GetLatestSnapshot()),
                                        InvalidSnapshot,
                                        true, 1);

      /* Check result */
      if (spi_result != SPI_OK_SELECT)
--- 2771,2783 ----
       * really haven't got much choice.)  We need at most one tuple returned,
       * so pass limit = 1.
       */
+     snapshot = CopySnapshot(GetLatestSnapshot());
      spi_result = SPI_execute_snapshot(qplan,
                                        NULL, NULL,
!                                       snapshot,
                                        InvalidSnapshot,
                                        true, 1);
+     FreeSnapshot(snapshot);

      /* Check result */
      if (spi_result != SPI_OK_SELECT)
***************
*** 3310,3315 ****
--- 3313,3322 ----
                                        test_snapshot, crosscheck_snapshot,
                                        false, limit);

+     if (test_snapshot != InvalidSnapshot)
+         FreeSnapshot(test_snapshot);
+     if (crosscheck_snapshot != InvalidSnapshot)
+         FreeSnapshot(crosscheck_snapshot);
      /* Restore UID */
      SetUserId(save_uid);

Index: src/backend/utils/time/tqual.c
===================================================================
RCS file: /cvsroot/pgsql/src/backend/utils/time/tqual.c,v
retrieving revision 1.102
diff -c -c -r1.102 tqual.c
*** src/backend/utils/time/tqual.c    25 Mar 2007 19:45:14 -0000    1.102
--- src/backend/utils/time/tqual.c    26 Mar 2007 21:06:38 -0000
***************
*** 65,70 ****
--- 65,79 ----
  Snapshot    LatestSnapshot = NULL;

  /*
+  * We use OpenSnapshotCount to record if there are any open snapshots.  If
+  * there are not, we can update MyProc->xmin. A more sophisticated solution
+  * would be to keep track of the oldest open snapshot and prevent
+  * MyProc->xmin from going below that value, but the complexity doesn't
+  * seem worth it.
+  */
+ static int OpenSnapshotCount = 0;
+
+ /*
   * This pointer is not maintained by this module, but it's convenient
   * to declare it here anyway.  Callers typically assign a copy of
   * GetTransactionSnapshot's result to ActiveSnapshot.
***************
*** 1224,1237 ****
      /* First call in transaction? */
      if (SerializableSnapshot == NULL)
      {
!         SerializableSnapshot = GetSnapshotData(&SerializableSnapshotData, true);
          return SerializableSnapshot;
      }

      if (IsXactIsoLevelSerializable)
          return SerializableSnapshot;

!     LatestSnapshot = GetSnapshotData(&LatestSnapshotData, false);

      return LatestSnapshot;
  }
--- 1233,1246 ----
      /* First call in transaction? */
      if (SerializableSnapshot == NULL)
      {
!         SerializableSnapshot = GetSnapshotData(&SerializableSnapshotData);
          return SerializableSnapshot;
      }

      if (IsXactIsoLevelSerializable)
          return SerializableSnapshot;

!     LatestSnapshot = GetSnapshotData(&LatestSnapshotData);

      return LatestSnapshot;
  }
***************
*** 1248,1254 ****
      if (SerializableSnapshot == NULL)
          elog(ERROR, "no snapshot has been set");

!     LatestSnapshot = GetSnapshotData(&LatestSnapshotData, false);

      return LatestSnapshot;
  }
--- 1257,1263 ----
      if (SerializableSnapshot == NULL)
          elog(ERROR, "no snapshot has been set");

!     LatestSnapshot = GetSnapshotData(&LatestSnapshotData);

      return LatestSnapshot;
  }
***************
*** 1266,1271 ****
--- 1275,1282 ----
      Size        subxipoff;
      Size        size;

+     OpenSnapshotCount++;
+
      /* We allocate any XID arrays needed in the same palloc block. */
      size = subxipoff = sizeof(SnapshotData) +
          snapshot->xcnt * sizeof(TransactionId);
***************
*** 1310,1316 ****
  void
  FreeSnapshot(Snapshot snapshot)
  {
!     pfree(snapshot);
  }

  /*
--- 1321,1337 ----
  void
  FreeSnapshot(Snapshot snapshot)
  {
!     if (snapshot != NULL)
!     {
!         OpenSnapshotCount--;
!         Assert(OpenSnapshotCount >= 0);
!         pfree(snapshot);
!     }
! }
!
! int GetOpenSnapshotCount(void)
! {
!     return OpenSnapshotCount;
  }

  /*
***************
*** 1328,1333 ****
--- 1349,1355 ----
      SerializableSnapshot = NULL;
      LatestSnapshot = NULL;
      ActiveSnapshot = NULL;        /* just for cleanliness */
+     OpenSnapshotCount = 0;
  }

  /*
Index: src/include/utils/tqual.h
===================================================================
RCS file: /cvsroot/pgsql/src/include/utils/tqual.h,v
retrieving revision 1.66
diff -c -c -r1.66 tqual.h
*** src/include/utils/tqual.h    25 Mar 2007 19:45:14 -0000    1.66
--- src/include/utils/tqual.h    26 Mar 2007 21:06:39 -0000
***************
*** 150,157 ****
  extern Snapshot CopySnapshot(Snapshot snapshot);
  extern void FreeSnapshot(Snapshot snapshot);
  extern void FreeXactSnapshot(void);

  /* in procarray.c; declared here to avoid including tqual.h in procarray.h: */
! extern Snapshot GetSnapshotData(Snapshot snapshot, bool serializable);

  #endif   /* TQUAL_H */
--- 150,158 ----
  extern Snapshot CopySnapshot(Snapshot snapshot);
  extern void FreeSnapshot(Snapshot snapshot);
  extern void FreeXactSnapshot(void);
+ extern int GetOpenSnapshotCount(void);

  /* in procarray.c; declared here to avoid including tqual.h in procarray.h: */
! extern Snapshot GetSnapshotData(Snapshot snapshot);

  #endif   /* TQUAL_H */
Index: src/pl/plpgsql/src/pl_exec.c
===================================================================
RCS file: /cvsroot/pgsql/src/pl/plpgsql/src/pl_exec.c,v
retrieving revision 1.191
diff -c -c -r1.191 pl_exec.c
*** src/pl/plpgsql/src/pl_exec.c    25 Mar 2007 23:27:59 -0000    1.191
--- src/pl/plpgsql/src/pl_exec.c    26 Mar 2007 21:06:41 -0000
***************
*** 4129,4139 ****
--- 4129,4143 ----
      PG_CATCH();
      {
          /* Restore global vars and propagate error */
+         if (!estate->readonly_func)
+             FreeSnapshot(ActiveSnapshot);
          ActiveSnapshot = saveActiveSnapshot;
          PG_RE_THROW();
      }
      PG_END_TRY();

+     if (!estate->readonly_func)
+         FreeSnapshot(ActiveSnapshot);
      ActiveSnapshot = saveActiveSnapshot;
      SPI_pop();


Re: Improvement of procArray.xmin for VACUUM

From
Tom Lane
Date:
Bruce Momjian <bruce@momjian.us> writes:
> This is an optimization so I was thinking of something simpler, like
> incrementing/decrementing a counter every time we allocate/free a
> snapshot (like in the patch), and updating MyProc->xmin only if there
> are no open snapshots.  I don't think it is worth tracking anything more
> complicated than that.  Is that now possible to do cleanly?  I am having
> trouble getting this to work in the attached patch.

No kidding.

Even if you get it to pass the regression tests, it'll be a constant
source of bugs.  IMHO the *only* way to do this reliably is to have
a centralized reference-count management module.

I've been thinking more about the avoid-copy idea and I think it will
indeed work, and be simpler than what we have now, to boot.  Ideas:

* There is only one (pair of) statically allocated, max-size XID arrays
for GetSnapshotData to read into.  When we want to acquire a new snap
from GetSnapshotData, we palloc a new SnapshotData struct (and no more),
make it point at the statically allocated arrays, fill and return it.

* There can be at most one SnapshotData struct using the statically
allocated arrays.  We remember its address in a static variable (which
goes NULL when there is no such struct).  If we have to acquire a new
snapshot before the old one's refcount has gone to zero, then at that
time we palloc arrays just big enough for the old one, copy the static
data to there, insert the palloc'd array pointers into the old one,
and clear the static variable (or more likely, immediately set it to
point to the new one).

* All the current CopySnapshot operations are replaced by refcount
increments.  We need explicit refcount-decrement calls added, and
ResourceOwner support to make sure they happen.  (I'm imagining that
snapshots will now live in a dedicated context, so they don't go away
implicitly but need explicit release operations.  The ResourceOwner
infrastructure will help us find any missed releases.)

* Decrementing the refcount on a SnapshotData pfree's it if the refcount
has gone to zero, and also clears the aforementioned static pointer
variable if it happens to point at that selfsame struct.  Note that for
a serializable transaction, or a "simple" read-committed transaction
that doesn't use more than one snap at a time, this will always happen
and so there will be zero copying cost.

* In this scheme the distinction between SerializableSnapshot and
LatestSnapshot vanishes, at least at the level of memory management.
Probably SerializableSnapshot would translate into a refcount held by
xact.c or some other upper-level module.

* We keep all the live MVCC snapshots in a linked list managed by
a "snapshot cache" module.  We can traverse this list to determine
the minimum xmin at any instant.  Freeing a snap whose xmin equals
the current MyProc->xmin causes us to traverse the list, determine
the new backend-wide xmin, and update MyProc->xmin.  Freeing the
last snap causes us to reset MyProc->xmin to 0.  (I'm not completely
sure, but I think that we can just set MyProc->xmin in these cases,
without bothering to lock the ProcArray --- any comments on that?)

* GetSnapshotData no longer cares about serializable vs nonserializable
snaps, either.  Its rule is just "if MyProc->xmin is 0, then set it".
This can happen only when creating the first snap in a transaction that
currently has no live snaps.

* The above two rules clearly suffice to maintain the invariant
"MyProc->xmin is the minimum of the xmins of the live snapshots in the
transaction, or zero when there are none".  Which is what we want.


On the whole though I think we should let this idea go till 8.4; we have
a lot to deal with for 8.3 and a definite shortage of evidence that
advancing xmin will buy much.  Mu gut feeling is that the above design
would save about enough in snapshot-copying costs to pay for its extra
management logic, but we won't come out ahead unless advancing xmin
intra-transaction really helps, and I'm not sure I believe that (except
in the special case of VACUUM, and we already have a solution for that,
which would be independent of this).

            regards, tom lane

Re: Improvement of procArray.xmin for VACUUM

From
Bruce Momjian
Date:
Tom Lane wrote:
> On the whole though I think we should let this idea go till 8.4; we have
> a lot to deal with for 8.3 and a definite shortage of evidence that
> advancing xmin will buy much.  Mu gut feeling is that the above design
> would save about enough in snapshot-copying costs to pay for its extra
> management logic, but we won't come out ahead unless advancing xmin
> intra-transaction really helps, and I'm not sure I believe that (except
> in the special case of VACUUM, and we already have a solution for that,
> which would be independent of this).

The improvement is going to be a win for multi-statement transactions
--- the only question is how often are they long-running.

It does seem best to put this on the TODO for 8.4, and I will do that
now.  The only thing that makes it tempting to get into 8.3 is that we
could advertise this release as a major "space reuse" release because of
HOT, autovacuum on by default, multiple autovacuum processes, and, if we
added it, improved VACUUM for multi-statement transactions.

--
  Bruce Momjian  <bruce@momjian.us>          http://momjian.us
  EnterpriseDB                               http://www.enterprisedb.com

  + If your life is a hard drive, Christ can be your backup. +

Re: Improvement of procArray.xmin for VACUUM

From
Gregory Stark
Date:
I have a question about what would happen for a transaction running a command
like COPY FROM. Is it possible it would manage to arrange to have no live
snapshots at all? So it would have no impact on concurrent VACUUMs? What about
something running a large pg_restore?

"Tom Lane" <tgl@sss.pgh.pa.us> writes:

> On the whole though I think we should let this idea go till 8.4;

I tend to agree but for a different reason. I think it's something that will
open the doors for a lot of new ideas. If we put it in CVS HEAD early in 8.4 I
think (or hope at any rate) we'll think of at least a few new things we can do
with the new more precise information it exposes.

Just as an example, if you find you have no live snapshots can you throw out
the combocid hash? Any tuple you find with a combocid that's been discarded
that way must predate your current scan and therefore is deleted for you.

--
  Gregory Stark
  EnterpriseDB          http://www.enterprisedb.com


Re: Improvement of procArray.xmin for VACUUM

From
Bruce Momjian
Date:
Gregory Stark wrote:
>
> I have a question about what would happen for a transaction running a command
> like COPY FROM. Is it possible it would manage to arrange to have no live
> snapshots at all? So it would have no impact on concurrent VACUUMs? What about
> something running a large pg_restore?

Interesting idea.  If the table had triggers, it would need a snapshot,
but if not, yea, that is certainly possible.

---------------------------------------------------------------------------


>
> "Tom Lane" <tgl@sss.pgh.pa.us> writes:
>
> > On the whole though I think we should let this idea go till 8.4;
>
> I tend to agree but for a different reason. I think it's something that will
> open the doors for a lot of new ideas. If we put it in CVS HEAD early in 8.4 I
> think (or hope at any rate) we'll think of at least a few new things we can do
> with the new more precise information it exposes.
>
> Just as an example, if you find you have no live snapshots can you throw out
> the combocid hash? Any tuple you find with a combocid that's been discarded
> that way must predate your current scan and therefore is deleted for you.
>
> --
>   Gregory Stark
>   EnterpriseDB          http://www.enterprisedb.com
>
>
> ---------------------------(end of broadcast)---------------------------
> TIP 4: Have you searched our list archives?
>
>                http://archives.postgresql.org

--
  Bruce Momjian  <bruce@momjian.us>          http://momjian.us
  EnterpriseDB                               http://www.enterprisedb.com

  + If your life is a hard drive, Christ can be your backup. +

Re: Improvement of procArray.xmin for VACUUM

From
Tom Lane
Date:
Bruce Momjian <bruce@momjian.us> writes:
> Tom Lane wrote:
>> ... but we won't come out ahead unless advancing xmin
>> intra-transaction really helps, and I'm not sure I believe that (except
>> in the special case of VACUUM, and we already have a solution for that,
>> which would be independent of this).

> The improvement is going to be a win for multi-statement transactions
> --- the only question is how often are they long-running.

Uh, no, that's not very clear.  A long-running transaction will be a
VACUUM bottleneck because of its own XID, never mind its xmin.  To make
this helpful, you have to posit a lot of overlapping long-running
transactions (such that the distance back to GlobalXmin might average
about twice the distance back to the oldest live XID).  That's not
impossible but I wonder whether it's not mostly a token of bad
application design.

> It does seem best to put this on the TODO for 8.4, and I will do that
> now.

Agreed.  Quite aside from the time needed for a reasonable
implementation, we'd really need to do more performance-testing than we
have time for now.

            regards, tom lane

Re: Improvement of procArray.xmin for VACUUM

From
Tom Lane
Date:
Bruce Momjian <bruce@momjian.us> writes:
> Gregory Stark wrote:
>> I have a question about what would happen for a transaction running a command
>> like COPY FROM. Is it possible it would manage to arrange to have no live
>> snapshots at all? So it would have no impact on concurrent VACUUMs? What about
>> something running a large pg_restore?

> Interesting idea.

Indeed.  Currently, COPY forcibly sets a snapshot on the off chance
something will use it, but I could certainly see making that happen
"lazily", ie not at all in the simple case.

pg_restore is probably a lost cause, at least if you are running it
in single-transaction mode.  I guess there'd be tradeoffs as to whether
to do that or not ...

            regards, tom lane

Re: Improvement of procArray.xmin for VACUUM

From
Bruce Momjian
Date:
Tom Lane wrote:
> Bruce Momjian <bruce@momjian.us> writes:
> > Tom Lane wrote:
> >> ... but we won't come out ahead unless advancing xmin
> >> intra-transaction really helps, and I'm not sure I believe that (except
> >> in the special case of VACUUM, and we already have a solution for that,
> >> which would be independent of this).
>
> > The improvement is going to be a win for multi-statement transactions
> > --- the only question is how often are they long-running.
>
> Uh, no, that's not very clear.  A long-running transaction will be a
> VACUUM bottleneck because of its own XID, never mind its xmin.  To make
> this helpful, you have to posit a lot of overlapping long-running
> transactions (such that the distance back to GlobalXmin might average
> about twice the distance back to the oldest live XID).  That's not
> impossible but I wonder whether it's not mostly a token of bad
> application design.

Well, my secondary addition was to start MyProc->xmin with the current
xid counter, rather than your own xid.  This is because your xid is
already in progress and so will not be touched, so why not start with
the current xid counter.

--
  Bruce Momjian  <bruce@momjian.us>          http://momjian.us
  EnterpriseDB                               http://www.enterprisedb.com

  + If your life is a hard drive, Christ can be your backup. +

Re: Improvement of procArray.xmin for VACUUM

From
Bruce Momjian
Date:
Tom Lane wrote:
> Bruce Momjian <bruce@momjian.us> writes:
> > Gregory Stark wrote:
> >> I have a question about what would happen for a transaction running a command
> >> like COPY FROM. Is it possible it would manage to arrange to have no live
> >> snapshots at all? So it would have no impact on concurrent VACUUMs? What about
> >> something running a large pg_restore?
>
> > Interesting idea.
>
> Indeed.  Currently, COPY forcibly sets a snapshot on the off chance
> something will use it, but I could certainly see making that happen
> "lazily", ie not at all in the simple case.
>
> pg_restore is probably a lost cause, at least if you are running it
> in single-transaction mode.  I guess there'd be tradeoffs as to whether
> to do that or not ...

The bottom line is that more optimizations for VACUUM dead tuple
identification are possible.

--
  Bruce Momjian  <bruce@momjian.us>          http://momjian.us
  EnterpriseDB                               http://www.enterprisedb.com

  + If your life is a hard drive, Christ can be your backup. +

Re: Improvement of procArray.xmin for VACUUM

From
Tom Lane
Date:
Bruce Momjian <bruce@momjian.us> writes:
> Tom Lane wrote:
>> Uh, no, that's not very clear.  A long-running transaction will be a
>> VACUUM bottleneck because of its own XID, never mind its xmin.

> Well, my secondary addition was to start MyProc->xmin with the current
> xid counter, rather than your own xid.

Don't tell me you seriously believe that would work.

            regards, tom lane

Re: Improvement of procArray.xmin for VACUUM

From
Bruce Momjian
Date:
Tom Lane wrote:
> Bruce Momjian <bruce@momjian.us> writes:
> > Tom Lane wrote:
> >> Uh, no, that's not very clear.  A long-running transaction will be a
> >> VACUUM bottleneck because of its own XID, never mind its xmin.
>
> > Well, my secondary addition was to start MyProc->xmin with the current
> > xid counter, rather than your own xid.
>
> Don't tell me you seriously believe that would work.

I do.  Please tell me why it will not.

--
  Bruce Momjian  <bruce@momjian.us>          http://momjian.us
  EnterpriseDB                               http://www.enterprisedb.com

  + If your life is a hard drive, Christ can be your backup. +

Re: Improvement of procArray.xmin for VACUUM

From
Tom Lane
Date:
Bruce Momjian <bruce@momjian.us> writes:
> Tom Lane wrote:
>> Bruce Momjian <bruce@momjian.us> writes:
>>> Well, my secondary addition was to start MyProc->xmin with the current
>>> xid counter, rather than your own xid.
>>
>> Don't tell me you seriously believe that would work.

> I do.  Please tell me why it will not.

You can't have GlobalXmin greater than your own xid, else log space
(particularly pg_subtrans) may be recycled too soon.

            regards, tom lane

Re: Improvement of procArray.xmin for VACUUM

From
Bruce Momjian
Date:
Tom Lane wrote:
> Bruce Momjian <bruce@momjian.us> writes:
> > Tom Lane wrote:
> >> Bruce Momjian <bruce@momjian.us> writes:
> >>> Well, my secondary addition was to start MyProc->xmin with the current
> >>> xid counter, rather than your own xid.
> >>
> >> Don't tell me you seriously believe that would work.
>
> > I do.  Please tell me why it will not.
>
> You can't have GlobalXmin greater than your own xid, else log space
> (particularly pg_subtrans) may be recycled too soon.

OK, so maybe GlobalXmin would have to be split into two new variables
that control log space and dead-row detection separately.

--
  Bruce Momjian  <bruce@momjian.us>          http://momjian.us
  EnterpriseDB                               http://www.enterprisedb.com

  + If your life is a hard drive, Christ can be your backup. +