Thread: Improvement of procArray.xmin for VACUUM
VACUUM treats dead tuple with an xmax less than all procArray.xmin's as invisible with space ready to be reused. Right now, we set procArray.xmin at transaction start to the required SERIALIZABLE value. We currently don't update procArray.xmin when we know we are in a READ COMMITTED transaction, even though we already call GetSnapshotData(). This means that if a transaction completes that was active when our transaction started, we don't update our procArray.xmin during the next multi-statement transaction command to indicate that we don't care about the completed transaction anymore. I have been thinking we could improve how quickly VACUUM can expire rows if we update procArray.xmin more frequently for non-SERIALIZABLE transactions. The attached patch updates procArray.xmin in this manner. Here is an example of how the patch improves dead row reuse: Session #: 1 2 3 CREATE TABLE test(x int); INSERT INTO test VALUES (1); BEGIN; DELETE FROM test; BEGIN; SELECT 1; COMMIT; VACUUM VERBOSE test; (row not reused) SELECT 1; (At this point #2 doesn't see the test row anymore. Patch updates procArray.xmin.) VACUUM VERBOSE test; (row reused with patch) COMMIT; VACUUM VERBOSE test; (normal row reuse) What the patch does is to recompute procArray.xmin during the second SELECT, which is then used by the next VACUUM. The patch has debug code that prints the procArray.xmin assignments. I am not sure if I have all the boolean calls to GetTransactionSnapshot() correct. The set_procArray_xmin parameter should be true only when we are sure we aren't going to be resetting the session back to an earlier snapshot. The major affect of the patch is to set the minimum procArray.xmin to be the earliest in-process transaction, rather than the earliest in-process transaction at transaction start. -- Bruce Momjian <bruce@momjian.us> http://momjian.us EnterpriseDB http://www.enterprisedb.com + If your life is a hard drive, Christ can be your backup. + Index: src/backend/catalog/index.c =================================================================== RCS file: /cvsroot/pgsql/src/backend/catalog/index.c,v retrieving revision 1.280 diff -c -c -r1.280 index.c *** src/backend/catalog/index.c 3 Mar 2007 20:08:41 -0000 1.280 --- src/backend/catalog/index.c 23 Mar 2007 19:20:44 -0000 *************** *** 1394,1400 **** } else if (indexInfo->ii_Concurrent) { ! snapshot = CopySnapshot(GetTransactionSnapshot()); OldestXmin = InvalidTransactionId; /* not used */ } else --- 1394,1400 ---- } else if (indexInfo->ii_Concurrent) { ! snapshot = CopySnapshot(GetTransactionSnapshot(false)); OldestXmin = InvalidTransactionId; /* not used */ } else Index: src/backend/commands/cluster.c =================================================================== RCS file: /cvsroot/pgsql/src/backend/commands/cluster.c,v retrieving revision 1.157 diff -c -c -r1.157 cluster.c *** src/backend/commands/cluster.c 13 Mar 2007 00:33:39 -0000 1.157 --- src/backend/commands/cluster.c 23 Mar 2007 19:20:45 -0000 *************** *** 204,210 **** /* Start a new transaction for each relation. */ StartTransactionCommand(); /* functions in indexes may want a snapshot set */ ! ActiveSnapshot = CopySnapshot(GetTransactionSnapshot()); cluster_rel(rvtc, true); CommitTransactionCommand(); } --- 204,210 ---- /* Start a new transaction for each relation. */ StartTransactionCommand(); /* functions in indexes may want a snapshot set */ ! ActiveSnapshot = CopySnapshot(GetTransactionSnapshot(true)); cluster_rel(rvtc, true); CommitTransactionCommand(); } Index: src/backend/commands/indexcmds.c =================================================================== RCS file: /cvsroot/pgsql/src/backend/commands/indexcmds.c,v retrieving revision 1.157 diff -c -c -r1.157 indexcmds.c *** src/backend/commands/indexcmds.c 13 Mar 2007 00:33:39 -0000 1.157 --- src/backend/commands/indexcmds.c 23 Mar 2007 19:20:45 -0000 *************** *** 493,499 **** * We also set ActiveSnapshot to this snap, since functions in indexes may * need a snapshot. */ ! snapshot = CopySnapshot(GetTransactionSnapshot()); ActiveSnapshot = snapshot; /* --- 493,499 ---- * We also set ActiveSnapshot to this snap, since functions in indexes may * need a snapshot. */ ! snapshot = CopySnapshot(GetTransactionSnapshot(false)); ActiveSnapshot = snapshot; /* *************** *** 1304,1310 **** StartTransactionCommand(); /* functions in indexes may want a snapshot set */ ! ActiveSnapshot = CopySnapshot(GetTransactionSnapshot()); if (reindex_relation(relid, true)) ereport(NOTICE, (errmsg("table \"%s\" was reindexed", --- 1304,1310 ---- StartTransactionCommand(); /* functions in indexes may want a snapshot set */ ! ActiveSnapshot = CopySnapshot(GetTransactionSnapshot(true)); if (reindex_relation(relid, true)) ereport(NOTICE, (errmsg("table \"%s\" was reindexed", Index: src/backend/commands/trigger.c =================================================================== RCS file: /cvsroot/pgsql/src/backend/commands/trigger.c,v retrieving revision 1.214 diff -c -c -r1.214 trigger.c *** src/backend/commands/trigger.c 19 Mar 2007 23:38:29 -0000 1.214 --- src/backend/commands/trigger.c 23 Mar 2007 19:20:46 -0000 *************** *** 2645,2651 **** */ events = &afterTriggers->events; if (events->head != NULL) ! ActiveSnapshot = CopySnapshot(GetTransactionSnapshot()); /* * Run all the remaining triggers. Loop until they are all gone, just in --- 2645,2651 ---- */ events = &afterTriggers->events; if (events->head != NULL) ! ActiveSnapshot = CopySnapshot(GetTransactionSnapshot(false)); /* * Run all the remaining triggers. Loop until they are all gone, just in Index: src/backend/commands/vacuum.c =================================================================== RCS file: /cvsroot/pgsql/src/backend/commands/vacuum.c,v retrieving revision 1.349 diff -c -c -r1.349 vacuum.c *** src/backend/commands/vacuum.c 14 Mar 2007 18:48:55 -0000 1.349 --- src/backend/commands/vacuum.c 23 Mar 2007 19:20:47 -0000 *************** *** 412,418 **** { StartTransactionCommand(); /* functions in indexes may want a snapshot set */ ! ActiveSnapshot = CopySnapshot(GetTransactionSnapshot()); } else old_context = MemoryContextSwitchTo(anl_context); --- 412,418 ---- { StartTransactionCommand(); /* functions in indexes may want a snapshot set */ ! ActiveSnapshot = CopySnapshot(GetTransactionSnapshot(false)); } else old_context = MemoryContextSwitchTo(anl_context); *************** *** 470,476 **** * necessary if we are called from autovacuum because autovacuum might * continue on to do an ANALYZE-only call. */ ! ActiveSnapshot = CopySnapshot(GetTransactionSnapshot()); } if (vacstmt->vacuum && !IsAutoVacuumWorkerProcess()) --- 470,476 ---- * necessary if we are called from autovacuum because autovacuum might * continue on to do an ANALYZE-only call. */ ! ActiveSnapshot = CopySnapshot(GetTransactionSnapshot(false)); } if (vacstmt->vacuum && !IsAutoVacuumWorkerProcess()) *************** *** 964,970 **** if (vacstmt->full) { /* functions in indexes may want a snapshot set */ ! ActiveSnapshot = CopySnapshot(GetTransactionSnapshot()); } else { --- 964,970 ---- if (vacstmt->full) { /* functions in indexes may want a snapshot set */ ! ActiveSnapshot = CopySnapshot(GetTransactionSnapshot(false)); } else { Index: src/backend/executor/functions.c =================================================================== RCS file: /cvsroot/pgsql/src/backend/executor/functions.c,v retrieving revision 1.112 diff -c -c -r1.112 functions.c *** src/backend/executor/functions.c 13 Mar 2007 00:33:40 -0000 1.112 --- src/backend/executor/functions.c 23 Mar 2007 19:20:48 -0000 *************** *** 300,306 **** else { CommandCounterIncrement(); ! snapshot = CopySnapshot(GetTransactionSnapshot()); } if (IsA(es->stmt, PlannedStmt)) --- 300,306 ---- else { CommandCounterIncrement(); ! snapshot = CopySnapshot(GetTransactionSnapshot(false)); } if (IsA(es->stmt, PlannedStmt)) Index: src/backend/executor/spi.c =================================================================== RCS file: /cvsroot/pgsql/src/backend/executor/spi.c,v retrieving revision 1.173 diff -c -c -r1.173 spi.c *** src/backend/executor/spi.c 17 Mar 2007 03:15:38 -0000 1.173 --- src/backend/executor/spi.c 23 Mar 2007 19:20:48 -0000 *************** *** 997,1003 **** else { CommandCounterIncrement(); ! snapshot = GetTransactionSnapshot(); } /* --- 997,1003 ---- else { CommandCounterIncrement(); ! snapshot = GetTransactionSnapshot(false); } /* *************** *** 1530,1536 **** if (read_only) ActiveSnapshot = CopySnapshot(saveActiveSnapshot); else ! ActiveSnapshot = CopySnapshot(GetTransactionSnapshot()); } else { --- 1530,1536 ---- if (read_only) ActiveSnapshot = CopySnapshot(saveActiveSnapshot); else ! ActiveSnapshot = CopySnapshot(GetTransactionSnapshot(false)); } else { Index: src/backend/postmaster/autovacuum.c =================================================================== RCS file: /cvsroot/pgsql/src/backend/postmaster/autovacuum.c,v retrieving revision 1.34 diff -c -c -r1.34 autovacuum.c *** src/backend/postmaster/autovacuum.c 13 Mar 2007 00:33:41 -0000 1.34 --- src/backend/postmaster/autovacuum.c 23 Mar 2007 19:20:49 -0000 *************** *** 868,874 **** StartTransactionCommand(); /* functions in indexes may want a snapshot set */ ! ActiveSnapshot = CopySnapshot(GetTransactionSnapshot()); /* * Clean up any dead statistics collector entries for this DB. We always --- 868,874 ---- StartTransactionCommand(); /* functions in indexes may want a snapshot set */ ! ActiveSnapshot = CopySnapshot(GetTransactionSnapshot(false)); /* * Clean up any dead statistics collector entries for this DB. We always Index: src/backend/storage/ipc/procarray.c =================================================================== RCS file: /cvsroot/pgsql/src/backend/storage/ipc/procarray.c,v retrieving revision 1.22 diff -c -c -r1.22 procarray.c *** src/backend/storage/ipc/procarray.c 23 Mar 2007 03:16:39 -0000 1.22 --- src/backend/storage/ipc/procarray.c 23 Mar 2007 19:20:49 -0000 *************** *** 498,504 **** *---------- */ Snapshot ! GetSnapshotData(Snapshot snapshot, bool serializable) { ProcArrayStruct *arrayP = procArray; TransactionId xmin; --- 498,504 ---- *---------- */ Snapshot ! GetSnapshotData(Snapshot snapshot, bool set_procArray_xmin) { ProcArrayStruct *arrayP = procArray; TransactionId xmin; *************** *** 510,519 **** Assert(snapshot != NULL); ! /* Serializable snapshot must be computed before any other... */ ! Assert(serializable ? ! !TransactionIdIsValid(MyProc->xmin) : ! TransactionIdIsValid(MyProc->xmin)); /* * Allocating space for maxProcs xids is usually overkill; numProcs would --- 510,517 ---- Assert(snapshot != NULL); ! /* Either we set an xmin, or we already have one. */ ! Assert(set_procArray_xmin || TransactionIdIsValid(MyProc->xmin)); /* * Allocating space for maxProcs xids is usually overkill; numProcs would *************** *** 656,664 **** } } ! if (serializable) MyProc->xmin = TransactionXmin = xmin; ! LWLockRelease(ProcArrayLock); /* --- 654,668 ---- } } ! if (set_procArray_xmin) ! { ! if (xmin != MyProc->xmin) ! fprintf(stderr, "%x:%x: set new xmin old=%x new=%x\n", MyProcPid, MyProc->xid, MyProc->xmin, xmin); ! else ! fprintf(stderr, "%x:%x: not set\n", MyProcPid, MyProc->xid); MyProc->xmin = TransactionXmin = xmin; ! } ! LWLockRelease(ProcArrayLock); /* Index: src/backend/tcop/fastpath.c =================================================================== RCS file: /cvsroot/pgsql/src/backend/tcop/fastpath.c,v retrieving revision 1.96 diff -c -c -r1.96 fastpath.c *** src/backend/tcop/fastpath.c 3 Mar 2007 19:32:54 -0000 1.96 --- src/backend/tcop/fastpath.c 23 Mar 2007 19:20:49 -0000 *************** *** 308,314 **** * Now that we know we are in a valid transaction, set snapshot in case * needed by function itself or one of the datatype I/O routines. */ ! ActiveSnapshot = CopySnapshot(GetTransactionSnapshot()); /* * Begin parsing the buffer contents. --- 308,314 ---- * Now that we know we are in a valid transaction, set snapshot in case * needed by function itself or one of the datatype I/O routines. */ ! ActiveSnapshot = CopySnapshot(GetTransactionSnapshot(false)); /* * Begin parsing the buffer contents. Index: src/backend/tcop/postgres.c =================================================================== RCS file: /cvsroot/pgsql/src/backend/tcop/postgres.c,v retrieving revision 1.529 diff -c -c -r1.529 postgres.c *** src/backend/tcop/postgres.c 22 Mar 2007 19:55:04 -0000 1.529 --- src/backend/tcop/postgres.c 23 Mar 2007 19:20:51 -0000 *************** *** 738,744 **** { if (needSnapshot) { ! ActiveSnapshot = CopySnapshot(GetTransactionSnapshot()); needSnapshot = false; } stmt = (Node *) pg_plan_query(query, boundParams); --- 738,744 ---- { if (needSnapshot) { ! ActiveSnapshot = CopySnapshot(GetTransactionSnapshot(true)); needSnapshot = false; } stmt = (Node *) pg_plan_query(query, boundParams); Index: src/backend/tcop/pquery.c =================================================================== RCS file: /cvsroot/pgsql/src/backend/tcop/pquery.c,v retrieving revision 1.115 diff -c -c -r1.115 pquery.c *** src/backend/tcop/pquery.c 13 Mar 2007 00:33:42 -0000 1.115 --- src/backend/tcop/pquery.c 23 Mar 2007 19:20:51 -0000 *************** *** 154,160 **** * Must always set snapshot for plannable queries. Note we assume that * caller will take care of restoring ActiveSnapshot on exit/error. */ ! ActiveSnapshot = CopySnapshot(GetTransactionSnapshot()); /* * Create the QueryDesc object --- 154,160 ---- * Must always set snapshot for plannable queries. Note we assume that * caller will take care of restoring ActiveSnapshot on exit/error. */ ! ActiveSnapshot = CopySnapshot(GetTransactionSnapshot(false)); /* * Create the QueryDesc object *************** *** 489,495 **** if (snapshot) ActiveSnapshot = CopySnapshot(snapshot); else ! ActiveSnapshot = CopySnapshot(GetTransactionSnapshot()); /* * Create QueryDesc in portal's context; for the moment, set --- 489,495 ---- if (snapshot) ActiveSnapshot = CopySnapshot(snapshot); else ! ActiveSnapshot = CopySnapshot(GetTransactionSnapshot(false)); /* * Create QueryDesc in portal's context; for the moment, set *************** *** 1163,1169 **** IsA(utilityStmt, NotifyStmt) || IsA(utilityStmt, UnlistenStmt) || IsA(utilityStmt, CheckPointStmt))) ! ActiveSnapshot = CopySnapshot(GetTransactionSnapshot()); else ActiveSnapshot = NULL; --- 1163,1169 ---- IsA(utilityStmt, NotifyStmt) || IsA(utilityStmt, UnlistenStmt) || IsA(utilityStmt, CheckPointStmt))) ! ActiveSnapshot = CopySnapshot(GetTransactionSnapshot(false)); else ActiveSnapshot = NULL; Index: src/backend/utils/adt/ri_triggers.c =================================================================== RCS file: /cvsroot/pgsql/src/backend/utils/adt/ri_triggers.c,v retrieving revision 1.92 diff -c -c -r1.92 ri_triggers.c *** src/backend/utils/adt/ri_triggers.c 15 Mar 2007 23:12:06 -0000 1.92 --- src/backend/utils/adt/ri_triggers.c 23 Mar 2007 19:20:52 -0000 *************** *** 3283,3289 **** { CommandCounterIncrement(); /* be sure all my own work is visible */ test_snapshot = CopySnapshot(GetLatestSnapshot()); ! crosscheck_snapshot = CopySnapshot(GetTransactionSnapshot()); } else { --- 3283,3289 ---- { CommandCounterIncrement(); /* be sure all my own work is visible */ test_snapshot = CopySnapshot(GetLatestSnapshot()); ! crosscheck_snapshot = CopySnapshot(GetTransactionSnapshot(false)); } else { Index: src/backend/utils/time/tqual.c =================================================================== RCS file: /cvsroot/pgsql/src/backend/utils/time/tqual.c,v retrieving revision 1.101 diff -c -c -r1.101 tqual.c *** src/backend/utils/time/tqual.c 5 Jan 2007 22:19:47 -0000 1.101 --- src/backend/utils/time/tqual.c 23 Mar 2007 19:20:53 -0000 *************** *** 1202,1208 **** * the result with CopySnapshot() if it is to be used very long. */ Snapshot ! GetTransactionSnapshot(void) { /* First call in transaction? */ if (SerializableSnapshot == NULL) --- 1202,1208 ---- * the result with CopySnapshot() if it is to be used very long. */ Snapshot ! GetTransactionSnapshot(bool set_procArray_xmin) { /* First call in transaction? */ if (SerializableSnapshot == NULL) *************** *** 1214,1220 **** if (IsXactIsoLevelSerializable) return SerializableSnapshot; ! LatestSnapshot = GetSnapshotData(&LatestSnapshotData, false); return LatestSnapshot; } --- 1214,1221 ---- if (IsXactIsoLevelSerializable) return SerializableSnapshot; ! LatestSnapshot = GetSnapshotData(&LatestSnapshotData, ! /* we know we are not Serializable */ set_procArray_xmin); return LatestSnapshot; } Index: src/include/utils/tqual.h =================================================================== RCS file: /cvsroot/pgsql/src/include/utils/tqual.h,v retrieving revision 1.65 diff -c -c -r1.65 tqual.h *** src/include/utils/tqual.h 5 Jan 2007 22:19:59 -0000 1.65 --- src/include/utils/tqual.h 23 Mar 2007 19:20:54 -0000 *************** *** 142,154 **** extern HTSV_Result HeapTupleSatisfiesVacuum(HeapTupleHeader tuple, TransactionId OldestXmin, Buffer buffer); ! extern Snapshot GetTransactionSnapshot(void); extern Snapshot GetLatestSnapshot(void); extern Snapshot CopySnapshot(Snapshot snapshot); extern void FreeSnapshot(Snapshot snapshot); extern void FreeXactSnapshot(void); /* in procarray.c; declared here to avoid including tqual.h in procarray.h: */ ! extern Snapshot GetSnapshotData(Snapshot snapshot, bool serializable); #endif /* TQUAL_H */ --- 142,154 ---- extern HTSV_Result HeapTupleSatisfiesVacuum(HeapTupleHeader tuple, TransactionId OldestXmin, Buffer buffer); ! extern Snapshot GetTransactionSnapshot(bool set_procArray_xmin); extern Snapshot GetLatestSnapshot(void); extern Snapshot CopySnapshot(Snapshot snapshot); extern void FreeSnapshot(Snapshot snapshot); extern void FreeXactSnapshot(void); /* in procarray.c; declared here to avoid including tqual.h in procarray.h: */ ! extern Snapshot GetSnapshotData(Snapshot snapshot, bool set_procArray_xmin); #endif /* TQUAL_H */ Index: src/pl/plpgsql/src/pl_exec.c =================================================================== RCS file: /cvsroot/pgsql/src/pl/plpgsql/src/pl_exec.c,v retrieving revision 1.190 diff -c -c -r1.190 pl_exec.c *** src/pl/plpgsql/src/pl_exec.c 15 Mar 2007 23:12:07 -0000 1.190 --- src/pl/plpgsql/src/pl_exec.c 23 Mar 2007 19:20:55 -0000 *************** *** 4124,4130 **** if (!estate->readonly_func) { CommandCounterIncrement(); ! ActiveSnapshot = CopySnapshot(GetTransactionSnapshot()); } /* --- 4124,4130 ---- if (!estate->readonly_func) { CommandCounterIncrement(); ! ActiveSnapshot = CopySnapshot(GetTransactionSnapshot(false)); } /*
Bruce Momjian wrote: > The attached patch updates procArray.xmin in this manner. Here is an > example of how the patch improves dead row reuse: I don't think this really works. Consider what happens if I change session 2 this way: > Session #: > 1 2 3 > > CREATE TABLE test(x int); > INSERT INTO test VALUES (1); > BEGIN; > DELETE FROM test; > BEGIN; DECLARE foo CURSOR FOR SELECT * FROM test; > SELECT 1; > COMMIT; > VACUUM VERBOSE test; > (row not reused) > SELECT 1; FETCH * FROM foo; > > (At this point #2 doesn't see > the test row anymore. Patch > updates procArray.xmin.) > > VACUUM VERBOSE test; > (row reused with patch) > COMMIT; > VACUUM VERBOSE test; > (normal row reuse) -- Alvaro Herrera http://www.CommandPrompt.com/ The PostgreSQL Company - Command Prompt, Inc.
Bruce Momjian <bruce@momjian.us> writes: > I have been thinking we could improve how quickly VACUUM can expire rows > if we update procArray.xmin more frequently for non-SERIALIZABLE > transactions. > The attached patch updates procArray.xmin in this manner. This patch is incredibly broken. Didn't you understand what I said about how we don't track which snapshots are still alive? You can't advance procArray.xmin past the xmin of the oldest live snapshot in the backend, and you can't assume that there are no live snapshots at the places where this patch assumes that. (Open cursors are one obvious counterexample, but I think there are more.) To make intra-transaction advancing of xmin possible, we'd need to explicitly track all of the backend's live snapshots, probably by introducing a "snapshot cache" manager that gives out tracked refcounts as we do for some other structures like catcache entries. This might have some other advantages (I think most of the current CopySnapshot operations could be replaced by refcount increments) but it's a whole lot more complicated and invasive than what you've got here. regards, tom lane
Tom Lane wrote: > Bruce Momjian <bruce@momjian.us> writes: > > I have been thinking we could improve how quickly VACUUM can expire rows > > if we update procArray.xmin more frequently for non-SERIALIZABLE > > transactions. > > The attached patch updates procArray.xmin in this manner. > > This patch is incredibly broken. Didn't you understand what I said > about how we don't track which snapshots are still alive? You can't > advance procArray.xmin past the xmin of the oldest live snapshot in the > backend, and you can't assume that there are no live snapshots at the > places where this patch assumes that. (Open cursors are one obvious > counterexample, but I think there are more.) > > To make intra-transaction advancing of xmin possible, we'd need to > explicitly track all of the backend's live snapshots, probably by > introducing a "snapshot cache" manager that gives out tracked refcounts > as we do for some other structures like catcache entries. This might > have some other advantages (I think most of the current CopySnapshot > operations could be replaced by refcount increments) but it's a whole > lot more complicated and invasive than what you've got here. I updated the patch to save the MyProc->xid at the time the first cursor is created, and not allow the MyProc->xid to be set lower than that saved value in the current transaction. It added only a few more lines to the patch. -- Bruce Momjian <bruce@momjian.us> http://momjian.us EnterpriseDB http://www.enterprisedb.com + If your life is a hard drive, Christ can be your backup. + Index: src/backend/access/transam/xact.c =================================================================== RCS file: /cvsroot/pgsql/src/backend/access/transam/xact.c,v retrieving revision 1.238 diff -c -c -r1.238 xact.c *** src/backend/access/transam/xact.c 22 Mar 2007 19:55:04 -0000 1.238 --- src/backend/access/transam/xact.c 24 Mar 2007 02:18:41 -0000 *************** *** 1563,1569 **** LWLockRelease(ProcArrayLock); } ! PG_TRACE1(transaction__commit, s->transactionId); /* --- 1563,1570 ---- LWLockRelease(ProcArrayLock); } ! set_portal_min_xid(InvalidTransactionId); ! PG_TRACE1(transaction__commit, s->transactionId); /* *************** *** 1802,1807 **** --- 1803,1810 ---- LWLockRelease(ProcArrayLock); + set_portal_min_xid(InvalidTransactionId); + /* * This is all post-transaction cleanup. Note that if an error is raised * here, it's too late to abort the transaction. This should be just *************** *** 1969,1974 **** --- 1972,1979 ---- LWLockRelease(ProcArrayLock); } + set_portal_min_xid(InvalidTransactionId); + PG_TRACE1(transaction__abort, s->transactionId); /* Index: src/backend/catalog/index.c =================================================================== RCS file: /cvsroot/pgsql/src/backend/catalog/index.c,v retrieving revision 1.280 diff -c -c -r1.280 index.c *** src/backend/catalog/index.c 3 Mar 2007 20:08:41 -0000 1.280 --- src/backend/catalog/index.c 24 Mar 2007 02:18:42 -0000 *************** *** 1394,1400 **** } else if (indexInfo->ii_Concurrent) { ! snapshot = CopySnapshot(GetTransactionSnapshot()); OldestXmin = InvalidTransactionId; /* not used */ } else --- 1394,1400 ---- } else if (indexInfo->ii_Concurrent) { ! snapshot = CopySnapshot(GetTransactionSnapshot(false)); OldestXmin = InvalidTransactionId; /* not used */ } else Index: src/backend/commands/cluster.c =================================================================== RCS file: /cvsroot/pgsql/src/backend/commands/cluster.c,v retrieving revision 1.157 diff -c -c -r1.157 cluster.c *** src/backend/commands/cluster.c 13 Mar 2007 00:33:39 -0000 1.157 --- src/backend/commands/cluster.c 24 Mar 2007 02:18:42 -0000 *************** *** 204,210 **** /* Start a new transaction for each relation. */ StartTransactionCommand(); /* functions in indexes may want a snapshot set */ ! ActiveSnapshot = CopySnapshot(GetTransactionSnapshot()); cluster_rel(rvtc, true); CommitTransactionCommand(); } --- 204,210 ---- /* Start a new transaction for each relation. */ StartTransactionCommand(); /* functions in indexes may want a snapshot set */ ! ActiveSnapshot = CopySnapshot(GetTransactionSnapshot(true)); cluster_rel(rvtc, true); CommitTransactionCommand(); } Index: src/backend/commands/indexcmds.c =================================================================== RCS file: /cvsroot/pgsql/src/backend/commands/indexcmds.c,v retrieving revision 1.157 diff -c -c -r1.157 indexcmds.c *** src/backend/commands/indexcmds.c 13 Mar 2007 00:33:39 -0000 1.157 --- src/backend/commands/indexcmds.c 24 Mar 2007 02:18:42 -0000 *************** *** 493,499 **** * We also set ActiveSnapshot to this snap, since functions in indexes may * need a snapshot. */ ! snapshot = CopySnapshot(GetTransactionSnapshot()); ActiveSnapshot = snapshot; /* --- 493,499 ---- * We also set ActiveSnapshot to this snap, since functions in indexes may * need a snapshot. */ ! snapshot = CopySnapshot(GetTransactionSnapshot(false)); ActiveSnapshot = snapshot; /* *************** *** 1304,1310 **** StartTransactionCommand(); /* functions in indexes may want a snapshot set */ ! ActiveSnapshot = CopySnapshot(GetTransactionSnapshot()); if (reindex_relation(relid, true)) ereport(NOTICE, (errmsg("table \"%s\" was reindexed", --- 1304,1310 ---- StartTransactionCommand(); /* functions in indexes may want a snapshot set */ ! ActiveSnapshot = CopySnapshot(GetTransactionSnapshot(true)); if (reindex_relation(relid, true)) ereport(NOTICE, (errmsg("table \"%s\" was reindexed", Index: src/backend/commands/trigger.c =================================================================== RCS file: /cvsroot/pgsql/src/backend/commands/trigger.c,v retrieving revision 1.214 diff -c -c -r1.214 trigger.c *** src/backend/commands/trigger.c 19 Mar 2007 23:38:29 -0000 1.214 --- src/backend/commands/trigger.c 24 Mar 2007 02:18:43 -0000 *************** *** 2645,2651 **** */ events = &afterTriggers->events; if (events->head != NULL) ! ActiveSnapshot = CopySnapshot(GetTransactionSnapshot()); /* * Run all the remaining triggers. Loop until they are all gone, just in --- 2645,2651 ---- */ events = &afterTriggers->events; if (events->head != NULL) ! ActiveSnapshot = CopySnapshot(GetTransactionSnapshot(false)); /* * Run all the remaining triggers. Loop until they are all gone, just in Index: src/backend/commands/vacuum.c =================================================================== RCS file: /cvsroot/pgsql/src/backend/commands/vacuum.c,v retrieving revision 1.349 diff -c -c -r1.349 vacuum.c *** src/backend/commands/vacuum.c 14 Mar 2007 18:48:55 -0000 1.349 --- src/backend/commands/vacuum.c 24 Mar 2007 02:18:45 -0000 *************** *** 412,418 **** { StartTransactionCommand(); /* functions in indexes may want a snapshot set */ ! ActiveSnapshot = CopySnapshot(GetTransactionSnapshot()); } else old_context = MemoryContextSwitchTo(anl_context); --- 412,418 ---- { StartTransactionCommand(); /* functions in indexes may want a snapshot set */ ! ActiveSnapshot = CopySnapshot(GetTransactionSnapshot(false)); } else old_context = MemoryContextSwitchTo(anl_context); *************** *** 470,476 **** * necessary if we are called from autovacuum because autovacuum might * continue on to do an ANALYZE-only call. */ ! ActiveSnapshot = CopySnapshot(GetTransactionSnapshot()); } if (vacstmt->vacuum && !IsAutoVacuumWorkerProcess()) --- 470,476 ---- * necessary if we are called from autovacuum because autovacuum might * continue on to do an ANALYZE-only call. */ ! ActiveSnapshot = CopySnapshot(GetTransactionSnapshot(false)); } if (vacstmt->vacuum && !IsAutoVacuumWorkerProcess()) *************** *** 964,970 **** if (vacstmt->full) { /* functions in indexes may want a snapshot set */ ! ActiveSnapshot = CopySnapshot(GetTransactionSnapshot()); } else { --- 964,970 ---- if (vacstmt->full) { /* functions in indexes may want a snapshot set */ ! ActiveSnapshot = CopySnapshot(GetTransactionSnapshot(false)); } else { Index: src/backend/executor/functions.c =================================================================== RCS file: /cvsroot/pgsql/src/backend/executor/functions.c,v retrieving revision 1.112 diff -c -c -r1.112 functions.c *** src/backend/executor/functions.c 13 Mar 2007 00:33:40 -0000 1.112 --- src/backend/executor/functions.c 24 Mar 2007 02:18:45 -0000 *************** *** 300,306 **** else { CommandCounterIncrement(); ! snapshot = CopySnapshot(GetTransactionSnapshot()); } if (IsA(es->stmt, PlannedStmt)) --- 300,306 ---- else { CommandCounterIncrement(); ! snapshot = CopySnapshot(GetTransactionSnapshot(false)); } if (IsA(es->stmt, PlannedStmt)) Index: src/backend/executor/spi.c =================================================================== RCS file: /cvsroot/pgsql/src/backend/executor/spi.c,v retrieving revision 1.173 diff -c -c -r1.173 spi.c *** src/backend/executor/spi.c 17 Mar 2007 03:15:38 -0000 1.173 --- src/backend/executor/spi.c 24 Mar 2007 02:18:46 -0000 *************** *** 15,23 **** --- 15,25 ---- #include "postgres.h" #include "access/printtup.h" + #include "access/transam.h" #include "catalog/heap.h" #include "commands/trigger.h" #include "executor/spi_priv.h" + #include "storage/procarray.h" #include "utils/lsyscache.h" #include "utils/memutils.h" #include "utils/typcache.h" *************** *** 57,63 **** static MemoryContext _SPI_procmem(void); static bool _SPI_checktuples(void); - /* =================== interface functions =================== */ int --- 59,64 ---- *************** *** 997,1004 **** else { CommandCounterIncrement(); ! snapshot = GetTransactionSnapshot(); } /* * Start portal execution. --- 998,1006 ---- else { CommandCounterIncrement(); ! snapshot = GetTransactionSnapshot(false); } + set_portal_min_xid(TransactionXmin); /* * Start portal execution. *************** *** 1530,1536 **** if (read_only) ActiveSnapshot = CopySnapshot(saveActiveSnapshot); else ! ActiveSnapshot = CopySnapshot(GetTransactionSnapshot()); } else { --- 1532,1538 ---- if (read_only) ActiveSnapshot = CopySnapshot(saveActiveSnapshot); else ! ActiveSnapshot = CopySnapshot(GetTransactionSnapshot(false)); } else { Index: src/backend/postmaster/autovacuum.c =================================================================== RCS file: /cvsroot/pgsql/src/backend/postmaster/autovacuum.c,v retrieving revision 1.34 diff -c -c -r1.34 autovacuum.c *** src/backend/postmaster/autovacuum.c 13 Mar 2007 00:33:41 -0000 1.34 --- src/backend/postmaster/autovacuum.c 24 Mar 2007 02:18:46 -0000 *************** *** 868,874 **** StartTransactionCommand(); /* functions in indexes may want a snapshot set */ ! ActiveSnapshot = CopySnapshot(GetTransactionSnapshot()); /* * Clean up any dead statistics collector entries for this DB. We always --- 868,874 ---- StartTransactionCommand(); /* functions in indexes may want a snapshot set */ ! ActiveSnapshot = CopySnapshot(GetTransactionSnapshot(false)); /* * Clean up any dead statistics collector entries for this DB. We always Index: src/backend/storage/ipc/procarray.c =================================================================== RCS file: /cvsroot/pgsql/src/backend/storage/ipc/procarray.c,v retrieving revision 1.22 diff -c -c -r1.22 procarray.c *** src/backend/storage/ipc/procarray.c 23 Mar 2007 03:16:39 -0000 1.22 --- src/backend/storage/ipc/procarray.c 24 Mar 2007 02:18:47 -0000 *************** *** 54,60 **** } ProcArrayStruct; static ProcArrayStruct *procArray; ! #ifdef XIDCACHE_DEBUG --- 54,60 ---- } ProcArrayStruct; static ProcArrayStruct *procArray; ! static TransactionId portal_min_xid = InvalidTransactionId; #ifdef XIDCACHE_DEBUG *************** *** 498,504 **** *---------- */ Snapshot ! GetSnapshotData(Snapshot snapshot, bool serializable) { ProcArrayStruct *arrayP = procArray; TransactionId xmin; --- 498,504 ---- *---------- */ Snapshot ! GetSnapshotData(Snapshot snapshot, bool set_procArray_xmin) { ProcArrayStruct *arrayP = procArray; TransactionId xmin; *************** *** 510,519 **** Assert(snapshot != NULL); ! /* Serializable snapshot must be computed before any other... */ ! Assert(serializable ? ! !TransactionIdIsValid(MyProc->xmin) : ! TransactionIdIsValid(MyProc->xmin)); /* * Allocating space for maxProcs xids is usually overkill; numProcs would --- 510,517 ---- Assert(snapshot != NULL); ! /* Either we set an xmin, or we already have one. */ ! Assert(set_procArray_xmin || TransactionIdIsValid(MyProc->xmin)); /* * Allocating space for maxProcs xids is usually overkill; numProcs would *************** *** 656,664 **** } } ! if (serializable) ! MyProc->xmin = TransactionXmin = xmin; ! LWLockRelease(ProcArrayLock); /* --- 654,684 ---- } } ! if (set_procArray_xmin) ! { ! if (xmin != MyProc->xmin) ! fprintf(stderr, "%x:%x: set new xmin old=%x new=%x\n", MyProcPid, MyProc->xid, MyProc->xmin, xmin); ! else ! fprintf(stderr, "%x:%x: not set\n", MyProcPid, MyProc->xid); ! /* ! * --- ! * We need to set MyProc->xmin in a few cases: ! * o set at transaction start for SERIALIZABLE visibility ! * o set for new command in READ COMMITTED mode so ! * VACUUM knows we don't care about completed transactions ! * ! * The problem is that we might have old cursors/portals ! * that need to have old transaction visibility, so we use ! * portal_min_xid to record MyProc->xmin during the first ! * cursors/portals creation, and then don't allow MyProc->xmin ! * to be set lower. ! */ ! if (!TransactionIdIsValid(MyProc->xmin) || ! !TransactionIdIsValid(portal_min_xid) || ! TransactionIdPrecedesOrEquals(xmin, portal_min_xid)) ! MyProc->xmin = TransactionXmin = xmin; ! } ! LWLockRelease(ProcArrayLock); /* *************** *** 987,992 **** --- 1007,1023 ---- LWLockRelease(ProcArrayLock); } + void set_portal_min_xid(TransactionId xid) + { + /* + * We only want portal_min_xid to be the first + * portal's MyProc->xmin, or set to invalid. + */ + if (portal_min_xid == InvalidTransactionId || + xid == InvalidTransactionId) + portal_min_xid = xid; + } + #ifdef XIDCACHE_DEBUG /* Index: src/backend/tcop/fastpath.c =================================================================== RCS file: /cvsroot/pgsql/src/backend/tcop/fastpath.c,v retrieving revision 1.96 diff -c -c -r1.96 fastpath.c *** src/backend/tcop/fastpath.c 3 Mar 2007 19:32:54 -0000 1.96 --- src/backend/tcop/fastpath.c 24 Mar 2007 02:18:47 -0000 *************** *** 308,314 **** * Now that we know we are in a valid transaction, set snapshot in case * needed by function itself or one of the datatype I/O routines. */ ! ActiveSnapshot = CopySnapshot(GetTransactionSnapshot()); /* * Begin parsing the buffer contents. --- 308,314 ---- * Now that we know we are in a valid transaction, set snapshot in case * needed by function itself or one of the datatype I/O routines. */ ! ActiveSnapshot = CopySnapshot(GetTransactionSnapshot(false)); /* * Begin parsing the buffer contents. Index: src/backend/tcop/postgres.c =================================================================== RCS file: /cvsroot/pgsql/src/backend/tcop/postgres.c,v retrieving revision 1.529 diff -c -c -r1.529 postgres.c *** src/backend/tcop/postgres.c 22 Mar 2007 19:55:04 -0000 1.529 --- src/backend/tcop/postgres.c 24 Mar 2007 02:18:48 -0000 *************** *** 738,744 **** { if (needSnapshot) { ! ActiveSnapshot = CopySnapshot(GetTransactionSnapshot()); needSnapshot = false; } stmt = (Node *) pg_plan_query(query, boundParams); --- 738,744 ---- { if (needSnapshot) { ! ActiveSnapshot = CopySnapshot(GetTransactionSnapshot(true)); needSnapshot = false; } stmt = (Node *) pg_plan_query(query, boundParams); Index: src/backend/tcop/pquery.c =================================================================== RCS file: /cvsroot/pgsql/src/backend/tcop/pquery.c,v retrieving revision 1.115 diff -c -c -r1.115 pquery.c *** src/backend/tcop/pquery.c 13 Mar 2007 00:33:42 -0000 1.115 --- src/backend/tcop/pquery.c 24 Mar 2007 02:18:48 -0000 *************** *** 154,160 **** * Must always set snapshot for plannable queries. Note we assume that * caller will take care of restoring ActiveSnapshot on exit/error. */ ! ActiveSnapshot = CopySnapshot(GetTransactionSnapshot()); /* * Create the QueryDesc object --- 154,160 ---- * Must always set snapshot for plannable queries. Note we assume that * caller will take care of restoring ActiveSnapshot on exit/error. */ ! ActiveSnapshot = CopySnapshot(GetTransactionSnapshot(false)); /* * Create the QueryDesc object *************** *** 489,495 **** if (snapshot) ActiveSnapshot = CopySnapshot(snapshot); else ! ActiveSnapshot = CopySnapshot(GetTransactionSnapshot()); /* * Create QueryDesc in portal's context; for the moment, set --- 489,495 ---- if (snapshot) ActiveSnapshot = CopySnapshot(snapshot); else ! ActiveSnapshot = CopySnapshot(GetTransactionSnapshot(false)); /* * Create QueryDesc in portal's context; for the moment, set *************** *** 1163,1169 **** IsA(utilityStmt, NotifyStmt) || IsA(utilityStmt, UnlistenStmt) || IsA(utilityStmt, CheckPointStmt))) ! ActiveSnapshot = CopySnapshot(GetTransactionSnapshot()); else ActiveSnapshot = NULL; --- 1163,1169 ---- IsA(utilityStmt, NotifyStmt) || IsA(utilityStmt, UnlistenStmt) || IsA(utilityStmt, CheckPointStmt))) ! ActiveSnapshot = CopySnapshot(GetTransactionSnapshot(false)); else ActiveSnapshot = NULL; Index: src/backend/utils/adt/ri_triggers.c =================================================================== RCS file: /cvsroot/pgsql/src/backend/utils/adt/ri_triggers.c,v retrieving revision 1.92 diff -c -c -r1.92 ri_triggers.c *** src/backend/utils/adt/ri_triggers.c 15 Mar 2007 23:12:06 -0000 1.92 --- src/backend/utils/adt/ri_triggers.c 24 Mar 2007 02:18:50 -0000 *************** *** 3283,3289 **** { CommandCounterIncrement(); /* be sure all my own work is visible */ test_snapshot = CopySnapshot(GetLatestSnapshot()); ! crosscheck_snapshot = CopySnapshot(GetTransactionSnapshot()); } else { --- 3283,3289 ---- { CommandCounterIncrement(); /* be sure all my own work is visible */ test_snapshot = CopySnapshot(GetLatestSnapshot()); ! crosscheck_snapshot = CopySnapshot(GetTransactionSnapshot(false)); } else { Index: src/backend/utils/time/tqual.c =================================================================== RCS file: /cvsroot/pgsql/src/backend/utils/time/tqual.c,v retrieving revision 1.101 diff -c -c -r1.101 tqual.c *** src/backend/utils/time/tqual.c 5 Jan 2007 22:19:47 -0000 1.101 --- src/backend/utils/time/tqual.c 24 Mar 2007 02:18:50 -0000 *************** *** 1202,1208 **** * the result with CopySnapshot() if it is to be used very long. */ Snapshot ! GetTransactionSnapshot(void) { /* First call in transaction? */ if (SerializableSnapshot == NULL) --- 1202,1208 ---- * the result with CopySnapshot() if it is to be used very long. */ Snapshot ! GetTransactionSnapshot(bool set_procArray_xmin) { /* First call in transaction? */ if (SerializableSnapshot == NULL) *************** *** 1214,1220 **** if (IsXactIsoLevelSerializable) return SerializableSnapshot; ! LatestSnapshot = GetSnapshotData(&LatestSnapshotData, false); return LatestSnapshot; } --- 1214,1221 ---- if (IsXactIsoLevelSerializable) return SerializableSnapshot; ! LatestSnapshot = GetSnapshotData(&LatestSnapshotData, ! /* we know we are not Serializable */ set_procArray_xmin); return LatestSnapshot; } Index: src/include/storage/procarray.h =================================================================== RCS file: /cvsroot/pgsql/src/include/storage/procarray.h,v retrieving revision 1.12 diff -c -c -r1.12 procarray.h *** src/include/storage/procarray.h 16 Jan 2007 13:28:57 -0000 1.12 --- src/include/storage/procarray.h 24 Mar 2007 02:18:51 -0000 *************** *** 37,41 **** --- 37,42 ---- extern void XidCacheRemoveRunningXids(TransactionId xid, int nxids, TransactionId *xids); + extern void set_portal_min_xid(TransactionId xid); #endif /* PROCARRAY_H */ Index: src/include/utils/tqual.h =================================================================== RCS file: /cvsroot/pgsql/src/include/utils/tqual.h,v retrieving revision 1.65 diff -c -c -r1.65 tqual.h *** src/include/utils/tqual.h 5 Jan 2007 22:19:59 -0000 1.65 --- src/include/utils/tqual.h 24 Mar 2007 02:18:51 -0000 *************** *** 142,154 **** extern HTSV_Result HeapTupleSatisfiesVacuum(HeapTupleHeader tuple, TransactionId OldestXmin, Buffer buffer); ! extern Snapshot GetTransactionSnapshot(void); extern Snapshot GetLatestSnapshot(void); extern Snapshot CopySnapshot(Snapshot snapshot); extern void FreeSnapshot(Snapshot snapshot); extern void FreeXactSnapshot(void); /* in procarray.c; declared here to avoid including tqual.h in procarray.h: */ ! extern Snapshot GetSnapshotData(Snapshot snapshot, bool serializable); #endif /* TQUAL_H */ --- 142,154 ---- extern HTSV_Result HeapTupleSatisfiesVacuum(HeapTupleHeader tuple, TransactionId OldestXmin, Buffer buffer); ! extern Snapshot GetTransactionSnapshot(bool set_procArray_xmin); extern Snapshot GetLatestSnapshot(void); extern Snapshot CopySnapshot(Snapshot snapshot); extern void FreeSnapshot(Snapshot snapshot); extern void FreeXactSnapshot(void); /* in procarray.c; declared here to avoid including tqual.h in procarray.h: */ ! extern Snapshot GetSnapshotData(Snapshot snapshot, bool set_procArray_xmin); #endif /* TQUAL_H */ Index: src/pl/plpgsql/src/pl_exec.c =================================================================== RCS file: /cvsroot/pgsql/src/pl/plpgsql/src/pl_exec.c,v retrieving revision 1.190 diff -c -c -r1.190 pl_exec.c *** src/pl/plpgsql/src/pl_exec.c 15 Mar 2007 23:12:07 -0000 1.190 --- src/pl/plpgsql/src/pl_exec.c 24 Mar 2007 02:18:53 -0000 *************** *** 4124,4130 **** if (!estate->readonly_func) { CommandCounterIncrement(); ! ActiveSnapshot = CopySnapshot(GetTransactionSnapshot()); } /* --- 4124,4130 ---- if (!estate->readonly_func) { CommandCounterIncrement(); ! ActiveSnapshot = CopySnapshot(GetTransactionSnapshot(false)); } /*
Bruce Momjian wrote: > Tom Lane wrote: >> Bruce Momjian <bruce@momjian.us> writes: >>> I have been thinking we could improve how quickly VACUUM can expire rows >>> if we update procArray.xmin more frequently for non-SERIALIZABLE >>> transactions. >>> The attached patch updates procArray.xmin in this manner. >> This patch is incredibly broken. Didn't you understand what I said >> about how we don't track which snapshots are still alive? You can't >> advance procArray.xmin past the xmin of the oldest live snapshot in the >> backend, and you can't assume that there are no live snapshots at the >> places where this patch assumes that. (Open cursors are one obvious >> counterexample, but I think there are more.) >> >> To make intra-transaction advancing of xmin possible, we'd need to >> explicitly track all of the backend's live snapshots, probably by >> introducing a "snapshot cache" manager that gives out tracked refcounts >> as we do for some other structures like catcache entries. This might >> have some other advantages (I think most of the current CopySnapshot >> operations could be replaced by refcount increments) but it's a whole >> lot more complicated and invasive than what you've got here. > > I updated the patch to save the MyProc->xid at the time the first cursor > is created, and not allow the MyProc->xid to be set lower than that > saved value in the current transaction. It added only a few more lines > to the patch. It seems to me a lot cleaner to do the reference counting like Tom suggested. Increase the refcount on CopySnapshot, and decrease it on FreeSnapshot. Assuming that all callers of CopySnapshot free the snapshot with FreeSnapshot when they're done with it. BTW: I really like the idea of doing this. It's a relatively simple thing to do and gives some immediate benefit. And it opens the door for more tricks to vacuum more aggressively in the presence of long-running transactions. And it allows us to vacuum tuples that were inserted and deleted in the same transactions even while the transaction is still running, which helps some pathological cases where a transaction updates a counter column many times within a transaction. We could also use it to free entries in the new combocids hash table earlier (not that it's a problem as it is, though). -- Heikki Linnakangas EnterpriseDB http://www.enterprisedb.com
Heikki Linnakangas <heikki@enterprisedb.com> writes: > It seems to me a lot cleaner to do the reference counting like Tom > suggested. Increase the refcount on CopySnapshot, and decrease it on > FreeSnapshot. Assuming that all callers of CopySnapshot free the > snapshot with FreeSnapshot when they're done with it. I don't believe we bother at the moment; which is one of the reasons it'd be a nontrivial patch. I do think it might be worth doing though. In the simple case where you're just issuing successive non-cursor commands within a READ COMMITTED transaction, a refcounted implementation would be able to recognize that there are *no* live snapshots between commands and therefore reset MyProc->xmin to 0 whenever the backend is idle. OTOH, do we have any evidence that this is worth bothering with at all? I fear that the cases of long-running transactions that are problems in the real world wouldn't be helped much --- for instance, pg_dump wouldn't change behavior because it uses a serializable transaction. Also, at some point a long-running transaction becomes a bottleneck simply because its XID is itself the oldest thing visible in the ProcArray and is determining everyone's xmin. How much daylight is there really between "your xmin is old" and "your xid is old"? regards, tom lane
Tom Lane wrote: > Heikki Linnakangas <heikki@enterprisedb.com> writes: > > It seems to me a lot cleaner to do the reference counting like Tom > > suggested. Increase the refcount on CopySnapshot, and decrease it on > > FreeSnapshot. Assuming that all callers of CopySnapshot free the > > snapshot with FreeSnapshot when they're done with it. > > I don't believe we bother at the moment; which is one of the reasons > it'd be a nontrivial patch. I do think it might be worth doing though. > In the simple case where you're just issuing successive non-cursor > commands within a READ COMMITTED transaction, a refcounted > implementation would be able to recognize that there are *no* live > snapshots between commands and therefore reset MyProc->xmin to 0 > whenever the backend is idle. Attached is my current version of the patch. It doesn't work now that I tried to do reference count for Snapshots, but will stop now that Tom is considering redesigning the snapshot mechanism. > OTOH, do we have any evidence that this is worth bothering with at all? > I fear that the cases of long-running transactions that are problems > in the real world wouldn't be helped much --- for instance, pg_dump > wouldn't change behavior because it uses a serializable transaction. > Also, at some point a long-running transaction becomes a bottleneck > simply because its XID is itself the oldest thing visible in the > ProcArray and is determining everyone's xmin. How much daylight is > there really between "your xmin is old" and "your xid is old"? Well, interesting you mention that, because I have a second idea on how to improve things. We start with MyProc->xmin equal to our own xid, and then look for earlier transactions. It should be possible to skip considering our own xid for MyProc->xmin. This would obviously help VACUUM during long-running transactions. While our transaction is running, our xid isn't committed, so VACUUM isn't going to touch any of our rows, and if other transactions complete before our multi-transaction _statement_ starts, we can't see deleted rows from them transaction, so why keep the deleted rows around? Consider this case: Session #: 1 2 3 BEGIN; SELECT 1; CREATE TABLE test(x int); INSERT INTO test VALUES (1); DELETE FROM test; SELECT 1; VACUUM VERBOSE test; (row can be reused) COMMIT; VACUUM VERBOSE test; (normal row reuse) As I understand it, in READ COMMITTED mode, we have to skip transactions in progress when our _statement_ starts, but anything committed before that we see and we don't see dead rows created by them. -- Bruce Momjian <bruce@momjian.us> http://momjian.us EnterpriseDB http://www.enterprisedb.com + If your life is a hard drive, Christ can be your backup. + Index: src/backend/catalog/index.c =================================================================== RCS file: /cvsroot/pgsql/src/backend/catalog/index.c,v retrieving revision 1.280 diff -c -c -r1.280 index.c *** src/backend/catalog/index.c 3 Mar 2007 20:08:41 -0000 1.280 --- src/backend/catalog/index.c 24 Mar 2007 19:17:56 -0000 *************** *** 1358,1364 **** ExprContext *econtext; Snapshot snapshot; TransactionId OldestXmin; ! /* * sanity checks */ --- 1358,1364 ---- ExprContext *econtext; Snapshot snapshot; TransactionId OldestXmin; ! /* * sanity checks */ *************** *** 1555,1561 **** ExecDropSingleTupleTableSlot(slot); FreeExecutorState(estate); ! /* These may have been pointing to the now-gone estate */ indexInfo->ii_ExpressionsState = NIL; indexInfo->ii_PredicateState = NIL; --- 1555,1563 ---- ExecDropSingleTupleTableSlot(slot); FreeExecutorState(estate); ! if (snapshot != SnapshotNow) ! FreeSnapshot(snapshot); ! /* These may have been pointing to the now-gone estate */ indexInfo->ii_ExpressionsState = NIL; indexInfo->ii_PredicateState = NIL; Index: src/backend/commands/cluster.c =================================================================== RCS file: /cvsroot/pgsql/src/backend/commands/cluster.c,v retrieving revision 1.157 diff -c -c -r1.157 cluster.c *** src/backend/commands/cluster.c 13 Mar 2007 00:33:39 -0000 1.157 --- src/backend/commands/cluster.c 24 Mar 2007 19:17:56 -0000 *************** *** 204,209 **** --- 204,210 ---- /* Start a new transaction for each relation. */ StartTransactionCommand(); /* functions in indexes may want a snapshot set */ + FreeSnapshot(ActiveSnapshot); ActiveSnapshot = CopySnapshot(GetTransactionSnapshot()); cluster_rel(rvtc, true); CommitTransactionCommand(); Index: src/backend/commands/indexcmds.c =================================================================== RCS file: /cvsroot/pgsql/src/backend/commands/indexcmds.c,v retrieving revision 1.157 diff -c -c -r1.157 indexcmds.c *** src/backend/commands/indexcmds.c 13 Mar 2007 00:33:39 -0000 1.157 --- src/backend/commands/indexcmds.c 24 Mar 2007 19:17:56 -0000 *************** *** 493,500 **** * We also set ActiveSnapshot to this snap, since functions in indexes may * need a snapshot. */ ! snapshot = CopySnapshot(GetTransactionSnapshot()); ! ActiveSnapshot = snapshot; /* * Scan the index and the heap, insert any missing index entries. --- 493,500 ---- * We also set ActiveSnapshot to this snap, since functions in indexes may * need a snapshot. */ ! FreeSnapshot(ActiveSnapshot); ! ActiveSnapshot = snapshot = CopySnapshot(GetTransactionSnapshot()); /* * Scan the index and the heap, insert any missing index entries. *************** *** 1304,1309 **** --- 1304,1310 ---- StartTransactionCommand(); /* functions in indexes may want a snapshot set */ + FreeSnapshot(ActiveSnapshot); ActiveSnapshot = CopySnapshot(GetTransactionSnapshot()); if (reindex_relation(relid, true)) ereport(NOTICE, Index: src/backend/commands/trigger.c =================================================================== RCS file: /cvsroot/pgsql/src/backend/commands/trigger.c,v retrieving revision 1.214 diff -c -c -r1.214 trigger.c *** src/backend/commands/trigger.c 19 Mar 2007 23:38:29 -0000 1.214 --- src/backend/commands/trigger.c 24 Mar 2007 19:17:57 -0000 *************** *** 2645,2651 **** --- 2645,2654 ---- */ events = &afterTriggers->events; if (events->head != NULL) + { + FreeSnapshot(ActiveSnapshot); ActiveSnapshot = CopySnapshot(GetTransactionSnapshot()); + } /* * Run all the remaining triggers. Loop until they are all gone, just in Index: src/backend/commands/vacuum.c =================================================================== RCS file: /cvsroot/pgsql/src/backend/commands/vacuum.c,v retrieving revision 1.349 diff -c -c -r1.349 vacuum.c *** src/backend/commands/vacuum.c 14 Mar 2007 18:48:55 -0000 1.349 --- src/backend/commands/vacuum.c 24 Mar 2007 19:17:59 -0000 *************** *** 412,417 **** --- 412,418 ---- { StartTransactionCommand(); /* functions in indexes may want a snapshot set */ + FreeSnapshot(ActiveSnapshot); ActiveSnapshot = CopySnapshot(GetTransactionSnapshot()); } else *************** *** 470,475 **** --- 471,477 ---- * necessary if we are called from autovacuum because autovacuum might * continue on to do an ANALYZE-only call. */ + FreeSnapshot(ActiveSnapshot); ActiveSnapshot = CopySnapshot(GetTransactionSnapshot()); } *************** *** 964,969 **** --- 966,972 ---- if (vacstmt->full) { /* functions in indexes may want a snapshot set */ + FreeSnapshot(ActiveSnapshot); ActiveSnapshot = CopySnapshot(GetTransactionSnapshot()); } else Index: src/backend/executor/functions.c =================================================================== RCS file: /cvsroot/pgsql/src/backend/executor/functions.c,v retrieving revision 1.112 diff -c -c -r1.112 functions.c *** src/backend/executor/functions.c 13 Mar 2007 00:33:40 -0000 1.112 --- src/backend/executor/functions.c 24 Mar 2007 19:17:59 -0000 *************** *** 315,321 **** fcache->paramLI); /* We assume we don't need to set up ActiveSnapshot for ExecutorStart */ ! /* Utility commands don't need Executor. */ if (es->qd->operation != CMD_UTILITY) { --- 315,321 ---- fcache->paramLI); /* We assume we don't need to set up ActiveSnapshot for ExecutorStart */ ! /* Utility commands don't need Executor. */ if (es->qd->operation != CMD_UTILITY) { Index: src/backend/postmaster/autovacuum.c =================================================================== RCS file: /cvsroot/pgsql/src/backend/postmaster/autovacuum.c,v retrieving revision 1.38 diff -c -c -r1.38 autovacuum.c *** src/backend/postmaster/autovacuum.c 23 Mar 2007 21:57:10 -0000 1.38 --- src/backend/postmaster/autovacuum.c 24 Mar 2007 19:18:00 -0000 *************** *** 884,889 **** --- 884,890 ---- StartTransactionCommand(); /* functions in indexes may want a snapshot set */ + FreeSnapshot(ActiveSnapshot); ActiveSnapshot = CopySnapshot(GetTransactionSnapshot()); /* Index: src/backend/storage/ipc/procarray.c =================================================================== RCS file: /cvsroot/pgsql/src/backend/storage/ipc/procarray.c,v retrieving revision 1.22 diff -c -c -r1.22 procarray.c *** src/backend/storage/ipc/procarray.c 23 Mar 2007 03:16:39 -0000 1.22 --- src/backend/storage/ipc/procarray.c 24 Mar 2007 19:18:01 -0000 *************** *** 498,504 **** *---------- */ Snapshot ! GetSnapshotData(Snapshot snapshot, bool serializable) { ProcArrayStruct *arrayP = procArray; TransactionId xmin; --- 498,504 ---- *---------- */ Snapshot ! GetSnapshotData(Snapshot snapshot) { ProcArrayStruct *arrayP = procArray; TransactionId xmin; *************** *** 510,519 **** Assert(snapshot != NULL); ! /* Serializable snapshot must be computed before any other... */ ! Assert(serializable ? ! !TransactionIdIsValid(MyProc->xmin) : ! TransactionIdIsValid(MyProc->xmin)); /* * Allocating space for maxProcs xids is usually overkill; numProcs would --- 510,517 ---- Assert(snapshot != NULL); ! /* Either we set an xmin, or we already have one. */ ! Assert(TransactionIdIsValid(MyProc->xmin) || GetOpenSnapshotCount() == 0); /* * Allocating space for maxProcs xids is usually overkill; numProcs would *************** *** 656,664 **** } } ! if (serializable) ! MyProc->xmin = TransactionXmin = xmin; LWLockRelease(ProcArrayLock); /* --- 654,681 ---- } } ! /* DEBUG */ ! if (!TransactionIdIsValid(MyProc->xmin) || GetOpenSnapshotCount() == 0) ! fprintf(stderr, "%x:%x: set new xmin old=%x new=%x\n", MyProcPid, MyProc->xid, MyProc->xmin, xmin); ! else ! fprintf(stderr, "%x:%x: count=%d, not set\n", MyProcPid, MyProc->xid, GetOpenSnapshotCount()); + /* + * --- + * We need to set MyProc->xmin in a few cases: + * o set at transaction start for SERIALIZABLE visibility + * o set for new command in READ COMMITTED mode so + * VACUUM knows we don't care about completed transactions + * + * The problem is that we might have old cursors/portals + * that need to have older transaction visibility, so we + * check to see we have no open Snapshots before making + * the change. + * --- + */ + if (!TransactionIdIsValid(MyProc->xmin) || GetOpenSnapshotCount() == 0) + MyProc->xmin = TransactionXmin = xmin; + LWLockRelease(ProcArrayLock); /* Index: src/backend/tcop/fastpath.c =================================================================== RCS file: /cvsroot/pgsql/src/backend/tcop/fastpath.c,v retrieving revision 1.96 diff -c -c -r1.96 fastpath.c *** src/backend/tcop/fastpath.c 3 Mar 2007 19:32:54 -0000 1.96 --- src/backend/tcop/fastpath.c 24 Mar 2007 19:18:01 -0000 *************** *** 308,313 **** --- 308,314 ---- * Now that we know we are in a valid transaction, set snapshot in case * needed by function itself or one of the datatype I/O routines. */ + FreeSnapshot(ActiveSnapshot); ActiveSnapshot = CopySnapshot(GetTransactionSnapshot()); /* Index: src/backend/tcop/postgres.c =================================================================== RCS file: /cvsroot/pgsql/src/backend/tcop/postgres.c,v retrieving revision 1.529 diff -c -c -r1.529 postgres.c *** src/backend/tcop/postgres.c 22 Mar 2007 19:55:04 -0000 1.529 --- src/backend/tcop/postgres.c 24 Mar 2007 19:18:02 -0000 *************** *** 738,743 **** --- 738,744 ---- { if (needSnapshot) { + FreeSnapshot(ActiveSnapshot); ActiveSnapshot = CopySnapshot(GetTransactionSnapshot()); needSnapshot = false; } Index: src/backend/tcop/pquery.c =================================================================== RCS file: /cvsroot/pgsql/src/backend/tcop/pquery.c,v retrieving revision 1.115 diff -c -c -r1.115 pquery.c *** src/backend/tcop/pquery.c 13 Mar 2007 00:33:42 -0000 1.115 --- src/backend/tcop/pquery.c 24 Mar 2007 19:18:02 -0000 *************** *** 154,159 **** --- 154,160 ---- * Must always set snapshot for plannable queries. Note we assume that * caller will take care of restoring ActiveSnapshot on exit/error. */ + FreeSnapshot(ActiveSnapshot); ActiveSnapshot = CopySnapshot(GetTransactionSnapshot()); /* *************** *** 486,491 **** --- 487,493 ---- * Must set snapshot before starting executor. Be sure to * copy it into the portal's context. */ + FreeSnapshot(ActiveSnapshot); if (snapshot) ActiveSnapshot = CopySnapshot(snapshot); else *************** *** 1163,1169 **** --- 1165,1174 ---- IsA(utilityStmt, NotifyStmt) || IsA(utilityStmt, UnlistenStmt) || IsA(utilityStmt, CheckPointStmt))) + { + FreeSnapshot(ActiveSnapshot); ActiveSnapshot = CopySnapshot(GetTransactionSnapshot()); + } else ActiveSnapshot = NULL; *************** *** 1177,1184 **** /* Some utility statements may change context on us */ MemoryContextSwitchTo(PortalGetHeapMemory(portal)); ! if (ActiveSnapshot) ! FreeSnapshot(ActiveSnapshot); ActiveSnapshot = NULL; } --- 1182,1188 ---- /* Some utility statements may change context on us */ MemoryContextSwitchTo(PortalGetHeapMemory(portal)); ! FreeSnapshot(ActiveSnapshot); ActiveSnapshot = NULL; } Index: src/backend/utils/adt/ri_triggers.c =================================================================== RCS file: /cvsroot/pgsql/src/backend/utils/adt/ri_triggers.c,v retrieving revision 1.92 diff -c -c -r1.92 ri_triggers.c *** src/backend/utils/adt/ri_triggers.c 15 Mar 2007 23:12:06 -0000 1.92 --- src/backend/utils/adt/ri_triggers.c 24 Mar 2007 19:18:04 -0000 *************** *** 2636,2642 **** char workmembuf[32]; int spi_result; SPIPlanPtr qplan; ! /* * Check to make sure current user has enough permissions to do the test * query. (If not, caller can fall back to the trigger method, which --- 2636,2643 ---- char workmembuf[32]; int spi_result; SPIPlanPtr qplan; ! Snapshot snapshot; ! /* * Check to make sure current user has enough permissions to do the test * query. (If not, caller can fall back to the trigger method, which *************** *** 2770,2780 **** * really haven't got much choice.) We need at most one tuple returned, * so pass limit = 1. */ spi_result = SPI_execute_snapshot(qplan, NULL, NULL, ! CopySnapshot(GetLatestSnapshot()), InvalidSnapshot, true, 1); /* Check result */ if (spi_result != SPI_OK_SELECT) --- 2771,2783 ---- * really haven't got much choice.) We need at most one tuple returned, * so pass limit = 1. */ + snapshot = CopySnapshot(GetLatestSnapshot()); spi_result = SPI_execute_snapshot(qplan, NULL, NULL, ! snapshot, InvalidSnapshot, true, 1); + FreeSnapshot(snapshot); /* Check result */ if (spi_result != SPI_OK_SELECT) *************** *** 3310,3315 **** --- 3313,3322 ---- test_snapshot, crosscheck_snapshot, false, limit); + if (test_snapshot != InvalidSnapshot) + FreeSnapshot(test_snapshot); + if (crosscheck_snapshot != InvalidSnapshot) + FreeSnapshot(crosscheck_snapshot); /* Restore UID */ SetUserId(save_uid); Index: src/backend/utils/time/tqual.c =================================================================== RCS file: /cvsroot/pgsql/src/backend/utils/time/tqual.c,v retrieving revision 1.101 diff -c -c -r1.101 tqual.c *** src/backend/utils/time/tqual.c 5 Jan 2007 22:19:47 -0000 1.101 --- src/backend/utils/time/tqual.c 24 Mar 2007 19:18:04 -0000 *************** *** 54,59 **** --- 54,67 ---- static SnapshotData SnapshotDirtyData; static SnapshotData SerializableSnapshotData; static SnapshotData LatestSnapshotData; + /* + * We use OpenSnapshotCount to record if there are any open snapshots. If + * there are not, we can update MyProc->xmin. A more sophisticated solution + * would be to keep track of the oldest open snapshot and prevent + * MyProc->xmin from going below that value, but the complexity doesn't + * seem worth it. + */ + static int OpenSnapshotCount = 0; /* Externally visible pointers to valid snapshots: */ Snapshot SnapshotDirty = &SnapshotDirtyData; *************** *** 1207,1220 **** /* First call in transaction? */ if (SerializableSnapshot == NULL) { ! SerializableSnapshot = GetSnapshotData(&SerializableSnapshotData, true); return SerializableSnapshot; } if (IsXactIsoLevelSerializable) return SerializableSnapshot; ! LatestSnapshot = GetSnapshotData(&LatestSnapshotData, false); return LatestSnapshot; } --- 1215,1228 ---- /* First call in transaction? */ if (SerializableSnapshot == NULL) { ! SerializableSnapshot = GetSnapshotData(&SerializableSnapshotData); return SerializableSnapshot; } if (IsXactIsoLevelSerializable) return SerializableSnapshot; ! LatestSnapshot = GetSnapshotData(&LatestSnapshotData); return LatestSnapshot; } *************** *** 1231,1237 **** if (SerializableSnapshot == NULL) elog(ERROR, "no snapshot has been set"); ! LatestSnapshot = GetSnapshotData(&LatestSnapshotData, false); return LatestSnapshot; } --- 1239,1245 ---- if (SerializableSnapshot == NULL) elog(ERROR, "no snapshot has been set"); ! LatestSnapshot = GetSnapshotData(&LatestSnapshotData); return LatestSnapshot; } *************** *** 1251,1256 **** --- 1259,1266 ---- Size subxipoff; Size size; + OpenSnapshotCount++; + /* We allocate any XID arrays needed in the same palloc block. */ size = subxipoff = sizeof(SnapshotData) + snapshot->xcnt * sizeof(TransactionId); *************** *** 1295,1301 **** void FreeSnapshot(Snapshot snapshot) { ! pfree(snapshot); } /* --- 1305,1320 ---- void FreeSnapshot(Snapshot snapshot) { ! if (snapshot != NULL) ! { ! OpenSnapshotCount--; ! pfree(snapshot); ! } ! } ! ! int GetOpenSnapshotCount(void) ! { ! return OpenSnapshotCount; } /* *************** *** 1312,1318 **** --- 1331,1339 ---- */ SerializableSnapshot = NULL; LatestSnapshot = NULL; + FreeSnapshot(ActiveSnapshot); ActiveSnapshot = NULL; /* just for cleanliness */ + Assert(GetOpenSnapshotCount() == 0); } /* Index: src/include/utils/tqual.h =================================================================== RCS file: /cvsroot/pgsql/src/include/utils/tqual.h,v retrieving revision 1.65 diff -c -c -r1.65 tqual.h *** src/include/utils/tqual.h 5 Jan 2007 22:19:59 -0000 1.65 --- src/include/utils/tqual.h 24 Mar 2007 19:18:05 -0000 *************** *** 147,154 **** extern Snapshot CopySnapshot(Snapshot snapshot); extern void FreeSnapshot(Snapshot snapshot); extern void FreeXactSnapshot(void); /* in procarray.c; declared here to avoid including tqual.h in procarray.h: */ ! extern Snapshot GetSnapshotData(Snapshot snapshot, bool serializable); #endif /* TQUAL_H */ --- 147,155 ---- extern Snapshot CopySnapshot(Snapshot snapshot); extern void FreeSnapshot(Snapshot snapshot); extern void FreeXactSnapshot(void); + extern int GetOpenSnapshotCount(void); /* in procarray.c; declared here to avoid including tqual.h in procarray.h: */ ! extern Snapshot GetSnapshotData(Snapshot snapshot); #endif /* TQUAL_H */ Index: src/pl/plpgsql/src/pl_exec.c =================================================================== RCS file: /cvsroot/pgsql/src/pl/plpgsql/src/pl_exec.c,v retrieving revision 1.190 diff -c -c -r1.190 pl_exec.c *** src/pl/plpgsql/src/pl_exec.c 15 Mar 2007 23:12:07 -0000 1.190 --- src/pl/plpgsql/src/pl_exec.c 24 Mar 2007 19:18:06 -0000 *************** *** 4139,4149 **** --- 4139,4153 ---- PG_CATCH(); { /* Restore global vars and propagate error */ + if (!estate->readonly_func) + FreeSnapshot(ActiveSnapshot); ActiveSnapshot = saveActiveSnapshot; PG_RE_THROW(); } PG_END_TRY(); + if (!estate->readonly_func) + FreeSnapshot(ActiveSnapshot); ActiveSnapshot = saveActiveSnapshot; SPI_pop();
> OTOH, do we have any evidence that this is worth bothering with at all? > I fear that the cases of long-running transactions that are problems > in the real world wouldn't be helped much --- for instance, pg_dump > wouldn't change behavior because it uses a serializable transaction. Well I think this would be the same infrastructure we would need to do the other discussed improvement to address pg_dump's impact. That would require us to publish the youngest xmax of the live snapshots. Vacuum could deduce that that xid cannot possibly see any transactions between the youngest extant xmax and the oldest in-progress transaction. -- Gregory Stark EnterpriseDB http://www.enterprisedb.com
Gregory Stark <stark@enterprisedb.com> writes: > Well I think this would be the same infrastructure we would need to do > the other discussed improvement to address pg_dump's impact. That > would require us to publish the youngest xmax of the live > snapshots. Vacuum could deduce that that xid cannot possibly see any > transactions between the youngest extant xmax and the oldest > in-progress transaction. ... and do what with the knowledge? Not remove tuples, because any such tuples would be part of RECENTLY_DEAD update chains that that xact might be following now or in the future. regards, tom lane
"Tom Lane" <tgl@sss.pgh.pa.us> writes: > Gregory Stark <stark@enterprisedb.com> writes: >> Well I think this would be the same infrastructure we would need to do >> the other discussed improvement to address pg_dump's impact. That >> would require us to publish the youngest xmax of the live >> snapshots. Vacuum could deduce that that xid cannot possibly see any >> transactions between the youngest extant xmax and the oldest >> in-progress transaction. > > ... and do what with the knowledge? Not remove tuples, because any such > tuples would be part of RECENTLY_DEAD update chains that that xact might > be following now or in the future. Well that just means it might require extra work, not that it would be impossible. Firstly, some tuples would not be part of a chain and could be cleaned up anyways. Others would be part of a HOT chain which might make it easier to clean them up. But even for tuples that are part of a chain there may be solutions. We could truncate the tuple to just the MVCC information so subsequent transactions can find the head. Or we might be able to go back and edit the forward link to skip the dead intermediate tuples (and somehow deal with the race conditions...) -- Gregory Stark EnterpriseDB http://www.enterprisedb.com
On Fri, 2007-03-23 at 17:35 -0400, Tom Lane wrote: > To make intra-transaction advancing of xmin possible, we'd need to > explicitly track all of the backend's live snapshots, probably by > introducing a "snapshot cache" manager that gives out tracked > refcounts > as we do for some other structures like catcache entries. This might > have some other advantages (I think most of the current CopySnapshot > operations could be replaced by refcount increments) Seems like we should do this for many reasons, whether or not this is one of them. I seem to have butted heads and lost more than once with not being able to tell which Snapshots exist. -- Simon Riggs EnterpriseDB http://www.enterprisedb.com
On Sat, 2007-03-24 at 11:48 -0400, Tom Lane wrote: > Also, at some point a long-running transaction becomes a bottleneck > simply because its XID is itself the oldest thing visible in the > ProcArray and is determining everyone's xmin. How much daylight is > there really between "your xmin is old" and "your xid is old"? Hmm, yes. How often do we have an LRT that consists of multiple statements of significant duration? Not often, I'd wager. How much does it cost to optimise for this case? Did Heikki's patch to move the xmin forward during VACUUM get rejected? That seems like it has a much wider use case. -- Simon Riggs EnterpriseDB http://www.enterprisedb.com
"Simon Riggs" <simon@2ndquadrant.com> writes: > Did Heikki's patch to move the xmin forward during VACUUM get rejected? > That seems like it has a much wider use case. It's still in the queue (which I have finally started to review in earnest). regards, tom lane
Simon Riggs wrote: > On Sat, 2007-03-24 at 11:48 -0400, Tom Lane wrote: > > > Also, at some point a long-running transaction becomes a bottleneck > > simply because its XID is itself the oldest thing visible in the > > ProcArray and is determining everyone's xmin. How much daylight is > > there really between "your xmin is old" and "your xid is old"? > > Hmm, yes. How often do we have an LRT that consists of multiple > statements of significant duration? Not often, I'd wager. > > How much does it cost to optimise for this case? Zero cost. It involves just tracking if there are any old snapshots, and if not, update the proc xmin rather than skipping the asignment, like we do now. -- Bruce Momjian <bruce@momjian.us> http://momjian.us EnterpriseDB http://www.enterprisedb.com + If your life is a hard drive, Christ can be your backup. +
Bruce Momjian <bruce@momjian.us> writes: > Simon Riggs wrote: >> How much does it cost to optimise for this case? > Zero cost. It involves just tracking if there are any old snapshots, I will be fascinated to hear how you define that as zero cost. It might be relatively low, but it's not zero, and for many simple cases (eg, all single-statement transactions) the benefit will indeed be zero. We need some serious consideration of the costs and benefits, not airy dismissals. I had originally thought that we could avoid CopySnapshot copying, which might buy back more than enough to cover the cost of tracking reference counts ... but in a quick look yesterday it seemed that the high-use code paths would still need a copy, because they are always copying off the static variables filled by GetTransactionSnapshot. Changing that would come with a tremendous memory penalty, or else a time penalty to scan the ProcArray twice to see how big the arrays need to be. [ thinks a bit... ] The other way we might possibly tackle that is to avoid copying by the expedient of just using those static snapshot variables in-place. SerializableSnapshot surely need never be copied since it remains unchanged till end of xact, and no use of the snap will survive longer than that. In simple cases LatestSnapshot is not overwritten until the prior value is no longer needed, either. If we could arrange to copy it only when setting up a cursor, or other long-lived usage, we'd be ahead of the game. But I'd certainly want a management layer in there to ensure that we don't overwrite a value that *is* still in use, and offhand I'm not sure what that layer would need to look like. Possibly some sort of double indirection so that callers would have a Snapshot pointer that remained valid after we'd copied off arrays that need to be updated? We already do have double indirection in the form of the Snapshot and xip[] pointers ... maybe attach refcounts to the xip arrays not the Snapshots per se? regards, tom lane
This is an optimization so I was thinking of something simpler, like incrementing/decrementing a counter every time we allocate/free a snapshot (like in the patch), and updating MyProc->xmin only if there are no open snapshots. I don't think it is worth tracking anything more complicated than that. Is that now possible to do cleanly? I am having trouble getting this to work in the attached patch. --------------------------------------------------------------------------- Tom Lane wrote: > Bruce Momjian <bruce@momjian.us> writes: > > Simon Riggs wrote: > >> How much does it cost to optimise for this case? > > > Zero cost. It involves just tracking if there are any old snapshots, > > I will be fascinated to hear how you define that as zero cost. It might > be relatively low, but it's not zero, and for many simple cases (eg, > all single-statement transactions) the benefit will indeed be zero. > We need some serious consideration of the costs and benefits, not airy > dismissals. > > I had originally thought that we could avoid CopySnapshot copying, > which might buy back more than enough to cover the cost of tracking > reference counts ... but in a quick look yesterday it seemed that the > high-use code paths would still need a copy, because they are always > copying off the static variables filled by GetTransactionSnapshot. > Changing that would come with a tremendous memory penalty, or else > a time penalty to scan the ProcArray twice to see how big the arrays > need to be. > > [ thinks a bit... ] The other way we might possibly tackle that is > to avoid copying by the expedient of just using those static snapshot > variables in-place. SerializableSnapshot surely need never be copied > since it remains unchanged till end of xact, and no use of the snap > will survive longer than that. In simple cases LatestSnapshot is not > overwritten until the prior value is no longer needed, either. If > we could arrange to copy it only when setting up a cursor, or other > long-lived usage, we'd be ahead of the game. But I'd certainly want > a management layer in there to ensure that we don't overwrite a value > that *is* still in use, and offhand I'm not sure what that layer would > need to look like. Possibly some sort of double indirection so that > callers would have a Snapshot pointer that remained valid after we'd > copied off arrays that need to be updated? We already do have double > indirection in the form of the Snapshot and xip[] pointers ... maybe > attach refcounts to the xip arrays not the Snapshots per se? > > regards, tom lane > > ---------------------------(end of broadcast)--------------------------- > TIP 1: if posting/reading through Usenet, please send an appropriate > subscribe-nomail command to majordomo@postgresql.org so that your > message can get through to the mailing list cleanly -- Bruce Momjian <bruce@momjian.us> http://momjian.us EnterpriseDB http://www.enterprisedb.com + If your life is a hard drive, Christ can be your backup. + Index: src/backend/catalog/index.c =================================================================== RCS file: /cvsroot/pgsql/src/backend/catalog/index.c,v retrieving revision 1.281 diff -c -c -r1.281 index.c *** src/backend/catalog/index.c 25 Mar 2007 19:45:14 -0000 1.281 --- src/backend/catalog/index.c 26 Mar 2007 21:06:30 -0000 *************** *** 1358,1364 **** ExprContext *econtext; Snapshot snapshot; TransactionId OldestXmin; ! /* * sanity checks */ --- 1358,1364 ---- ExprContext *econtext; Snapshot snapshot; TransactionId OldestXmin; ! /* * sanity checks */ *************** *** 1555,1561 **** ExecDropSingleTupleTableSlot(slot); FreeExecutorState(estate); ! /* These may have been pointing to the now-gone estate */ indexInfo->ii_ExpressionsState = NIL; indexInfo->ii_PredicateState = NIL; --- 1555,1563 ---- ExecDropSingleTupleTableSlot(slot); FreeExecutorState(estate); ! if (snapshot != SnapshotNow) ! FreeSnapshot(snapshot); ! /* These may have been pointing to the now-gone estate */ indexInfo->ii_ExpressionsState = NIL; indexInfo->ii_PredicateState = NIL; Index: src/backend/commands/cluster.c =================================================================== RCS file: /cvsroot/pgsql/src/backend/commands/cluster.c,v retrieving revision 1.157 diff -c -c -r1.157 cluster.c *** src/backend/commands/cluster.c 13 Mar 2007 00:33:39 -0000 1.157 --- src/backend/commands/cluster.c 26 Mar 2007 21:06:30 -0000 *************** *** 204,209 **** --- 204,210 ---- /* Start a new transaction for each relation. */ StartTransactionCommand(); /* functions in indexes may want a snapshot set */ + FreeSnapshot(ActiveSnapshot); ActiveSnapshot = CopySnapshot(GetTransactionSnapshot()); cluster_rel(rvtc, true); CommitTransactionCommand(); Index: src/backend/commands/indexcmds.c =================================================================== RCS file: /cvsroot/pgsql/src/backend/commands/indexcmds.c,v retrieving revision 1.157 diff -c -c -r1.157 indexcmds.c *** src/backend/commands/indexcmds.c 13 Mar 2007 00:33:39 -0000 1.157 --- src/backend/commands/indexcmds.c 26 Mar 2007 21:06:31 -0000 *************** *** 493,500 **** * We also set ActiveSnapshot to this snap, since functions in indexes may * need a snapshot. */ ! snapshot = CopySnapshot(GetTransactionSnapshot()); ! ActiveSnapshot = snapshot; /* * Scan the index and the heap, insert any missing index entries. --- 493,500 ---- * We also set ActiveSnapshot to this snap, since functions in indexes may * need a snapshot. */ ! FreeSnapshot(ActiveSnapshot); ! ActiveSnapshot = snapshot = CopySnapshot(GetTransactionSnapshot()); /* * Scan the index and the heap, insert any missing index entries. *************** *** 1304,1309 **** --- 1304,1310 ---- StartTransactionCommand(); /* functions in indexes may want a snapshot set */ + FreeSnapshot(ActiveSnapshot); ActiveSnapshot = CopySnapshot(GetTransactionSnapshot()); if (reindex_relation(relid, true)) ereport(NOTICE, Index: src/backend/commands/trigger.c =================================================================== RCS file: /cvsroot/pgsql/src/backend/commands/trigger.c,v retrieving revision 1.214 diff -c -c -r1.214 trigger.c *** src/backend/commands/trigger.c 19 Mar 2007 23:38:29 -0000 1.214 --- src/backend/commands/trigger.c 26 Mar 2007 21:06:32 -0000 *************** *** 2645,2651 **** --- 2645,2654 ---- */ events = &afterTriggers->events; if (events->head != NULL) + { + FreeSnapshot(ActiveSnapshot); ActiveSnapshot = CopySnapshot(GetTransactionSnapshot()); + } /* * Run all the remaining triggers. Loop until they are all gone, just in Index: src/backend/commands/vacuum.c =================================================================== RCS file: /cvsroot/pgsql/src/backend/commands/vacuum.c,v retrieving revision 1.349 diff -c -c -r1.349 vacuum.c *** src/backend/commands/vacuum.c 14 Mar 2007 18:48:55 -0000 1.349 --- src/backend/commands/vacuum.c 26 Mar 2007 21:06:33 -0000 *************** *** 412,417 **** --- 412,418 ---- { StartTransactionCommand(); /* functions in indexes may want a snapshot set */ + FreeSnapshot(ActiveSnapshot); ActiveSnapshot = CopySnapshot(GetTransactionSnapshot()); } else *************** *** 470,475 **** --- 471,477 ---- * necessary if we are called from autovacuum because autovacuum might * continue on to do an ANALYZE-only call. */ + FreeSnapshot(ActiveSnapshot); ActiveSnapshot = CopySnapshot(GetTransactionSnapshot()); } *************** *** 964,969 **** --- 966,972 ---- if (vacstmt->full) { /* functions in indexes may want a snapshot set */ + FreeSnapshot(ActiveSnapshot); ActiveSnapshot = CopySnapshot(GetTransactionSnapshot()); } else Index: src/backend/executor/functions.c =================================================================== RCS file: /cvsroot/pgsql/src/backend/executor/functions.c,v retrieving revision 1.112 diff -c -c -r1.112 functions.c *** src/backend/executor/functions.c 13 Mar 2007 00:33:40 -0000 1.112 --- src/backend/executor/functions.c 26 Mar 2007 21:06:33 -0000 *************** *** 315,321 **** fcache->paramLI); /* We assume we don't need to set up ActiveSnapshot for ExecutorStart */ ! /* Utility commands don't need Executor. */ if (es->qd->operation != CMD_UTILITY) { --- 315,321 ---- fcache->paramLI); /* We assume we don't need to set up ActiveSnapshot for ExecutorStart */ ! /* Utility commands don't need Executor. */ if (es->qd->operation != CMD_UTILITY) { Index: src/backend/postmaster/autovacuum.c =================================================================== RCS file: /cvsroot/pgsql/src/backend/postmaster/autovacuum.c,v retrieving revision 1.38 diff -c -c -r1.38 autovacuum.c *** src/backend/postmaster/autovacuum.c 23 Mar 2007 21:57:10 -0000 1.38 --- src/backend/postmaster/autovacuum.c 26 Mar 2007 21:06:34 -0000 *************** *** 884,889 **** --- 884,890 ---- StartTransactionCommand(); /* functions in indexes may want a snapshot set */ + FreeSnapshot(ActiveSnapshot); ActiveSnapshot = CopySnapshot(GetTransactionSnapshot()); /* Index: src/backend/storage/ipc/procarray.c =================================================================== RCS file: /cvsroot/pgsql/src/backend/storage/ipc/procarray.c,v retrieving revision 1.23 diff -c -c -r1.23 procarray.c *** src/backend/storage/ipc/procarray.c 25 Mar 2007 19:45:14 -0000 1.23 --- src/backend/storage/ipc/procarray.c 26 Mar 2007 21:06:35 -0000 *************** *** 498,504 **** *---------- */ Snapshot ! GetSnapshotData(Snapshot snapshot, bool serializable) { ProcArrayStruct *arrayP = procArray; TransactionId xmin; --- 498,504 ---- *---------- */ Snapshot ! GetSnapshotData(Snapshot snapshot) { ProcArrayStruct *arrayP = procArray; TransactionId xmin; *************** *** 510,519 **** Assert(snapshot != NULL); ! /* Serializable snapshot must be computed before any other... */ ! Assert(serializable ? ! !TransactionIdIsValid(MyProc->xmin) : ! TransactionIdIsValid(MyProc->xmin)); /* * Allocating space for maxProcs xids is usually overkill; numProcs would --- 510,517 ---- Assert(snapshot != NULL); ! /* Either we set an xmin, or we already have one. */ ! Assert(TransactionIdIsValid(MyProc->xmin) || GetOpenSnapshotCount() == 0); /* * Allocating space for maxProcs xids is usually overkill; numProcs would *************** *** 656,664 **** } } ! if (serializable) ! MyProc->xmin = TransactionXmin = xmin; LWLockRelease(ProcArrayLock); /* --- 654,681 ---- } } ! /* DEBUG */ ! if (!TransactionIdIsValid(MyProc->xmin) || GetOpenSnapshotCount() == 0) ! fprintf(stderr, "%x:%x: set new xmin old=%x new=%x\n", MyProcPid, MyProc->xid, MyProc->xmin, xmin); ! else ! fprintf(stderr, "%x:%x: count=%d, not set\n", MyProcPid, MyProc->xid, GetOpenSnapshotCount()); + /* + * --- + * We need to set MyProc->xmin in a few cases: + * o set at transaction start for SERIALIZABLE visibility + * o set for new command in READ COMMITTED mode so + * VACUUM knows we don't care about completed transactions + * + * The problem is that we might have old cursors/portals + * that need to have older transaction visibility, so we + * check to see we have no open Snapshots before making + * the change. + * --- + */ + if (!TransactionIdIsValid(MyProc->xmin) || GetOpenSnapshotCount() == 0) + MyProc->xmin = TransactionXmin = xmin; + LWLockRelease(ProcArrayLock); /* Index: src/backend/tcop/fastpath.c =================================================================== RCS file: /cvsroot/pgsql/src/backend/tcop/fastpath.c,v retrieving revision 1.96 diff -c -c -r1.96 fastpath.c *** src/backend/tcop/fastpath.c 3 Mar 2007 19:32:54 -0000 1.96 --- src/backend/tcop/fastpath.c 26 Mar 2007 21:06:35 -0000 *************** *** 308,313 **** --- 308,314 ---- * Now that we know we are in a valid transaction, set snapshot in case * needed by function itself or one of the datatype I/O routines. */ + FreeSnapshot(ActiveSnapshot); ActiveSnapshot = CopySnapshot(GetTransactionSnapshot()); /* Index: src/backend/tcop/postgres.c =================================================================== RCS file: /cvsroot/pgsql/src/backend/tcop/postgres.c,v retrieving revision 1.529 diff -c -c -r1.529 postgres.c *** src/backend/tcop/postgres.c 22 Mar 2007 19:55:04 -0000 1.529 --- src/backend/tcop/postgres.c 26 Mar 2007 21:06:36 -0000 *************** *** 738,743 **** --- 738,744 ---- { if (needSnapshot) { + FreeSnapshot(ActiveSnapshot); ActiveSnapshot = CopySnapshot(GetTransactionSnapshot()); needSnapshot = false; } Index: src/backend/tcop/pquery.c =================================================================== RCS file: /cvsroot/pgsql/src/backend/tcop/pquery.c,v retrieving revision 1.115 diff -c -c -r1.115 pquery.c *** src/backend/tcop/pquery.c 13 Mar 2007 00:33:42 -0000 1.115 --- src/backend/tcop/pquery.c 26 Mar 2007 21:06:36 -0000 *************** *** 154,159 **** --- 154,160 ---- * Must always set snapshot for plannable queries. Note we assume that * caller will take care of restoring ActiveSnapshot on exit/error. */ + FreeSnapshot(ActiveSnapshot); ActiveSnapshot = CopySnapshot(GetTransactionSnapshot()); /* *************** *** 603,608 **** --- 604,611 ---- /* Restore global vars and propagate error */ ActivePortal = saveActivePortal; + if (portal->strategy == PORTAL_ONE_SELECT) + FreeSnapshot(ActiveSnapshot); ActiveSnapshot = saveActiveSnapshot; CurrentResourceOwner = saveResourceOwner; PortalContext = savePortalContext; *************** *** 614,619 **** --- 617,624 ---- MemoryContextSwitchTo(oldContext); ActivePortal = saveActivePortal; + if (portal->strategy == PORTAL_ONE_SELECT) + FreeSnapshot(ActiveSnapshot); ActiveSnapshot = saveActiveSnapshot; CurrentResourceOwner = saveResourceOwner; PortalContext = savePortalContext; *************** *** 1163,1169 **** --- 1168,1177 ---- IsA(utilityStmt, NotifyStmt) || IsA(utilityStmt, UnlistenStmt) || IsA(utilityStmt, CheckPointStmt))) + { + FreeSnapshot(ActiveSnapshot); ActiveSnapshot = CopySnapshot(GetTransactionSnapshot()); + } else ActiveSnapshot = NULL; *************** *** 1177,1184 **** /* Some utility statements may change context on us */ MemoryContextSwitchTo(PortalGetHeapMemory(portal)); ! if (ActiveSnapshot) ! FreeSnapshot(ActiveSnapshot); ActiveSnapshot = NULL; } --- 1185,1191 ---- /* Some utility statements may change context on us */ MemoryContextSwitchTo(PortalGetHeapMemory(portal)); ! FreeSnapshot(ActiveSnapshot); ActiveSnapshot = NULL; } Index: src/backend/utils/adt/ri_triggers.c =================================================================== RCS file: /cvsroot/pgsql/src/backend/utils/adt/ri_triggers.c,v retrieving revision 1.93 diff -c -c -r1.93 ri_triggers.c *** src/backend/utils/adt/ri_triggers.c 25 Mar 2007 19:45:14 -0000 1.93 --- src/backend/utils/adt/ri_triggers.c 26 Mar 2007 21:06:38 -0000 *************** *** 2636,2642 **** char workmembuf[32]; int spi_result; SPIPlanPtr qplan; ! /* * Check to make sure current user has enough permissions to do the test * query. (If not, caller can fall back to the trigger method, which --- 2636,2643 ---- char workmembuf[32]; int spi_result; SPIPlanPtr qplan; ! Snapshot snapshot; ! /* * Check to make sure current user has enough permissions to do the test * query. (If not, caller can fall back to the trigger method, which *************** *** 2770,2780 **** * really haven't got much choice.) We need at most one tuple returned, * so pass limit = 1. */ spi_result = SPI_execute_snapshot(qplan, NULL, NULL, ! CopySnapshot(GetLatestSnapshot()), InvalidSnapshot, true, 1); /* Check result */ if (spi_result != SPI_OK_SELECT) --- 2771,2783 ---- * really haven't got much choice.) We need at most one tuple returned, * so pass limit = 1. */ + snapshot = CopySnapshot(GetLatestSnapshot()); spi_result = SPI_execute_snapshot(qplan, NULL, NULL, ! snapshot, InvalidSnapshot, true, 1); + FreeSnapshot(snapshot); /* Check result */ if (spi_result != SPI_OK_SELECT) *************** *** 3310,3315 **** --- 3313,3322 ---- test_snapshot, crosscheck_snapshot, false, limit); + if (test_snapshot != InvalidSnapshot) + FreeSnapshot(test_snapshot); + if (crosscheck_snapshot != InvalidSnapshot) + FreeSnapshot(crosscheck_snapshot); /* Restore UID */ SetUserId(save_uid); Index: src/backend/utils/time/tqual.c =================================================================== RCS file: /cvsroot/pgsql/src/backend/utils/time/tqual.c,v retrieving revision 1.102 diff -c -c -r1.102 tqual.c *** src/backend/utils/time/tqual.c 25 Mar 2007 19:45:14 -0000 1.102 --- src/backend/utils/time/tqual.c 26 Mar 2007 21:06:38 -0000 *************** *** 65,70 **** --- 65,79 ---- Snapshot LatestSnapshot = NULL; /* + * We use OpenSnapshotCount to record if there are any open snapshots. If + * there are not, we can update MyProc->xmin. A more sophisticated solution + * would be to keep track of the oldest open snapshot and prevent + * MyProc->xmin from going below that value, but the complexity doesn't + * seem worth it. + */ + static int OpenSnapshotCount = 0; + + /* * This pointer is not maintained by this module, but it's convenient * to declare it here anyway. Callers typically assign a copy of * GetTransactionSnapshot's result to ActiveSnapshot. *************** *** 1224,1237 **** /* First call in transaction? */ if (SerializableSnapshot == NULL) { ! SerializableSnapshot = GetSnapshotData(&SerializableSnapshotData, true); return SerializableSnapshot; } if (IsXactIsoLevelSerializable) return SerializableSnapshot; ! LatestSnapshot = GetSnapshotData(&LatestSnapshotData, false); return LatestSnapshot; } --- 1233,1246 ---- /* First call in transaction? */ if (SerializableSnapshot == NULL) { ! SerializableSnapshot = GetSnapshotData(&SerializableSnapshotData); return SerializableSnapshot; } if (IsXactIsoLevelSerializable) return SerializableSnapshot; ! LatestSnapshot = GetSnapshotData(&LatestSnapshotData); return LatestSnapshot; } *************** *** 1248,1254 **** if (SerializableSnapshot == NULL) elog(ERROR, "no snapshot has been set"); ! LatestSnapshot = GetSnapshotData(&LatestSnapshotData, false); return LatestSnapshot; } --- 1257,1263 ---- if (SerializableSnapshot == NULL) elog(ERROR, "no snapshot has been set"); ! LatestSnapshot = GetSnapshotData(&LatestSnapshotData); return LatestSnapshot; } *************** *** 1266,1271 **** --- 1275,1282 ---- Size subxipoff; Size size; + OpenSnapshotCount++; + /* We allocate any XID arrays needed in the same palloc block. */ size = subxipoff = sizeof(SnapshotData) + snapshot->xcnt * sizeof(TransactionId); *************** *** 1310,1316 **** void FreeSnapshot(Snapshot snapshot) { ! pfree(snapshot); } /* --- 1321,1337 ---- void FreeSnapshot(Snapshot snapshot) { ! if (snapshot != NULL) ! { ! OpenSnapshotCount--; ! Assert(OpenSnapshotCount >= 0); ! pfree(snapshot); ! } ! } ! ! int GetOpenSnapshotCount(void) ! { ! return OpenSnapshotCount; } /* *************** *** 1328,1333 **** --- 1349,1355 ---- SerializableSnapshot = NULL; LatestSnapshot = NULL; ActiveSnapshot = NULL; /* just for cleanliness */ + OpenSnapshotCount = 0; } /* Index: src/include/utils/tqual.h =================================================================== RCS file: /cvsroot/pgsql/src/include/utils/tqual.h,v retrieving revision 1.66 diff -c -c -r1.66 tqual.h *** src/include/utils/tqual.h 25 Mar 2007 19:45:14 -0000 1.66 --- src/include/utils/tqual.h 26 Mar 2007 21:06:39 -0000 *************** *** 150,157 **** extern Snapshot CopySnapshot(Snapshot snapshot); extern void FreeSnapshot(Snapshot snapshot); extern void FreeXactSnapshot(void); /* in procarray.c; declared here to avoid including tqual.h in procarray.h: */ ! extern Snapshot GetSnapshotData(Snapshot snapshot, bool serializable); #endif /* TQUAL_H */ --- 150,158 ---- extern Snapshot CopySnapshot(Snapshot snapshot); extern void FreeSnapshot(Snapshot snapshot); extern void FreeXactSnapshot(void); + extern int GetOpenSnapshotCount(void); /* in procarray.c; declared here to avoid including tqual.h in procarray.h: */ ! extern Snapshot GetSnapshotData(Snapshot snapshot); #endif /* TQUAL_H */ Index: src/pl/plpgsql/src/pl_exec.c =================================================================== RCS file: /cvsroot/pgsql/src/pl/plpgsql/src/pl_exec.c,v retrieving revision 1.191 diff -c -c -r1.191 pl_exec.c *** src/pl/plpgsql/src/pl_exec.c 25 Mar 2007 23:27:59 -0000 1.191 --- src/pl/plpgsql/src/pl_exec.c 26 Mar 2007 21:06:41 -0000 *************** *** 4129,4139 **** --- 4129,4143 ---- PG_CATCH(); { /* Restore global vars and propagate error */ + if (!estate->readonly_func) + FreeSnapshot(ActiveSnapshot); ActiveSnapshot = saveActiveSnapshot; PG_RE_THROW(); } PG_END_TRY(); + if (!estate->readonly_func) + FreeSnapshot(ActiveSnapshot); ActiveSnapshot = saveActiveSnapshot; SPI_pop();
Bruce Momjian <bruce@momjian.us> writes: > This is an optimization so I was thinking of something simpler, like > incrementing/decrementing a counter every time we allocate/free a > snapshot (like in the patch), and updating MyProc->xmin only if there > are no open snapshots. I don't think it is worth tracking anything more > complicated than that. Is that now possible to do cleanly? I am having > trouble getting this to work in the attached patch. No kidding. Even if you get it to pass the regression tests, it'll be a constant source of bugs. IMHO the *only* way to do this reliably is to have a centralized reference-count management module. I've been thinking more about the avoid-copy idea and I think it will indeed work, and be simpler than what we have now, to boot. Ideas: * There is only one (pair of) statically allocated, max-size XID arrays for GetSnapshotData to read into. When we want to acquire a new snap from GetSnapshotData, we palloc a new SnapshotData struct (and no more), make it point at the statically allocated arrays, fill and return it. * There can be at most one SnapshotData struct using the statically allocated arrays. We remember its address in a static variable (which goes NULL when there is no such struct). If we have to acquire a new snapshot before the old one's refcount has gone to zero, then at that time we palloc arrays just big enough for the old one, copy the static data to there, insert the palloc'd array pointers into the old one, and clear the static variable (or more likely, immediately set it to point to the new one). * All the current CopySnapshot operations are replaced by refcount increments. We need explicit refcount-decrement calls added, and ResourceOwner support to make sure they happen. (I'm imagining that snapshots will now live in a dedicated context, so they don't go away implicitly but need explicit release operations. The ResourceOwner infrastructure will help us find any missed releases.) * Decrementing the refcount on a SnapshotData pfree's it if the refcount has gone to zero, and also clears the aforementioned static pointer variable if it happens to point at that selfsame struct. Note that for a serializable transaction, or a "simple" read-committed transaction that doesn't use more than one snap at a time, this will always happen and so there will be zero copying cost. * In this scheme the distinction between SerializableSnapshot and LatestSnapshot vanishes, at least at the level of memory management. Probably SerializableSnapshot would translate into a refcount held by xact.c or some other upper-level module. * We keep all the live MVCC snapshots in a linked list managed by a "snapshot cache" module. We can traverse this list to determine the minimum xmin at any instant. Freeing a snap whose xmin equals the current MyProc->xmin causes us to traverse the list, determine the new backend-wide xmin, and update MyProc->xmin. Freeing the last snap causes us to reset MyProc->xmin to 0. (I'm not completely sure, but I think that we can just set MyProc->xmin in these cases, without bothering to lock the ProcArray --- any comments on that?) * GetSnapshotData no longer cares about serializable vs nonserializable snaps, either. Its rule is just "if MyProc->xmin is 0, then set it". This can happen only when creating the first snap in a transaction that currently has no live snaps. * The above two rules clearly suffice to maintain the invariant "MyProc->xmin is the minimum of the xmins of the live snapshots in the transaction, or zero when there are none". Which is what we want. On the whole though I think we should let this idea go till 8.4; we have a lot to deal with for 8.3 and a definite shortage of evidence that advancing xmin will buy much. Mu gut feeling is that the above design would save about enough in snapshot-copying costs to pay for its extra management logic, but we won't come out ahead unless advancing xmin intra-transaction really helps, and I'm not sure I believe that (except in the special case of VACUUM, and we already have a solution for that, which would be independent of this). regards, tom lane
Tom Lane wrote: > On the whole though I think we should let this idea go till 8.4; we have > a lot to deal with for 8.3 and a definite shortage of evidence that > advancing xmin will buy much. Mu gut feeling is that the above design > would save about enough in snapshot-copying costs to pay for its extra > management logic, but we won't come out ahead unless advancing xmin > intra-transaction really helps, and I'm not sure I believe that (except > in the special case of VACUUM, and we already have a solution for that, > which would be independent of this). The improvement is going to be a win for multi-statement transactions --- the only question is how often are they long-running. It does seem best to put this on the TODO for 8.4, and I will do that now. The only thing that makes it tempting to get into 8.3 is that we could advertise this release as a major "space reuse" release because of HOT, autovacuum on by default, multiple autovacuum processes, and, if we added it, improved VACUUM for multi-statement transactions. -- Bruce Momjian <bruce@momjian.us> http://momjian.us EnterpriseDB http://www.enterprisedb.com + If your life is a hard drive, Christ can be your backup. +
I have a question about what would happen for a transaction running a command like COPY FROM. Is it possible it would manage to arrange to have no live snapshots at all? So it would have no impact on concurrent VACUUMs? What about something running a large pg_restore? "Tom Lane" <tgl@sss.pgh.pa.us> writes: > On the whole though I think we should let this idea go till 8.4; I tend to agree but for a different reason. I think it's something that will open the doors for a lot of new ideas. If we put it in CVS HEAD early in 8.4 I think (or hope at any rate) we'll think of at least a few new things we can do with the new more precise information it exposes. Just as an example, if you find you have no live snapshots can you throw out the combocid hash? Any tuple you find with a combocid that's been discarded that way must predate your current scan and therefore is deleted for you. -- Gregory Stark EnterpriseDB http://www.enterprisedb.com
Gregory Stark wrote: > > I have a question about what would happen for a transaction running a command > like COPY FROM. Is it possible it would manage to arrange to have no live > snapshots at all? So it would have no impact on concurrent VACUUMs? What about > something running a large pg_restore? Interesting idea. If the table had triggers, it would need a snapshot, but if not, yea, that is certainly possible. --------------------------------------------------------------------------- > > "Tom Lane" <tgl@sss.pgh.pa.us> writes: > > > On the whole though I think we should let this idea go till 8.4; > > I tend to agree but for a different reason. I think it's something that will > open the doors for a lot of new ideas. If we put it in CVS HEAD early in 8.4 I > think (or hope at any rate) we'll think of at least a few new things we can do > with the new more precise information it exposes. > > Just as an example, if you find you have no live snapshots can you throw out > the combocid hash? Any tuple you find with a combocid that's been discarded > that way must predate your current scan and therefore is deleted for you. > > -- > Gregory Stark > EnterpriseDB http://www.enterprisedb.com > > > ---------------------------(end of broadcast)--------------------------- > TIP 4: Have you searched our list archives? > > http://archives.postgresql.org -- Bruce Momjian <bruce@momjian.us> http://momjian.us EnterpriseDB http://www.enterprisedb.com + If your life is a hard drive, Christ can be your backup. +
Bruce Momjian <bruce@momjian.us> writes: > Tom Lane wrote: >> ... but we won't come out ahead unless advancing xmin >> intra-transaction really helps, and I'm not sure I believe that (except >> in the special case of VACUUM, and we already have a solution for that, >> which would be independent of this). > The improvement is going to be a win for multi-statement transactions > --- the only question is how often are they long-running. Uh, no, that's not very clear. A long-running transaction will be a VACUUM bottleneck because of its own XID, never mind its xmin. To make this helpful, you have to posit a lot of overlapping long-running transactions (such that the distance back to GlobalXmin might average about twice the distance back to the oldest live XID). That's not impossible but I wonder whether it's not mostly a token of bad application design. > It does seem best to put this on the TODO for 8.4, and I will do that > now. Agreed. Quite aside from the time needed for a reasonable implementation, we'd really need to do more performance-testing than we have time for now. regards, tom lane
Bruce Momjian <bruce@momjian.us> writes: > Gregory Stark wrote: >> I have a question about what would happen for a transaction running a command >> like COPY FROM. Is it possible it would manage to arrange to have no live >> snapshots at all? So it would have no impact on concurrent VACUUMs? What about >> something running a large pg_restore? > Interesting idea. Indeed. Currently, COPY forcibly sets a snapshot on the off chance something will use it, but I could certainly see making that happen "lazily", ie not at all in the simple case. pg_restore is probably a lost cause, at least if you are running it in single-transaction mode. I guess there'd be tradeoffs as to whether to do that or not ... regards, tom lane
Tom Lane wrote: > Bruce Momjian <bruce@momjian.us> writes: > > Tom Lane wrote: > >> ... but we won't come out ahead unless advancing xmin > >> intra-transaction really helps, and I'm not sure I believe that (except > >> in the special case of VACUUM, and we already have a solution for that, > >> which would be independent of this). > > > The improvement is going to be a win for multi-statement transactions > > --- the only question is how often are they long-running. > > Uh, no, that's not very clear. A long-running transaction will be a > VACUUM bottleneck because of its own XID, never mind its xmin. To make > this helpful, you have to posit a lot of overlapping long-running > transactions (such that the distance back to GlobalXmin might average > about twice the distance back to the oldest live XID). That's not > impossible but I wonder whether it's not mostly a token of bad > application design. Well, my secondary addition was to start MyProc->xmin with the current xid counter, rather than your own xid. This is because your xid is already in progress and so will not be touched, so why not start with the current xid counter. -- Bruce Momjian <bruce@momjian.us> http://momjian.us EnterpriseDB http://www.enterprisedb.com + If your life is a hard drive, Christ can be your backup. +
Tom Lane wrote: > Bruce Momjian <bruce@momjian.us> writes: > > Gregory Stark wrote: > >> I have a question about what would happen for a transaction running a command > >> like COPY FROM. Is it possible it would manage to arrange to have no live > >> snapshots at all? So it would have no impact on concurrent VACUUMs? What about > >> something running a large pg_restore? > > > Interesting idea. > > Indeed. Currently, COPY forcibly sets a snapshot on the off chance > something will use it, but I could certainly see making that happen > "lazily", ie not at all in the simple case. > > pg_restore is probably a lost cause, at least if you are running it > in single-transaction mode. I guess there'd be tradeoffs as to whether > to do that or not ... The bottom line is that more optimizations for VACUUM dead tuple identification are possible. -- Bruce Momjian <bruce@momjian.us> http://momjian.us EnterpriseDB http://www.enterprisedb.com + If your life is a hard drive, Christ can be your backup. +
Bruce Momjian <bruce@momjian.us> writes: > Tom Lane wrote: >> Uh, no, that's not very clear. A long-running transaction will be a >> VACUUM bottleneck because of its own XID, never mind its xmin. > Well, my secondary addition was to start MyProc->xmin with the current > xid counter, rather than your own xid. Don't tell me you seriously believe that would work. regards, tom lane
Tom Lane wrote: > Bruce Momjian <bruce@momjian.us> writes: > > Tom Lane wrote: > >> Uh, no, that's not very clear. A long-running transaction will be a > >> VACUUM bottleneck because of its own XID, never mind its xmin. > > > Well, my secondary addition was to start MyProc->xmin with the current > > xid counter, rather than your own xid. > > Don't tell me you seriously believe that would work. I do. Please tell me why it will not. -- Bruce Momjian <bruce@momjian.us> http://momjian.us EnterpriseDB http://www.enterprisedb.com + If your life is a hard drive, Christ can be your backup. +
Bruce Momjian <bruce@momjian.us> writes: > Tom Lane wrote: >> Bruce Momjian <bruce@momjian.us> writes: >>> Well, my secondary addition was to start MyProc->xmin with the current >>> xid counter, rather than your own xid. >> >> Don't tell me you seriously believe that would work. > I do. Please tell me why it will not. You can't have GlobalXmin greater than your own xid, else log space (particularly pg_subtrans) may be recycled too soon. regards, tom lane
Tom Lane wrote: > Bruce Momjian <bruce@momjian.us> writes: > > Tom Lane wrote: > >> Bruce Momjian <bruce@momjian.us> writes: > >>> Well, my secondary addition was to start MyProc->xmin with the current > >>> xid counter, rather than your own xid. > >> > >> Don't tell me you seriously believe that would work. > > > I do. Please tell me why it will not. > > You can't have GlobalXmin greater than your own xid, else log space > (particularly pg_subtrans) may be recycled too soon. OK, so maybe GlobalXmin would have to be split into two new variables that control log space and dead-row detection separately. -- Bruce Momjian <bruce@momjian.us> http://momjian.us EnterpriseDB http://www.enterprisedb.com + If your life is a hard drive, Christ can be your backup. +