Proposed patch for ANALYZE overcounting dead rows - Mailing list pgsql-patches
From | Tom Lane |
---|---|
Subject | Proposed patch for ANALYZE overcounting dead rows |
Date | |
Msg-id | 6363.1195253903@sss.pgh.pa.us |
List | pgsql-patches |
There's been a thread in pgsql-performance about ANALYZE reporting an
overly large number of dead rows, because it counts anything that is not
visible per SnapshotNow rules as "dead"; in particular, insert-in-progress
tuples are reported as "dead".  This seems a bad idea because it may prompt
autovacuum to launch a useless vacuum.  The attached patch revises ANALYZE
to distinguish tuple states the same way VACUUM does, and to count
INSERT_IN_PROGRESS and DELETE_IN_PROGRESS tuples as neither "live" nor
"dead".

Comments?  In particular, should we report a count of in-doubt tuples in
the eventual LOG message, and does anyone want to argue for still counting
DELETE_IN_PROGRESS as "dead"?

			regards, tom lane

*** src/backend/commands/analyze.c.orig	Thu Nov 15 17:38:16 2007
--- src/backend/commands/analyze.c	Fri Nov 16 17:45:10 2007
***************
*** 32,37 ****
--- 32,38 ----
  #include "pgstat.h"
  #include "postmaster/autovacuum.h"
  #include "storage/proc.h"
+ #include "storage/procarray.h"
  #include "utils/acl.h"
  #include "utils/datum.h"
  #include "utils/lsyscache.h"
***************
*** 835,840 ****
--- 836,842 ----
  	double		deadrows = 0;	/* # dead rows seen */
  	double		rowstoskip = -1;	/* -1 means not set yet */
  	BlockNumber totalblocks;
+ 	TransactionId OldestXmin;
  	BlockSamplerData bs;
  	double		rstate;
***************
*** 842,847 ****
--- 844,852 ----
  	totalblocks = RelationGetNumberOfBlocks(onerel);

+ 	/* Need a cutoff xmin for HeapTupleSatisfiesVacuum */
+ 	OldestXmin = GetOldestXmin(onerel->rd_rel->relisshared, true);
+
  	/* Prepare for sampling block numbers */
  	BlockSampler_Init(&bs, totalblocks, targrows);
  	/* Prepare for sampling rows */
***************
*** 862,917 ****
  		 * We must maintain a pin on the target page's buffer to ensure that
  		 * the maxoffset value stays good (else concurrent VACUUM might delete
  		 * tuples out from under us).  Hence, pin the page until we are done
! 		 * looking at it.  We don't maintain a lock on the page, so tuples
! 		 * could get added to it, but we ignore such tuples.
  		 */
  		targbuffer = ReadBufferWithStrategy(onerel, targblock, vac_strategy);
  		LockBuffer(targbuffer, BUFFER_LOCK_SHARE);
  		targpage = BufferGetPage(targbuffer);
  		maxoffset = PageGetMaxOffsetNumber(targpage);
- 		LockBuffer(targbuffer, BUFFER_LOCK_UNLOCK);

  		/* Inner loop over all tuples on the selected page */
  		for (targoffset = FirstOffsetNumber; targoffset <= maxoffset;
  			 targoffset++)
  		{
  			HeapTupleData targtuple;

  			ItemPointerSet(&targtuple.t_self, targblock, targoffset);
! 			/* We use heap_release_fetch to avoid useless bufmgr traffic */
! 			if (heap_release_fetch(onerel, SnapshotNow,
! 								   &targtuple, &targbuffer,
! 								   true, NULL))
  			{
  				/*
  				 * The first targrows live rows are simply copied into the
  				 * reservoir. Then we start replacing tuples in the sample
! 				 * until we reach the end of the relation.  This algorithm is
! 				 * from Jeff Vitter's paper (see full citation below).  It
! 				 * works by repeatedly computing the number of tuples to skip
! 				 * before selecting a tuple, which replaces a randomly chosen
! 				 * element of the reservoir (current set of tuples).  At all
! 				 * times the reservoir is a true random sample of the tuples
! 				 * we've passed over so far, so when we fall off the end of
! 				 * the relation we're done.
  				 */
  				if (numrows < targrows)
  					rows[numrows++] = heap_copytuple(&targtuple);
  				else
  				{
  					/*
! 					 * t in Vitter's paper is the number of records already
! 					 * processed.  If we need to compute a new S value, we
! 					 * must use the not-yet-incremented value of liverows as
! 					 * t.
  					 */
  					if (rowstoskip < 0)
! 						rowstoskip = get_next_S(liverows, targrows, &rstate);
!
  					if (rowstoskip <= 0)
  					{
  						/*
! 						 * Found a suitable tuple, so save it, replacing one
! 						 * old tuple at random
  						 */
  						int			k = (int) (targrows * random_fract());
--- 867,935 ----
  		 * We must maintain a pin on the target page's buffer to ensure that
  		 * the maxoffset value stays good (else concurrent VACUUM might delete
  		 * tuples out from under us).  Hence, pin the page until we are done
! 		 * looking at it.  We also choose to hold sharelock on the buffer
! 		 * throughout --- we could release and re-acquire sharelock for
! 		 * each tuple, but since we aren't doing much work per tuple, the
! 		 * extra lock traffic is probably better avoided.
  		 */
  		targbuffer = ReadBufferWithStrategy(onerel, targblock, vac_strategy);
  		LockBuffer(targbuffer, BUFFER_LOCK_SHARE);
  		targpage = BufferGetPage(targbuffer);
  		maxoffset = PageGetMaxOffsetNumber(targpage);

  		/* Inner loop over all tuples on the selected page */
  		for (targoffset = FirstOffsetNumber; targoffset <= maxoffset;
  			 targoffset++)
  		{
+ 			ItemId		itemid;
  			HeapTupleData targtuple;

+ 			itemid = PageGetItemId(targpage, targoffset);
+
+ 			/* Ignore unused, redirect, and dead tuple slots */
+ 			if (!ItemIdIsNormal(itemid))
+ 				continue;
+
  			ItemPointerSet(&targtuple.t_self, targblock, targoffset);
!
! 			targtuple.t_data = (HeapTupleHeader) PageGetItem(targpage, itemid);
! 			targtuple.t_len = ItemIdGetLength(itemid);
!
! 			switch (HeapTupleSatisfiesVacuum(targtuple.t_data,
! 											 OldestXmin,
! 											 targbuffer))
  			{
+ 				case HEAPTUPLE_LIVE:
+
  				/*
  				 * The first targrows live rows are simply copied into the
  				 * reservoir. Then we start replacing tuples in the sample
! 				 * until we reach the end of the relation.  This algorithm
! 				 * is from Jeff Vitter's paper (see full citation
! 				 * below).  It works by repeatedly computing the number of
! 				 * tuples to skip before selecting a tuple, which replaces
! 				 * a randomly chosen element of the reservoir (current set
! 				 * of tuples).  At all times the reservoir is a true
! 				 * random sample of the tuples we've passed over so far,
! 				 * so when we fall off the end of the relation we're done.
  				 */
  				if (numrows < targrows)
  					rows[numrows++] = heap_copytuple(&targtuple);
  				else
  				{
  					/*
! 					 * t in Vitter's paper is the number of records
! 					 * already processed.  If we need to compute a new S
! 					 * value, we must use the not-yet-incremented value of
! 					 * liverows as t.
  					 */
  					if (rowstoskip < 0)
! 						rowstoskip = get_next_S(liverows, targrows,
! 												&rstate);

  					if (rowstoskip <= 0)
  					{
  						/*
! 						 * Found a suitable tuple, so save it, replacing
! 						 * one old tuple at random
  						 */
  						int			k = (int) (targrows * random_fract());
***************
*** 924,939 ****
  				}
  				liverows += 1;
! 			}
! 			else
! 			{
! 				/* Count dead rows, but not empty slots */
! 				if (targtuple.t_data != NULL)
  					deadrows += 1;
  			}
  		}

! 		/* Now release the pin on the page */
  		ReleaseBuffer(targbuffer);
  	}
--- 942,968 ----
  				}
  				liverows += 1;
! 				break;
!
! 			case HEAPTUPLE_DEAD:
! 			case HEAPTUPLE_RECENTLY_DEAD:
! 				/* Count dead and recently-dead rows */
  				deadrows += 1;
+ 				break;
+
+ 			case HEAPTUPLE_INSERT_IN_PROGRESS:
+ 			case HEAPTUPLE_DELETE_IN_PROGRESS:
+ 				/* We do not currently count in-doubt rows */
+ 				break;
+
+ 			default:
+ 				elog(ERROR, "unexpected HeapTupleSatisfiesVacuum result");
+ 				break;
  			}
  		}

! 		/* Now release the lock and pin on the page */
! 		LockBuffer(targbuffer, BUFFER_LOCK_UNLOCK);
  		ReleaseBuffer(targbuffer);
  	}
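For readers following the patch, the interplay between the new tuple-state classification and the reservoir sampling the comments describe can be sketched in standalone C. This is a simplified illustration, not PostgreSQL code: the `TupleState` enum and `sample_rows()` are hypothetical stand-ins for `HeapTupleSatisfiesVacuum`'s result codes and the ANALYZE scan loop, and it uses the basic Algorithm R replacement step rather than Vitter's skip-count optimization (`get_next_S`).

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-ins for HeapTupleSatisfiesVacuum's result codes */
typedef enum
{
	TUPLE_LIVE,
	TUPLE_DEAD,
	TUPLE_RECENTLY_DEAD,
	TUPLE_INSERT_IN_PROGRESS,
	TUPLE_DELETE_IN_PROGRESS
} TupleState;

/*
 * Scan a stream of tuple states, keeping a uniform random sample of the
 * live rows and counting live vs. dead rows the way the patched ANALYZE
 * does: DEAD and RECENTLY_DEAD count as "dead", while the two in-progress
 * (in-doubt) states count as neither live nor dead.  The sample stores
 * stream indexes; the real code copies the tuples themselves.
 */
static void
sample_rows(const TupleState *tuples, int ntuples,
			int *sample, int targrows,
			int *numrows, double *liverows, double *deadrows)
{
	*numrows = 0;
	*liverows = 0;
	*deadrows = 0;

	for (int i = 0; i < ntuples; i++)
	{
		switch (tuples[i])
		{
			case TUPLE_LIVE:
				if (*numrows < targrows)
					sample[(*numrows)++] = i;	/* fill the reservoir */
				else
				{
					/*
					 * Algorithm R: this is the t-th live row, so keep it
					 * with probability targrows/t, replacing a uniformly
					 * random reservoir slot.  Vitter's algorithm gets the
					 * same distribution while skipping rows in batches.
					 */
					long		t = (long) *liverows + 1;
					long		j = rand() % t;

					if (j < targrows)
						sample[j] = i;
				}
				*liverows += 1;
				break;

			case TUPLE_DEAD:
			case TUPLE_RECENTLY_DEAD:
				*deadrows += 1;
				break;

			case TUPLE_INSERT_IN_PROGRESS:
			case TUPLE_DELETE_IN_PROGRESS:
				/* in-doubt rows are counted as neither live nor dead */
				break;
		}
	}
}
```

Note how the pre-patch behavior differs: anything invisible under SnapshotNow landed in `deadrows`, so a large in-progress bulk insert could inflate the dead-row count that ANALYZE reports to the stats collector and thereby trigger a pointless autovacuum pass.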