Thread: More vacuum.c refactoring
Near the end of repair_frag() in vacuum.c -- under the comment /* clean moved tuples from last page in Nvacpagelist list */ -- there is code that marks itemids as unused. Itemids affected are those referring to tuples that have been moved off the last page. This code is very similar to vacuum_page(). The major difference is that vacuum_page() uses vacpage->offsets while the code in repair_frag() looks for MOVED_OFF bits in tuple headers. AFAICS the tuples with the MOVED_OFF bit set are exactly those referenced by vacpage->offsets. The attached patch passes make check and make installcheck. Please apply unless I'm missing something. Servus Manfred diff -Ncr ../base/src/backend/commands/vacuum.c src/backend/commands/vacuum.c *** ../base/src/backend/commands/vacuum.c Wed Jun 2 21:46:59 2004 --- src/backend/commands/vacuum.c Thu Jun 10 18:50:26 2004 *************** *** 2288,2355 **** vacpage->offsets_free > 0) { Buffer buf; - Page page; - OffsetNumber unused[BLCKSZ / sizeof(OffsetNumber)]; - OffsetNumber offnum, - maxoff; - int uncnt; - int num_tuples = 0; buf = ReadBuffer(onerel, vacpage->blkno); LockBuffer(buf, BUFFER_LOCK_EXCLUSIVE); ! page = BufferGetPage(buf); ! maxoff = PageGetMaxOffsetNumber(page); ! for (offnum = FirstOffsetNumber; ! offnum <= maxoff; ! offnum = OffsetNumberNext(offnum)) ! { ! ItemId itemid = PageGetItemId(page, offnum); ! HeapTupleHeader htup; ! ! if (!ItemIdIsUsed(itemid)) ! continue; ! htup = (HeapTupleHeader) PageGetItem(page, itemid); ! if (htup->t_infomask & HEAP_XMIN_COMMITTED) ! continue; ! ! /* ! ** See comments in the walk-along-page loop above, why we ! ** have Asserts here instead of if (...) elog(ERROR). ! */ ! Assert(!(htup->t_infomask & HEAP_MOVED_IN)); ! Assert(htup->t_infomask & HEAP_MOVED_OFF); ! Assert(HeapTupleHeaderGetXvac(htup) == myXID); ! ! itemid->lp_flags &= ~LP_USED; ! num_tuples++; ! ! } ! Assert(vacpage->offsets_free == num_tuples); ! ! START_CRIT_SECTION(); ! ! uncnt = PageRepairFragmentation(page, unused); ! ! /* XLOG stuff */ ! if (!onerel->rd_istemp) ! { ! XLogRecPtr recptr; ! ! recptr = log_heap_clean(onerel, buf, unused, uncnt); ! PageSetLSN(page, recptr); ! PageSetSUI(page, ThisStartUpID); ! } ! else ! { ! /* ! * No XLOG record, but still need to flag that XID exists ! * on disk ! */ ! MyXactMadeTempRelUpdate = true; ! } ! ! END_CRIT_SECTION(); ! LockBuffer(buf, BUFFER_LOCK_UNLOCK); WriteBuffer(buf); } --- 2288,2297 ---- vacpage->offsets_free > 0) { Buffer buf; buf = ReadBuffer(onerel, vacpage->blkno); LockBuffer(buf, BUFFER_LOCK_EXCLUSIVE); ! vacuum_page(onerel, buf, vacpage); LockBuffer(buf, BUFFER_LOCK_UNLOCK); WriteBuffer(buf); }
Manfred Koizar <mkoi-pg@aon.at> writes: > This code is very similar to vacuum_page(). The major difference is > that vacuum_page() uses vacpage->offsets while the code in repair_frag() > looks for MOVED_OFF bits in tuple headers. AFAICS the tuples with the > MOVED_OFF bit set are exactly those referenced by vacpage->offsets. This does not make me comfortable. You *think* that two different bits of code are doing the same thing, so you want to hack up vacuum.c? This module is delicate code --- we've had tons of bugs there in the past --- and no I have zero confidence that passing the regression tests proves anything, because all those prior bugs passed the regression tests. regards, tom lane
On Thu, 10 Jun 2004 17:19:22 -0400, Tom Lane <tgl@sss.pgh.pa.us> wrote: >This does not make me comfortable. I understand you, honestly. Do I read between your lines that you didn't review my previous vacuum.c refactoring patch? Please do. It'd make *me* more comfortable. > You *think* that two different bits of code are doing the same thing, I see three significant differences between the code in repair_frag() and vacuum_page(). 1) vacuum_page() hasAssert(vacpage->offsets_used == 0); vacpage is the last VacPage that has been inserted into Nvacpagelist. It is allocated in line 1566, offsets_used is immediately set to 0 and never changed. So this Assert(...) doesn't hurt. 2) In vacuum_page() the lp_flags are set inside a critical section. This is no problem because the clear-used-flags loop does not elog(ERROR, ...). Please correct me if I'm wrong. 3) vacuum_page() uses vacpage->offsets to locate the itemids that are to be updated. If we can show that these are the same itemids that belong to the tuples that are found by inspecting the tuple headers, then the two code snippets are equivalent. The first hint that this is the case isAssert(vacpage->offsets_free == num_tuples); So both spots expect to update the same number of itemids. What about the contents of the offsets[] array? Offset numbers are entered into this array by statements like vacpage->offsets[vacpage->offsets_free++] = offnum; in or immediately after the walk-along-page loop. These assignments are preceded either by code that sets the appropriate infomask bits or by assertions that the bits are already set appropriately. The rest (from PageRepairFragmentation to END_CRIT_SECTION) is identical. > so you want to hack up vacuum.c? This >module is delicate code --- we've had tons of bugs there in the past But why is it so delicate? Not only because it handles difficult problems, but also because it is written in a not very maintenance-friendly way. Before I started refactoring the code repair_frag() had more than 1100 lines and (almost) all variables used anywhere in the function were declared at function level. We cannot declare a code freeze for a module just because it is so badly written that every change is likely to break it. Someday someone will *have* to change it. Better we break it today in an effort to make the code clearer. >--- and no I have zero confidence that passing the regression tests >proves anything Unfortunately you are right :-( ServusManfred
On Fri, Jun 11, 2004 at 02:00:07AM +0200, Manfred Koizar wrote: If I may ... > > so you want to hack up vacuum.c? This > >module is delicate code --- we've had tons of bugs there in the past > > But why is it so delicate? Not only because it handles difficult > problems, but also because it is written in a not very > maintenance-friendly way. Before I started refactoring the code > repair_frag() had more than 1100 lines and (almost) all variables used > anywhere in the function were declared at function level. I agree. This code is horrid. I also agree with Tom in that this should be done with extreme caution, but it is a needed task. Maybe we could establish heavier testing for this kind of change so potential patches can be tested extensively. Concurrent vacuums with all kinds of imaginable operations (insert, updates, deletes), in tight loops, could be a start. -- Alvaro Herrera (<alvherre[a]dcc.uchile.cl>) "No es bueno caminar con un hombre muerto"
Manfred Koizar <mkoi-pg@aon.at> writes: > I understand you, honestly. Do I read between your lines that you > didn't review my previous vacuum.c refactoring patch? Please do. It'd > make *me* more comfortable. I did not yet, but I will get to it. I encourage everyone else to take a look too. I agree with Alvaro that fooling with this code merits extreme caution. BTW, I do not at all mean to suggest that vacuum.c contains no bugs at the moment ;-). I suspect for example that it is a bit random about whether MOVED_OFF/MOVED_IN bits get cleared immediately, or only by the next transaction that chances to visit the tuple. The next-transaction-fixup behavior has to be there in case the VACUUM transaction crashes, but that doesn't mean that VACUUM should deliberately leave work undone. > I see three significant differences between the code in repair_frag() > and vacuum_page(). Will study these comments later, but it's too late at night here... again, the more eyeballs on this the better... regards, tom lane
Alvaro Herrera <alvherre@dcc.uchile.cl> writes: > Maybe we could establish heavier testing for this kind of change so > potential patches can be tested extensively. Concurrent vacuums with > all kinds of imaginable operations (insert, updates, deletes), in tight > loops, could be a start. VACUUM FULL takes an exclusive lock, so it should not have to worry about concurrent operations on the table. What we have to think about is the initial states it can see. regards, tom lane
Where are we on this? --------------------------------------------------------------------------- Tom Lane wrote: > Manfred Koizar <mkoi-pg@aon.at> writes: > > I understand you, honestly. Do I read between your lines that you > > didn't review my previous vacuum.c refactoring patch? Please do. It'd > > make *me* more comfortable. > > I did not yet, but I will get to it. I encourage everyone else to > take a look too. I agree with Alvaro that fooling with this code > merits extreme caution. > > BTW, I do not at all mean to suggest that vacuum.c contains no bugs > at the moment ;-). I suspect for example that it is a bit random > about whether MOVED_OFF/MOVED_IN bits get cleared immediately, or > only by the next transaction that chances to visit the tuple. The > next-transaction-fixup behavior has to be there in case the VACUUM > transaction crashes, but that doesn't mean that VACUUM should > deliberately leave work undone. > > > I see three significant differences between the code in repair_frag() > > and vacuum_page(). > > Will study these comments later, but it's too late at night here... > again, the more eyeballs on this the better... > > regards, tom lane > > ---------------------------(end of broadcast)--------------------------- > TIP 3: if posting/reading through Usenet, please send an appropriate > subscribe-nomail command to majordomo@postgresql.org so that your > message can get through to the mailing list cleanly > -- Bruce Momjian | http://candle.pha.pa.us pgman@candle.pha.pa.us | (610) 359-1001+ If your life is a hard drive, | 13 Roberts Road + Christ can be your backup. | Newtown Square, Pennsylvania19073
Where are we on this, 2x. :-) --------------------------------------------------------------------------- Bruce Momjian wrote: > > Where are we on this? > > --------------------------------------------------------------------------- > > Tom Lane wrote: > > Manfred Koizar <mkoi-pg@aon.at> writes: > > > I understand you, honestly. Do I read between your lines that you > > > didn't review my previous vacuum.c refactoring patch? Please do. It'd > > > make *me* more comfortable. > > > > I did not yet, but I will get to it. I encourage everyone else to > > take a look too. I agree with Alvaro that fooling with this code > > merits extreme caution. > > > > BTW, I do not at all mean to suggest that vacuum.c contains no bugs > > at the moment ;-). I suspect for example that it is a bit random > > about whether MOVED_OFF/MOVED_IN bits get cleared immediately, or > > only by the next transaction that chances to visit the tuple. The > > next-transaction-fixup behavior has to be there in case the VACUUM > > transaction crashes, but that doesn't mean that VACUUM should > > deliberately leave work undone. > > > > > I see three significant differences between the code in repair_frag() > > > and vacuum_page(). > > > > Will study these comments later, but it's too late at night here... > > again, the more eyeballs on this the better... > > > > regards, tom lane > > > > ---------------------------(end of broadcast)--------------------------- > > TIP 3: if posting/reading through Usenet, please send an appropriate > > subscribe-nomail command to majordomo@postgresql.org so that your > > message can get through to the mailing list cleanly > > > > -- > Bruce Momjian | http://candle.pha.pa.us > pgman@candle.pha.pa.us | (610) 359-1001 > + If your life is a hard drive, | 13 Roberts Road > + Christ can be your backup. | Newtown Square, Pennsylvania 19073 > > ---------------------------(end of broadcast)--------------------------- > TIP 2: you can get off all lists at once with the unregister command > (send "unregister YourEmailAddressHere" to majordomo@postgresql.org) > -- Bruce Momjian | http://candle.pha.pa.us pgman@candle.pha.pa.us | (610) 359-1001+ If your life is a hard drive, | 13 Roberts Road + Christ can be your backup. | Newtown Square, Pennsylvania19073
[Sorry for the late reply. I'm still struggling to catch up after vacation ...] On Fri, 9 Jul 2004 21:29:52 -0400 (EDT), Bruce Momjian <pgman@candle.pha.pa.us> wrote: > >Where are we on this, 2x. :-) Here: >> Tom Lane wrote: >> > Will study these comments later, but it's too late at night here... ServusManfred
Manfred Koizar <mkoi-pg@aon.at> writes: > On Fri, 9 Jul 2004 21:29:52 -0400 (EDT), Bruce Momjian > <pgman@candle.pha.pa.us> wrote: >> >> Where are we on this, 2x. :-) > Here: > Tom Lane wrote: > Will study these comments later, but it's too late at night here... I haven't had time to review the original, already-applied refactoring, let alone the follow-up. Considering that there's been at least one bug report against the original (unused variable IIRC), I think it does need reviewing. regards, tom lane
[Sorry, Neil, for abusing your thread. Moving this discussion back to where it belongs.] On Tue, 18 Jan 2005 13:17:17 -0300, Alvaro Herrera <alvherre@dcc.uchile.cl> wrote: >Hmm. I think this is a good idea on principle, but what happens in case >a previous vacuum was interrupted? Is there a possibility that tuples >belonging to that vacuum are still marked MOVED_OFF but are not in >vacpage->offsets, for example? These bits are handled by an earlier phase of VACUUM, by HeapTupleSatisfiesVacuum() in scan_heap(). My first vacuum.c refactoring patch, rev 1.281 2004-06-08, added these comments in repair_frag(): /** VACUUM FULL has an exclusive lock on the relation. So* normally no other transaction can have pending INSERTs or* DELETEsin this relation. A tuple is either* (a) a tuple in a system catalog, inserted or deleted by* a not yet committedtransaction or* (b) dead (XMIN_INVALID or XMAX_COMMITTED) or* (c) inserted by a committed xact (XMIN_COMMITTED)or* (d) moved by the currently running VACUUM.* In case (a) we wouldn't be in repair_frag() at all.* Incase (b) we cannot be here, because scan_heap() has* already marked the item as unused, see continue above.* Case (c) iswhat normally is to be expected.* Case (d) is only possible, if a whole tuple chain has been* moved while processing thisor a higher numbered block.*/ /** There cannot be another concurrently running VACUUM. If* the tuple had been moved in by a previous VACUUM, the* visibilitycheck would have set XMIN_COMMITTED. If the* tuple had been moved in by the currently running VACUUM,* the loopwould have been terminated. We had* elog(ERROR, ...) here, but as we are testing for a* can't-happen condition, Assert()seems more appropriate.*/ /** MOVED_OFF by another VACUUM would have caused the* visibility check to set XMIN_COMMITTED or XMIN_INVALID.*/ This should answer your question, unless the comments are wrong. (BTW parts of that patch have been backed out by someone, so you won't find the second comment in current sources.) As for the problem whether the two code paths deal with the same set of tuples, read http://archives.postgresql.org/pgsql-hackers/2004-06/msg00410.php: | [...] These assignments are | preceded either by code that sets the appropriate infomask bits or by | assertions that the bits are already set appropriately. ServusManfred
On Wed, 19 Jan 2005 18:57:48 +0100, I wrote: > My first vacuum.c >refactoring patch, rev 1.281 2004-06-08, added these comments in >repair_frag(): > >/* > * VACUUM FULL has an exclusive lock on the relation. So > * normally no other transaction can have pending INSERTs or > * DELETEs in this relation. A tuple is either > * (a) a tuple in a system catalog, inserted or deleted by > * a not yet committed transaction or > * (b) dead (XMIN_INVALID or XMAX_COMMITTED) or > * (c) inserted by a committed xact (XMIN_COMMITTED) or > * (d) moved by the currently running VACUUM. > * In case (a) we wouldn't be in repair_frag() at all. > * In case (b) we cannot be here, because scan_heap() has > * already marked the item as unused, see continue above. > * Case (c) is what normally is to be expected. > * Case (d) is only possible, if a whole tuple chain has been > * moved while processing this or a higher numbered block. > */ It turns out that this comment is not quite correct. It is incomplete. Case (b) should be: known dead (XMIN_INVALID, or XMAX_COMMITTED and xmax is visible to all active transactions). And there is a fifth possibility: (e) deleted (XMAX_COMMITTED) but at least one active transaction does not see the deleting transaction. The patch seems to imply that case (e) is a subcase of (b), but effectively tuples in this state are treated more like (c). ServusManfred
Manfred Koizar wrote: > On Wed, 19 Jan 2005 18:57:48 +0100, I wrote: > > My first vacuum.c > >refactoring patch, rev 1.281 2004-06-08, added these comments in > >repair_frag(): > > > >/* > > * VACUUM FULL has an exclusive lock on the relation. So > > * normally no other transaction can have pending INSERTs or > > * DELETEs in this relation. A tuple is either > > * (a) a tuple in a system catalog, inserted or deleted by > > * a not yet committed transaction or > > * (b) dead (XMIN_INVALID or XMAX_COMMITTED) or > > * (c) inserted by a committed xact (XMIN_COMMITTED) or > > * (d) moved by the currently running VACUUM. > > * In case (a) we wouldn't be in repair_frag() at all. > > * In case (b) we cannot be here, because scan_heap() has > > * already marked the item as unused, see continue above. > > * Case (c) is what normally is to be expected. > > * Case (d) is only possible, if a whole tuple chain has been > > * moved while processing this or a higher numbered block. > > */ > > It turns out that this comment is not quite correct. It is incomplete. > Case (b) should be: known dead (XMIN_INVALID, or XMAX_COMMITTED and xmax > is visible to all active transactions). > > And there is a fifth possibility: (e) deleted (XMAX_COMMITTED) but at > least one active transaction does not see the deleting transaction. > > The patch seems to imply that case (e) is a subcase of (b), but > effectively tuples in this state are treated more like (c). OK, comment updated to: /* --- * VACUUM FULL has an exclusive lock on the relation. So * normally no other transactioncan have pending INSERTs or * DELETEs in this relation. A tuple is either: * (a) atuple in a system catalog, inserted or deleted * by a not yet committed transaction * (b) known dead (XMIN_INVALID, or XMAX_COMMITTED and xmax * is visible to all active transactions) * (c) inserted by a committed xact (XMIN_COMMITTED) * (d) moved by the currently running VACUUM. * (e) deleted (XMAX_COMMITTED) but at least one active * transaction does notsee the deleting transaction * In case (a) we wouldn't be in repair_frag() at all. * In case (b)we cannot be here, because scan_heap() has * already marked the item as unused, see continue above. Case * (c) is what normally is to be expected. Case (d) is only * possible, if a whole tuple chain has beenmoved while * processing this or a higher numbered block. * --- */ -- Bruce Momjian | http://candle.pha.pa.us pgman@candle.pha.pa.us | (610) 359-1001+ If your life is a hard drive, | 13 Roberts Road + Christ can be your backup. | Newtown Square, Pennsylvania19073