Thread: Concurrent VACUUM: first results
Well, I diked out the code in vacuum.c that creates/deletes the pg_vlock
lockfile, and tried it out. Turns out it's not quite such a no-brainer as
I'd hoped. Several problems emerged:

1. You can run concurrent "VACUUM" this way, but concurrent "VACUUM ANALYZE"
blows up. The problem seems to be that "VACUUM ANALYZE"'s first move is to
delete all available rows in pg_statistic. That generates a conflict against
other vacuums that might be inserting new rows in pg_statistic. The newly
started VACUUM will almost always hang up on a not-yet-committed
pg_statistic row, waiting to see whether that row commits so that it can
delete it. Even more interesting, the older VACUUM generally hangs up
shortly after that; I'm not perfectly clear on what *it's* waiting on, but
it's obviously a mutual deadlock situation. The two VACUUMs don't fail with
a nice "deadlock detected" message, either ... it's more like a 60-second
spinlock timeout, followed by abort() coredumps in both backends, followed
by the postmaster killing every other backend in sight. That's clearly not
acceptable behavior for production databases.

I find this really disturbing whether we allow concurrent VACUUMs or not,
because now I'm afraid that other sorts of system-table updates can show the
same ungraceful response to deadlock situations. I have a vague recollection
that Vadim said something about interlocks between multiple writers only
being done properly for user tables, not system tables ... if that's what
this is, I think it's a must-fix problem.

2. I was able to avoid the deadlock by removing the code that tries to
delete every pg_statistic tuple in sight. The remaining code deletes (and
then recreates) pg_statistic tuples for each table it processes, while it's
processing the table and holding an exclusive lock on the table. So there's
no danger of cross-VACUUM deadlocks. The trouble is that pg_statistic tuples
for deleted tables won't ever go away, since VACUUM will never consider
them. I suppose this could be fixed by modifying DROP TABLE to delete
pg_statistic tuples applying to the target table (see the sketch after this
message).

3. I tried running VACUUMs in parallel with the regress tests, and saw a lot
of messages like

	NOTICE: Rel tenk1: TID 1/31: InsertTransactionInProgress 29737 - can't shrink relation

Looking at the code, this is normal behavior for VACUUM when it sees
not-yet-committed tuples, and has nothing to do with whether there's another
VACUUM going on elsewhere. BUT: why the heck are we getting these at all,
especially on user tables? VACUUM's grabbed an exclusive lock on the target
table; shouldn't that mean that all write transactions on the target have
committed? This looks like it could be a symptom of a locking bug.

Do we want to press ahead with fixing these problems, or should I just
discard my changes uncommitted? Two of the three points look like things we
need to worry about whether VACUUM is concurrent or not, but maybe I'm
misinterpreting what I see. Comments?

			regards, tom lane
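A minimal sketch of the DROP TABLE fix suggested in point 2, against the
heap-access API of roughly that era. The helper name RemoveStatistics and
the exact call signatures are assumptions for illustration, not code from
the thread:

	/*
	 * Hypothetical helper for DROP TABLE: delete any pg_statistic rows
	 * that apply to the relation being dropped.  Name and signatures
	 * are assumptions based on the heap-access API of that era.
	 */
	static void
	RemoveStatistics(Oid relid)
	{
		Relation	pgstatistic;
		HeapScanDesc scan;
		HeapTuple	tuple;
		ScanKeyData key;

		pgstatistic = heap_openr(StatisticRelationName);
		LockRelation(pgstatistic, RowExclusiveLock);

		/* find rows whose starelid matches the table being dropped */
		ScanKeyEntryInitialize(&key, 0x0, Anum_pg_statistic_starelid,
							   F_OIDEQ, ObjectIdGetDatum(relid));

		scan = heap_beginscan(pgstatistic, false, SnapshotNow, 1, &key);
		while (HeapTupleIsValid(tuple = heap_getnext(scan, 0)))
			heap_delete(pgstatistic, &tuple->t_self, NULL);
		heap_endscan(scan);

		heap_close(pgstatistic);
	}

DROP TABLE would call this with the OID of the target table; because the
deletions happen inside DROP TABLE's own transaction, they commit or abort
with the drop itself.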
> -----Original Message-----
> From: owner-pgsql-hackers@postgreSQL.org
> [mailto:owner-pgsql-hackers@postgreSQL.org]On Behalf Of Tom Lane
> Sent: Tuesday, November 23, 1999 3:41 PM
> To: pgsql-hackers@postgreSQL.org
> Subject: [HACKERS] Concurrent VACUUM: first results
>
> Well, I diked out the code in vacuum.c that creates/deletes the pg_vlock
> lockfile, and tried it out. Turns out it's not quite such a no-brainer
> as I'd hoped. Several problems emerged:
>
> 3. I tried running VACUUMs in parallel with the regress tests, and saw
> a lot of messages like
>	NOTICE: Rel tenk1: TID 1/31: InsertTransactionInProgress 29737 -
>	can't shrink relation
> Looking at the code, this is normal behavior for VACUUM when it sees
> not-yet-committed tuples, and has nothing to do with whether there's
> another VACUUM going on elsewhere. BUT: why the heck are we getting
> these at all, especially on user tables? VACUUM's grabbed an exclusive
> lock on the target table; shouldn't that mean that all write
> transactions on the target have committed? This looks like it could
> be a symptom of a locking bug.

Doesn't DoCopy() in copy.c unlock the target relation too early by
heap_close() ?

Regards.

Hiroshi Inoue
Inoue@tpf.co.jp
> Well, I diked out the code in vacuum.c that creates/deletes the pg_vlock
> lockfile, and tried it out. Turns out it's not quite such a no-brainer
> as I'd hoped. Several problems emerged:
>
> 1. You can run concurrent "VACUUM" this way, but concurrent "VACUUM
> ANALYZE" blows up. The problem seems to be that "VACUUM ANALYZE"'s
> first move is to delete all available rows in pg_statistic. That
> generates a conflict against other vacuums that might be inserting new
> rows in pg_statistic. The newly started VACUUM will almost always hang
> up on a not-yet-committed pg_statistic row, waiting to see whether that
> row commits so that it can delete it. Even more interesting, the older
> VACUUM generally hangs up shortly after that; I'm not perfectly clear on
> what *it's* waiting on, but it's obviously a mutual deadlock situation.
> The two VACUUMs don't fail with a nice "deadlock detected" message,
> either ... it's more like a 60-second spinlock timeout, followed by
> abort() coredumps in both backends, followed by the postmaster killing
> every other backend in sight. That's clearly not acceptable behavior
> for production databases.

The following stuff in vc_vacuum() may be a cause.

	/* get list of relations */
	vrl = vc_getrels(VacRelP);

	if (analyze && VacRelP == NULL && vrl != NULL)
		vc_delhilowstats(InvalidOid, 0, NULL);

	/* vacuum each heap relation */

CommitTransactionCommand() is executed at the end of vc_getrels() and
vc_delhilowstats() is executed without StartTransactionCommand().

Regards.

Hiroshi Inoue
Inoue@tpf.co.jp
"Hiroshi Inoue" <Inoue@tpf.co.jp> writes: > CommitTransactionCommand() is executed at the end of vc_getrels() > and vc_delhilowstats() is executed without StartTransactionCommand(). Oooh, good eye! I wonder how long that bug's been there? I'm still inclined to remove that call to vc_delhilowstats, because it seems like a global delete of statistics can't help but be a problem. But if we keep it, you're dead right: it has to be inside a transaction. regards, tom lane
> "Hiroshi Inoue" <Inoue@tpf.co.jp> writes: > > CommitTransactionCommand() is executed at the end of vc_getrels() > > and vc_delhilowstats() is executed without StartTransactionCommand(). > > Oooh, good eye! I wonder how long that bug's been there? > > I'm still inclined to remove that call to vc_delhilowstats, because it > seems like a global delete of statistics can't help but be a problem. > But if we keep it, you're dead right: it has to be inside a transaction. > With the new pg_statistic cache, you can efficiently do a heap_insert or heap_replace dending on whether an entry already exists. In the old code, that was hard because you had to scan entire table looking for a match so I just did a delete, and they I knew to do an insert. -- Bruce Momjian | http://www.op.net/~candle maillist@candle.pha.pa.us | (610) 853-3000+ If your life is a hard drive, | 830 Blythe Avenue + Christ can be your backup. | Drexel Hill, Pennsylvania19026
> > "Hiroshi Inoue" <Inoue@tpf.co.jp> writes:
> > > CommitTransactionCommand() is executed at the end of vc_getrels()
> > > and vc_delhilowstats() is executed without StartTransactionCommand().
> >
> > Oooh, good eye! I wonder how long that bug's been there?
> >
> > I'm still inclined to remove that call to vc_delhilowstats, because it
> > seems like a global delete of statistics can't help but be a problem.
> > But if we keep it, you're dead right: it has to be inside a transaction.
>
> With the new pg_statistic cache, you can efficiently do a heap_insert or
> heap_replace depending on whether an entry already exists. In the old
> code, that was hard because you had to scan the entire table looking for
> a match, so I just did a delete, and then I knew to do an insert.

I have a concern about the index of pg_statistic (pg_statistic itself
also?).

vc_updstats() may be called in immediately-committed mode. vacuum calls
TransactionIdCommit() after moving tuples, in order to delete index
tuples and truncate the relation safely. That's necessary, but the state
is outside PostgreSQL's recovery mechanism: heap_insert() is immediately
committed, so if index_insert() fails, there remains a heap tuple which
doesn't have a corresponding index entry. Moreover, the duplicate index
check is useless, because the check is done after the heap tuples are
inserted (and committed).

Should vc_updstats() be moved before TransactionIdCommit() ? I'm not
sure.

Regards.

Hiroshi Inoue
Inoue@tpf.co.jp
> I have a concern about the index of pg_statistic (pg_statistic itself
> also?).
>
> vc_updstats() may be called in immediately-committed mode. vacuum calls
> TransactionIdCommit() after moving tuples, in order to delete index
> tuples and truncate the relation safely.
>
> That's necessary, but the state is outside PostgreSQL's recovery
> mechanism: heap_insert() is immediately committed, so if index_insert()
> fails, there remains a heap tuple which doesn't have a corresponding
> index entry.

Huh. heap_insert writes to disk, but it is not used unless the
transaction gets committed, right?

--
  Bruce Momjian                        |  http://www.op.net/~candle
  maillist@candle.pha.pa.us            |  (610) 853-3000
  +  If your life is a hard drive,     |  830 Blythe Avenue
  +  Christ can be your backup.        |  Drexel Hill, Pennsylvania 19026
> > I have a concern about the index of pg_statistic (pg_statistic itself
> > also?).
> >
> > vc_updstats() may be called in immediately-committed mode. vacuum calls
> > TransactionIdCommit() after moving tuples, in order to delete index
> > tuples and truncate the relation safely.
> >
> > That's necessary, but the state is outside PostgreSQL's recovery
> > mechanism: heap_insert() is immediately committed, so if index_insert()
> > fails, there remains a heap tuple which doesn't have a corresponding
> > index entry.
>
> Huh. heap_insert writes to disk, but it is not used unless the
> transaction gets committed, right?

This could occur only in vacuum.
There's a quick hack in vc_rpfheap():

	if (num_moved > 0)
	{
		/*
		 * We have to commit our tuple' movings before we'll truncate
		 * relation, but we shouldn't lose our locks. And so - quick hack:
		 * flush buffers and record status of current transaction as
		 * committed, and continue. - vadim 11/13/96
		 */
		FlushBufferPool(!TransactionFlushEnabled());
		TransactionIdCommit(myXID);
		^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
		FlushBufferPool(!TransactionFlushEnabled());
	}

vc_updstats() may be called in the already committed transaction.

Regards.

Hiroshi Inoue
Inoue@tpf.co.jp
> > Huh. heap_insert writes to disk, but it is not used unless the
> > transaction gets committed, right?
>
> This could occur only in vacuum.
> There's a quick hack in vc_rpfheap():
>
> 	if (num_moved > 0)
> 	{
> 		/*
> 		 * We have to commit our tuple' movings before we'll truncate
> 		 * relation, but we shouldn't lose our locks. And so - quick hack:
> 		 * flush buffers and record status of current transaction as
> 		 * committed, and continue. - vadim 11/13/96
> 		 */
> 		FlushBufferPool(!TransactionFlushEnabled());
> 		TransactionIdCommit(myXID);
> 		^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
> 		FlushBufferPool(!TransactionFlushEnabled());
> 	}
>
> vc_updstats() may be called in the already committed transaction.

Oh, that is tricky: they have committed the transaction and continue
working in an already-committed one. Yikes. Any idea why we have to
commit it early?

--
  Bruce Momjian                        |  http://www.op.net/~candle
  maillist@candle.pha.pa.us            |  (610) 853-3000
  +  If your life is a hard drive,     |  830 Blythe Avenue
  +  Christ can be your backup.        |  Drexel Hill, Pennsylvania 19026
Bruce Momjian wrote:
>
> > 		/*
> > 		 * We have to commit our tuple' movings before we'll truncate
  		   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
> > 		 * relation, but we shouldn't lose our locks. And so - quick hack:
  		                                                        ^^^^^^^^^^

... or moved tuples may be lost in the case of DB/OS crash etc
that may occur after truncation but before commit...

> > 		 * flush buffers and record status of current transaction as
> > 		 * committed, and continue. - vadim 11/13/96
> > 		 */
> > 		FlushBufferPool(!TransactionFlushEnabled());
> > 		TransactionIdCommit(myXID);
> > 		^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
> > 		FlushBufferPool(!TransactionFlushEnabled());
> > 	}
> >
> > vc_updstats() may be called in the already committed transaction.
>
> Oh, that is tricky: they have committed the transaction and continue
> working in an already-committed one. Yikes. Any idea why we have to
> commit it early?

Vadim
Vadim Mikheev <vadim@krs.ru> writes:
>>>> * We have to commit our tuple' movings before we'll truncate
>>>> * relation, but we shouldn't lose our locks. And so - quick hack:

> ... or moved tuples may be lost in the case of DB/OS crash etc
> that may occur after truncation but before commit...

I wonder whether there isn't a cleaner way to do this. What about
removing this early commit, and doing everything else the way that
VACUUM does it, except that the physical truncate of the relation
file happens *after* the commit at the end of vc_vacone?

While I'm asking silly questions: why does VACUUM relabel tuples
with its own xact ID anyway? I suppose that's intended to improve
robustness in case of a crash --- but if there's a crash partway
through VACUUM, it seems like data corruption is inevitable. How
can you pack tuples into the minimum number of pages without creating
duplicate or missing tuples, if you are unlucky enough to crash before
deleting the tuples from their original pages?

			regards, tom lane
> -----Original Message-----
> From: Tom Lane [mailto:tgl@sss.pgh.pa.us]
> Sent: Friday, November 26, 1999 2:56 PM
> To: Vadim Mikheev
> Cc: Bruce Momjian; Hiroshi Inoue; pgsql-hackers@postgreSQL.org
> Subject: Re: [HACKERS] Concurrent VACUUM: first results
>
> Vadim Mikheev <vadim@krs.ru> writes:
> >>>> * We have to commit our tuple' movings before we'll truncate
> >>>> * relation, but we shouldn't lose our locks. And so - quick hack:
>
> > ... or moved tuples may be lost in the case of DB/OS crash etc
> > that may occur after truncation but before commit...
>
> I wonder whether there isn't a cleaner way to do this. What about
> removing this early commit, and doing everything else the way that
> VACUUM does it, except that the physical truncate of the relation
> file happens *after* the commit at the end of vc_vacone?

I think there exists another reason. We couldn't delete index tuples
for deleted but not yet committed heap tuples.

If we could start another transaction without releasing the exclusive
lock on the target relation, it would be better.

Regards.

Hiroshi Inoue
Inoue@tpf.co.jp
"Hiroshi Inoue" <Inoue@tpf.co.jp> writes: >> I wonder whether there isn't a cleaner way to do this. > I think there exists another reason. > We couldn't delete index tuples for deleted but not yet committed > heap tuples. My first thought was "Good point". But my second was "why should vacuum need to deal with that case?". If vacuum grabs an exclusive lock on a relation, it should *not* ever see tuples with uncertain commit status, no? > If we could start another transaction without releasing exclusive > lock for the target relation,it would be better. Seems like that might be doable, if we really do need it. regards, tom lane
Hiroshi Inoue wrote:
>
> > Vadim Mikheev <vadim@krs.ru> writes:
> > >>>> * We have to commit our tuple' movings before we'll truncate
> > >>>> * relation, but we shouldn't lose our locks. And so - quick hack:
> >
> > > ... or moved tuples may be lost in the case of DB/OS crash etc
> > > that may occur after truncation but before commit...
> >
> > I wonder whether there isn't a cleaner way to do this. What about
> > removing this early commit, and doing everything else the way that
> > VACUUM does it, except that the physical truncate of the relation
> > file happens *after* the commit at the end of vc_vacone?
>
> I think there exists another reason. We couldn't delete index tuples
> for deleted but not yet committed heap tuples.

You're right!
I just don't remember all the reasons why I did it as it's done -:))

> If we could start another transaction without releasing the exclusive
> lock on the target relation, it would be better.

So. What's the problem?! Start it! Commit the "moving" xid, get a new
xid and go!

Vadim
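Spelled out, the flow Vadim suggests would look something like the sketch
below, built on the vc_rpfheap() fragment quoted earlier; using
GetNewTransactionId() this way is an assumption for illustration, and the
next message points out a complication with xidHash:

	/* commit the tuple movings durably, as the quick hack already does */
	FlushBufferPool(!TransactionFlushEnabled());
	TransactionIdCommit(myXID);
	FlushBufferPool(!TransactionFlushEnabled());

	/*
	 * ... then continue under a fresh xid, still holding the exclusive
	 * lock on the target relation (assumed possible here).
	 */
	GetNewTransactionId(&myXID);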
Tom Lane wrote:
>
> "Hiroshi Inoue" <Inoue@tpf.co.jp> writes:
> >> I wonder whether there isn't a cleaner way to do this.
>
> > I think there exists another reason. We couldn't delete index tuples
> > for deleted but not yet committed heap tuples.
>
> My first thought was "Good point". But my second was "why should
> vacuum need to deal with that case?". If vacuum grabs an exclusive
> lock on a relation, it should *not* ever see tuples with uncertain
> commit status, no?

What if vacuum crashes after deleting index tuples pointing to heap
tuples in old places, but before commit? The index would be broken.

Vadim
Tom Lane wrote:
>
> While I'm asking silly questions: why does VACUUM relabel tuples
> with its own xact ID anyway? I suppose that's intended to improve
> robustness in case of a crash --- but if there's a crash partway
> through VACUUM, it seems like data corruption is inevitable. How
> can you pack tuples into the minimum number of pages without creating
> duplicate or missing tuples, if you are unlucky enough to crash before
> deleting the tuples from their original pages?

VACUUM:

1. has to preserve t_xmin/t_xmax in moved tuples (or MVCC will be
   broken), and so stores its xid in t_cmin.
2. turns HEAP_XMIN_COMMITTED off in both tuple versions (in old and
   new places).
3. sets HEAP_MOVED_IN in tuples in new places and HEAP_MOVED_OFF in
   tuples in old places.

Seeing HEAP_MOVED_IN/HEAP_MOVED_OFF (this is tested only for tuples with
HEAP_XMIN_COMMITTED off, so as not to test it in all cases), the tqual.c
funcs will check whether tuple->t_cmin committed or not, i.e. whether
VACUUM succeeded in moving the tuple or not. And so, the single vacuum
xid commit ensures that there will be neither duplicates nor lost tuples
(see the sketch after this message).

Sorry, I should have described this half a year ago, but...

Vadim
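For illustration, the visibility test Vadim describes might look like this
inside one of the tqual.c routines. This is a reconstruction from his
description, not a quote of the actual code:

	/*
	 * Sketch of the tqual.c check (reconstructed, not quoted).  Only
	 * reached when HEAP_XMIN_COMMITTED is off; t_cmin holds vacuum's xid.
	 */
	if (tuple->t_infomask & HEAP_MOVED_OFF)
	{
		if (TransactionIdDidCommit((TransactionId) tuple->t_cmin))
			return false;		/* move committed: old copy is dead */
		/* vacuum never committed: the old copy is still the live tuple */
		tuple->t_infomask |= HEAP_XMIN_COMMITTED;
	}
	else if (tuple->t_infomask & HEAP_MOVED_IN)
	{
		if (!TransactionIdDidCommit((TransactionId) tuple->t_cmin))
			return false;		/* move never committed: ignore new copy */
		/* move committed: the new copy inherits its original visibility */
		tuple->t_infomask |= HEAP_XMIN_COMMITTED;
	}

The single TransactionIdCommit() of vacuum's xid is thus the atomic switch
that decides, for every moved tuple at once, which of the two copies counts.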
> Hiroshi Inoue wrote:
> >
> > > Vadim Mikheev <vadim@krs.ru> writes:
> > > >>>> * We have to commit our tuple' movings before we'll truncate
> > > >>>> * relation, but we shouldn't lose our locks. And so - quick hack:
> > >
> > > > ... or moved tuples may be lost in the case of DB/OS crash etc
> > > > that may occur after truncation but before commit...
> > >
> > > I wonder whether there isn't a cleaner way to do this. What about
> > > removing this early commit, and doing everything else the way that
> > > VACUUM does it, except that the physical truncate of the relation
> > > file happens *after* the commit at the end of vc_vacone?
> >
> > I think there exists another reason. We couldn't delete index tuples
> > for deleted but not yet committed heap tuples.
>
> You're right!
> I just don't remember all the reasons why I did it as it's done -:))
>
> > If we could start another transaction without releasing the exclusive
> > lock on the target relation, it would be better.
>
> So. What's the problem?! Start it! Commit the "moving" xid, get a new
> xid and go!

After a thought, I remembered that members of xidHash hold xids. Must I
replace them with the new xid? That seems no cleaner than the current
way.

There's another way: we could commit and start a transaction after the
truncation and before vc_updstats(). One of the problems is that cache
invalidation is issued in vc_updstats(). It is preferable that cache
invalidation is called immediately after/before truncation, isn't it?
Any other problems?

BTW, how would WAL handle it if vacuum fails after TransactionIdCommit() ?

Regards.

Hiroshi Inoue
Inoue@tpf.co.jp
I have committed the code change to remove pg_vlock locking from VACUUM.
It turns out the problems I was seeing initially were all due to minor
bugs in the lock manager and vacuum itself.

> 1. You can run concurrent "VACUUM" this way, but concurrent "VACUUM
> ANALYZE" blows up. The problem seems to be that "VACUUM ANALYZE"'s
> first move is to delete all available rows in pg_statistic.

The real problem was that VACUUM ANALYZE tried to delete those rows
*while it was outside of any transaction*. If there was a concurrent
VACUUM inserting tuples into pg_statistic, the new VACUUM would end up
calling XactLockTableWait() with an invalid XID, which caused a failure
inside lock.c --- and the failure path neglected to release the spinlock
on the lock table. This was compounded by lmgr.c not bothering to check
the return code from LockAcquire(). So the lock apparently succeeded,
and then all the backends would die with "stuck spinlock" next time they
tried to do any lockmanager operations.

I have fixed the simpler aspects of the problem by adding missing
SpinRelease() calls to lock.c, making lmgr.c test for failure, and
altering VACUUM to not do the bogus row deletion. But I suspect that
there is more to this that I don't understand. Why does calling
XactLockTableWait() with an already-committed XID cause the following
code in lock.c to trigger? Is this evidence of a logic bug in lock.c,
or at least of inadequate checks for bogus input?

	/*
	 * Check the xid entry status, in case something in the ipc
	 * communication doesn't work correctly.
	 */
	if (!((result->nHolding > 0) && (result->holders[lockmode] > 0)))
	{
		XID_PRINT_AUX("LockAcquire: INCONSISTENT ", result);
		LOCK_PRINT_AUX("LockAcquire: INCONSISTENT ", lock, lockmode);
		/* Should we retry ? */
		SpinRelease(masterLock);	<<<<<<<<<<<< just added by me
		return FALSE;
	}

> 3. I tried running VACUUMs in parallel with the regress tests, and saw
> a lot of messages like
>	NOTICE: Rel tenk1: TID 1/31: InsertTransactionInProgress 29737 - can't shrink relation

Hiroshi pointed out that this was probably due to copy.c releasing the
lock prematurely on the table that is the destination of a COPY. With
that fixed, I get many fewer of these messages, and they're all for
system relations --- which is to be expected. Since we don't hold locks
for system relations until xact commit, it's possible for VACUUM to see
uncommitted tuples when it is vacuuming a system relation. So I think an
occasional message like the above is OK as long as it mentions a system
relation.

I have been running multiple concurrent vacuums in parallel with the
regress tests, and things seem to mostly work. Quite a few regress tests
erratically "fail" under this load because they emit results in
different orders than the expected output shows --- not too surprising
if a VACUUM has come by and reordered a table. I am still seeing
occasional glitches; for example, one vacuum failed with

	NOTICE: FlushRelationBuffers(onek, 24): block 40 is referenced (private 0, global 1)
	FATAL 1: VACUUM (vc_rpfheap): FlushRelationBuffers returned -2
	pqReadData() -- backend closed the channel unexpectedly.

I believe that these errors are not specifically caused by concurrent
vacuums, but are symptoms of locking-related bugs that we still have to
flush out and fix (cf. discussions on pg_hackers around 9/19/99). So
I've gone ahead and committed the change to allow concurrent vacuums.

			regards, tom lane
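The lmgr.c hardening Tom mentions would amount to something like the
following; the function body is a sketch of that era's LockRelation(), and
the exact committed diff may differ:

	/*
	 * Sketch: stop ignoring LockAcquire()'s return code.  Tag layout
	 * and field names are assumptions about the 6.5-era lock manager.
	 */
	void
	LockRelation(Relation relation, LOCKMODE lockmode)
	{
		LOCKTAG		tag;

		MemSet(&tag, 0, sizeof(tag));
		tag.relId = relation->rd_lockInfo.lockRelId.relId;
		tag.dbId = relation->rd_lockInfo.lockRelId.dbId;
		tag.objId.blkno = InvalidBlockNumber;

		/* previously the result was discarded, masking lock.c failures */
		if (!LockAcquire(LockTableId, &tag, lockmode))
			elog(ERROR, "LockRelation: LockAcquire failed");
	}

With this check in place, a lock.c failure surfaces as an ordinary elog
abort in one backend instead of a "stuck spinlock" that takes down every
backend later.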
> I have committed the code change to remove pg_vlock locking from VACUUM.
> It turns out the problems I was seeing initially were all due to minor
> bugs in the lock manager and vacuum itself.
>
> > 1. You can run concurrent "VACUUM" this way, but concurrent "VACUUM
> > ANALYZE" blows up. The problem seems to be that "VACUUM ANALYZE"'s
> > first move is to delete all available rows in pg_statistic.
>
> The real problem was that VACUUM ANALYZE tried to delete those rows
> *while it was outside of any transaction*. If there was a concurrent
> VACUUM inserting tuples into pg_statistic, the new VACUUM would end up
> calling XactLockTableWait() with an invalid XID, which caused a failure

Hmm, what I could see here was always LockRelation(.., RowExclusiveLock),
but the cause may be the same. We couldn't get xids of not-running
*transaction*s because their proc->xid is set to 0 (InvalidTransactionId).
So the blocking transaction couldn't find an xidLookupEnt in xidTable
corresponding to the not-running *transaction* when it tried
LockResolveConflicts() in LockReleaseAll(), and couldn't GrantLock() to
the XidLookupEnt corresponding to the not-running *transaction*. After
all, LockAcquire() from a not-running *transaction* always fails once it
is blocked.

> I have fixed the simpler aspects of the problem by adding missing
> SpinRelease() calls to lock.c, making lmgr.c test for failure, and
> altering VACUUM to not do the bogus row deletion. But I suspect that
> there is more to this that I don't understand. Why does calling
> XactLockTableWait() with an already-committed XID cause the following

It seems strange. Isn't it waiting for a tuple being deleted by
vc_updstats() in vc_vacone() ?

> code in lock.c to trigger? Is this evidence of a logic bug in lock.c,
> or at least of inadequate checks for bogus input?
>
> 	/*
> 	 * Check the xid entry status, in case something in the ipc
> 	 * communication doesn't work correctly.
> 	 */
> 	if (!((result->nHolding > 0) && (result->holders[lockmode] > 0)))
> 	{
> 		XID_PRINT_AUX("LockAcquire: INCONSISTENT ", result);
> 		LOCK_PRINT_AUX("LockAcquire: INCONSISTENT ", lock, lockmode);
> 		/* Should we retry ? */
> 		SpinRelease(masterLock);	<<<<<<<<<<<< just added by me
> 		return FALSE;
> 	}

This is the third time I have come here, and it was always caused by
other bugs.

Regards,

Hiroshi Inoue
Inoue@tpf.co.jp
"Hiroshi Inoue" <Inoue@tpf.co.jp> writes: > We couldn't get xids of not running *transaction*s because its proc->xid > is set to 0(InvalidTransactionId). So blocking transaction couldn' find an > xidLookupEnt in xidTable corresponding to the not running *transaction* > when it tries to LockResolveConflicts() in LockReleaseAll() and couldn't > GrantLock() to XidLookupEnt corresponding to the not running *transac > tion*. After all LockAcquire() from not running *transaction* always fails > once it is blocked. OK, I can believe that ... I assumed that proc->xid still had the ID of the last transaction, but if it's set to 0 during xact cleanup then this behavior makes sense. Still, it seems like lock.c should detect the missing table entry and fail sooner than it does ... >> I suspect that >> there is more to this that I don't understand. Why does calling >> XactLockTableWait() with an already-committed XID cause the following >> code in lock.c to trigger? Is this evidence of a logic bug in lock.c, >> or at least of inadequate checks for bogus input? > It's seems strange. Isn't it waiting for a being deleted tuple by vc_upd > stats() in vc_vacone() ? No --- the process that reaches the "INCONSISTENT" exit is the one that is trying to do the deletion of pg_statistic rows (during VACUUM startup). Presumably, it's found a row that is stored but not yet committed by another VACUUM, and is trying to wait for that transaction to commit. regards, tom lane