Thread: Concurrent VACUUM: first results

Concurrent VACUUM: first results

From
Tom Lane
Date:
Well, I diked out the code in vacuum.c that creates/deletes the pg_vlock
lockfile, and tried it out.  Turns out it's not quite such a no-brainer
as I'd hoped.  Several problems emerged:

1. You can run concurrent "VACUUM" this way, but concurrent "VACUUM
ANALYZE" blows up.  The problem seems to be that "VACUUM ANALYZE"'s
first move is to delete all available rows in pg_statistic.  That
generates a conflict against other vacuums that might be inserting new
rows in pg_statistic.  The newly started VACUUM will almost always hang
up on a not-yet-committed pg_statistic row, waiting to see whether that
row commits so that it can delete it.  Even more interesting, the older
VACUUM generally hangs up shortly after that; I'm not perfectly clear on
what *it's* waiting on, but it's obviously a mutual deadlock situation.
The two VACUUMs don't fail with a nice "deadlock detected" message,
either ... it's more like a 60-second spinlock timeout, followed by
abort() coredumps in both backends, followed by the postmaster killing
every other backend in sight.  That's clearly not acceptable behavior
for production databases.

I find this really disturbing whether we allow concurrent VACUUMs or
not, because now I'm afraid that other sorts of system-table updates
can show the same ungraceful response to deadlock situations.  I have
a vague recollection that Vadim said something about interlocks between
multiple writers only being done properly for user tables not system
tables ... if that's what this is, I think it's a must-fix problem.

2. I was able to avoid the deadlock by removing the code that tries to
delete every pg_statistic tuple in sight.  The remaining code deletes
(and then recreates) pg_statistic tuples for each table it processes,
while it's processing the table and holding an exclusive lock on the
table.  So, there's no danger of cross-VACUUM deadlocks.  The trouble is
that pg_statistic tuples for deleted tables won't ever go away, since
VACUUM will never consider them.  I suppose this could be fixed by
modifying DROP TABLE to delete pg_statistic tuples applying to the
target table.

3. I tried running VACUUMs in parallel with the regress tests, and saw
a lot of messages like
NOTICE:  Rel tenk1: TID 1/31: InsertTransactionInProgress 29737 - can't shrink relation
Looking at the code, this is normal behavior for VACUUM when it sees
not-yet-committed tuples, and has nothing to do with whether there's
another VACUUM going on elsewhere.  BUT: why the heck are we getting
these at all, especially on user tables?  VACUUM's grabbed an exclusive
lock on the target table; shouldn't that mean that all write
transactions on the target have committed?  This looks like it could
be a symptom of a locking bug.

Do we want to press ahead with fixing these problems, or should I just
discard my changes uncommitted?  Two of the three points look like
things we need to worry about whether VACUUM is concurrent or not,
but maybe I'm misinterpreting what I see.  Comments?
        regards, tom lane


RE: [HACKERS] Concurrent VACUUM: first results

From
"Hiroshi Inoue"
Date:
> -----Original Message-----
> From: owner-pgsql-hackers@postgreSQL.org
> [mailto:owner-pgsql-hackers@postgreSQL.org]On Behalf Of Tom Lane
> Sent: Tuesday, November 23, 1999 3:41 PM
> To: pgsql-hackers@postgreSQL.org
> Subject: [HACKERS] Concurrent VACUUM: first results
> 
> 
> Well, I diked out the code in vacuum.c that creates/deletes the pg_vlock
> lockfile, and tried it out.  Turns out it's not quite such a no-brainer
> as I'd hoped.  Several problems emerged:
> 
> 3. I tried running VACUUMs in parallel with the regress tests, and saw
> a lot of messages like
> NOTICE:  Rel tenk1: TID 1/31: InsertTransactionInProgress 29737 - 
> can't shrink relation
> Looking at the code, this is normal behavior for VACUUM when it sees
> not-yet-committed tuples, and has nothing to do with whether there's
> another VACUUM going on elsewhere.  BUT: why the heck are we getting
> these at all, especially on user tables?  VACUUM's grabbed an exclusive
> lock on the target table; shouldn't that mean that all write
> transactions on the target have committed?  This looks like it could
> be a symptom of a locking bug.
>

Doesn't DoCopy() in copy.c unlock the target relation too early
by heap_close() ?
Regards.

Hiroshi Inoue
Inoue@tpf.co.jp


RE: [HACKERS] Concurrent VACUUM: first results

From
"Hiroshi Inoue"
Date:
> 
> Well, I diked out the code in vacuum.c that creates/deletes the pg_vlock
> lockfile, and tried it out.  Turns out it's not quite such a no-brainer
> as I'd hoped.  Several problems emerged:
> 
> 1. You can run concurrent "VACUUM" this way, but concurrent "VACUUM
> ANALYZE" blows up.  The problem seems to be that "VACUUM ANALYZE"'s
> first move is to delete all available rows in pg_statistic.  That
> generates a conflict against other vacuums that might be inserting new
> rows in pg_statistic.  The newly started VACUUM will almost always hang
> up on a not-yet-committed pg_statistic row, waiting to see whether that
> row commits so that it can delete it.  Even more interesting, the older
> VACUUM generally hangs up shortly after that; I'm not perfectly clear on
> what *it's* waiting on, but it's obviously a mutual deadlock situation.
> The two VACUUMs don't fail with a nice "deadlock detected" message,
> either ... it's more like a 60-second spinlock timeout, followed by
> abort() coredumps in both backends, followed by the postmaster killing
> every other backend in sight.  That's clearly not acceptable behavior
> for production databases.
>

The following stuff in vc_vacuum() may be a cause.
       /* get list of relations */
       vrl = vc_getrels(VacRelP);

       if (analyze && VacRelP == NULL && vrl != NULL)
               vc_delhilowstats(InvalidOid, 0, NULL);

       /* vacuum each heap relation */


CommitTransactionCommand() is executed at the end of vc_getrels()
and vc_delhilowstats() is executed without StartTransactionCommand().

Regards.

Hiroshi Inoue
Inoue@tpf.co.jp



Re: [HACKERS] Concurrent VACUUM: first results

From
Tom Lane
Date:
"Hiroshi Inoue" <Inoue@tpf.co.jp> writes:
> CommitTransactionCommand() is executed at the end of vc_getrels()
> and vc_delhilowstats() is executed without StartTransactionCommand().

Oooh, good eye!  I wonder how long that bug's been there?

I'm still inclined to remove that call to vc_delhilowstats, because it
seems like a global delete of statistics can't help but be a problem.
But if we keep it, you're dead right: it has to be inside a transaction.
        regards, tom lane


Re: [HACKERS] Concurrent VACUUM: first results

From
Bruce Momjian
Date:
> "Hiroshi Inoue" <Inoue@tpf.co.jp> writes:
> > CommitTransactionCommand() is executed at the end of vc_getrels()
> > and vc_delhilowstats() is executed without StartTransactionCommand().
> 
> Oooh, good eye!  I wonder how long that bug's been there?
> 
> I'm still inclined to remove that call to vc_delhilowstats, because it
> seems like a global delete of statistics can't help but be a problem.
> But if we keep it, you're dead right: it has to be inside a transaction.
> 

With the new pg_statistic cache, you can efficiently do a heap_insert or
heap_replace depending on whether an entry already exists.  In the old
code, that was hard because you had to scan the entire table looking for
a match, so I just did a delete, and then I knew to do an insert.

--
  Bruce Momjian                        |  http://www.op.net/~candle
  maillist@candle.pha.pa.us            |  (610) 853-3000
  +  If your life is a hard drive,     |  830 Blythe Avenue
  +  Christ can be your backup.        |  Drexel Hill, Pennsylvania 19026


RE: [HACKERS] Concurrent VACUUM: first results

From
"Hiroshi Inoue"
Date:
>
> > "Hiroshi Inoue" <Inoue@tpf.co.jp> writes:
> > > CommitTransactionCommand() is executed at the end of vc_getrels()
> > > and vc_delhilowstats() is executed without StartTransactionCommand().
> >
> > Oooh, good eye!  I wonder how long that bug's been there?
> >
> > I'm still inclined to remove that call to vc_delhilowstats, because it
> > seems like a global delete of statistics can't help but be a problem.
> > But if we keep it, you're dead right: it has to be inside a transaction.
> >
>
> With the new pg_statistic cache, you can efficiently do a heap_insert or
> heap_replace dending on whether an entry already exists.  In the old
> code, that was hard because you had to scan entire table looking for a
> match so I just did a delete, and they I knew to do an insert.
>

I have an anxiety about the index of pg_statistic (and maybe
pg_statistic itself?).

vc_updstats() may be called in immediately committed mode.
vacuum calls TransactionIdCommit() after moving tuples in order
to delete index tuples and truncate the relation safely.

It's necessary but the state is out of PostgreSQL's recovery
mechanism.
heap_insert() is immediately committed. If index_insert() fails
there remains a heap tuple which doesn't have a corresponding
index entry.

Moreover, the duplicate index check is useless: the check is done
after heap tuples are inserted (and committed).

Should vc_updstats() be moved before TransactionIdCommit() ?
I'm not sure.

Regards.

Hiroshi Inoue
Inoue@tpf.co.jp



Re: [HACKERS] Concurrent VACUUM: first results

From
Bruce Momjian
Date:
> I have a anxiety about the index of pg_statistic(pg_statistic itself also
> ?).
> 
> vc_updstats() may be called in immediately committed mode.
> vacuum calls TransactionIdCommit() after moving tuples in order
> to delete index tuples and truncate the relation safely.
> 
> It's necessary but the state is out of PostgreSQL's recovery
> mechanism.
> heap_insert() is imediately committed. If index_insert() fails
> there remains a heap tuple which doesn't have a corresponding
> index entry.


Huh.  Heap_insert writes to disk, but it is not used unless the
transaction gets committed, right?




RE: [HACKERS] Concurrent VACUUM: first results

From
"Hiroshi Inoue"
Date:
>
> > I have a anxiety about the index of pg_statistic(pg_statistic
> itself also
> > ?).
> >
> > vc_updstats() may be called in immediately committed mode.
> > vacuum calls TransactionIdCommit() after moving tuples in order
> > to delete index tuples and truncate the relation safely.
> >
> > It's necessary but the state is out of PostgreSQL's recovery
> > mechanism.
> > heap_insert() is imediately committed. If index_insert() fails
> > there remains a heap tuple which doesn't have a corresponding
> > index entry.
>
>
> Huh.  Heap_insert writes to disk, but there it is not used unless the
> transaction gets committed, right?
>

This could occur only in vacuum.
There's a quick hack in vc_rpfheap().
       if (num_moved > 0)
       {
               /*
                * We have to commit our tuple' movings before we'll truncate
                * relation, but we shouldn't lose our locks. And so - quick hack:
                * flush buffers and record status of current transaction as
                * committed, and continue. - vadim 11/13/96
                */
               FlushBufferPool(!TransactionFlushEnabled());
               TransactionIdCommit(myXID);
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
               FlushBufferPool(!TransactionFlushEnabled());
       }

vc_updstats() may be called in the already committed transaction.

Regards.

Hiroshi Inoue
Inoue@tpf.co.jp



Re: [HACKERS] Concurrent VACUUM: first results

From
Bruce Momjian
Date:
> > Huh.  Heap_insert writes to disk, but there it is not used unless the
> > transaction gets committed, right?
> >
> 
> This could occur only in vacuum.
> There's a quick hack in vc_rpfheap().
> 
>         if (num_moved > 0)
>         {
> 
>                 /*
>                  * We have to commit our tuple' movings before we'll
> truncate
>                  * relation, but we shouldn't lose our locks. And so - quick
> hac
> k:
>                  * flush buffers and record status of current transaction as
>                  * committed, and continue. - vadim 11/13/96
>                  */
>                 FlushBufferPool(!TransactionFlushEnabled());
>                 TransactionIdCommit(myXID);
>         ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
>                 FlushBufferPool(!TransactionFlushEnabled());
>         }
> 
> vc_updstats() may be called in the already committed transaction.

Oh, that is tricky: they have committed the transaction and continue
working in an already-committed one.  Yikes.  Any idea why we have to
commit it early?



Re: [HACKERS] Concurrent VACUUM: first results

From
Vadim Mikheev
Date:
Bruce Momjian wrote:
> 
> > /*
> >  * We have to commit our tuple' movings before we'll truncate
       ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
> >  * relation, but we shouldn't lose our locks. And so - quick hack:
                                                           ^^^^^^^^^^

... or moved tuples may be lost in the case of DB/OS crash etc
    that may occur after truncation but before commit...

> >  * flush buffers and record status of current transaction as
> >  * committed, and continue. - vadim 11/13/96
> >  */
> >                 FlushBufferPool(!TransactionFlushEnabled());
> >                 TransactionIdCommit(myXID);
> >           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
> >                 FlushBufferPool(!TransactionFlushEnabled());
> >         }
> >
> > vc_updstats() may be called in the already committed transaction.
> 
> Oh, that is tricky that they have committed the transaction and continue
> working in an already committed.  Yikes.  Any idea why we have to commit
> it early?

Vadim


Re: [HACKERS] Concurrent VACUUM: first results

From
Tom Lane
Date:
Vadim Mikheev <vadim@krs.ru> writes:
>>>> * We have to commit our tuple' movings before we'll truncate
>>>> * relation, but we shouldn't lose our locks. And so - quick hack:

> ... or moved tuples may be lost in the case of DB/OS crash etc
>     that may occur after truncation but before commit...

I wonder whether there isn't a cleaner way to do this.  What about
removing this early commit, and doing everything else the way that
VACUUM does it, except that the physical truncate of the relation
file happens *after* the commit at the end of vc_vacone?

While I'm asking silly questions: why does VACUUM relabel tuples
with its own xact ID anyway?  I suppose that's intended to improve
robustness in case of a crash --- but if there's a crash partway
through VACUUM, it seems like data corruption is inevitable.  How
can you pack tuples into the minimum number of pages without creating
duplicate or missing tuples, if you are unlucky enough to crash before
deleting the tuples from their original pages?
        regards, tom lane


RE: [HACKERS] Concurrent VACUUM: first results

From
"Hiroshi Inoue"
Date:
> -----Original Message-----
> From: Tom Lane [mailto:tgl@sss.pgh.pa.us]
> Sent: Friday, November 26, 1999 2:56 PM
> To: Vadim Mikheev
> Cc: Bruce Momjian; Hiroshi Inoue; pgsql-hackers@postgreSQL.org
> Subject: Re: [HACKERS] Concurrent VACUUM: first results 
> 
> 
> Vadim Mikheev <vadim@krs.ru> writes:
> >>>> * We have to commit our tuple' movings before we'll truncate
> >>>> * relation, but we shouldn't lose our locks. And so - quick hack:
> 
> > ... or moved tuples may be lost in the case of DB/OS crash etc
> >     that may occur after truncation but before commit...
> 
> I wonder whether there isn't a cleaner way to do this.  What about
> removing this early commit, and doing everything else the way that
> VACUUM does it, except that the physical truncate of the relation
> file happens *after* the commit at the end of vc_vacone?
>

I think there exists another reason.
We couldn't delete index tuples for deleted but not yet committed
heap tuples.

If we could start another transaction without releasing the exclusive
lock on the target relation, it would be better.

Regards.
Hiroshi Inoue
Inoue@tpf.co.jp


Re: [HACKERS] Concurrent VACUUM: first results

From
Tom Lane
Date:
"Hiroshi Inoue" <Inoue@tpf.co.jp> writes:
>> I wonder whether there isn't a cleaner way to do this.

> I think there exists another reason.
> We couldn't delete index tuples for deleted but not yet committed
> heap tuples.

My first thought was "Good point".  But my second was "why should
vacuum need to deal with that case?".  If vacuum grabs an exclusive
lock on a relation, it should *not* ever see tuples with uncertain
commit status, no?

> If we could start another transaction without releasing exclusive
> lock for the target relation,it would be better.

Seems like that might be doable, if we really do need it.
        regards, tom lane


Re: [HACKERS] Concurrent VACUUM: first results

From
Vadim Mikheev
Date:
Hiroshi Inoue wrote:
> 
> > Vadim Mikheev <vadim@krs.ru> writes:
> > >>>> * We have to commit our tuple' movings before we'll truncate
> > >>>> * relation, but we shouldn't lose our locks. And so - quick hack:
> >
> > > ... or moved tuples may be lost in the case of DB/OS crash etc
> > >     that may occur after truncation but before commit...
> >
> > I wonder whether there isn't a cleaner way to do this.  What about
> > removing this early commit, and doing everything else the way that
> > VACUUM does it, except that the physical truncate of the relation
> > file happens *after* the commit at the end of vc_vacone?
> >
> 
> I think there exists another reason.
> We couldn't delete index tuples for deleted but not yet committed
> heap tuples.

You're right!
I just don't remember all the reasons why I did it the way it's done -:))

> If we could start another transaction without releasing exclusive
> lock for the target relation,it would be better.

So. What's the problem?! Start it! Commit the "moving" xid, get a new xid and go!

Vadim


Re: [HACKERS] Concurrent VACUUM: first results

From
Vadim Mikheev
Date:
Tom Lane wrote:
> 
> "Hiroshi Inoue" <Inoue@tpf.co.jp> writes:
> >> I wonder whether there isn't a cleaner way to do this.
> 
> > I think there exists another reason.
> > We couldn't delete index tuples for deleted but not yet committed
> > heap tuples.
> 
> My first thought was "Good point".  But my second was "why should
> vacuum need to deal with that case?".  If vacuum grabs an exclusive
> lock on a relation, it should *not* ever see tuples with uncertain
> commit status, no?

What if vacuum crashes after deleting index tuples pointing
to heap tuples in their old places, but before commit?  The index
will be broken.

Vadim


Re: [HACKERS] Concurrent VACUUM: first results

From
Vadim Mikheev
Date:
Tom Lane wrote:
> 
> While I'm asking silly questions: why does VACUUM relabel tuples
> with its own xact ID anyway?  I suppose that's intended to improve
> robustness in case of a crash --- but if there's a crash partway
> through VACUUM, it seems like data corruption is inevitable.  How
> can you pack tuples into the minimum number of pages without creating
> duplicate or missing tuples, if you are unlucky enough to crash before
> deleting the tuples from their original pages?

VACUUM:
1. has to preserve t_xmin/t_xmax in moved tuples
   (or MVCC will be broken) and so stores the xid in t_cmin.
2. turns HEAP_XMIN_COMMITTED off in both tuple versions
   (in old and new places).
3. sets HEAP_MOVED_IN in tuples in new places and
   HEAP_MOVED_OFF in tuples in old places.

Seeing HEAP_MOVED_IN/HEAP_MOVED_OFF (this is tested only for
tuples with HEAP_XMIN_COMMITTED off, just to avoid testing in
all cases), the tqual.c funcs will check whether tuple->t_cmin
is committed or not - ie whether VACUUM succeeded in moving or
not.  And so, the single vacuum xid commit ensures that there
will be neither duplicates nor lost tuples.

Sorry, I should have described this half a year ago, but...

Vadim


RE: [HACKERS] Concurrent VACUUM: first results

From
"Hiroshi Inoue"
Date:
>
> Hiroshi Inoue wrote:
> >
> > > Vadim Mikheev <vadim@krs.ru> writes:
> > > >>>> * We have to commit our tuple' movings before we'll truncate
> > > >>>> * relation, but we shouldn't lose our locks. And so - quick hack:
> > >
> > > > ... or moved tuples may be lost in the case of DB/OS crash etc
> > > >     that may occur after truncation but before commit...
> > >
> > > I wonder whether there isn't a cleaner way to do this.  What about
> > > removing this early commit, and doing everything else the way that
> > > VACUUM does it, except that the physical truncate of the relation
> > > file happens *after* the commit at the end of vc_vacone?
> > >
> >
> > I think there exists another reason.
> > We couldn't delete index tuples for deleted but not yet committed
> > heap tuples.
>
> You're right!
> I just don't remember all reasons why I did as it's done -:))
>
> > If we could start another transaction without releasing exclusive
> > lock for the target relation,it would be better.
>
> So. What's problem?! Start it! Commit "moving" xid, get new xid and go!
>

After a thought, I remembered that members of xidHash hold xids.
Must I replace them with the new xid?
That seems no cleaner than the current way.

There's another way.
We could commit and start a transaction after truncation and
before vc_updstats().
One of the problems is that cache invalidation is issued in
vc_updstats().
It is preferable that cache invalidation is called immediately
after/before truncation, isn't it?
Any other problems?


BTW, how (or what) would WAL do when vacuum fails after
TransactionIdCommit()?

Regards.

Hiroshi Inoue
Inoue@tpf.co.jp



Re: Concurrent VACUUM: first results

From
Tom Lane
Date:
I have committed the code change to remove pg_vlock locking from VACUUM.
It turns out the problems I was seeing initially were all due to minor
bugs in the lock manager and vacuum itself.

> 1. You can run concurrent "VACUUM" this way, but concurrent "VACUUM
> ANALYZE" blows up.  The problem seems to be that "VACUUM ANALYZE"'s
> first move is to delete all available rows in pg_statistic.

The real problem was that VACUUM ANALYZE tried to delete those rows
*while it was outside of any transaction*.  If there was a concurrent
VACUUM inserting tuples into pg_statistic, the new VACUUM would end up
calling XactLockTableWait() with an invalid XID, which caused a failure
inside lock.c --- and the failure path neglected to release the spinlock
on the lock table.  This was compounded by lmgr.c not bothering to check
the return code from LockAcquire().  So the lock apparently succeeded,
and then all the backends would die with "stuck spinlock" next time they
tried to do any lockmanager operations.

I have fixed the simpler aspects of the problem by adding missing
SpinRelease() calls to lock.c, making lmgr.c test for failure, and
altering VACUUM to not do the bogus row deletion.  But I suspect that
there is more to this that I don't understand.  Why does calling
XactLockTableWait() with an already-committed XID cause the following
code in lock.c to trigger?  Is this evidence of a logic bug in lock.c,
or at least of inadequate checks for bogus input?
       /*
        * Check the xid entry status, in case something in the ipc
        * communication doesn't work correctly.
        */
       if (!((result->nHolding > 0) && (result->holders[lockmode] > 0)))
       {
           XID_PRINT_AUX("LockAcquire: INCONSISTENT ", result);
           LOCK_PRINT_AUX("LockAcquire: INCONSISTENT ", lock, lockmode);
           /* Should we retry ? */
           SpinRelease(masterLock);   <<<<<<<<<<<< just added by me
           return FALSE;
       }

> 3. I tried running VACUUMs in parallel with the regress tests, and saw
> a lot of messages like
> NOTICE:  Rel tenk1: TID 1/31: InsertTransactionInProgress 29737 - can't shrink relation

Hiroshi pointed out that this was probably due to copy.c releasing the
lock prematurely on the table that is the destination of a COPY.  With
that fixed, I get many fewer of these messages, and they're all for
system relations --- which is to be expected.  Since we don't hold locks
for system relations until xact commit, it's possible for VACUUM to see
uncommitted tuples when it is vacuuming a system relation.  So I think
an occasional message like the above is OK as long as it mentions a
system relation.

I have been running multiple concurrent vacuums in parallel with the
regress tests, and things seem to mostly work.  Quite a few regress
tests erratically "fail" under this load because they emit results in
different orders than the expected output shows --- not too surprising
if a VACUUM has come by and reordered a table.

I am still seeing occasional glitches; for example, one vacuum failed
with

NOTICE:  FlushRelationBuffers(onek, 24): block 40 is referenced (private 0, global 1)
FATAL 1:  VACUUM (vc_rpfheap): FlushRelationBuffers returned -2
pqReadData() -- backend closed the channel unexpectedly.

I believe that these errors are not specifically caused by concurrent
vacuums, but are symptoms of locking-related bugs that we still have
to flush out and fix (cf. discussions on pg_hackers around 9/19/99).
So I've gone ahead and committed the change to allow concurrent
vacuums.
        regards, tom lane


RE: [HACKERS] Re: Concurrent VACUUM: first results

From
"Hiroshi Inoue"
Date:
> 
> I have committed the code change to remove pg_vlock locking from VACUUM.
> It turns out the problems I was seeing initially were all due to minor
> bugs in the lock manager and vacuum itself.
> 
> > 1. You can run concurrent "VACUUM" this way, but concurrent "VACUUM
> > ANALYZE" blows up.  The problem seems to be that "VACUUM ANALYZE"'s
> > first move is to delete all available rows in pg_statistic.
> 
> The real problem was that VACUUM ANALYZE tried to delete those rows
> *while it was outside of any transaction*.  If there was a concurrent
> VACUUM inserting tuples into pg_statistic, the new VACUUM would end up
> calling XactLockTableWait() with an invalid XID, which caused a failure

Hmm, what I could have seen here was always LockRelation(..,
RowExclusiveLock).  But the cause may be the same.
We couldn't get the xids of not-running *transaction*s because proc->xid
is set to 0 (InvalidTransactionId).  So the blocking transaction couldn't
find an xidLookupEnt in xidTable corresponding to the not-running
*transaction* when it tries LockResolveConflicts() in LockReleaseAll(),
and couldn't GrantLock() to the XidLookupEnt corresponding to the
not-running *transaction*.  After all, LockAcquire() from a not-running
*transaction* always fails once it is blocked.

> I have fixed the simpler aspects of the problem by adding missing
> SpinRelease() calls to lock.c, making lmgr.c test for failure, and
> altering VACUUM to not do the bogus row deletion.  But I suspect that
> there is more to this that I don't understand.  Why does calling
> XactLockTableWait() with an already-committed XID cause the following

It seems strange.  Isn't it waiting for a tuple being deleted by
vc_updstats() in vc_vacone()?

> code in lock.c to trigger?  Is this evidence of a logic bug in lock.c,
> or at least of inadequate checks for bogus input?
> 
>         /*
>          * Check the xid entry status, in case something in the ipc
>          * communication doesn't work correctly.
>          */
>         if (!((result->nHolding > 0) && (result->holders[lockmode] > 0)))
>         {
>             XID_PRINT_AUX("LockAcquire: INCONSISTENT ", result);
>             LOCK_PRINT_AUX("LockAcquire: INCONSISTENT ", lock, lockmode);
>             /* Should we retry ? */
>             SpinRelease(masterLock);   <<<<<<<<<<<< just added by me
>             return FALSE;
>         }
>

This is the third time I've come here, and it was always caused by
other bugs.

Regards,

Hiroshi Inoue
Inoue@tpf.co.jp



Re: [HACKERS] Re: Concurrent VACUUM: first results

From
Tom Lane
Date:
"Hiroshi Inoue" <Inoue@tpf.co.jp> writes:
> We couldn't get the xids of not-running *transaction*s because proc->xid
> is set to 0 (InvalidTransactionId).  So the blocking transaction couldn't
> find an xidLookupEnt in xidTable corresponding to the not-running
> *transaction* when it tries LockResolveConflicts() in LockReleaseAll(),
> and couldn't GrantLock() to the XidLookupEnt corresponding to the
> not-running *transaction*.  After all, LockAcquire() from a not-running
> *transaction* always fails once it is blocked.

OK, I can believe that ... I assumed that proc->xid still had the ID of
the last transaction, but if it's set to 0 during xact cleanup then this
behavior makes sense.  Still, it seems like lock.c should detect the
missing table entry and fail sooner than it does ...

>> I suspect that
>> there is more to this that I don't understand.  Why does calling
>> XactLockTableWait() with an already-committed XID cause the following
>> code in lock.c to trigger?  Is this evidence of a logic bug in lock.c,
>> or at least of inadequate checks for bogus input?

> It seems strange.  Isn't it waiting for a tuple being deleted by
> vc_updstats() in vc_vacone()?

No --- the process that reaches the "INCONSISTENT" exit is the one that
is trying to do the deletion of pg_statistic rows (during VACUUM
startup).  Presumably, it's found a row that is stored but not yet
committed by another VACUUM, and is trying to wait for that transaction
to commit.
        regards, tom lane