Concurrent VACUUM: first results - Mailing list pgsql-hackers

From: Tom Lane
Subject: Concurrent VACUUM: first results
Msg-id: 5068.943339265@sss.pgh.pa.us
List: pgsql-hackers
Responses: RE: [HACKERS] Concurrent VACUUM: first results ("Hiroshi Inoue" <Inoue@tpf.co.jp>)
           RE: [HACKERS] Concurrent VACUUM: first results ("Hiroshi Inoue" <Inoue@tpf.co.jp>)
           Re: Concurrent VACUUM: first results (Tom Lane <tgl@sss.pgh.pa.us>)

Well, I diked out the code in vacuum.c that creates/deletes the pg_vlock
lockfile, and tried it out.  Turns out it's not quite such a no-brainer
as I'd hoped.  Several problems emerged:

1. You can run concurrent "VACUUM" this way, but concurrent "VACUUM
ANALYZE" blows up.  The problem seems to be that "VACUUM ANALYZE"'s
first move is to delete all available rows in pg_statistic.  That
generates a conflict against other vacuums that might be inserting new
rows in pg_statistic.  The newly started VACUUM will almost always hang
up on a not-yet-committed pg_statistic row, waiting to see whether that
row commits so that it can delete it.  Even more interesting, the older
VACUUM generally hangs up shortly after that; I'm not perfectly clear on
what *it's* waiting on, but it's obviously a mutual deadlock situation.
The two VACUUMs don't fail with a nice "deadlock detected" message,
either ... it's more like a 60-second spinlock timeout, followed by
abort() coredumps in both backends, followed by the postmaster killing
every other backend in sight.  That's clearly not acceptable behavior
for production databases.
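
For reference, a minimal two-session sketch of the failure mode
(assuming two psql sessions against the same database; the session
labels are mine):

    -- Session 1: begins by deleting pg_statistic rows, then inserts
    -- fresh ones as it analyzes each table.
    VACUUM ANALYZE;

    -- Session 2, launched while session 1 is still running: its own
    -- opening sweep of pg_statistic blocks on session 1's uncommitted
    -- inserts, session 1 soon blocks in turn, and instead of a clean
    -- "deadlock detected" both backends die as described above.
    VACUUM ANALYZE;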

I find this really disturbing whether we allow concurrent VACUUMs or
not, because now I'm afraid that other sorts of system-table updates
can show the same ungraceful response to deadlock situations.  I have
a vague recollection that Vadim said something about interlocks between
multiple writers being done properly only for user tables, not system
tables ... if that's what this is, I think it's a must-fix problem.

2. I was able to avoid the deadlock by removing the code that tries to
delete every pg_statistic tuple in sight.  The remaining code deletes
(and then recreates) the pg_statistic tuples for each table while it's
processing that table and holding an exclusive lock on it.  So there's
no danger of cross-VACUUM deadlocks.  The trouble is that pg_statistic
tuples for deleted tables won't ever go away, since VACUUM will never
consider them.  I suppose this could be fixed by modifying DROP TABLE
to delete pg_statistic tuples applying to the target table.
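
If anyone wants to see the orphans that would accumulate, something
like this should show them (a sketch; I'm assuming pg_statistic rows
carry the owning table's OID in starelid):

    -- pg_statistic tuples whose owning table is gone from pg_class
    SELECT starelid FROM pg_statistic
    WHERE starelid NOT IN (SELECT oid FROM pg_class);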

3. I tried running VACUUMs in parallel with the regress tests, and saw
a lot of messages like
    NOTICE:  Rel tenk1: TID 1/31: InsertTransactionInProgress 29737 - can't shrink relation
Looking at the code, this is normal behavior for VACUUM when it sees
not-yet-committed tuples, and has nothing to do with whether there's
another VACUUM going on elsewhere.  BUT: why the heck are we getting
these at all, especially on user tables?  VACUUM's grabbed an exclusive
lock on the target table; shouldn't that mean that all write
transactions on the target have committed?  This looks like it could
be a symptom of a locking bug.
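
The interlock I expected, sketched as two sessions (assuming
table-level locks behave as advertised; the INSERT values are elided):

    -- Session 1: insert into tenk1 and leave the transaction open.
    BEGIN;
    INSERT INTO tenk1 VALUES (...);

    -- Session 2: VACUUM takes an exclusive lock on tenk1, so it
    -- should block here until session 1 commits or aborts, i.e. it
    -- should never see an insert still in progress.  The NOTICE
    -- above says otherwise.
    VACUUM tenk1;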

Do we want to press ahead with fixing these problems, or should I just
discard my changes uncommitted?  Two of the three points look like
things we need to worry about whether VACUUM is concurrent or not,
but maybe I'm misinterpreting what I see.  Comments?
        regards, tom lane

