Re: Vacuum Full Analyze Stalled - Mailing list pgsql-admin

From Kevin Grittner
Subject Re: Vacuum Full Analyze Stalled
Msg-id s3415e4a.049@gwmta.wicourts.gov
In response to Vacuum Full Analyze Stalled  ("Jeff Kirby" <Jeff.Kirby@wicourts.gov>)
List pgsql-admin
We will use gdb and strace the next time we see this.

I've tried to be specific about which vacuum is running in all cases.  If
the posts have been confusing on that issue, I apologize.  I'll try to be
clear on this in future posts.

To summarize past events, the case involving the constraint index
was indeed a "vacuum full" of the entire database under heavy load.
Autovacuum failed to keep the small, high-update table clean in that
scenario, but I am not sure whether that caused the failure of the
vacuum full, or was the result of it.  This weekend, it seemed that the
first things to fail (and the last) were autovacuum attempts.
Vacuum full was run through psql during attempts to recover
performance after the failure of autovacuum caused performance
to slow noticeably.  We didn't capture info which would tell us whether
the explicit vacuum was blocked by an autovacuum process.

There were a few very small single-source tests under 8.0.3, but all
tests involving any significant load were under 8.1beta1 or 8.1beta2.
We did not see this in any of those small tests under 8.0.3.

-Kevin


>>> Tom Lane <tgl@sss.pgh.pa.us> 10/03/05 3:48 PM >>>
Alvaro Herrera <alvherre@alvh.no-ip.org> writes:
> However, I'm looking at the autovacuum code to see why it's sitting
> holding locks on the small table and not vacuuming it.  I see on the
> pg_locks output that process 3158 (autovacuum) has got locks on the
> table and index, but it apparently isn't vacuuming the table.  If this
> is correct, it's a bug.  However I can't seem to find out why this
> happens.

We can see clearly from the pg_locks output that VACUUM isn't waiting
for an lmgr lock, so the problem must be at a lower level.  The
hypothesis I'm thinking about is that VACUUM is trying to do
LockBufferForCleanup() and for some reason it never finishes.  There are
a number of possible scenarios that could explain this: leaked buffer
pin, dropped signal, etc.
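
The pg_locks reasoning above can be sketched as a small check: a backend is
waiting on an lmgr lock only if at least one of its pg_locks rows has
granted = false.  This is a minimal illustration, not anything from the
thread itself.  It assumes the rows were already fetched with something like
"SELECT pid, relation, mode, granted FROM pg_locks"; the function names,
the relation OIDs, and the second pid in the sample are hypothetical.

```python
def waiting_pids(lock_rows):
    """Return the sorted pids that have at least one ungranted lock request."""
    return sorted({row["pid"] for row in lock_rows if not row["granted"]})


def is_waiting_on_lmgr(lock_rows, pid):
    """True if the given backend pid is queued behind a heavyweight lock."""
    return pid in waiting_pids(lock_rows)


# Hypothetical sample shaped like pg_locks output (OIDs and pid 4021 are made up).
sample_rows = [
    {"pid": 3158, "relation": 16510, "mode": "ShareUpdateExclusiveLock", "granted": True},
    {"pid": 3158, "relation": 16515, "mode": "RowExclusiveLock", "granted": True},
    {"pid": 4021, "relation": 16510, "mode": "AccessExclusiveLock", "granted": False},
]

if __name__ == "__main__":
    print("waiting pids:", waiting_pids(sample_rows))
    print("3158 waiting on lmgr lock:", is_waiting_on_lmgr(sample_rows, 3158))
```

If the stalled process has no ungranted rows, as with 3158 here, the lock
manager is not what it is waiting for, and the cause has to be sought at a
lower level such as buffer pins.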

> Kevin, Jeff, next time this happens please attach gdb to the autovacuum
> process and get a stack trace ("bt" to gdb), if at all possible, and/or
> strace it to see what it's doing.

Please!
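
For anyone scripting the capture, here is a rough sketch of the two commands
being asked for.  It only builds the argument lists and leaves running them
to the operator; the helper name is hypothetical, and it assumes gdb and
strace are installed and that the caller has permission to attach to the
backend.

```python
import shlex


def diagnostic_commands(pid):
    """Build gdb and strace command lines for attaching to a backend pid."""
    if not isinstance(pid, int) or pid <= 0:
        raise ValueError("pid must be a positive integer")
    # gdb: attach to the process, print a backtrace ("bt"), then exit.
    gdb_cmd = ["gdb", "-p", str(pid), "-batch", "-ex", "bt"]
    # strace: attach to the running process and show its system calls.
    strace_cmd = ["strace", "-p", str(pid)]
    return gdb_cmd, strace_cmd


if __name__ == "__main__":
    # 3158 is the autovacuum pid quoted earlier in this thread.
    for cmd in diagnostic_commands(3158):
        print(shlex.join(cmd))
```

The resulting lists can be passed to subprocess.run() or pasted into a
shell once the stalled pid is known.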

Also, we need to keep a little clarity about what we are dealing with.
This thread has mentioned hangups in both plain vacuum (autovacuum) and
VACUUM FULL.  It seems very likely to me that there are different
mechanisms involved --- since VACUUM FULL takes an exclusive lock on the
whole table, that eliminates an entire class of possible explanations
for the plain-VACUUM case, while introducing a whole new set of
explanations having to do with the VACUUM being queued up behind
ordinary table locks.  Please be perfectly clear about which scenario
each report is about.

Finally, I'm wondering whether this bug is new in 8.1 or is
pre-existing.  Has this same application been running successfully
in 8.0?

            regards, tom lane

