Re: BUG #11444: autovacuum stuck for 5 days and waits on a lock - Mailing list pgsql-bugs

From Andres Freund
Subject Re: BUG #11444: autovacuum stuck for 5 days and waits on a lock
Date
Msg-id 20140917161227.GS25887@awork2.anarazel.de
Whole thread Raw
In response to BUG #11444: autovacuum stuck for 5 days and waits on a lock  (alexk@hintbits.com)
Responses Re: BUG #11444: autovacuum stuck for 5 days and waits on a lock
List pgsql-bugs
Hi,

On 2014-09-17 16:02:16 +0000, alexk@hintbits.com wrote:
> We've got a problem with autovacuum process running for already 5 days,
> without being blocked by any particular process.
> The table in question is around 500MB. Usually we don't see vacuums running
> there for more than couple of hours.
> Other autovacuum processes (on different tables) start and finish without
> issues.
>
> The vacuum and autovacuum settings are left at default values. We don't have
> debug information compiled on this server, but here's what we have after the
> gdb bt output, attaching to the autovacuum process.

> #0  0x00007f423cbf34f7 in semop () from /lib/x86_64-linux-gnu/libc.so.6
> #1  0x00007f423e8581f8 in PGSemaphoreLock ()
> #2  0x00007f423e889a45 in LockBufferForCleanup ()
> #3  0x00007f423e6d9cd6 in ?? ()
> #4  0x00007f423e6da08f in ?? ()
> #5  0x00007f423e6daaf6 in btbulkdelete ()

What is happening here is that vacuum (of a btree index) is waiting for a so called
'cleanup' lock on a page. That is it waits for the page not being pinned
by any other backend.

> semop(2523174, {{12, -1, 0}}, 1
> semop(2523174, {{12, -1, 0}}, 1
>
>
> )        = 0
> lseek(21, 0, SEEK_END)                  = 32276480
> select(0, NULL, NULL, NULL, {0, 21000}) = 0 (Timeout)
> semop(2523174, {{12, -1, 0}}, 1
>
> Most of the time it is stuck with semop.

That indicates it's waiting most of the time. But it does make progress.

> There are also UPDATE statements constantly running on this table, in the
> overlapping manner, so at a single moment there is at least one update
> running on it. We are investigating why is it done this way, but can it be a
> reason behind this strange vacuum behavior?

Which quite possibly is caused by this.


I've recently commented on -hackers that this is a very hard to debug
behaviour and should be made more visible, but IIRC we didn't come to a
conclusion...

Greetings,

Andres Freund

--
 Andres Freund                       http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services

pgsql-bugs by date:

Previous
From: alexk@hintbits.com
Date:
Subject: BUG #11444: autovacuum stuck for 5 days and waits on a lock
Next
From: Tom Lane
Date:
Subject: Re: pg_dump -Fd fails to detect ENOSPC