BUG #11444: autovacuum stuck for 5 days and waits on a lock - Mailing list pgsql-bugs

From alexk@hintbits.com
Subject BUG #11444: autovacuum stuck for 5 days and waits on a lock
Date
Msg-id 20140917160216.2486.59930@wrigleys.postgresql.org
Whole thread Raw
Responses Re: BUG #11444: autovacuum stuck for 5 days and waits on a lock
List pgsql-bugs
The following bug has been logged on the website:

Bug reference:      11444
Logged by:          Alexey Klyukin
Email address:      alexk@hintbits.com
PostgreSQL version: 9.3.5
Operating system:   Ubuntu 12.04 3.2.0-52-generic #78-Ubuntu SMP x64
Description:

Greetings,

We've got a problem with autovacuum process running for already 5 days,
without being blocked by any particular process.
The table in question is around 500MB. Usually we don't see vacuums running
there for more than couple of hours.
Other autovacuum processes (on different tables) start and finish without
issues.

The vacuum and autovacuum settings are left at default values. We don't have
debug information compiled on this server, but here's what we have after the
gdb bt output, attaching to the autovacuum process.

#0  0x00007f423cbf34f7 in semop () from /lib/x86_64-linux-gnu/libc.so.6
#1  0x00007f423e8581f8 in PGSemaphoreLock ()
#2  0x00007f423e889a45 in LockBufferForCleanup ()
#3  0x00007f423e6d9cd6 in ?? ()
#4  0x00007f423e6da08f in ?? ()
#5  0x00007f423e6daaf6 in btbulkdelete ()
#6  0x00007f423e982c7c in FunctionCall4Coll ()
#7  0x00007f423e6d4a93 in index_bulk_delete ()
#8  0x00007f423e7c8322 in ?? ()
#9  0x00007f423e7c937d in lazy_vacuum_rel ()
#10 0x00007f423e7c6a26 in ?? ()
#11 0x00007f423e7c78cc in vacuum ()
#12 0x00007f423e85ac02 in ?? ()
#13 0x00007f423e85b107 in ?? ()
#14 0x00007f423e85b1c2 in StartAutoVacWorker ()
#15 0x00007f423e8683f8 in ?? ()
#16 <signal handler called>
#17 0x00007f423cbea743 in select () from /lib/x86_64-linux-gnu/libc.so.6
#18 0x00007f423e867626 in PostmasterMain ()
#19 0x00007f423e6a3205 in main ()
(gdb)

The strace -p $pid produced the following output at one time:

semop(2523174, {{12, -1, 0}}, 1
semop(2523174, {{12, -1, 0}}, 1


)        = 0
lseek(21, 0, SEEK_END)                  = 32276480
select(0, NULL, NULL, NULL, {0, 21000}) = 0 (Timeout)
semop(2523174, {{12, -1, 0}}, 1

Most of the time it is stuck with semop.
The file in the lseek argument is an index  on the table autovacuum is
trying to vacuum.
I've tried running pgbreecheck on this index, but it did not show any issues
with it:

NOTICE: fast root: block 412 at level 2
NOTICE: setting page parents
NOTICE: done setting parents
NOTICE: Checking forward scan of level 0, starting at block 1
NOTICE: done checking level 0
NOTICE: checking visited nodes on level 0, starting at block 1
NOTICE: done checking visited nodes on level 0
NOTICE: Checking forward scan of level 1, starting at block 3
NOTICE: done checking level 1
NOTICE: checking visited nodes on level 1, starting at block 3
NOTICE: done checking visited nodes on level 1
NOTICE: Checking forward scan of level 2, starting at block 412
NOTICE: done checking level 2
NOTICE: checking visited nodes on level 2, starting at block 412
NOTICE: done checking visited nodes on level 2
started checking at root 412

There are also UPDATE statements constantly running on this table, in the
overlapping manner, so at a single moment there is at least one update
running on it. We are investigating why is it done this way, but can it be a
reason behind this strange vacuum behavior?

pgsql-bugs by date:

Previous
From: Justin Pryzby
Date:
Subject: Re: pg_dump -Fd fails to detect ENOSPC
Next
From: Andres Freund
Date:
Subject: Re: BUG #11444: autovacuum stuck for 5 days and waits on a lock