Hi.
The database is large (3TB+), and I have 200+ backend
processes polling a message queue table for work to do. When activity on
this table rises, autovacuum fails to make progress, and I cannot work
out what is actually holding it back.
Stracing the autovacuum process (strace -T -p <pid of autovacuum>) gives
this pattern:
....
semop(18317336, {{1, 1, 0}}, 1) = 0 <0.000021>
semop(19562558, {{10, -1, 0}}, 1) = 0 <141.819414>
lseek(156, 566099968, SEEK_SET) = 566099968 <0.000015>
read(156, "\233\230\0\0`\226\347\332\1\0\0\0\260\3\220\21\360\37\4
\0\0\0\0\340\237 \0\220\221 \0"..., 8192) = 8192 <0.000027>
...
read(156, "\234\230\0\0\210hx\240\1\0\0\0\350\4\260\f\360\37\4
\0\0\0\0\340\237 \0\260\214 \0"..., 8192) = 8192 <0.000018>
semop(19562558, {{10, -1, 0}}, 1) = 0 <32.707159>
lseek(30, 706174976, SEEK_SET) = 706174976 <0.000025>
write(30, "\235\230\0\0\240\353\25v\1\0\0\0\30\0\370\37\370\37\4 \
...
In other words, there are long "sleeps" in semop(19562558, {{10, -1, 0}}, 1),
so autovacuum makes no progress while it waits for this lock. If I check
the pg_locks view at the same time, it does not show that it is waiting
for anything.
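
To put a number on how much wall-clock time goes into those semop calls,
the strace -T output can be totalled per syscall. What follows is only a
minimal sketch in Python, not part of the original trace session: the
function name summarize_strace and the file name autovacuum.strace are
hypothetical, and it assumes the trace was saved in the same format as
shown above. Lines that were wrapped or truncated (such as the long read
and write buffers) are skipped rather than guessed at.

```python
import re
import sys
from collections import defaultdict

# One complete "strace -T" line: name(args) = result ... <elapsed seconds>
LINE_RE = re.compile(r'^(\w+)\((.*)\)\s+=\s+(-?\d+).*<(\d+\.\d+)>\s*$')


def summarize_strace(lines):
    """Return {key: (count, total_seconds, max_seconds)}.

    The key is the syscall name; semop calls are additionally split by
    their first argument (the semaphore set id) so that one slow
    semaphore stands out from the fast ones.
    """
    stats = defaultdict(lambda: [0, 0.0, 0.0])
    for raw in lines:
        m = LINE_RE.match(raw.strip())
        if m is None:
            continue  # wrapped, truncated, or non-syscall line
        name, args, elapsed = m.group(1), m.group(2), float(m.group(4))
        key = name
        if name == "semop":
            key = "semop(%s)" % args.split(",", 1)[0].strip()
        entry = stats[key]
        entry[0] += 1
        entry[1] += elapsed
        entry[2] = max(entry[2], elapsed)
    return {k: tuple(v) for k, v in stats.items()}


if __name__ == "__main__":
    # Hypothetical usage: python summarize.py autovacuum.strace
    with open(sys.argv[1]) as fh:
        summary = summarize_strace(fh)
    # Print the keys with the largest total time first.
    for key, (count, total, longest) in sorted(
            summary.items(), key=lambda kv: kv[1][1], reverse=True):
        print("%-20s calls=%-6d total=%.6fs max=%.6fs"
              % (key, count, total, longest))
```

Run against the excerpt above, this would attribute both long waits to the
same semaphore set id (19562558), which only confirms where the time goes,
not which process is holding the lock.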
Can I somehow dig deeper into which competing activity is causing
autovacuum to fail to make progress?
Thanks.
Jesper