Philip Warner <pjw@rhyme.com.au> writes:
> I may have found the problem; all the hung processes show 'async_notify
> waiting' in ps, and the ANALYZE eventually dies with 'tuple concurrently
> updated'.
> The routine 'ProcessIncomingNotify' in async.c does indeed try to lock
> pg_listener (even if we're not using NOTIFY/LISTEN). Not sure why the
> ANALYZE is locking the relation, though...but it is locked in AccessShareLock.
Hm. What seems likely to have happened is that the sinval message queue
got full. We currently deal with this by sending SIGUSR2 to all
backends, which forces them through a NOTIFY-check cycle; a byproduct of
the transaction start is to read pending sinval messages. (This is
somebody's ugly quick hack from years ago; we really oughta find a less
expensive way of doing it.)
That would have left all the idle backends trying to get exclusive lock
on pg_listener, and if the ANALYZE subsequently reached pg_listener, its
share lock would queue up behind those requests.
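To illustrate the queueing described above, here is a toy model in Python. It is only a sketch, not the actual lock manager code. The names `ToyLock`, `request` and `release` are made up for illustration, and it assumes just two modes, "share" and "exclusive", where exclusive conflicts with everything. The point it shows is that a share request that is compatible with the current holder still has to wait if conflicting exclusive requests are already queued ahead of it.

```python
from collections import deque

# Assumed toy conflict table: only the two modes discussed in this thread.
CONFLICTS = {
    ("share", "exclusive"),
    ("exclusive", "share"),
    ("exclusive", "exclusive"),
}


class ToyLock:
    """Hypothetical single-relation lock with a FIFO wait queue."""

    def __init__(self):
        self.holders = []       # (owner, mode) pairs currently granted
        self.waiters = deque()  # (owner, mode) pairs, in arrival order

    @staticmethod
    def _conflicts(mode, others):
        return any((mode, other_mode) in CONFLICTS for _, other_mode in others)

    def request(self, owner, mode):
        """Return True if granted at once, False if the request is queued."""
        # A request must not conflict with granted locks, and it must not
        # jump ahead of an earlier waiter whose request conflicts with it.
        if not self._conflicts(mode, self.holders) and not self._conflicts(
            mode, self.waiters
        ):
            self.holders.append((owner, mode))
            return True
        self.waiters.append((owner, mode))
        return False

    def release(self, owner):
        """Drop the owner's granted locks, then wake waiters in queue order."""
        self.holders = [(o, m) for o, m in self.holders if o != owner]
        while self.waiters:
            next_owner, next_mode = self.waiters[0]
            if self._conflicts(next_mode, self.holders):
                break  # head of the queue is still blocked, so is everyone behind
            self.waiters.popleft()
            self.holders.append((next_owner, next_mode))


if __name__ == "__main__":
    lock = ToyLock()
    lock.request("unknown", "share")       # the as-yet-unidentified holder
    lock.request("idle1", "exclusive")     # idle backends woken by the signal
    lock.request("idle2", "exclusive")
    granted = lock.request("analyze", "share")
    print("analyze granted immediately:", granted)
    print("waiting:", [o for o, _ in lock.waiters])
```

In this model the ANALYZE request stays queued until every exclusive request ahead of it has been granted and released, which matches the pile-up described above as long as something else is holding a lock in the first place.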
What is not clear yet is why *all* of them are blocked. Seems something
else must have some kind of lock already on pg_listener; but who?
Can you get a dump of the pg_locks view while this is happening?
> And before anyone suggests it, we already have processes in place
> to prevent two ANALYZEs running at the same time.
How confident are you in those "processes"? I don't know of any other
mechanism for 'tuple concurrently updated' failures in ANALYZE than
concurrent analyze runs ...
regards, tom lane