OK, we figured it out--I think.
pgbench was stuck in restart_syscall(<...resuming interrupted read...
It was set to open 100 connections.
There were ~20 PG sessions in idle, and the last one (highest PID) was in auth.
That one was blocked in write to fd 2 (stderr).
So... This is running in Kubernetes. I was doing some load testing against a storage service (thus 100 connections). PG
was launched manually in a bash session connected to the pod, in k9s. There were ~20 bash sessions open in k9s in total,
across 15 nodes.
Theory: k9s glitched and stopped reading the piped file descriptor, the pipe buffer filled, and PG blocked on the write.
(I have seen prior evidence of less-than-perfect handling of output by k9s.) In particular, I had connection logging on,
so at auth it would have been writing to stderr.
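For anyone who wants to see the mechanism behind the theory, here is a minimal sketch, assuming a Unix-like system where stderr is a pipe whose reader has stopped draining it. The function name `fill_pipe_without_reader` is hypothetical. It puts the write end in non-blocking mode so that, instead of hanging the way the backend did, it reports how many bytes fit before a normal blocking write(2) would have stalled.

```python
import os


def fill_pipe_without_reader() -> int:
    """Write into a pipe that nobody reads; return how many bytes fit
    before a blocking writer would have stalled (hypothetical helper)."""
    r, w = os.pipe()
    # Non-blocking, so we get an error instead of hanging like the stuck backend.
    os.set_blocking(w, False)
    written = 0
    chunk = b"x" * 4096
    try:
        while True:
            try:
                n = os.write(w, chunk)
            except BlockingIOError:
                # Buffer is full: a blocking write(2) would sleep here
                # until the reader (k9s, in the theory) drains the pipe.
                return written
            written += n
    finally:
        os.close(r)
        os.close(w)


if __name__ == "__main__":
    print(f"pipe buffer filled after {fill_pipe_without_reader()} bytes")
```

On Linux the default pipe capacity is typically 64 KiB, so a steady trickle of connection log lines with nobody reading would eventually fill it, and the next write to fd 2 would block indefinitely.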
This happened in one of probably over 100 runs of the same test, so it is not readily reproducible, and I wanted to
autopsy it before killing off the hung processes. Unless someone pokes a hole in my theory, at this point I think it is
neither pgbench nor PG nor Pure/Portworx at fault.
--
Scott Ribe
scott_ribe@elevated-dev.com
https://www.linkedin.com/in/scottribe/