This happened again. This time I captured the state of the connections
between the pgbouncer host and the pgsql host, as seen on the
PostgreSQL side. When the problem occurs, the connection states are:
ESTABLISHED: 188
CLOSE_WAIT: 116
The number of connections in CLOSE_WAIT is abnormal. In the normal
situation there are usually no CLOSE_WAIT connections at all. A typical
sample looks like this:
ESTABLISHED: 117
CLOSE_WAIT: 0
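For reference, per-state counts like the ones above can be derived from
`netstat -tan` output on the PostgreSQL host. CLOSE_WAIT on that side
means the peer (here pgbouncer) has already closed its end while the
local process has not yet closed the socket. The following is only a
minimal sketch: the function name, the sample text, and the IP
addresses (10.0.0.1 standing in for the pgbouncer host) are
hypothetical, and it assumes the Linux netstat column layout.

```python
from collections import Counter

def count_tcp_states(netstat_output, peer_ip):
    """Count TCP states for connections whose foreign address is peer_ip."""
    counts = Counter()
    for line in netstat_output.splitlines():
        fields = line.split()
        # Data rows look like: tcp 0 0 local:port foreign:port STATE
        if len(fields) < 6 or not fields[0].startswith("tcp"):
            continue
        # Split off the port; rsplit also copes with IPv6-style addresses.
        foreign_host = fields[4].rsplit(":", 1)[0]
        if foreign_host == peer_ip:
            counts[fields[5]] += 1
    return counts

# Hypothetical sample in the shape of `netstat -tan` output.
SAMPLE = """\
Proto Recv-Q Send-Q Local Address           Foreign Address         State
tcp        0      0 10.0.0.2:5432           10.0.0.1:40112          ESTABLISHED
tcp        0      0 10.0.0.2:5432           10.0.0.1:40113          CLOSE_WAIT
tcp        0      0 10.0.0.2:5432           10.0.0.9:51000          ESTABLISHED
tcp        0      0 0.0.0.0:5432            0.0.0.0:*               LISTEN
"""

for state, n in sorted(count_tcp_states(SAMPLE, "10.0.0.1").items()):
    print(f"{state}: {n}")
```

In practice the text would come from running netstat on the host
rather than from an embedded sample.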
I have 4 users configured in pgbouncer and the pool_size is 50, so the
maximum number of connections from pgbouncer should be no more than 200.
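The arithmetic behind that limit can be written out as a small sketch.
It assumes a single database, since pgbouncer keeps one pool per
(database, user) pair; the function name and the num_databases
parameter are illustrative only. The observed totals are the figures
from this post: 188 + 116 on the PostgreSQL side and 292 in the script
log.

```python
def max_server_connections(num_users, pool_size, num_databases=1):
    """Upper bound on server connections, one pool per (database, user) pair."""
    return num_users * pool_size * num_databases

# 4 users, pool_size 50, and (assumption) a single database.
limit = max_server_connections(num_users=4, pool_size=50)

# Totals reported in this post.
observed = {
    "postgres side (ESTABLISHED + CLOSE_WAIT)": 188 + 116,
    "script log at 11:40:10": 292,
}

print(f"expected limit: {limit}")
for label, count in observed.items():
    print(f"{label}: {count} (over the limit by {count - limit})")
```

Both observed totals exceed the expected bound, which is why the
CLOSE_WAIT connections look suspicious.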
The connection spike happens very quickly. I wrote a script that
checks the number of connections from pgbouncer every 5 minutes. This
is its log:
10:55:01 CST pgbouncer is healthy. connection count: 73
11:00:02 CST pgbouncer is healthy. connection count: 77
11:05:01 CST pgbouncer is healthy. connection count: 118
11:10:01 CST pgbouncer is healthy. connection count: 115
11:15:01 CST pgbouncer is healthy. connection count: 75
11:20:01 CST pgbouncer is healthy. connection count: 73
11:25:02 CST pgbouncer is healthy. connection count: 75
11:30:01 CST pgbouncer is healthy. connection count: 77
11:35:01 CST pgbouncer is healthy. connection count: 84
11:40:10 CST Problematic connection count: 292, will restart pgbouncer...
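To see how abrupt the spike is, the log can be parsed and the largest
jump between two consecutive samples picked out. This is a rough sketch
only: the function names and the regular expression are my own and
assume the line format shown above; the embedded text is the tail of
that log.

```python
import re

# Last three lines of the log above.
LOG = """\
11:30:01 CST pgbouncer is healthy. connection count: 77
11:35:01 CST pgbouncer is healthy. connection count: 84
11:40:10 CST Problematic connection count: 292, will restart pgbouncer...
"""

def parse_counts(log_text):
    """Return (timestamp, count) pairs from lines of the format above."""
    pattern = re.compile(r"^(\d{2}:\d{2}:\d{2}) \S+ .*connection count: (\d+)")
    samples = []
    for line in log_text.splitlines():
        match = pattern.match(line)
        if match:
            samples.append((match.group(1), int(match.group(2))))
    return samples

def largest_jump(samples):
    """Return (increase, from_time, to_time) for the biggest consecutive rise."""
    jumps = [(b[1] - a[1], a[0], b[0]) for a, b in zip(samples, samples[1:])]
    return max(jumps)

increase, start, end = largest_jump(parse_counts(LOG))
print(f"largest jump: +{increase} between {start} and {end}")
```

On this excerpt the count rises from 84 to 292 within a single
5-minute interval.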
I now suspect a network problem between the pgbouncer host and the
pgsql host. I will investigate further.