For the past few days, we've been seeing unexpected, extremely high CPU spikes on our system. We observed the following pattern: 'free' memory drops below 300 MB; at that point, 'cached' slowly starts to go down, and then CPU usage climbs sharply.
It's almost as if the OS is not releasing 'cached' memory fast enough for Postgres. Is that analysis correct? Is there a way to fix this?
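To line up the memory pattern described above with the CPU spikes, it can help to log the kernel's own counters over time. This is a minimal Python sketch, assuming a Linux host where /proc/meminfo exposes the MemFree and Cached fields in kB. The helper names (parse_meminfo, free_and_cached_mb, watch) are hypothetical and not part of any existing tool:

```python
import time
from pathlib import Path


def parse_meminfo(text: str) -> dict[str, int]:
    """Parse /proc/meminfo-style text into {field: value}; most values are in kB."""
    fields: dict[str, int] = {}
    for line in text.splitlines():
        if ":" not in line:
            continue
        key, rest = line.split(":", 1)
        parts = rest.split()
        if parts:
            # First token is the number; the optional second token is the unit ("kB").
            fields[key.strip()] = int(parts[0])
    return fields


def free_and_cached_mb(info: dict[str, int]) -> tuple[int, int]:
    """Return (MemFree, Cached) converted from kB to MB."""
    return info["MemFree"] // 1024, info["Cached"] // 1024


def watch(samples: int = 12, interval: float = 5.0, path: str = "/proc/meminfo") -> None:
    """Print a timestamped free/cached line every `interval` seconds (Linux only)."""
    for _ in range(samples):
        info = parse_meminfo(Path(path).read_text())
        free_mb, cached_mb = free_and_cached_mb(info)
        print(f"{time.strftime('%H:%M:%S')} free={free_mb} MB cached={cached_mb} MB")
        time.sleep(interval)
```

Running something like watch(samples=720, interval=5) alongside whatever records CPU usage makes it easier to check whether the spikes really begin when free memory bottoms out.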
This sounds like a kernel problem, probably either the zone reclaim issue or the transparent huge pages issue.
I don't know the exact details off the top of my head, but both have been discussed at length on this list and on pgsql-hackers.
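Both settings can be read directly from procfs and sysfs. This is a minimal sketch, assuming the mainline paths /proc/sys/vm/zone_reclaim_mode and /sys/kernel/mm/transparent_hugepage/enabled (some older Red Hat kernels use a redhat_transparent_hugepage directory instead). The function names are hypothetical, and treating a nonzero zone_reclaim_mode or THP set to "always" as suspect is an assumption based on the usual advice, not a confirmed diagnosis:

```python
from pathlib import Path

ZONE_RECLAIM = Path("/proc/sys/vm/zone_reclaim_mode")
THP_ENABLED = Path("/sys/kernel/mm/transparent_hugepage/enabled")


def parse_zone_reclaim(text: str) -> int:
    """zone_reclaim_mode is a single integer; 0 means zone reclaim is off."""
    return int(text.strip())


def parse_thp_mode(text: str) -> str:
    """Return the active THP mode, i.e. the bracketed word in 'always [madvise] never'."""
    for word in text.split():
        if word.startswith("[") and word.endswith("]"):
            return word[1:-1]
    raise ValueError(f"no active mode found in {text!r}")


def check_kernel(zone_path: Path = ZONE_RECLAIM, thp_path: Path = THP_ENABLED) -> list[str]:
    """Collect warnings for the two settings; missing files are skipped silently."""
    warnings = []
    if zone_path.exists() and parse_zone_reclaim(zone_path.read_text()) != 0:
        warnings.append("vm.zone_reclaim_mode is nonzero")
    if thp_path.exists() and parse_thp_mode(thp_path.read_text()) == "always":
        warnings.append("transparent huge pages are set to 'always'")
    return warnings
```

Calling check_kernel() on the database host returns an empty list when neither setting looks suspicious.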
What tool is that? I'm not familiar with this output format.
max_connections | 500
While this is probably fundamentally a kernel problem, you are not doing yourself any favors by allowing 500 connections on a machine with 24 cores, roughly 20 per core. A large number of connections can trigger poor kernel behavior.
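One commonly cited starting point for the number of actively working connections is the rule of thumb on the PostgreSQL wiki: (2 * core_count) + effective_spindle_count. The sketch below only encodes that formula. effective_spindle_count is an assumption you have to estimate for your own storage (close to 0 if the working set is fully cached), and the result is a pool size to tune from, not a hard limit:

```python
def suggested_pool_size(core_count: int, effective_spindle_count: int) -> int:
    """Rule of thumb: (2 * core_count) + effective_spindle_count active connections."""
    if core_count < 1:
        raise ValueError("core_count must be at least 1")
    if effective_spindle_count < 0:
        raise ValueError("effective_spindle_count cannot be negative")
    return 2 * core_count + effective_spindle_count


def connections_per_core(max_connections: int, core_count: int) -> float:
    """How many allowed connections compete for each core."""
    return max_connections / core_count
```

For 24 cores and an assumed effective_spindle_count of 4 this gives 52, an order of magnitude below 500. Additional clients would then typically wait in a connection pooler instead of each holding its own backend.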