On Wed, May 24, 2017 at 08:24:15AM -0400, Bill Moran wrote:
> ... I tried allocating 64G to shared buffers and we had a bunch of problems
> with inconsistent performance, including "stall" periods where the database
> would stop responding for 2 or 3 seconds. After trying all sorts of tuning
> options that didn't help, the problem finally went away after reducing
> shared_buffers to 32G. I speculated, at the time, that the shared buffer code
> hit performance issues managing that much memory, but I never had the
> opportunity to really follow up on it.
I think you were hitting an issue related to KSM ("kernel samepage merging",
sometimes called "kernel shared memory") and maybe "transparent huge pages".
I was able to work around similar issues with ~32GB allocations to QEMU/KVM
running on something like kernel 3.13. I didn't spend time narrowing down the
problem, and I don't know whether the behavior is better with recent kernels.
/sys/kernel/mm/ksm/run=2
... and maybe also:
/sys/kernel/mm/transparent_hugepage/defrag=madvise
/sys/kernel/mm/ksm/merge_across_nodes=0
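The lines above are shorthand for writing a value to each sysfs file. As a rough sketch, assuming a kernel that exposes all three files, they could be applied like this. The function name apply_mm_settings and its optional directory argument are made up for illustration (the argument only exists so the function can be tried against a scratch directory instead of the real /sys/kernel/mm):

```shell
# Sketch only: write the settings quoted above to sysfs.
# apply_mm_settings and its optional root argument are hypothetical helpers,
# not part of any existing tool.
apply_mm_settings() {
    root="${1:-/sys/kernel/mm}"

    # run=2 stops ksmd and unmerges all currently merged pages.
    echo 2 > "$root/ksm/run" || return 1

    # Only defragment for huge pages in regions that asked via madvise().
    echo madvise > "$root/transparent_hugepage/defrag" || return 1

    # merge_across_nodes can only be changed while no pages are merged,
    # which is why run=2 is written first.
    echo 0 > "$root/ksm/merge_across_nodes" || return 1
}
```

Calling apply_mm_settings with no argument as root would touch the real files. The values do not persist across a reboot, so they would have to be reapplied at boot if they turn out to help.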
Justin