Hi,
On 8/11/23 14:05, Merlin Moncure wrote:
> On Thu, Jul 27, 2023 at 8:28 AM David Geier <geidav.pg@gmail.com> wrote:
>
> Hi,
>
> On 6/7/23 23:37, Andres Freund wrote:
> > I think we're starting to hit quite a few limits related to the
> > process model, particularly on bigger machines. The overhead of
> > cross-process context switches is inherently higher than switching
> > between threads in the same process - and my suspicion is that that
> > overhead will continue to increase. Once you have a significant
> > number of connections we end up spending a *lot* of time in TLB
> > misses, and that's inherent to the process model, because you can't
> > share the TLB across processes.
>
> Another problem I haven't seen mentioned yet is the excessive kernel
> memory usage because every process has its own set of page table
> entries (PTEs). Without huge pages the amount of wasted memory can be
> huge if shared buffers are big.
>
>
> Hm, I noted this upthread, but asking again: does this help
> interactions with the operating system and make OOM kill
> situations less likely? These things are the bane of my existence,
> and I'm having a hard time finding a solution that prevents them other
> than running pgbouncer and lowering max_connections, which adds
> complexity. I suspect I'm not the only one dealing with this.
> What's really scary about these situations is they come without
> warning. Here's a pretty typical example per sar -r.
>
> The conjecture here is that lots of idle connections leave the server
> with less memory available than it appears to have, and sudden
> transient demands can cause it to destabilize.
It does, in the sense that your server will have more memory available
when many long-lived connections are around: every connection carries
less kernel memory overhead. Of course, even then a runaway query can
still invoke the OOM killer. The unfortunate thing with the OOM killer
is that, in my experience, it often kills the checkpointer. That's
because the checkpointer touches all of shared buffers over time, which
inflates its resident set size and makes it likely to get selected by
the OOM killer. Have you tried disabling memory overcommit?
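To put rough numbers on the per-connection page table overhead, here is
a back-of-the-envelope sketch. It assumes x86-64 with one 8-byte entry
per mapped page, counts only the last-level entries, and takes the worst
case where every backend has touched all of shared buffers. The 64 GiB
of shared buffers and the 1000 connections are made-up figures for
illustration, not numbers from anyone's system.

```python
# Rough estimate of the page table memory needed to map shared buffers.
# Assumptions (not from this thread): x86-64, 8 bytes per page table
# entry, only last-level entries counted, every process has touched
# every page of shared buffers (worst case).

KiB = 1024
MiB = 1024 ** 2
GiB = 1024 ** 3

PTE_BYTES = 8  # size of one page table entry on x86-64


def pte_bytes_per_process(shared_buffers_bytes, page_bytes):
    """Bytes of page table entries one process needs to map shared buffers."""
    pages = -(-shared_buffers_bytes // page_bytes)  # ceiling division
    return pages * PTE_BYTES


def pte_bytes_total(shared_buffers_bytes, page_bytes, processes):
    """Worst-case page table memory summed over all processes."""
    return pte_bytes_per_process(shared_buffers_bytes, page_bytes) * processes


if __name__ == "__main__":
    shared_buffers = 64 * GiB  # hypothetical shared_buffers setting
    connections = 1000         # hypothetical number of backends

    for label, page in (("4 KiB pages", 4 * KiB), ("2 MiB huge pages", 2 * MiB)):
        per_proc = pte_bytes_per_process(shared_buffers, page)
        total = pte_bytes_total(shared_buffers, page, connections)
        print(f"{label}: {per_proc / MiB:.2f} MiB per process, "
              f"{total / GiB:.2f} GiB for {connections} processes")
```

With these assumed numbers, 4 KiB pages cost 128 MiB of page table
entries per backend (125 GiB across 1000 backends in the worst case),
while 2 MiB huge pages cost 256 KiB per backend (250 MiB in total).
Real backends usually touch only part of shared buffers, so actual
usage will be lower than this upper bound.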
--
David Geier
(ServiceNow)