On Jan 2, 2025, at 9:21 PM, Tomas Vondra <tomas@vondra.me> wrote:
That said, I do think a workload manager would be more effective than
trying to limit total connections.
The "workload management" concept is so abstract I find it verydifficult to discuss without much more detail about how would itactually work / be implemented.I believe implementing some rudimentary "global" memory accounting wouldnot be *that* hard (possibly along the lines of the patches early inthis thread), and adding some sort of dynamic connection limit would notbe much harder I think. But then comes the hard part of actually doingthe "workload management" part, which seems pretty comparable to what aQoS / scheduler needs to do. With all the weird corner cases.
I’ve been saying “workload management” for lack of a better term, but my initial suggestion upthread was to simply stop allowing new transactions to start if global work_mem consumption exceeded some threshold. That’s simplistic enough that I wouldn’t really consider it “workload management”. Maybe “deferred execution” would be a better name. The only other thing it’d need is a timeout on how long a new transaction could sit in limbo.
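To make that a bit more concrete, here's a rough sketch of the kind of gate I'm imagining at transaction start. To be clear, every GUC, shared counter, and function name below is invented for illustration — only the condition variable / atomics / timestamp primitives are real — and it assumes the global work_mem accounting already lives somewhere in shared memory:

#include "postgres.h"

#include "port/atomics.h"
#include "storage/condition_variable.h"
#include "utils/timestamp.h"
#include "utils/wait_event.h"

/* hypothetical GUCs */
extern int	global_work_mem_limit_kb;		/* 0 disables the gate */
extern int	transaction_defer_timeout_ms;	/* how long to sit in limbo */

/* hypothetical shared-memory state, maintained by the accounting patches */
extern pg_atomic_uint64 *GlobalWorkMemUsed;	/* bytes currently reserved */
extern ConditionVariable *GlobalWorkMemCV;	/* broadcast when usage drops */

/* hypothetically called from StartTransaction() before doing any work */
static void
DeferTransactionStartIfOverLimit(void)
{
	TimestampTz start;

	if (global_work_mem_limit_kb == 0)
		return;

	start = GetCurrentTimestamp();
	ConditionVariablePrepareToSleep(GlobalWorkMemCV);

	while (pg_atomic_read_u64(GlobalWorkMemUsed) >
		   (uint64) global_work_mem_limit_kb * 1024)
	{
		long		remaining_ms;

		/* give up once the deferral timeout has elapsed */
		remaining_ms = transaction_defer_timeout_ms -
			TimestampDifferenceMilliseconds(start, GetCurrentTimestamp());
		if (remaining_ms <= 0)
		{
			ConditionVariableCancelSleep();
			ereport(ERROR,
					(errcode(ERRCODE_CONFIGURATION_LIMIT_EXCEEDED),
					 errmsg("timed out waiting for global work_mem headroom")));
		}

		/* woken whenever some backend releases work_mem, or on timeout */
		ConditionVariableTimedSleep(GlobalWorkMemCV, remaining_ms,
									WAIT_EVENT_PG_SLEEP);
	}

	ConditionVariableCancelSleep();
}

The part I like is that this gates transactions rather than connections, so an over-limit system drains naturally as running transactions finish instead of refusing logins.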
I agree that, no matter what, everything being proposed would rely on having metrics on actual work_mem consumption. That would definitely be a good feature on its own. I’m thinking of adding “work_mem_bytes” and “work_mem_operations” to pg_stat_activity (where “work_mem_operations” would tell you how many different things were using work_mem in the backend).
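As a strawman for where those numbers would come from: the counters could live next to the rest of the per-backend status and get bumped wherever an executor node reserves work_mem-bounded memory. Again, every name here is made up; the only point is that the bookkeeping itself is cheap:

#include "postgres.h"

/*
 * Hypothetical per-backend counters behind the proposed pg_stat_activity
 * columns.  A sort, hash join, etc. would report when it reserves up to
 * work_mem and again when it releases that memory.
 */
typedef struct WorkMemUsage
{
	uint64		work_mem_bytes;			/* bytes currently reserved */
	uint32		work_mem_operations;	/* distinct consumers holding memory */
} WorkMemUsage;

static WorkMemUsage MyWorkMemUsage;

/* hypothetical: called when an executor node reserves work_mem-bounded memory */
void
pgstat_report_work_mem_acquire(uint64 bytes)
{
	MyWorkMemUsage.work_mem_bytes += bytes;
	MyWorkMemUsage.work_mem_operations++;
	/* ...then publish the struct in shared memory for pg_stat_activity... */
}

/* hypothetical: called when that memory is released */
void
pgstat_report_work_mem_release(uint64 bytes)
{
	Assert(MyWorkMemUsage.work_mem_operations > 0);
	MyWorkMemUsage.work_mem_bytes -= bytes;
	MyWorkMemUsage.work_mem_operations--;
}

With something like that in place, the deferral gate above (or a fancier workload manager, for that matter) would have real numbers to work from, which I think is the prerequisite either way.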
Incidentally, something related to this that I’ve seen is backend memory consumption slowly growing over time. Unbounded growth of relcache and friends was presumably the biggest contributor. There’s an argument to be made for a view dedicated to tracking per-backend memory stats, with additional info about things contributing to idle memory consumption.