Re: Add the ability to limit the amount of memory that can be allocated to backends. - Mailing list pgsql-hackers
From: Tomas Vondra
Subject: Re: Add the ability to limit the amount of memory that can be allocated to backends.
Msg-id: 4fb99fb7-8a6a-2828-dd77-e2f1d75c7dd0@enterprisedb.com
In response to: Re: Add the ability to limit the amount of memory that can be allocated to backends. (reid.thompson@crunchydata.com)
List: pgsql-hackers
Hi,

I wanted to take a look at the patch, and I noticed it's broken since 3d51cb5197 renamed a couple of pgstat functions in August. I plan to maybe do some benchmarks etc., preferably on current master, so here's a version fixing that minor bitrot.

As for the patch, I only skimmed through the thread so far, to get some idea of what the approach and goals are, etc. I didn't look at the code yet, so I can't comment on that. However, at pgconf.eu a couple of weeks ago I had quite a few discussions about how such a "backend memory limit" could/should work in principle, and I've been thinking about ways to implement this. So let me share some thoughts about how this patch aligns with that ...

(FWIW it's not my intent to hijack or derail this patch in any way, but there are a couple of things I think we should do differently.)

I'm 100% on board with having a memory limit "above" work_mem. It's really annoying that we have no way to restrict the amount of memory a backend can allocate for complex queries, etc.

But I find it a bit strange that we aim to introduce a "global" memory limit for all backends combined first. I'm not against having that too, but it's not the feature I usually wish to have. I need some protection against runaway backends that happen to allocate a lot of memory. Similarly, I'd like to be able to have different limits depending on what the backend does - a backend doing OLAP may naturally need more memory, while a backend doing OLTP may have a much tighter limit.

But with a single global limit none of this is possible. It may help reduce the risk of unexpected OOM issues (not 100%, but still useful), but it can't limit the impact to the one backend - if memory starts running out, it will affect all the other backends somewhat randomly (depending on the order in which the backends happen to allocate memory). And it does not consider what workloads the backends execute.

Let me propose a slightly different architecture that I imagined while thinking about this.
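To make the contrast concrete, here is a purely hypothetical configuration sketch - these GUC names are illustrative assumptions (the patch's actual GUC may be named differently), meant only to show the difference between one global limit and per-backend limits:

```
# hypothetical: one global limit for all backends combined
# (roughly the shape of what the current patch provides)
max_total_backend_memory = '16GB'

# hypothetical: a per-backend limit, protecting against a single
# runaway backend regardless of what the other backends do
backend_memory_limit = '512MB'
```

If such a per-backend GUC were user-settable, the per-workload limits would fall out naturally, e.g. ALTER ROLE olap_user SET backend_memory_limit = '4GB'; for the analytics users.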
It's not radically different from what the patch does, but it focuses on the local accounting first. I believe it's possible to extend this to enforce the global limit too. FWIW I haven't tried implementing this - I don't want to "hijack" this thread and do my own thing. I can take a stab at a PoC if needed.

Firstly, I'm not quite happy with how all the memory contexts have to do their own version of the accounting and memory checking. I think we should move that into a new abstraction which I call a "memory pool". It's very close to a "memory context", but it only deals with allocating blocks, not the chunks requested by palloc() etc. So when someone does palloc(), that may call AllocSetAlloc(), and instead of doing malloc() directly that would do MemoryPoolAlloc(blksize), which would do all the accounting and checks, and only then do the malloc().

This may sound like an unnecessary indirection, but the idea is that a single memory pool would back many memory contexts (perhaps all of them for a given backend). In principle we might even establish separate memory pools for different parts of the memory context hierarchy, but I'm not sure we need that. I can imagine the pool also caching blocks for cases when we create and destroy contexts very often, but glibc already does that for us, I believe. For me, the accounting and the memory limit are the primary goal. I wonder if we considered this context/pool split while working on the accounting for hash aggregate, but I think we were too attached to doing all of it in the memory context hierarchy.

Of course, this memory pool is per-backend, and so would be the memory accounting and the limit enforced by it. But I can imagine extending it to do a global limit similar to what the current patch does - using a counter in shared memory, or something like that. I haven't reviewed what the overhead is, or how it handles cases when a backend terminates in some unexpected way. But whatever the current patch does, the memory pool could do too.
Secondly, I think there's an annoying issue with the accounting at the block level - it makes it problematic to use low limit values. We double the block sizes, so we may quickly end up with a block size of a couple of MBs, which means the accounting granularity gets very coarse.

I think it'd be useful to introduce a "backpressure" between the memory pool and the memory context, depending on how close we are to the limit. For example, if the limit is 128MB and the backend has allocated 16MB so far, we're pretty far away from the limit. So if the backend requests an 8MB block, that's fine and the memory pool should malloc() it. But if we've already allocated 100MB, maybe we should be more careful and not allow 8MB blocks - the memory pool should be allowed to override this and return just a 1MB block.

Sure, this would have to be optional, and not all places can accept a smaller block than requested (e.g. when the chunk would not fit into the smaller block). It would require a suitable memory pool API and more work in the memory contexts, but it seems pretty useful. Certainly not something for v1.

regards

--
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company