Re: memory leak in trigger handling (since PG12) - Mailing list pgsql-hackers
From: Tomas Vondra
Subject: Re: memory leak in trigger handling (since PG12)
Msg-id: 9574e4b6-3334-777b-4287-29d81151963a@enterprisedb.com
In response to: Re: memory leak in trigger handling (since PG12) (Jakub Wartak <jakub.wartak@enterprisedb.com>)
List: pgsql-hackers
On 5/24/23 10:19, Jakub Wartak wrote:
> Hi, just two cents:
>
> On Tue, May 23, 2023 at 8:01 PM Andres Freund <andres@anarazel.de> wrote:
>>
>> Hi,
>>
>> On 2023-05-23 13:28:30 -0400, Tom Lane wrote:
>>> Andres Freund <andres@anarazel.de> writes:
>>>> Could it help to have a mode where the executor shutdown hook checks how much
>>>> memory is allocated in ExecutorState and warns if it's too much?
>>>
>>> It'd be very hard to set a limit for what's "too much", since the amount
>>> of stuff created initially will depend on the plan size.
>>
>> I was thinking of some limit that should really never be reached outside of a
>> leak or work_mem based allocations, say 2GB or so.
>
> RE: instrumentation subthread:
> if that helps, the technique below can work reasonably well on normal
> binaries for end users (given there are debug symbols installed), so
> maybe we don't need that much added infrastructure to see the hot
> code path:
>
>   perf probe -x /path/to/postgres 'palloc' 'size=%di:u64'  # RDI on x86_64 (palloc size arg0)
>   perf record -avg --call-graph dwarf -e probe_postgres:palloc -aR -p <pid> sleep 3
>   # cannot be longer, huge overhead (~3s = ~2GB)
>
> it produces:
>
>   50.27% (563d0e380670) size=24
>   |
>   ---palloc
>      bms_copy
>      ExecUpdateLockMode
>      ExecBRUpdateTriggers
>      ExecUpdate
>      [..]
>
>   49.73% (563d0e380670) size=16
>   |
>   ---palloc
>      bms_copy
>      RelationGetIndexAttrBitmap
>      ExecUpdateLockMode
>      ExecBRUpdateTriggers
>      ExecUpdate
>      [..]
>
> Now we know that those small palloc() calls are guilty, but Tomas and I
> didn't know that at the time. The problem here is that palloc() cannot
> tell us - via its own arguments - which MemoryContext the allocation is
> going to land in. This is problematic for perf because on RHEL8 I was
> not able to generate an uprobe capable of reaching a global variable
> (CurrentMemoryContext) at that time.
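Jakub's recipe probes only palloc; a hedged sketch of extending it to the other allocation entry points (repalloc, palloc0, MemoryContextAlloc) might look like the following. The paths and PID are placeholders, and the register mappings are assumptions based on the x86-64 SysV calling convention (arg0 in RDI, arg1 in RSI) - note that for repalloc and MemoryContextAlloc the size is the *second* argument:

```shell
# Sketch only: probe definitions for the main PostgreSQL allocation
# functions, to be recorded together. Requires debug symbols and root.
PG=/path/to/postgres   # placeholder
PID=12345              # placeholder: backend pid

perf probe -x "$PG" palloc             'size=%di:u64'  # palloc(Size size)
perf probe -x "$PG" palloc0            'size=%di:u64'  # palloc0(Size size)
perf probe -x "$PG" repalloc           'size=%si:u64'  # repalloc(void *p, Size size)
perf probe -x "$PG" MemoryContextAlloc 'size=%si:u64'  # MemoryContextAlloc(MemoryContext cxt, Size size)

# Record all of the probes for a short window against one backend.
perf record -g --call-graph dwarf \
    -e 'probe_postgres:palloc' -e 'probe_postgres:palloc0' \
    -e 'probe_postgres:repalloc' -e 'probe_postgres:MemoryContextAlloc' \
    -p "$PID" -- sleep 3

perf report --stdio
```

As with the single-probe version, the dwarf call-graph overhead is large, so the recording window has to stay short.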
I think there are a couple even bigger issues:

(a) There are other methods that allocate memory - repalloc, palloc0,
MemoryContextAlloc, and so on. But presumably we can trace all of them at
once? We'd have to trace resets/deletes too.

(b) This doesn't say if/when the allocated chunks get freed - even with a
fix, we'll still do exactly the same number of allocations, except that
we'll free the memory shortly afterwards. I wonder if we could print a bit
more info for each probe, to match up the alloc/free requests.

> Additionally, what was even more frustrating when diagnosing that case on
> the customer's end system was that such OOMs were crashing other
> PostgreSQL clusters on the same OS. Even knowing the exact guilty
> statement, it was impossible to limit the RSS memory usage of that
> particular backend. So what you are proposing makes a lot of sense.
> It also got me thinking about implementing a safety-net GUC,
> debug_query_limit_backend_memory=X MB, that would inject
> setrlimit(RLIMIT_DATA) through an external extension via hook(s) and
> unset it later - but the man page states it covers mmap() only since
> Linux 4.7+, so it is future-proof but won't work e.g. on RHEL7. Maybe
> that's still good enough? Or maybe hack palloc() a little, but that
> probably has too big an overhead, right? (just thinking out loud)

Not sure about setting a hard limit - that seems pretty tricky and may
easily backfire. It's already possible to set such a memory limit using
e.g. cgroups, or even better, use VMs to isolate the instances.

I certainly agree it's annoying that when OOM hits, we end up with no
information about what used the memory. Maybe we could have a threshold
that triggers a call to MemoryContextStats, so that we have at least some
memory usage info in the log in case the OOM killer intervenes.

regards

--
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company