Thread: Debugging leaking memory in PostgreSQL 13.2 / PostGIS 3.1
Hello,

I have a setup in which I constantly have 8 connections to the database open, which query data for map rendering. The queries involve only a handful of tables, but they can be complex and use PostGIS functions. The connections stay open, and only read requests happen; there are no updates, inserts, or deletes. I observe available memory going down at a rate of roughly 2 GB per hour.

I am running the database inside a Docker container, using the latest postgis/postgis image, which in turn uses the latest postgres image as its base:

PostgreSQL 13.2 (Debian 13.2-1.pgdg100+1) on x86_64-pc-linux-gnu, compiled by gcc (Debian 8.3.0-6) 8.3.0, 64-bit
POSTGIS="3.1.1 aaf4c79" [EXTENSION] PGSQL="130" GEOS="3.7.1-CAPI-1.11.1 27a5e771" PROJ="Rel. 5.2.0, September 15th, 2018" LIBXML="2.9.4" LIBJSON="0.12.1" LIBPROTOBUF="1.3.1" WAGYU="0.5.0 (Internal)"

The RssAnon memory of the backends keeps increasing until the OOM killer kicks in. The wiki suggested to dump MemoryContext states for more details, but something strange happens when attaching gdb. It seems that the process is immediately killed and I can no longer dump such details. Is this related to running within Docker? I added the capabilities and seccomp settings needed for gdb:

cap_add:
  - SYS_PTRACE
security_opt:
  - seccomp:unconfined

With count(*) on pg_stat_activity I see 20 connections. Besides the 8 expected client backends it also lists some parallel workers and other postgres maintenance tasks. work_mem is set to 50MB and shared_buffers to 3500MB. Looking at /proc/meminfo, I think all of this shared memory is statically allocated as HugePages; I see no increase there. The only increase is in RssAnon.

Any ideas on how to debug deeper into this issue? Any idea why gdb is not working as expected? More logs are below. It seems that the backend terminates while gdb is attaching; sending a "cont" immediately does not change that. Any ideas what configuration parameter might help?

I had a similar setup working with older versions of PostgreSQL and PostGIS, and it did not show this constant loss of memory. I can work around the problem by periodically sending a SIGHUP to my clients, which then reconnect to PostgreSQL, and this frees the memory of the backends. So is something kept in memory longer than needed? Others seem to have similar issues using slightly older versions as well:
https://github.com/openstreetmap/mod_tile/issues/181#issuecomment-738303445

I would appreciate some ideas/pointers on how to track this down further.

Logs of OOM (before using hugepages, but this did not change the situation):

postgres invoked oom-killer: gfp_mask=0x100cca(GFP_HIGHUSER_MOVABLE), order=0, oom_score_adj=0 CPU: 9 PID: 3322620 Comm: postgres Not tainted 5.4.0-67-generic #75-Ubuntu Hardware name: Gigabyte Technology Co., Ltd. B360 HD3P-LM/B360HD3PLM-CF, BIOS F7 HZ 07/24/2020 Call Trace: dump_stack+0x6d/0x8b dump_header+0x4f/0x1eb oom_kill_process.cold+0xb/0x10 out_of_memory.part.0+0x1df/0x3d0 out_of_memory+0x6d/0xd0 __alloc_pages_slowpath+0xd5e/0xe50 __alloc_pages_nodemask+0x2d0/0x320 alloc_pages_current+0x87/0xe0 __page_cache_alloc+0x72/0x90 pagecache_get_page+0xbf/0x300 filemap_fault+0x6b2/0xa50 ? unlock_page_memcg+0x12/0x20 ? page_add_file_rmap+0xff/0x1a0 ? xas_load+0xd/0x80 ? xas_find+0x17f/0x1c0 ? filemap_map_pages+0x24c/0x380 ext4_filemap_fault+0x32/0x50 __do_fault+0x3c/0x130 do_fault+0x24b/0x640 ?
__switch_to_asm+0x34/0x70 __handle_mm_fault+0x4c5/0x7a0 handle_mm_fault+0xca/0x200 do_user_addr_fault+0x1f9/0x450 __do_page_fault+0x58/0x90 do_page_fault+0x2c/0xe0 page_fault+0x34/0x40 RIP: 0033:0x558ddf0beded Code: Bad RIP value. RSP: 002b:00007ffe5214a020 EFLAGS: 00010202 RAX: 00007fea26b16b28 RBX: 0000000000000028 RCX: 00007fea26b16b68 RDX: 0000000000000028 RSI: 0000000000000000 RDI: 00007fea26b16b28 RBP: 0000000000000010 R08: 00007fea26b16b28 R09: 0000000000000019 R10: 0000000000000001 R11: 0000000000000001 R12: 00000000ffffffff R13: 00007fea26af76d8 R14: 00007fea26af7728 R15: 0000000000000000 Mem-Info: active_anon:29797121 inactive_anon:2721653 isolated_anon:32 active_file:323 inactive_file:83 isolated_file:0 unevictable:16 dirty:14 writeback:0 unstable:0 slab_reclaimable:85925 slab_unreclaimable:106003 mapped:1108591 shmem:14943591 pagetables:69567 bounce:0 free:148637 free_pcp:1619 free_cma:0 Node 0 active_anon:119188484kB inactive_anon:10886612kB active_file:1292kB inactive_file:332kB unevictable:64kB isolated(anon):128kB isolated(file):0kB mapped:4434364kB dirty:56kB writeback:0kB shmem:59774364kB shmem_thp: 0kB shmem_pmdmapped: 0kB anon_thp: 0kB writeback_tmp:0kB unstable:0kB all_unreclaimable? no Node 0 DMA free:15904kB min:8kB low:20kB high:32kB active_anon:0kB inactive_anon:0kB active_file:0kB inactive_file:0kB unevictable:0kB writepending:0kB present:15988kB managed:15904kB mlocked:0kB kernel_stack:0kB pagetables:0kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB lowmem_reserve[]: 0 809 128670 128670 128670 Node 0 DMA32 free:511716kB min:424kB low:1252kB high:2080kB active_anon:328544kB inactive_anon:8540kB active_file:204kB inactive_file:0kB unevictable:0kB writepending:0kB present:947448kB managed:881912kB mlocked:0kB kernel_stack:0kB pagetables:416kB bounce:0kB free_pcp:1412kB local_pcp:288kB free_cma:0kB lowmem_reserve[]: 0 0 127860 127860 127860 Node 0 Normal free:66928kB min:67148kB low:198076kB high:329004kB active_anon:118859812kB inactive_anon:10877988kB active_file:1812kB inactive_file:1480kB unevictable:64kB writepending:56kB present:133160960kB managed:130937320kB mlocked:64kB kernel_stack:18336kB pagetables:277852kB bounce:0kB free_pcp:5064kB local_pcp:384kB free_cma:0kB lowmem_reserve[]: 0 0 0 0 0 Node 0 DMA: 0*4kB 0*8kB 0*16kB 1*32kB (U) 2*64kB (U) 1*128kB (U) 1*256kB (U) 0*512kB 1*1024kB (U) 1*2048kB (M) 3*4096kB (M) = 15904kB Node 0 DMA32: 1388*4kB (UME) 1274*8kB (UME) 1772*16kB (UE) 1564*32kB (UME) 1082*64kB (UE) 514*128kB (UME) 160*256kB (UE) 46*512kB (UME) 23*1024kB (UE) 11*2048kB (UME) 42*4096kB (UME) = 511808kB Node 0 Normal: 715*4kB (UEH) 108*8kB (UEH) 1400*16kB (UMEH) 1156*32kB (UMEH) 58*64kB (UMEH) 7*128kB (UMEH) 1*256kB (U) 0*512kB 0*1024kB 0*2048kB 0*4096kB = 67980kB Node 0 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=1048576kB Node 0 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=2048kB 14982418 total pagecache pages 38157 pages in swap cache Swap cache stats: add 14086008, delete 14047525, find 102734363/105748939 Free swap = 0kB Total swap = 4189180kB 33531099 pages RAM 0 pages HighMem/MovableOnly 572315 pages reserved 0 pages cma reserved 0 pages hwpoisoned oom-kill:constraint=CONSTRAINT_NONE,nodemask=(null),cpuset=fae7f79d710f4449fd87c58f38eb164a470e3f837b33630c6c10a9fbca10a82b,mems_allowed=0,global_oom,task_memcg=/docker/e88b65d20ef39588c6bf9c00e7aa2946f134a61a6195c210f7081d7ed4d9a5fa,task=postgres,pid=2453393,uid=999 Out of memory: Killed process 2453393 (postgres) total-vm:10625568kB, 
anon-rss:6768188kB, file-rss:4kB, shmem-rss:3592772kB, UID:999 pgtables:20756kB oom_score_adj:0 oom_reaper: reaped process 2453393 (postgres), now anon-rss:0kB, file-rss:0kB, shmem-rss:3592772kB Logs of gdb attach # gdb -p 1296 -ex cont GNU gdb (Debian 8.2.1-2+b3) 8.2.1 Copyright (C) 2018 Free Software Foundation, Inc. License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html> This is free software: you are free to change and redistribute it. There is NO WARRANTY, to the extent permitted by law. Type "show copying" and "show warranty" for details. This GDB was configured as "x86_64-linux-gnu". Type "show configuration" for configuration details. For bug reporting instructions, please see: <http://www.gnu.org/software/gdb/bugs/>. Find the GDB manual and other documentation resources online at: <http://www.gnu.org/software/gdb/documentation/>. For help, type "help". Type "apropos word" to search for commands related to "word". Attaching to process 1296 Reading symbols from /usr/lib/postgresql/13/bin/postgres...Reading symbols from /usr/lib/debug/.build-id/31/ae2853776500091d313e76cf679017e697884b.debug...done. done. Reading symbols from /lib/x86_64-linux-gnu/libpthread.so.0...Reading symbols from /usr/lib/debug/.build-id/e9/1114987a0147bd050addbd591eb8994b29f4b3.debug...done. done. [Thread debugging using libthread_db enabled] Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1". Reading symbols from /usr/lib/x86_64-linux-gnu/libxml2.so.2...(no debugging symbols found)...done. Reading symbols from /lib/x86_64-linux-gnu/libpam.so.0...(no debugging symbols found)...done. Reading symbols from /usr/lib/x86_64-linux-gnu/libssl.so.1.1...(no debugging symbols found)...done. Reading symbols from /usr/lib/x86_64-linux-gnu/libcrypto.so.1.1...(no debugging symbols found)...done. Reading symbols from /usr/lib/x86_64-linux-gnu/libgssapi_krb5.so.2...(no debugging symbols found)...done. Reading symbols from /lib/x86_64-linux-gnu/librt.so.1...Reading symbols from /usr/lib/debug/.build-id/5d/cf98ad684962be494af28a1051793fd39e4ebc.debug...done. done. Reading symbols from /lib/x86_64-linux-gnu/libdl.so.2...Reading symbols from /usr/lib/debug/.build-id/d3/583c742dd47aaa860c5ae0c0c5bdbcd2d54f61.debug...done. done. Reading symbols from /lib/x86_64-linux-gnu/libm.so.6...Reading symbols from /usr/lib/debug/.build-id/88/5dda4b4a5cea600e7b5b98c1ad86996c8d2299.debug...done. done. Reading symbols from /usr/lib/x86_64-linux-gnu/libldap_r-2.4.so.2...(no debugging symbols found)...done. Reading symbols from /usr/lib/x86_64-linux-gnu/libicui18n.so.63...(no debugging symbols found)...done. Reading symbols from /usr/lib/x86_64-linux-gnu/libicuuc.so.63...(no debugging symbols found)...done. Reading symbols from /lib/x86_64-linux-gnu/libsystemd.so.0...(no debugging symbols found)...done. Reading symbols from /lib/x86_64-linux-gnu/libc.so.6...Reading symbols from /usr/lib/debug/.build-id/18/b9a9a8c523e5cfe5b5d946d605d09242f09798.debug...done. done. Reading symbols from /lib64/ld-linux-x86-64.so.2...Reading symbols from /usr/lib/debug/.build-id/f2/5dfd7b95be4ba386fd71080accae8c0732b711.debug...done. done. Reading symbols from /usr/lib/x86_64-linux-gnu/libicudata.so.63...(no debugging symbols found)...done. Reading symbols from /lib/x86_64-linux-gnu/libz.so.1...(no debugging symbols found)...done. Reading symbols from /lib/x86_64-linux-gnu/liblzma.so.5...(no debugging symbols found)...done. Reading symbols from /lib/x86_64-linux-gnu/libaudit.so.1...(no debugging symbols found)...done. 
Reading symbols from /usr/lib/x86_64-linux-gnu/libkrb5.so.3...(no debugging symbols found)...done. Reading symbols from /usr/lib/x86_64-linux-gnu/libk5crypto.so.3...(no debugging symbols found)...done. Reading symbols from /lib/x86_64-linux-gnu/libcom_err.so.2...(no debugging symbols found)...done. Reading symbols from /usr/lib/x86_64-linux-gnu/libkrb5support.so.0...(no debugging symbols found)...done. Reading symbols from /lib/x86_64-linux-gnu/libkeyutils.so.1...(no debugging symbols found)...done. Reading symbols from /lib/x86_64-linux-gnu/libresolv.so.2...Reading symbols from /usr/lib/debug/.build-id/02/6c3ba167f64f631eb8781fca2269fbc2ee7ca5.debug...done. done. Reading symbols from /usr/lib/x86_64-linux-gnu/liblber-2.4.so.2...(no debugging symbols found)...done. Reading symbols from /usr/lib/x86_64-linux-gnu/libsasl2.so.2...(no debugging symbols found)...done. Reading symbols from /usr/lib/x86_64-linux-gnu/libgnutls.so.30...(no debugging symbols found)...done. Reading symbols from /usr/lib/x86_64-linux-gnu/libstdc++.so.6...(no debugging symbols found)...done. Reading symbols from /lib/x86_64-linux-gnu/libgcc_s.so.1...(no debugging symbols found)...done. Reading symbols from /usr/lib/x86_64-linux-gnu/liblz4.so.1...(no debugging symbols found)...done. Reading symbols from /lib/x86_64-linux-gnu/libgcrypt.so.20...(no debugging symbols found)...done. Reading symbols from /lib/x86_64-linux-gnu/libcap-ng.so.0...(no debugging symbols found)...done. Reading symbols from /usr/lib/x86_64-linux-gnu/libp11-kit.so.0...(no debugging symbols found)...done. Reading symbols from /usr/lib/x86_64-linux-gnu/libidn2.so.0...(no debugging symbols found)...done. Reading symbols from /usr/lib/x86_64-linux-gnu/libunistring.so.2...(no debugging symbols found)...done. Reading symbols from /usr/lib/x86_64-linux-gnu/libtasn1.so.6...(no debugging symbols found)...done. Reading symbols from /usr/lib/x86_64-linux-gnu/libnettle.so.6...(no debugging symbols found)...done. Reading symbols from /usr/lib/x86_64-linux-gnu/libhogweed.so.4...(no debugging symbols found)...done. Reading symbols from /usr/lib/x86_64-linux-gnu/libgmp.so.10...(no debugging symbols found)...done. Reading symbols from /lib/x86_64-linux-gnu/libgpg-error.so.0...(no debugging symbols found)...done. Reading symbols from /usr/lib/x86_64-linux-gnu/libffi.so.6...(no debugging symbols found)...done. Reading symbols from /lib/x86_64-linux-gnu/libnss_files.so.2...Reading symbols from /usr/lib/debug/.build-id/4b/ff8b782e1602c596e856bdef06e642e50e7fa7.debug...done. done. Reading symbols from /usr/lib/postgresql/13/lib/postgis-3.so...(no debugging symbols found)...done. Reading symbols from /usr/lib/x86_64-linux-gnu/libgeos_c.so.1...(no debugging symbols found)...done. Reading symbols from /usr/lib/x86_64-linux-gnu/libproj.so.13...(no debugging symbols found)...done. Reading symbols from /usr/lib/x86_64-linux-gnu/libjson-c.so.3...(no debugging symbols found)...done. Reading symbols from /usr/lib/x86_64-linux-gnu/libprotobuf-c.so.1...(no debugging symbols found)...done. Reading symbols from /usr/lib/x86_64-linux-gnu/libSFCGAL.so.1...(no debugging symbols found)...done. Reading symbols from /usr/lib/x86_64-linux-gnu/libgeos-3.7.1.so...(no debugging symbols found)...done. Reading symbols from /usr/lib/x86_64-linux-gnu/libCGAL_Core.so.13...(no debugging symbols found)...done. Reading symbols from /usr/lib/x86_64-linux-gnu/libboost_program_options.so.1.67.0...(no debugging symbols found)...done. 
Reading symbols from /usr/lib/x86_64-linux-gnu/libboost_chrono.so.1.67.0...(no debugging symbols found)...done. Reading symbols from /usr/lib/x86_64-linux-gnu/libboost_filesystem.so.1.67.0...(no debugging symbols found)...done. Reading symbols from /usr/lib/x86_64-linux-gnu/libboost_timer.so.1.67.0...(no debugging symbols found)...done. Reading symbols from /usr/lib/x86_64-linux-gnu/libboost_unit_test_framework.so.1.67.0...(no debugging symbols found)...done. Reading symbols from /usr/lib/x86_64-linux-gnu/libboost_thread.so.1.67.0...(no debugging symbols found)...done. Reading symbols from /usr/lib/x86_64-linux-gnu/libboost_system.so.1.67.0...(no debugging symbols found)...done. Reading symbols from /usr/lib/x86_64-linux-gnu/libboost_serialization.so.1.67.0...(no debugging symbols found)...done. Reading symbols from /usr/lib/x86_64-linux-gnu/libboost_date_time.so.1.67.0...(no debugging symbols found)...done. Reading symbols from /usr/lib/x86_64-linux-gnu/libboost_atomic.so.1.67.0...(no debugging symbols found)...done. Reading symbols from /usr/lib/x86_64-linux-gnu/libCGAL.so.13...(no debugging symbols found)...done. Reading symbols from /usr/lib/x86_64-linux-gnu/libgmpxx.so.4...(no debugging symbols found)...done. Reading symbols from /usr/lib/x86_64-linux-gnu/libmpfr.so.6...(no debugging symbols found)...done. Reading symbols from /usr/lib/postgresql/13/lib/hstore.so...Reading symbols from /usr/lib/debug/.build-id/ae/d0b2a75765cea927097e4d89b4e2f39a5534be.debug...done. done. Reading symbols from /usr/lib/postgresql/13/lib/llvmjit.so...Reading symbols from /usr/lib/debug/.build-id/d3/f52dce996ffe1929204774abcd4fa75fbb317e.debug...done. done. Reading symbols from /usr/lib/x86_64-linux-gnu/libLLVM-7.so.1...(no debugging symbols found)...done. Reading symbols from /usr/lib/x86_64-linux-gnu/libedit.so.2...(no debugging symbols found)...done. Reading symbols from /lib/x86_64-linux-gnu/libtinfo.so.6...(no debugging symbols found)...done. Reading symbols from /usr/lib/x86_64-linux-gnu/libbsd.so.0...(no debugging symbols found)...done. 0x00007fe4bb39a7b7 in epoll_wait (epfd=4, events=0x55d1517d4990, maxevents=maxevents@entry=1, timeout=timeout@entry=-1) at ../sysdeps/unix/sysv/linux/epoll_wait.c:30 30 ../sysdeps/unix/sysv/linux/epoll_wait.c: No such file or directory. Continuing. Program received signal SIGQUIT, Quit. 0x00007fe4bb39a7b7 in epoll_wait (epfd=4, events=0x55d1517d4990, maxevents=maxevents@entry=1, timeout=timeout@entry=-1) at ../sysdeps/unix/sysv/linux/epoll_wait.c:30 30 in ../sysdeps/unix/sysv/linux/epoll_wait.c (gdb) p MemoryContextStats(TopMemoryContext) Couldn't get extended state status: No such process. 
(gdb) bt
#0  0x00007fe4bb39a7b7 in epoll_wait (epfd=4, events=0x55d1517d4990, maxevents=1, maxevents@entry=<error reading variable: Cannot access memory at address 0x7fff7515518c>, timeout=-1, timeout@entry=<error reading variable: Cannot access memory at address 0x7fff75155178>) at ../sysdeps/unix/sysv/linux/epoll_wait.c:30
#1  0x000055d14fe9d089 in WaitEventSetWaitBlock (nevents=<error reading variable: Cannot access memory at address 0x7fff7515518c>, occurred_events=<error reading variable: Cannot access memory at address 0x7fff75155168>, cur_timeout=<error reading variable: Cannot access memory at address 0x7fff75155178>, set=0x55d1517d4918) at ./build/../src/backend/storage/ipc/latch.c:1295
#2  WaitEventSetWait (set=0x55d1517d4918, timeout=<error reading variable: Cannot access memory at address 0x7fff75155170>, occurred_events=<error reading variable: Cannot access memory at address 0x7fff75155168>, nevents=<error reading variable: Cannot access memory at address 0x7fff7515518c>, wait_event_info=<optimized out>) at ./build/../src/backend/storage/ipc/latch.c:1247
Backtrace stopped: Cannot access memory at address 0x7fff75155208
(gdb) cont
Continuing.
Program terminated with signal SIGKILL, Killed.
The program no longer exists.
(gdb) detach
The program is not being run.
(gdb) quit
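For tracking which backend is actually growing between OOM events, a minimal shell sketch along these lines can log RssAnon per backend over time. The pgrep pattern and the one-minute interval are only placeholders, and it assumes the backends are visible to the shell it runs in (on the host or inside the container):

    #!/bin/bash
    # Log RssAnon (the counter that keeps growing in /proc/<pid>/status) for
    # every postgres backend once per minute. Purely illustrative; adjust the
    # pattern and interval as needed.
    while true; do
        date
        for pid in $(pgrep -f 'postgres:'); do
            printf '%7d %s\n' "$pid" "$(awk '/^RssAnon/ {print $2, $3}' /proc/$pid/status)"
        done
        sleep 60
    done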
Stephan Knauss <pgsql@stephans-server.de> writes:
> The wiki suggested to dump MemoryContext states for more details, but
> something strange happens when attaching gdb. It seems that the process
> is immediately killed and I can no longer dump such details.

You might try running the postmaster under a restrictive ulimit (I think the -v option is the one that matters on Linux, not -d as you might guess). The idea here is that the backends would get an actual ENOMEM failure from malloc() before reaching the point where the kernel's OOM-kill behavior takes over. Given that, they'd dump memory maps to stderr of their own accord, and you could maybe get some insight as to what's leaking. This'd also reduce the severity of the problem when it does happen.

regards, tom lane
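In a Docker setup, one way to try this is a wrapper entrypoint that lowers the address-space limit before handing off to the stock image entrypoint. A rough sketch, assuming the official image's docker-entrypoint.sh and an arbitrary 6 GiB cap (not a tuned value); the limit is inherited by every backend the postmaster forks:

    #!/bin/bash
    # Hypothetical wrapper entrypoint: cap per-process virtual memory so that
    # malloc() fails with ENOMEM (and PostgreSQL dumps its memory context
    # statistics to stderr) before the kernel OOM killer gets involved.
    ulimit -v $((6 * 1024 * 1024))      # ulimit -v takes kilobytes; 6 GiB here
    exec docker-entrypoint.sh postgres "$@"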
On 30.03.2021 20:46, Tom Lane wrote:
> Stephan Knauss <pgsql@stephans-server.de> writes:
>> The wiki suggested to dump MemoryContext states for more details, but
>> something strange happens when attaching gdb. It seems that the process
>> is immediately killed and I can no longer dump such details.
> (I think the -v option is the one that matters on Linux, not -d
> as you might guess). The idea here is that the backends would
> get an actual ENOMEM failure from malloc() before reaching the
> point where the kernel's OOM-kill behavior takes over. Given
> that, they'd dump memory maps to stderr of their own accord,
> and you could maybe get some insight as to what's leaking.
> This'd also reduce the severity of the problem when it does
> happen.

Hello Tom,

the output below looks similar to the OOM output you expected. Can you give a hint how to interpret the results?

I had a backend which had already allocated a larger amount of memory, so I gave "gcore -a" a try. In contrast to the advertised behavior, the process did not continue to run, but I got a core file at least. This is probably related to gcore just calling gdb attach, which somehow triggers a SIGKILL of all backends. At 4.2 GB in size, the core file hopefully contains most of the relevant memory structures.

Without a running process I still cannot call MemoryContextStats(), but I found a macro which claims to decode the memory structure post mortem:
https://www.cybertec-postgresql.com/en/checking-per-memory-context-memory-consumption/

This gave me the memory structure below. How should it be interpreted? It looks like the sizes are in bytes, since the macro calculates with pointers, but the numbers look a bit small given that I had a backend with roughly 6 GB of RSS memory. I thought it might print the overall size of a context and then indent and print the memory of its children, but the numbers indicate this is not the case, since a higher level can be smaller than its children:

CachedPlanSource: 67840
unnamed prepared statement: 261920

So how should it be read, and is there any indication why I have a constantly increasing memory footprint? Is there any indication where multiple gigabytes are allocated?

root@0ec98d20bda2:/# gdb /usr/lib/postgresql/13/bin/postgres core.154218 <gdb-context
GNU gdb (Debian 8.2.1-2+b3) 8.2.1
Copyright (C) 2018 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law. Type "show copying" and "show warranty" for details.
This GDB was configured as "x86_64-linux-gnu".
Type "show configuration" for configuration details.
For bug reporting instructions, please see:
<http://www.gnu.org/software/gdb/bugs/>.
Find the GDB manual and other documentation resources online at:
<http://www.gnu.org/software/gdb/documentation/>.
For help, type "help".
Type "apropos word" to search for commands related to "word"...
Reading symbols from /usr/lib/postgresql/13/bin/postgres...Reading symbols from /usr/lib/debug/.build-id/31/ae2853776500091d313e76cf679017e697884b.debug...done.
done.
warning: core file may not match specified executable file.
[New LWP 154218]
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
Core was generated by `postgres: osm gis 172.20.0.3(51894) idle'.
#0 0x00007fc01cfa07b7 in epoll_wait (epfd=4, events=0x55f403584080, maxevents=maxevents@entry=1, timeout=timeout@entry=-1) at ../sysdeps/unix/sysv/linux/epoll_wait.c:30 30 ../sysdeps/unix/sysv/linux/epoll_wait.c: No such file or directory. (gdb) >>>> > > >>>(gdb) (gdb) >>>> > > >>>>> > > >>(gdb) (gdb) TopMemoryContext: 109528 dynahash: 7968 HandleParallelMessages: 7968 dynahash: 7968 dynahash: 7968 dynahash: 7968 dynahash: 24392 dynahash: 24352 RowDescriptionContext: 24352 MessageContext: 7968 dynahash: 7968 dynahash: 32544 TransactionAbortContext: 32544 dynahash: 7968 TopPortalContext: 7968 dynahash: 16160 CacheMemoryContext: 1302944 CachedPlan: 138016 CachedPlanSource: 67840 unnamed prepared statement: 261920 index info: 1824 index info: 1824 index info: 3872 index info: 1824 index info: 1824 index info: 3872 index info: 3872 index info: 3872 index info: 1824 index info: 3872 relation rules: 32544 index info: 1824 index info: 1824 index info: 1824 index info: 3872 relation rules: 24352 index info: 3872 index info: 3872 index info: 1824 index info: 3872 index info: 3872 index info: 3872 index info: 1824 index info: 3872 index info: 1824 index info: 3872 relation rules: 32544 index info: 1824 index info: 2848 index info: 1824 index info: 3872 index info: 3872 index info: 3872 index info: 3872 index info: 3872 index info: 3872 index info: 3872 index info: 1824 index info: 3872 index info: 1824 index info: 1824 relation rules: 32544 index info: 1824 index info: 2848 index info: 1824 index info: 800 index info: 1824 index info: 800 index info: 800 index info: 2848 index info: 1824 index info: 800 index info: 800 index info: 800 index info: 2848 index info: 1824 index info: 1824 --Type <RET> for more, q to quit, c to continue without paging-- index info: 2848 index info: 1824 index info: 1824 index info: 800 index info: 1824 index info: 800 index info: 800 index info: 800 index info: 2848 index info: 2848 index info: 1824 index info: 1824 index info: 800 index info: 800 index info: 2848 index info: 800 index info: 1824 index info: 1824 index info: 800 index info: 1824 index info: 1824 index info: 1824 index info: 800 index info: 1824 index info: 1824 index info: 1824 index info: 800 index info: 2848 index info: 2848 index info: 2848 index info: 800 index info: 800 index info: 1824 index info: 1824 index info: 1824 index info: 800 index info: 1824 index info: 1824 index info: 2848 index info: 1824 index info: 1824 index info: 1824 index info: 1824 index info: 800 index info: 1824 index info: 2848 index info: 800 index info: 1824 index info: 800 index info: 1824 index info: 1824 index info: 800 index info: 1824 index info: 1824 index info: 1824 index info: 800 index info: 1824 index info: 2848 index info: 1824 index info: 1824 index info: 1824 index info: 1824 index info: 1824 index info: 1824 index info: 1824 WAL record construction: 49544 dynahash: 7968 MdSmgr: 7968 dynahash: 16160 dynahash: 103896 ErrorContext: 7968 (gdb) quit root@0ec98d20bda2:/# cat gdb-context define sum_context_blocks set $context = $arg0 set $block = ((AllocSet) $context)->blocks set $size = 0 while ($block) set $size = $size + (((AllocBlock) $block)->endptr - ((char *) $block)) set $block = ((AllocBlock) $block)->next end printf "%s: %d\n",((MemoryContext)$context)->name, $size end define walk_contexts set $parent_$arg0 = ($arg1) set $indent_$arg0 = ($arg0) set $i_$arg0 = $indent_$arg0 while ($i_$arg0) printf " " set $i_$arg0 = $i_$arg0 - 1 end sum_context_blocks $parent_$arg0 set $child_$arg0 = ((MemoryContext) 
$parent_$arg0)->firstchild set $indent_$arg0 = $indent_$arg0 + 1 while ($child_$arg0) walk_contexts $indent_$arg0 $child_$arg0 set $child_$arg0 = ((MemoryContext) $child_$arg0)->nextchild end end walk_contexts 0 TopMemoryContext
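For reference, instead of redirecting the macro file into an interactive gdb session as above, it can also be sourced non-interactively. A small sketch, assuming the file shown above (which ends with the "walk_contexts 0 TopMemoryContext" invocation) is saved as ./gdb-context next to the core file:

    # Walk the memory contexts of the core dump without an interactive session.
    gdb -batch -x gdb-context /usr/lib/postgresql/13/bin/postgres core.154218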
Stephan Knauss <pgsql@stephans-server.de> writes:
> Hello Tom, the output below looks similar to the OOM output you
> expected. Can you give a hint how to interpret the results?

Looks like the answer is that wherever the leak is, it's not accounted for by this info; none of those contexts are particularly large.

Based on nearby threads, it occurs to me to ask whether you have JIT enabled, and if so whether turning it off helps. There seems to be a known leak of the code fragments generated by that in some cases.

If that's not it, then the leak must be accumulating through plain old malloc calls. There's not much of that in the core backend (although if you use ispell text search dictionaries, maybe [1] is relevant), so my suspicions would next fall on any extensions you might be using.

regards, tom lane

[1] https://git.postgresql.org/gitweb/?p=postgresql.git&a=commitdiff&h=eba939551
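A quick way to act on that suggestion is to check the current setting and then disable JIT cluster-wide for a test; a sketch (the database name gis is taken from the session shown earlier in the thread):

    # Check the current setting:
    psql -d gis -c "SHOW jit;"

    # Turn it off cluster-wide for a test (needs superuser); running backends
    # pick up the new default on reload unless they have SET jit themselves:
    psql -d gis -c "ALTER SYSTEM SET jit = off;"
    psql -d gis -c "SELECT pg_reload_conf();"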
> On Mar 31, 2021, at 11:24 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>
> Stephan Knauss <pgsql@stephans-server.de> writes:
>> Hello Tom, the output below looks similar to the OOM output you
>> expected. Can you give a hint how to interpret the results?
>
> Looks like the answer is that wherever the leak is, it's not accounted
> for by this info; none of those contexts are particularly large.
>
> Based on nearby threads, it occurs to me to ask whether you have JIT
> enabled, and if so whether turning it off helps. There seems to be
> a known leak of the code fragments generated by that in some cases.
>
> If that's not it, then the leak must be accumulating through plain
> old malloc calls. There's not much of that in the core backend
> (although if you use ispell text search dictionaries, maybe [1] is
> relevant), so my suspicions would next fall on any extensions you
> might be using.

I would be interested in the queries being run. We have a reproducible leak in the <-> geography operator that we have been unable to track down.

P

> regards, tom lane
>
> [1] https://git.postgresql.org/gitweb/?p=postgresql.git&a=commitdiff&h=eba939551
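For readers unfamiliar with it, <-> on geography is the KNN distance operator; a purely hypothetical query shape exercising it (the table and column names are invented, not from this thread) would look like:

    # Hypothetical shape of a geography KNN query, only to illustrate the
    # <-> operator Paul refers to; "places" and "geog" are made-up names.
    psql -d gis -c "
      SELECT name
      FROM places
      ORDER BY geog <-> 'SRID=4326;POINT(13.4 52.5)'::geography
      LIMIT 10;"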
On 31.03.2021 20:27, Paul Ramsey wrote: > On Mar 31, 2021, at 11:24 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote: >> If that's not it, then the leak must be accumulating through plain >> old malloc calls. There's not much of that in the core backend >> (although if you use ispell text search dictionaries, maybe [1] is >> relevant), so my suspicions would next fall on any extensions you >> might be using. > Would be interested in the queries being run. We have a reproduceable leak in <-> geography operator that we have beenunable to track down. > I tried to get any idea where the memory is by feeding my core file into chap. I ended up not knowing how to proceed further. I found a std::vector which was roughly 70 MB in size and a lot of allocations chap could not identify where they belonged to: chap> summarize used Unrecognized allocations have 23098683 instances taking 0xe6973688(3,868,669,576) bytes. Unrecognized allocations of size 0x68 have 8214926 instances taking 0x32ec61b0(854,352,304) bytes. Unrecognized allocations of size 0x28 have 4890673 instances taking 0xba907a8(195,626,920) bytes. Unrecognized allocations of size 0x38 have 3255995 instances taking 0xade38e8(182,335,720) bytes. Unrecognized allocations of size 0x48 have 2096221 instances taking 0x8fefa28(150,927,912) bytes. Unrecognized allocations of size 0x78 have 1792042 instances taking 0xcd153b0(215,045,040) bytes. Unrecognized allocations of size 0x88 have 1393038 instances taking 0xb4ad370(189,453,168) bytes. Unrecognized allocations of size 0x58 have 657193 instances taking 0x3727618(57,832,984) bytes. Unrecognized allocations of size 0xa8 have 360742 instances taking 0x39cc0f0(60,604,656) bytes. Unrecognized allocations of size 0x18 have 270834 instances taking 0x632eb0(6,500,016) bytes. Unrecognized allocations of size 0x98 have 60372 instances taking 0x8c05e0(9,176,544) bytes. Unrecognized allocations of size 0xb8 have 37559 instances taking 0x697388(6,910,856) bytes. Unrecognized allocations of size 0xd8 have 19033 instances taking 0x3ebb18(4,111,128) bytes. Unrecognized allocations of size 0x128 have 19010 instances taking 0x55dc50(5,626,960) bytes. Unrecognized allocations of size 0xe8 have 9526 instances taking 0x21b8f0(2,210,032) bytes. Unrecognized allocations of size 0x118 have 9513 instances taking 0x28a4d8(2,663,640) bytes. Unrecognized allocations of size 0x1f8 have 9499 instances taking 0x490d28(4,787,496) bytes. Unrecognized allocations of size 0x408 have 292 instances taking 0x49920(301,344) bytes. Unrecognized allocations of size 0x2008 have 194 instances taking 0x184610(1,590,800) bytes. Unrecognized allocations of size 0x1008 have 155 instances taking 0x9b4d8(636,120) bytes. Unrecognized allocations of size 0x4008 have 137 instances taking 0x224448(2,245,704) bytes. Unrecognized allocations of size 0x8008 have 133 instances taking 0x428428(4,359,208) bytes. Unrecognized allocations of size 0x10008 have 131 instances taking 0x830418(8,586,264) bytes. Unrecognized allocations of size 0x40008 have 131 instances taking 0x20c0418(34,341,912) bytes. Unrecognized allocations of size 0x100008 have 130 instances taking 0x8200410(136,315,920) bytes. Unrecognized allocations of size 0x20008 have 129 instances taking 0x1020408(16,909,320) bytes. Unrecognized allocations of size 0x80008 have 129 instances taking 0x4080408(67,634,184) bytes. Unrecognized allocations of size 0x200008 have 129 instances taking 0x10200408(270,533,640) bytes. 
Unrecognized allocations of size 0x400008 have 128 instances taking 0x20000400(536,871,936) bytes.
Unrecognized allocations of size 0x418 have 53 instances taking 0xd8f8(55,544) bytes.
[...]
Pattern %VectorBody has 57401 instances taking 0x42eaaf0(70,167,280) bytes.
[...]
Pattern %MapOrSetNode has 51362 instances taking 0x3f2f40(4,140,864) bytes.

Let me know if it would be useful for you to have this (huge) core file with the memory dump available somewhere.

I have executed a few minutes of queries against a single-backend setup. RssAnon grew by about 10 MB during this time. Full query logs are here:
https://downloads.osm-tools.org/postgresql-2021-04-03_183913.csv.gz

I have the following extensions installed. hstore is heavily used, and postgis a little bit:

gis=# SELECT extname FROM pg_extension;
  extname
-----------
 plpgsql
 adminpack
 postgis
 hstore
(4 rows)

As per other suggestions I enabled auto_explain as well. I hope you can get a hint about what is allocating the memory. I could offer to share some Docker images to reproduce this, but it would require a few commands to set up and a few hundred megabytes of download.

Stephan
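Since auto_explain was mentioned: a minimal way to capture plans for a single test session looks roughly like this (the threshold and the stand-in query are placeholders, not the settings actually used in this setup):

    # Load auto_explain for one session and log every plan to the server log;
    # the final SELECT is only a stand-in for a real rendering query.
    psql -d gis -c "LOAD 'auto_explain';
                    SET auto_explain.log_min_duration = 0;
                    SET auto_explain.log_analyze = on;
                    SELECT count(*) FROM pg_class;"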
Hello Tom,

On 31.03.2021 20:24, Tom Lane wrote:
> Based on nearby threads, it occurs to me to ask whether you have JIT
> enabled, and if so whether turning it off helps. There seems to be
> a known leak of the code fragments generated by that in some cases.

That's it! I am quite surprised that functionality which is on by default generates such a massive leak and goes more or less undetected. A single backend was leaking 250 MB/hour; with my multiple connections it was 2 GB. But exactly that happened: doing a "set jit=off" immediately stopped the leak.

You mentioned that this seems to be known. Do you have pointers to the relevant bug-tracker/thread? I would like to follow up on this. I have not measured the impact of JIT, but in theory it could bring larger performance benefits, so having it enabled sounds like a good idea once it stops leaking.

I tried running Valgrind on postgres, but I had not much success with it; processes seemed to terminate quite frequently. My last use of Valgrind was a while ago, and my use case back then was probably much simpler.

Is it known which queries lead to the leak? I still have my recording, including explain output. Would it help to narrow it further down to single queries which leak? Or does the JIT re-create optimized code for each slightly modified query without freeing the old code, so that re-running the same query would not leak?

https://downloads.osm-tools.org/postgresql-2021-04-03_183913.csv.gz

Stephan
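One way to narrow this down to single queries, as asked above, is to check which of the recorded statements actually trigger JIT: with jit on, EXPLAIN (ANALYZE) prints a "JIT:" section whenever code was generated. A sketch with a stand-in query (forcing JIT via jit_above_cost only for the test):

    # Look for the "JIT:" section in the EXPLAIN output; the query here is
    # only a stand-in for one of the recorded rendering queries.
    psql -d gis -c "SET jit = on; SET jit_above_cost = 0;
                    EXPLAIN (ANALYZE) SELECT count(*) FROM pg_class;"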
Stephan Knauss <pgsql@stephans-server.de> writes:
> On 31.03.2021 20:24, Tom Lane wrote:
>> Based on nearby threads, it occurs to me to ask whether you have JIT
>> enabled, and if so whether turning it off helps. There seems to be
>> a known leak of the code fragments generated by that in some cases.
> That's it!
> You mentioned that this seems to be known. Do you have pointers to the
> relevant bug-tracker/thread? I would like to follow up on this.

According to the v14 open issues page [1], there are a couple of older threads. I just added this one.

regards, tom lane

[1] https://wiki.postgresql.org/wiki/PostgreSQL_14_Open_Items