Re: Memory leak (possibly connected to postgis) leading to server crash - Mailing list pgsql-bugs

From Tomas Vondra
Subject Re: Memory leak (possibly connected to postgis) leading to server crash
Date
Msg-id 20191206124119.qe2xaffcbkcbmve3@development
In response to Re: Memory leak (possibly connected to postgis) leading to server crash  (Roman Cervenak <roman@cervenak.info>)
Responses Re: Memory leak (possibly connected to postgis) leading to server crash  (Roman Cervenak <roman@cervenak.info>)
List pgsql-bugs
On Fri, Dec 06, 2019 at 12:46:44PM +0100, Roman Cervenak wrote:
>Yes, it was killed by oom killer:
>
>[2037990.376427] oom-kill:constraint=CONSTRAINT_NONE,nodemask=(null),cpuset=/,mems_allowed=0,global_oom,task_memcg=/system.slice/system-postgresql.slice,task=postgres,pid=52059,uid=111
>[2037990.376433] Out of memory: Kill process 52059 (postgres) score 294 or
>sacrifice child
>[2037990.384186] Killed process 52059 (postgres) total-vm:17508832kB,
>anon-rss:4309296kB, file-rss:108kB, shmem-rss:12641580kB
>[2037990.516504] oom_reaper: reaped process 52059 (postgres), now
>anon-rss:0kB, file-rss:0kB, shmem-rss:12641580kB
>
>(full dmesg.log attached, if it is interesting; there are more postgres
>backends visible, but they were inactive at the time)
>

OK, so the backend was using an extra ~4.3GB of private memory (anon-rss),
on top of the ~12.6GB of shared memory (shmem-rss, i.e. mostly shared
buffers).

>I can try the gdb dump next time I see it. But I cannot imagine giving
>you a reproducible case - it is a 500 GB proprietary database, and without
>it, the queries would hardly be useful, I presume? I can try to make a
>"sample" with generated data and find out if I can reproduce the issue
>with my queries that way, but that will be quite time consuming.
>

Yeah, I understand the data is proprietary. But we need to identify
where the issue is somehow, and being able to reproduce it is likely
the best way to do that. One option is to generate synthetic data with
a similar structure/features (geometries of similar size etc.) and hope
that triggers the issue too. Another is to distill a subset of the data
that triggers the issue and anonymize it. Or something along those lines.

Yes, it's going to be quite time-consuming, but I don't have any better
ideas. Maybe running it under valgrind would help - you'd have to
rebuild PostgreSQL with valgrind support and run the workload on it. The
queries would be much slower (possibly by an order of magnitude or so),
so you'd have to run it on a different machine, but that's machine time,
not time wasted by a human.
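
If you do go that route, the setup would look roughly like this - a
sketch only, untested here; the install prefix and data directory are
placeholders, and defining USE_VALGRIND enables PostgreSQL's extra
valgrind instrumentation:

    # build PostgreSQL from source with valgrind instrumentation
    ./configure --prefix=$HOME/pg-valgrind CFLAGS="-O0 -g" \
        CPPFLAGS="-DUSE_VALGRIND"
    make && make install

    # run the server under valgrind; --trace-children=yes follows the
    # forked backends, writing one log file per process
    valgrind --leak-check=full --trace-children=yes \
        --suppressions=src/tools/valgrind.supp \
        --log-file=/tmp/valgrind.%p.log \
        $HOME/pg-valgrind/bin/postgres -D /path/to/data

Then run the problematic queries against that server and look at the
valgrind log of the backend that bloats.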

Anyway, let's start with the easy stuff - try running it again and get
the memory context stats (see the sketch below). Maybe it's a simple
leak in PostgreSQL, in which case it should be fairly easy to
investigate. If that turns out not to be the case, you can try the more
complicated stuff (valgrind, ...).
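
To grab those, attach gdb to the backend while its memory usage is
growing and call MemoryContextStats - something like this, with 52059
standing in for the pid of the bloated backend:

    # attach to the bloated backend and dump the context stats
    gdb -p 52059
    (gdb) p MemoryContextStats(TopMemoryContext)
    (gdb) detach
    (gdb) quit

Note the stats are written to the backend's stderr, so they show up in
the server log, not in the gdb session.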

>Would it help to dump the memory of the backend process and deliver the
>dump (by some private channel) to somebody to identify what is consuming
>all that memory? (that is the usual drill on Windows)
>

I don't think that'd be very helpful - if this really is a memory leak
in one of the libraries, I have no idea how to spot that in a memory
dump. Also, since you say the data is sensitive/proprietary, you should
not really be sharing the dumps anyway.

regards

-- 
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


