Re: Memory leak (possibly connected to postgis) leading to server crash - Mailing list pgsql-bugs

From Tomas Vondra
Subject Re: Memory leak (possibly connected to postgis) leading to server crash
Date
Msg-id 20191206124119.qe2xaffcbkcbmve3@development
In response to Re: Memory leak (possibly connected to postgis) leading to server crash  (Roman Cervenak <roman@cervenak.info>)
Responses Re: Memory leak (possibly connected to postgis) leading to server crash
List pgsql-bugs
On Fri, Dec 06, 2019 at 12:46:44PM +0100, Roman Cervenak wrote:
>Yes, it was killed by the OOM killer:
>
>[2037990.376427] oom-kill:constraint=CONSTRAINT_NONE,nodemask=(null),cpuset=/,mems_allowed=0,global_oom,task_memcg=/system.slice/system-postgresql.slice,task=postgres,pid=52059,uid=111
>[2037990.376433] Out of memory: Kill process 52059 (postgres) score 294 or
>sacrifice child
>[2037990.384186] Killed process 52059 (postgres) total-vm:17508832kB,
>anon-rss:4309296kB, file-rss:108kB, shmem-rss:12641580kB
>[2037990.516504] oom_reaper: reaped process 52059 (postgres), now
>anon-rss:0kB, file-rss:0kB, shmem-rss:12641580kB
>
>(full dmesg.log attached, if it is interesting; there are more postgres
>backends visible, but they were inactive at the time)
>

OK, so the backend allocated an extra ~4.3GB of local memory (the
anon-rss 4309296kB), on top of the ~12.6GB of shared buffers (the
shmem-rss 12641580kB).

>I can try the gdb dump next time I see it. But I cannot imagine giving
>you a reproducible case - it is a 500 GB proprietary database, and
>without it, the queries would hardly be useful, I presume? I can try to
>make a "sample" with generated data and find out if I can reproduce the
>issue with my queries that way, but that will be quite time consuming.
>

Yeah, I understand the data is proprietary. But we need to identify
where the issue is, somehow, and being able to reproduce it is likely
the best way to do that. One option is to generate synthetic data with
a similar structure and similar features (geometries of similar size
etc.) and hope that triggers the issue too. Another is to distill a
subset of the data that triggers the issue and anonymize it. Or
something like that. See the sketch below for the first option.
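Purely as an illustration (the table name, column names and value
ranges here are made up - you'd want to mimic the geometry types and
sizes of your real data), generating synthetic PostGIS data can be as
simple as:

    -- hypothetical example: 1M random polygons of varying size
    CREATE TABLE synthetic_geoms AS
    SELECT i AS id,
           ST_Buffer(
               ST_SetSRID(ST_MakePoint(random() * 360 - 180,
                                       random() * 170 - 85), 4326),
               random() * 0.1) AS geom
    FROM generate_series(1, 1000000) s(i);

Then run your real queries (with names adjusted) against that and see
whether the memory usage grows the same way.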

Yes, it's going to be quite time consuming, but I don't have better
ideas. Maybe running it under valgrind would help - you'd have to
rebuild PostgreSQL with valgrind support and run the workload on it. The
queries would be much slower (possibly by an order of magnitude or so),
so you'd have to run it on a different machine, but that's machine time,
not time wasted by a human.
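For reference, a rough sketch of what that might look like (build flags
and paths are indicative only; the suppressions file ships in the
PostgreSQL source tree):

    # rebuild with valgrind support (makes palloc'd memory visible
    # to valgrind), plus your usual configure options
    ./configure CFLAGS="-O0 -g" CPPFLAGS="-DUSE_VALGRIND" ...
    make && make install

    # start the postmaster under valgrind; --trace-children=yes
    # follows the forked backend processes
    valgrind --leak-check=full --trace-children=yes \
             --suppressions=src/tools/valgrind.supp \
             postgres -D /path/to/data/directory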

Anyway, let's start with the easy stuff - try running it again and grab
the memory context stats once the backend starts ballooning. Maybe it's
a simple leak in PostgreSQL, in which case it should be fairly easy to
investigate. If that turns out not to be the case, you can move on to
the more complicated stuff (valgrind, ...).
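To get those stats, attach gdb to the bloated backend and call
MemoryContextStats - note the output goes to the server log (stderr),
not to your gdb session:

    gdb -p <pid of the bloated backend>
    (gdb) call MemoryContextStats(TopMemoryContext)
    (gdb) detach
    (gdb) quit

That prints a per-context breakdown of allocated memory, which usually
makes it clear whether the memory is in a PostgreSQL memory context
(i.e. tracked allocations) or was allocated elsewhere (e.g. malloc'd
directly by a library).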

>Would it help to dump the memory of the backend process and deliver the
>dump (by some private channel) to somebody to identify who is consuming
>all that memory? (That is the usual drill on Windows.)
>

I don't think that'd be very helpful - if this really is a memory leak
in one of the libraries, I have no idea how to spot it in the memory
dump. Also, if the data is sensitive/proprietary, you should not really
be sharing memory dumps at all, since they may well contain chunks of
that data.

regards

-- 
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


