Re: [GENERAL] Out of memory/corrupted shared memory problem on server - Mailing list pgsql-general

From Tom Lane
Subject Re: [GENERAL] Out of memory/corrupted shared memory problem on server
Date
Msg-id 13601.1503661683@sss.pgh.pa.us
In response to [GENERAL] Out of memory/corrupted shared memory problem on server  (Johann Spies <johann.spies@gmail.com>)
Responses Re: [GENERAL] Out of memory/corrupted shared memory problem on server  (Johann Spies <johann.spies@gmail.com>)
List pgsql-general
Johann Spies <johann.spies@gmail.com> writes:
> While restoring a dump from our development server (768G ram) to the
> production server, (PG 9.6.3 on Debian Stretch with 128G Ram) the
> refreshing of a Materialized View fails like this:

> [local] js@wos=# REFRESH MATERIALIZED VIEW wos_2017_1.citation_window_mv ;
> server closed the connection unexpectedly

> In the log:
> 2017-08-24 19:23:26 SAST [7532-18] LOG:  server process (PID 4890) was
> terminated by signal 9: Killed

As Christoph said, this looks a lot like the kernel OOM killer decided
you'd eaten too much memory.

> The Materialized View uses a complex query and should contain
> 69 772 381 records.

How complex is "complex"?  I can think of two likely scenarios:
1. You've stumbled across some kind of memory-leak bug in Postgres.
2. The query's just using too much memory.  In this connection, it's
not good that you've got
> work_mem = 2GB
Remember that "work_mem" is "work memory per plan node", so a complex
query could easily chew up a multiple of that number --- and that's
with everything going according to plan.  If, say, the planner
underestimates the number of table entries involved in a hash
aggregation, the actual consumption might be much larger.
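As a sketch of that per-node multiplier (the table names here are hypothetical, not from the thread), a single query can claim work_mem several times over:

```sql
-- With work_mem = '2GB', each Sort, Hash, and HashAggregate node below
-- may allocate up to work_mem on its own, so this one query could reach
-- roughly 3 x 2GB even when the planner's estimates are accurate.
SET work_mem = '2GB';
EXPLAIN
SELECT a.k, count(*)
FROM big_a a
JOIN big_b b USING (k)      -- Hash Join: one work_mem-sized hash table
GROUP BY a.k                -- HashAggregate: a second work_mem allocation
ORDER BY count(*) DESC;     -- Sort: a third work_mem allocation
```

If a hash aggregation's input is underestimated, that last allocation is not even bounded by work_mem, which is why misestimates can blow well past this arithmetic.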

My first move would be to reduce work_mem by an order of magnitude
or two.  If that doesn't help, check the plan for the view's query
and see if it contains any hash aggregation steps --- if so, does
"set enable_hashagg = off" help?  (Also, make sure the view's input
tables have been ANALYZEd recently.)
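The steps above can be tried in one session before re-running the refresh (a sketch; the view name is taken from the thread, and all SET commands here are session-local, so nothing persists):

```sql
SET work_mem = '64MB';        -- down from 2GB by well over an order of magnitude
ANALYZE;                      -- refresh planner statistics on the input tables
-- EXPLAIN does not accept REFRESH MATERIALIZED VIEW directly; inspect the
-- view's defining query instead and EXPLAIN that:
SELECT pg_get_viewdef('wos_2017_1.citation_window_mv'::regclass);
-- If the plan for that query shows HashAggregate nodes, try disabling them:
SET enable_hashagg = off;
REFRESH MATERIALIZED VIEW wos_2017_1.citation_window_mv;
```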

If none of that helps, we should investigate the memory-leak-bug
theory.  One thing you could do in that direction is to run
the postmaster with a "ulimit -v" size less than what will trigger
the ire of the OOM killer, so that the query encounters a normal
ENOMEM error rather than SIGKILL when it's eaten too much memory.
That should result in it dumping a memory consumption map to stderr,
which would give some clue where the problem is.  We'd need to see
that map as well as details about your query to make progress.
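A minimal sketch of that setup, assuming a stock Debian 9.6 layout (the data directory path and the exact limit are illustrative, not from the thread):

```shell
# Cap the postmaster's virtual memory (here ~96GB on the 128GB box) so a
# runaway backend hits a normal ENOMEM instead of the OOM killer's SIGKILL.
# ulimit -v takes kilobytes and applies to child processes of this shell.
ulimit -v $((96 * 1024 * 1024))
pg_ctl -D /var/lib/postgresql/9.6/main -l /var/log/postgresql/pg.log start
# After the failure, the memory-context map appears in the server's stderr log.
```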

            regards, tom lane

