[GENERAL] Out of memory/corrupted shared memory problem on server - Mailing list pgsql-general

From Johann Spies
Subject [GENERAL] Out of memory/corrupted shared memory problem on server
Date
Msg-id CAGZ55DR67Ejvef8GkLjFp8vqzfnhngfm2jzoerZ3WbCjxoLeWw@mail.gmail.com
Whole thread Raw
Responses Re: [GENERAL] Out of memory/corrupted shared memory problem on server  (Christoph Moench-Tegeder <cmt@burggraben.net>)
Re: [GENERAL] Out of memory/corrupted shared memory problem on server  (Tom Lane <tgl@sss.pgh.pa.us>)
List pgsql-general
While restoring a dump from our development server (768G ram) to the
production server, (PG 9.6.3 on Debian Stretch with 128G Ram) the
refreshing of a Materialized View fails like this:

local] js@wos=# REFRESH MATERIALIZED VIEW wos_2017_1.citation_window_mv ;
server closed the connection unexpectedly
        This probably means the server terminated abnormally
        before or while processing the request.
The connection to the server was lost. Attempting reset: Failed.
Time: 11556111.977 ms

In the log:
------------------
2017-08-24 19:23:26 SAST [7532-18] LOG:  server process (PID 4890) was
terminated by signal 9: Killed
2017-08-24 19:23:26 SAST [7532-19] DETAIL:  Failed process was
running: REFRESH MATERIALIZED VIEW wos_2017_1.citation_window_mv ;
2017-08-24 19:23:26 SAST [7532-20] LOG:  terminating any other active
server processes
2017-08-24 19:23:26 SAST [16376-1] crest@data_portal WARNING:
terminating connection because of crash of another server process
2017-08-24 19:23:26 SAST [16376-2] crest@data_portal DETAIL:  The
postmaster has commanded this server process to roll back the current
transaction and exit, because anothe\
r server process exited abnormally and possibly corrupted shared memory.
2017-08-24 19:23:26 SAST [16376-3] crest@data_portal HINT:  In a
moment you should be able to reconnect to the database and repeat your
command.
--------------------

This show a cross-database  problem.  The Mateiralized Vew is in
database wos while the other related problem seems to be in database
data_portal.  We could not determine what caused the problem in
database_portal.  Or was it caused by the out-of-memory problem in the
wos-process?

The Materialized View  uses a complex query and  should contain 69 772
381 records.

Monitoring the memory usage while running the refresh materialized
view command show  a steady increase by the process until reaches 100%
and breaks.

The server has 128G Ram with the following changes to the default
setup (and you can see how we tried to solve the problem by opting for
lower thresholds in many cases):

# http://edoceo.com/howto/postgresql-performance
# https://wiki.postgresql.org/wiki/Tuning_Your_PostgreSQL_Server

# pgtune wiz ard 21.01.2016:
max_connections = 80
#shared_buffers = 32GB
shared_buffers = 14GB
#effective_cache_size = 96GB
effective_cache_size = 20GB
#work_mem = 4GB
work_mem = 2GB
maintenance_work_mem = 2GB
checkpoint_completion_target = 0.9
wal_buffers = 16MB
default_statistics_target = 100
temp_buffers = 1GB
fsync = true
max_worker_processes = 24        # (change requires restart)
max_parallel_workers_per_gather = 4    # taken from max_worker_processes
checkpoint_flush_after = 256kB
idle_in_transaction_session_timeout = 3600000

# Other:

# max_wal_size = (3 * checkpoint_segments) *16MB
# http://www.postgresql.org/docs/9.5/static/release-9-5.html
max_wal_size = 3GB # Replace checkpoint_segments
huge_pages = try



# - Archiving -
wal_level = archive
wal_sync_method = fdatasync
full_page_writes = on           # recover from partial page writes
wal_buffers = -1

#archive_mode = on      # allows archiving to be done
archive_mode = off      # allows archiving to be done

And in /etc/sysctl.conf:


# http://padmavyuha.blogspot.co.za/2010/12/configuring-shmmax-and-shmall-for.html

# (for 60GB)
kernel.shmall = 15728640
kernel.shmmax = 64424509440
# run "sudo sysctl -p" after editing

We are stuck at the moment and do not know how to proceed from here.
Help will be appreciated.

Regards
Johann

--
Because experiencing your loyal love is better than life itself,
my lips will praise you.  (Psalm 63:3)


pgsql-general by date:

Previous
From: Igor Korot
Date:
Subject: Re: [GENERAL] Retrieving query results
Next
From: Christoph Moench-Tegeder
Date:
Subject: Re: [GENERAL] Out of memory/corrupted shared memory problem on server