Thread: PostgreSQL 17.5 - could not map dynamic shared memory segment

PostgreSQL 17.5 - could not map dynamic shared memory segment

From: Aleš Zelený
Hello,

After upgrading from the good old, no-longer-supported PostgreSQL 11 to PostgreSQL 17.5 via pg_dump & pg_restore, vacuum started reporting errors:

ERROR:  could not map dynamic shared memory segment

Vacuumdb was invoked:

/usr/lib/postgresql/17/bin/vacuumdb -p 5433 -Fvaz -j 12 -v
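
For readability, the bundled short flags expand to -F = --freeze, -v = --verbose, -a = --all, -z = --analyze, so this should be roughly equivalent to:

/usr/lib/postgresql/17/bin/vacuumdb --port 5433 --freeze --verbose --all --analyze --jobs 12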

vacuumdb finished with exit code 0, but the errors above were logged in the PostgreSQL instance log files.

Example error messages from the PostgreSQL log file:
2025-06-21 16:29:56.095 UTC [306130] ERROR:  could not map dynamic shared memory segment
2025-06-21 16:29:56.095 UTC [306131] ERROR:  could not map dynamic shared memory segment
2025-06-21 16:29:56.097 UTC [135978] LOG:  background worker "parallel worker" (PID 306130) exited with exit code 1
2025-06-21 16:29:56.097 UTC [135978] LOG:  background worker "parallel worker" (PID 306131) exited with exit code 1
2025-06-21 16:30:23.677 UTC [300930] postgres@fin LOG:  could not send data to client: Broken pipe
2025-06-21 16:30:23.677 UTC [300930] postgres@fin CONTEXT:  while scanning relation "core.cusip"
2025-06-21 16:30:23.677 UTC [300930] postgres@fin STATEMENT:  VACUUM (SKIP_DATABASE_STATS, FREEZE, VERBOSE) core.cusip;
2025-06-21 16:30:23.677 UTC [300930] postgres@fin FATAL:  connection to client lost
2025-06-21 16:30:23.677 UTC [300930] postgres@fin CONTEXT:  while scanning relation "core.cusip"
2025-06-21 16:30:23.677 UTC [300930] postgres@fin STATEMENT:  VACUUM (SKIP_DATABASE_STATS, FREEZE, VERBOSE) core.cusip;
2025-06-21 16:30:23.678 UTC [306190] ERROR:  could not map dynamic shared memory segment
2025-06-21 16:30:23.680 UTC [135978] LOG:  background worker "parallel worker" (PID 306190) exited with exit code 1
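
If it helps, my understanding is that the "parallel worker" processes above are the parallel index-vacuum workers that VACUUM itself launches (their number is bounded by max_parallel_maintenance_workers and max_worker_processes), and with dynamic_shared_memory_type = posix their segments should show up under /dev/shm, so something like the following could be captured the next time it happens:

  df -h /dev/shm
  ls -l /dev/shm | head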

PostgreSQL version:
psql -d postgres -c "select version();"
                                                              version                                                              
-----------------------------------------------------------------------------------------------------------------------------------
 PostgreSQL 17.5 (Ubuntu 17.5-1.pgdg22.04+1) on x86_64-pc-linux-gnu, compiled by gcc (Ubuntu 11.4.0-1ubuntu1~22.04) 11.4.0, 64-bit
(1 row)

Configuration parameters:
listen_addresses = '*' # what IP address(es) to listen on;
port = 5432 # (change requires restart)
max_connections = 400 # (change requires restart)
unix_socket_directories = '/var/run/postgresql' # comma-separated list of directories
tcp_keepalives_idle = 60 # TCP_KEEPIDLE, in seconds;
tcp_keepalives_interval = 5 # TCP_KEEPINTVL, in seconds;
tcp_keepalives_count = 3 # TCP_KEEPCNT;
tcp_user_timeout = 75s # TCP_USER_TIMEOUT, in milliseconds;
client_connection_check_interval = 75s # time between checks for client
shared_buffers = 22GB # min 128kB
huge_pages = on # on, off, or try
temp_buffers = 32MB # min 800kB
work_mem = 64MB # min 64kB
maintenance_work_mem = 2GB # min 64kB
dynamic_shared_memory_type = posix # the default is usually the first option
bgwriter_delay = 10ms # 10-10000ms between rounds
bgwriter_lru_maxpages = 1000 # max buffers written/round, 0 disables
bgwriter_lru_multiplier = 10.0 # 0-10.0 multiplier on buffers scanned/round
max_worker_processes = 24 # (change requires restart)
wal_level = logical # minimal, replica, or logical
wal_log_hints = on # also do full page writes of non-critical updates
checkpoint_timeout = 30min # range 30s-1d
checkpoint_completion_target = 0.8 # checkpoint target duration, 0.0 - 1.0
max_wal_size = 120GB
min_wal_size = 10GB
archive_timeout = 1800 # force a WAL file switch after this
max_wal_senders = 12 # max number of walsender processes
max_replication_slots = 12 # max number of replication slots
wal_sender_timeout = 600s # in milliseconds; 0 disables
track_commit_timestamp = on # collect timestamp of transaction commit
wal_receiver_timeout = 600s # time that receiver waits for
jit = off # allow JIT compilation
log_min_duration_statement = 1000 # -1 is disabled, 0 logs all statements
log_autovacuum_min_duration = 0 # log autovacuum activity;
log_checkpoints = on
track_activity_query_size = 32768 # (change requires restart)
track_io_timing = on
track_functions = all # none, pl, all
idle_in_transaction_session_timeout = 1200000 # in milliseconds, 0 is disabled
shared_preload_libraries = 'pg_stat_statements,pg_stat_kcache,pg_qualstats,pg_partman_bgw,pg_cron,pg_prewarm' # (change requires restart)
max_locks_per_transaction = 128 # min 10
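
The parallel-worker settings themselves (e.g. max_parallel_maintenance_workers, max_parallel_workers_per_gather) are not listed above, so they are presumably at their defaults; for completeness they could be read back at runtime with something like:

psql -d postgres -c "SELECT name, setting FROM pg_settings WHERE name LIKE 'max_parallel%' OR name = 'dynamic_shared_memory_type';"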


OS: Ubuntu 22.04.5 LTS, 16 vCPUs, ~123 GB RAM
PostgreSQL apt package: postgresql-17/jammy-pgdg,now 17.5-1.pgdg22.04+1 amd64 [installed]

Memory:
grep -e Mem -e Huge /proc/meminfo
MemTotal:       129804320 kB
MemFree:         2127576 kB
MemAvailable:   100695068 kB
AnonHugePages:         0 kB
ShmemHugePages:        0 kB
FileHugePages:         0 kB
HugePages_Total:   12672
HugePages_Free:    10908
HugePages_Rsvd:     9818
HugePages_Surp:        0
Hugepagesize:       2048 kB
Hugetlb:        25952256 kB
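
(As a rough check of the huge page pool: HugePages_Total 12672 × 2048 kB ≈ 24.8 GB, which should be enough for the 22 GB shared_buffers plus the rest of the main shared memory segment.)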

It also happened during an application load test, but I accidentally copied the wrong log file and the server has since been decommissioned, so I have no details on the application queries that reported the exact same message; if I remember correctly, it seemed to be related to parallel queries as well.

The application benefits from parallel queries, so despite the first temptation to disable them (based only on correlation with the log entries; is that really the root cause?), I would prefer not to disable parallel queries if another workaround, solution, or fix is available.
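
(If it did come to that, my understanding is that parallelism would not have to be disabled instance-wide; for example, parallel index vacuuming can be switched off per statement and parallel query plans per session:

VACUUM (FREEZE, VERBOSE, PARALLEL 0) core.cusip;
SET max_parallel_workers_per_gather = 0;

but I would prefer to keep parallelism enabled.)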

Thanks for any hints on how to provide more information if needed, as well as for fix/workaround advice.

Kind regards Ales Zeleny

Re: PostgreSQL 17.5 - could not map dynamic shared memory segment

From: Aleš Zelený
Hi,
Thanks for the good point:

$ sysctl vm.overcommit_memory
vm.overcommit_memory = 0

That is a difference: the old PostgreSQL 11 instance, running on Ubuntu 18.04, had overcommit disabled (vm.overcommit_memory = 2).

Anyway, on a dedicated DB server box with 123 GB RAM running only vacuum (14 parallel processes, 2 GB maintenance_work_mem) and 22 GB shared_buffers, it seems unlikely to me that we would exhaust available memory.
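
(Roughly: 22 GB shared_buffers + 14 × 2 GB maintenance_work_mem ≈ 50 GB, well below the ~123 GB of RAM.)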

During Sunday (low load) and Monday so far, it has not recurred.

Kind regards Ales Zeleny

On Sun, 22 Jun 2025 at 00:44, Tomas Vondra <tomas@vondra.me> wrote:
On 6/21/25 23:09, Aleš Zelený wrote:
> Hello,
> ...
>
> The application benefits from parallel queries, so despite the first
> temptation to disable them (based only on correlation with the log
> entries; is that really the root cause?), I would prefer not to disable
> parallel queries if another workaround, solution, or fix is available.
>
> Thanks for any hints on how to provide more information if needed, as
> well as for fix/workaround advice.
>

Could it be that you simply ran out of memory, or perhaps hit the
overcommit? What does sysctl say?

  sysctl vm.overcommit_memory

And what's CommitLimit/Committed_AS in /proc/meminfo? IIRC the shmem is
counted against the limit, and if the system does not have significant
swap, it's not uncommon to hit that (esp. with overcommit_memory=2).
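
That can be checked with e.g.:

  grep -e CommitLimit -e Committed_AS /proc/meminfo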


regards

--
Tomas Vondra