pg_dump crashing - Mailing list pgsql-general

From Matthias Schmitt
Subject pg_dump crashing
Date
Msg-id C2614EC6-38A6-41AB-8208-950319333E84@mmp.lu
Whole thread Raw
In response to Re: Q: extract database name from directory dump  (John R Pierce <pierce@hogranch.com>)
Responses Re: pg_dump crashing
Re: pg_dump crashing
Re: pg_dump crashing
List pgsql-general
Hello,

since two weeks I am trying to get PostgreSQL 9.5.1 to run on Debian 8.3. Everything is fine except the daily backup.
Whencalling pg_dump as part of a cron job pg_dump crashes: 

2016-03-15 01:00:02 CETFATAL:  semctl(23232524, 3, SETVAL, 0) failed: Invalid argument
2016-03-15 01:00:02 CETLOG:  server process (PID 22279) exited with exit code 1
2016-03-15 01:00:02 CETLOG:  terminating any other active server processes
2016-03-15 01:00:02 CETWARNING:  terminating connection because of crash of another server process
2016-03-15 01:00:02 CETDETAIL:  The postmaster has commanded this server process to roll back the current transaction
andexit, because another server process exited abnormally and possibly corrupted shared memory. 
2016-03-15 01:00:02 CETHINT:  In a moment you should be able to reconnect to the database and repeat your command.
2016-03-15 01:00:02 CETLOG:  all server processes terminated; reinitializing
2016-03-15 01:00:02 CETLOG:  could not remove shared memory segment "/PostgreSQL.1804289383": No such file or directory
2016-03-15 01:00:02 CETLOG:  semctl(22839296, 0, IPC_RMID, ...) failed: Invalid argument
2016-03-15 01:00:02 CETLOG:  semctl(22872065, 0, IPC_RMID, ...) failed: Invalid argument
2016-03-15 01:00:02 CETLOG:  semctl(22904834, 0, IPC_RMID, ...) failed: Invalid argument
…

I am calling pg_dump in my cronjob like this:
su - mmppostgres -c "/Users/…/bin/pg_dump -p 5433 mydatabase_1_0_0 > /my_backup_path/mydatabase_1_0_0.dump"

After the crash the database runs in recovery mode. A restart of the database brings everything back to normal.
This crash is always reproducible and occurs every night during backup. When calling the same command via the command
lineeverything run fine. In the system log I can see: 

Mar 15 01:00:01 mymachine systemd[1]: Started User Manager for UID 1001.
Mar 15 01:00:02 mymachine systemd[1]: Stopping User Manager for UID 1001...
Mar 15 01:00:02 mymachine systemd[22241]: Stopping Default.
Mar 15 01:00:02 mymachine systemd[22241]: Stopped target Default.
Mar 15 01:00:02 mymachine systemd[22241]: Stopping Basic System.
Mar 15 01:00:02 mymachine systemd[22241]: Stopped target Basic System.
Mar 15 01:00:02 mymachine systemd[22241]: Stopping Paths.
Mar 15 01:00:02 mymachine systemd[22241]: Stopped target Paths.
Mar 15 01:00:02 mymachine systemd[22241]: Stopping Timers.
Mar 15 01:00:02 mymachine systemd[22241]: Stopped target Timers.
Mar 15 01:00:02 mymachine systemd[22241]: Stopping Sockets.
Mar 15 01:00:02 mymachine systemd[22241]: Stopped target Sockets.
Mar 15 01:00:02 mymachine systemd[22241]: Starting Shutdown.
Mar 15 01:00:02 mymachine systemd[22241]: Reached target Shutdown.
Mar 15 01:00:02 mymachine systemd[22241]: Starting Exit the Session...
Mar 15 01:00:02 mymachine systemd[22241]: Received SIGRTMIN+24 from PID 22249 (kill).

While searching through mailing lists I tried to exclude the following possibilities, which might cause the error:

1st possible solution: Another PostgreSQL instance running … no.
Some people get the ‘semctl’ error when running a second PostgreSQL process using the same semaphores. I can definitely
excludethat. I have no other PostgreSQL instance running on the machine. 

2nd possible solution: Checking with 'ipcs -s’ for another process deleting semaphores.
Before a crash my output looks like this:

------ Semaphore Arrays --------
key        semid      owner      perms      nsems
0x0052e6a9 20742144   mmppostgre 600        17
0x0052e6aa 20774913   mmppostgre 600        17
0x0052e6ab 20807682   mmppostgre 600        17
0x0052e6ac 20840451   mmppostgre 600        17
0x0052e6ad 20873220   mmppostgre 600        17
0x0052e6ae 20905989   mmppostgre 600        17
0x0052e6af 20938758   mmppostgre 600        17
0x0052e6b0 20971527   mmppostgre 600        17
0x0052e6b1 21004296   mmppostgre 600        17
0x0052e6b2 21037065   mmppostgre 600        17
0x0052e6b3 21069834   mmppostgre 600        17
0x0052e6b4 21102603   mmppostgre 600        17
0x0052e6b5 21135372   mmppostgre 600        17
0x0052e6b6 21168141   mmppostgre 600        17
0x00000000 21266446   www-data   600        1
0x00000000 21299215   www-data   600        1

After the crash the output looks like that:
------ Semaphore Arrays --------
key        semid      owner      perms      nsems
0x0052e6a9 22380544   mmppostgre 600        17
0x0052e6aa 22413313   mmppostgre 600        17
0x0052e6ab 22446082   mmppostgre 600        17
0x0052e6ac 22478851   mmppostgre 600        17
0x0052e6ad 22511620   mmppostgre 600        17
0x0052e6ae 22544389   mmppostgre 600        17
0x0052e6af 22577158   mmppostgre 600        17
0x0052e6b0 22609927   mmppostgre 600        17
0x0052e6b1 22642696   mmppostgre 600        17
0x0052e6b2 22675465   mmppostgre 600        17
0x0052e6b3 22708234   mmppostgre 600        17
0x0052e6b4 22741003   mmppostgre 600        17
0x0052e6b5 22773772   mmppostgre 600        17
0x0052e6b6 22806541   mmppostgre 600        17
0x00000000 21856270   www-data   600        1
0x00000000 21889039   www-data   600        1

As all server processes have been cancelled it seems to me normal, that the semids have been changed. Beyond that I can
notsee anything which might be useful for me. 

3rd possible solution: Strange values for shared memory:
My Debian 8.3 default settings for shared memory have been:

kernel.shmall = 18446744073692774399
kernel.shmmax = 18446744073692774399
kernel.shmmni = 4096

These values have been looking strange to me, so I changed them to some more realistic values:
kernel.shmall = 4194304
kernel.shmmax = 17179869184
kernel.shmmni = 4096

4th  possible solution: I have read here: https://lists.freedesktop.org/archives/systemd-devel/2014-April/018373.html
that logging into the postgres user in and out might delete semaphore memory, but I cannot reproduce this.

Interesting enough I have another PostgreSQL version running perfectly under Debian 8.3 on a different machine. This is
aPostgreSQL 9.4.1. It uses the same backup mechanism. I hate to consider a downgrade as a possible solution. May be an
info,which might be useful: my machine has 64 GB RAM. 

Any ideas where to continue my search?

Best regards

Matthias Schmitt

magic moving pixel s.a.
23, Avenue Grande-Duchesse Charlotte
L-3441 Dudelange
Luxembourg
Phone: +352 54 75 75
http://www.mmp.lu






pgsql-general by date:

Previous
From: Francisco Olarte
Date:
Subject: Re: psql question: aborting a "script"
Next
From: John Lumby
Date:
Subject: Re: how to switch old replication Master to new Standby after promoting old Standby - pg_rewind log file missing