Hi all.
I have a task to create HA PostgreSQL cross datacenter cluster. I created a test bed on VirtualBox and I simulate
differentfailures and disastrous in loops. And got one of funny bug of the Postgresql (11.4). One of the clusters has
twoinstances of PostgreSQL on port 5433 and 5434. After simulating unexpected power off of the node the lock files of
PostgreSQLstill exist in /tmp directory. And on recovery the first instance on port 5433 starts without problem, but
thesecond instance sometimes not. The reason is that the PID in /tmp/.s.PGSQL.5434.lock sometimes point to the one of
processof PostgereSQL on port 5433, for instance the logger.
1408 ? S 0:00 /usr/pgsql-11/bin/postgres -D /var/lib/pgsql/krogan0a
1415 ? Ss 0:00 \_ postgres: logger
1416 ? Ss 0:02 \_ postgres: startup recovering 000000030000000000000007
1417 ? Ss 0:00 \_ postgres: checkpointer
1418 ? Ss 0:00 \_ postgres: background writer
1419 ? Ss 0:00 \_ postgres: stats collector
1420 ? Ss 0:10 \_ postgres: walreceiver streaming 0/7269290
# cat /tmp/.s.PGSQL.5434.lock
1415
/var/lib/pgsql/krogan0b
1565355860
5434
/tmp
# sudo -u postgres /usr/pgsql-11/bin/pg_ctl -D /var/lib/pgsql/krogan0b start
waiting for server to start....2019-08-09 16:52:25.685 MSK [6673] LOG: listening on IPv4 address "0.0.0.0", port 5434
2019-08-09 16:52:25.685 MSK [6673] LOG: listening on IPv6 address "::", port 5434
2019-08-09 16:52:25.686 MSK [6673] LOG: listening on Unix socket "/var/run/postgresql/.s.PGSQL.5434"
2019-08-09 16:52:25.686 MSK [6673] FATAL: lock file "/tmp/.s.PGSQL.5434.lock" already exists
2019-08-09 16:52:25.686 MSK [6673] HINT: Is another postmaster (PID 1415) using socket file "/tmp/.s.PGSQL.5434"?
2019-08-09 16:52:25.686 MSK [6673] LOG: database system is shut down
stopped waiting
pg_ctl: could not start server
Examine the log output.
So it will be good if postgres will check not only PID, but also that the different postgres is running on different or
sameport. And is there some workaround?