Thread: PostgreSQL lock file

PostgreSQL lock file

From
Олег Самойлов
Date:
Hi all.

I have a task to create HA PostgreSQL cross datacenter cluster. I created a test bed on VirtualBox and I simulate
differentfailures and disastrous in loops. And got one of funny bug of the Postgresql (11.4). One of the clusters has
twoinstances of PostgreSQL on port 5433 and 5434. After simulating unexpected power off of the node the lock files of
PostgreSQLstill exist in /tmp directory. And on recovery the first instance on port 5433 starts without problem, but
thesecond instance sometimes not. The reason is that the PID in /tmp/.s.PGSQL.5434.lock sometimes point to the one of
processof PostgereSQL on port 5433, for instance the logger. 

 1408 ?        S      0:00 /usr/pgsql-11/bin/postgres -D /var/lib/pgsql/krogan0a
 1415 ?        Ss     0:00  \_ postgres: logger
 1416 ?        Ss     0:02  \_ postgres: startup   recovering 000000030000000000000007
 1417 ?        Ss     0:00  \_ postgres: checkpointer
 1418 ?        Ss     0:00  \_ postgres: background writer
 1419 ?        Ss     0:00  \_ postgres: stats collector
 1420 ?        Ss     0:10  \_ postgres: walreceiver   streaming 0/7269290

# cat /tmp/.s.PGSQL.5434.lock
1415
/var/lib/pgsql/krogan0b
1565355860
5434
/tmp

# sudo -u postgres /usr/pgsql-11/bin/pg_ctl -D /var/lib/pgsql/krogan0b start
waiting for server to start....2019-08-09 16:52:25.685 MSK [6673] LOG:  listening on IPv4 address "0.0.0.0", port 5434
2019-08-09 16:52:25.685 MSK [6673] LOG:  listening on IPv6 address "::", port 5434
2019-08-09 16:52:25.686 MSK [6673] LOG:  listening on Unix socket "/var/run/postgresql/.s.PGSQL.5434"
2019-08-09 16:52:25.686 MSK [6673] FATAL:  lock file "/tmp/.s.PGSQL.5434.lock" already exists
2019-08-09 16:52:25.686 MSK [6673] HINT:  Is another postmaster (PID 1415) using socket file "/tmp/.s.PGSQL.5434"?
2019-08-09 16:52:25.686 MSK [6673] LOG:  database system is shut down
 stopped waiting
pg_ctl: could not start server
Examine the log output.

So it will be good if postgres will check not only PID, but also that the different postgres is running on different or
sameport. And is there some workaround? 


Re: PostgreSQL lock file

From
Tom Lane
Date:
=?utf-8?B?0J7Qu9C10LMg0KHQsNC80L7QudC70L7Qsg==?= <splarv@ya.ru> writes:
> I have a task to create HA PostgreSQL cross datacenter cluster. I created a test bed on VirtualBox and I simulate
differentfailures and disastrous in loops. And got one of funny bug of the Postgresql (11.4). One of the clusters has
twoinstances of PostgreSQL on port 5433 and 5434. After simulating unexpected power off of the node the lock files of
PostgreSQLstill exist in /tmp directory. And on recovery the first instance on port 5433 starts without problem, but
thesecond instance sometimes not. The reason is that the PID in /tmp/.s.PGSQL.5434.lock sometimes point to the one of
processof PostgereSQL on port 5433, for instance the logger. 

Best practice for this sort of thing is considered to be to run the two
Postgres instances under different user IDs.  That avoids them thinking
that the other one's lock files are relevant, and it provides an extra
layer of security against possible conflicts between the two instances.

            regards, tom lane



Re: PostgreSQL lock file

From
Олег Самойлов
Date:
> Best practice for this sort of thing is considered to be to run the two
> Postgres instances under different user IDs.  That avoids them thinking
> that the other one's lock files are relevant, and it provides an extra
> layer of security against possible conflicts between the two instances.
>
>             regards, tom lane

Thanks, an interesting  notice. I already did changes in start script to check instance by pg_ctl status and delete
lockfile if status<>0. But your recommendation is looked better.