[bug fix] "pg_ctl stop" times out when it should respond quickly - Mailing list pgsql-hackers

From MauMau
Subject [bug fix] "pg_ctl stop" times out when it should respond quickly
Msg-id DF2AB03E91D547319F29A21458EA868E@maumau
Responses Re: [bug fix] "pg_ctl stop" times out when it should respond quickly
List pgsql-hackers
Hello,

I've encountered a small bug and fixed it.  I guess it occurs on all major
releases; I saw it happen on 9.2 and 9.4devel.  Please find the patch
attached and commit it.


[Problem]
If I mistakenly set listen_addresses to an invalid value, say '-1', and
start the database server, it fails to start as shown below.  In my
environment (RHEL6 for Intel64), it takes about 15 seconds before postgres
prints the messages.  This is OK.

[maumau@myhost pgdata]$ pg_ctl -w start
waiting for server to start........................LOG:  could not translate
host name "-1", service "5450" to address: Temporary failure in name
resolution
WARNING:  could not create listen socket for "-1"
FATAL:  could not create any TCP/IP sockets
 stopped waiting
pg_ctl: could not start server
Examine the log output.
[maumau@myhost pgdata]$

When I start the server without -w and then try to stop it, "pg_ctl stop"
waits for 60 seconds, times out, and fails.  This is the problem.  I expected
"pg_ctl stop" to respond quickly with success or failure, depending on the
timing.

[maumau@myhost pgdata]$ pg_ctl start
server starting
...(a few seconds later)
[maumau@myhost ~]$ pg_ctl stop
waiting for server to shut
down.................................................
.............. failed
pg_ctl: server does not shut down
HINT: The "-m fast" option immediately disconnects sessions rather than
waiting for session-initiated disconnection.
[maumau@myhost ~]$


[Cause]
The problem occurs in the sequence below:

1. postmaster creates $PGDATA/postmaster.pid.
2. postmaster tries to resolve the value of listen_addresses to IP
addresses.  This took about 15 seconds in my failure scenario.
3. During step 2, pg_ctl sends SIGTERM to postmaster.
4. postmaster terminates immediately without deleting
$PGDATA/postmaster.pid.  This is because it hasn't set up its signal
handlers yet, so SIGTERM takes its default action and no cleanup runs.
5. "pg_ctl stop" waits in a loop until $PGDATA/postmaster.pid disappears.
But the file never disappears, so pg_ctl times out.
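
The sequence above can be reproduced in miniature.  The following is only a
sketch, not PostgreSQL code: the function name pidfile_survives_sigterm is
hypothetical, the child process merely plays the role of postmaster, and
sleep(15) stands in for the slow name resolution.  It assumes a POSIX system.

```c
#include <fcntl.h>
#include <signal.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Hypothetical demo: a child creates a pid file, then is hit by SIGTERM
 * before it has installed any handler.  Returns 1 if the child was killed
 * by the signal and the pid file was left behind, 0 otherwise, -1 on error.
 */
static int
pidfile_survives_sigterm(const char *pid_file)
{
    int         pipefd[2];
    pid_t       child;
    int         status;
    char        c;
    struct stat st;
    int         survives;

    if (pipe(pipefd) != 0)
        return -1;

    child = fork();
    if (child < 0)
        return -1;

    if (child == 0)
    {
        /* Child: stands in for postmaster during early startup. */
        int fd;

        close(pipefd[0]);
        signal(SIGTERM, SIG_DFL);       /* no handler installed yet */

        fd = open(pid_file, O_CREAT | O_WRONLY | O_TRUNC, 0600);
        if (fd < 0)
            _exit(1);
        close(fd);                      /* step 1: pid file exists */

        if (write(pipefd[1], "x", 1) != 1)  /* tell parent the file is there */
            _exit(1);

        sleep(15);                      /* step 2: slow name resolution */
        unlink(pid_file);               /* normal cleanup, never reached */
        _exit(0);
    }

    /* Parent: stands in for "pg_ctl stop". */
    close(pipefd[1]);
    if (read(pipefd[0], &c, 1) != 1)
    {
        close(pipefd[0]);
        waitpid(child, &status, 0);
        return -1;
    }
    close(pipefd[0]);

    kill(child, SIGTERM);               /* step 3 */
    waitpid(child, &status, 0);         /* step 4: child dies at once */

    survives = (stat(pid_file, &st) == 0);  /* step 5: file still there? */
    unlink(pid_file);                   /* tidy up after the demo */

    return (WIFSIGNALED(status) && survives) ? 1 : 0;
}
```

A waiter that only polls for the file to vanish would spin here until its
timeout, which matches what "pg_ctl stop" does in the transcript above.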


[Fix]
Make pg_ctl check whether postmaster is still alive while it waits for
$PGDATA/postmaster.pid to disappear, because postmaster might have
terminated unexpectedly without removing the file.
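
To illustrate the idea (this is not the attached patch), a wait loop with a
liveness check might look roughly like the sketch below.  The names
process_is_alive, wait_for_shutdown and StopResult are hypothetical, and the
code assumes a POSIX system where kill(pid, 0) can be used to test whether a
process exists.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L     /* expose kill() under strict -std modes */
#endif

#include <errno.h>
#include <signal.h>
#include <stdbool.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

typedef enum
{
    STOP_OK,        /* pid file disappeared: clean shutdown */
    STOP_DIED,      /* process is gone but the pid file remains */
    STOP_TIMEOUT    /* process still running after the timeout */
} StopResult;

/* Hypothetical helper: true if a process with this PID exists. */
static bool
process_is_alive(pid_t pid)
{
    /* Signal 0 delivers nothing; it only performs the existence check. */
    if (kill(pid, 0) == 0)
        return true;

    /* EPERM means the process exists but belongs to another user. */
    return errno == EPERM;
}

/*
 * Poll once per second, up to timeout_secs times.  Besides waiting for the
 * pid file to vanish, give up early if the process itself has disappeared.
 */
static StopResult
wait_for_shutdown(const char *pid_file, pid_t pid, int timeout_secs)
{
    struct stat st;
    int         i;

    for (i = 0; i < timeout_secs; i++)
    {
        if (stat(pid_file, &st) != 0 && errno == ENOENT)
            return STOP_OK;

        if (!process_is_alive(pid))
            return STOP_DIED;

        sleep(1);
    }

    return STOP_TIMEOUT;
}
```

In the failure scenario above, such a loop would return STOP_DIED on the
first iteration after postmaster exits instead of waiting out the full 60
seconds.  A PID can in principle be reused by an unrelated process, so a
check like this is a heuristic rather than a guarantee.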


Regards
MauMau

Attachment
