scripted 'pg_ctl start' hangs and never finishes, goes - Mailing list pgsql-general

From Brian Fehrle
Subject scripted 'pg_ctl start' hangs and never finishes, goes
Msg-id 4F739159.4020808@consistentstate.com
List pgsql-general
Hi all,

OS: Linux 64bit
PostgreSQL Version: 9.0.5 installed from source.

I'm writing a process that brings down a warm standby cluster, tarballs the data directory, then brings the warm standby back up. The problem is that starting the database with pg_ctl results in the command never exiting. The warm standby does come back online and starts recovering WAL files (evident in the log), but the pg_ctl command itself never returns. When I press Ctrl-C in the script, the database receives a "fast shutdown".

Basic script logic:
pg_ctl -D /path/to/datadir stop -m fast

cd /path/to/datadir/
tar -czvf /backups/mydatabase.tar.gz  *

pg_ctl -D /path/to/datadir start
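
For reference, the same three steps can be written as a small shell function with basic error handling. This is only a sketch under stated assumptions: the PG_CTL variable, the function name backup_standby, and the server log path are placeholders that do not come from my real script. Passing -l and redirecting pg_ctl's own output are a precaution, on the unconfirmed assumption that the hang is related to output streams inherited by the server.

```shell
#!/bin/sh
# Sketch of the stop / tar / start cycle. PG_CTL, the function name and
# the log file argument are hypothetical placeholders (assumptions).
PG_CTL=${PG_CTL:-pg_ctl}

backup_standby() {
    datadir=$1      # e.g. /path/to/datadir
    tarball=$2      # e.g. /backups/mydatabase.tar.gz (outside datadir)
    logfile=$3      # hypothetical location for the server log

    # Stop the standby; -w asks pg_ctl to wait until shutdown completes.
    "$PG_CTL" -D "$datadir" stop -m fast -w || return 1

    # Archive the directory contents without cd'ing into it;
    # "." also picks up dot-files that "*" would skip.
    tar -czf "$tarball" -C "$datadir" .
    tar_status=$?

    # Always try to bring the standby back, even if tar failed.
    # -l sends server output to a log file; pg_ctl's own streams are
    # detached from the caller as well.
    "$PG_CTL" -D "$datadir" -l "$logfile" start </dev/null >/dev/null 2>&1 || return 1

    return "$tar_status"
}
```

It would be called as: backup_standby /path/to/datadir /backups/mydatabase.tar.gz followed by whatever log file path suits the installation.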



Originally, I was running the 'pg_ctl start' over ssh from another box. When I ran into this issue, I assumed it had something to do with ssh. Now I'm running it on the database box itself from a Perl script, and I'm seeing the same issue.

I'm testing this on a very small database, 2 MB in size. When I execute each step manually, it works just fine.

Actual perl code:
my $output = qx(/bin/pg_ctl -D $dataDir start 2>&1);

One last thing: while the command is 'hung', searching for a running pg_ctl process returns:
[postgres@gridpoint_4 bin]$ ps aux | grep pg_ctl
postgres   601  0.0  0.0      0     0 pts/3    Z+   15:26   0:00 [pg_ctl] <defunct>
postgres   619  0.0  0.0  61180   748 pts/2    S+   15:26   0:00 grep pg_ctl



Below is the log from the warm standby as the actions take place.

LOG:  received fast shutdown request
LOG:  shutting down
LOG:  database system is shut down
LOG:  database system was shut down in recovery at 2012-03-28 15:02:43 PDT
LOG:  starting archive recovery
LOG:  restored log file "000000010000000000000057" from archive
LOG:  redo starts at 0/57000240
LOG:  consistent recovery state reached at 0/58000000

<Here is where, after 5 minutes of waiting, I press Ctrl-C to interrupt the process>

LOG:  received fast shutdown request
LOG:  shutting down
LOG:  database system is shut down



Any thoughts on what the issue could be? This happens in the same environment whether I run it from within Perl on the actual cluster, or over an ssh command such as  ssh user@standby "pg_ctl -D /path/to/data/ start". What the two cases have in common is that pg_ctl becomes a child process of something other than my own shell. Could that be the issue?
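
One possibility that may be worth ruling out (an assumption, not something the log above confirms): a caller that captures a command's output, such as Perl's qx or a shell $(...), keeps reading until every process holding the write end of the pipe has closed it. If the server started by pg_ctl inherits that pipe, the caller would keep waiting even after pg_ctl itself has exited, which would be consistent with the defunct pg_ctl entry in the ps output. The sketch below reproduces that general behaviour with 'sleep' standing in for the long-lived server; the function names are hypothetical and nothing here touches PostgreSQL.

```shell
#!/bin/sh
# Stand-in for a launcher that starts a long-lived background process
# and returns at once. 'sleep' plays the server; it is NOT PostgreSQL.
start_daemon_inheriting_output() {
    sleep 3 &                    # child inherits the caller's stdout/stderr
    echo "server starting"
}

# Same launcher, but the background process gets its own output streams.
start_daemon_detached_output() {
    sleep 3 >/dev/null 2>&1 &    # child no longer holds the caller's pipe
    echo "server starting"
}

# Run a command inside command substitution (output captured, like qx)
# and print how many whole seconds the capture took to return.
elapsed() {
    t0=$(date +%s)
    out=$("$@" 2>&1)             # blocks until the pipe reaches EOF
    t1=$(date +%s)
    echo $((t1 - t0))
}
```

Running elapsed start_daemon_inheriting_output should take roughly as long as the background sleep, while elapsed start_daemon_detached_output should return almost immediately, even though both launchers finish right away.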

Thanks in advance,

- Brian F


