Thread: How to implement autostart of postgres?

How to implement autostart of postgres?

From
"Gavrina, Irina"
Date:

Thank you all for your answers.

 

I’m using the Red Hat 5 OS. So I have no the ‘upstart’ utility at my disposal. Unfortunately.

 

Yes, you are right, this is a question of database robustness. You know, If any of postgres processes died the postmaster process will again start up it. The main worry is if the postmaster process dies (truth be told, I can reproduce it only using ‘kill –KILL’, which should be never done, I know) and all other processes are up and running. All created connections will go on with their work. And it means all db queries will be processed after the postmaster died. Am I correct?

Is there any risk of losing of data or data corruption in this case?

There is any way to configure the postgres db like ‘if the postmaster process died all child processes are stopped’?

Undoubtedly the case of postmaster process dies should be investigated to discover the root cause.

 

Best regards,

Irina

Re: How to implement autostart of postgres?

From
Merlin Moncure
Date:
On Thu, Jul 28, 2011 at 5:06 AM, Gavrina, Irina <igavrina@mera.ru> wrote:
> Thank you all for your answers.
>
> I’m using the Red Hat 5 OS. So I have no the ‘upstart’ utility at my
> disposal. Unfortunately.
>
>
>
> Yes, you are right, this is a question of database robustness. You know, If
> any of postgres processes died the postmaster process will again start up
> it. The main worry is if the postmaster process dies (truth be told, I can
> reproduce it only using ‘kill –KILL’, which should be never done, I know)
> and all other processes are up and running. All created connections will go
> on with their work. And it means all db queries will be processed after the
> postmaster died. Am I correct?

It's fairly trivial on any unix platform to implement a cron script
that runs say, once a minute and checks for running postmater.  There
are a large number of ways to do that, but grepping ps output for
postmater process is usually a good way to go.  From there, you can
attempt starting the service the standard way.

> Is there any risk of losing of data or data corruption in this case?
>
> There is any way to configure the postgres db like ‘if the postmaster
> process died all child processes are stopped’?
>
> Undoubtedly the case of postmaster process dies should be investigated to
> discover the root cause.

exactly.  postgres is configured to attempt to restart itself whenever
possible in the event of the crash.  The two most likely scenarios I
can think of where postgres suddenly halted are 1. postgres PANIC and
2. being killed from external source, like the infamous OOM killer or
a human, etc.  In both cases I think it's preferable to not attempt to
blindly restart the db.

typically if you want postgres to be highly available i'd strongly
encourage you to set up a warm or hot standby and use external
monitoring to do a swichover based on your criteria.

merlin

Re: How to implement autostart of postgres?

From
Craig Ringer
Date:
On 28/07/11 18:06, Gavrina, Irina wrote:

> Yes, you are right, this is a question of database robustness.

Honestly, this is the wrong approach to high availability. Don't try to
make one machine unbreakable - you *cannot* do it. Use multi-machine
failover between a master and a replica with STONITH and a
heartbeat/monitoring setup. See Google for more info.

> You know,
> If any of postgres processes died the postmaster process will again
> start up it. The main worry is if the postmaster process dies (truth be
> told, I can reproduce it only using ‘kill –KILL’, which should be never
> done, I know) and all other processes are up and running. All created
> connections will go on with their work. And it means all db queries will
> be processed after the postmaster died. Am I correct?
>
> Is there any risk of losing of data or data corruption in this case?

Possibly, if you then remove postmaster.pid, thus bypassing the other
safety, and start a new postmaster.



I'm a bit alarmed to discover that Ubuntu's postgresql packaging appears
to do exactly this.

/usr/bin/pg_ctlcluster, from line 78:

# If a pid file is already present, delete it if it is stale/invalid,
# or exit with a notice if it belongs to an already running postmaster.
sub start_check_pid_file {
    my $pidfile = $info{'pgdata'}.'/postmaster.pid';
    if (check_running_postmaster $pidfile) {
        print "Cluster is already running.\n";
        exit 2;
    }

    # Remove invalid or stale PID file
    if (-e $pidfile) {
        unlink $pidfile;
        print "Removed stale pid file.\n";
    }
}


It merrily removes the pid file without any effort to check if there are
backends using that datadir, then goes on to relaunch the postmaster.

Is this sane? Safe?

I realize that they want to have Pg come back up properly after an OS
crash or hard power loss, where it doesn't get the chance to remove the
pid file. Fine. It also opens the system up to user "kill -9" abuse,
though...

> There is any way to configure the postgres db like ‘if the postmaster
> process died all child processes are stopped’?

The whole issue with kill -9 is that you're telling the system to bypass
*ALL* of the usual safeties and recover opportunities, including any
crash handler the program may have in place. You're denying the
postmaster any opportunity to warn the child processes that it is
crashing out.

"If I shoot my foot off with this shotgun, is there any way I can
automatically stop the bleeding and graft a new foot on?"

No. Don't shoot your foot off just because UNIX systems give you the
ability to do just that if you feel like it. You don't "dd if=/dev/zero
of=/dev/sdzz1" either... (target node changed in case any idiot tries to
run it anyway).

That said - a quick test shows that the postmaster dies without
signalling its children when killed with a SIGABRT (often raised by
glibc memory function errors, assertions, etc) too. A query sitting in a
big cartesian join merrily continues processing. Whether that applies to
queries that're doing write activity, I'm not sure.


Anyway, something like this only-very-briefly-tested bash code should
give you vaguely the right idea, though I *STRONGLY* recommend you use
external monitoring like nagios or the like, plus a proper failover
setup, instead of a hackish self-restart facility that's likely to make
things WORSE not better.


function capture_cash_info() {
    # Capture `dmesg' output, the last 1000 lines of the
    # postmaster logs, etc etc. You'll have to write this.
    # The function should email a report to the admin for urgent
    # examination.
    echo "Capture crash report"
}

function scream_for_help() {
    # Something unrecoverable happened. Email the admin, TXT them,
    # activate their house alarm, and electrocute them awake, do
    # whatever you usually have to do.
    # You'll have to write this.
    echo 'Help me! ARgh!'
}

function try_to_recover() {
  # If you want to know it, $1 will contain "stopped" if pidfile
  # missing, or "died" if pidfile present but stale. Can be used in
  # admin email reports etc.
  if test $(pgrep postgres|wc -l) -gt 0 ; then
     echo 1>&2 "Postmaster dead but children still running!";
     # Email/TXT the admin screaming for help and saying some idiot
     # "kill -9"'d the postmaster
     # If you felt brave you could SIGTERM the children here then
     # next time the watchdog ran might be able to relaunch the
     # postmaster if they exited. I wouldn't want to, though.
     scream_for_help
  else
    capture_cash_info
    # relaunch pg using your distro's init script or using pg_ctl,
    # depending on how you usually do it
    false || scream_for_help
  fi
}

DATADIR=/var/lib/postgresql/8.4/main/
PID=$DATADIR/postmaster.pid
if ! test -e "$PID"
then
  try_to_recover stopped
elif ! kill -0 `cat "$PID"` >&/dev/null; then
  try_to_recover died
fi


If you use the above and it eats your data then eats your cat, don't
blame me. I told you not to use it.

--
Craig Ringer

Re: How to implement autostart of postgres?

From
Tom Lane
Date:
Craig Ringer <craig@postnewspapers.com.au> writes:
> On 28/07/11 18:06, Gavrina, Irina wrote:
>> Is there any risk of losing of data or data corruption in this case?

> Possibly, if you then remove postmaster.pid, thus bypassing the other
> safety, and start a new postmaster.

> I'm a bit alarmed to discover that Ubuntu's postgresql packaging appears
> to do exactly this.
> ...
> It merrily removes the pid file without any effort to check if there are
> backends using that datadir, then goes on to relaunch the postmaster.

> Is this sane? Safe?

No, and no.  The postmaster is capable of removing the old pidfile if it
really is stale, and any scripted substitute for that is highly likely
to be unsafe.  You do have to be a bit careful about how you code the
script though, to ensure that the postmaster won't see false positives.
Martin, is that code your responsibility?

            regards, tom lane