Slave promotion failure - Mailing list pgsql-general

From	François Beausoleil
Subject	Slave promotion failure
Date	June 6, 2013 19:37:37
Msg-id	9F3B0CFA-D0AA-4783-BF55-618D85FE37C9@teksol.info
Responses	Re: Slave promotion failure
List	pgsql-general

Hi,

I have the following recovery.conf (Ubuntu 12.04):

standby_mode = on

restore_command = '/usr/local/omnipitr/bin/omnipitr-restore -D /var/lib/postgresql/9.1/main/ --source gzip=/var/backups/seevibes/wal/dbanalytics.production/ --remove-unneeded --temp-dir /var/tmp/omnipitr -l /var/log/omnipitr/restore.log --error-pgcontroldata hang --pgcontroldata-path /usr/lib/postgresql/9.1/bin/pg_controldata "%f" "%p"'

trigger_file = '/var/lib/postgresql/9.1/main/recovery.done'

archive_cleanup_command = '/usr/local/omnipitr/bin/omnipitr-cleanup --log /var/log/omnipitr/cleanup.log --archive gzip=/var/backups/seevibes/wal/dbanalytics.production "%r"'

I can't seem to promote the slave:

$ sudo -u postgres touch /var/lib/postgresql/9.1/main/recovery.done

# log is silent

$ sudo -u postgres /usr/lib/postgresql/9.1/bin/pg_ctl promote -D /var/lib/postgresql/9.1/main

server promoting

# log is silent
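
Since the log says nothing after either attempt, one way to confirm whether the server is still in recovery is to ask it directly. The following is a minimal sketch, assuming `psql` is on the PATH and that the script runs as an OS user allowed to connect locally (for example via `sudo -u postgres`). The function names `in_recovery` and `check_promotion` are hypothetical helpers; `pg_is_in_recovery()` is the built-in function available in PostgreSQL 9.0 and later.

```shell
# Hypothetical helper: ask the local server whether it is still in recovery.
# -A (unaligned) and -t (tuples only) make psql print just "t" or "f".
in_recovery() {
    psql -At -c 'SELECT pg_is_in_recovery();'
}

# Hypothetical helper: translate the answer into a message and an exit status.
#   0 = promoted, 1 = still in recovery, 2 = could not tell
check_promotion() {
    state=$(in_recovery) || { echo "could not query server"; return 2; }
    case "$state" in
        t) echo "still in recovery"; return 1 ;;
        f) echo "promoted"; return 0 ;;
        *) echo "unexpected answer: $state"; return 2 ;;
    esac
}
```

Calling `check_promotion` after touching the trigger file or running `pg_ctl promote` would show whether the standby actually left recovery, independently of what the log reports.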

The PostgreSQL log from around the time I attempted the promotions shows:

2013-06-06 16:21:51.030 UTC - @ 26434 (00000) 2013-06-06 16:20:42 UTC - LOG: restored log file "0000000400001658000000CB" from archive
2013-06-06 16:22:35.324 UTC - @ 26411 (00000) 2013-06-06 16:20:41 UTC - LOG: received SIGHUP, reloading configuration files
2013-06-06 16:22:51.457 UTC - @ 26434 (00000) 2013-06-06 16:20:42 UTC - LOG: restored log file "0000000400001658000000CC" from archive
2013-06-06 16:24:51.034 UTC - @ 26434 (00000) 2013-06-06 16:20:42 UTC - LOG: restored log file "0000000400001658000000CD" from archive

The SIGHUP occurred because recovery.conf wasn't owned by user postgres: I manage the server with Puppet, which reloads the configuration on any change. A quick scan of the data directory reveals nothing out of the ordinary:

$ sudo ls -la /var/lib/postgresql/9.1/main/
total 108
drwx------ 13 postgres postgres 4096 Jun 6 16:27 .
drwxr-xr-x 3 700 postgres 17 May 31 15:46 ..
-rw------- 1 postgres postgres 184 May 22 12:10 backup_label.old
drwx------ 12 postgres postgres 130 Apr 18 17:33 base
drwx------ 2 postgres postgres 8192 Jun 6 16:32 global
drwx------ 2 postgres postgres 4096 May 31 20:57 pg_clog
drwx------ 4 postgres postgres 34 Jan 12 2012 pg_multixact
drwx------ 2 postgres postgres 17 Jun 6 16:20 pg_notify
drwx------ 2 postgres postgres 6 Jan 12 2012 pg_serial
drwx------ 2 postgres postgres 24 Jun 6 16:20 pg_stat_tmp
drwx------ 2 postgres postgres 17 Jun 6 08:32 pg_subtrans
drwx------ 2 postgres postgres 6 Jan 12 2012 pg_tblspc
drwx------ 2 postgres postgres 6 Jan 12 2012 pg_twophase
-rw------- 1 postgres postgres 4 Jan 12 2012 PG_VERSION
drwxr-xr-x 3 postgres postgres 45056 Jun 6 16:34 pg_xlog
-rw------- 1 postgres postgres 350 Jun 6 16:20 postmaster.opts
-rw------- 1 postgres postgres 93 Jun 6 16:20 postmaster.pid
-rw-r--r-- 1 postgres postgres 591 Jun 6 16:20 recovery.conf
lrwxrwxrwx 1 postgres postgres 36 Dec 2 2012 server.crt -> /etc/ssl/certs/ssl-cert-snakeoil.pem
lrwxrwxrwx 1 postgres postgres 38 Dec 2 2012 server.key -> /etc/ssl/private/ssl-cert-snakeoil.key

I also attempted to restart the slave, with and without recovery.done, to no avail. I must be missing something. Does anyone have an idea? I read http://www.postgresql.org/docs/9.1/static/warm-standby-failover.html very carefully, and I believe I did everything I was supposed to do.

Thanks,

François Beausoleil

