Thread: Hot standby with streaming replication under PgSQL 9.1.x : failover when master crashes?

Hot standby with streaming replication under PgSQL 9.1.x : failover when master crashes?

From
"David Guyot (English)"
Date:
Hello, world!

I'm trying to figure out how to use fail-over with 2 PgSQL 9.1 servers;
I already succeeded in exchanging roles (ie. making standby master and
vice versa), but only when both master and slave are up and running. I
tried to simulate a master crash by killing its virtual machine, but,
when I try to promote the standby server, even if
/usr/lib/postgresql/9.1/bin/pg_ctl promote -D
/var/lib/postgresql/9.1/main/ answers that promote is in progress, it
looks like the standby tries to connect to the no-longer-running master
to get some WAL files (despite the streaming replication; I noticed it
when I saw PgSQL launching SCP attempts on the crashed master), and, as
the master no longer runs, fails and stop. Here is the corresponding
(CSV) log output :

2013-04-02 15:13:56.983
CEST,,,3523,"[local]",515ad994.dc3,1,"",2013-04-02 15:13:56
CEST,,0,LOG,08P01,"incomplete startup
packet",,,,,,,,"ProcessStartupPacket, postmaster.c:1604",""
2013-04-02 15:13:56.985 CEST,,,3522,,515ad994.dc2,1,,2013-04-02 15:13:56
CEST,,0,LOG,00000,"database system was interrupted; last known up at
2013-04-02 14:46:17 CEST",,,,,,,,"StartupXLOG, xlog.c:6098",""
2013-04-02 15:13:56.985 CEST,,,3522,,515ad994.dc2,2,,2013-04-02 15:13:56
CEST,,0,LOG,00000,"creating missing WAL directory
""pg_xlog/archive_status""",,,,,,,,"ValidateXLOGDirectoryStructure,
xlog.c:3514",""
2013-04-02 15:13:57.589
CEST,"postgres","postgres",3529,"[local]",515ad995.dc9,1,"",2013-04-02
15:13:57 CEST,,0,FATAL,57P03,"the database system is starting
up",,,,,,,,"ProcessStartupPacket, postmaster.c:1858",""
2013-04-02 15:13:57.842 CEST,,,3522,,515ad994.dc2,3,,2013-04-02 15:13:56
CEST,,0,LOG,00000,"entering standby mode",,,,,,,,"StartupXLOG,
xlog.c:6164",""
2013-04-02 15:13:58.106
CEST,"postgres","postgres",3536,"[local]",515ad996.dd0,1,"",2013-04-02
15:13:58 CEST,,0,FATAL,57P03,"the database system is starting
up",,,,,,,,"ProcessStartupPacket, postmaster.c:1858",""
2013-04-02 15:13:58.620
CEST,"postgres","postgres",3539,"[local]",515ad996.dd3,1,"",2013-04-02
15:13:58 CEST,,0,FATAL,57P03,"the database system is starting
up",,,,,,,,"ProcessStartupPacket, postmaster.c:1858",""
2013-04-02 15:13:58.663 CEST,,,3522,,515ad994.dc2,4,,2013-04-02 15:13:56
CEST,,0,LOG,00000,"restored log file ""0000000200000001000000D5"" from
archive",,,,,,,,"RestoreArchivedFile, xlog.c:3082",""
2013-04-02 15:13:59.005 CEST,,,3522,,515ad994.dc2,5,,2013-04-02 15:13:56
CEST,1/1,0,LOG,00000,"redo starts at 1/D5000020",,,,,,,,"StartupXLOG,
xlog.c:6563",""
2013-04-02 15:13:59.006 CEST,,,3522,,515ad994.dc2,6,,2013-04-02 15:13:56
CEST,1/1,0,LOG,00000,"consistent recovery state reached at
1/D6000000",,,,,,,,"CheckRecoveryConsistency, xlog.c:7023",""
2013-04-02 15:13:59.006 CEST,,,3520,,515ad992.dc0,1,,2013-04-02 15:13:54
CEST,,0,LOG,00000,"database system is ready to accept read only
connections",,,,,,,,"sigusr1_handler, postmaster.c:4268",""
2013-04-02 15:13:59.493 CEST,,,3548,,515ad997.ddc,1,,2013-04-02 15:13:59
CEST,,0,LOG,00000,"streaming replication successfully connected to
primary",,,,,,,,"libpqrcv_connect, libpqwalreceiver.c:171",""
2013-04-02 15:15:46.940 CEST,,,3522,,515ad994.dc2,7,,2013-04-02 15:13:56
CEST,1/1,0,LOG,00000,"received promote
request",,,,,,,,"CheckForStandbyTrigger, xlog.c:10578",""
2013-04-02 15:15:46.940 CEST,,,3548,,515ad997.ddc,2,,2013-04-02 15:13:59
CEST,,0,FATAL,57P01,"terminating walreceiver process due to
administrator command",,,,,,,,"ProcessWalRcvInterrupts,
walreceiver.c:150",""
2013-04-02 15:16:08.044 CEST,,,3522,,515ad994.dc2,8,,2013-04-02 15:13:56
CEST,1/1,0,FATAL,XX000,"could not restore file
""0000000200000001000000D6"" from archive: return code
65280",,,,,,,,"RestoreArchivedFile, xlog.c:3132",""
2013-04-02 15:16:08.044 CEST,,,3520,,515ad992.dc0,2,,2013-04-02 15:13:54
CEST,,0,LOG,00000,"startup process (PID 3522) exited with exit code
1",,,,,,,,"LogChildExit, postmaster.c:2878",""
2013-04-02 15:16:08.044 CEST,,,3520,,515ad992.dc0,3,,2013-04-02 15:13:54
CEST,,0,LOG,00000,"terminating any other active server
processes",,,,,,,,"HandleChildCrash, postmaster.c:2712",""

As you can see, it tries to download an archive file from the old
master, but, obviously, it can't as the master has stopped.

Here is my recovery.conf :

standby_mode = 'on'
primary_conninfo = 'host=192.168.0.80 port=5433 user=replika
password=replika'
restore_command = 'rsync -av 192.168.0.80:/var/lib/postgresql/WAL/%f %p'

And my postgresql.conf (only the non-default values, same config on both
servers; note that all locales but lc_messages are in French because I'm
French and so are the servers [no froggies jokes, please ;)], but the
problem remains even if locales are all set to fr_FR.UTF-8; lc_messages
is set to en_GB.UTF-8 to allow me to send this mesage with English logs) :

data_directory = '/var/lib/postgresql/9.1/main'
hba_file = '/etc/postgresql/9.1/main/pg_hba.conf'
ident_file = '/etc/postgresql/9.1/main/pg_ident.conf'
external_pid_file = '/var/run/postgresql/9.1-main.pid'
listen_addresses = '*'
port = 5433
max_connections = 10
unix_socket_directory = '/var/run/postgresql'
ssl = true
shared_buffers = 24MB
wal_level = hot_standby
wal_buffers = 8MB
checkpoint_segments = 32
checkpoint_completion_target = 0.9
archive_mode = on
archive_command = 'test ! -f /var/lib/postgresql/WAL/%f && cp %p
/var/lib/postgresql/WAL/%f'
max_wal_senders = 5
hot_standby = on
log_destination = 'csvlog'
logging_collector = on
log_directory = '/var/log/postgresql'
log_filename = 'postgresql-requests.log'
log_min_messages = info
log_error_verbosity = verbose
log_line_prefix = '%t '
lc_messages = 'en_GB.UTF-8'
lc_monetary = 'fr_FR.UTF-8'
lc_numeric = 'fr_FR.UTF-8'
lc_time = 'fr_FR.UTF-8'
default_text_search_config = 'pg_catalog.french'

I assume I missed something there because, as this is a replication
mechanism, in my opinion, the standby should simply be promoted after
master failure, or at least I should have the possibility to ask this to
the standby while asking not to connect to the old master and make its
own way with the local data and whose the replication stream already
sent. Or is it the standby promotion procedure in case of master failure
which misses something? I can't tell because the documentation isn't
verbose about the standby promotion after a master failure (or I wasn't
able to find it by myself). If there is a detailed procedure available
for the event of a master crash, where is it? Else, where is my error?

Thank you in advance.

Regards.

--
David Guyot
Sysadmin
Europe Camions Interactive
Moulin Collot
F-88500 Ambacourt

Tel: +33 (0)3 29 30 47 85
Fax : +33 (0)3 29 31 31 31




On Tue, Apr 2, 2013 at 10:29 AM, David Guyot (English) <david.guyot@europecamions-interactive.com> wrote:
restore_command = 'rsync -av 192.168.0.80:/var/lib/postgresql/WAL/%f %p'

You're telling the recovery process to get the files off the (now dead) master here...


--
Douglas J Hunley (doug.hunley@gmail.com)
Twitter: @hunleyd                                               Web: douglasjhunley.com
G+: http://goo.gl/sajR3
Mister,

Thank you for your answer. I had in mind that this line was for the initial start of the standby; should I just comment it before promoting the standby, or is this line useless for the day-to-day standby service?

Thank you in advance.

Regards.

Le 02/04/2013 16:47, Douglas J Hunley a écrit :

On Tue, Apr 2, 2013 at 10:29 AM, David Guyot (English) <david.guyot@europecamions-interactive.com> wrote:
restore_command = 'rsync -av 192.168.0.80:/var/lib/postgresql/WAL/%f %p'

You're telling the recovery process to get the files off the (now dead) master here...


--
Douglas J Hunley (doug.hunley@gmail.com)
Twitter: @hunleyd                                               Web: douglasjhunley.com
G+: http://goo.gl/sajR3

-- 
David Guyot
Administrateur système, réseau et télécommunications
Europe Camions Interactive
Moulin Collot
F-88500 Ambacourt

Tel: +33 (0)3 29 30 47 85
Fax : +33 (0)3 29 31 31 31
Attachment

On Tue, Apr 2, 2013 at 10:53 AM, David Guyot <david.guyot@europecamions-interactive.com> wrote:
Thank you for your answer. I had in mind that this line was for the initial start of the standby; should I just comment it before promoting the standby, or is this line useless for the day-to-day standby service?

If you're going to do WAL shipping, your 'archive_command' on the master needs to be copying over to the standby (rsync or whatever). Then the standby's recovery.conf would be a simple 'cp /path/to/wal-archive' type of command. 


--
Douglas J Hunley (doug.hunley@gmail.com)
Twitter: @hunleyd                                               Web: douglasjhunley.com
G+: http://goo.gl/sajR3
On Tue, Apr 2, 2013 at 10:53 AM, David Guyot <david.guyot@europecamions-interactive.com> wrote:
Mister,

Thank you for your answer. I had in mind that this line was for the initial start of the standby; should I just comment it before promoting the standby, or is this line useless for the day-to-day standby service?

This might help (just stop before the cascade stuff) http://bartek.im/blog/2012/12/04/postgresql-92-streaming-primer.html

--
Douglas J Hunley (doug.hunley@gmail.com)
Twitter: @hunleyd                                               Web: douglasjhunley.com
G+: http://goo.gl/sajR3

Encrypting PGBouncer to Postgres DB connections

From
Bhanu Murthy
Date:
Hi all,
 
Can someone please point me to detailed documentation on how to secure/encrypt connections between PGBouncer and Postgresql database (version 8.4.3)? 
 
Thanks in advance!
 
Bhanu M. Gandikota
Cell: (415) 420-7740
 

Re: Encrypting PGBouncer to Postgres DB connections

From
AJ Weber
Date:
AFAIK, you have to use stunnel to do it (which is not hard to setup, but it almost makes you wonder whether you should go to the trouble of using pgbouncer at all).

I just went through this and I ended up just testing direct connections through the tunnel without pgbouncer in the middle.  It worked for the most part.  I had some unexplained issues with my app that I couldn't pinpoint so I stopped testing it for a while.  (And my app already does connection pooling, so pgbouncer wasn't going to help that much in that regard.)


On 4/10/2013 2:06 PM, Bhanu Murthy wrote:
Hi all,
 
Can someone please point me to detailed documentation on how to secure/encrypt connections between PGBouncer and Postgresql database (version 8.4.3)? 
 
Thanks in advance!
 
Bhanu M. Gandikota
Cell: (415) 420-7740
 

Re: [SQL] Encrypting PGBouncer to Postgres DB connections

From
"ktm@rice.edu"
Date:
On Wed, Apr 10, 2013 at 11:06:32AM -0700, Bhanu Murthy wrote:
> Hi all,
>  
> Can someone please point me to detailed documentation on how to secure/encrypt connections between PGBouncer and
Postgresqldatabase (version 8.4.3)?  
>  
> Thanks in advance!
>  
> Bhanu M. Gandikota
> Cell: (415) 420-7740

Hi Bhanu,

You will need to use your link encryption process of choice to tunnel the connections
from pgbouncer to the backend. SSH and STunnel are two that we have used successfully
in the past.

Regards,
Ken


Re: [SQL] Encrypting PGBouncer to Postgres DB connections

From
handsfree
Date:
We're looking to use streaming replication to a target via a secondary host
using stunnel.  I'd love to hear how you were able to achieve this,
ktm@rice.edu.

Effectively we're looking to have the database on our customer's site (let's
call that MachineA) replicate to our backend postgres target in the cloud
(let's call that MachineC).  However, MachineA has no direct communication
with MachineC, in fact, it should never be allowed to communicate with it.
We have another server that provides various services to the client MachineA
that is based in our home datacenter (let's call that MachineB) which we
would like to use as a 'staging' machine for the replication to the database
replication target.  Is this possible to achieve using stunnel (and
pgbouncer?) alone?

At no point can this traffic go 'in the clear', for obvious reasons ;)

Any pointers or assistance help gratefully received!  Thanks



--
View this message in context:
http://postgresql.1045698.n5.nabble.com/Hot-standby-with-streaming-replication-under-PgSQL-9-1-x-failover-when-master-crashes-tp5750442p5754606.html
Sent from the PostgreSQL - admin mailing list archive at Nabble.com.


Re: [SQL] Encrypting PGBouncer to Postgres DB connections

From
Bhanu Murthy
Date:
Here is my understading of your requirement:
 
machine-A at customer site would replicate to staging machine-B which will then replicate to target machine-C in cloud - and you would want to encrypt data in motion from A to B to C.
 
I could think of 2 possible solutions:
 
1. Use Stunnel from machine-A to machine-B, and again from machine-B to machine-C.
 
 
2. Use streaming replication config features to secure traffic (encrypted data over TCP)
 
Master configuration on machine-A:
=>Update replication line in pg_hba.conf to "hostssl"
Slave configuration on machine-B:
=> primary_conninfo='host=machine-A port=5432 sslmode=require'
or
=> primary_conninfo='host=machine-A port=5432 sslmode=verify-ca'
 
You could then use cascading replication (available from postgres 9.2) from machine-B to machine-C.
 
From: handsfree <luke.hansbury@redwood.com>
To: pgsql-admin@postgresql.org
Sent: Tuesday, May 7, 2013 9:17 AM
Subject: Re: [ADMIN] [SQL] Encrypting PGBouncer to Postgres DB connections

We're looking to use streaming replication to a target via a secondary host
using stunnel.  I'd love to hear how you were able to achieve this,
ktm@rice.edu.

Effectively we're looking to have the database on our customer's site (let's
call that MachineA) replicate to our backend postgres target in the cloud
(let's call that MachineC).  However, MachineA has no direct communication
with MachineC, in fact, it should never be allowed to communicate with it.
We have another server that provides various services to the client MachineA
that is based in our home datacenter (let's call that MachineB) which we
would like to use as a 'staging' machine for the replication to the database
replication target.  Is this possible to achieve using stunnel (and
pgbouncer?) alone? 

At no point can this traffic go 'in the clear', for obvious reasons ;)

Any pointers or assistance help gratefully received!  Thanks



--
View this message in context: http://postgresql.1045698.n5.nabble.com/Hot-standby-with-streaming-replication-under-PgSQL-9-1-x-failover-when-master-crashes-tp5750442p5754606.html
Sent from the PostgreSQL - admin mailing list archive at Nabble.com.


--
Sent via pgsql-admin mailing list (pgsql-admin@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-admin


Re: [SQL] Encrypting PGBouncer to Postgres DB connections

From
Albe Laurenz
Date:
Bhanu Murthy wrote:
> handsfree wrote:

>> We're looking to use streaming replication to a target via a secondary host
>> using stunnel.

> I could think of 2 possible solutions:

[...]

> 2. Use streaming replication config features to secure traffic (encrypted data over TCP)
>
> Master configuration on machine-A:
> =>Update replication line in pg_hba.conf to "hostssl"
>
> Slave configuration on machine-B:
> => primary_conninfo='host=machine-A port=5432 sslmode=require'
> or
> => primary_conninfo='host=machine-A port=5432 sslmode=verify-ca'
>
> You could then use cascading replication (available from postgres 9.2) from machine-B to machine-C.

That would be the best solution, but I ran into a problem with it:
http://www.postgresql.org/message-id/D960CB61B694CF459DCFB4B0128514C208A4E93C@exadv11.host.magwien.gv.at

It still works, but the replication connection is lost and restarted
whenever SSL renegotiation takes place.
I wasn't able to figure out what causes the problem.

Yours,
Laurenz Albe


Re: [SQL] Encrypting PGBouncer to Postgres DB connections

From
handsfree
Date:
Thanks for the response.  In terms of your suggestions:

1.  We already have stunnel installed on MachineA and MachineB, and it would
not be too difficult to install stunnel on MachineC either.  What I'm unsure
of is how to make MachineB a 'bridge' so the postgres target MachineC
appears as an end point to MachineA.  We just want to pass postgres rsync
through MachineB to MachineC and have commit notifications passed back to
MachineA (we're hoping to run postgres replication synchronously initially,
though if performance suffers too much we'll run async).

2. We're unable to stream directly to MachineB as it has its own postgres
database cluster which we don't want to write to.

For information, we're running Ubuntu 12.04 LTS and Postgres 9.2 on all
hosts.



--
View this message in context:
http://postgresql.1045698.n5.nabble.com/Hot-standby-with-streaming-replication-under-PgSQL-9-1-x-failover-when-master-crashes-tp5750442p5754684.html
Sent from the PostgreSQL - admin mailing list archive at Nabble.com.


Re: [SQL] Encrypting PGBouncer to Postgres DB connections

From
handsfree
Date:
Laurenz, thanks for that information; I will bear in mind the problems that
you encountered with SSL renegotiation.  I'm not sure that this will
necessarily be an issue using stunnel, but I will talk with my colleagues
who have more experience of stunnel and confirm whether or not it's
applicable.



--
View this message in context:
http://postgresql.1045698.n5.nabble.com/Hot-standby-with-streaming-replication-under-PgSQL-9-1-x-failover-when-master-crashes-tp5750442p5754685.html
Sent from the PostgreSQL - admin mailing list archive at Nabble.com.


Re: [SQL] Encrypting PGBouncer to Postgres DB connections

From
Albe Laurenz
Date:
handsfree wrote:

> Laurenz, thanks for that information; I will bear in mind the problems that
> you encountered with SSL renegotiation.  I'm not sure that this will
> necessarily be an issue using stunnel, but I will talk with my colleagues
> who have more experience of stunnel and confirm whether or not it's
> applicable.

That is definitely only applicable to streaming replication and not
to stunnel.

Yours,
Laurenz Albe