Thread: Hot standby with streaming replication under PgSQL 9.1.x : failover when master crashes?
Hot standby with streaming replication under PgSQL 9.1.x : failover when master crashes?
From
"David Guyot (English)"
Date:
Hello, world! I'm trying to figure out how to use fail-over with 2 PgSQL 9.1 servers; I already succeeded in exchanging roles (ie. making standby master and vice versa), but only when both master and slave are up and running. I tried to simulate a master crash by killing its virtual machine, but, when I try to promote the standby server, even if /usr/lib/postgresql/9.1/bin/pg_ctl promote -D /var/lib/postgresql/9.1/main/ answers that promote is in progress, it looks like the standby tries to connect to the no-longer-running master to get some WAL files (despite the streaming replication; I noticed it when I saw PgSQL launching SCP attempts on the crashed master), and, as the master no longer runs, fails and stop. Here is the corresponding (CSV) log output : 2013-04-02 15:13:56.983 CEST,,,3523,"[local]",515ad994.dc3,1,"",2013-04-02 15:13:56 CEST,,0,LOG,08P01,"incomplete startup packet",,,,,,,,"ProcessStartupPacket, postmaster.c:1604","" 2013-04-02 15:13:56.985 CEST,,,3522,,515ad994.dc2,1,,2013-04-02 15:13:56 CEST,,0,LOG,00000,"database system was interrupted; last known up at 2013-04-02 14:46:17 CEST",,,,,,,,"StartupXLOG, xlog.c:6098","" 2013-04-02 15:13:56.985 CEST,,,3522,,515ad994.dc2,2,,2013-04-02 15:13:56 CEST,,0,LOG,00000,"creating missing WAL directory ""pg_xlog/archive_status""",,,,,,,,"ValidateXLOGDirectoryStructure, xlog.c:3514","" 2013-04-02 15:13:57.589 CEST,"postgres","postgres",3529,"[local]",515ad995.dc9,1,"",2013-04-02 15:13:57 CEST,,0,FATAL,57P03,"the database system is starting up",,,,,,,,"ProcessStartupPacket, postmaster.c:1858","" 2013-04-02 15:13:57.842 CEST,,,3522,,515ad994.dc2,3,,2013-04-02 15:13:56 CEST,,0,LOG,00000,"entering standby mode",,,,,,,,"StartupXLOG, xlog.c:6164","" 2013-04-02 15:13:58.106 CEST,"postgres","postgres",3536,"[local]",515ad996.dd0,1,"",2013-04-02 15:13:58 CEST,,0,FATAL,57P03,"the database system is starting up",,,,,,,,"ProcessStartupPacket, postmaster.c:1858","" 2013-04-02 15:13:58.620 CEST,"postgres","postgres",3539,"[local]",515ad996.dd3,1,"",2013-04-02 15:13:58 CEST,,0,FATAL,57P03,"the database system is starting up",,,,,,,,"ProcessStartupPacket, postmaster.c:1858","" 2013-04-02 15:13:58.663 CEST,,,3522,,515ad994.dc2,4,,2013-04-02 15:13:56 CEST,,0,LOG,00000,"restored log file ""0000000200000001000000D5"" from archive",,,,,,,,"RestoreArchivedFile, xlog.c:3082","" 2013-04-02 15:13:59.005 CEST,,,3522,,515ad994.dc2,5,,2013-04-02 15:13:56 CEST,1/1,0,LOG,00000,"redo starts at 1/D5000020",,,,,,,,"StartupXLOG, xlog.c:6563","" 2013-04-02 15:13:59.006 CEST,,,3522,,515ad994.dc2,6,,2013-04-02 15:13:56 CEST,1/1,0,LOG,00000,"consistent recovery state reached at 1/D6000000",,,,,,,,"CheckRecoveryConsistency, xlog.c:7023","" 2013-04-02 15:13:59.006 CEST,,,3520,,515ad992.dc0,1,,2013-04-02 15:13:54 CEST,,0,LOG,00000,"database system is ready to accept read only connections",,,,,,,,"sigusr1_handler, postmaster.c:4268","" 2013-04-02 15:13:59.493 CEST,,,3548,,515ad997.ddc,1,,2013-04-02 15:13:59 CEST,,0,LOG,00000,"streaming replication successfully connected to primary",,,,,,,,"libpqrcv_connect, libpqwalreceiver.c:171","" 2013-04-02 15:15:46.940 CEST,,,3522,,515ad994.dc2,7,,2013-04-02 15:13:56 CEST,1/1,0,LOG,00000,"received promote request",,,,,,,,"CheckForStandbyTrigger, xlog.c:10578","" 2013-04-02 15:15:46.940 CEST,,,3548,,515ad997.ddc,2,,2013-04-02 15:13:59 CEST,,0,FATAL,57P01,"terminating walreceiver process due to administrator command",,,,,,,,"ProcessWalRcvInterrupts, walreceiver.c:150","" 2013-04-02 15:16:08.044 CEST,,,3522,,515ad994.dc2,8,,2013-04-02 15:13:56 CEST,1/1,0,FATAL,XX000,"could not restore file ""0000000200000001000000D6"" from archive: return code 65280",,,,,,,,"RestoreArchivedFile, xlog.c:3132","" 2013-04-02 15:16:08.044 CEST,,,3520,,515ad992.dc0,2,,2013-04-02 15:13:54 CEST,,0,LOG,00000,"startup process (PID 3522) exited with exit code 1",,,,,,,,"LogChildExit, postmaster.c:2878","" 2013-04-02 15:16:08.044 CEST,,,3520,,515ad992.dc0,3,,2013-04-02 15:13:54 CEST,,0,LOG,00000,"terminating any other active server processes",,,,,,,,"HandleChildCrash, postmaster.c:2712","" As you can see, it tries to download an archive file from the old master, but, obviously, it can't as the master has stopped. Here is my recovery.conf : standby_mode = 'on' primary_conninfo = 'host=192.168.0.80 port=5433 user=replika password=replika' restore_command = 'rsync -av 192.168.0.80:/var/lib/postgresql/WAL/%f %p' And my postgresql.conf (only the non-default values, same config on both servers; note that all locales but lc_messages are in French because I'm French and so are the servers [no froggies jokes, please ;)], but the problem remains even if locales are all set to fr_FR.UTF-8; lc_messages is set to en_GB.UTF-8 to allow me to send this mesage with English logs) : data_directory = '/var/lib/postgresql/9.1/main' hba_file = '/etc/postgresql/9.1/main/pg_hba.conf' ident_file = '/etc/postgresql/9.1/main/pg_ident.conf' external_pid_file = '/var/run/postgresql/9.1-main.pid' listen_addresses = '*' port = 5433 max_connections = 10 unix_socket_directory = '/var/run/postgresql' ssl = true shared_buffers = 24MB wal_level = hot_standby wal_buffers = 8MB checkpoint_segments = 32 checkpoint_completion_target = 0.9 archive_mode = on archive_command = 'test ! -f /var/lib/postgresql/WAL/%f && cp %p /var/lib/postgresql/WAL/%f' max_wal_senders = 5 hot_standby = on log_destination = 'csvlog' logging_collector = on log_directory = '/var/log/postgresql' log_filename = 'postgresql-requests.log' log_min_messages = info log_error_verbosity = verbose log_line_prefix = '%t ' lc_messages = 'en_GB.UTF-8' lc_monetary = 'fr_FR.UTF-8' lc_numeric = 'fr_FR.UTF-8' lc_time = 'fr_FR.UTF-8' default_text_search_config = 'pg_catalog.french' I assume I missed something there because, as this is a replication mechanism, in my opinion, the standby should simply be promoted after master failure, or at least I should have the possibility to ask this to the standby while asking not to connect to the old master and make its own way with the local data and whose the replication stream already sent. Or is it the standby promotion procedure in case of master failure which misses something? I can't tell because the documentation isn't verbose about the standby promotion after a master failure (or I wasn't able to find it by myself). If there is a detailed procedure available for the event of a master crash, where is it? Else, where is my error? Thank you in advance. Regards. -- David Guyot Sysadmin Europe Camions Interactive Moulin Collot F-88500 Ambacourt Tel: +33 (0)3 29 30 47 85 Fax : +33 (0)3 29 31 31 31
Re: Hot standby with streaming replication under PgSQL 9.1.x : failover when master crashes?
From
Douglas J Hunley
Date:
On Tue, Apr 2, 2013 at 10:29 AM, David Guyot (English) <david.guyot@europecamions-interactive.com> wrote:
restore_command = 'rsync -av 192.168.0.80:/var/lib/postgresql/WAL/%f %p'
You're telling the recovery process to get the files off the (now dead) master here...
Douglas J Hunley (doug.hunley@gmail.com)
Twitter: @hunleyd Web: douglasjhunley.com
G+: http://goo.gl/sajR3
Re: Hot standby with streaming replication under PgSQL 9.1.x : failover when master crashes?
From
David Guyot
Date:
Mister,
Thank you for your answer. I had in mind that this line was for the initial start of the standby; should I just comment it before promoting the standby, or is this line useless for the day-to-day standby service?
Thank you in advance.
Regards.
Thank you for your answer. I had in mind that this line was for the initial start of the standby; should I just comment it before promoting the standby, or is this line useless for the day-to-day standby service?
Thank you in advance.
Regards.
Le 02/04/2013 16:47, Douglas J Hunley a écrit :
On Tue, Apr 2, 2013 at 10:29 AM, David Guyot (English) <david.guyot@europecamions-interactive.com> wrote:restore_command = 'rsync -av 192.168.0.80:/var/lib/postgresql/WAL/%f %p'
You're telling the recovery process to get the files off the (now dead) master here...--
Douglas J Hunley (doug.hunley@gmail.com)
Twitter: @hunleyd Web: douglasjhunley.com
G+: http://goo.gl/sajR3
-- David Guyot Administrateur système, réseau et télécommunications Europe Camions Interactive Moulin Collot F-88500 Ambacourt Tel: +33 (0)3 29 30 47 85 Fax : +33 (0)3 29 31 31 31
Attachment
Re: Hot standby with streaming replication under PgSQL 9.1.x : failover when master crashes?
From
Douglas J Hunley
Date:
On Tue, Apr 2, 2013 at 10:53 AM, David Guyot <david.guyot@europecamions-interactive.com> wrote:
Thank you for your answer. I had in mind that this line was for the initial start of the standby; should I just comment it before promoting the standby, or is this line useless for the day-to-day standby service?
If you're going to do WAL shipping, your 'archive_command' on the master needs to be copying over to the standby (rsync or whatever). Then the standby's recovery.conf would be a simple 'cp /path/to/wal-archive' type of command.
Douglas J Hunley (doug.hunley@gmail.com)
Twitter: @hunleyd Web: douglasjhunley.com
G+: http://goo.gl/sajR3
Re: Hot standby with streaming replication under PgSQL 9.1.x : failover when master crashes?
From
Douglas J Hunley
Date:
On Tue, Apr 2, 2013 at 10:53 AM, David Guyot <david.guyot@europecamions-interactive.com> wrote:
This might help (just stop before the cascade stuff) http://bartek.im/blog/2012/12/04/postgresql-92-streaming-primer.html
--
Douglas J Hunley (doug.hunley@gmail.com)
Twitter: @hunleyd Web: douglasjhunley.com
G+: http://goo.gl/sajR3
Mister,
Thank you for your answer. I had in mind that this line was for the initial start of the standby; should I just comment it before promoting the standby, or is this line useless for the day-to-day standby service?
This might help (just stop before the cascade stuff) http://bartek.im/blog/2012/12/04/postgresql-92-streaming-primer.html
Douglas J Hunley (doug.hunley@gmail.com)
Twitter: @hunleyd Web: douglasjhunley.com
G+: http://goo.gl/sajR3
Hi all,
Can someone please point me to detailed documentation on how to secure/encrypt connections between PGBouncer and Postgresql database (version 8.4.3)?
Thanks in advance!
Bhanu M. Gandikota
Cell: (415) 420-7740
Cell: (415) 420-7740
AFAIK, you have to use stunnel to do it (which is not hard to setup, but it almost makes you wonder whether you should go to the trouble of using pgbouncer at all).
I just went through this and I ended up just testing direct connections through the tunnel without pgbouncer in the middle. It worked for the most part. I had some unexplained issues with my app that I couldn't pinpoint so I stopped testing it for a while. (And my app already does connection pooling, so pgbouncer wasn't going to help that much in that regard.)
On 4/10/2013 2:06 PM, Bhanu Murthy wrote:
I just went through this and I ended up just testing direct connections through the tunnel without pgbouncer in the middle. It worked for the most part. I had some unexplained issues with my app that I couldn't pinpoint so I stopped testing it for a while. (And my app already does connection pooling, so pgbouncer wasn't going to help that much in that regard.)
On 4/10/2013 2:06 PM, Bhanu Murthy wrote:
Hi all,Can someone please point me to detailed documentation on how to secure/encrypt connections between PGBouncer and Postgresql database (version 8.4.3)?Thanks in advance!Bhanu M. Gandikota
Cell: (415) 420-7740
On Wed, Apr 10, 2013 at 11:06:32AM -0700, Bhanu Murthy wrote: > Hi all, > > Can someone please point me to detailed documentation on how to secure/encrypt connections between PGBouncer and Postgresqldatabase (version 8.4.3)? > > Thanks in advance! > > Bhanu M. Gandikota > Cell: (415) 420-7740 Hi Bhanu, You will need to use your link encryption process of choice to tunnel the connections from pgbouncer to the backend. SSH and STunnel are two that we have used successfully in the past. Regards, Ken
We're looking to use streaming replication to a target via a secondary host using stunnel. I'd love to hear how you were able to achieve this, ktm@rice.edu. Effectively we're looking to have the database on our customer's site (let's call that MachineA) replicate to our backend postgres target in the cloud (let's call that MachineC). However, MachineA has no direct communication with MachineC, in fact, it should never be allowed to communicate with it. We have another server that provides various services to the client MachineA that is based in our home datacenter (let's call that MachineB) which we would like to use as a 'staging' machine for the replication to the database replication target. Is this possible to achieve using stunnel (and pgbouncer?) alone? At no point can this traffic go 'in the clear', for obvious reasons ;) Any pointers or assistance help gratefully received! Thanks -- View this message in context: http://postgresql.1045698.n5.nabble.com/Hot-standby-with-streaming-replication-under-PgSQL-9-1-x-failover-when-master-crashes-tp5750442p5754606.html Sent from the PostgreSQL - admin mailing list archive at Nabble.com.
Here is my understading of your requirement:
machine-A at customer site would replicate to staging machine-B which will then replicate to target machine-C in cloud - and you would want to encrypt data in motion from A to B to C.
I could think of 2 possible solutions:
1. Use Stunnel from machine-A to machine-B, and again from machine-B to machine-C.
2. Use streaming replication config features to secure traffic (encrypted data over TCP)
Master configuration on machine-A:
=>Update replication line in pg_hba.conf to "hostssl"
Slave configuration on machine-B:
=> primary_conninfo='host=machine-A port=5432 sslmode=require'
or
=> primary_conninfo='host=machine-A port=5432 sslmode=verify-ca'
=> primary_conninfo='host=machine-A port=5432 sslmode=verify-ca'
You could then use cascading replication (available from postgres 9.2) from machine-B to machine-C.
From: handsfree <luke.hansbury@redwood.com>
To: pgsql-admin@postgresql.org
Sent: Tuesday, May 7, 2013 9:17 AM
Subject: Re: [ADMIN] [SQL] Encrypting PGBouncer to Postgres DB connections
To: pgsql-admin@postgresql.org
Sent: Tuesday, May 7, 2013 9:17 AM
Subject: Re: [ADMIN] [SQL] Encrypting PGBouncer to Postgres DB connections
We're looking to use streaming replication to a target via a secondary host
using stunnel. I'd love to hear how you were able to achieve this,
ktm@rice.edu.
Effectively we're looking to have the database on our customer's site (let's
call that MachineA) replicate to our backend postgres target in the cloud
(let's call that MachineC). However, MachineA has no direct communication
with MachineC, in fact, it should never be allowed to communicate with it.
We have another server that provides various services to the client MachineA
that is based in our home datacenter (let's call that MachineB) which we
would like to use as a 'staging' machine for the replication to the database
replication target. Is this possible to achieve using stunnel (and
pgbouncer?) alone?
At no point can this traffic go 'in the clear', for obvious reasons ;)
Any pointers or assistance help gratefully received! Thanks
--
View this message in context: http://postgresql.1045698.n5.nabble.com/Hot-standby-with-streaming-replication-under-PgSQL-9-1-x-failover-when-master-crashes-tp5750442p5754606.html
Sent from the PostgreSQL - admin mailing list archive at Nabble.com.
--
Sent via pgsql-admin mailing list (pgsql-admin@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-admin
Bhanu Murthy wrote: > handsfree wrote: >> We're looking to use streaming replication to a target via a secondary host >> using stunnel. > I could think of 2 possible solutions: [...] > 2. Use streaming replication config features to secure traffic (encrypted data over TCP) > > Master configuration on machine-A: > =>Update replication line in pg_hba.conf to "hostssl" > > Slave configuration on machine-B: > => primary_conninfo='host=machine-A port=5432 sslmode=require' > or > => primary_conninfo='host=machine-A port=5432 sslmode=verify-ca' > > You could then use cascading replication (available from postgres 9.2) from machine-B to machine-C. That would be the best solution, but I ran into a problem with it: http://www.postgresql.org/message-id/D960CB61B694CF459DCFB4B0128514C208A4E93C@exadv11.host.magwien.gv.at It still works, but the replication connection is lost and restarted whenever SSL renegotiation takes place. I wasn't able to figure out what causes the problem. Yours, Laurenz Albe
Thanks for the response. In terms of your suggestions: 1. We already have stunnel installed on MachineA and MachineB, and it would not be too difficult to install stunnel on MachineC either. What I'm unsure of is how to make MachineB a 'bridge' so the postgres target MachineC appears as an end point to MachineA. We just want to pass postgres rsync through MachineB to MachineC and have commit notifications passed back to MachineA (we're hoping to run postgres replication synchronously initially, though if performance suffers too much we'll run async). 2. We're unable to stream directly to MachineB as it has its own postgres database cluster which we don't want to write to. For information, we're running Ubuntu 12.04 LTS and Postgres 9.2 on all hosts. -- View this message in context: http://postgresql.1045698.n5.nabble.com/Hot-standby-with-streaming-replication-under-PgSQL-9-1-x-failover-when-master-crashes-tp5750442p5754684.html Sent from the PostgreSQL - admin mailing list archive at Nabble.com.
Laurenz, thanks for that information; I will bear in mind the problems that you encountered with SSL renegotiation. I'm not sure that this will necessarily be an issue using stunnel, but I will talk with my colleagues who have more experience of stunnel and confirm whether or not it's applicable. -- View this message in context: http://postgresql.1045698.n5.nabble.com/Hot-standby-with-streaming-replication-under-PgSQL-9-1-x-failover-when-master-crashes-tp5750442p5754685.html Sent from the PostgreSQL - admin mailing list archive at Nabble.com.
handsfree wrote: > Laurenz, thanks for that information; I will bear in mind the problems that > you encountered with SSL renegotiation. I'm not sure that this will > necessarily be an issue using stunnel, but I will talk with my colleagues > who have more experience of stunnel and confirm whether or not it's > applicable. That is definitely only applicable to streaming replication and not to stunnel. Yours, Laurenz Albe