Thread: Minimal streaming replication
I'm attempting to set up minimal/simple replication with one master and one standby using the following pair of identical machines connected through through a 1-Gb switch: 3.2.0-25-generic #40-Ubuntu SMP Wed May 23 20:30:51 UTC 2012 x86_64 x86_64 x86_64 GNU/Linux PostgreSQL 9.1.4 on x86_64-unknown-linux-gnu, compiled by gcc (Ubuntu/Linaro 4.6.3-1ubuntu5) 4.6.3, 64-bit The documentation says "To use streaming replication, set up a file-based log-shipping standby server as described in Section 25.2...." however I'm not using any of the archive or restore commands but instead use pg_basebackup to do the initial copy in a script that at its core runs pg_basebackup then starts the standby server. So... Given a sufficiently large wal_keep_segments on the master is this a reasonable approach? Is there a disadvantage, other than disk-space required, to having wal_keep_segments set to a fairly large number, say 256 or 512? Once replication was running I tried to stress/break it. I started pgbench with 100 clients and then simultaneously started a restore (12 GB of tables plus associated indexes). It *seems* to work. I get appropriate results from test queries, and the master and standby monitoring queries seem reasonable (queries taken at different times - log locations won't match): --Standby select pg_last_xlog_receive_location(), pg_last_xlog_replay_location(), now()-pg_last_xact_replay_timestamp() as log_delay; pg_last_xlog_receive_location | pg_last_xlog_replay_location | log_delay -------------------------------+------------------------------+----------------- 1F2/F4E4F8C0 | 1F2/F4E4F8C0 | 00:00:00.995516 --Master select * from pg_stat_replication; -[ RECORD 1 ]----+------------------------------ procpid | 25945 usesysid | 10 usename | postgres application_name | walreceiver client_addr | 192.168.4.215 client_hostname | client_port | 41335 backend_start | 2012-06-25 15:59:02.833441-07 state | streaming sent_location | 1F3/659F2000 write_location | 1F3/659D3538 flush_location | 1F3/659D3538 replay_location | 1F3/659C1570 sync_priority | 0 sync_state | async However I'm seeing troubling messages in the log. While running pgbench I see the following types of messages on the master every minute or few: 2012-06-25 16:15:51 PDT WARNING: pgstat wait timeout 2012-06-25 16:16:26 PDT LOG: SSL renegotiation failure 2012-06-25 16:16:26 PDT LOG: SSL error: unexpected record 2012-06-25 16:16:26 PDT LOG: could not send data to client: Connection reset by peer The standby has the following sorts of messages: ... 2012-06-25 11:12:11 PDT FATAL: could not receive data from WAL stream: SSL error: sslv3 alert unexpected message 2012-06-25 11:12:11 PDT LOG: record with zero length at 1C5/95D2FE00 2012-06-25 11:12:26 PDT LOG: streaming replication successfully connected to primary ... 2012-06-25 11:30:59 PDT LOG: unexpected pageaddr 1C7/C9FAE000 in log file 456, segment 173, offset 16441344 2012-06-25 11:30:59 PDT LOG: streaming replication successfully connected to primary ... 2012-06-25 11:36:26 PDT FATAL: could not send data to WAL stream: SSL error: sslv3 alert unexpected message 2012-06-25 11:36:26 PDT LOG: invalid magic number 0000 in log file 457, segment 173, offset 15851520 ... 2012-06-25 11:36:41 PDT LOG: streaming replication successfully connected to primary ... Any advice on what this is telling me? I'm not keen on words like "FATAL" in my logs. Cheers, Steve
On Tue, Jun 26, 2012 at 6:47 AM, Steve Crawford <scrawford@pinpointresearch.com> wrote: > The documentation says "To use streaming replication, set up a file-based > log-shipping standby server as described in Section 25.2...." however I'm > not using any of the archive or restore commands but instead use > pg_basebackup to do the initial copy in a script that at its core runs > pg_basebackup then starts the standby server. So... > > Given a sufficiently large wal_keep_segments on the master is this a > reasonable approach? Its what I've setup and seems to be working fine. > Is there a disadvantage, other than disk-space required, to having > wal_keep_segments set to a fairly large number, say 256 or 512? I set mine to 5000. > However I'm seeing troubling messages in the log. While running pgbench I > see the following types of messages on the master every minute or few: > 2012-06-25 11:36:26 PDT FATAL: could not send data to WAL stream: SSL > error: sslv3 alert unexpected message > 2012-06-25 11:36:26 PDT LOG: invalid magic number 0000 in log file 457, > segment 173, offset 15851520 > ... > 2012-06-25 11:36:41 PDT LOG: streaming replication successfully connected > to primary > ... > > Any advice on what this is telling me? I'm not keen on words like "FATAL" in > my logs. I saw this with Ubuntu 12.04 and PostgreSQL 9.1.4, replicating to an identical machine. Google suggested it was caused by different versions of libssl, but I don't think that is the case here unless one of the packages got statically linked with an old libssl. I haven't had time to investigate so I've disabled SSL for now, even though replication appears to work apart from the disconnections. -- Stuart Bishop <stuart@stuartbishop.net> http://www.stuartbishop.net/
On 06/26/2012 05:18 AM, Stuart Bishop wrote: > On Tue, Jun 26, 2012 at 6:47 AM, Steve Crawford > <scrawford@pinpointresearch.com> wrote: > > ... > I'm seeing troubling messages in the log. While running pgbench I > see the following types of messages on the master every minute or few: >> 2012-06-25 11:36:26 PDT FATAL: could not send data to WAL stream: SSL >> error: sslv3 alert unexpected message >> 2012-06-25 11:36:26 PDT LOG: invalid magic number 0000 in log file 457, >> segment 173, offset 15851520 >> ... >> 2012-06-25 11:36:41 PDT LOG: streaming replication successfully connected >> to primary >> ... >> >> Any advice on what this is telling me? I'm not keen on words like "FATAL" in >> my logs. > I saw this with Ubuntu 12.04 and PostgreSQL 9.1.4, replicating to an > identical machine. Google suggested it was caused by different > versions of libssl, but I don't think that is the case here unless one > of the packages got statically linked with an old libssl. I haven't > had time to investigate so I've disabled SSL for now, even though > replication appears to work apart from the disconnections. > I don't think different SSL versions is the issue as both machines are identical hardware and were built within minutes of each other from the same install source, updates have been applied simultaneously and the current package lists pulled from the machines is identical. I did some research and testing and suspect the issue is related to the SSL renegotiation security vulnerability. The ssl_renegotiation_limit defaults to 512MB which goes by pretty quickly when running pgbench. I set it to "0" (off) and the errors stopped. There is a note in the documentation: "SSL libraries from before November 2009 are insecure when using SSL renegotiation, due to a vulnerability in the SSL protocol. As a stop-gap fix for this vulnerability, some vendors shipped SSL libraries incapable of doing renegotiation. If any such libraries are in use on the client or server, SSL renegotiation should be disabled." It would appear that the defaults set by the Ubuntu PostgreSQL packagers are in conflict with the decisions of the Ubuntu SSL packagers. Cheers, Steve