Thread: Minimal streaming replication

Minimal streaming replication

From
Steve Crawford
Date:
I'm attempting to set up minimal/simple replication with one master and
one standby using the following pair of identical machines connected
through through a 1-Gb switch:
3.2.0-25-generic #40-Ubuntu SMP Wed May 23 20:30:51 UTC 2012 x86_64
x86_64 x86_64 GNU/Linux
PostgreSQL 9.1.4 on x86_64-unknown-linux-gnu, compiled by gcc
(Ubuntu/Linaro 4.6.3-1ubuntu5) 4.6.3, 64-bit

The documentation says "To use streaming replication, set up a
file-based log-shipping standby server as described in Section 25.2...."
however I'm not using any of the archive or restore commands but instead
use pg_basebackup to do the initial copy in a script that at its core
runs pg_basebackup then starts the standby server. So...

Given a sufficiently large wal_keep_segments on the master is this a
reasonable approach?

Is there a disadvantage, other than disk-space required, to having
wal_keep_segments set to a fairly large number, say 256 or 512?

Once replication was running I tried to stress/break it. I started
pgbench with 100 clients and then simultaneously started a restore (12
GB of tables plus associated indexes). It *seems* to work. I get
appropriate results from test queries, and the master and standby
monitoring queries seem reasonable (queries taken at different times -
log locations won't match):

--Standby
select
     pg_last_xlog_receive_location(),
     pg_last_xlog_replay_location(),
     now()-pg_last_xact_replay_timestamp() as log_delay;
  pg_last_xlog_receive_location | pg_last_xlog_replay_location |
log_delay
-------------------------------+------------------------------+-----------------
  1F2/F4E4F8C0                  | 1F2/F4E4F8C0                 |
00:00:00.995516

--Master
select * from pg_stat_replication;
-[ RECORD 1 ]----+------------------------------
procpid          | 25945
usesysid         | 10
usename          | postgres
application_name | walreceiver
client_addr      | 192.168.4.215
client_hostname  |
client_port      | 41335
backend_start    | 2012-06-25 15:59:02.833441-07
state            | streaming
sent_location    | 1F3/659F2000
write_location   | 1F3/659D3538
flush_location   | 1F3/659D3538
replay_location  | 1F3/659C1570
sync_priority    | 0
sync_state       | async


However I'm seeing troubling messages in the log. While running pgbench
I see the following types of messages on the master every minute or few:

2012-06-25 16:15:51 PDT WARNING:  pgstat wait timeout
2012-06-25 16:16:26 PDT LOG:  SSL renegotiation failure
2012-06-25 16:16:26 PDT LOG:  SSL error: unexpected record
2012-06-25 16:16:26 PDT LOG:  could not send data to client: Connection
reset by peer

The standby has the following sorts of messages:
...
2012-06-25 11:12:11 PDT FATAL:  could not receive data from WAL stream:
SSL error: sslv3 alert unexpected message
2012-06-25 11:12:11 PDT LOG:  record with zero length at 1C5/95D2FE00
2012-06-25 11:12:26 PDT LOG:  streaming replication successfully
connected to primary
...
2012-06-25 11:30:59 PDT LOG:  unexpected pageaddr 1C7/C9FAE000 in log
file 456, segment 173, offset 16441344
2012-06-25 11:30:59 PDT LOG:  streaming replication successfully
connected to primary
...
2012-06-25 11:36:26 PDT FATAL:  could not send data to WAL stream: SSL
error: sslv3 alert unexpected message
2012-06-25 11:36:26 PDT LOG:  invalid magic number 0000 in log file 457,
segment 173, offset 15851520
...
2012-06-25 11:36:41 PDT LOG:  streaming replication successfully
connected to primary
...

Any advice on what this is telling me? I'm not keen on words like
"FATAL" in my logs.

Cheers,
Steve


Re: Minimal streaming replication

From
Stuart Bishop
Date:
On Tue, Jun 26, 2012 at 6:47 AM, Steve Crawford
<scrawford@pinpointresearch.com> wrote:

> The documentation says "To use streaming replication, set up a file-based
> log-shipping standby server as described in Section 25.2...." however I'm
> not using any of the archive or restore commands but instead use
> pg_basebackup to do the initial copy in a script that at its core runs
> pg_basebackup then starts the standby server. So...
>
> Given a sufficiently large wal_keep_segments on the master is this a
> reasonable approach?

Its what I've setup and seems to be working fine.

> Is there a disadvantage, other than disk-space required, to having
> wal_keep_segments set to a fairly large number, say 256 or 512?

I set mine to 5000.


> However I'm seeing troubling messages in the log. While running pgbench I
> see the following types of messages on the master every minute or few:

> 2012-06-25 11:36:26 PDT FATAL:  could not send data to WAL stream: SSL
> error: sslv3 alert unexpected message
> 2012-06-25 11:36:26 PDT LOG:  invalid magic number 0000 in log file 457,
> segment 173, offset 15851520
> ...
> 2012-06-25 11:36:41 PDT LOG:  streaming replication successfully connected
> to primary
> ...
>
> Any advice on what this is telling me? I'm not keen on words like "FATAL" in
> my logs.

I saw this with Ubuntu 12.04 and PostgreSQL 9.1.4, replicating to an
identical machine. Google suggested it was caused by different
versions of libssl, but I don't think that is the case here unless one
of the packages got statically linked with an old libssl. I haven't
had time to investigate so I've disabled SSL for now, even though
replication appears to work apart from the disconnections.

--
Stuart Bishop <stuart@stuartbishop.net>
http://www.stuartbishop.net/

Re: Minimal streaming replication

From
Steve Crawford
Date:
On 06/26/2012 05:18 AM, Stuart Bishop wrote:
> On Tue, Jun 26, 2012 at 6:47 AM, Steve Crawford
> <scrawford@pinpointresearch.com>  wrote:
>
> ...
> I'm seeing troubling messages in the log. While running pgbench I
> see the following types of messages on the master every minute or few:
>> 2012-06-25 11:36:26 PDT FATAL:  could not send data to WAL stream: SSL
>> error: sslv3 alert unexpected message
>> 2012-06-25 11:36:26 PDT LOG:  invalid magic number 0000 in log file 457,
>> segment 173, offset 15851520
>> ...
>> 2012-06-25 11:36:41 PDT LOG:  streaming replication successfully connected
>> to primary
>> ...
>>
>> Any advice on what this is telling me? I'm not keen on words like "FATAL" in
>> my logs.
> I saw this with Ubuntu 12.04 and PostgreSQL 9.1.4, replicating to an
> identical machine. Google suggested it was caused by different
> versions of libssl, but I don't think that is the case here unless one
> of the packages got statically linked with an old libssl. I haven't
> had time to investigate so I've disabled SSL for now, even though
> replication appears to work apart from the disconnections.
>
I don't think different SSL versions is the issue as both machines are
identical hardware and were built within minutes of each other from the
same install source, updates have been applied simultaneously and the
current package lists pulled from the machines is identical.

I did some research and testing and suspect the issue is related to the
SSL renegotiation security vulnerability.

The ssl_renegotiation_limit defaults to 512MB which goes by pretty
quickly when running pgbench. I set it to "0" (off) and the errors stopped.

There is a note in the documentation: "SSL libraries from before
November 2009 are insecure when using SSL renegotiation, due to a
vulnerability in the SSL protocol. As a stop-gap fix for this
vulnerability, some vendors shipped SSL libraries incapable of doing
renegotiation. If any such libraries are in use on the client or server,
SSL renegotiation should be disabled."

It would appear that the defaults set by the Ubuntu PostgreSQL packagers
are in conflict with the decisions of the Ubuntu SSL packagers.

Cheers,
Steve