Thread: Review: Patch for Synchronous Replication

Review: Patch for Synchronous Replication

From
Thom Brown
Date:
This is a basic review of Fujii Masao's synchronous replication patch
from http://archives.postgresql.org/message-id/AANLkTik2c3kV7HgJnM4MjkCWVG-QvDJXD3iR9TqsCnpP@mail.gmail.com

Review Description
==================
This patch extends existing asynchronous streaming replication with
options to enable different levels of synchronous behaviour.  On the
primary, there is a new standbys.conf file containing a manifest of
standbys from which it seeks some form of confirmation before
committing transactions.  Each entry contains the standby name, the
level of synchronicity (if it wasn't a word, it is now), and a timeout
value specifying how long the primary is willing to wait for
confirmation before aborting the transaction.  On the standby, the new
option "standby_name" has been added to publish a named identity of
the standby server to the primary.
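
To make the three-column layout concrete, here is a rough Python
sketch that reads a standbys.conf-style manifest into (name, level,
timeout) entries.  It is only an illustration: the whitespace-separated
format and "#" comments are inferred from the sample entry in this
review, the accepted timeout units (ms, s, min) are an assumption, and
the names StandbyEntry, parse_timeout_ms and parse_standbys_conf are
hypothetical, not taken from the patch.

```python
import re
from typing import List, NamedTuple


class StandbyEntry(NamedTuple):
    """One manifest line: standby name, sync level, timeout in ms."""
    name: str
    level: str
    timeout_ms: int


# Assumed unit suffixes; the patch may accept a different set.
_UNITS_MS = {"ms": 1, "s": 1000, "min": 60_000}


def parse_timeout_ms(text: str) -> int:
    """Convert a value such as '100ms' or '2s' to milliseconds."""
    match = re.fullmatch(r"(\d+)(ms|s|min)", text)
    if match is None:
        raise ValueError(f"unrecognised timeout: {text!r}")
    return int(match.group(1)) * _UNITS_MS[match.group(2)]


def parse_standbys_conf(text: str) -> List[StandbyEntry]:
    """Parse manifest text, skipping blank lines and '#' comments."""
    entries = []
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()
        if not line:
            continue
        fields = line.split()
        if len(fields) != 3:
            raise ValueError(f"expected 3 fields, got: {raw!r}")
        name, level, timeout = fields
        entries.append(StandbyEntry(name, level, parse_timeout_ms(timeout)))
    return entries


SAMPLE = """\
# STANDBY NAME    SYNCHRONOUS   TIMEOUT
cougar            fsync           100ms
"""

if __name__ == "__main__":
    for entry in parse_standbys_conf(SAMPLE):
        print(entry)
```

The sketch does not validate the level column, since the set of levels
the patch accepts is not listed in this review.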


Patch application
=================
The patch applies cleanly to HEAD.  All regression tests pass
successfully as expected.


Testing
=======
I configured the primary's standbys.conf to provide synchronous
replication to a single standby using fsync as its replication level
and a timeout of 100ms.  The wal_level was set to hot_standby in
postgresql.conf.  The standby's recovery.conf had its standby_name
value set to match that expected by the primary's standbys.conf.  This
is summarised as follows:

## primary postgresql.conf
wal_level = hot_standby
max_wal_senders = 2
wal_keep_segments = 2

## primary standbys.conf
# STANDBY NAME    SYNCHRONOUS   TIMEOUT
cougar            fsync           100ms

## primary pg_hba.conf
# TYPE  DATABASE        USER            CIDR-ADDRESS            METHOD
host    replication     postgres        192.168.102.17/32       trust

## standby postgresql.conf
hot_standby = on

## standby recovery.conf
standby_name = 'cougar'
standby_mode = 'on'
primary_conninfo = 'host=192.168.102.125 port=5432'
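
As a sanity check on the setup above, the following Python sketch
confirms that the standby_name in recovery.conf appears in the
primary's standbys.conf.  It is a simplified illustration, not part of
the patch: the helper names are hypothetical, the recovery.conf parser
only handles plain name = 'value' lines (no escaped quotes), and the
manifest layout is assumed from the sample entry.

```python
import re
from typing import Dict, Set


def parse_recovery_conf(text: str) -> Dict[str, str]:
    """Read simple name = 'value' settings; skips blanks and comments."""
    settings = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        match = re.fullmatch(r"(\w+)\s*=\s*'(.*)'", line)
        if match is None:
            raise ValueError(f"unsupported line: {raw!r}")
        settings[match.group(1)] = match.group(2)
    return settings


def listed_standby_names(standbys_conf: str) -> Set[str]:
    """Collect the first column of each non-comment manifest line."""
    names = set()
    for raw in standbys_conf.splitlines():
        line = raw.split("#", 1)[0].strip()
        if line:
            names.add(line.split()[0])
    return names


def standby_is_listed(recovery_conf: str, standbys_conf: str) -> bool:
    """True if recovery.conf's standby_name is in the manifest."""
    name = parse_recovery_conf(recovery_conf).get("standby_name")
    return name is not None and name in listed_standby_names(standbys_conf)


RECOVERY_CONF = """\
standby_name = 'cougar'
standby_mode = 'on'
primary_conninfo = 'host=192.168.102.125 port=5432'
"""

STANDBYS_CONF = """\
# STANDBY NAME    SYNCHRONOUS   TIMEOUT
cougar            fsync           100ms
"""

if __name__ == "__main__":
    print(standby_is_listed(RECOVERY_CONF, STANDBYS_CONF))
```

This only verifies that the two files agree on the name.  It says
nothing about whether the server binaries recognise the new option.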


Issues
======

The primary started up fine and was accepting connections.  A base
backup was taken for the standby, and after configuring the standby, I
attempted to bring it up, but received the following error:

postgres@cougar:~/project/data$ pg_ctl start
server starting
postgres@cougar:~/project/data$ LOG:  database system was shut down in recovery at 2010-09-29 22:52:24 BST
LOG:  entering standby mode
LOG:  redo starts at 0/1000020
LOG:  record with zero length at 0/10000B0
FATAL:  could not connect to the primary server: invalid connection option "standby_name"

I believe I am using the correct parameter as I followed the
additional documentation provided in the patch.


Conclusion
==========

It at least appears to me that it isn't functional in its current
state, or there is setup information missing from the
documentation... or I've made a stupid mistake somewhere (likely).

--
Thom Brown
Twitter: @darkixion
IRC (freenode): dark_ixion
Registered Linux user: #516935

Re: Review: Patch for Synchronous Replication

From
Thom Brown
Date:
On 29 September 2010 23:29, Thom Brown <thom@linux.com> wrote:
> Conclusion
> ==========
>
> It at least appears to me that it isn't functional in its current
> state, or there is setup information missing from the
> documentation... or I've made a stupid mistake somewhere (likely).

Quick back-pedal... it appears the patch was only applied
successfully on the standby.  Only doc changes appeared to make it to
the primary.  Will re-attempt tomorrow.  Apologies.

--
Thom Brown

Re: Review: Patch for Synchronous Replication

From
Thom Brown
Date:
On 29 September 2010 23:57, Thom Brown <thom@linux.com> wrote:
> Quick back-pedal... it appears the patch was only applied
> successfully on the standby.  Only doc changes appeared to make it to
> the primary.  Will re-attempt tomorrow.  Apologies.

Well, that doesn't seem to have made any difference.  Confirmed the
patch was applied in both cases, rebuilt, base backup again etc... no
change.  Same error as in original review. *shrug*

--
Thom Brown

Re: Review: Patch for Synchronous Replication

From
Robert Haas
Date:
Thom -

Just as a logistical note, reviews should be posted to -hackers.
-rrreviewers is for discussion of assigning reviewers, coordinating
who is doing what, etc.

...Robert

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise Postgres Company