Thread: BUG #7803: Replication Problem(no master is there)

BUG #7803: Replication Problem(no master is there)

From
katsumata.tomonari@po.ntts.co.jp
Date:
The following bug has been logged on the website:

Bug reference:      7803
Logged by:          Tomonari Katsumata
Email address:      katsumata.tomonari@po.ntts.co.jp
PostgreSQL version: 9.2.2
Operating system:   RHEL 5.3 x86_64
Description:        =


hi, I'm playing with Synchronous Replication on PostgreSQL 9.2.2.
And I saw a strange behavior.

=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=
=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=
=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D
[issues]
two standbys are connected on each other, but
no master is there.


[reproduce]
1. create a master on port 22922

--- postgresql.conf ---
wal_level =3D hot_standby
archive_mode =3D on
archive_command =3D 'cp %p ${PGARC}/%f'
max_wal_senders =3D 3
listen_addresses =3D '*'
hot_standby =3D on
------------------------

2. start the master

mst$ pg_ctl start

3. create a standby(taking a base backup from master)

pg_basebackup -p 22922 -D ${PGDATA}

-- recovery.conf --
restore_command =3D 'cp ${PGARC}/%f %p'
standby_mode =3D on
primary_conninfo =3D 'port=3D22922 application_name=3Dsby'
recovery_target_timeline =3D 'newest'
-------------------

4. stop the master

mst$ pg_ctl stop

5. make recovery.conf for the master

-- recovery.conf --
restore_command =3D 'cp ${PGARC}/%f %p'
standby_mode =3D on
primary_conninfo =3D 'port=3D23922 application_name=3Dsby'
recovery_target_timeline =3D 'newest'
-------------------

6. start the master as standby

mst$ pg_ctl start

7. start the standby as standby on port 23922

sby$ pg_ctl start
=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=
=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=
=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D

I did not see the situation like above on PostgreSQL 9.1.7.

Is this intended change?


regards,
-----------------
NTT Software Corporation
 Tomonari Katsumata

Re: BUG #7803: Replication Problem(no master is there)

From
Heikki Linnakangas
Date:
On 11.01.2013 06:09, katsumata.tomonari@po.ntts.co.jp wrote:
> The following bug has been logged on the website:
>
> Bug reference:      7803
> Logged by:          Tomonari Katsumata
> Email address:      katsumata.tomonari@po.ntts.co.jp
> PostgreSQL version: 9.2.2
> Operating system:   RHEL 5.3 x86_64
> Description:
>
> hi, I'm playing with Synchronous Replication on PostgreSQL 9.2.2.
> And I saw a strange behavior.

Unless you left out something, the configuration you described actually
sets up asynchronous replication.

> =================================================================
> [issues]
> two standbys are connected on each other, but
> no master is there.
>...
> =================================================================
>
> I did not see the situation like above on PostgreSQL 9.1.7.
>
> Is this intended change?

In 9.1, this scenario was impossible because you could not connect a
standby to another standby. In 9.2, that's allowed. It's a new feature
called "cascading replication", see
http://www.postgresql.org/docs/9.2/static/warm-standby.html#CASCADING-REPLICATION.

With that feature, it's indeed possible to form a cycle of standby
servers connected to each other. There was just a long discussion on
pgsql-hackers on whether we should try to detect that scenario [1], but
the consensus seems to be that we should not. It would be difficult to
implement such detection, and sometimes it's useful to have such a
cycle, as a transient state at a failover, for example.

So the bottom line is that this is an intended change, and the admin
will just have to avoid doing that.

This makes me wonder if there should be a GUC to forbid cascading
replication, though. If you don't want to do cascading replication
(which is quite rare, I'd say), you could just disable it to avoid a
situation like this.

[1] http://archives.postgresql.org/pgsql-hackers/2012-12/msg01134.php

- Heikki

Re: BUG #7803: Replication Problem(no master is there)

From
Simon Riggs
Date:
On 11 January 2013 08:40, Heikki Linnakangas <hlinnakangas@vmware.com> wrote:

> This makes me wonder if there should be a GUC to forbid cascading
> replication, though. If you don't want to do cascading replication (which is
> quite rare, I'd say), you could just disable it to avoid a situation like
> this.

Connection from a standby is disabled by default. Don't enable it...

--
 Simon Riggs                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services

Re: BUG #7803: Replication Problem(no master is there)

From
Heikki Linnakangas
Date:
On 11.01.2013 11:19, Simon Riggs wrote:
> On 11 January 2013 08:40, Heikki Linnakangas<hlinnakangas@vmware.com>  wrote:
>
>> This makes me wonder if there should be a GUC to forbid cascading
>> replication, though. If you don't want to do cascading replication (which is
>> quite rare, I'd say), you could just disable it to avoid a situation like
>> this.
>
> Connection from a standby is disabled by default. Don't enable it...

It's controlled by hot_standby=on/off. You might well want to enable hot
standby, to run queries on a standby, but disable cascading replication.

You can also forbid replication connections using pg_hba.conf, but then
you need to modify it to allow connections again after failover, when
the standby becomes master. That's doable, but inconvenient.

(just thinking out loud here..) If we were to have such a switch, it'd
be nice to still allow base backups from the standby.

- Heikki

Re: BUG #7803: Replication Problem(no master is there)

From
Tomonari Katsumata
Date:
Hi,

>> hi, I'm playing with Synchronous Replication on PostgreSQL 9.2.2.
>> And I saw a strange behavior.
>
> Unless you left out something, the configuration you described
> actually sets up asynchronous replication.
>

Thank you for the comment.
I was thinking to promote one of them and set
synchronous_standby_names = 'sby' and reload configure file.
Before that, I noticed the issue.


>> =================================================================
>> [issues]
>> two standbys are connected on each other, but
>> no master is there.
>> ...
>> =================================================================
>>
>> I did not see the situation like above on PostgreSQL 9.1.7.
>>
>> Is this intended change?
>
> In 9.1, this scenario was impossible because you could not connect a
> standby to another standby. In 9.2, that's allowed. It's a new feature
> called "cascading replication", see
> http://www.postgresql.org/docs/9.2/static/warm-standby.html#CASCADING-REPLICATION.
>
> With that feature, it's indeed possible to form a cycle of standby
> servers connected to each other. There was just a long discussion on
> pgsql-hackers on whether we should try to detect that scenario [1],
> but the consensus seems to be that we should not. It would be
> difficult to implement such detection, and sometimes it's useful to
> have such a cycle, as a transient state at a failover, for example.

Sorry, I had not read the discussion and have reported it again.
And I understand it's hard to detect cyclic situation.

>
> So the bottom line is that this is an intended change, and the admin
> will just have to avoid doing that.

OK.
This situation was made by playing, so it's not big problem for now.

>
> This makes me wonder if there should be a GUC to forbid cascading
> replication, though. If you don't want to do cascading replication
> (which is quite rare, I'd say), you could just disable it to avoid a
> situation like this.

I'm not sure but what about adding the parameter("cascade_mode") on
recovery.conf ?
The parameter represents a will to connect to standby server when
starting as standby.
If the parameter is set to on, connect to a server forcely like
PostgreSQL 9.2,
and if the parameter is set to off, connect to the another standby server is
refused like PostgreSQL 9.1.


>
> [1] http://archives.postgresql.org/pgsql-hackers/2012-12/msg01134.php
>
> - Heikki
>
>

regards,
--------
NTT Software Corporation
  Tomonari Katsumata

Re: BUG #7803: Replication Problem(no master is there)

From
Simon Riggs
Date:
On 11 January 2013 12:18, Tomonari Katsumata
<katsumata.tomonari@po.ntts.co.jp> wrote:

> I'm not sure but what about adding the parameter("cascade_mode") on
> recovery.conf ?
> The parameter represents a will to connect to standby server when starting
> as standby.
> If the parameter is set to on, connect to a server forcely like PostgreSQL
> 9.2,
> and if the parameter is set to off, connect to the another standby server is
> refused like PostgreSQL 9.1.

We added a REPLICATION privelge onto user accounts to control access.

Perhaps we should add a CASCADE privilege as well, so that we can
control whether we can connect to a master and/or a standby.

Syntax would be

ALTER USER foo
[MASTER | CASCADE] REPLICATION

REPLICATION allows both master and cascaded replication (same as now)
MASTER REPLICATION allows master only
CASCADE REPLICATION allows cascaded replication only
NOREPLICATION allows neither option

--
 Simon Riggs                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services

Re: BUG #7803: Replication Problem(no master is there)

From
Tomonari Katsumata
Date:
Hi,

Sorry for late response.

(2013/01/11 23:11), Simon Riggs wrote:
 > On 11 January 2013 12:18, Tomonari Katsumata
 > <katsumata.tomonari@po.ntts.co.jp> wrote:
 >
 >> I'm not sure but what about adding the parameter("cascade_mode") on
 >> recovery.conf ?
 >> The parameter represents a will to connect to standby server when
starting
 >> as standby.
 >> If the parameter is set to on, connect to a server forcely like
PostgreSQL
 >> 9.2,
 >> and if the parameter is set to off, connect to the another standby
server is
 >> refused like PostgreSQL 9.1.
 >
 > We added a REPLICATION privelge onto user accounts to control access.
 >
 > Perhaps we should add a CASCADE privilege as well, so that we can
 > control whether we can connect to a master and/or a standby.
 >
 > Syntax would be
 >
 > ALTER USER foo
 > [MASTER | CASCADE] REPLICATION
 >
 > REPLICATION allows both master and cascaded replication (same as now)
 > MASTER REPLICATION allows master only
 > CASCADE REPLICATION allows cascaded replication only
 > NOREPLICATION allows neither option
 >

This idea seems better than mine.

Someone is working for it already ?
If not yet, may I try to implement it ?

regards,
--------
NTT Software Corporation
  Tomonari Katsumata

Re: BUG #7803: Replication Problem(no master is there)

From
Simon Riggs
Date:
On 15 January 2013 05:12, Tomonari Katsumata
<katsumata.tomonari@po.ntts.co.jp> wrote:

>> We added a REPLICATION privelge onto user accounts to control access.
>>
>> Perhaps we should add a CASCADE privilege as well, so that we can
>> control whether we can connect to a master and/or a standby.
>>
>> Syntax would be
>>
>> ALTER USER foo
>> [MASTER | CASCADE] REPLICATION
>>
>> REPLICATION allows both master and cascaded replication (same as now)
>> MASTER REPLICATION allows master only
>> CASCADE REPLICATION allows cascaded replication only
>> NOREPLICATION allows neither option
>>

> Someone is working for it already ?
> If not yet, may I try to implement it ?

Please do. It looks fairly short.

--
 Simon Riggs                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services

Re: BUG #7803: Replication Problem(no master is there)

From
Heikki Linnakangas
Date:
On 15.01.2013 10:14, Simon Riggs wrote:
> On 15 January 2013 05:12, Tomonari Katsumata
> <katsumata.tomonari@po.ntts.co.jp>  wrote:
>
>>> We added a REPLICATION privelge onto user accounts to control access.
>>>
>>> Perhaps we should add a CASCADE privilege as well, so that we can
>>> control whether we can connect to a master and/or a standby.
>>>
>>> Syntax would be
>>>
>>> ALTER USER foo
>>> [MASTER | CASCADE] REPLICATION
>>>
>>> REPLICATION allows both master and cascaded replication (same as now)
>>> MASTER REPLICATION allows master only
>>> CASCADE REPLICATION allows cascaded replication only
>>> NOREPLICATION allows neither option
>>>
>
>> Someone is working for it already ?
>> If not yet, may I try to implement it ?
>
> Please do. It looks fairly short.

To me, permissions doesn't feel like the right vehicle for controlling
this. Not sure what to suggest instead, a new GUC perhaps.

BTW, is there any reason to not allow streaming replication when
hot_standby=off? A streaming replication connection doesn't execute any
queries, so it doesn't need the system to be consistent.

Another thing to consider is that "pg_basebackup -X stream" also uses
streaming replication, so if you forbid cascade replication, you also
forbid using "pg_basebackup -X stream" on the standby. At the protocol
level, pg_basebackup streams the WAL just like a standby server does, so
we cannot distinguish those two cases in the server. The client could
tell the server which one it is, but using permissions to allow/forbid
based on that would make no sense as the client could lie which one it is.

- Heikki

Re: BUG #7803: Replication Problem(no master is there)

From
Fujii Masao
Date:
On Tue, Jan 15, 2013 at 5:25 PM, Heikki Linnakangas
<hlinnakangas@vmware.com> wrote:
> On 15.01.2013 10:14, Simon Riggs wrote:
>>
>> On 15 January 2013 05:12, Tomonari Katsumata
>> <katsumata.tomonari@po.ntts.co.jp>  wrote:
>>
>>>> We added a REPLICATION privelge onto user accounts to control access.
>>>>
>>>> Perhaps we should add a CASCADE privilege as well, so that we can
>>>> control whether we can connect to a master and/or a standby.
>>>>
>>>> Syntax would be
>>>>
>>>> ALTER USER foo
>>>> [MASTER | CASCADE] REPLICATION
>>>>
>>>> REPLICATION allows both master and cascaded replication (same as now)
>>>> MASTER REPLICATION allows master only
>>>> CASCADE REPLICATION allows cascaded replication only
>>>> NOREPLICATION allows neither option
>>>>
>>
>>> Someone is working for it already ?
>>> If not yet, may I try to implement it ?
>>
>>
>> Please do. It looks fairly short.
>
>
> To me, permissions doesn't feel like the right vehicle for controlling this.
> Not sure what to suggest instead, a new GUC perhaps.
>
> BTW, is there any reason to not allow streaming replication when
> hot_standby=off? A streaming replication connection doesn't execute any
> queries, so it doesn't need the system to be consistent.

I was thinking that the system must be consistent since streaming replication
connection reads the system catalog (e.g., ROLE information).

> Another thing to consider is that "pg_basebackup -X stream" also uses
> streaming replication, so if you forbid cascade replication, you also forbid
> using "pg_basebackup -X stream" on the standby. At the protocol level,
> pg_basebackup streams the WAL just like a standby server does, so we cannot
> distinguish those two cases in the server. The client could tell the server
> which one it is, but using permissions to allow/forbid based on that would
> make no sense as the client could lie which one it is.

Probably I'm missing something, but the standby server only has to reject the
replication connection if cascade replication is disabled, whether it's from
another standby or pg_basebackup. ISTM there is no need to distinguish
those connections. No?

When "pg_basebackup -X stream" fails to establish the replication connection,
it only has to just fail or automatically switch to "pg_basebackup -X fetch".

Regards,

--
Fujii Masao

Re: BUG #7803: Replication Problem(no master is there)

From
Tomonari Katsumata
Date:
Hi,

(2013/01/16 4:03), Fujii Masao wrote:
 > On Tue, Jan 15, 2013 at 5:25 PM, Heikki Linnakangas
 > <hlinnakangas@vmware.com> wrote:
 >> On 15.01.2013 10:14, Simon Riggs wrote:
 >>>
 >>> On 15 January 2013 05:12, Tomonari Katsumata
 >>> <katsumata.tomonari@po.ntts.co.jp>  wrote:
 >>>
 >>>>> We added a REPLICATION privelge onto user accounts to control access.
 >>>>>
 >>>>> Perhaps we should add a CASCADE privilege as well, so that we can
 >>>>> control whether we can connect to a master and/or a standby.
 >>>>>
 >>>>> Syntax would be
 >>>>>
 >>>>> ALTER USER foo
 >>>>> [MASTER | CASCADE] REPLICATION
 >>>>>
 >>>>> REPLICATION allows both master and cascaded replication (same as now)
 >>>>> MASTER REPLICATION allows master only
 >>>>> CASCADE REPLICATION allows cascaded replication only
 >>>>> NOREPLICATION allows neither option
 >>>>>
 >>>
 >>>> Someone is working for it already ?
 >>>> If not yet, may I try to implement it ?
 >>>
 >>>
 >>> Please do. It looks fairly short.
 >>
 >>
 >> To me, permissions doesn't feel like the right vehicle for
controlling this.
 >> Not sure what to suggest instead, a new GUC perhaps.
 >>

If this is before releasing 9.2.0, it is not problem to have a new GUC.
But 9.2 has released already.

I'm thinking about this change is for 9.3, right ?

And I'm thinking about compatibility when version up too.
If we control this with permissions, I think it is easy to
upgrade from 9.1 and 9.2 to 9.3 using pg_upgrade/pg_dumpall.

Type of pg_authid.rolreplication is boolean.
---------------------------------------------------------------------
9.1 false(f=0)         true(t=1)
9.2 false(f=0)         true(t=1)
---------------------------------------------------------------------

If I add permissions for cascading replication,
it will become integer and represent each permission like this:
---------------------------------------------------------------------
9.3 noreplication(0) replication(1)  master-only(2)  cascade-only(3)
---------------------------------------------------------------------

If pg_upgrade/pg_dumpall handle like bellow, user would never mind about
difference between versions.

from 9.1 to 9.3.
   false(f=0) --> noreplication(0)
   true(t=1)  --> master-only(2)

from 9.2 to 9.3.
   false(f=0) --> noreplication(0)
   true(t=1)  --> replication(1)


 >> BTW, is there any reason to not allow streaming replication when
 >> hot_standby=off? A streaming replication connection doesn't execute any
 >> queries, so it doesn't need the system to be consistent.
 >
 > I was thinking that the system must be consistent since streaming
replication
 > connection reads the system catalog (e.g., ROLE information).
 >
And I think it's because replication connection is established
by same way with another backend connection.

 >> Another thing to consider is that "pg_basebackup -X stream" also uses
 >> streaming replication, so if you forbid cascade replication, you
also forbid
 >> using "pg_basebackup -X stream" on the standby. At the protocol level,
 >> pg_basebackup streams the WAL just like a standby server does, so we
cannot
 >> distinguish those two cases in the server. The client could tell the
server
 >> which one it is, but using permissions to allow/forbid based on that
would
 >> make no sense as the client could lie which one it is.
 >
 > Probably I'm missing something, but the standby server only has to
reject the
 > replication connection if cascade replication is disabled, whether
it's from
 > another standby or pg_basebackup. ISTM there is no need to distinguish
 > those connections. No?
 >
 > When "pg_basebackup -X stream" fails to establish the replication
connection,
 > it only has to just fail or automatically switch to "pg_basebackup -X
fetch".
 >
I think so too.
If user who does not have right permission executes "pg_basebackup -X
stream",
it would make fail.

regards,
--------
NTT Software Corporation
  Tomonari Katsumata