Thread: bdr appears to be trying to replicate to itself

bdr appears to be trying to replicate to itself

From
Cj B
Date:
I noticed a very strange issue starting about 20 days ago and my pg_xlog has just been filling up since then.

HOST A:

```
postgres=# select * from pg_replication_slots;
                slot_name                | plugin | slot_type | datoid |   database    | active | xmin | catalog_xmin | restart_lsn
-----------------------------------------+--------+-----------+--------+---------------+--------+------+--------------+-------------
 bdr_16385_6188730679935789649_1_16385__ | bdr    | logical   |  16385 | database_name | f      |      |     28174997 | 25/7677F300
 bdr_16385_6188733518371128845_2_16385__ | bdr    | logical   |  16385 | database_name | t      |      |     38613316 | 41/7B7EDF80

select bdr.bdr_get_local_nodeid();
     bdr_get_local_nodeid
-------------------------------
 (6188730679935789649,1,16385)

SELECT slot_name, database, active,
       pg_xlog_location_diff(pg_current_xlog_insert_location(), restart_lsn) AS retained_bytes
FROM pg_replication_slots
WHERE plugin = 'bdr';
                slot_name                |   database    | active | retained_bytes
-----------------------------------------+---------------+--------+----------------
 bdr_16385_6188730679935789649_1_16385__ | database_name | f      |   120353015152
 bdr_16385_6188733518371128845_2_16385__ | database_name | t      |           2288
(2 rows)
```

HOST B:

```
select * from pg_replication_slots;
                slot_name                | plugin | slot_type | datoid |   database    | active | xmin | catalog_xmin | restart_lsn
-----------------------------------------+--------+-----------+--------+---------------+--------+------+--------------+-------------
 bdr_16385_6188730679935789649_1_16385__ | bdr    | logical   |  16385 | database_name | t      |      |      3499719 | 3/B53F00A8

select bdr.bdr_get_local_nodeid();
     bdr_get_local_nodeid
-------------------------------
 (6188733518371128845,2,16385)

SELECT slot_name, database, active,
       pg_xlog_location_diff(pg_current_xlog_insert_location(), restart_lsn) AS retained_bytes
FROM pg_replication_slots
WHERE plugin = 'bdr';
                slot_name                |   database    | active | retained_bytes
-----------------------------------------+---------------+--------+----------------
 bdr_16385_6188730679935789649_1_16385__ | database_name | t      |          68736
```

So it almost looks like HOST A is trying to replicate to itself, since one of its replication slots carries HOST A's own node ID, and that slot is just filling up pg_xlog.
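
The by-eye comparison above can be sketched in Python. This is only a minimal sketch, assuming the slot name embeds the peer's system identifier, timeline and database OID in the same order that `bdr.bdr_get_local_nodeid()` reports them, which is what the output in this thread suggests. `NodeId`, `parse_slot_name` and `self_referencing_slots` are hypothetical helpers, not part of BDR.

```python
import re
from typing import List, NamedTuple, Optional

# Assumed layout, inferred from the slot names posted in this thread:
#   bdr_<local dboid>_<peer sysid>_<peer timeline>_<peer dboid>__<optional name>
SLOT_RE = re.compile(r"^bdr_(\d+)_(\d+)_(\d+)_(\d+)__(.*)$")


class NodeId(NamedTuple):
    sysid: int
    timeline: int
    dboid: int


def parse_slot_name(slot_name: str) -> Optional[NodeId]:
    """Return the node ID embedded in a slot name, or None if it does not match."""
    m = SLOT_RE.match(slot_name)
    if m is None:
        return None
    _local_dboid, sysid, timeline, dboid, _name = m.groups()
    return NodeId(int(sysid), int(timeline), int(dboid))


def self_referencing_slots(local: NodeId, slot_names: List[str]) -> List[str]:
    """Return the slots whose embedded node ID equals the local node ID."""
    return [s for s in slot_names if parse_slot_name(s) == local]


# Values copied from the HOST A output above.
host_a = NodeId(6188730679935789649, 1, 16385)
slots_a = [
    "bdr_16385_6188730679935789649_1_16385__",
    "bdr_16385_6188733518371128845_2_16385__",
]
print(self_referencing_slots(host_a, slots_a))
# ['bdr_16385_6188730679935789649_1_16385__']
```

Run against HOST B's node ID and its single slot, the same function returns an empty list, which fits the observation that only HOST A has a slot pointing at itself.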

This server has been set up since mid-May and I haven't added any new nodes, but I did upgrade just this past weekend, before I noticed there was a problem:
dpkg -l | grep '^ii' | grep postgre
ii  pgdg-keyring                        2014.1                              all          keyring for apt.postgresql.org
ii  postgresql-bdr-9.4                  9.4.5-1trusty                       amd64        object-relational SQL database, version 9.4 server
ii  postgresql-bdr-9.4-bdr-plugin       0.9.3-1trusty                       amd64        BDR Plugin for PostgreSQL-BDR 9.4
ii  postgresql-bdr-client-9.4           9.4.5-1trusty                       amd64        front-end programs for PostgreSQL-BDR 9.4
ii  postgresql-bdr-contrib-9.4          9.4.5-1trusty                       amd64        additional facilities for PostgreSQL
ii  postgresql-client-common            170.pgdg14.04+1                     all          manager for multiple PostgreSQL client versions
ii  postgresql-common                   170.pgdg14.04+1                     all          PostgreSQL database-cluster manager
ii  postgresql-contrib                  9.4+170.pgdg14.04+1                 all          additional facilities for PostgreSQL (supported version)

Before the upgrade I was on postgresql-bdr-client-9.4 9.4.4-1trusty.

Can anyone help me fix this? I'm running out of HD space.

Thanks!

Re: bdr appears to be trying to replicate to itself

From
Alvaro Herrera
Date:
Cj B wrote:
> I noticed a very strange issue starting about 20 days ago and my pg_xlog has just been filling up since then.

For reference, this was also asked on github, and answered there.
See https://github.com/2ndQuadrant/bdr/issues/143


--
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


Re: bdr appears to be trying to replicate to itself

From
Cj B
Date:
Hi,

Yes, I posted on github because I wasn’t sure where to post. The reason I’m posting here is that I’m not clear about the answer "Drop the slot bdr_16385_6188730679935789649_1_16385__ on the first host."

Does this just mean to run
select pg_drop_replication_slot('bdr_16385_6188730679935789649_1_16385__');

What impact will this have?

Thanks
Cj B

> On Nov 16, 2015, at 8:31 AM, Alvaro Herrera <alvherre@2ndquadrant.com> wrote:
>
> Cj B wrote:
>> I noticed a very strange issue starting about 20 days ago and my pg_xlog has just been filling up since then.
>
> For reference, this was also asked on github, and answered there.
> See https://github.com/2ndQuadrant/bdr/issues/143
>
>
> --
> Álvaro Herrera                http://www.2ndQuadrant.com/
> PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: bdr appears to be trying to replicate to itself

From
Craig Ringer
Date:
On 17 November 2015 at 00:33, Cj B <blackc2004@gmail.com> wrote:
 
> select pg_drop_replication_slot('bdr_16385_6188730679935789649_1_16385__')

Correct.

> What impact will this have?

If the slot is unused, it'll allow the WAL that's being held by the slot to be removed. It'll also unpin the catalog xmin to allow autovacuum to clean up dead tuples in the catalogs.
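
To put a number on the WAL that dropping the slot would release, the retention can be re-derived from the LSNs posted earlier in the thread. The Python sketch below mirrors the subtraction that `pg_xlog_location_diff` performs, treating an LSN written as `X/Y` as a 64-bit position with `X` as the high 32 bits. `lsn_to_int`, `lsn_diff` and `to_gib` are hypothetical helper names, and using the active slot's `restart_lsn` as a stand-in for HOST A's current insert location is an assumption made only for illustration.

```python
def lsn_to_int(lsn: str) -> int:
    """Convert an LSN in 'hi/lo' hexadecimal form to a byte position."""
    hi, lo = lsn.split("/")
    return (int(hi, 16) << 32) | int(lo, 16)


def lsn_diff(newer: str, older: str) -> int:
    """Bytes of WAL between two LSNs."""
    return lsn_to_int(newer) - lsn_to_int(older)


def to_gib(n_bytes: int) -> float:
    """Convert a byte count to GiB."""
    return n_bytes / 2**30


# restart_lsn values copied from the HOST A output earlier in the thread.
stale_slot_lsn = "25/7677F300"   # inactive slot naming HOST A itself
active_slot_lsn = "41/7B7EDF80"  # active slot, assumed close to the current insert location

approx_retained = lsn_diff(active_slot_lsn, stale_slot_lsn)
print(approx_retained)                   # 120343424128
print(round(to_gib(approx_retained), 1)) # 112.1

# retained_bytes as reported by the query on HOST A.
print(round(to_gib(120353015152), 1))    # 112.1
```

Both figures come out at roughly 112 GiB. The small gap between them is expected, since the two queries on HOST A were not run at exactly the same moment.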

This doesn't explain how the system got into this state. For that it'd really be necessary to see the steps taken during setup. BDR tries to protect against attempts to replicate-from-self. Presumably there's an oversight in those checks. If you're able to reproduce this state I'd like to hear details on how.

--
 Craig Ringer                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services

Re: bdr appears to be trying to replicate to itself

From
Cj B
Date:
Craig Ringer wrote:
> This doesn't explain how the system got into this state. For that it'd really be necessary to see the steps taken during setup. BDR tries to protect against attempts to replicate-from-self. Presumably there's an oversight in those checks. If you're able to reproduce this state I'd like to hear details on how.

Thanks for the help, that seems to have done it. I’m not sure how it got into this state either. I still have the commands that I ran, but I doubt that’ll help much. What’s strange is that everything was working fine since May, then on Oct 29th it appears to have started retaining files in pg_xlog, so something must have happened, but sadly I don’t know what.

I’ll keep an eye on it to see if it happens again.