Thread: bdr appears to be trying to replicate to itself
I noticed a very strange issue starting about 20 days ago and my pg_xlog has just been filling up since then.

HOST A:

```
postgres=# select * from pg_replication_slots;
                slot_name                | plugin | slot_type | datoid |   database    | active | xmin | catalog_xmin | restart_lsn
-----------------------------------------+--------+-----------+--------+---------------+--------+------+--------------+-------------
 bdr_16385_6188730679935789649_1_16385__ | bdr    | logical   |  16385 | database_name | f      |      |     28174997 | 25/7677F300
 bdr_16385_6188733518371128845_2_16385__ | bdr    | logical   |  16385 | database_name | t      |      |     38613316 | 41/7B7EDF80

select bdr.bdr_get_local_nodeid();
     bdr_get_local_nodeid
-------------------------------
 (6188730679935789649,1,16385)

SELECT slot_name, database, active,
       pg_xlog_location_diff(pg_current_xlog_insert_location(), restart_lsn) AS retained_bytes
FROM pg_replication_slots
WHERE plugin = 'bdr';
                slot_name                |   database    | active | retained_bytes
-----------------------------------------+---------------+--------+----------------
 bdr_16385_6188730679935789649_1_16385__ | database_name | f      |   120353015152
 bdr_16385_6188733518371128845_2_16385__ | database_name | t      |           2288
(2 rows)
```

HOST B:

```
select * from pg_replication_slots;
                slot_name                | plugin | slot_type | datoid |   database    | active | xmin | catalog_xmin | restart_lsn
-----------------------------------------+--------+-----------+--------+---------------+--------+------+--------------+-------------
 bdr_16385_6188730679935789649_1_16385__ | bdr    | logical   |  16385 | database_name | t      |      |      3499719 | 3/B53F00A8

select bdr.bdr_get_local_nodeid();
     bdr_get_local_nodeid
-------------------------------
 (6188733518371128845,2,16385)

SELECT slot_name, database, active,
       pg_xlog_location_diff(pg_current_xlog_insert_location(), restart_lsn) AS retained_bytes
FROM pg_replication_slots
WHERE plugin = 'bdr';
                slot_name                |   database    | active | retained_bytes
-----------------------------------------+---------------+--------+----------------
 bdr_16385_6188730679935789649_1_16385__ | database_name | t      |          68736
```

So it almost looks like HOST A is trying to replicate to itself, since pg_replication_slots on HOST A contains a slot that carries HOST A's own node ID, and it's just filling up pg_xlog.

This server has been set up since mid-May and I haven't added any new nodes, but I did upgrade just this past weekend before I noticed there was a problem:

```
dpkg -l | grep '^ii' | grep postgre
ii  pgdg-keyring                   2014.1               all    keyring for apt.postgresql.org
ii  postgresql-bdr-9.4             9.4.5-1trusty        amd64  object-relational SQL database, version 9.4 server
ii  postgresql-bdr-9.4-bdr-plugin  0.9.3-1trusty        amd64  BDR Plugin for PostgreSQL-BDR 9.4
ii  postgresql-bdr-client-9.4      9.4.5-1trusty        amd64  front-end programs for PostgreSQL-BDR 9.4
ii  postgresql-bdr-contrib-9.4     9.4.5-1trusty        amd64  additional facilities for PostgreSQL
ii  postgresql-client-common       170.pgdg14.04+1      all    manager for multiple PostgreSQL client versions
ii  postgresql-common              170.pgdg14.04+1      all    PostgreSQL database-cluster manager
ii  postgresql-contrib             9.4+170.pgdg14.04+1  all    additional facilities for PostgreSQL (supported version)
```

Before, I was on postgresql-bdr-client-9.4 9.4.4-1trusty.

Can anyone help me fix this? I'm running out of HD space.

Thanks!
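For readers following along, the retained_bytes figures in the post above are differences between two WAL positions. An LSN such as 25/7677F300 is printed as two hexadecimal halves of a 64-bit position (high 32 bits, then low 32 bits). The sketch below, in Python, only illustrates that arithmetic; the function names `lsn_to_int` and `lsn_diff` are made up for this example and are not part of PostgreSQL or BDR.

```python
def lsn_to_int(lsn: str) -> int:
    """Convert a textual LSN such as '25/7677F300' to a 64-bit integer."""
    high, low = lsn.split("/")
    # High half is the upper 32 bits, low half the lower 32 bits.
    return (int(high, 16) << 32) | int(low, 16)


def lsn_diff(newer: str, older: str) -> int:
    """Bytes of WAL between two LSNs (the same idea as pg_xlog_location_diff)."""
    return lsn_to_int(newer) - lsn_to_int(older)


if __name__ == "__main__":
    # restart_lsn values of the two slots on HOST A, copied from the post above.
    stale, healthy = "25/7677F300", "41/7B7EDF80"
    gap = lsn_diff(healthy, stale)
    print(f"WAL between the two restart_lsn values: {gap / 1e9:.1f} GB")
```

The gap between the two restart_lsn values on HOST A comes out at roughly 120 GB, which is the same order as the retained_bytes reported for the inactive slot. The figures are not expected to match exactly, because the queries in the post were run at different moments.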
Cj B wrote:

> I noticed a very strange issue starting about 20 days ago and my pg_xlog has just been filling up since then.

For reference, this was also asked on github, and answered there.
See https://github.com/2ndQuadrant/bdr/issues/143

--
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Hi,

Yes, I posted on github because I wasn’t sure where to post. The reason I’m posting here is that I’m not clear about the answer: “Drop the slot bdr_16385_6188730679935789649_1_16385__ on the first host.”

Does this just mean:

select pg_drop_replication_slot('bdr_16385_6188730679935789649_1_16385__')

What impact will this have?

Thanks
Cj B

> On Nov 16, 2015, at 8:31 AM, Alvaro Herrera <alvherre@2ndquadrant.com> wrote:
>
> Cj B wrote:
>> I noticed a very strange issue starting about 20 days ago and my pg_xlog has just been filling up since then.
>
> For reference, this was also asked on github, and answered there.
> See https://github.com/2ndQuadrant/bdr/issues/143
On 17 November 2015 at 00:33, Cj B <blackc2004@gmail.com> wrote:
> select pg_drop_replication_slot('bdr_16385_6188730679935789649_1_16385__')
Correct.
> What impact will this have?
This doesn't explain how the system got into this state. For that it'd really be necessary to see the steps taken during setup. BDR tries to protect against attempts to replicate-from-self. Presumably there's an oversight in those checks. If you're able to reproduce this state I'd like to hear details on how.
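As an illustration of the symptom discussed in this thread, a slot that names the local node can be spotted by comparing each slot name with the output of bdr.bdr_get_local_nodeid(). The Python sketch below is not BDR's own check. It assumes, based only on the names shown in this thread, that slot names have the form bdr_<local dboid>_<peer sysid>_<peer timeline>_<peer dboid>__<name>; the helper names are hypothetical.

```python
import re
from typing import List, NamedTuple, Optional

# ASSUMPTION: field order inferred from the slot names quoted in this thread:
# bdr_<local dboid>_<peer sysid>_<peer timeline>_<peer dboid>__<name>
SLOT_RE = re.compile(r"^bdr_(\d+)_(\d+)_(\d+)_(\d+)__(.*)$")


class NodeId(NamedTuple):
    sysid: str
    timeline: int
    dboid: int


def parse_node_id(text: str) -> NodeId:
    """Parse the '(sysid,timeline,dboid)' text printed by bdr.bdr_get_local_nodeid()."""
    sysid, timeline, dboid = text.strip().strip("()").split(",")
    return NodeId(sysid.strip(), int(timeline), int(dboid))


def peer_of_slot(slot_name: str) -> Optional[NodeId]:
    """Return the node a slot is named for, or None if the name does not match."""
    m = SLOT_RE.match(slot_name)
    if m is None:
        return None
    _local_dboid, sysid, timeline, dboid, _name = m.groups()
    return NodeId(sysid, int(timeline), int(dboid))


def self_referencing_slots(local: NodeId, slot_names: List[str]) -> List[str]:
    """Slots whose name points back at the local node itself."""
    return [s for s in slot_names if peer_of_slot(s) == local]


if __name__ == "__main__":
    # Values copied from HOST A in the original post.
    local = parse_node_id("(6188730679935789649,1,16385)")
    slots = [
        "bdr_16385_6188730679935789649_1_16385__",
        "bdr_16385_6188733518371128845_2_16385__",
    ]
    print(self_referencing_slots(local, slots))
```

With the values from HOST A, only the first slot is flagged; with HOST B's node ID and its single slot, nothing is flagged. This matches the picture in the thread, where the inactive slot on HOST A is the one retaining WAL.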
Thanks for the help; that seems to have done it. I’m not sure how it got into this state either. I still have the commands that I ran, but I doubt they’ll help much. What’s strange is that everything had been working fine since May; then on Oct 29th it appears to have started retaining WAL files in pg_xlog, so something must have happened, but sadly I don’t know what.
I’ll keep an eye on it to see if it happens again.