BDR not catching up - Mailing list pgsql-general

From cchee-ob
Subject BDR not catching up
Date
Msg-id 1457739190408-5892335.post@n5.nabble.com
List pgsql-general
I'm getting the messages below repeating in the log on the UDR node that I
just added today. Is there any way to get it to start applying?
svp2=# select * from bdr.bdr_nodes;
     node_sysid      | node_timeline | node_dboid | node_status |    node_name    |            node_local_dsn             |            node_init_from_dsn
---------------------+---------------+------------+-------------+-----------------+---------------------------------------+-------------------------------------------
 6206439726032130602 |             1 |      16385 | r           | UDR1            |                                       |
 6260914790689848233 |             1 |      16385 | c           | UDR1-subscriber | host=10.253.0.8 port=5432 dbname=svp2 | host=10.253.228.105 port=5432 dbname=svp2
(2 rows)



t=2016-03-11 15:23:51 PST d= h= p=7226 a=DEBUG:  00000: per-db worker for
node bdr (6260914790689848233,1,16385,) starting
t=2016-03-11 15:23:51 PST d= h= p=7226 a=LOCATION:  bdr_perdb_worker_main,
bdr_perdb.c:707
t=2016-03-11 15:23:51 PST d= h= p=7226 a=DEBUG:  00000: init_replica init
from remote host=10.253.228.105 port=5432 dbname=svp2
t=2016-03-11 15:23:51 PST d= h= p=7226 a=LOCATION:  bdr_init_replica,
bdr_init_replica.c:830
t=2016-03-11 15:23:51 PST d= h= p=7226 a=DEBUG:  00000: found valid
replication identifier 1
t=2016-03-11 15:23:51 PST d= h= p=7226 a=LOCATION:
bdr_establish_connection_and_slot, bdr.c:572
t=2016-03-11 15:23:51 PST d= h= p=7226 a=DEBUG:  00000: launching catchup
mode apply worker
t=2016-03-11 15:23:51 PST d= h= p=7226 a=LOCATION:  bdr_init_replica,
bdr_init_replica.c:1043
t=2016-03-11 15:23:51 PST d= h= p=7226 a=DEBUG:  00000: Registering bdr
apply catchup worker for bdr (6206439726032130602,1,16385,) to lsn
19E/10AC4F0
t=2016-03-11 15:23:51 PST d= h= p=7226 a=LOCATION:  bdr_catchup_to_lsn,
bdr_init_replica.c:1161
t=2016-03-11 15:23:51 PST d= h= p=4718 a=LOG:  00000: registering background
worker "bdr: catchup apply to 19E/10AC4F0"
t=2016-03-11 15:23:51 PST d= h= p=4718 a=LOCATION:
BackgroundWorkerStateChange, bgworker.c:347
t=2016-03-11 15:23:51 PST d= h= p=4718 a=LOG:  00000: starting background
worker process "bdr: catchup apply to 19E/10AC4F0"
t=2016-03-11 15:23:51 PST d= h= p=4718 a=LOCATION:  do_start_bgworker,
postmaster.c:5412
t=2016-03-11 15:23:51 PST d= h= p=7227 a=NOTICE:  00000: version "1.0" of
extension "btree_gist" is already installed
t=2016-03-11 15:23:51 PST d= h= p=7227 a=LOCATION:  ExecAlterExtensionStmt,
extension.c:2700
t=2016-03-11 15:23:51 PST d= h= p=7227 a=NOTICE:  00000: version "0.9.2.0"
of extension "bdr" is already installed
t=2016-03-11 15:23:51 PST d= h= p=7227 a=LOCATION:  ExecAlterExtensionStmt,
extension.c:2700
t=2016-03-11 15:23:51 PST d= h= p=7227 a=NOTICE:  42622: identifier "bdr
(6260914790689848233,1,16385,): apply catchup up to 19E/10AC4F0" will be
truncated to "bdr (6260914790689848233,1,16385,): apply catchup up to
19E/10A"
t=2016-03-11 15:23:51 PST d= h= p=7227 a=LOCATION:  truncate_identifier,
scansup.c:195
t=2016-03-11 15:23:51 PST d= h= p=7227 a=DEBUG:  00000: found valid
replication identifier 1
t=2016-03-11 15:23:51 PST d= h= p=7227 a=LOCATION:
bdr_establish_connection_and_slot, bdr.c:572
t=2016-03-11 15:23:51 PST d= h= p=7227 a=INFO:  00000: starting up
replication from 1 at 19D/D204D0C8
t=2016-03-11 15:23:51 PST d= h= p=7227 a=LOCATION:  bdr_apply_main,
bdr_apply.c:2550
t=2016-03-11 15:23:51 PST d= h= p=7227 a=DEBUG:  00000: bdr_apply: BEGIN
origin(source, orig_lsn, timestamp): 19D/D204D3A0, 2016-03-11
13:49:47.293208-08
t=2016-03-11 15:23:51 PST d= h= p=7227 a=LOCATION:  process_remote_begin,
bdr_apply.c:198
t=2016-03-11 15:23:51 PST d= h= p=7227 a=ERROR:  XX000: tuple natts
mismatch, 26 vs 28
t=2016-03-11 15:23:51 PST d= h= p=7227 a=LOCATION:  read_tuple_parts,
bdr_apply.c:1892
t=2016-03-11 15:23:51 PST d= h= p=4718 a=LOG:  00000: worker process: bdr:
catchup apply to 19E/10AC4F0 (PID 7227) exited with exit code 1
t=2016-03-11 15:23:51 PST d= h= p=4718 a=LOCATION:  LogChildExit,
postmaster.c:3325
t=2016-03-11 15:23:51 PST d= h= p=4718 a=LOG:  00000: unregistering
background worker "bdr: catchup apply to 19E/10AC4F0"
t=2016-03-11 15:23:51 PST d= h= p=4718 a=LOCATION:  ForgetBackgroundWorker,
bgworker.c:376
t=2016-03-11 15:23:52 PST d= h= p=7226 a=ERROR:  XX000: catchup worker
exited before catching up to target LSN 19E/10AC4F0
t=2016-03-11 15:23:52 PST d= h= p=7226 a=LOCATION:  bdr_catchup_to_lsn,
bdr_init_replica.c:1273
t=2016-03-11 15:23:52 PST d= h= p=4718 a=LOG:  00000: worker process: bdr
db: svp2 (PID 7226) exited with exit code 1
t=2016-03-11 15:23:52 PST d= h= p=4718 a=LOCATION:  LogChildExit,
postmaster.c:3325
t=2016-03-11 15:23:54 PST d= h= p=7228 a=DEBUG:  00000: autovacuum:
processing database "bdr_supervisordb"
t=2016-03-11 15:23:54 PST d= h= p=7228 a=LOCATION:  AutoVacWorkerMain,
autovacuum.c:1684
t=2016-03-11 15:23:57 PST d= h= p=4718 a=LOG:  00000: starting background
worker process "bdr db: svp2"
t=2016-03-11 15:23:57 PST d= h= p=4718 a=LOCATION:  do_start_bgworker,
postmaster.c:5412
t=2016-03-11 15:23:57 PST d= h= p=7229 a=NOTICE:  00000: version "1.0" of
extension "btree_gist" is already installed
t=2016-03-11 15:23:57 PST d= h= p=7229 a=LOCATION:  ExecAlterExtensionStmt,
extension.c:2700
t=2016-03-11 15:23:57 PST d= h= p=7229 a=NOTICE:  00000: version "0.9.2.0"
of extension "bdr" is already installed
t=2016-03-11 15:23:57 PST d= h= p=7229 a=LOCATION:  ExecAlterExtensionStmt,
extension.c:2700
t=2016-03-11 15:23:57 PST d= h= p=7229 a=DEBUG:  00000: per-db worker for
node bdr (6260914790689848233,1,16385,) starting
t=2016-03-11 15:23:57 PST d= h= p=7229 a=LOCATION:  bdr_perdb_worker_main,
bdr_perdb.c:707
t=2016-03-11 15:23:57 PST d= h= p=7229 a=DEBUG:  00000: init_replica init
from remote host=10.253.228.105 port=5432 dbname=svp2
t=2016-03-11 15:23:57 PST d= h= p=7229 a=LOCATION:  bdr_init_replica,
bdr_init_replica.c:830
t=2016-03-11 15:23:57 PST d= h= p=7229 a=DEBUG:  00000: found valid
replication identifier 1
t=2016-03-11 15:23:57 PST d= h= p=7229 a=LOCATION:
bdr_establish_connection_and_slot, bdr.c:572
t=2016-03-11 15:23:57 PST d= h= p=7229 a=DEBUG:  00000: launching catchup
mode apply worker
t=2016-03-11 15:23:57 PST d= h= p=7229 a=LOCATION:  bdr_init_replica,
bdr_init_replica.c:1043
t=2016-03-11 15:23:57 PST d= h= p=7229 a=DEBUG:  00000: Registering bdr
apply catchup worker for bdr (6206439726032130602,1,16385,) to lsn
19E/10BA488
t=2016-03-11 15:23:57 PST d= h= p=7229 a=LOCATION:  bdr_catchup_to_lsn,
bdr_init_replica.c:1161
t=2016-03-11 15:23:57 PST d= h= p=4718 a=LOG:  00000: registering background
worker "bdr: catchup apply to 19E/10BA488"
t=2016-03-11 15:23:57 PST d= h= p=4718 a=LOCATION:
BackgroundWorkerStateChange, bgworker.c:347
t=2016-03-11 15:23:57 PST d= h= p=4718 a=LOG:  00000: starting background
worker process "bdr: catchup apply to 19E/10BA488"
t=2016-03-11 15:23:57 PST d= h= p=4718 a=LOCATION:  do_start_bgworker,
postmaster.c:5412
t=2016-03-11 15:23:57 PST d= h= p=7230 a=NOTICE:  00000: version "1.0" of
extension "btree_gist" is already installed
t=2016-03-11 15:23:57 PST d= h= p=7230 a=LOCATION:  ExecAlterExtensionStmt,
extension.c:2700
t=2016-03-11 15:23:57 PST d= h= p=7230 a=NOTICE:  00000: version "0.9.2.0"
of extension "bdr" is already installed
t=2016-03-11 15:23:57 PST d= h= p=7230 a=LOCATION:  ExecAlterExtensionStmt,
extension.c:2700
t=2016-03-11 15:23:57 PST d= h= p=7230 a=NOTICE:  42622: identifier "bdr
(6260914790689848233,1,16385,): apply catchup up to 19E/10BA488" will be
truncated to "bdr (6260914790689848233,1,16385,): apply catchup up to
19E/10B"
t=2016-03-11 15:23:57 PST d= h= p=7230 a=LOCATION:  truncate_identifier,
scansup.c:195
t=2016-03-11 15:23:57 PST d= h= p=7230 a=DEBUG:  00000: found valid
replication identifier 1
t=2016-03-11 15:23:57 PST d= h= p=7230 a=LOCATION:
bdr_establish_connection_and_slot, bdr.c:572
t=2016-03-11 15:23:57 PST d= h= p=7230 a=INFO:  00000: starting up
replication from 1 at 19D/D204D0C8
t=2016-03-11 15:23:57 PST d= h= p=7230 a=LOCATION:  bdr_apply_main,
bdr_apply.c:2550
t=2016-03-11 15:23:57 PST d= h= p=7230 a=DEBUG:  00000: bdr_apply: BEGIN
origin(source, orig_lsn, timestamp): 19D/D204D3A0, 2016-03-11
13:49:47.293208-08
t=2016-03-11 15:23:57 PST d= h= p=7230 a=LOCATION:  process_remote_begin,
bdr_apply.c:198
t=2016-03-11 15:23:57 PST d= h= p=7230 a=ERROR:  XX000: tuple natts
mismatch, 26 vs 28
t=2016-03-11 15:23:57 PST d= h= p=7230 a=LOCATION:  read_tuple_parts,
bdr_apply.c:1892
t=2016-03-11 15:23:57 PST d= h= p=4718 a=LOG:  00000: worker process: bdr:
catchup apply to 19E/10BA488 (PID 7230) exited with exit code 1
t=2016-03-11 15:23:57 PST d= h= p=4718 a=LOCATION:  LogChildExit,
postmaster.c:3325
t=2016-03-11 15:23:57 PST d= h= p=4718 a=LOG:  00000: unregistering
background worker "bdr: catchup apply to 19E/10BA488"
t=2016-03-11 15:23:57 PST d= h= p=4718 a=LOCATION:  ForgetBackgroundWorker,
bgworker.c:376
t=2016-03-11 15:23:58 PST d= h= p=7229 a=ERROR:  XX000: catchup worker
exited before catching up to target LSN 19E/10BA488
t=2016-03-11 15:23:58 PST d= h= p=7229 a=LOCATION:  bdr_catchup_to_lsn,
bdr_init_replica.c:1273
t=2016-03-11 15:23:58 PST d= h= p=4718 a=LOG:  00000: worker process: bdr
db: svp2 (PID 7229) exited with exit code 1
t=2016-03-11 15:23:58 PST d= h= p=4718 a=LOCATION:  LogChildExit,
postmaster.c:3325
t=2016-03-11 15:24:03 PST d= h= p=4718 a=LOG:  00000: starting background
worker process "bdr db: svp2"
t=2016-03-11 15:24:03 PST d= h= p=4718 a=LOCATION:  do_start_bgworker,
postmaster.c:5412
t=2016-03-11 15:24:03 PST d= h= p=7231 a=NOTICE:  00000: version "1.0" of
extension "btree_gist" is already installed
t=2016-03-11 15:24:03 PST d= h= p=7231 a=LOCATION:  ExecAlterExtensionStmt,
extension.c:2700
t=2016-03-11 15:24:03 PST d= h= p=7231 a=NOTICE:  00000: version "0.9.2.0"
of extension "bdr" is already installed
t=2016-03-11 15:24:03 PST d= h= p=7231 a=LOCATION:  ExecAlterExtensionStmt,
extension.c:2700
t=2016-03-11 15:24:03 PST d= h= p=7231 a=DEBUG:  00000: per-db worker for
node bdr (6260914790689848233,1,16385,) starting
t=2016-03-11 15:24:03 PST d= h= p=7231 a=LOCATION:  bdr_perdb_worker_main,
bdr_perdb.c:707
t=2016-03-11 15:24:03 PST d= h= p=7231 a=DEBUG:  00000: init_replica init
from remote host=10.253.228.105 port=5432 dbname=svp2
t=2016-03-11 15:24:03 PST d= h= p=7231 a=LOCATION:  bdr_init_replica,
bdr_init_replica.c:830
t=2016-03-11 15:24:03 PST d= h= p=7231 a=DEBUG:  00000: found valid
replication identifier 1
t=2016-03-11 15:24:03 PST d= h= p=7231 a=LOCATION:
bdr_establish_connection_and_slot, bdr.c:572
t=2016-03-11 15:24:03 PST d= h= p=7231 a=DEBUG:  00000: launching catchup
mode apply worker
t=2016-03-11 15:24:03 PST d= h= p=7231 a=LOCATION:  bdr_init_replica,
bdr_init_replica.c:1043
t=2016-03-11 15:24:03 PST d= h= p=7231 a=DEBUG:  00000: Registering bdr
apply catchup worker for bdr (6206439726032130602,1,16385,) to lsn
19E/10E9D58
t=2016-03-11 15:24:03 PST d= h= p=7231 a=LOCATION:  bdr_catchup_to_lsn,
bdr_init_replica.c:1161
t=2016-03-11 15:24:03 PST d= h= p=4718 a=LOG:  00000: registering background
worker "bdr: catchup apply to 19E/10E9D58"
t=2016-03-11 15:24:03 PST d= h= p=4718 a=LOCATION:
BackgroundWorkerStateChange, bgworker.c:347
t=2016-03-11 15:24:03 PST d= h= p=4718 a=LOG:  00000: starting background
worker process "bdr: catchup apply to 19E/10E9D58"
t=2016-03-11 15:24:03 PST d= h= p=4718 a=LOCATION:  do_start_bgworker,
postmaster.c:5412
t=2016-03-11 15:24:03 PST d= h= p=7232 a=NOTICE:  00000: version "1.0" of
extension "btree_gist" is already installed
t=2016-03-11 15:24:03 PST d= h= p=7232 a=LOCATION:  ExecAlterExtensionStmt,
extension.c:2700
t=2016-03-11 15:24:03 PST d= h= p=7232 a=NOTICE:  00000: version "0.9.2.0"
of extension "bdr" is already installed
t=2016-03-11 15:24:03 PST d= h= p=7232 a=LOCATION:  ExecAlterExtensionStmt,
extension.c:2700
t=2016-03-11 15:24:03 PST d= h= p=7232 a=NOTICE:  42622: identifier "bdr
(6260914790689848233,1,16385,): apply catchup up to 19E/10E9D58" will be
truncated to "bdr (6260914790689848233,1,16385,): apply catchup up to
19E/10E"
t=2016-03-11 15:24:03 PST d= h= p=7232 a=LOCATION:  truncate_identifier,
scansup.c:195
t=2016-03-11 15:24:03 PST d= h= p=7232 a=DEBUG:  00000: found valid
replication identifier 1
t=2016-03-11 15:24:03 PST d= h= p=7232 a=LOCATION:
bdr_establish_connection_and_slot, bdr.c:572
t=2016-03-11 15:24:03 PST d= h= p=7232 a=INFO:  00000: starting up
replication from 1 at 19D/D204D0C8
t=2016-03-11 15:24:03 PST d= h= p=7232 a=LOCATION:  bdr_apply_main,
bdr_apply.c:2550
t=2016-03-11 15:24:03 PST d= h= p=7232 a=DEBUG:  00000: bdr_apply: BEGIN
origin(source, orig_lsn, timestamp): 19D/D204D3A0, 2016-03-11
13:49:47.293208-08
t=2016-03-11 15:24:03 PST d= h= p=7232 a=LOCATION:  process_remote_begin,
bdr_apply.c:198
t=2016-03-11 15:24:03 PST d= h= p=7232 a=ERROR:  XX000: tuple natts
mismatch, 26 vs 28
t=2016-03-11 15:24:03 PST d= h= p=7232 a=LOCATION:  read_tuple_parts,
bdr_apply.c:1892
t=2016-03-11 15:24:03 PST d= h= p=4718 a=LOG:  00000: worker process: bdr:
catchup apply to 19E/10E9D58 (PID 7232) exited with exit code 1
t=2016-03-11 15:24:03 PST d= h= p=4718 a=LOCATION:  LogChildExit,
postmaster.c:3325
t=2016-03-11 15:24:03 PST d= h= p=4718 a=LOG:  00000: unregistering
background worker "bdr: catchup apply to 19E/10E9D58"
t=2016-03-11 15:24:03 PST d= h= p=4718 a=LOCATION:  ForgetBackgroundWorker,
bgworker.c:376
t=2016-03-11 15:24:04 PST d= h= p=7231 a=ERROR:  XX000: catchup worker
exited before catching up to target LSN 19E/10E9D58
t=2016-03-11 15:24:04 PST d= h= p=7231 a=LOCATION:  bdr_catchup_to_lsn,
bdr_init_replica.c:1273
t=2016-03-11 15:24:04 PST d= h= p=4718 a=LOG:  00000: worker process: bdr
db: svp2 (PID 7231) exited with exit code 1
t=2016-03-11 15:24:04 PST d= h= p=4718 a=LOCATION:  LogChildExit,
postmaster.c:3325
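
The "tuple natts mismatch, 26 vs 28" error above suggests that the apply worker received a tuple whose attribute count differs from the local table's. A minimal sketch for finding the affected table, assuming the mismatch comes from a table whose column count differs between the two nodes: run the query below on both nodes and compare the output. Which table is involved, and whether dropped columns are counted in the check, are assumptions not confirmed by the log.

```sql
-- Diagnostic sketch: per-table attribute counts from the system catalogs.
-- Run on both the upstream node and the new node, then diff the results.
-- Dropped columns are listed separately because they still occupy
-- attribute slots in pg_attribute (assumption: this may matter here).
SELECT n.nspname AS schema_name,
       c.relname AS table_name,
       count(*)  AS natts_including_dropped,
       sum(CASE WHEN a.attisdropped THEN 0 ELSE 1 END) AS natts_live
FROM pg_class c
JOIN pg_namespace n ON n.oid = c.relnamespace
JOIN pg_attribute a ON a.attrelid = c.oid
WHERE c.relkind = 'r'          -- ordinary tables only
  AND a.attnum > 0             -- skip system columns
  AND n.nspname NOT IN ('pg_catalog', 'information_schema')
GROUP BY n.nspname, c.relname
ORDER BY 1, 2;
```

A table that shows 26 on one node and 28 on the other would be the likely candidate.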

This is from the primary node:

svp2=# SELECT
      slot_name, database, active,
      pg_xlog_location_diff(pg_current_xlog_insert_location(), restart_lsn) AS retained_bytes
    FROM pg_replication_slots
    WHERE plugin = 'bdr';
                slot_name                | database | active | retained_bytes
-----------------------------------------+----------+--------+----------------
 bdr_16385_6260914790689848233_1_16385__ | svp2     | f      |      687816472
(1 row)


The same thing happens every time I try to add a new node.

Thank you,

Carter



