Re: Subscription stuck at initialize state - Mailing list pgsql-general

From Vijaykumar Jain
Subject Re: Subscription stuck at initialize state
Date
Msg-id CAM+6J962X11n9Pic_3izg5GsUtvqAdUQAVCebpbAQBuHvB+fwQ@mail.gmail.com
Whole thread Raw
In response to Re: Subscription stuck at initialize state  (Abhishek Bhola <abhishek.bhola@japannext.co.jp>)
Responses Re: Subscription stuck at initialize state
List pgsql-general
trimming the email, to avoid noise.

I spent a lot of time trying multiple options/combinations and finally managed to replicate your debug output.
this is when i have , 

postgres=# show max_logical_replication_workers;
 max_logical_replication_workers
---------------------------------
 2
(1 row)


on publisher
postgres@controller:~$ psql -p 5001
psql (14.1 (Ubuntu 14.1-2.pgdg20.04+1))
Type "help" for help.

postgres=# create table t_bytea(id int primary key, somebytea bytea);
CREATE TABLE
postgres=# create table t1(id int);
CREATE TABLE
postgres=# create table t2(id int);
CREATE TABLE
postgres=# create table t3(id int);
CREATE TABLE
postgres=# create publication mypub1 for table t1;
CREATE PUBLICATION
postgres=# create publication mypub2 for table t2;
CREATE PUBLICATION
postgres=# create publication mypub3 for table t3;
CREATE PUBLICATION
postgres=# create publication mypub4 for table t3;
CREATE PUBLICATION
postgres=# create publication mypub5 for table t_bytea;
CREATE PUBLICATION
postgres=# insert into t_bytea  select x,repeat(repeat('xxx', 100), 1000)::bytea from generate_series(1, 1000) x;
INSERT 0 1000



on subscriber
postgres@controller:~$ psql -p 5002
psql (14.1 (Ubuntu 14.1-2.pgdg20.04+1))
Type "help" for help.

postgres=# create table t_bytea(id int primary key, somebytea bytea);
CREATE TABLE
postgres=#  create table t1(id int);
CREATE TABLE
postgres=#  create table t2(id int);
CREATE TABLE
postgres=#  create table t3(id int);
CREATE TABLE
postgres=#  create table t4(id int);
CREATE TABLE
postgres=#  create table t5(id int);
CREATE TABLE
postgres=# create subscription mysub1 connection 'port=5001' publication mypub1;
NOTICE:  created replication slot "mysub1" on publisher
CREATE SUBSCRIPTION
postgres=# create subscription mysub2 connection 'port=5001' publication mypub2;
NOTICE:  created replication slot "mysub2" on publisher
CREATE SUBSCRIPTION
postgres=# create subscription mysub3 connection 'port=5001' publication mypub3;
NOTICE:  created replication slot "mysub3" on publisher
CREATE SUBSCRIPTION
postgres=# create subscription mysub4 connection 'port=5001' publication mypub4;
NOTICE:  created replication slot "mysub4" on publisher
CREATE SUBSCRIPTION
postgres=# create subscription mysub5 connection 'port=5001' publication mypub5;
NOTICE:  created replication slot "mysub5" on publisher
CREATE SUBSCRIPTION
postgres=# select count(1) from t_bytea;
 count
-------
     0
(1 row)

postgres=# table pg_subscription_rel;
 srsubid | srrelid | srsubstate | srsublsn
---------+---------+------------+-----------
   16406 |   16391 | r          | 0/1722838
   16407 |   16394 | i          |
   16408 |   16397 | i          |
   16409 |   16397 | i          |
   16410 |   16384 | i          |
(5 rows)

# as expected no data in t_bytea (as it could not get any worker)
postgres=# select count(1) from t_bytea;
 count
-------
     0
(1 row)



but logs clearly state the problem
2022-02-03 23:18:31.107 IST [3430] LOG:  logical replication table synchronization worker for subscription "mysub1", table "t1" has started
2022-02-03 23:18:31.138 IST [3430] LOG:  logical replication table synchronization worker for subscription "mysub1", table "t1" has finished
2022-02-03 23:18:40.730 IST [3433] LOG:  logical replication apply worker for subscription "mysub2" has started
2022-02-03 23:18:40.737 IST [3433] WARNING:  out of logical replication worker slots
2022-02-03 23:18:40.737 IST [3433] HINT:  You might need to increase max_logical_replication_workers.
2022-02-03 23:18:45.865 IST [3433] WARNING:  out of logical replication worker slots


#publisher logs
2022-02-03 23:18:31.096 IST [3427] STATEMENT:  CREATE_REPLICATION_SLOT "mysub1" LOGICAL pgoutput NOEXPORT_SNAPSHOT
2022-02-03 23:18:31.106 IST [3429] LOG:  starting logical decoding for slot "mysub1"
2022-02-03 23:18:31.106 IST [3429] DETAIL:  Streaming transactions committing after 0/1722800, reading WAL from 0/17227C8.
2022-02-03 23:18:31.106 IST [3429] STATEMENT:  START_REPLICATION SLOT "mysub1" LOGICAL 0/0 (proto_version '2', publication_names '"mypub1"')
2022-02-03 23:18:31.106 IST [3429] LOG:  logical decoding found consistent point at 0/17227C8
2022-02-03 23:18:31.106 IST [3429] DETAIL:  There are no running transactions.
2022-02-03 23:18:31.106 IST [3429] STATEMENT:  START_REPLICATION SLOT "mysub1" LOGICAL 0/0 (proto_version '2', publication_names '"mypub1"')
2022-02-03 23:18:31.129 IST [3431] LOG:  logical decoding found consistent point at 0/1722800
2022-02-03 23:18:31.129 IST [3431] DETAIL:  There are no running transactions.
2022-02-03 23:18:31.129 IST [3431] STATEMENT:  CREATE_REPLICATION_SLOT "pg_16406_sync_16391_7060540926182153512" LOGICAL pgoutput USE_SNAPSHOT
2022-02-03 23:18:31.135 IST [3431] LOG:  starting logical decoding for slot "pg_16406_sync_16391_7060540926182153512"
2022-02-03 23:18:31.135 IST [3431] DETAIL:  Streaming transactions committing after 0/1722838, reading WAL from 0/1722800.
2022-02-03 23:18:31.135 IST [3431] STATEMENT:  START_REPLICATION SLOT "pg_16406_sync_16391_7060540926182153512" LOGICAL 0/1722838 (proto_version '2', publication_names '"mypub1"')
2022-02-03 23:18:35.718 IST [3432] LOG:  logical decoding found consistent point at 0/1722838
2022-02-03 23:18:35.718 IST [3432] DETAIL:  There are no running transactions.
2022-02-03 23:18:35.718 IST [3432] STATEMENT:  CREATE_REPLICATION_SLOT "mysub2" LOGICAL pgoutput NOEXPORT_SNAPSHOT
2022-02-03 23:18:40.737 IST [3434] LOG:  starting logical decoding for slot "mysub2"
2022-02-03 23:18:40.737 IST [3434] DETAIL:  Streaming transactions committing after 0/1722870, reading WAL from 0/1722838.
2022-02-03 23:18:40.737 IST [3434] STATEMENT:  START_REPLICATION SLOT "mysub2" LOGICAL 0/0 (proto_version '2', publication_names '"mypub2"')
2022-02-03 23:18:40.737 IST [3434] LOG:  logical decoding found consistent point at 0/1722838
2022-02-03 23:18:40.737 IST [3434] DETAIL:  There are no running transactions.
2022-02-03 23:18:40.737 IST [3434] STATEMENT:  START_REPLICATION SLOT "mysub2" LOGICAL 0/0 (proto_version '2', publication_names '"mypub2"')
2022-02-03 23:18:40.857 IST [3435] LOG:  logical decoding found consistent point at 0/1722870
2022-02-03 23:18:40.857 IST [3435] DETAIL:  There are no running transactions.
2022-02-03 23:18:40.857 IST [3435] STATEMENT:  CREATE_REPLICATION_SLOT "mysub3" LOGICAL pgoutput NOEXPORT_SNAPSHOT
2022-02-03 23:18:46.091 IST [3437] LOG:  logical decoding found consistent point at 0/17228A8
2022-02-03 23:18:46.091 IST [3437] DETAIL:  There are no running transactions.
2022-02-03 23:18:46.091 IST [3437] STATEMENT:  CREATE_REPLICATION_SLOT "mysub4" LOGICAL pgoutput NOEXPORT_SNAPSHOT
2022-02-03 23:18:51.941 IST [3439] LOG:  logical decoding found consistent point at 0/17228E0
2022-02-03 23:18:51.941 IST [3439] DETAIL:  There are no running transactions.
2022-02-03 23:18:51.941 IST [3439] STATEMENT:  CREATE_REPLICATION_SLOT "mysub5" LOGICAL pgoutput NOEXPORT_SNAPSHOT



# solution
so i bump the  max_logical_replication_workers = 10 and restart the db servers.
as there are workers available, replication catches up and tables in sync.

# on subscriber
psql (14.1 (Ubuntu 14.1-2.pgdg20.04+1))
Type "help" for help.

postgres=# select count(1) from t_bytea;
 count
-------
     0
(1 row)

postgres=# table pg_subscription_rel;
 srsubid | srrelid | srsubstate | srsublsn
---------+---------+------------+-----------
   16406 |   16391 | r          | 0/1722838
   16407 |   16394 | r          | 0/2000110
   16408 |   16397 | r          | 0/2000148
   16409 |   16397 | r          | 0/2000148
   16410 |   16384 | r          | 0/20001B8
(5 rows)

postgres=# select count(1) from t_bytea;
 count
-------
  1000
(1 row)


so two things,
can you check logs for warning messages  of out of  logical replication worker slots on subscribers ?
can you try bumping them, and check if it catches up (srsubstate should either be r(READY) or d (INITIAL COPY))
 
also monitor logs for both publisher and subscriber.

I also figured out all the debugging steps I requested earlier were useless. 



--

pgsql-general by date:

Previous
From: Tom Lane
Date:
Subject: Re: Undetected Deadlock
Next
From: Matthias Apitz
Date:
Subject: Re: sort order for UTF-8 char column with Japanese UTF-8