logical replication problems (10.4) - Mailing list pgsql-admin

From Achilleas Mantzios
Subject logical replication problems (10.4)
Date
Msg-id 3fe052bb-76b4-59b4-44f5-7de2cec23282@matrix.gatewaynet.com
Whole thread Raw
Responses Re: logical replication problems (10.4)  (Achilleas Mantzios <achill@matrix.gatewaynet.com>)
List pgsql-admin
Hello,
yesterday we did the first effort to deploy our test database server (aimed at being used by developers and testers)
usinglogical replication, using a publication for selected tables (563 in total 
 
with some of them being pretty large with bytea atts).
Expected total DB size : about 1.2 TB
I started it (created the subscription) at about 2018-08-02 11:35:00. By 2018-08-02 22:02:43.778 I got the last logical
replication-relatedmessage on the subscriber side:
 
2018-08-02 22:02:43.778 EEST  @ line:1 LOG:  logical replication table synchronization worker for subscription
"data_for_testsmadb_pub",table "moc_final_persons" has started
 

On the publisher side I got :
2018-08-02 22:06:07.726 EEST data_for_testsmadb_pub repmgr@dynacom line:7 LOG:  terminating walsender process due to
replicationtimeout
 
2018-08-02 22:06:07.726 EEST data_for_testsmadb_pub repmgr@dynacom line:8 LOG:  disconnection: session time:
0:04:34.968user=repmgr database=dynacom host=10.9.0.77 port=47820
 
and later
2018-08-02 22:10:15.153 EEST data_for_testsmadb_pub_33650_sync_22294 repmgr@dynacom line:5 LOG:  duration: 25428550.886
ms statement: COPY public.marinerpapers_atts TO STDOUT
 

The nightly barman differential backup took more than twice the time and space than usual, totaling to some 500GB
(whereasusually it is about 200 GB).
 

In the morning I saw no activity. pg_stat_subscription showed last send / receipt time yesterday night (unfortunately I
didn'tkeep this info).
 
Then I reacted by restarting the subscriber database and this seemed to "wake up" the logical replication.
Still this is in progress syncing a large table : marinerpapers_atts (about 400GB) (hopefully the last?).

I don't know if all this is going according to the intended/documented behavior.

Currently I have 2 wal sender / logical replication worker pairs :

one which is doing the streaming and from the looks of it it is pretty close to the LSNs of your upstream physical
standbyand
 
one which still tries to sync this very large table : marinerpapers_atts .

While I was at it I did some tests on a similar table named : marinerpapers_atts_tmp , and here is what I get to my
surprise:
 

Publisher :
select count(*) from marinerpapers_atts_tmp;
  count
--------
  673391
(1 row)

Subscriber :
select count(*) from marinerpapers_atts_tmp;
  count
--------
  673377
(1 row)

There is a 14 difference in row count, which seems to be consistent, if retry those counts in a while I'll get again 14
differencein row count. I diffed two times and found out the 14 missing rows. I 
 
got no errors regarding this table, neither on the publisher nor on the subscriber :

Publisher : egrep -e 'data_for.*22304' data/log/postgresql-2018-08-02.log
10.9.0.77(47819) [113746] 5b62f9e9.1bc52 2018-08-02 15:32:41.181 EEST data_for_testsmadb_pub_33650_sync_22304
repmgr@dynacomline:3 LOG:  logical decoding found consistent point at FE5/53827A50
 
10.9.0.77(47819) [113746] 5b62f9e9.1bc52 2018-08-02 15:32:41.181 EEST data_for_testsmadb_pub_33650_sync_22304
repmgr@dynacomline:4 DETAIL:  There are no running transactions.
 
10.9.0.77(47819) [113746] 5b62f9e9.1bc52 2018-08-02 22:01:27.502 EEST data_for_testsmadb_pub_33650_sync_22304
repmgr@dynacomline:5 LOG:  duration: 23326314.921 ms  statement: COPY 
 
public.marinerpapers_atts_tmp TO STDOUT
10.9.0.77(47819) [113746] 5b62f9e9.1bc52 2018-08-02 22:01:27.669 EEST data_for_testsmadb_pub_33650_sync_22304
repmgr@dynacomline:6 LOG:  disconnection: session time: 6:28:46.565 user=repmgr 
 
database=dynacom host=10.9.0.77 port=47819

Subscriber : egrep -e 'marinerpapers_atts_tmp' data/log/postgresql-02.log
  [34052] 5b62f9e9.8504 2018-08-02 15:32:41.043 EEST  @ line:1 LOG:  logical replication table synchronization worker
forsubscription "data_for_testsmadb_pub", table "marinerpapers_atts_tmp" has started
 
  [34052] 5b62f9e9.8504 2018-08-02 22:01:27.663 EEST  @ line:2 LOG:  logical replication table synchronization worker
forsubscription "data_for_testsmadb_pub", table "marinerpapers_atts_tmp" has finished
 


No error at all, but I got those 14 rows missing .

-- 
Achilleas Mantzios
IT DEV Lead
IT DEPT
Dynacom Tankers Mgmt



pgsql-admin by date:

Previous
From: Fabio Pardi
Date:
Subject: Re: Optimal SSD based RAID 0 config
Next
From: "Anjul Tyagi"
Date:
Subject: Linux Kernel setting