Re: speed up a logical replica setup - Mailing list pgsql-hackers

From Alexander Lakhin
Subject Re: speed up a logical replica setup
Msg-id bde6ac67-69cc-c104-5ab6-dd4f5deadf24@gmail.com
In response to Re: speed up a logical replica setup  (Amit Kapila <amit.kapila16@gmail.com>)
List pgsql-hackers
Hello Amit and Kuroda-san,

03.07.2024 14:02, Amit Kapila wrote:
> Pushed 0002 and 0003. Let's wait for a discussion on 0001.

Please look at another failure of the test [1]:
[13:28:05.647](2.460s) not ok 26 - failover slot is synced
[13:28:05.648](0.001s) #   Failed test 'failover slot is synced'
#   at /home/bf/bf-build/skink-master/HEAD/pgsql/src/bin/pg_basebackup/t/040_pg_createsubscriber.pl line 307.
[13:28:05.648](0.000s) #          got: ''
#     expected: 'failover_slot'

with 040_pg_createsubscriber_node_s.log containing:
2024-07-08 13:28:05.369 UTC [3985464][client backend][0/2:0] LOG: statement: SELECT pg_sync_replication_slots()
2024-07-08 13:28:05.557 UTC [3985464][client backend][0/2:0] LOG: could not sync slot "failover_slot" as remote slot precedes local slot
2024-07-08 13:28:05.557 UTC [3985464][client backend][0/2:0] DETAIL:  Remote slot has LSN 0/30047B8 and catalog xmin 743, but local slot has LSN 0/30047B8 and catalog xmin 744.
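
For illustration, here is a minimal Python sketch of the comparison that the log message above describes. It assumes that a remote slot "precedes" the local one when either its LSN is lower or its catalog xmin is older under 32-bit wraparound comparison. The function names are hypothetical and this is not the server's code, only a model of the condition.

```python
def parse_lsn(text: str) -> int:
    """Convert an LSN such as '0/30047B8' into a 64-bit integer."""
    hi, lo = text.split("/")
    return (int(hi, 16) << 32) | int(lo, 16)


def xid_precedes(a: int, b: int) -> bool:
    """Assumed semantics: modular 32-bit comparison of normal transaction IDs."""
    diff = (a - b) & 0xFFFFFFFF
    return diff >= 0x80000000


def remote_precedes_local(remote_lsn: str, remote_xmin: int,
                          local_lsn: str, local_xmin: int) -> bool:
    """True if the remote slot is behind the local one in LSN or catalog xmin."""
    return (parse_lsn(remote_lsn) < parse_lsn(local_lsn)
            or xid_precedes(remote_xmin, local_xmin))


# Values taken from the DETAIL line of the skink log.
print(remote_precedes_local("0/30047B8", 743, "0/30047B8", 744))
```

With the values from the log, the LSNs are equal, so under these assumptions the catalog xmin difference (743 vs 744) alone would be enough to reject the sync.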

I could not reproduce it locally, but I've discovered that this subtest
somehow depends on the pg_createsubscriber run executed for the
'primary contains unmet conditions on node P' check. For example, with this
test modification:
@@ -249,7 +249,7 @@ command_fails(
          $node_p->connstr($db1), '--socket-directory',
          $node_s->host, '--subscriber-port',
          $node_s->port, '--database',
-        $db1, '--database',
+        'XXX', '--database',
          $db2
      ],
      'primary contains unmet conditions on node P');

I see the same failure:
2024-07-09 10:19:43.284 UTC [938890] 040_pg_createsubscriber.pl LOG:  statement: SELECT pg_sync_replication_slots()
2024-07-09 10:19:43.292 UTC [938890] 040_pg_createsubscriber.pl LOG:  could not sync slot "failover_slot" as remote slot precedes local slot
2024-07-09 10:19:43.292 UTC [938890] 040_pg_createsubscriber.pl DETAIL:  Remote slot has LSN 0/3004780 and catalog xmin 743, but local slot has LSN 0/3004780 and catalog xmin 744.

Thus maybe even a normal pg_createsubscriber run can affect the primary
server (its catalog xmin) differently?

One difference I found in the logs is that the skink failure's
regress_log_040_pg_createsubscriber contains:
pg_createsubscriber: error: publisher requires 2 wal sender processes, but only 1 remain

Though for a successful run I see locally (I can't find logs of
successful test runs on skink):
pg_createsubscriber: error: publisher requires 2 wal sender processes, but only 0 remain
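
To make the difference between the two messages concrete, here is a small Python sketch. It assumes the "remain" figure is max_wal_senders minus the number of walsender processes active when the check runs. That formula, the function names, and the max_wal_senders value of 1 are illustrative assumptions, not taken from the logs.

```python
from typing import Optional


def remaining_wal_senders(max_wal_senders: int, active_walsenders: int) -> int:
    """Assumed formula: free walsender slots at the time of the check."""
    return max_wal_senders - active_walsenders


def check_publisher(num_dbs: int, max_wal_senders: int,
                    active_walsenders: int) -> Optional[str]:
    """Return an error string shaped like the logged one, or None if enough remain."""
    remain = remaining_wal_senders(max_wal_senders, active_walsenders)
    if num_dbs > remain:
        return (f"publisher requires {num_dbs} wal sender processes, "
                f"but only {remain} remain")
    return None


# Hypothetical setting max_wal_senders = 1, two databases requested.
print(check_publisher(2, 1, 1))  # one walsender already active
print(check_publisher(2, 1, 0))  # no walsender active at check time
```

Under these assumptions, "0 remain" vs "1 remain" would correspond to a different number of walsenders being connected to the primary at the moment of the check.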

[1] https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=skink&dt=2024-07-08%2013%3A16%3A35

Best regards,
Alexander


