RE: speed up a logical replica setup - Mailing list pgsql-hackers

From Hayato Kuroda (Fujitsu)
Subject RE: speed up a logical replica setup
Date
Msg-id OSBPR01MB25528300C71FDD83EA1DCA12F5DD2@OSBPR01MB2552.jpnprd01.prod.outlook.com
Whole thread Raw
In response to Re: speed up a logical replica setup  (Alexander Lakhin <exclusion@gmail.com>)
Responses Re: speed up a logical replica setup
List pgsql-hackers
Dear Alexander and other hackers,

> As a recent buildfarm failure [1] shows, that test addition introduced
> new instability:
> ### Starting node "node_s"
> # Running: pg_ctl -w -D
> /home/bf/bf-build/piculet/HEAD/pgsql.build/testrun/pg_basebackup/040_pg_c
> reatesubscriber/data/t_040_pg_createsubscriber_node_s_data/pgdata
> -l
> /home/bf/bf-build/piculet/HEAD/pgsql.build/testrun/pg_basebackup/040_pg_c
> reatesubscriber/log/040_pg_createsubscriber_node_s.log
> -o --cluster-name=node_s start
> waiting for server to start.... done
> server started
> # Postmaster PID for node "node_s" is 416482
> error running SQL: 'psql:<stdin>:1: ERROR:  skipping slot synchronization as the
> received slot sync LSN 0/30047F0 for
> slot "failover_slot" is ahead of the standby position 0/3004708'
> while running 'psql -XAtq -d port=51506 host=/tmp/pqWohdD5Qj
> dbname='postgres' -f - -v ON_ERROR_STOP=1' with sql 'SELECT
> pg_sync_replication_slots()' at
> /home/bf/bf-build/piculet/HEAD/pgsql/src/test/perl/PostgreSQL/Test/Cluster
> .pm line 2126.
> 
> I could reproduce this failure with:
> --- a/src/backend/replication/walreceiver.c
> +++ b/src/backend/replication/walreceiver.c
> @@ -517,6 +517,7 @@ WalReceiverMain(char *startup_data, size_t
> startup_data_len)
>                        * let the startup process and primary server know
> about
>                        * them.
>                        */
> +pg_usleep(300000);
>                       XLogWalRcvFlush(false, startpointTLI);
> 
> make -s check -C src/bin/pg_basebackup/ PROVE_TESTS="t/040*"
> 
> # +++ tap check in src/bin/pg_basebackup +++
> t/040_pg_createsubscriber.pl .. 22/? # Tests were run but no plan was declared
> and done_testing() was not seen.
> # Looks like your test exited with 29 just after 23.
> t/040_pg_createsubscriber.pl .. Dubious, test returned 29 (wstat 7424, 0x1d00)
> All 23 subtests passed
> 
> Test Summary Report
> -------------------
> t/040_pg_createsubscriber.pl (Wstat: 7424 Tests: 23 Failed: 0)
>    Non-zero exit status: 29
>    Parse errors: No plan found in TAP output
> Files=1, Tests=23,  4 wallclock secs ( 0.01 usr  0.01 sys +  0.49 cusr  0.44
> csys =  0.95 CPU)

I thought it was not related with the feature of pg_createsubscriber. It was
related with the slotsync function. In the first place, the error message was
output with the below backtrace.

```
synchronize_one_slot
synchronize_slots
pg_sync_replication_slots
```

And I found the error could be reproduced by the attached script with the source
modification by Alexander [1]. According to my analysis, this could be caused when
1) a replication slot is created with failover = true and XLOG_RUNNING_XACT record is generated, and
2) pg_sync_replication_slots() is called *before* the record is flushed on the
standby. To stabilize the test, we can call wait_for_replay_catchup() after the slot
creation on the primary.

> Moreover, this test may suffer from autovacuum:
> echo "
> autovacuum_naptime = 1
> autovacuum_analyze_threshold = 1
> " > /tmp/temp.config
> TEMP_CONFIG=/tmp/temp.config make -s check -C src/bin/pg_basebackup/
> PROVE_TESTS="t/040*"
> 
> # +++ tap check in src/bin/pg_basebackup +++
> t/040_pg_createsubscriber.pl .. 24/?
> #   Failed test 'failover slot is synced'
> #   at t/040_pg_createsubscriber.pl line 273.
> #          got: ''
> #     expected: 'failover_slot'
> t/040_pg_createsubscriber.pl .. 28/? # Looks like you failed 1 test of 33.
> t/040_pg_createsubscriber.pl .. Dubious, test returned 1 (wstat 256, 0x100)
> Failed 1/33 subtests

Thanks for reporting. I could reproduce the failure, and some BF animals said NG.
According to them, the root cause seemed that slot synchronization was failed.

```
LOG:  statement: SELECT pg_sync_replication_slots()
LOG:  could not sync slot "failover_slot" as remote slot precedes local slot
DETAIL:  Remote slot has LSN 0/3006890 and catalog xmin 743, but local slot has LSN 0/3006890 and catalog xmin 744.
```

Based on that, I considered a scenario why the slot could not be synchronized.
I felt this was not caused by the pg_createsubscriber.

1. At initial stage, the xmin of the physical slot is 743, and nextXid of the
   primary is also 743.
2. Autovacuum worker starts a new transaction. nextXid is incremented to 744.
3. Tries to creates a logical replication slot with failover=true *before the
   transaction at step2 is replicated to the standby*.
4. While creating the slot, the catalog_xmin must be determined.
   The initial candidate is nextXid (= 744), but the oldest xmin of replication
   slots (=743) is used if it is older than nextXid. So 743 is chosen in this case.
   This operaion is done in CreateInitDecodingContext()->GetOldestSafeDecodingContext().
5. After that, the transaction at step2 is reached to the standby node and it
   updates the nextXid.
6. Finally runs pg pg_sync_replication_slots() on the standby. It finds a failover
   slot on the primary and tries to create on the standby. However, the
   catalog_xmin on the primary (743) is older than the nextXid of the standby (744)
   so that it skips to create a slot.

To avoid the issue, we can disable the autovacuuming while testing.

# Descriptions for attached files

An attached script can be used to reproduce the first failure without pg_createsubscriber.
It requires to modify the code like [1].

0001 patch can reduce the wait time of test. See [3]. I know the approach is bit hacky but
it worked.
0002 patch waits until the WAL record is replicated. This fixes a first failure.
0003 patch disables autovacuum for node_p and node_s. I think node_p is enough, but did
like that just in case. This fixes a second failure.


[1]: https://www.postgresql.org/message-id/0dffca12-bf17-4a7a-334d-225569de5e6e%40gmail.com
[2]: https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=adder&dt=2024-07-02%2008%3A45%3A39
[3]:
https://www.postgresql.org/message-id/OSBPR01MB25521B15BF950D2523BBE143F5D32%40OSBPR01MB2552.jpnprd01.prod.outlook.com

Best Regards,
Hayato Kuroda
FUJITSU LIMITED
https://www.fujitsu.com/ 


Attachment

pgsql-hackers by date:

Previous
From: "Hayato Kuroda (Fujitsu)"
Date:
Subject: RE: speed up a logical replica setup
Next
From: Dilip Kumar
Date:
Subject: Re: Conflict Detection and Resolution