Re: speed up a logical replica setup - Mailing list pgsql-hackers
From | Shlok Kyal |
---|---|
Subject | Re: speed up a logical replica setup |
Date | |
Msg-id | CANhcyEUCt-g4JLQU3Q3ofFk_Vt-Tqh3ZdXoLcpT8fjz9LY_-ww@mail.gmail.com |
In response to | Re: speed up a logical replica setup (Shlok Kyal <shlok.kyal.oss@gmail.com>) |
Responses | Re: speed up a logical replica setup |
List | pgsql-hackers |
On Fri, 5 Jan 2024 at 12:19, Shlok Kyal <shlok.kyal.oss@gmail.com> wrote:
>
> On Thu, 4 Jan 2024 at 16:46, Amit Kapila <amit.kapila16@gmail.com> wrote:
> >
> > On Thu, Jan 4, 2024 at 12:22 PM Shlok Kyal <shlok.kyal.oss@gmail.com> wrote:
> > >
> > > Hi,
> > > I was testing the patch with following test cases:
> > >
> > > Test 1 :
> > > - Create a 'primary' node
> > > - Setup physical replica using pg_basebackup "./pg_basebackup -h
> > > localhost -X stream -v -R -W -D ../standby"
> > > - Insert data before and after pg_basebackup
> > > - Run pg_subscriber and then insert some data to check logical
> > > replication "./pg_subscriber -D ../standby -S "host=localhost
> > > port=9000 dbname=postgres" -P "host=localhost port=9000
> > > dbname=postgres" -d postgres"
> > > - Also check pg_publication, pg_subscriber and pg_replication_slots tables.
> > >
> > > Observation:
> > > Data is not lost. Replication is happening correctly. Pg_subscriber is
> > > working as expected.
> > >
> > > Test 2:
> > > - Create a 'primary' node
> > > - Use normal pg_basebackup but don’t set up Physical replication
> > > "./pg_basebackup -h localhost -v -W -D ../standby"
> > > - Insert data before and after pg_basebackup
> > > - Run pg_subscriber
> > >
> > > Observation:
> > > Pg_subscriber command is not completing and is stuck with following
> > > log repeating:
> > > LOG: waiting for WAL to become available at 0/3000168
> > > LOG: invalid record length at 0/3000150: expected at least 24, got 0
> > >
> >
> > I think probably the required WAL is not copied. Can you use the -X
> > option to stream WAL as well and then test? But I feel in this case
> > also, we should wait for some threshold time and then exit with
> > failure, removing new objects created, if any.
>
> I have tested with -X stream option in pg_basebackup as well. In this
> case also the pg_subscriber command is getting stuck.
> logs:
> 2024-01-05 11:49:34.436 IST [61948] LOG: invalid resource manager ID
> 102 at 0/3000118
> 2024-01-05 11:49:34.436 IST [61948] LOG: waiting for WAL to become
> available at 0/3000130
>
> > >
> > > Test 3:
> > > - Create a 'primary' node
> > > - Use normal pg_basebackup but don’t set up Physical replication
> > > "./pg_basebackup -h localhost -v -W -D ../standby"
> > > -Insert data before pg_basebackup but not after pg_basebackup
> > > -Run pg_subscriber
> > >
> > > Observation:
> > > Pg_subscriber command is not completing and is stuck with following
> > > log repeating:
> > > LOG: waiting for WAL to become available at 0/3000168
> > > LOG: invalid record length at 0/3000150: expected at least 24, got 0
> > >
> >
> > This is similar to the previous test and you can try the same option
> > here as well.
> For this test as well tried with -X stream option in pg_basebackup.
> It is getting stuck here as well with similar log.
>
> Will investigate the issue further.

I noticed that pg_subscriber gets stuck when we run it on a node
that is not a standby. This is because of the following code:

+    conn = connect_database(dbinfo[0].pubconninfo);
+    if (conn == NULL)
+        exit(1);
+    consistent_lsn = create_logical_replication_slot(conn, &dbinfo[0],
+                                                     temp_replslot);
+ .....
+    else
+    {
+        appendPQExpBuffer(recoveryconfcontents, "recovery_target_lsn = '%s'\n",
+                          consistent_lsn);
+        WriteRecoveryConfig(conn, subscriber_dir, recoveryconfcontents);
+    }

Here the standby node waits during recovery for the WAL up to
'consistent_lsn', but that WAL will never arrive on the standby if no
physical replication is set up. Hence the command waits indefinitely
for the WAL.

To solve this, I added a timeout of 60s for the recovery process and
also added a check so that pg_subscriber gives an error when it is
called for a node which is not in physical replication.
I have attached the patch for the same. It is a top-up patch on top of
the patch shared by Euler at [1].
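
The two safeguards described above (refuse a node that is not a standby, and stop waiting for recovery after a fixed time) can be illustrated with a minimal standalone sketch. This is not the attached patch. Every name in it (require_standby, wait_for_end_of_recovery, in_recovery_probe, countdown_probe, RECOVERY_TIMEOUT_SECS) is hypothetical, and how the patch actually detects a node that is not in physical replication is not shown in this mail. The sketch assumes the probe would be something like running "SELECT pg_is_in_recovery()" over a libpq connection; here it is a plain callback so the logic can be tried without a server.

```c
#define _POSIX_C_SOURCE 200809L /* for nanosleep() */

#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <time.h>

/* All names below are hypothetical; they are not taken from the patch. */
#define RECOVERY_TIMEOUT_SECS 60    /* the 60s limit mentioned in the mail */

typedef enum
{
    WAIT_OK,        /* recovery finished before the deadline */
    WAIT_TIMEOUT    /* deadline passed while still in recovery */
} WaitResult;

/*
 * Returns true while the server is in recovery.  A real probe might run
 * "SELECT pg_is_in_recovery()" over a libpq connection (assumption).
 */
typedef bool (*in_recovery_probe) (void *arg);

/* Pre-flight check: refuse to continue on a node that is not in recovery. */
static bool
require_standby(in_recovery_probe probe, void *arg)
{
    assert(probe != NULL);
    if (!probe(arg))
    {
        fprintf(stderr, "error: target server is not a standby\n");
        return false;
    }
    return true;
}

/* Poll until recovery ends or timeout_secs elapse, whichever comes first. */
static WaitResult
wait_for_end_of_recovery(in_recovery_probe probe, void *arg,
                         int timeout_secs, long poll_interval_ms)
{
    struct timespec pause;
    time_t      start = time(NULL);

    assert(probe != NULL);
    pause.tv_sec = poll_interval_ms / 1000;
    pause.tv_nsec = (poll_interval_ms % 1000) * 1000000L;

    for (;;)
    {
        if (!probe(arg))
            return WAIT_OK;
        if (difftime(time(NULL), start) >= timeout_secs)
            return WAIT_TIMEOUT;    /* caller should clean up and fail */
        nanosleep(&pause, NULL);
    }
}

/* Stand-in probe for experiments: reports "in recovery" *arg more times. */
static bool
countdown_probe(void *arg)
{
    int        *remaining = arg;

    if (*remaining > 0)
    {
        (*remaining)--;
        return true;
    }
    return false;
}
```

A caller would run require_standby() once before touching the target, then call wait_for_end_of_recovery() with RECOVERY_TIMEOUT_SECS and treat WAIT_TIMEOUT as a failure, removing any objects it has already created, as suggested earlier in the thread.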
Please review the changes and merge them if they look OK.

[1] - https://www.postgresql.org/message-id/e02a2c17-22e5-4ba6-b788-de696ab74f1e%40app.fastmail.com

Thanks and regards
Shlok Kyal