Re: speed up a logical replica setup - Mailing list pgsql-hackers

From Shubham Khanna
Subject Re: speed up a logical replica setup
Date
Msg-id CAHv8Rj+AvtnxNyyMAFRVuuzP3Cy5j1zq99wkMUXOUnskPZg5nw@mail.gmail.com
Whole thread Raw
In response to RE: speed up a logical replica setup  ("Hayato Kuroda (Fujitsu)" <kuroda.hayato@fujitsu.com>)
Responses RE: speed up a logical replica setup
List pgsql-hackers
On Fri, Feb 2, 2024 at 3:11 PM Hayato Kuroda (Fujitsu)
<kuroda.hayato@fujitsu.com> wrote:
>
> Dear Euler,
>
> Thanks for updating the patch!
>
> >
> I'm still working on the data structures to group options. I don't like the way
> it was grouped in v13-0005. There is too many levels to reach database name.
> The setup_subscriber() function requires the 3 data structures.
> >
>
> Right, your refactoring looks fewer stack. So I pause to revise my refactoring
> patch.
>
> >
> The documentation update is almost there. I will include the modifications in
> the next patch.
> >
>
> OK. I think it should be modified before native speakers will attend to the
> thread.
>
> >
> Regarding v13-0004, it seems a good UI that's why I wrote a comment about it.
> However, it comes with a restriction that requires a similar HBA rule for both
> regular and replication connections. Is it an acceptable restriction? We might
> paint ourselves into the corner. A reasonable proposal is not to remove this
> option. Instead, it should be optional. If it is not provided, primary_conninfo
> is used.
> >
>
> I didn't have such a point of view. However, it is not related whether -P exists
> or not. Even v14-0001 requires primary to accept both normal/replication connections.
> If we want to avoid it, the connection from pg_createsubscriber can be restored
> to replication-connection.
> (I felt we do not have to use replication protocol even if we change the connection mode)
>
> The motivation why -P is not needed is to ensure the consistency of nodes.
> pg_createsubscriber assumes that the -P option can connect to the upstream node,
> but no one checks it. Parsing two connection strings may be a solution but be
> confusing. E.g., what if some options are different?
> I think using a same parameter is a simplest solution.
>
> And below part contains my comments for v14.
>
> 01.
> ```
> char            temp_replslot[NAMEDATALEN] = {0};
> ```
>
> I found that no one refers the name of temporary slot. Can we remove the variable?
>
> 02.
> ```
>         CreateSubscriberOptions opt;
> ...
>         memset(&opt, 0, sizeof(CreateSubscriberOptions));
>
>         /* Default settings */
>         opt.subscriber_dir = NULL;
>         opt.pub_conninfo_str = NULL;
>         opt.sub_conninfo_str = NULL;
>         opt.database_names = (SimpleStringList)
>         {
>                 NULL, NULL
>         };
>         opt.retain = false;
>         opt.recovery_timeout = 0;
> ```
>
> Initialization by `CreateSubscriberOptions opt = {0};` seems enough.
> All values are set to 0x0.
>
> 03.
> ```
> /*
>  * Is the standby server ready for logical replication?
>  */
> static bool
> check_subscriber(LogicalRepInfo *dbinfo)
> ```
>
> You said "target server must be a standby" in [1], but I cannot find checks for it.
> IIUC, there are two approaches:
>  a) check the existence "standby.signal" in the data directory
>  b) call an SQL function "pg_is_in_recovery"
>
> 04.
> ```
> static char *pg_ctl_path = NULL;
> static char *pg_resetwal_path = NULL;
> ```
>
> I still think they can be combined as "bindir".
>
> 05.
>
> ```
>         /*
>          * Write recovery parameters.
> ...
>                 WriteRecoveryConfig(conn, opt.subscriber_dir, recoveryconfcontents);
> ```
>
> WriteRecoveryConfig() writes GUC parameters to postgresql.auto.conf, but not
> sure it is good. These settings would remain on new subscriber even after the
> pg_createsubscriber. Can we avoid it? I come up with passing these parameters
> via pg_ctl -o option, but it requires parsing output from GenerateRecoveryConfig()
> (all GUCs must be allign like "-c XXX -c XXX -c XXX...").
>
> 06.
> ```
> static LogicalRepInfo *store_pub_sub_info(SimpleStringList dbnames, const char *pub_base_conninfo, const char
*sub_base_conninfo);
> ...
> static void modify_subscriber_sysid(const char *pg_resetwal_path, CreateSubscriberOptions opt);
> ...
> static void wait_for_end_recovery(const char *conninfo, CreateSubscriberOptions opt);
> ```
>
> Functions arguments should not be struct because they are passing by value.
> They should be a pointer. Or, for modify_subscriber_sysid and wait_for_end_recovery,
> we can pass a value which would be really used.
>
> 07.
> ```
> static char *get_base_conninfo(char *conninfo, char *dbname,
>                                                            const char *noderole);
> ```
>
> Not sure noderole should be passed here. It is used only for the logging.
> Can we output string before calling the function?
> (The parameter is not needed anymore if -P is removed)
>
> 08.
> The terminology is still not consistent. Some functions call the target as standby,
> but others call it as subscriber.
>
> 09.
> v14 does not work if the standby server has already been set recovery_target*
> options. PSA the reproducer. I considered two approaches:
>
>  a) raise an ERROR when these parameter were set. check_subscriber() can do it
>  b) overwrite these GUCs as empty strings.
>
> 10.
> The execution always fails if users execute --dry-run just before. Because
> pg_createsubscriber stops the standby anyway. Doing dry run first is quite normal
> use-case, so current implementation seems not user-friendly. How should we fix?
> Below bullets are my idea:
>
>  a) avoid stopping the standby in case of dry_run: seems possible.
>  b) accept even if the standby is stopped: seems possible.
>  c) start the standby at the end of run: how arguments like pg_ctl -l should be specified?
>
> My top-up patches fixes some issues.
>
> v15-0001: same as v14-0001
> === experimental patches ===
> v15-0002: Use replication connections when we connects to the primary.
>           Connections to standby is not changed because the standby/subscriber
>           does not require such type of connection, in principle.
>           If we can accept connecting to subscriber with replication mode,
>           this can be simplified.
> v15-0003: Remove -P and use primary_conninfo instead. Same as v13-0004
> v15-0004: Check whether the target is really standby. This is done by pg_is_in_recovery()
> v15-0005: Avoid stopping/starting standby server in dry_run mode.
>           I.e., approach a). in #10 is used.
> v15-0006: Overwrite recovery parameters. I.e., aproach b). in #9 is used.
>
> [1]: https://www.postgresql.org/message-id/b315c7da-7ab1-4014-a2a9-8ab6ae26017c%40app.fastmail.com


While reviewing the v15 patches I discovered that subscription
connection string has added a lot of options which are not required
now:
v15-0001
postgres=# select subname, subconninfo from pg_subscription;
            subname            |               subconninfo
-------------------------------+------------------------------------------
pg_createsubscriber_5_1867633 | host=localhost port=5432 dbname=postgres
(1 row)



v15-0001+0002+0003
postgres=# select subname, subconninfo from pg_subscription;
            subname            |

      subconninfo



-------------------------------+------------------------------------------------------------------------
--------------------------------------------------------------------------------------------------------
--------------------------------------------------------------------------------------------------------
------------------------------
pg_createsubscriber_5_1895366 | user=shubham
passfile='/home/shubham/.pgpass' channel_binding=prefer ho
st=127.0.0.1 port=5432 sslmode=prefer sslcompression=0
sslcertmode=allow sslsni=1 ssl_min_protocol_versi
on=TLSv1.2 gssencmode=disable krbsrvname=postgres gssdelegation=0
target_session_attrs=any load_balance_
hosts=disable dbname=postgres
(1 row)


Here, we can see that channel_binding, sslmode, sslcertmode, sslsni,
gssencmode, krbsrvname, etc are getting included. This does not look
intentional, we should keep the subscription connection same as in
v15-0001.

Thanks and Regards,
Shubham Khanna.


Thanks and Regards,
Shubham Khanna.



pgsql-hackers by date:

Previous
From: Peter Smith
Date:
Subject: Re: Synchronizing slots from primary to standby
Next
From: Amit Kapila
Date:
Subject: Re: Why is subscription/t/031_column_list.pl failing so much?