RE: speed up a logical replica setup - Mailing list pgsql-hackers

From Hayato Kuroda (Fujitsu)
Subject RE: speed up a logical replica setup
Date
Msg-id TYCPR01MB120776F5C9824042DD8BB1307F54F2@TYCPR01MB12077.jpnprd01.prod.outlook.com
Whole thread Raw
In response to RE: speed up a logical replica setup  ("Hayato Kuroda (Fujitsu)" <kuroda.hayato@fujitsu.com>)
List pgsql-hackers
Dear hackers,

I've replied for trackability.

> Further comments for v17.
> 
> 01.
> This program assumes that the target server has same major version with this.
> Because the target server would be restarted by same version's pg_ctl command.
> I felt it should be ensured by reading the PG_VERSION.

Still investigating.

> 02.
> pg_upgrade checked the version of using executables, like pg_ctl, postgres, and
> pg_resetwal. I felt it should be as well.

Still investigating.

> 03. get_bin_directory
> ```
>     if (find_my_exec(path, full_path) < 0)
>     {
>         pg_log_error("The program \"%s\" is needed by %s but was not
> found in the\n"
>                      "same directory as \"%s\".\n",
>                      "pg_ctl", progname, full_path);
> ```
> 
> s/"pg_ctl"/progname

The message was updated.

> 04.
> Missing canonicalize_path()?

I found find_my_exec() calls canonicalize_path(). No need to do.

> 05.
> Assuming that the target server is a cascade standby, i.e., it has a role as
> another primary. In this case, I thought the child node would not work. Because
> pg_createsubcriber runs pg_resetwal and all WAL files would be discarded at that
> time. I have not tested, but should the program detect it and exit earlier?

Still investigating.

> 06.
> wait_for_end_recovery() waits forever even if the standby has been disconnected
> from the primary, right? should we check the status of the replication via
> pg_stat_wal_receiver?

Still investigating.

> 07.
> The cleanup function has couple of bugs.
> 
> * If subscriptions have been created on the database, the function also tries to
>   drop a publication. But it leads an ERROR because it has been already dropped.
>   See setup_subscriber().
> * If the subscription has been created, drop_replication_slot() leads an ERROR.
>   Because the subscriber tried to drop the subscription while executing DROP
> SUBSCRIPTION.

Only drop_publication() was removed.

> 08.
> I found that all messages (ERROR, WARNING, INFO, etc...) would output to stderr,
> but I felt it should be on stdout. Is there a reason? pg_dump outputs messages to
> stderr, but the motivation might be to avoid confusion with dumps.

Still investigating.

> 09.
> I'm not sure the cleanup for subscriber is really needed. Assuming that there
> are two databases, e.g., pg1 pg2 , and we fail to create a subscription on pg2.
> This can happen when the subscription which has the same name has been
> already
> created on the primary server.
> In this case a subscirption pn pg1 would be removed. But what is a next step?
> Since a timelineID on the standby server is larger than the primary (note that
> the standby has been promoted once), we cannot resume the physical replication
> as-is. IIUC the easiest method to retry is removing a cluster once and restarting
> from pg_basebackup. If so, no need to cleanup the standby because it is
> corrupted.
> We just say "Please remove the cluster and recreate again".

I still think it should be, but not done yet.

New patch can be available in [1].

[1]:
https://www.postgresql.org/message-id/TYCPR01MB12077A6BB424A025F04A8243DF54F2%40TYCPR01MB12077.jpnprd01.prod.outlook.com

Best Regards,
Hayato Kuroda
FUJITSU LIMITED
https://www.fujitsu.com/ 


pgsql-hackers by date:

Previous
From: "Hayato Kuroda (Fujitsu)"
Date:
Subject: RE: speed up a logical replica setup
Next
From: "Hayato Kuroda (Fujitsu)"
Date:
Subject: RE: speed up a logical replica setup