Re: speed up a logical replica setup - Mailing list pgsql-hackers
From | Euler Taveira |
---|---|
Subject | Re: speed up a logical replica setup |
Date | |
Msg-id | e390e35e-508e-4eb8-92e4-e6b066407a41@app.fastmail.com Whole thread Raw |
In response to | RE: speed up a logical replica setup ("Hayato Kuroda (Fujitsu)" <kuroda.hayato@fujitsu.com>) |
Responses |
Re: speed up a logical replica setup
RE: speed up a logical replica setup Re: speed up a logical replica setup |
List | pgsql-hackers |
On Thu, Feb 22, 2024, at 12:45 PM, Hayato Kuroda (Fujitsu) wrote:
Based on idea from Euler, I roughly implemented. Thought?0001-0013 were not changed from the previous version.V24-0014: addressed your comment in the replied e-mail.V24-0015: Add disconnect_database() again, per [3]V24-0016: addressed your comment in [4].V24-0017: addressed your comment in [5].V24-0018: addressed your comment in [6].
Thanks for your review. I'm attaching v25 that hopefully addresses all pending
points.
Regarding your comments [1] on v21, I included changes for almost all items
except 2, 20, 23, 24, and 25. It still think item 2 is not required because
pg_ctl will provide a suitable message. I decided not to rearrange the block of
SQL commands (item 20) mainly because it would avoid these objects on node_f.
Do we really need command_checks_all? Depending on the output it uses
additional cycles than command_ok.
In summary:
v24-0002: documentation is updated. I didn't apply this patch as-is. Instead, I
checked what you wrote and fix some gaps in what I've been written.
v24-0003: as I said I don't think we need to add it, however, I won't fight
against it if people want to add this check.
v24-0004: I spent some time on it. This patch is not completed. I cleaned it up
and include the start_standby_server code. It starts the server using the
specified socket directory, port and username, hence, preventing external client
connections during the execution.
v24-0005: partially applied
v24-0006: applied with cosmetic change
v24-0007: applied with cosmetic change
v24-0008: applied
v24-0009: applied with cosmetic change
v24-0010: not applied. Base on #15, I refactored this code a bit. pg_fatal is
not used when there is a database connection open. Instead, pg_log_error()
followed by disconnect_database(). In cases that it should exit immediately,
disconnect_database() has a new parameter (exit_on_error) that controls if it
needs to call exit(1). I go ahead and did the same for connect_database().
v24-0011: partially applied. I included some of the suggestions (18, 19, and 21).
v24-0012: not applied. Under reflection, after working on v24-0004, the target
server will start with new parameters (that only accepts local connections),
hence, during the execution is not possible anymore to detect if the target
server is a primary for another server. I added a sentence for it in the
documentation (Warning section).
v24-0013: good catch. Applied.
v24-0014: partially applied. After some experiments I decided to use a small
number of attempts. The current code didn't reset the counter if the connection
is reestablished. I included the documentation suggestion. I didn't include the
IF EXISTS in the DROP PUBLICATION because it doesn't solve the issue. Instead,
I refactored the drop_publication() to not try again if the DROP PUBLICATION
failed.
v24-0015: not applied. I refactored the exit code to do the right thing: print
error message, disconnect database (if applicable) and exit.
v24-0016: not applied. But checked if the information was presented in the
documentation; it is.
v24-0017: good catch. Applied.
v24-0018: not applied.
I removed almost all boolean return and include the error logic inside the
function. It reads better. I also changed the connect|disconnect_database
functions to include the error logic inside it. There is a new parameter
on_error_exit for it. I removed the action parameter from pg_ctl_status() -- I
think someone suggested it -- and the error message was moved to outside the
function. I improved the cleanup routine. It provides useful information if it
cannot remove the object (publication or replication slot) from the primary.
Since I applied v24-0004, I realized that extra start / stop service are
required. It mean pg_createsubscriber doesn't start the transformation with the
current standby settings. Instead, it stops the standby if it is running and
start it with the provided command-line options (socket, port,
listen_addresses). It has a few drawbacks:
* See v34-0012. It cannot detect if the target server is a primary for another
server. It is documented.
* I also removed the check for standby is running. If the standby was stopped a
long time ago, it will take some time to reach the start point.
* Dry run mode has to start / stop the service to work correctly. Is it an
issue?
However, I decided to include --retain option, I'm thinking about to remove it.
If the logging is enabled, the information during the pg_createsubscriber will
be available. The client log can be redirected to a file for future inspection.
Comments?
Attachment
pgsql-hackers by date: