Re: speed up a logical replica setup - Mailing list pgsql-hackers
From | Shubham Khanna |
---|---|
Subject | Re: speed up a logical replica setup |
Date | |
Msg-id | CAHv8RjKst9j8GU7ZNNUcrOKsSc-fgO-yHRaJ1ze=uzJi-NrKUA@mail.gmail.com Whole thread Raw |
In response to | Re: speed up a logical replica setup ("Euler Taveira" <euler@eulerto.com>) |
Responses |
Re: speed up a logical replica setup
Re: speed up a logical replica setup |
List | pgsql-hackers |
On Sat, Mar 2, 2024 at 2:19 AM Euler Taveira <euler@eulerto.com> wrote: > > On Thu, Feb 22, 2024, at 12:45 PM, Hayato Kuroda (Fujitsu) wrote: > > Based on idea from Euler, I roughly implemented. Thought? > > 0001-0013 were not changed from the previous version. > > V24-0014: addressed your comment in the replied e-mail. > V24-0015: Add disconnect_database() again, per [3] > V24-0016: addressed your comment in [4]. > V24-0017: addressed your comment in [5]. > V24-0018: addressed your comment in [6]. > > > Thanks for your review. I'm attaching v25 that hopefully addresses all pending > points. > > Regarding your comments [1] on v21, I included changes for almost all items > except 2, 20, 23, 24, and 25. It still think item 2 is not required because > pg_ctl will provide a suitable message. I decided not to rearrange the block of > SQL commands (item 20) mainly because it would avoid these objects on node_f. > Do we really need command_checks_all? Depending on the output it uses > additional cycles than command_ok. > > In summary: > > v24-0002: documentation is updated. I didn't apply this patch as-is. Instead, I > checked what you wrote and fix some gaps in what I've been written. > v24-0003: as I said I don't think we need to add it, however, I won't fight > against it if people want to add this check. > v24-0004: I spent some time on it. This patch is not completed. I cleaned it up > and include the start_standby_server code. It starts the server using the > specified socket directory, port and username, hence, preventing external client > connections during the execution. > v24-0005: partially applied > v24-0006: applied with cosmetic change > v24-0007: applied with cosmetic change > v24-0008: applied > v24-0009: applied with cosmetic change > v24-0010: not applied. Base on #15, I refactored this code a bit. pg_fatal is > not used when there is a database connection open. Instead, pg_log_error() > followed by disconnect_database(). In cases that it should exit immediately, > disconnect_database() has a new parameter (exit_on_error) that controls if it > needs to call exit(1). I go ahead and did the same for connect_database(). > v24-0011: partially applied. I included some of the suggestions (18, 19, and 21). > v24-0012: not applied. Under reflection, after working on v24-0004, the target > server will start with new parameters (that only accepts local connections), > hence, during the execution is not possible anymore to detect if the target > server is a primary for another server. I added a sentence for it in the > documentation (Warning section). > v24-0013: good catch. Applied. > v24-0014: partially applied. After some experiments I decided to use a small > number of attempts. The current code didn't reset the counter if the connection > is reestablished. I included the documentation suggestion. I didn't include the > IF EXISTS in the DROP PUBLICATION because it doesn't solve the issue. Instead, > I refactored the drop_publication() to not try again if the DROP PUBLICATION > failed. > v24-0015: not applied. I refactored the exit code to do the right thing: print > error message, disconnect database (if applicable) and exit. > v24-0016: not applied. But checked if the information was presented in the > documentation; it is. > v24-0017: good catch. Applied. > v24-0018: not applied. > > I removed almost all boolean return and include the error logic inside the > function. It reads better. I also changed the connect|disconnect_database > functions to include the error logic inside it. There is a new parameter > on_error_exit for it. I removed the action parameter from pg_ctl_status() -- I > think someone suggested it -- and the error message was moved to outside the > function. I improved the cleanup routine. It provides useful information if it > cannot remove the object (publication or replication slot) from the primary. > > Since I applied v24-0004, I realized that extra start / stop service are > required. It mean pg_createsubscriber doesn't start the transformation with the > current standby settings. Instead, it stops the standby if it is running and > start it with the provided command-line options (socket, port, > listen_addresses). It has a few drawbacks: > * See v34-0012. It cannot detect if the target server is a primary for another > server. It is documented. > * I also removed the check for standby is running. If the standby was stopped a > long time ago, it will take some time to reach the start point. > * Dry run mode has to start / stop the service to work correctly. Is it an > issue? > > However, I decided to include --retain option, I'm thinking about to remove it. > If the logging is enabled, the information during the pg_createsubscriber will > be available. The client log can be redirected to a file for future inspection. > > Comments? I applied the v25 patch and did RUN the 'pg_createsubscriber' command. I was unable to execute it and experienced the following error: $ ./pg_createsubscriber -D node2/ -P "host=localhost port=5432 dbname=postgres" -d postgres -d db1 -p 9000 -r ./pg_createsubscriber: invalid option -- 'p' pg_createsubscriber: hint: Try "pg_createsubscriber --help" for more information. Here, the --p is not accepting the desired port number. Thoughts? Thanks and Regards, Shubham Khanna.
pgsql-hackers by date: