Thread: Enhance pg_createsubscriber to create required standby.
Hi All, Currently, pg_createsubscriber is designed to convert an existing physical replica into a logical subscriber. To use it, the user must manually set up a standby node beforehand, ensure that physical replication is active, and only then run pg_createsubscriber to perform the switchover to logical replication. To simplify this workflow, I propose enhancing the pg_createsubscriber utility to handle the creation of the standby node as part of the pg_createsubscriber itself. This idea was also suggested earlier by Andres Freund in [1]. To support this, I have introduced a new option: --create-standby This allows the user to specify the parameters needed for pg_basebackup to automatically create the standby node, establish physical replication, and then proceed with the existing steps to convert it into a logical subscriber. Following is the output that creates the standby node: ./pg_createsubscriber -D standby/ -P "host=localhost port=5432 dbname=postgres" --create-standby --verbose pg_createsubscriber: creating the standby server: "/home/shubham/Project/Git/postgres/inst/bin/pg_basebackup" -d'host=localhost port=5432 dbname=postgres' -D standby -P -X stream -R 22881/22881 kB (100%), 1/1 tablespace pg_createsubscriber: set standby port in standby/postgresql.auto.conf: port = 50432 pg_createsubscriber: starting the standby server: "/home/shubham/Project/Git/postgres/inst/bin/pg_ctl" -D "standby" -l "standby/standby.log" -o "-p 50432" start waiting for server to start.... done server started pg_createsubscriber: configuring the standby server: "/home/shubham/Project/Git/postgres/inst/bin/psql" -d postgres -p 50432 -c "ALTER SYSTEM SET wal_level = 'logical';" ALTER SYSTEM pg_createsubscriber: configuring the standby server: "/home/shubham/Project/Git/postgres/inst/bin/psql" -d postgres -p 50432 -c "ALTER SYSTEM SET listen_addresses = '*';" ALTER SYSTEM pg_createsubscriber: stopping the standby server: "/home/shubham/Project/Git/postgres/inst/bin/pg_ctl" -D "standby" -l "standby/standby.log" -o "-p 50432" stop waiting for server to shut down.... done server stopped pg_createsubscriber: validating publisher connection string Currently, pg_basebackup is executed with the following default parameters: pg_basebackup <primary_node_connection_info> -D <sub_dir> -P -U replication -X stream -R If users wish to customize these parameters, a new option pg_basebackup_parameters can be supported in the future. Similarly, the standby node is created using the default configuration, except for the port and wal_level settings. To customize other aspects of the standby setup, a new option standby_config can be supported in the future if required. [1] - https://www.postgresql.org/message-id/20220221232849.x6s24ete4eyg6jol%40alap3.anarazel.de Thanks and regards, Shubham Khanna.
Attachment
On Wed, Jun 4, 2025 at 2:39 PM Peter Eisentraut <peter@eisentraut.org> wrote: > > On 04.06.25 05:56, Shubham Khanna wrote: > > Currently, pg_createsubscriber is designed to convert an existing > > physical replica into a logical subscriber. To use it, the user must > > manually set up a standby node beforehand, ensure that physical > > replication is active, and only then run pg_createsubscriber to > > perform the switchover to logical replication. > > To simplify this workflow, I propose enhancing the pg_createsubscriber > > utility to handle the creation of the standby node as part of the > > pg_createsubscriber itself. > > Yes, this was contemplated when pg_createsubscriber was first proposed. > We opted against it, mainly for simplicity initially. > > It's not clear to me how this change now would substantially improve the > user experience. The number of characters you type is approximately the > same. You still need to support the old mode because the backup might > not come from pg_basebackup. In the current functionality, the user must first manually create a standby or use an existing standby to make it a subscriber. I thought saving this step for users would be quite helpful. It also helps streamline the process into a single, cohesive workflow. > > And then you'd have the maintenance > overhead that every new feature in pg_basebackup would potentially have > to be passed through or somehow be integrated into pg_createsubscriber. > I am not so sure about this because we use other utilities like pg_ctl in this tool, so same argument could be build for it. -- With Regards, Amit Kapila.
On 04.06.25 11:56, Amit Kapila wrote: >> It's not clear to me how this change now would substantially improve the >> user experience. The number of characters you type is approximately the >> same. You still need to support the old mode because the backup might >> not come from pg_basebackup. > > In the current functionality, the user must first manually create a > standby or use an existing standby to make it a subscriber. I thought > saving this step for users would be quite helpful. It also helps > streamline the process into a single, cohesive workflow. Unless I'm missing something, doesn't this merely replace pg_basebackup && pg_createsubscriber with pg_createsubscriber --create-standby I mean, as I'm typing this out, this is literally the same number of characters. Is the second one easier somehow? It's not clear. >> And then you'd have the maintenance >> overhead that every new feature in pg_basebackup would potentially have >> to be passed through or somehow be integrated into pg_createsubscriber. > > I am not so sure about this because we use other utilities like pg_ctl > in this tool, so same argument could be build for it. Yes, and we tried really hard to avoid the dependency on pg_ctl, but it was too hard. (I would entertain patches to get rid of it.)
On Thu, 5 Jun 2025 at 01:50, Peter Eisentraut <peter@eisentraut.org> wrote: > > On 04.06.25 11:56, Amit Kapila wrote: > >> It's not clear to me how this change now would substantially improve the > >> user experience. The number of characters you type is approximately the > >> same. You still need to support the old mode because the backup might > >> not come from pg_basebackup. > > > > In the current functionality, the user must first manually create a > > standby or use an existing standby to make it a subscriber. I thought > > saving this step for users would be quite helpful. It also helps > > streamline the process into a single, cohesive workflow. > > Unless I'm missing something, doesn't this merely replace > > pg_basebackup && pg_createsubscriber > > with > > pg_createsubscriber --create-standby > > I mean, as I'm typing this out, this is literally the same number of > characters. Is the second one easier somehow? It's not clear. I believe adding support for standby creation directly within pg_createsubscriber offers several key advantages over the manual two-step approach of running pg_basebackup followed by pg_createsubscriber. The main benefits include: 1) Improved Progress Reporting: pg_createsubscriber can be enhanced to provide a clear, step-by-step progress indicator that shows both the total number of steps and the current step being executed. For example: [Step 1/5] Executing pg_basebackup... [Step 2/5] Validating prerequisites... [Step 3/5] Creating publication and replication slot... [Step 4/5] Waiting for standby to reach consistent state... [Step 5/5] Creating subscription... 2) Graceful Handling of Interruptions: pg_createsubscriber can be enhanced to handle user interruptions (e.g., Ctrl+C) more gracefully by cleaning up any partially created directories, such as the base backup target directory. This can be achieved by registering an atexit handler for safe cleanup during early termination. 3) Cleanup on Partial Failure: If pg_basebackup completes but a subsequent step (e.g., subscription or replication slot creation) fails, pg_createsubscriber could offer the option to automatically clean up the standby contents. In contrast, with the manual pg_basebackup && pg_createsubscriber approach, failures after the backup step leave residual data on disk, as there’s no built-in cleanup mechanism. 4) Simplified Option Handling: The --create-standby option streamlines the workflow by removing the need to manually pass options like -D and -R. With this option, the user can simply run: pg_createsubscriber -D <target_dir> -P <primary_connstr> -d <target_database> --create-standby In the combined commands approach: #primary is on the different server pg_basebackup -D <target_dir> -d <primary_connstr> -R && pg_createsubscriber -d <target_database> -D <target_dir> -P <primary_connstr> #primary is on the same server pg_basebackup -D <target_dir> -R && pg_createsubscriber -d <target_database> -D <target_dir> -P <primary_connstr> The integrated --create-standby option reduces the risk of user error, misconfiguration, and is shorter to write for users. Based on the above points I felt the create-standby option through pg_createsubscriber will be helpful to users. Your thoughts? Regards, Vignesh
On Thu, Jun 5, 2025 at 1:50 AM Peter Eisentraut <peter@eisentraut.org> wrote: > > On 04.06.25 11:56, Amit Kapila wrote: > >> It's not clear to me how this change now would substantially improve the > >> user experience. The number of characters you type is approximately the > >> same. You still need to support the old mode because the backup might > >> not come from pg_basebackup. > > > > In the current functionality, the user must first manually create a > > standby or use an existing standby to make it a subscriber. I thought > > saving this step for users would be quite helpful. It also helps > > streamline the process into a single, cohesive workflow. > > Unless I'm missing something, doesn't this merely replace > > pg_basebackup && pg_createsubscriber > > with > > pg_createsubscriber --create-standby > > I mean, as I'm typing this out, this is literally the same number of > characters. Is the second one easier somehow? It's not clear. > As shown in Vignesh's email [1] (point 4), there could be multiple additional parameters required for the first option suggested by you, which will make it longer. Additionally, there are some other benefits of having the second option (pg_createsubscriber --create-standby), like better cleanup of contents during failures and better progress reporting. Are you still against adding such an option? [1]: https://www.postgresql.org/message-id/CALDaNm1biZBMOzFMfHYzqrAeosJSD5YRG%3D82-pp6%2BJhALsfe6w%40mail.gmail.com -- With Regards, Amit Kapila.
On Tue, Jun 17, 2025 at 9:22 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
As shown in Vignesh's email [1] (point 4), there could be multiple
additional parameters required for the first option suggested by you,
which will make it longer. Additionally, there are some other benefits
of having the second option (pg_createsubscriber --create-standby),
like better cleanup of contents during failures and better progress
reporting. Are you still against adding such an option?
[1]: https://www.postgresql.org/message-id/CALDaNm1biZBMOzFMfHYzqrAeosJSD5YRG%3D82-pp6%2BJhALsfe6w%40mail.gmail.com
None of those benefits convince me that "let's write a shell script in C and put it under an annual feature release policy" is the way to go here.
Let's make something like this available to the community, say on the Wiki, and make it work in PostgreSQL 18 so they can benefit from it today, and readily use it as a basis to tweak things for their own unique circumstances.
David J.