Thread: RE: Adding a '--clean-publisher-objects' option to 'pg_createsubscriber' utility.
RE: Adding a '--clean-publisher-objects' option to 'pg_createsubscriber' utility.
From
"Hayato Kuroda (Fujitsu)"
Date:
Dear Shubham, > I propose adding the --clean-publisher-objects option to the > pg_createsubscriber utility. As discussed in [1], this feature ensures > a clean and streamlined setup of logical replication by removing stale > or unnecessary publications from the subscriber node. These > publications, replicated during streaming replication, become > redundant after converting to logical replication and serve no further > purpose. This patch introduces the drop_all_publications() function, > which efficiently fetches and drops all publications on the subscriber > node within a single transaction. I think replication slot are also type of 'publisher-objects', but they are not removed for now: API-name may not be accurate. And... > Additionally, other related objects, such as subscriptions and > replication slots, may also require cleanup. I plan to analyze this > further and include them in subsequent patches. I'm not sure replication slots should be cleaned up. Apart from other items like publication/subscription, replication slots are not replicated when it is created on the primary instance. This means they are intentionally created by DBAs and there may not be no strong reasons to drop them after the conversion. Another question is the style of APIs. Do you plan to provide APIs like 'cleanup-subscriber-objects' and 'cleanup-publisher-objects', or just one 'cleanup-logical-replication-objects'? Regarding the patch: 1. ``` + The <application>pg_createsubscriber</application> now supports the + <option>--clean-publisher-objects</option> to remove all publications on + the subscriber node before creating a new subscription. ``` This description is not suitable for the documentation. Something like: Remove all publications on the subscriber node. 2. ``` + /* Drop publications from the subscriber if requested */ + drop_all_publications(dbinfo); ``` This should be called when `opts.clean_publisher_objects` is true. 3. You said publications are dropped within a single transaction, but the patch does not do. Which is correct? 4. ``` +# Set up node A as primary +my $node_a = PostgreSQL::Test::Cluster->new('node_a'); +my $aconnstr = $node_a->connstr; +$node_a->init(allows_streaming => 'logical'); +$node_a->append_conf('postgresql.conf', 'autovacuum = off'); +$node_a->start; + +# Set up node B as standby linking to node A +$node_a->backup('backup_3'); +my $node_b = PostgreSQL::Test::Cluster->new('node_b'); +$node_b->init_from_backup($node_a, 'backup_3', has_streaming => 1); +$node_b->append_conf( + 'postgresql.conf', qq[ + primary_conninfo = '$aconnstr' + hot_standby_feedback = on + max_logical_replication_workers = 5 + ]); +$node_b->set_standby_mode(); +$node_b->start; ``` I don't think new nodes are needed in the test. Can you reuse node_p/node_s for the purpose? ---------- Best regards, Haato Kuroda
Re: Adding a '--clean-publisher-objects' option to 'pg_createsubscriber' utility.
From
Shlok Kyal
Date:
On Wed, 29 Jan 2025 at 15:14, Shubham Khanna <khannashubham1197@gmail.com> wrote: > > On Wed, Jan 29, 2025 at 10:42 AM Hayato Kuroda (Fujitsu) > <kuroda.hayato@fujitsu.com> wrote: > > > > Dear Shubham, > > > > > I propose adding the --clean-publisher-objects option to the > > > pg_createsubscriber utility. As discussed in [1], this feature ensures > > > a clean and streamlined setup of logical replication by removing stale > > > or unnecessary publications from the subscriber node. These > > > publications, replicated during streaming replication, become > > > redundant after converting to logical replication and serve no further > > > purpose. This patch introduces the drop_all_publications() function, > > > which efficiently fetches and drops all publications on the subscriber > > > node within a single transaction. > > > > I think replication slot are also type of 'publisher-objects', but they are not > > removed for now: API-name may not be accurate. And... > > > > > Additionally, other related objects, such as subscriptions and > > > replication slots, may also require cleanup. I plan to analyze this > > > further and include them in subsequent patches. > > > > I'm not sure replication slots should be cleaned up. Apart from other items like > > publication/subscription, replication slots are not replicated when it is created > > on the primary instance. This means they are intentionally created by DBAs and there > > may not be no strong reasons to drop them after the conversion. > > > > Another question is the style of APIs. Do you plan to provide APIs like > > 'cleanup-subscriber-objects' and 'cleanup-publisher-objects', or just one > > 'cleanup-logical-replication-objects'? > > > > Thanks for the suggestions, I will keep them in mind while preparing > the 0002 patch for the same. > Currently, I have changed the API to '--cleanup-publisher-objects'. > > > Regarding the patch: > > > > 1. > > ``` > > + The <application>pg_createsubscriber</application> now supports the > > + <option>--clean-publisher-objects</option> to remove all publications on > > + the subscriber node before creating a new subscription. > > ``` > > > > This description is not suitable for the documentation. Something like: > > > > Remove all publications on the subscriber node. > > > > 2. > > ``` > > + /* Drop publications from the subscriber if requested */ > > + drop_all_publications(dbinfo); > > ``` > > > > This should be called when `opts.clean_publisher_objects` is true. > > > > 3. > > You said publications are dropped within a single transaction, but the > > patch does not do. Which is correct? > > > > 4. > > ``` > > +# Set up node A as primary > > +my $node_a = PostgreSQL::Test::Cluster->new('node_a'); > > +my $aconnstr = $node_a->connstr; > > +$node_a->init(allows_streaming => 'logical'); > > +$node_a->append_conf('postgresql.conf', 'autovacuum = off'); > > +$node_a->start; > > + > > +# Set up node B as standby linking to node A > > +$node_a->backup('backup_3'); > > +my $node_b = PostgreSQL::Test::Cluster->new('node_b'); > > +$node_b->init_from_backup($node_a, 'backup_3', has_streaming => 1); > > +$node_b->append_conf( > > + 'postgresql.conf', qq[ > > + primary_conninfo = '$aconnstr' > > + hot_standby_feedback = on > > + max_logical_replication_workers = 5 > > + ]); > > +$node_b->set_standby_mode(); > > +$node_b->start; > > ``` > > > > Fixed the given comments. The attached patch contains the suggested changes. > Hi Shubham, I have reviewed the v2 patch. Here are my comments: 1. Initial of the comment should in smallcase to make it similar to other comments: + bool cleanup_publisher_objects; /* Drop all publications */ 2. We should provide a comment for the function. +static void +drop_all_publications(const struct LogicalRepInfo *dbinfo) +{ 3. We should declare it as "const char*" + char *search_query = "SELECT pubname FROM pg_catalog.pg_publication;"; + 4. Instead of warning we should throw an error here: + if (PQresultStatus(res) != PGRES_TUPLES_OK) + { + pg_log_warning("could not obtain publication information: %s", + PQresultErrorMessage(res)); + 5. Should we throw an error in this case as well? + if (PQresultStatus(res_for_drop) != PGRES_COMMAND_OK) + { + pg_log_warning("could not drop publication \"%s\": %s", + pubname, PQresultErrorMessage(res)); + } 6. We should start the standby server before creating the publications, so that the publications are replicated to standby. +# Create publications to test it's removal +$node_p->safe_psql($db1, "CREATE PUBLICATION test_pub FOR ALL TABLES;"); +$node_p->safe_psql($db1, "CREATE PUBLICATION test_pub2 FOR ALL TABLES;"); + +# Verify the existing publications +my $pub_count_before = + $node_p->safe_psql($db1, "SELECT COUNT(*) FROM pg_publication;"); Also maybe we should check the publication count on the standby node instead of primary. Thanks and Regards, Shlok Kyal
Re: Adding a '--clean-publisher-objects' option to 'pg_createsubscriber' utility.
From
Shlok Kyal
Date:
On Wed, 5 Feb 2025 at 07:49, Hayato Kuroda (Fujitsu) <kuroda.hayato@fujitsu.com> wrote: > > Dear Shlok, > > > 4. Instead of warning we should throw an error here: > > + if (PQresultStatus(res) != PGRES_TUPLES_OK) > > + { > > + pg_log_warning("could not obtain publication information: %s", > > + PQresultErrorMessage(res)); > > + > > I don't think so. ERROR evokes user to retry the command or recreate the physical > replica, but the conversion has already been finished when drop_all_publications() > is called. Cleanup operations should not affect the final result. > drop_primary_replication_slot() and drop_failover_replication_slots() raise WARNING > when they fail to drop objects because they are just cleanup functions. > I feel we can follow this manner. > Hi Kuroda-san, I agree with you. Raising WARNING makes sense to me. Thanks and Regards, Shlok Kyal
RE: Adding a '--clean-publisher-objects' option to 'pg_createsubscriber' utility.
From
"Hayato Kuroda (Fujitsu)"
Date:
Dear Shubham, Thanks for updating the patch. Previously you told that you had a plan to extend the patch to drop other replication objects [1], but I think it is not needed. pg_createsubscriber has already been able to drop the existing subscrisubscriptions in check_and_drop_existing_subscriptions(). As for the replication slot, I have told in [2], it would be created intentionally thus I feel it should not be dropped. Thus I regard the patch does not have concrete extending plan. Below part contains my review comment. 01. Option name Based on the above discussion, "--cleanup-publisher-objects" is not suitable because it won't drop replication slots. How about "--cleanup-publications"? 02. usage() ``` + printf(_(" -C --cleanup-publisher-objects drop all publications on the logical replica\n")); ``` s/logical replica/subscriber 03. drop_all_publications ``` +/* Drops all existing logical replication publications from all subscriber + * databases + */ +static void ``` Initial line of the comment must be blank [3]. 04. main ``` + {"cleanup-publisher-objects", no_argument, NULL, 'C'}, ``` Is there a reason why upper case is used? I feel lower one is enough. 05. main ``` + /* Drop publications from the subscriber if requested */ + if (opt.cleanup_publisher_objects) + drop_all_publications(dbinfo); ``` After considering more, I noticed that we have already called drop_publication() in the setup_subscriber(). Can we call drop_all_publications() there instead when -C is specified? 06. 040_pg_createsubscriber.pl ``` +$node_s->start; +# Create publications to test it's removal +$node_p->safe_psql($db1, "CREATE PUBLICATION test_pub FOR ALL TABLES;"); +$node_p->safe_psql($db1, "CREATE PUBLICATION test_pub2 FOR ALL TABLES;"); + +# Verify the existing publications +my $pub_count_before = + $node_s->safe_psql($db1, "SELECT COUNT(*) FROM pg_publication;"); +is($pub_count_before, '2', + 'two publications created before --cleanup-publisher-objects is run'); + +$node_s->stop; ``` I feel it requires unnecessary startup and shutdown. IIUC, creating publications and check counts can be before stopping the node_s, around line 331. 07. 040_pg_createsubscriber.pl ``` +$node_p->safe_psql($db1, "CREATE PUBLICATION test_pub FOR ALL TABLES;"); +$node_p->safe_psql($db1, "CREATE PUBLICATION test_pub2 FOR ALL TABLES;"); + +# Verify the existing publications +my $pub_count_before = + $node_s->safe_psql($db1, "SELECT COUNT(*) FROM pg_publication;"); +is($pub_count_before, '2', + 'two publications created before --cleanup-publisher-objects is run'); + ``` Also, there is a possibility that CREATE PUBLICATION on node_p is not replicated yet when SELECT COUNT(*) is executed. Please wait for the replay. [1]: https://www.postgresql.org/message-id/CAHv8RjL4OvoYafofTb_U_JD5HuyoNowBoGpMfnEbhDSENA74Kg%40mail.gmail.com [2]: https://www.postgresql.org/message-id/OSCPR01MB1496664FDC38DA40A441F449FF5EE2%40OSCPR01MB14966.jpnprd01.prod.outlook.com [3]: https://www.postgresql.org/docs/devel/source-format.html Best regards, Hayato Kuroda FUJITSU LIMITED
Re: Adding a '--clean-publisher-objects' option to 'pg_createsubscriber' utility.
From
Shlok Kyal
Date:
On Tue, 11 Feb 2025 at 09:51, Shubham Khanna <khannashubham1197@gmail.com> wrote: > > On Fri, Feb 7, 2025 at 7:46 AM Hayato Kuroda (Fujitsu) > <kuroda.hayato@fujitsu.com> wrote: > > > > Dear Shubham, > > > > Thanks for updating the patch. > > > > Previously you told that you had a plan to extend the patch to drop other replication > > objects [1], but I think it is not needed. pg_createsubscriber has already been > > able to drop the existing subscrisubscriptions in check_and_drop_existing_subscriptions(). > > As for the replication slot, I have told in [2], it would be created intentionally > > thus I feel it should not be dropped. > > Thus I regard the patch does not have concrete extending plan. > > > > Below part contains my review comment. > > > > 01. Option name > > > > Based on the above discussion, "--cleanup-publisher-objects" is not suitable because > > it won't drop replication slots. How about "--cleanup-publications"? > > > > I have changed the name of the option to "--cleanup-existing-publications" > > > 02. usage() > > ``` > > + printf(_(" -C --cleanup-publisher-objects drop all publications on the logical replica\n")); > > ``` > > Fixed. > > > s/logical replica/subscriber > > > > 03. drop_all_publications > > ``` > > +/* Drops all existing logical replication publications from all subscriber > > + * databases > > + */ > > +static void > > ``` > > > > Initial line of the comment must be blank [3]. > > > > Removed this function. > > > 04. main > > ``` > > + {"cleanup-publisher-objects", no_argument, NULL, 'C'}, > > ``` > > > > Is there a reason why upper case is used? I feel lower one is enough. > > > > Fixed. > > > 05. main > > ``` > > + /* Drop publications from the subscriber if requested */ > > + if (opt.cleanup_publisher_objects) > > + drop_all_publications(dbinfo); > > ``` > > > > After considering more, I noticed that we have already called drop_publication() > > in the setup_subscriber(). Can we call drop_all_publications() there instead when > > -C is specified? > > > > I agree with you on this. I have removed the drop_all_publication() > and added the "--cleanup-existing-publications" option to the > drop_publication() > > > 06. 040_pg_createsubscriber.pl > > > > ``` > > +$node_s->start; > > +# Create publications to test it's removal > > +$node_p->safe_psql($db1, "CREATE PUBLICATION test_pub FOR ALL TABLES;"); > > +$node_p->safe_psql($db1, "CREATE PUBLICATION test_pub2 FOR ALL TABLES;"); > > + > > +# Verify the existing publications > > +my $pub_count_before = > > + $node_s->safe_psql($db1, "SELECT COUNT(*) FROM pg_publication;"); > > +is($pub_count_before, '2', > > + 'two publications created before --cleanup-publisher-objects is run'); > > + > > +$node_s->stop; > > ``` > > > > I feel it requires unnecessary startup and shutdown. IIUC, creating publications and check > > counts can be before stopping the node_s, around line 331. > > > > Fixed. > > > 07. 040_pg_createsubscriber.pl > > > > ``` > > +$node_p->safe_psql($db1, "CREATE PUBLICATION test_pub FOR ALL TABLES;"); > > +$node_p->safe_psql($db1, "CREATE PUBLICATION test_pub2 FOR ALL TABLES;"); > > + > > +# Verify the existing publications > > +my $pub_count_before = > > + $node_s->safe_psql($db1, "SELECT COUNT(*) FROM pg_publication;"); > > +is($pub_count_before, '2', > > + 'two publications created before --cleanup-publisher-objects is run'); > > + > > ``` > > > > Also, there is a possibility that CREATE PUBLICATION on node_p is not replicated yet > > when SELECT COUNT(*) is executed. Please wait for the replay. > > > > [1]: https://www.postgresql.org/message-id/CAHv8RjL4OvoYafofTb_U_JD5HuyoNowBoGpMfnEbhDSENA74Kg%40mail.gmail.com > > [2]: https://www.postgresql.org/message-id/OSCPR01MB1496664FDC38DA40A441F449FF5EE2%40OSCPR01MB14966.jpnprd01.prod.outlook.com > > [3]: https://www.postgresql.org/docs/devel/source-format.html > > > > Fixed. > > The attached Patch contains the suggested changes. > Hi Shubham, I have some comments for v4 patch: 1. I think we should update the comment for the function 'drop_publication'. As its usage is changed with this patch Currently it states: /* * Remove publication if it couldn't finish all steps. */ 2. In case when --cleanup_existing_publications is not specified the info message has two double quotes. pg_createsubscriber: dropping publication ""pg_createsubscriber_5_aa3c31f2"" in database "postgres" The code: + appendPQExpBufferStr(targets, + PQescapeIdentifier(conn, dbinfo->pubname, + strlen(dbinfo->pubname))); It is appending the value along with the double quotes. I think we should assign the value of 'PQescapeIdentifier(conn, dbinfo->pubname, strlen(dbinfo->pubname)' to a string and then use it. Thanks and Regards, Shlok Kyal
RE: Adding a '--clean-publisher-objects' option to 'pg_createsubscriber' utility.
From
"Hayato Kuroda (Fujitsu)"
Date:
Dear Shubham, Thanks for updating the patch! Few comments. 01. pg_createsubscriber.sgml ``` + <para> + Remove all existing publications on the subscriber node before creating + a subscription. + </para> + ``` I think this is not accurate. Publications on databases which are not target will retain. Also, I'm not sure it is important to clarify "before creating a subscription.". How about: Remove all existing publications from specified databases. 02. main() ``` + opt.cleanup_existing_publications = false; ``` ISTM initialization is done with the same ordering with CreateSubscriberOptions. Thus this should be at after recovery_timeout. 03. 040_pg_createsubscriber.pl ``` +$node_p->wait_for_replay_catchup($node_s); + +# Verify the existing publications +my $pub_count_before = + $node_s->safe_psql($db1, "SELECT COUNT(*) FROM pg_publication;"); +is($pub_count_before, '2', + 'two publications created before --cleanup-existing-publications is run'); ``` I think no need to declare $pub_count_before. How about: ``` ok( $node_s->poll_query_until( $db1, "SELECT COUNT(*) = 2 FROM pg_publication"), 'two publications created before --cleanup-existing-publications is run'); ``` 04. 040_pg_createsubscriber.pl ``` +my $pub_count_after = + $node_s->safe_psql($db1, "SELECT COUNT(*) FROM pg_publication;"); +is($pub_count_after, '0', + 'all publications dropped after --cleanup-existing-publications is run'); + ``` I think no need to declare $pub_count_after. How about: ``` is($node_s->safe_psql($db1, "SELECT COUNT(*) FROM pg_publication;"), '0', 'all publications dropped after --cleanup-existing-publications is run'); ``` 05. According to the document [1], we must do double-quote for each identifiers. But current patch seems not to do. Below example shows the case when three publications exist. ``` pg_createsubscriber: dropping publications "aaa, bbb, ccc" in database "postgres" ``` I think the output must be `"aaa", "bbb", "ccc"`. This means drop_publication() should be refactored. [1]: https://www.postgresql.org/docs/devel/error-style-guide.html#ERROR-STYLE-GUIDE-QUOTES Best regards, Hayato Kuroda FUJITSU LIMITED
Re: Adding a '--clean-publisher-objects' option to 'pg_createsubscriber' utility.
From
Shubham Khanna
Date:
On Wed, Feb 12, 2025 at 12:57 PM Hayato Kuroda (Fujitsu) <kuroda.hayato@fujitsu.com> wrote: > > Dear Shubham, > > Thanks for updating the patch! Few comments. > > 01. pg_createsubscriber.sgml > ``` > + <para> > + Remove all existing publications on the subscriber node before creating > + a subscription. > + </para> > + > ``` > Fixed. > I think this is not accurate. Publications on databases which are not target will > retain. Also, I'm not sure it is important to clarify "before creating a subscription.". > > How about: Remove all existing publications from specified databases. > > 02. main() > ``` > + opt.cleanup_existing_publications = false; > ``` > > ISTM initialization is done with the same ordering with CreateSubscriberOptions. > Thus this should be at after recovery_timeout. > Fixed. > 03. 040_pg_createsubscriber.pl > ``` > +$node_p->wait_for_replay_catchup($node_s); > + > +# Verify the existing publications > +my $pub_count_before = > + $node_s->safe_psql($db1, "SELECT COUNT(*) FROM pg_publication;"); > +is($pub_count_before, '2', > + 'two publications created before --cleanup-existing-publications is run'); > ``` > > I think no need to declare $pub_count_before. How about: > > ``` > ok( $node_s->poll_query_until( > $db1, "SELECT COUNT(*) = 2 FROM pg_publication"), > 'two publications created before --cleanup-existing-publications is run'); > ``` > Fixed. > 04. 040_pg_createsubscriber.pl > ``` +my $pub_count_after = > + $node_s->safe_psql($db1, "SELECT COUNT(*) FROM pg_publication;"); > +is($pub_count_after, '0', > + 'all publications dropped after --cleanup-existing-publications is run'); > + > ``` > > I think no need to declare $pub_count_after. How about: > > ``` > is($node_s->safe_psql($db1, "SELECT COUNT(*) FROM pg_publication;"), '0', > 'all publications dropped after --cleanup-existing-publications is run'); > ``` > Fixed. > 05. > > According to the document [1], we must do double-quote for each identifiers. But current > patch seems not to do. Below example shows the case when three publications exist. > > ``` > pg_createsubscriber: dropping publications "aaa, bbb, ccc" in database "postgres" > ``` > > I think the output must be `"aaa", "bbb", "ccc"`. This means drop_publication() should be refactored. > > [1]: https://www.postgresql.org/docs/devel/error-style-guide.html#ERROR-STYLE-GUIDE-QUOTES > Fixed. The attached patch at [1] contains the suggested changes. [1] - https://www.postgresql.org/message-id/CAHv8RjKGywu%2B3cAv9MPgxi1_TUXBT8yzUsW%2Bf%3Dg5UsgeJ8Uk6g%40mail.gmail.com Thanks and regards, Shubham Khanna.
Re: Adding a '--clean-publisher-objects' option to 'pg_createsubscriber' utility.
From
Shlok Kyal
Date:
On Thu, 13 Feb 2025 at 15:20, Shubham Khanna <khannashubham1197@gmail.com> wrote: > > On Tue, Feb 11, 2025 at 9:56 PM Shlok Kyal <shlok.kyal.oss@gmail.com> wrote: > > > > On Tue, 11 Feb 2025 at 09:51, Shubham Khanna > > <khannashubham1197@gmail.com> wrote: > > > > > > On Fri, Feb 7, 2025 at 7:46 AM Hayato Kuroda (Fujitsu) > > > <kuroda.hayato@fujitsu.com> wrote: > > > > > > > > Dear Shubham, > > > > > > > > Thanks for updating the patch. > > > > > > > > Previously you told that you had a plan to extend the patch to drop other replication > > > > objects [1], but I think it is not needed. pg_createsubscriber has already been > > > > able to drop the existing subscrisubscriptions in check_and_drop_existing_subscriptions(). > > > > As for the replication slot, I have told in [2], it would be created intentionally > > > > thus I feel it should not be dropped. > > > > Thus I regard the patch does not have concrete extending plan. > > > > > > > > Below part contains my review comment. > > > > > > > > 01. Option name > > > > > > > > Based on the above discussion, "--cleanup-publisher-objects" is not suitable because > > > > it won't drop replication slots. How about "--cleanup-publications"? > > > > > > > > > > I have changed the name of the option to "--cleanup-existing-publications" > > > > > > > 02. usage() > > > > ``` > > > > + printf(_(" -C --cleanup-publisher-objects drop all publications on the logical replica\n")); > > > > ``` > > > > > > Fixed. > > > > > > > s/logical replica/subscriber > > > > > > > > 03. drop_all_publications > > > > ``` > > > > +/* Drops all existing logical replication publications from all subscriber > > > > + * databases > > > > + */ > > > > +static void > > > > ``` > > > > > > > > Initial line of the comment must be blank [3]. > > > > > > > > > > Removed this function. > > > > > > > 04. main > > > > ``` > > > > + {"cleanup-publisher-objects", no_argument, NULL, 'C'}, > > > > ``` > > > > > > > > Is there a reason why upper case is used? I feel lower one is enough. > > > > > > > > > > Fixed. > > > > > > > 05. main > > > > ``` > > > > + /* Drop publications from the subscriber if requested */ > > > > + if (opt.cleanup_publisher_objects) > > > > + drop_all_publications(dbinfo); > > > > ``` > > > > > > > > After considering more, I noticed that we have already called drop_publication() > > > > in the setup_subscriber(). Can we call drop_all_publications() there instead when > > > > -C is specified? > > > > > > > > > > I agree with you on this. I have removed the drop_all_publication() > > > and added the "--cleanup-existing-publications" option to the > > > drop_publication() > > > > > > > 06. 040_pg_createsubscriber.pl > > > > > > > > ``` > > > > +$node_s->start; > > > > +# Create publications to test it's removal > > > > +$node_p->safe_psql($db1, "CREATE PUBLICATION test_pub FOR ALL TABLES;"); > > > > +$node_p->safe_psql($db1, "CREATE PUBLICATION test_pub2 FOR ALL TABLES;"); > > > > + > > > > +# Verify the existing publications > > > > +my $pub_count_before = > > > > + $node_s->safe_psql($db1, "SELECT COUNT(*) FROM pg_publication;"); > > > > +is($pub_count_before, '2', > > > > + 'two publications created before --cleanup-publisher-objects is run'); > > > > + > > > > +$node_s->stop; > > > > ``` > > > > > > > > I feel it requires unnecessary startup and shutdown. IIUC, creating publications and check > > > > counts can be before stopping the node_s, around line 331. > > > > > > > > > > Fixed. > > > > > > > 07. 040_pg_createsubscriber.pl > > > > > > > > ``` > > > > +$node_p->safe_psql($db1, "CREATE PUBLICATION test_pub FOR ALL TABLES;"); > > > > +$node_p->safe_psql($db1, "CREATE PUBLICATION test_pub2 FOR ALL TABLES;"); > > > > + > > > > +# Verify the existing publications > > > > +my $pub_count_before = > > > > + $node_s->safe_psql($db1, "SELECT COUNT(*) FROM pg_publication;"); > > > > +is($pub_count_before, '2', > > > > + 'two publications created before --cleanup-publisher-objects is run'); > > > > + > > > > ``` > > > > > > > > Also, there is a possibility that CREATE PUBLICATION on node_p is not replicated yet > > > > when SELECT COUNT(*) is executed. Please wait for the replay. > > > > > > > > [1]: https://www.postgresql.org/message-id/CAHv8RjL4OvoYafofTb_U_JD5HuyoNowBoGpMfnEbhDSENA74Kg%40mail.gmail.com > > > > [2]: https://www.postgresql.org/message-id/OSCPR01MB1496664FDC38DA40A441F449FF5EE2%40OSCPR01MB14966.jpnprd01.prod.outlook.com > > > > [3]: https://www.postgresql.org/docs/devel/source-format.html > > > > > > > > > > Fixed. > > > > > > The attached Patch contains the suggested changes. > > > > > > > Hi Shubham, > > > > I have some comments for v4 patch: > > 1. I think we should update the comment for the function > > 'drop_publication'. As its usage is changed with this patch > > Currently it states: > > /* > > * Remove publication if it couldn't finish all steps. > > */ > > > > Fixed. > > > 2. In case when --cleanup_existing_publications is not specified the > > info message has two double quotes. > > > > pg_createsubscriber: dropping publication > > ""pg_createsubscriber_5_aa3c31f2"" in database "postgres" > > > > The code: > > + appendPQExpBufferStr(targets, > > + PQescapeIdentifier(conn, dbinfo->pubname, > > + strlen(dbinfo->pubname))); > > > > It is appending the value along with the double quotes. I think we > > should assign the value of 'PQescapeIdentifier(conn, dbinfo->pubname, > > strlen(dbinfo->pubname)' to a string and then use it. > > > > Fixed. > > The attached patch contains the suggested changes. > Hi, I reviewed v5 patch, I have some comments: 1. I feel that from the description it is not clear from which node we are removing the publications. + Remove all existing publications from specified databases. Should we write it something like: Remove all existing publications from specified databases on the target server. Thoughts? 2. Based on observation in other files, I feel the description can be in next line: printf(_("\nOptions:\n")); + printf(_(" -c --cleanup-existing-publications drop all publications on the subscriber\n")); printf(_(" -d, --database=DBNAME database in which to create a subscription\n")); Something like + printf(_(" -c --cleanup-existing-publications\n" + drop all publications on the subscriber\n")); 3. Why are we using 'poll_query_until' +ok( $node_s->poll_query_until( + $db1, "SELECT COUNT(*) = 2 FROM pg_publication"), + 'two publications created before --cleanup-existing-publications is run'); + Should we use 'safe_psql'? Thanks and Regards, Shlok Kyal
Re: Adding a '--clean-publisher-objects' option to 'pg_createsubscriber' utility.
From
Shlok Kyal
Date:
On Fri, 14 Feb 2025 at 10:50, Shubham Khanna <khannashubham1197@gmail.com> wrote: > > On Thu, Feb 13, 2025 at 5:36 PM Shlok Kyal <shlok.kyal.oss@gmail.com> wrote: > > > > On Thu, 13 Feb 2025 at 15:20, Shubham Khanna > > <khannashubham1197@gmail.com> wrote: > > > > > > On Tue, Feb 11, 2025 at 9:56 PM Shlok Kyal <shlok.kyal.oss@gmail.com> wrote: > > > > > > > > On Tue, 11 Feb 2025 at 09:51, Shubham Khanna > > > > <khannashubham1197@gmail.com> wrote: > > > > > > > > > > On Fri, Feb 7, 2025 at 7:46 AM Hayato Kuroda (Fujitsu) > > > > > <kuroda.hayato@fujitsu.com> wrote: > > > > > > > > > > > > Dear Shubham, > > > > > > > > > > > > Thanks for updating the patch. > > > > > > > > > > > > Previously you told that you had a plan to extend the patch to drop other replication > > > > > > objects [1], but I think it is not needed. pg_createsubscriber has already been > > > > > > able to drop the existing subscrisubscriptions in check_and_drop_existing_subscriptions(). > > > > > > As for the replication slot, I have told in [2], it would be created intentionally > > > > > > thus I feel it should not be dropped. > > > > > > Thus I regard the patch does not have concrete extending plan. > > > > > > > > > > > > Below part contains my review comment. > > > > > > > > > > > > 01. Option name > > > > > > > > > > > > Based on the above discussion, "--cleanup-publisher-objects" is not suitable because > > > > > > it won't drop replication slots. How about "--cleanup-publications"? > > > > > > > > > > > > > > > > I have changed the name of the option to "--cleanup-existing-publications" > > > > > > > > > > > 02. usage() > > > > > > ``` > > > > > > + printf(_(" -C --cleanup-publisher-objects drop all publications on the logical replica\n")); > > > > > > ``` > > > > > > > > > > Fixed. > > > > > > > > > > > s/logical replica/subscriber > > > > > > > > > > > > 03. drop_all_publications > > > > > > ``` > > > > > > +/* Drops all existing logical replication publications from all subscriber > > > > > > + * databases > > > > > > + */ > > > > > > +static void > > > > > > ``` > > > > > > > > > > > > Initial line of the comment must be blank [3]. > > > > > > > > > > > > > > > > Removed this function. > > > > > > > > > > > 04. main > > > > > > ``` > > > > > > + {"cleanup-publisher-objects", no_argument, NULL, 'C'}, > > > > > > ``` > > > > > > > > > > > > Is there a reason why upper case is used? I feel lower one is enough. > > > > > > > > > > > > > > > > Fixed. > > > > > > > > > > > 05. main > > > > > > ``` > > > > > > + /* Drop publications from the subscriber if requested */ > > > > > > + if (opt.cleanup_publisher_objects) > > > > > > + drop_all_publications(dbinfo); > > > > > > ``` > > > > > > > > > > > > After considering more, I noticed that we have already called drop_publication() > > > > > > in the setup_subscriber(). Can we call drop_all_publications() there instead when > > > > > > -C is specified? > > > > > > > > > > > > > > > > I agree with you on this. I have removed the drop_all_publication() > > > > > and added the "--cleanup-existing-publications" option to the > > > > > drop_publication() > > > > > > > > > > > 06. 040_pg_createsubscriber.pl > > > > > > > > > > > > ``` > > > > > > +$node_s->start; > > > > > > +# Create publications to test it's removal > > > > > > +$node_p->safe_psql($db1, "CREATE PUBLICATION test_pub FOR ALL TABLES;"); > > > > > > +$node_p->safe_psql($db1, "CREATE PUBLICATION test_pub2 FOR ALL TABLES;"); > > > > > > + > > > > > > +# Verify the existing publications > > > > > > +my $pub_count_before = > > > > > > + $node_s->safe_psql($db1, "SELECT COUNT(*) FROM pg_publication;"); > > > > > > +is($pub_count_before, '2', > > > > > > + 'two publications created before --cleanup-publisher-objects is run'); > > > > > > + > > > > > > +$node_s->stop; > > > > > > ``` > > > > > > > > > > > > I feel it requires unnecessary startup and shutdown. IIUC, creating publications and check > > > > > > counts can be before stopping the node_s, around line 331. > > > > > > > > > > > > > > > > Fixed. > > > > > > > > > > > 07. 040_pg_createsubscriber.pl > > > > > > > > > > > > ``` > > > > > > +$node_p->safe_psql($db1, "CREATE PUBLICATION test_pub FOR ALL TABLES;"); > > > > > > +$node_p->safe_psql($db1, "CREATE PUBLICATION test_pub2 FOR ALL TABLES;"); > > > > > > + > > > > > > +# Verify the existing publications > > > > > > +my $pub_count_before = > > > > > > + $node_s->safe_psql($db1, "SELECT COUNT(*) FROM pg_publication;"); > > > > > > +is($pub_count_before, '2', > > > > > > + 'two publications created before --cleanup-publisher-objects is run'); > > > > > > + > > > > > > ``` > > > > > > > > > > > > Also, there is a possibility that CREATE PUBLICATION on node_p is not replicated yet > > > > > > when SELECT COUNT(*) is executed. Please wait for the replay. > > > > > > > > > > > > [1]: https://www.postgresql.org/message-id/CAHv8RjL4OvoYafofTb_U_JD5HuyoNowBoGpMfnEbhDSENA74Kg%40mail.gmail.com > > > > > > [2]: https://www.postgresql.org/message-id/OSCPR01MB1496664FDC38DA40A441F449FF5EE2%40OSCPR01MB14966.jpnprd01.prod.outlook.com > > > > > > [3]: https://www.postgresql.org/docs/devel/source-format.html > > > > > > > > > > > > > > > > Fixed. > > > > > > > > > > The attached Patch contains the suggested changes. > > > > > > > > > > > > > Hi Shubham, > > > > > > > > I have some comments for v4 patch: > > > > 1. I think we should update the comment for the function > > > > 'drop_publication'. As its usage is changed with this patch > > > > Currently it states: > > > > /* > > > > * Remove publication if it couldn't finish all steps. > > > > */ > > > > > > > > > > Fixed. > > > > > > > 2. In case when --cleanup_existing_publications is not specified the > > > > info message has two double quotes. > > > > > > > > pg_createsubscriber: dropping publication > > > > ""pg_createsubscriber_5_aa3c31f2"" in database "postgres" > > > > > > > > The code: > > > > + appendPQExpBufferStr(targets, > > > > + PQescapeIdentifier(conn, dbinfo->pubname, > > > > + strlen(dbinfo->pubname))); > > > > > > > > It is appending the value along with the double quotes. I think we > > > > should assign the value of 'PQescapeIdentifier(conn, dbinfo->pubname, > > > > strlen(dbinfo->pubname)' to a string and then use it. > > > > > > > > > > Fixed. > > > > > > The attached patch contains the suggested changes. > > > > > > > Hi, > > > > I reviewed v5 patch, I have some comments: > > > > 1. I feel that from the description it is not clear from which node we > > are removing the publications. > > + Remove all existing publications from specified databases. > > > > Should we write it something like: > > Remove all existing publications from specified databases on the target server. > > > > Thoughts? > > > > 2. Based on observation in other files, I feel the description can be > > in next line: > > printf(_("\nOptions:\n")); > > + printf(_(" -c --cleanup-existing-publications drop all > > publications on the subscriber\n")); > > printf(_(" -d, --database=DBNAME database in which to > > create a subscription\n")); > > > > Something like > > > > + printf(_(" -c --cleanup-existing-publications\n" > > + drop all publications > > on the subscriber\n")); > > > > 3. Why are we using 'poll_query_until' > > > > +ok( $node_s->poll_query_until( > > + $db1, "SELECT COUNT(*) = 2 FROM pg_publication"), > > + 'two publications created before --cleanup-existing-publications is run'); > > + > > > > Should we use 'safe_psql'? > > > > Fixed the given comments. The attached patch contains the suggested changes. > Hi Shubham, I reviewed the v6 patch and here are some of my comments. 1. I think we do not need 'search_query' variable. We can directly combine it in PQexec. + const char *search_query = + "SELECT pubname FROM pg_catalog.pg_publication;"; - pg_log_info("dropping publication \"%s\" in database \"%s\"", - dbinfo->pubname, dbinfo->dbname); + res = PQexec(conn, search_query); 2. + * Remove all existing publications from specified databases on the target + * server. The previous comment in v5 for function 'drop_publication' was more appropriate. I think you mistakenly changed it here. I suggested changing it in pg_createsubscriber.sgml in point 1 of [1]. [1]: https://www.postgresql.org/message-id/CANhcyEUo7F954LULk859xs6FtwQ5USH6C2tiBbGwpihU2yHmAQ%40mail.gmail.com Thanks and Regards, Shlok Kyal
Re: Adding a '--clean-publisher-objects' option to 'pg_createsubscriber' utility.
From
Peter Smith
Date:
Hi Shubham Some review comments for v7-0001. (I am late to this thread. If some of my comments have already been discussed and rejected please let me know). ====== 1 GENERAL. Option Name? Wondering why the patch is introducing more terminology like "cleanup"; if we are dropping publications then why not say "drop"? Also I am not sure if "existing" means anything because you cannot cleanup/drop something that is not "existing". IOW, why not call this the "--drop-publications" option? ====== Commit message 2. These publications, replicated during streaming replication, become redundant after converting to logical replication and serve no further purpose. ~ From this description it seems there is an assumption that the only publications on the target server are those that were physically replicated to the standby. Is that strictly true? Isn't it also possible that a user might have created their own publication on the target server prior to running the pg_createsubscriber. So even if they want all the physically replicated ones to be removed, they would NOT want their own new publication to also get removed at the same time. E.g. The original motivation thread [1] for this patch only said "But what good is having the SAME publications as primary also on logical replica?" (my emphasis of "same") so it seems there should be some sort of name matching before just dropping everything. Actually.... The code looks like it might be doing the correct thing already and only fetching the publication names from the source server and then deleting only those names from the target server. But this comment message didn't describe this clearly. ====== doc/src/sgml/ref/pg_createsubscriber.sgml 3. + <para> + Remove all existing publications from specified databases on tne target + server. + </para> typo "tne" ====== src/bin/pg_basebackup/pg_createsubscriber.c struct CreateSubscriberOptions: 4. + bool cleanup_existing_publications; /* drop all publications */ The field name seems overkill. e.g. As mentioned for the option, it could be called 'drop' instead of cleanup. And the 'existing' seems superfluous because you can only drop something that exists. So why not just 'drop_publications'. Won't that have the same meaning? ~~~ 5. static void -setup_subscriber(struct LogicalRepInfo *dbinfo, const char *consistent_lsn) +setup_subscriber(struct LogicalRepInfo *dbinfo, const char *consistent_lsn, + bool cleanup_existing_publications) The setup_subscriber's function comment does not say anything about this function potentially also dropping publications at the subscriber. ~~~ 6. static void -drop_publication(PGconn *conn, struct LogicalRepInfo *dbinfo) +drop_publication(PGconn *conn, struct LogicalRepInfo *dbinfo, + bool cleanup_existing_publications) 6a. This arrangement seems complicated to me. IMO it would be more natural to have 2 separate functions, and just call the appropriate one. drop_publication() drop_all_publications() instead of trying to make this function do everything. ~ 6b. Furthermore, can't you just have a common helper function to DROP a single PUBLICATION by name? Then the code that drops all publications can just loop to call this common dropper for each iteration. Code should be much simpler. I don't see the efficiency of this operation is really a factor, pg_createsubscriber is rarely used, so IMO a better goal is code simplicity/maintenance. e.g. drop_publication() --> _drop_one_publication() e.g drop_all_publications() --> LOOP (pub list) { _drop_one_publication() } ====== .../t/040_pg_createsubscriber.pl 7. + +# Create publications to test it's removal +$node_p->safe_psql($db1, "CREATE PUBLICATION test_pub FOR ALL TABLES;"); +$node_p->safe_psql($db1, "CREATE PUBLICATION test_pub2 FOR ALL TABLES;"); + /it's/their/ ~~~ 8. Maybe also do a CREATE PUBLICATION at node_s, prior to the pg_createsubvscript, so then you can verify that the user-created one is unaffected by the cleanup of all the others. ====== [1] https://www.postgresql.org/message-id/CAExHW5t4ew7ZrgcDdTv7YmuG7LVQT1ZaEny_EvtngHtEBNyjcQ%40mail.gmail.com Kind Regards, Peter Smith. Fujitsu Australia
Re: Adding a '--clean-publisher-objects' option to 'pg_createsubscriber' utility.
From
vignesh C
Date:
On Mon, 17 Feb 2025 at 09:49, Peter Smith <smithpb2250@gmail.com> wrote: > > Hi Shubham > > Some review comments for v7-0001. > > (I am late to this thread. If some of my comments have already been > discussed and rejected please let me know). > > ====== > 1 GENERAL. Option Name? > > Wondering why the patch is introducing more terminology like > "cleanup"; if we are dropping publications then why not say "drop"? > Also I am not sure if "existing" means anything because you cannot > cleanup/drop something that is not "existing". > > IOW, why not call this the "--drop-publications" option? We should keep this option generic as this same option should be enhanced further to handle cleaning other objects which was suggested earlier at [1]. How about something like: remove-non-logical-objects/clean-non-logical-objects/purge-non-replicated/discard-non-replicated or something better? [1] - https://www.postgresql.org/message-id/CAExHW5t4ew7ZrgcDdTv7YmuG7LVQT1ZaEny_EvtngHtEBNyjcQ%40mail.gmail.com Regards, Vignesh
Re: Adding a '--clean-publisher-objects' option to 'pg_createsubscriber' utility.
From
Shubham Khanna
Date:
On Mon, Feb 17, 2025 at 9:49 AM Peter Smith <smithpb2250@gmail.com> wrote: > > Hi Shubham > > Some review comments for v7-0001. > > (I am late to this thread. If some of my comments have already been > discussed and rejected please let me know). > > ====== > 1 GENERAL. Option Name? > > Wondering why the patch is introducing more terminology like > "cleanup"; if we are dropping publications then why not say "drop"? > Also I am not sure if "existing" means anything because you cannot > cleanup/drop something that is not "existing". > > IOW, why not call this the "--drop-publications" option? > I have retained the option name as '--cleanup-existing-publications' for now. However, I understand the concern regarding terminology and will revise it in the next patch version once there is a consensus on the final name. > ====== > Commit message > > 2. > These publications, replicated during streaming replication, become redundant > after converting to logical replication and serve no further purpose. > > ~ > > From this description it seems there is an assumption that the only > publications on the target server are those that were physically > replicated to the standby. Is that strictly true? Isn't it also > possible that a user might have created their own publication on the > target server prior to running the pg_createsubscriber. So even if > they want all the physically replicated ones to be removed, they would > NOT want their own new publication to also get removed at the same > time. > > E.g. The original motivation thread [1] for this patch only said "But > what good is having the SAME publications as primary also on logical > replica?" (my emphasis of "same") so it seems there should be some > sort of name matching before just dropping everything. > > Actually.... The code looks like it might be doing the correct thing > already and only fetching the publication names from the source server > and then deleting only those names from the target server. But this > comment message didn't describe this clearly. > > ====== Modified the commit message. > doc/src/sgml/ref/pg_createsubscriber.sgml > > 3. > + <para> > + Remove all existing publications from specified databases on tne target > + server. > + </para> > > typo "tne" > > ====== Fixed. > src/bin/pg_basebackup/pg_createsubscriber.c > > struct CreateSubscriberOptions: > > 4. > + bool cleanup_existing_publications; /* drop all publications */ > > The field name seems overkill. e.g. As mentioned for the option, it > could be called 'drop' instead of cleanup. And the 'existing' seems > superfluous because you can only drop something that exists. So why > not just 'drop_publications'. Won't that have the same meaning? > > ~~~ > Fixed. > 5. > static void > -setup_subscriber(struct LogicalRepInfo *dbinfo, const char *consistent_lsn) > +setup_subscriber(struct LogicalRepInfo *dbinfo, const char *consistent_lsn, > + bool cleanup_existing_publications) > > The setup_subscriber's function comment does not say anything about > this function potentially also dropping publications at the > subscriber. > > ~~~ > Fixed. > 6. > static void > -drop_publication(PGconn *conn, struct LogicalRepInfo *dbinfo) > +drop_publication(PGconn *conn, struct LogicalRepInfo *dbinfo, > + bool cleanup_existing_publications) > > 6a. > This arrangement seems complicated to me. > > IMO it would be more natural to have 2 separate functions, and just > call the appropriate one. > drop_publication() > drop_all_publications() > > instead of trying to make this function do everything. > > ~ > > 6b. > Furthermore, can't you just have a common helper function to DROP a > single PUBLICATION by name? > > Then the code that drops all publications can just loop to call this > common dropper for each iteration. Code should be much simpler. I > don't see the efficiency of this operation is really a factor, > pg_createsubscriber is rarely used, so IMO a better goal is code > simplicity/maintenance. > > e.g. drop_publication() --> _drop_one_publication() > e.g drop_all_publications() --> LOOP (pub list) { _drop_one_publication() } > > ====== Fixed. > .../t/040_pg_createsubscriber.pl > > 7. > + > +# Create publications to test it's removal > +$node_p->safe_psql($db1, "CREATE PUBLICATION test_pub FOR ALL TABLES;"); > +$node_p->safe_psql($db1, "CREATE PUBLICATION test_pub2 FOR ALL TABLES;"); > + > > /it's/their/ > > ~~~ > Fixed. > 8. > Maybe also do a CREATE PUBLICATION at node_s, prior to the > pg_createsubvscript, so then you can verify that the user-created one > is unaffected by the cleanup of all the others. > > ====== Since $node_s is a streaming standby, it does not allow object creation. As a result, publications cannot be created on $node_s. > [1] https://www.postgresql.org/message-id/CAExHW5t4ew7ZrgcDdTv7YmuG7LVQT1ZaEny_EvtngHtEBNyjcQ%40mail.gmail.com > The attached patch contains the suggested changes. Thanks and regards, Shubham Khanna.
Attachment
Re: Adding a '--clean-publisher-objects' option to 'pg_createsubscriber' utility.
From
Peter Smith
Date:
On Thu, Feb 20, 2025 at 5:44 PM Shubham Khanna <khannashubham1197@gmail.com> wrote: > > On Mon, Feb 17, 2025 at 9:49 AM Peter Smith <smithpb2250@gmail.com> wrote: > > ... > > ====== > > 1 GENERAL. Option Name? > > > > Wondering why the patch is introducing more terminology like > > "cleanup"; if we are dropping publications then why not say "drop"? > > Also I am not sure if "existing" means anything because you cannot > > cleanup/drop something that is not "existing". > > > > IOW, why not call this the "--drop-publications" option? > > > > I have retained the option name as '--cleanup-existing-publications' > for now. However, I understand the concern regarding terminology and > will revise it in the next patch version once there is a consensus on > the final name. > BTW, Is it even correct to assume that the user has to choose between (a) clean everything or (b) clean nothing? That is why I felt something that mentions only ‘publication’ gives more flexibility. Later, when some DROP <other_stuff> feature might be added then we can make additional --drop-XXX switches, or a --drop-all switch for cleaning everything ////// Some more review comments for patch v8-0001 ====== Commit message 1. To prevent accidental removal of user-created publications on the subscriber, this cleanup process only targets publications that existed on the source server and were replicated to the subscriber. Any manually created publications on the subscriber will remain intact. ~ Is this scenario (manually created publications on the subscriber) possible to do? If yes, then there should be a test case for it If not, then maybe this whole paragraph needs rewording. ~~~ 2. This feature is optional and only takes effect when the '--cleanup-existing-publications' option is explicitly specified. Since publication cleanup is not required when upgrading streaming replication clusters, this option provides users with the flexibility to enable or skip the cleanup process as needed. ~ AFAICT, this is pretty much just saying that the switch is optional and that the feature only does something when the switch is specified by the user. But, doesn't all that just go without saying? ====== src/bin/pg_basebackup/pg_createsubscriber.c 3. + /* Drop all existing publications if requested */ + if (drop_publications) + { + pg_log_info("Dropping all existing publications in database \"%s\"", + dbinfo[i].dbname); + drop_all_publications(conn, dbinfo[i].dbname); + } + /* * Since the publication was created before the consistent LSN, it is * available on the subscriber when the physical replica is promoted. * Remove publications from the subscriber because it has no use. + * Additionally, drop publications existed before this command if + * requested. */ drop_publication(conn, &dbinfo[i]); ~ 3a. The logic of this code seems strange - e.g. it is optionally dropping all publications, and then dropping yet another one again. Won't that be already dropped as part of the drop_all? ~ 3b. Anyway, perhaps the logic should be encapsulated in a new helper check_and_drop_existing_publications() function to match the existing 'check_and_drop_existing_subscriptions() ~ 3c. I felt logs like "Dropping all existing publications in database \"%s\"" should be logged within the function drop_all_publications, not at the caller. ~~~ _drop_one_publication: 4. -/* - * Remove publication if it couldn't finish all steps. - */ +/* Helper function to drop a single publication by name. */ static void -drop_publication(PGconn *conn, struct LogicalRepInfo *dbinfo) +_drop_one_publication(PGconn *conn, const char *pubname, const char *dbname) A better name for this helper might be drop_publication_by_name() ~~~ 5. + appendPQExpBuffer(query, "DROP PUBLICATION %s", pubname); + pg_log_debug("Executing: %s", query->data); + pg_log_info("Dropping publication %s in database \"%s\"", pubname, dbinfo->dbname); That debug seems more typically written as: pg_log_debug("command is: %s", query->data); ~~~ drop_all_publications: 6. +/* Drop all publications in the database. */ +static void +drop_all_publications(PGconn *conn, const char *dbname) +{ + PGresult *res; + int count = 0; The 'count' variable seems unused. ~~~ 7. + for (int i = 0; i < PQntuples(res); i++) + { + char *pubname_esc = PQescapeIdentifier(conn, PQgetvalue(res, i, 0), + strlen(PQgetvalue(res, i, 0))); + + _drop_one_publication(conn, pubname_esc, dbname); + PQfreemem(pubname_esc); + count++; + } Instead of escaping the name here, and also escaping it in drop_publication(), that could be done inside the common helper function. ====== .../t/040_pg_createsubscriber.pl 8. +# Create publications to test their removal +$node_p->safe_psql($db1, "CREATE PUBLICATION test_pub FOR ALL TABLES;"); +$node_p->safe_psql($db1, "CREATE PUBLICATION test_pub2 FOR ALL TABLES;"); 8a. If you are going to number the publications then it would be better to number all of them so there is a consistent naming. e.g. test_pub1,test_pub2; instead of test_pub,test_pub2 ~ 8b. Should share the same safe_psql for all DDL when it is possible. ~ 8c. Maybe the comment should give a bit more explanation of what is going on here. e.g. this is preparation for the following --cleanup-existing-publications test. Something that conveys the following (choose your own wording): Create some user-defined publications. After waiting for the streaming replication to replicate these to the standby, we can use the pg_createsubscriber to check that the --cleanup-existing-publications switch causes them to be removed. ~~~ 9. I wonder if you need 2 test cases i) use --cleanup-existing-publications and verify publication are removed ii) DON'T use --cleanup-existing-publications and verify publication still exist ====== Kind Regards, Peter Smith. Fujitsu Australia
RE: Adding a '--clean-publisher-objects' option to 'pg_createsubscriber' utility.
From
"Hayato Kuroda (Fujitsu)"
Date:
Dear Shubham, Thanks for updating the patch. Here are comments. 01. ``` /* * Create the subscriptions, adjust the initial location for logical * replication and enable the subscriptions. That's the last step for logical * replication setup. If 'drop_publications' is true, existing publications on * the subscriber will be dropped before creating new subscriptions. */ static void setup_subscriber(struct LogicalRepInfo *dbinfo, const char *consistent_lsn, bool drop_publications) ``` Even drop_publications is false, at least one publication would be dropped. The argument is not correct. I prefer previous name. 02. ``` /* Drop all existing publications if requested */ if (drop_publications) { pg_log_info("Dropping all existing publications in database \"%s\"", dbinfo[i].dbname); drop_all_publications(conn, dbinfo[i].dbname); } /* * Since the publication was created before the consistent LSN, it is * available on the subscriber when the physical replica is promoted. * Remove publications from the subscriber because it has no use. * Additionally, drop publications existed before this command if * requested. */ drop_publication(conn, &dbinfo[i]); ``` It is quite strange that drop_publication() is called even when drop_all_publications() is called. 03. Also, I'm not sure pg_log_info() is needed here. _drop_one_publication() outputs dropping publications. 04. ``` /* Helper function to drop a single publication by name. */ static void _drop_one_publication(PGconn *conn, const char *pubname, const char *dbname) ``` Functions recently added do not have prefix "_". I think it can be removed. 05. ``` pg_log_debug("Executing: %s", query->data); pg_log_info("Dropping publication %s in database \"%s\"", pubname, dbinfo->dbname); ``` In _drop_one_publication(), dbname is used only for the message. Can we move to pg_log_info() outside the function and reduce the argument? 06. Also, please start with lower case. 07. Also, please preserve the message as much as possible. Previously they are: ``` pg_log_info("dropping publication \"%s\" in database \"%s\"", dbinfo->pubname, dbinfo->dbname); pg_log_debug("command is: %s", str->data); ``` 08. I feel we must update made_publication. Best regards, Hayato Kuroda FUJITSU LIMITED
Re: Adding a '--clean-publisher-objects' option to 'pg_createsubscriber' utility.
From
Shubham Khanna
Date:
On Fri, Feb 21, 2025 at 8:31 AM Peter Smith <smithpb2250@gmail.com> wrote: > > On Thu, Feb 20, 2025 at 5:44 PM Shubham Khanna > <khannashubham1197@gmail.com> wrote: > > > > On Mon, Feb 17, 2025 at 9:49 AM Peter Smith <smithpb2250@gmail.com> wrote: > > > > ... > > > ====== > > > 1 GENERAL. Option Name? > > > > > > Wondering why the patch is introducing more terminology like > > > "cleanup"; if we are dropping publications then why not say "drop"? > > > Also I am not sure if "existing" means anything because you cannot > > > cleanup/drop something that is not "existing". > > > > > > IOW, why not call this the "--drop-publications" option? > > > > > > > I have retained the option name as '--cleanup-existing-publications' > > for now. However, I understand the concern regarding terminology and > > will revise it in the next patch version once there is a consensus on > > the final name. > > > > BTW, Is it even correct to assume that the user has to choose between > (a) clean everything or (b) clean nothing? That is why I felt > something that mentions only ‘publication’ gives more flexibility. > Later, when some DROP <other_stuff> feature might be added then we can > make additional --drop-XXX switches, or a --drop-all switch for > cleaning everything > > ////// > > Some more review comments for patch v8-0001 > > ====== > Commit message > > 1. > To prevent accidental removal of user-created publications on the subscriber, > this cleanup process only targets publications that existed on the source > server and were replicated to the subscriber. > Any manually created publications on the subscriber will remain intact. > > ~ > > Is this scenario (manually created publications on the subscriber) > possible to do? > If yes, then there should be a test case for it > If not, then maybe this whole paragraph needs rewording. > > ~~~ Fixed. > > 2. > This feature is optional and only takes effect when the > '--cleanup-existing-publications' option is explicitly specified. > Since publication cleanup is not required when upgrading streaming replication > clusters, this option provides users with the flexibility to enable or skip the > cleanup process as needed. > > ~ > > AFAICT, this is pretty much just saying that the switch is optional > and that the feature only does something when the switch is specified > by the user. But, doesn't all that just go without saying? > > Fixed. > ====== > src/bin/pg_basebackup/pg_createsubscriber.c > > 3. > + /* Drop all existing publications if requested */ > + if (drop_publications) > + { > + pg_log_info("Dropping all existing publications in database \"%s\"", > + dbinfo[i].dbname); > + drop_all_publications(conn, dbinfo[i].dbname); > + } > + > /* > * Since the publication was created before the consistent LSN, it is > * available on the subscriber when the physical replica is promoted. > * Remove publications from the subscriber because it has no use. > + * Additionally, drop publications existed before this command if > + * requested. > */ > drop_publication(conn, &dbinfo[i]); > ~ > > 3a. > The logic of this code seems strange - e.g. it is optionally dropping > all publications, and then dropping yet another one again. Won't that > be already dropped as part of the drop_all? > > ~ > Fixed. > 3b. > Anyway, perhaps the logic should be encapsulated in a new helper > check_and_drop_existing_publications() function to match the existing > 'check_and_drop_existing_subscriptions() > > ~ > Added a new function 'check_and_drop_exixsting_publications()' > 3c. > I felt logs like "Dropping all existing publications in database > \"%s\"" should be logged within the function drop_all_publications, > not at the caller. > > ~~~ Fixed. > > _drop_one_publication: > > 4. > -/* > - * Remove publication if it couldn't finish all steps. > - */ > +/* Helper function to drop a single publication by name. */ > static void > -drop_publication(PGconn *conn, struct LogicalRepInfo *dbinfo) > +_drop_one_publication(PGconn *conn, const char *pubname, const char *dbname) > > A better name for this helper might be drop_publication_by_name() > > ~~~ Removed this function. > > 5. > + appendPQExpBuffer(query, "DROP PUBLICATION %s", pubname); > + pg_log_debug("Executing: %s", query->data); > + pg_log_info("Dropping publication %s in database \"%s\"", pubname, > dbinfo->dbname); > > That debug seems more typically written as: > pg_log_debug("command is: %s", query->data); > > ~~~ Fixed. > > drop_all_publications: > > 6. > +/* Drop all publications in the database. */ > +static void > +drop_all_publications(PGconn *conn, const char *dbname) > +{ > + PGresult *res; > + int count = 0; > > The 'count' variable seems unused. > > ~~~ Removed the usage of 'count'. > > 7. > + for (int i = 0; i < PQntuples(res); i++) > + { > + char *pubname_esc = PQescapeIdentifier(conn, PQgetvalue(res, i, 0), > + strlen(PQgetvalue(res, i, 0))); > + > + _drop_one_publication(conn, pubname_esc, dbname); > + PQfreemem(pubname_esc); > + count++; > + } > > Instead of escaping the name here, and also escaping it in > drop_publication(), that could be done inside the common helper > function. > Fixed. > ====== > .../t/040_pg_createsubscriber.pl > > 8. > +# Create publications to test their removal > +$node_p->safe_psql($db1, "CREATE PUBLICATION test_pub FOR ALL TABLES;"); > +$node_p->safe_psql($db1, "CREATE PUBLICATION test_pub2 FOR ALL TABLES;"); > > 8a. > If you are going to number the publications then it would be better to > number all of them so there is a consistent naming. > > e.g. test_pub1,test_pub2; instead of test_pub,test_pub2 > > ~ Fixed. > > 8b. > Should share the same safe_psql for all DDL when it is possible. > > ~ Fixed. > > 8c. > Maybe the comment should give a bit more explanation of what is going > on here. e.g. this is preparation for the following > --cleanup-existing-publications test. > > Something that conveys the following (choose your own wording): > > Create some user-defined publications. After waiting for the streaming > replication to replicate these to the standby, we can use the > pg_createsubscriber to check that the --cleanup-existing-publications > switch causes them to be removed. > > ~~~ Fixed. > > 9. > I wonder if you need 2 test cases > i) use --cleanup-existing-publications and verify publication are removed > ii) DON'T use --cleanup-existing-publications and verify publication still exist Added a new test case where the option is not being used and the verification of the user created publications is done. I have created two new nodes for this test case. The attached patch contains the suggested changes. Thanks and regards, Shubham Khanna.
Attachment
Re: Adding a '--clean-publisher-objects' option to 'pg_createsubscriber' utility.
From
Shubham Khanna
Date:
On Fri, Feb 21, 2025 at 3:06 PM Hayato Kuroda (Fujitsu) <kuroda.hayato@fujitsu.com> wrote: > > Dear Shubham, > > Thanks for updating the patch. Here are comments. > > 01. > ``` > /* > * Create the subscriptions, adjust the initial location for logical > * replication and enable the subscriptions. That's the last step for logical > * replication setup. If 'drop_publications' is true, existing publications on > * the subscriber will be dropped before creating new subscriptions. > */ > static void > setup_subscriber(struct LogicalRepInfo *dbinfo, const char *consistent_lsn, > bool drop_publications) > ``` > > Even drop_publications is false, at least one publication would be dropped. The > argument is not correct. I prefer previous name. > Fixed. > 02. > ``` > /* Drop all existing publications if requested */ > if (drop_publications) > { > pg_log_info("Dropping all existing publications in database \"%s\"", > dbinfo[i].dbname); > drop_all_publications(conn, dbinfo[i].dbname); > } > > /* > * Since the publication was created before the consistent LSN, it is > * available on the subscriber when the physical replica is promoted. > * Remove publications from the subscriber because it has no use. > * Additionally, drop publications existed before this command if > * requested. > */ > drop_publication(conn, &dbinfo[i]); > ``` > > It is quite strange that drop_publication() is called even when drop_all_publications() is called. > Fixed. > 03. > Also, I'm not sure pg_log_info() is needed here. _drop_one_publication() outputs dropping publications. > Fixed. > 04. > ``` > /* Helper function to drop a single publication by name. */ > static void > _drop_one_publication(PGconn *conn, const char *pubname, const char *dbname) > ``` > > Functions recently added do not have prefix "_". I think it can be removed. > Removed this function. > 05. > ``` > pg_log_debug("Executing: %s", query->data); > pg_log_info("Dropping publication %s in database \"%s\"", pubname, dbinfo->dbname); > ``` > In _drop_one_publication(), dbname is used only for the message. Can we move to pg_log_info() > outside the function and reduce the argument? > Fixed. > 06. > Also, please start with lower case. > Fixed. > 07. > Also, please preserve the message as much as possible. Previously they are: > ``` > pg_log_info("dropping publication \"%s\" in database \"%s\"", dbinfo->pubname, dbinfo->dbname); > pg_log_debug("command is: %s", str->data); > ``` > Fixed. > 08. > I feel we must update made_publication. > Fixed. The attached patch at [1] contains the suggested changes. [1] - https://www.postgresql.org/message-id/CAHv8Rj%2BzmkwunNubo%2B_Gp26S_qXbD%3Dp%2BrMfEnPnjiEE1A6GTXQ%40mail.gmail.com Thanks and regards, Shubham Khanna.
RE: Adding a '--clean-publisher-objects' option to 'pg_createsubscriber' utility.
From
"Hayato Kuroda (Fujitsu)"
Date:
Dear Shubham, Thanks for updating the patch. Few comments. 01. CreateSubscriberOptions ``` + bool cleanup_existing_publications; /* drop all publications */ ``` My previous comment [1] #1 did not intend to update attributes. My point was only for the third argument of setup_subscriber(). 02. setup_subscriber() ``` + pg_log_info("cleaning up existing publications on the subscriber"); ``` I don't think this output is needed. check_and_drop_existing_publications() has the similar output. 03. drop_publication_by_name() ``` + + appendPQExpBuffer(query, "DROP PUBLICATION %s", pubname_esc); + pg_log_info("dropping publication %s in database \"%s\"", pubname, dbname); + pg_log_debug("command is: %s", query->data); ``` Currently, appendPQExpBuffer() is done after the pg_log_info(). Please reserve current ordering if there are no reasons. Also, variable "str" is currently used instead of query, please follow. 04. drop_publication_by_name() ``` if (!dry_run) { - res = PQexec(conn, str->data); + res = PQexec(conn, query->data); if (PQresultStatus(res) != PGRES_COMMAND_OK) + dbinfo->made_publication = false; + else { - pg_log_error("could not drop publication \"%s\" in database \"%s\": %s", - dbinfo->pubname, dbinfo->dbname, PQresultErrorMessage(res)); - dbinfo->made_publication = false; /* don't try again. */ - - /* - * Don't disconnect and exit here. This routine is used by primary - * (cleanup publication / replication slot due to an error) and - * subscriber (remove the replicated publications). In both cases, - * it can continue and provide instructions for the user to remove - * it later if cleanup fails. - */ + pg_log_error("could not drop publication %s in database \"%s\": %s", + pubname, dbname, PQresultErrorMessage(res)); } ``` pg_log_error() is exected when the command succeed: This is not correct. 05. 040_pg_createsubscriber.pl ``` +# Set up node A as primary +my $node_a = PostgreSQL::Test::Cluster->new('node_a'); +my $aconnstr = $node_a->connstr; +$node_a->init(allows_streaming => 'logical'); +$node_a->append_conf('postgresql.conf', 'autovacuum = off'); +$node_a->start; ``` I don't think new primary is not needed. Can't you reuse node_p? [1]: https://www.postgresql.org/message-id/OSCPR01MB14966D845AC77FC46ECEECCE4F5C72%40OSCPR01MB14966.jpnprd01.prod.outlook.com Best regards, Hayato Kuroda FUJITSU LIMITED
Re: Adding a '--clean-publisher-objects' option to 'pg_createsubscriber' utility.
From
Shubham Khanna
Date:
On Tue, Feb 25, 2025 at 4:50 PM Hayato Kuroda (Fujitsu) <kuroda.hayato@fujitsu.com> wrote: > > Dear Shubham, > > Thanks for updating the patch. Few comments. > > 01. CreateSubscriberOptions > ``` > + bool cleanup_existing_publications; /* drop all publications */ > ``` > > My previous comment [1] #1 did not intend to update attributes. My point was > only for the third argument of setup_subscriber(). > Fixed. > 02. setup_subscriber() > ``` > + pg_log_info("cleaning up existing publications on the subscriber"); > ``` > > I don't think this output is needed. check_and_drop_existing_publications() has the similar output. > Fixed. > 03. drop_publication_by_name() > ``` > + > + appendPQExpBuffer(query, "DROP PUBLICATION %s", pubname_esc); > + pg_log_info("dropping publication %s in database \"%s\"", pubname, dbname); > + pg_log_debug("command is: %s", query->data); > > ``` > > Currently, appendPQExpBuffer() is done after the pg_log_info(). Please reserve current ordering if > there are no reasons. Also, variable "str" is currently used instead of query, please follow. > Fixed. > 04. drop_publication_by_name() > ``` > if (!dry_run) > { > - res = PQexec(conn, str->data); > + res = PQexec(conn, query->data); > if (PQresultStatus(res) != PGRES_COMMAND_OK) > + dbinfo->made_publication = false; > + else > { > - pg_log_error("could not drop publication \"%s\" in database \"%s\": %s", > - dbinfo->pubname, dbinfo->dbname, PQresultErrorMessage(res)); > - dbinfo->made_publication = false; /* don't try again. */ > - > - /* > - * Don't disconnect and exit here. This routine is used by primary > - * (cleanup publication / replication slot due to an error) and > - * subscriber (remove the replicated publications). In both cases, > - * it can continue and provide instructions for the user to remove > - * it later if cleanup fails. > - */ > + pg_log_error("could not drop publication %s in database \"%s\": %s", > + pubname, dbname, PQresultErrorMessage(res)); > } > ``` > > pg_log_error() is exected when the command succeed: This is not correct. > Fixed. > 05. 040_pg_createsubscriber.pl > ``` > +# Set up node A as primary > +my $node_a = PostgreSQL::Test::Cluster->new('node_a'); > +my $aconnstr = $node_a->connstr; > +$node_a->init(allows_streaming => 'logical'); > +$node_a->append_conf('postgresql.conf', 'autovacuum = off'); > +$node_a->start; > ``` > > I don't think new primary is not needed. Can't you reuse node_p? > Fixed. The attached patch contains the suggested changes. Thanks and regards, Shubham Khanna.
Attachment
Re: Adding a '--clean-publisher-objects' option to 'pg_createsubscriber' utility.
From
Shubham Khanna
Date:
On Tue, Feb 25, 2025 at 4:50 PM Hayato Kuroda (Fujitsu) <kuroda.hayato@fujitsu.com> wrote: > > Dear Shubham, > > Thanks for updating the patch. Few comments. > > 01. CreateSubscriberOptions > ``` > + bool cleanup_existing_publications; /* drop all publications */ > ``` > > My previous comment [1] #1 did not intend to update attributes. My point was > only for the third argument of setup_subscriber(). > > 02. setup_subscriber() > ``` > + pg_log_info("cleaning up existing publications on the subscriber"); > ``` > > I don't think this output is needed. check_and_drop_existing_publications() has the similar output. > > 03. drop_publication_by_name() > ``` > + > + appendPQExpBuffer(query, "DROP PUBLICATION %s", pubname_esc); > + pg_log_info("dropping publication %s in database \"%s\"", pubname, dbname); > + pg_log_debug("command is: %s", query->data); > > ``` > > Currently, appendPQExpBuffer() is done after the pg_log_info(). Please reserve current ordering if > there are no reasons. Also, variable "str" is currently used instead of query, please follow. > > 04. drop_publication_by_name() > ``` > if (!dry_run) > { > - res = PQexec(conn, str->data); > + res = PQexec(conn, query->data); > if (PQresultStatus(res) != PGRES_COMMAND_OK) > + dbinfo->made_publication = false; > + else > { > - pg_log_error("could not drop publication \"%s\" in database \"%s\": %s", > - dbinfo->pubname, dbinfo->dbname, PQresultErrorMessage(res)); > - dbinfo->made_publication = false; /* don't try again. */ > - > - /* > - * Don't disconnect and exit here. This routine is used by primary > - * (cleanup publication / replication slot due to an error) and > - * subscriber (remove the replicated publications). In both cases, > - * it can continue and provide instructions for the user to remove > - * it later if cleanup fails. > - */ > + pg_log_error("could not drop publication %s in database \"%s\": %s", > + pubname, dbname, PQresultErrorMessage(res)); > } > ``` > > pg_log_error() is exected when the command succeed: This is not correct. > > 05. 040_pg_createsubscriber.pl > ``` > +# Set up node A as primary > +my $node_a = PostgreSQL::Test::Cluster->new('node_a'); > +my $aconnstr = $node_a->connstr; > +$node_a->init(allows_streaming => 'logical'); > +$node_a->append_conf('postgresql.conf', 'autovacuum = off'); > +$node_a->start; > ``` > > I don't think new primary is not needed. Can't you reuse node_p? > > [1]: https://www.postgresql.org/message-id/OSCPR01MB14966D845AC77FC46ECEECCE4F5C72%40OSCPR01MB14966.jpnprd01.prod.outlook.com > The v10 patch at [1] required rebasing on the latest HEAD. I have prepared a rebased version and updated the patch accordingly. [1] - https://www.postgresql.org/message-id/CAHv8RjJtzXc4BWoKTyTB-9WEcAJ4N5qsD6YuXK3EAmsT6237yw%40mail.gmail.com The attached patch includes the suggested changes. Thanks and regards, Shubham Khanna.
Attachment
RE: Adding a '--clean-publisher-objects' option to 'pg_createsubscriber' utility.
From
"Hayato Kuroda (Fujitsu)"
Date:
Dear Shubham, I did a quick check and have a comment in drop_publication(). ``` + /* For cleanup purposes */ + dbinfos.dbinfo->made_publication = true; + ``` Please do not modify randomly. Currently, made_publication is set to false when the command fails. No need to set to true here - Please follow that. Also, please do not remove comments within the if statement. Best regards, Hayato Kuroda FUJITSU LIMITED
Re: Adding a '--clean-publisher-objects' option to 'pg_createsubscriber' utility.
From
Shubham Khanna
Date:
On Mon, Mar 3, 2025 at 5:59 PM Hayato Kuroda (Fujitsu) <kuroda.hayato@fujitsu.com> wrote: > > Dear Shubham, > > I did a quick check and have a comment in drop_publication(). > > ``` > + /* For cleanup purposes */ > + dbinfos.dbinfo->made_publication = true; > + > ``` > > Please do not modify randomly. Currently, made_publication is set to false when > the command fails. No need to set to true here - Please follow that. Also, please > do not remove comments within the if statement. > Fixed the suggested changes. The attached patch contains the required changes. Thanks and regards, Shubham Khanna.
Attachment
Re: Adding a '--clean-publisher-objects' option to 'pg_createsubscriber' utility.
From
Nisha Moond
Date:
On Tue, Mar 4, 2025 at 10:30 AM Shubham Khanna <khannashubham1197@gmail.com> wrote: > > Fixed the suggested changes. The attached patch contains the required changes. > Hi Shubham, Thanks for the patch! I tested its functionality and didn't find any issues so far. I'll continue with further testing. Here are some review comments for v12 patch: src/bin/pg_basebackup/pg_createsubscriber.c 1) + printf(_(" -c --cleanup-existing-publications\n" + " drop all publications on the subscriber\n")); Similar to docs, here too, we should clarify that publications will be dropped only from the specified databases, not all databases. Suggestion - "drop all publications from specified databases on the subscriber\n" 2) @@ -1171,10 +1179,12 @@ check_and_drop_existing_subscriptions(PGconn *conn, /* * Create the subscriptions, adjust the initial location for logical * replication and enable the subscriptions. That's the last step for logical - * replication setup. + * replication setup. If 'drop_publications' is true, existing publications on + * the subscriber will be dropped before creating new subscriptions. */ static void -setup_subscriber(struct LogicalRepInfo *dbinfo, const char *consistent_lsn) +setup_subscriber(struct LogicalRepInfo *dbinfo, const char *consistent_lsn, + bool cleanup_existing_publications) { It is not clear that the 'drop_publications' value comes from the command. How about replacing it with: /If 'drop_publications' is/If 'drop_publications' option is/ OR If you meant to refer to the specific parameter 'cleanup_existing_publications', please use the exact name. 3) @@ -1195,8 +1205,13 @@ setup_subscriber(struct LogicalRepInfo *dbinfo, const char *consistent_lsn) * Since the publication was created before the consistent LSN, it is * available on the subscriber when the physical replica is promoted. * Remove publications from the subscriber because it has no use. + * Additionally, drop publications existed before this command if + * requested. */ - drop_publication(conn, &dbinfo[i]); + if (cleanup_existing_publications) + check_and_drop_existing_publications(conn, dbinfo[i].dbname); + else + drop_publication_by_name(conn, dbinfo[i].pubname, dbinfo[i].dbname); The existing comment only refers to removing the new publication created for the current process and does not mention existing ones. With this patch changes, it is unclear which publications are being referenced when saying "Remove publications ..."(plural), and the phrase "before this command" is ambiguous—it's not clear which command is being referred to. The comment should be reworded for better clarity. Suggestion - "Since the publication was created before the consistent LSN, it remains on the subscriber even after the physical replica is promoted. Remove this publication from the subscriber because it has no use. Additionally, if requested, drop all pre-existing publications." 4) +/* Drop a single publication, given in dbinfo. */ +static void +drop_publication_by_name(PGconn *conn, const char *pubname, const char *dbname) +{ Since "dbinfo" is no longer passed to this function, the function comments should be updated accordingly. Also, use the same format as other function comments. Suggestion- /* * Drop the specified publication of the given database. */ 5) + pg_log_info("dropping publication %s in database \"%s\"", pubname, dbname); + Publication names should be enclosed in double quotes ("") in logs, as previously done. 6) + pg_log_error("could not drop publication %s in database \"%s\": %s", + pubname, dbname, PQresultErrorMessage(res)); Same as #5, use double quotes for the publication name. -- Thanks, Nisha
Re: Adding a '--clean-publisher-objects' option to 'pg_createsubscriber' utility.
From
Shubham Khanna
Date:
On Tue, Mar 4, 2025 at 4:31 PM Nisha Moond <nisha.moond412@gmail.com> wrote: > > On Tue, Mar 4, 2025 at 10:30 AM Shubham Khanna > <khannashubham1197@gmail.com> wrote: > > > > Fixed the suggested changes. The attached patch contains the required changes. > > > > Hi Shubham, > > Thanks for the patch! I tested its functionality and didn't find any > issues so far. I'll continue with further testing. > Here are some review comments for v12 patch: > > src/bin/pg_basebackup/pg_createsubscriber.c > > 1) > + printf(_(" -c --cleanup-existing-publications\n" > + " drop all publications on the > subscriber\n")); > > Similar to docs, here too, we should clarify that publications will be > dropped only from the specified databases, not all databases. > Suggestion - > "drop all publications from specified databases on the subscriber\n" > The suggested message was exceeding 95 words, so I have modified it to:- "drop all publications from specified subscriber databases\n" > 2) > @@ -1171,10 +1179,12 @@ check_and_drop_existing_subscriptions(PGconn *conn, > /* > * Create the subscriptions, adjust the initial location for logical > * replication and enable the subscriptions. That's the last step for logical > - * replication setup. > + * replication setup. If 'drop_publications' is true, existing publications on > + * the subscriber will be dropped before creating new subscriptions. > */ > static void > -setup_subscriber(struct LogicalRepInfo *dbinfo, const char *consistent_lsn) > +setup_subscriber(struct LogicalRepInfo *dbinfo, const char *consistent_lsn, > + bool cleanup_existing_publications) > { > > It is not clear that the 'drop_publications' value comes from the > command. How about replacing it with: > /If 'drop_publications' is/If 'drop_publications' option is/ > > OR > > If you meant to refer to the specific parameter > 'cleanup_existing_publications', please use the exact name. > Fixed. > 3) > @@ -1195,8 +1205,13 @@ setup_subscriber(struct LogicalRepInfo *dbinfo, > const char *consistent_lsn) > * Since the publication was created before the consistent LSN, it is > * available on the subscriber when the physical replica is promoted. > * Remove publications from the subscriber because it has no use. > + * Additionally, drop publications existed before this command if > + * requested. > */ > - drop_publication(conn, &dbinfo[i]); > + if (cleanup_existing_publications) > + check_and_drop_existing_publications(conn, dbinfo[i].dbname); > + else > + drop_publication_by_name(conn, dbinfo[i].pubname, dbinfo[i].dbname); > > The existing comment only refers to removing the new publication > created for the current process and does not mention existing ones. > With this patch changes, it is unclear which publications are being > referenced when saying "Remove publications ..."(plural), and the > phrase "before this command" is ambiguous—it's not clear which command > is being referred to. The comment should be reworded for better > clarity. > > Suggestion - > > "Since the publication was created before the consistent LSN, it > remains on the subscriber even after the physical replica is > promoted. Remove this publication from the subscriber because > it has no use. Additionally, if requested, drop all pre-existing > publications." > Fixed. > 4) > +/* Drop a single publication, given in dbinfo. */ > +static void > +drop_publication_by_name(PGconn *conn, const char *pubname, const char *dbname) > +{ > > Since "dbinfo" is no longer passed to this function, the function > comments should be updated accordingly. Also, use the same format as > other function comments. > Suggestion- > /* > * Drop the specified publication of the given database. > */ > Fixed. > 5) > + pg_log_info("dropping publication %s in database \"%s\"", pubname, dbname); > + > > Publication names should be enclosed in double quotes ("") in logs, as > previously done. > Fixed. > 6) > + pg_log_error("could not drop publication %s in database \"%s\": %s", > + pubname, dbname, PQresultErrorMessage(res)); > > Same as #5, use double quotes for the publication name. > > -- Fixed. The attached Patch contains the suggested changes. Thanks and regards, Shubham Khanna.
Attachment
Re: Adding a '--clean-publisher-objects' option to 'pg_createsubscriber' utility.
From
Nisha Moond
Date:
On Tue, Mar 4, 2025 at 8:05 PM Shubham Khanna <khannashubham1197@gmail.com> wrote: > > The attached Patch contains the suggested changes. > Hi Shubham, Here are few comments for 040_pg_createsubscriber.pl 1) +# Run pg_createsubscriber on node S using '--cleanup-existing-publications'. +# --verbose is used twice to show more information. 1a) /node S/ node S1 1b) Also, can we keep the comment in the same pattern as it was earlier - # Run pg_createsubscriber on node S1. --verbose is used twice # to show more information. # In passing, also test the --enable-two-phase and --cleanup-existing-publications # options. 2) +# Reuse P as primary +# Set up node S2 as standby linking to node P +$node_p->backup('backup_3'); /Reuse P as/ Reuse node P as/ 3) +$node_s2->append_conf( + 'postgresql.conf', qq[ + primary_conninfo = '$pconnstr' + hot_standby_feedback = on + max_logical_replication_workers = 5 + ]); Do we need "hot_standby_feedback = on" on node_s2? I think we can remove it. 4) +# Create user-defined publications +$node_p->safe_psql($db3, "CREATE PUBLICATION test_pub3 FOR ALL TABLES;"); +$node_p->safe_psql($db3, "CREATE PUBLICATION test_pub4 FOR ALL TABLES;"); Can create both publications under one safe_psql as - $node_p->safe_psql($db3, qq[CREATE PUBLICATION test_pub3 FOR ALL TABLES; CREATE PUBLICATION test_pub4 FOR ALL TABLES; ]); 5) +# Run pg_createsubscriber on node A without using +# '--cleanup-existing-publications'. +# --verbose is used twice to show more information. 5a) /node A/node S2/ 5b) /without using '--cleanup-existing-publications' / without '--cleanup-existing-publications' option 6) + ], + 'run pg_createsubscriber without --cleanup-existing-publications on node A' +); /node A/node S2/ 7) +# Drop the newly created publications +$node_p->safe_psql($db3, "DROP PUBLICATION IF EXISTS test_pub3;"); +$node_p->safe_psql($db3, "DROP PUBLICATION IF EXISTS test_pub4;"); Similar to #4, use single safe_psql to drop both the publications. OTOH, do we really need to drop the publications here? I think we can remove this part since we're performing cleanup right after. ~~~~ -- Thanks, Nisha
Re: Adding a '--clean-publisher-objects' option to 'pg_createsubscriber' utility.
From
Peter Smith
Date:
Hi Shubham. Some review comments for patch v13-0001. ====== GENERAL 1. --cleanup-existing-publications I've never liked this proposed switch name much. e.g. why say "cleanup" instead of "drop"? What is the difference? Saying drop is very explicit about what the switch will do. e.g. why say "existing"? It seems redundant; you can't cleanup/drop something that does not exist. My preference is one of: --drop-publications --drop-all-publications either of which seem nicely aligned with the descriptions in the usage and docs. Yeah, I know I have raised this same point before, but AFAICT the reply was like "will revise it in the next patch version", but that was many versions ago. I think it is important to settle the switch name earlier than later because there will be many tentacles into the code (vars, params, fields, comments) and docs if anything changes -- so it is not a decision you want to leave till the end because it could destabilise everything at the last minute. ====== Commit message 2. By default, publications are preserved to avoid unintended data loss. ~ Was there supposed to be a blank line before this text, or should this text be wrapped into the preceding paragraph? ====== src/bin/pg_basebackup/pg_createsubscriber.c setup_subscriber: 3. /* * Create the subscriptions, adjust the initial location for logical * replication and enable the subscriptions. That's the last step for logical - * replication setup. + * replication setup. If 'drop_publications' options is true, existing + * publications on the subscriber will be dropped before creating new + * subscriptions. */ There are multiple things amiss with this comment. - 'drop_publications' is not the parameter name - 'drop_publications' options [sic plural??]. It is not an option here; it is a parameter ~~~ check_and_drop_existing_publications: 4. /* - * Remove publication if it couldn't finish all steps. + * Check and drop existing publications on the subscriber if requested. */ There is no need to say "if requested.". It is akin to saying this function does XXX if this function is called. ~~~ drop_publication_by_name: 5. +/* Drop the specified publication of the given database. */ +static void +drop_publication_by_name(PGconn *conn, const char *pubname, const char *dbname) 5a. I think it is better to define this function before check_and_drop_existing_publications. YMMV. ~ 5b. IMO the parameters should be reordered (PGconn *conn, const char *dbname, const char *pubname). It seems more natural and would be consistent with check_and_drop_existing_publications. YMMV. ~~~ 6. - dbinfo->pubname, dbinfo->dbname, PQresultErrorMessage(res)); - dbinfo->made_publication = false; /* don't try again. */ + pubname, dbname, PQresultErrorMessage(res)); + dbinfos.dbinfo->made_publication = false; /* don't try again. */ Something about this flag assignment seems odd to me. IIUC 'made_publications' is for removing the cleanup_objects_atexit to be able to remove the special publication implicitly made by the pg_createsubscriber. But, AFAIK we can also get to this code via the --cleanup-existing-publication switch, so actually it might be one of the "existing" publication DROPS that has failed. If so, then the "don't try again comment" is misleading because it may not be that same publication "again" at all. ====== .../t/040_pg_createsubscriber.pl GENERAL. 7. Most of the changes to this test code are just changing node name S -> S1. It's not much to do with the patch other than tidying it in preparation for a new node S2. OTOH it makes the review far harder because there are so many changes. IMO all this S->S1 stuff should be done as a separate pre-requisite patch and then it will be far easier to see what changes are added for the --clean-existing-publications testing. ~~~ 8. # Set up node S as standby linking to node P $node_p->backup('backup_1'); -my $node_s = PostgreSQL::Test::Cluster->new('node_s'); -$node_s->init_from_backup($node_p, 'backup_1', has_streaming => 1); -$node_s->append_conf( +my $node_s1 = PostgreSQL::Test::Cluster->new('node_s1'); +$node_s1->init_from_backup($node_p, 'backup_1', has_streaming => 1); +$node_s1->append_conf( The comment should refer to S1, not S. ~~~ 9. # Set up node C as standby linking to node S -$node_s->backup('backup_2'); +$node_s1->backup('backup_2'); my $node_c = PostgreSQL::Test::Cluster->new('node_c'); -$node_c->init_from_backup($node_s, 'backup_2', has_streaming => 1); +$node_c->init_from_backup($node_s1, 'backup_2', has_streaming => 1); The comment should refer to S1, not S. ~~~ 10. # Check some unmet conditions on node S -$node_s->append_conf( +$node_s1->append_conf( The comment should refer to S1, not S. (note... there are lots of these. You should search/fix them all; these review comments might miss some) ~~~ 11. + '--socketdir' => $node_s1->host, + '--subscriber-port' => $node_s1->port, '--database' => $db1, '--database' => $db2, ], 'standby contains unmet conditions on node S'); The message should refer to S1, not S. (note... there are lots of these. You should search/fix them all; these review comments might miss some) ~~~ 12. # dry run mode on node S command_ok( @@ -338,10 +353,10 @@ command_ok( '--verbose', '--dry-run', '--recovery-timeout' => $PostgreSQL::Test::Utils::timeout_default, - '--pgdata' => $node_s->data_dir, + '--pgdata' => $node_s1->data_dir, '--publisher-server' => $node_p->connstr($db1), - '--socketdir' => $node_s->host, - '--subscriber-port' => $node_s->port, + '--socketdir' => $node_s1->host, + '--subscriber-port' => $node_s1->port, '--publication' => 'pub1', '--publication' => 'pub2', '--subscription' => 'sub1', @@ -352,10 +367,11 @@ command_ok( 'run pg_createsubscriber --dry-run on node S'); Comment and Message should refer to S1, not S. ~~~ 13. # Check if node S is still a standby -$node_s->start; -is($node_s->safe_psql('postgres', 'SELECT pg_catalog.pg_is_in_recovery()'), - 't', 'standby is in recovery'); -$node_s->stop; +$node_s1->start; +is( $node_s1->safe_psql('postgres', 'SELECT pg_catalog.pg_is_in_recovery()'), + 't', + 'standby is in recovery'); +$node_s1->stop; The comment should refer to S1, not S. ~~~ 14. -# Run pg_createsubscriber on node S. --verbose is used twice -# to show more information. +# Run pg_createsubscriber on node S using '--cleanup-existing-publications'. +# --verbose is used twice to show more information. # In passing, also test the --enable-two-phase option command_ok( [ 'pg_createsubscriber', '--verbose', '--verbose', '--recovery-timeout' => $PostgreSQL::Test::Utils::timeout_default, - '--pgdata' => $node_s->data_dir, + '--pgdata' => $node_s1->data_dir, '--publisher-server' => $node_p->connstr($db1), - '--socketdir' => $node_s->host, - '--subscriber-port' => $node_s->port, + '--socketdir' => $node_s1->host, + '--subscriber-port' => $node_s1->port, '--publication' => 'pub1', '--publication' => 'pub2', '--replication-slot' => 'replslot1', '--replication-slot' => 'replslot2', '--database' => $db1, '--database' => $db2, - '--enable-two-phase' + '--enable-two-phase', + '--cleanup-existing-publications', ], 'run pg_createsubscriber on node S'); Comment and Message should refer to S1, not S. ~~~ 15. +# Create user-defined publications +$node_p->safe_psql($db3, "CREATE PUBLICATION test_pub3 FOR ALL TABLES;"); +$node_p->safe_psql($db3, "CREATE PUBLICATION test_pub4 FOR ALL TABLES;"); Probably these can be combined. ~~~ 16. +# Drop the newly created publications +$node_p->safe_psql($db3, "DROP PUBLICATION IF EXISTS test_pub3;"); +$node_p->safe_psql($db3, "DROP PUBLICATION IF EXISTS test_pub4;"); + Probably these can be combined. ====== Kind Regards, Peter Smith. Fujitsu Australia
Re: Adding a '--clean-publisher-objects' option to 'pg_createsubscriber' utility.
From
Shubham Khanna
Date:
On Wed, Mar 5, 2025 at 3:55 PM Nisha Moond <nisha.moond412@gmail.com> wrote: > > On Tue, Mar 4, 2025 at 8:05 PM Shubham Khanna > <khannashubham1197@gmail.com> wrote: > > > > The attached Patch contains the suggested changes. > > > > Hi Shubham, > > Here are few comments for 040_pg_createsubscriber.pl > > 1) > +# Run pg_createsubscriber on node S using '--cleanup-existing-publications'. > +# --verbose is used twice to show more information. > > 1a) /node S/ node S1 > > 1b) Also, can we keep the comment in the same pattern as it was earlier - > # Run pg_createsubscriber on node S1. --verbose is used twice > # to show more information. > # In passing, also test the --enable-two-phase and > --cleanup-existing-publications > # options. > Fixed. > 2) > +# Reuse P as primary > +# Set up node S2 as standby linking to node P > +$node_p->backup('backup_3'); > > /Reuse P as/ Reuse node P as/ > Fixed. > 3) > +$node_s2->append_conf( > + 'postgresql.conf', qq[ > + primary_conninfo = '$pconnstr' > + hot_standby_feedback = on > + max_logical_replication_workers = 5 > + ]); > > Do we need "hot_standby_feedback = on" on node_s2? I think we can remove it. > Removed and updated the configurations. > 4) > +# Create user-defined publications > +$node_p->safe_psql($db3, "CREATE PUBLICATION test_pub3 FOR ALL TABLES;"); > +$node_p->safe_psql($db3, "CREATE PUBLICATION test_pub4 FOR ALL TABLES;"); > > Can create both publications under one safe_psql as - > > $node_p->safe_psql($db3, qq[CREATE PUBLICATION test_pub3 FOR ALL TABLES; > CREATE PUBLICATION test_pub4 FOR ALL TABLES; > ]); > Fixed. > 5) > +# Run pg_createsubscriber on node A without using > +# '--cleanup-existing-publications'. > +# --verbose is used twice to show more information. > > 5a) /node A/node S2/ > 5b) /without using '--cleanup-existing-publications' / without > '--cleanup-existing-publications' option > Fixed. > 6) > + ], > + 'run pg_createsubscriber without --cleanup-existing-publications on node A' > +); > > /node A/node S2/ > Fixed. > 7) > +# Drop the newly created publications > +$node_p->safe_psql($db3, "DROP PUBLICATION IF EXISTS test_pub3;"); > +$node_p->safe_psql($db3, "DROP PUBLICATION IF EXISTS test_pub4;"); > > Similar to #4, use single safe_psql to drop both the publications. > OTOH, do we really need to drop the publications here? I think we can > remove this part since we're performing cleanup right after. > ~~~~ > > -- Fixed. The attached patch contains the suggested changes. Thanks and regards, Shubham Khanna.
Attachment
Re: Adding a '--clean-publisher-objects' option to 'pg_createsubscriber' utility.
From
Shubham Khanna
Date:
On Thu, Mar 6, 2025 at 9:27 AM Peter Smith <smithpb2250@gmail.com> wrote: > > Hi Shubham. > > Some review comments for patch v13-0001. > > ====== > GENERAL > > 1. > --cleanup-existing-publications > > I've never liked this proposed switch name much. > > e.g. why say "cleanup" instead of "drop"? What is the difference? > Saying drop is very explicit about what the switch will do. > e.g. why say "existing"? It seems redundant; you can't cleanup/drop > something that does not exist. > > My preference is one of: > --drop-publications > --drop-all-publications > > either of which seem nicely aligned with the descriptions in the usage and docs. > > Yeah, I know I have raised this same point before, but AFAICT the > reply was like "will revise it in the next patch version", but that > was many versions ago. I think it is important to settle the switch > name earlier than later because there will be many tentacles into the > code (vars, params, fields, comments) and docs if anything changes -- > so it is not a decision you want to leave till the end because it > could destabilise everything at the last minute. > I have updated the option name to '--drop-all-publications'. > ====== > Commit message > > 2. > By default, publications are preserved to avoid unintended data loss. > > ~ > > Was there supposed to be a blank line before this text, or should this > text be wrapped into the preceding paragraph? > Fixed. > ====== > src/bin/pg_basebackup/pg_createsubscriber.c > > setup_subscriber: > > 3. > /* > * Create the subscriptions, adjust the initial location for logical > * replication and enable the subscriptions. That's the last step for logical > - * replication setup. > + * replication setup. If 'drop_publications' options is true, existing > + * publications on the subscriber will be dropped before creating new > + * subscriptions. > */ > > There are multiple things amiss with this comment. > - 'drop_publications' is not the parameter name > - 'drop_publications' options [sic plural??]. It is not an option > here; it is a parameter > Fixed. > ~~~ > > check_and_drop_existing_publications: > > 4. > /* > - * Remove publication if it couldn't finish all steps. > + * Check and drop existing publications on the subscriber if requested. > */ > > There is no need to say "if requested.". It is akin to saying this > function does XXX if this function is called. > Fixed. > ~~~ > > drop_publication_by_name: > > 5. > +/* Drop the specified publication of the given database. */ > +static void > +drop_publication_by_name(PGconn *conn, const char *pubname, const char *dbname) > > 5a. > I think it is better to define this function before > check_and_drop_existing_publications. YMMV. > > ~ > > 5b. > IMO the parameters should be reordered (PGconn *conn, const char > *dbname, const char *pubname). It seems more natural and would be > consistent with check_and_drop_existing_publications. YMMV. > > ~~~ Fixed. > > 6. > - dbinfo->pubname, dbinfo->dbname, PQresultErrorMessage(res)); > - dbinfo->made_publication = false; /* don't try again. */ > + pubname, dbname, PQresultErrorMessage(res)); > + dbinfos.dbinfo->made_publication = false; /* don't try again. */ > > Something about this flag assignment seems odd to me. IIUC > 'made_publications' is for removing the cleanup_objects_atexit to be > able to remove the special publication implicitly made by the > pg_createsubscriber. But, AFAIK we can also get to this code via the > --cleanup-existing-publication switch, so actually it might be one of > the "existing" publication DROPS that has failed. If so, then the > "don't try again comment" is misleading because it may not be that > same publication "again" at all. > I agree with your point. The current comment is misleading, as the failure could be related to an existing publication drop via --cleanup-existing-publications now --drop-all-publications, not just the publication created by pg_createsubscriber. This issue is still open, and I will address it in the next version of the patch. > ====== > .../t/040_pg_createsubscriber.pl > > GENERAL. > > 7. > Most of the changes to this test code are just changing node name S -> > S1. It's not much to do with the patch other than tidying it in > preparation for a new node S2. OTOH it makes the review far harder > because there are so many changes. IMO all this S->S1 stuff should be > done as a separate pre-requisite patch and then it will be far easier > to see what changes are added for the --clean-existing-publications > testing. > > ~~~ > I understand your concern. Since I am using two nodes (node_s1 and node_s2), I will work on creating a separate prerequisite patch for renaming S -> S1. This should make it easier to review the actual changes related to --cleanup-existing-publications now --drop-all-publications testing. > 8. > # Set up node S as standby linking to node P > $node_p->backup('backup_1'); > -my $node_s = PostgreSQL::Test::Cluster->new('node_s'); > -$node_s->init_from_backup($node_p, 'backup_1', has_streaming => 1); > -$node_s->append_conf( > +my $node_s1 = PostgreSQL::Test::Cluster->new('node_s1'); > +$node_s1->init_from_backup($node_p, 'backup_1', has_streaming => 1); > +$node_s1->append_conf( > > The comment should refer to S1, not S. > > ~~~ > Fixed. > 9. > # Set up node C as standby linking to node S > -$node_s->backup('backup_2'); > +$node_s1->backup('backup_2'); > my $node_c = PostgreSQL::Test::Cluster->new('node_c'); > -$node_c->init_from_backup($node_s, 'backup_2', has_streaming => 1); > +$node_c->init_from_backup($node_s1, 'backup_2', has_streaming => 1); > > The comment should refer to S1, not S. > > ~~~ > Fixed. > 10. > # Check some unmet conditions on node S > -$node_s->append_conf( > +$node_s1->append_conf( > > The comment should refer to S1, not S. > > (note... there are lots of these. You should search/fix them all; > these review comments might miss some) > > ~~~ > Fixed. > 11. > + '--socketdir' => $node_s1->host, > + '--subscriber-port' => $node_s1->port, > '--database' => $db1, > '--database' => $db2, > ], > 'standby contains unmet conditions on node S'); > > The message should refer to S1, not S. > > (note... there are lots of these. You should search/fix them all; > these review comments might miss some) > > ~~~ > Fixed. > 12. > # dry run mode on node S > command_ok( > @@ -338,10 +353,10 @@ command_ok( > '--verbose', > '--dry-run', > '--recovery-timeout' => $PostgreSQL::Test::Utils::timeout_default, > - '--pgdata' => $node_s->data_dir, > + '--pgdata' => $node_s1->data_dir, > '--publisher-server' => $node_p->connstr($db1), > - '--socketdir' => $node_s->host, > - '--subscriber-port' => $node_s->port, > + '--socketdir' => $node_s1->host, > + '--subscriber-port' => $node_s1->port, > '--publication' => 'pub1', > '--publication' => 'pub2', > '--subscription' => 'sub1', > @@ -352,10 +367,11 @@ command_ok( > 'run pg_createsubscriber --dry-run on node S'); > Comment and Message should refer to S1, not S. > > ~~~ Fixed. > > 13. > # Check if node S is still a standby > -$node_s->start; > -is($node_s->safe_psql('postgres', 'SELECT pg_catalog.pg_is_in_recovery()'), > - 't', 'standby is in recovery'); > -$node_s->stop; > +$node_s1->start; > +is( $node_s1->safe_psql('postgres', 'SELECT pg_catalog.pg_is_in_recovery()'), > + 't', > + 'standby is in recovery'); > +$node_s1->stop; > > The comment should refer to S1, not S. > > ~~~ Fixed. > > 14. > -# Run pg_createsubscriber on node S. --verbose is used twice > -# to show more information. > +# Run pg_createsubscriber on node S using '--cleanup-existing-publications'. > +# --verbose is used twice to show more information. > # In passing, also test the --enable-two-phase option > command_ok( > [ > 'pg_createsubscriber', > '--verbose', '--verbose', > '--recovery-timeout' => $PostgreSQL::Test::Utils::timeout_default, > - '--pgdata' => $node_s->data_dir, > + '--pgdata' => $node_s1->data_dir, > '--publisher-server' => $node_p->connstr($db1), > - '--socketdir' => $node_s->host, > - '--subscriber-port' => $node_s->port, > + '--socketdir' => $node_s1->host, > + '--subscriber-port' => $node_s1->port, > '--publication' => 'pub1', > '--publication' => 'pub2', > '--replication-slot' => 'replslot1', > '--replication-slot' => 'replslot2', > '--database' => $db1, > '--database' => $db2, > - '--enable-two-phase' > + '--enable-two-phase', > + '--cleanup-existing-publications', > ], > 'run pg_createsubscriber on node S'); > > Comment and Message should refer to S1, not S. > > ~~~ > Fixed. > 15. > +# Create user-defined publications > +$node_p->safe_psql($db3, "CREATE PUBLICATION test_pub3 FOR ALL TABLES;"); > +$node_p->safe_psql($db3, "CREATE PUBLICATION test_pub4 FOR ALL TABLES;"); > > Probably these can be combined. > > ~~~ > Fixed. > 16. > +# Drop the newly created publications > +$node_p->safe_psql($db3, "DROP PUBLICATION IF EXISTS test_pub3;"); > +$node_p->safe_psql($db3, "DROP PUBLICATION IF EXISTS test_pub4;"); > + > > Probably these can be combined. > Fixed. The attached patch at [1] contains the required changes. [1] - https://www.postgresql.org/message-id/CAHv8RjLCZyzFMGR8SBdvSocRZGAAr_eRd4ht48kXCp9oEaKQeQ%40mail.gmail.com Thanks and regards, Shubham Khanna.
Re: Adding a '--clean-publisher-objects' option to 'pg_createsubscriber' utility.
From
Nisha Moond
Date:
On Mon, Mar 10, 2025 at 12:11 PM Shubham Khanna <khannashubham1197@gmail.com> wrote: > > On Thu, Mar 6, 2025 at 9:27 AM Peter Smith <smithpb2250@gmail.com> wrote: > > > > 6. > > - dbinfo->pubname, dbinfo->dbname, PQresultErrorMessage(res)); > > - dbinfo->made_publication = false; /* don't try again. */ > > + pubname, dbname, PQresultErrorMessage(res)); > > + dbinfos.dbinfo->made_publication = false; /* don't try again. */ > > > > Something about this flag assignment seems odd to me. IIUC > > 'made_publications' is for removing the cleanup_objects_atexit to be > > able to remove the special publication implicitly made by the > > pg_createsubscriber. But, AFAIK we can also get to this code via the > > --cleanup-existing-publication switch, so actually it might be one of > > the "existing" publication DROPS that has failed. If so, then the > > "don't try again comment" is misleading because it may not be that > > same publication "again" at all. > > > > I agree with your point. The current comment is misleading, as the > failure could be related to an existing publication drop via > --cleanup-existing-publications now --drop-all-publications, not just > the publication created by pg_createsubscriber. > This issue is still open, and I will address it in the next version of > the patch. > 1) We should set "made_publication = false" only if the failure occurs for the special publication implicitly created by pg_createsubscriber. If the error is in any other publication, this special publication should still be cleaned up by the cleanup_objects_atexit. 2) This part of code has another bug: "dbinfos.dbinfo->made_publication = false;" incorrectly modifies dbinfo[0], even if the failure occurs in other databases (dbinfo[1], etc.). Since cleanup_objects_atexit() checks made_publication per database, this could lead to incorrect behavior. Solution: Pass the full 'dbinfo' structure to drop_publication_by_name() , and use it accordingly. -- Thanks, Nisha
Re: Adding a '--clean-publisher-objects' option to 'pg_createsubscriber' utility.
From
Shubham Khanna
Date:
On Mon, Mar 10, 2025 at 3:09 PM Nisha Moond <nisha.moond412@gmail.com> wrote: > > On Mon, Mar 10, 2025 at 12:11 PM Shubham Khanna > <khannashubham1197@gmail.com> wrote: > > > > On Thu, Mar 6, 2025 at 9:27 AM Peter Smith <smithpb2250@gmail.com> wrote: > > > > > > 6. > > > - dbinfo->pubname, dbinfo->dbname, PQresultErrorMessage(res)); > > > - dbinfo->made_publication = false; /* don't try again. */ > > > + pubname, dbname, PQresultErrorMessage(res)); > > > + dbinfos.dbinfo->made_publication = false; /* don't try again. */ > > > > > > Something about this flag assignment seems odd to me. IIUC > > > 'made_publications' is for removing the cleanup_objects_atexit to be > > > able to remove the special publication implicitly made by the > > > pg_createsubscriber. But, AFAIK we can also get to this code via the > > > --cleanup-existing-publication switch, so actually it might be one of > > > the "existing" publication DROPS that has failed. If so, then the > > > "don't try again comment" is misleading because it may not be that > > > same publication "again" at all. > > > > > > > I agree with your point. The current comment is misleading, as the > > failure could be related to an existing publication drop via > > --cleanup-existing-publications now --drop-all-publications, not just > > the publication created by pg_createsubscriber. > > This issue is still open, and I will address it in the next version of > > the patch. > > > > 1) We should set "made_publication = false" only if the failure occurs > for the special publication implicitly created by pg_createsubscriber. > If the error is in any other publication, this special publication > should still be cleaned up by the cleanup_objects_atexit. > Fixed. > 2) This part of code has another bug: > "dbinfos.dbinfo->made_publication = false;" incorrectly modifies > dbinfo[0], even if the failure occurs in other databases (dbinfo[1], > etc.). Since cleanup_objects_atexit() checks made_publication per > database, this could lead to incorrect behavior. > Solution: Pass the full 'dbinfo' structure to > drop_publication_by_name() , and use it accordingly. > > -- Fixed. The attached patch contains the suggested changes. Thanks and regards, Shubham Khanna.
Attachment
Re: Adding a '--clean-publisher-objects' option to 'pg_createsubscriber' utility.
From
Nisha Moond
Date:
On Mon, Mar 10, 2025 at 4:31 PM Shubham Khanna <khannashubham1197@gmail.com> wrote: > > > 2) This part of code has another bug: > > "dbinfos.dbinfo->made_publication = false;" incorrectly modifies > > dbinfo[0], even if the failure occurs in other databases (dbinfo[1], > > etc.). Since cleanup_objects_atexit() checks made_publication per > > database, this could lead to incorrect behavior. > > Solution: Pass the full 'dbinfo' structure to > > drop_publication_by_name() , and use it accordingly. > > > > -- > > Fixed. > > The attached patch contains the suggested changes. > Thanks for the patch. Here are few comments for the new change - 1) +static void check_and_drop_existing_publications(PGconn *conn, struct LogicalRepInfo *dbinfo); - move the second parameter to the next line to maintain 80 char length. 2) pg_log_error("could not drop publication \"%s\" in database \"%s\": %s", - dbinfo->pubname, dbinfo->dbname, PQresultErrorMessage(res)); + pubname, dbinfo->dbname, + PQresultErrorMessage(res)); You can keep "PQresultErrorMessage(res));" in the previous line, it will still be < 80 chars. 3) The new IF check - + if (strcmp(pubname, dbinfos.dbinfo->pubname) == 0) + dbinfo->made_publication = false; /* don't try again. */ always compares dbinfo[0]->pubname, but it should compare 'pubname' with the respective database's publication. Please correct it as - if (strcmp(pubname, dbinfo->pubname) == 0) -- Thanks, Nisha
Re: Adding a '--clean-publisher-objects' option to 'pg_createsubscriber' utility.
From
Shubham Khanna
Date:
On Mon, Mar 10, 2025 at 5:02 PM Nisha Moond <nisha.moond412@gmail.com> wrote: > > On Mon, Mar 10, 2025 at 4:31 PM Shubham Khanna > <khannashubham1197@gmail.com> wrote: > > > > > 2) This part of code has another bug: > > > "dbinfos.dbinfo->made_publication = false;" incorrectly modifies > > > dbinfo[0], even if the failure occurs in other databases (dbinfo[1], > > > etc.). Since cleanup_objects_atexit() checks made_publication per > > > database, this could lead to incorrect behavior. > > > Solution: Pass the full 'dbinfo' structure to > > > drop_publication_by_name() , and use it accordingly. > > > > > > -- > > > > Fixed. > > > > The attached patch contains the suggested changes. > > > > Thanks for the patch. Here are few comments for the new change - > 1) > +static void check_and_drop_existing_publications(PGconn *conn, struct > LogicalRepInfo *dbinfo); > - move the second parameter to the next line to maintain 80 char length. > Fixed. > 2) > pg_log_error("could not drop publication \"%s\" in database \"%s\": %s", > - dbinfo->pubname, dbinfo->dbname, PQresultErrorMessage(res)); > + pubname, dbinfo->dbname, > + PQresultErrorMessage(res)); > > You can keep "PQresultErrorMessage(res));" in the previous line, it > will still be < 80 chars. > Fixed. > 3) The new IF check - > + if (strcmp(pubname, dbinfos.dbinfo->pubname) == 0) > + dbinfo->made_publication = false; /* don't try again. */ > > always compares dbinfo[0]->pubname, but it should compare 'pubname' > with the respective database's publication. Please correct it as - > if (strcmp(pubname, dbinfo->pubname) == 0) > > > -- Fixed. The attached patch contains the suggested changes. Thanks and regards, Shubham Khanna.
Attachment
Re: Adding a '--clean-publisher-objects' option to 'pg_createsubscriber' utility.
From
Peter Smith
Date:
Hi Shubham. Some review comments for patch v16-0001. ====== doc/src/sgml/ref/pg_createsubscriber.sgml 1. + <term><option>-c</option></term> + <term><option>--drop-all-publications</option></term> Is 'c' the best switch choice letter for this option? It doesn't seem intuitive, but unfortunately, I don't have any better ideas since d/D and p/P are already being used. ====== src/bin/pg_basebackup/pg_createsubscriber.c CreateSubscriberOptions: 2. SimpleStringList sub_names; /* list of subscription names */ SimpleStringList replslot_names; /* list of replication slot names */ int recovery_timeout; /* stop recovery after this time */ + bool drop_publications; /* drop all publications */ Now that you have modified everywhere to rename the parameter as 'drop_all_publications', I think you should change this field name too so everything is consistent. ~~~ 3. static void create_publication(PGconn *conn, struct LogicalRepInfo *dbinfo); -static void drop_publication(PGconn *conn, struct LogicalRepInfo *dbinfo); +static void drop_publication_by_name(PGconn *conn, const char *pubname, + struct LogicalRepInfo *dbinfo); +static void check_and_drop_existing_publications(PGconn *conn, + struct LogicalRepInfo *dbinfo); I think the parameters of drop_publication_by_name should be reordered like "(PGconn *conn, struct LogicalRepInfo *dbinfo, const char *pubname)", to make the 1st 2 params consistent with other functions. I wrote this already in my previous review. You replied "Fixed" [1-#5b], but it isn't. ~~~ setup_subscriber: 4. /* * Create the subscriptions, adjust the initial location for logical * replication and enable the subscriptions. That's the last step for logical - * replication setup. + * replication setup. If 'drop_publications' parameter is true, existing + * publications on the subscriber will be dropped before creating new + * subscriptions. */ The comment refers to 'drop_publications', but the parameter name is 'drop_all_publications' ~~~ 5. if (PQresultStatus(res) != PGRES_COMMAND_OK) { pg_log_error("could not drop publication \"%s\" in database \"%s\": %s", - dbinfo->pubname, dbinfo->dbname, PQresultErrorMessage(res)); - dbinfo->made_publication = false; /* don't try again. */ + pubname, dbinfo->dbname, PQresultErrorMessage(res)); + + if (strcmp(pubname, dbinfo->pubname) == 0) + dbinfo->made_publication = false; /* don't try again. */ 5a. IMO this strcmp code needs a comment to say what this logic is doing. e.g. IIUC dbinfo->pubname is the name of the internal FOR ALL TABLES publication that the tool had created. ~ 5b. This logic seems flawed to me. Maybe I am mistaken, but AFAIK when you say '--drop-all-publication' that is going to drop all the existing publications but it is *also* going to drop the internally created FOR ALL TABLES publication. Therefore, to prevent the exit handler from attempting to double-delete the same internal publication don't you need to set dbinfo->made_publication = false also in the case of *successful* drop of dbinfo->pubname? ~ 5c. Consider maybe also checking if made_publication is true, to avoid making unnecessary calls to strcmp e.g. if (dbinfo->made_publication && strcmp(pubname, dbinfo->pubname) == 0) ====== .../t/040_pg_createsubscriber.pl 6. The test code remains difficult to review because I can't see the forest for the trees due to the dozens of S->S1 node name changes. These name changes are unrelated to the new feature so please separate them into a different prerequisite patch so we can just focus on what changes were made just for --drop-all-publications. I know you already said you are working on it [1-#7], but multiple patch versions have been posted since you said that. ====== [1] v13 review https://www.postgresql.org/message-id/CAHv8RjKmf5bAcgTmGToWSn_Fx%2BO2y8qCiLScBXvBTD0D5gX2sw%40mail.gmail.com Kind Regards, Peter Smith. Fujitsu Australia
Re: Adding a '--clean-publisher-objects' option to 'pg_createsubscriber' utility.
From
"David G. Johnston"
Date:
On Mon, Mar 10, 2025 at 5:00 PM Peter Smith <smithpb2250@gmail.com> wrote:
Hi Shubham.
Some review comments for patch v16-0001.
======
doc/src/sgml/ref/pg_createsubscriber.sgml
1.
+ <term><option>-c</option></term>
+ <term><option>--drop-all-publications</option></term>
Is 'c' the best switch choice letter for this option? It doesn't seem
intuitive, but unfortunately, I don't have any better ideas since d/D
and p/P are already being used.
Agreed. Better to just not assign a short name to this.
The description for the sgml docs needs to frame up this option's purpose.
How exactly does one go about backing up a publication? You discuss the topic in the commit message but that definitely seems user-facing.
If we aren't expecting lots of publications maybe name them individually instead of logging "all publications"? Possibly one info line each but even just comma separated would work.
The name of this is shock-inducing. Admittedly, after pondering things, it is fairly obvious that only the target is going to be affected, but there is a source database involved here as well. It is also unclear on whether it would happen before or after, which is less problematic since it would only impact failure modes anyway - when all is said and done with this specified upon restart following the pg_resetwal the server will have no publications, right?
Maybe: --drop-target-publications-first ?
If we do want a letter either "X" or "Z" probably works for an English-speaking audience probably. X is how one denotes removing something; and Z is a mnemonic for "Zap" which is a synonym for "Drop". "R" for "Remove".
Can you briefly recap how this is different than the automatic behavior described in the existing Step 6?
"Drop publications on the target server that were replicated because they were created before the replication start location. It has no use on the subscriber."
-R --remove-target-publications; I fine with answering the timing question to the description.
David J.
Re: Adding a '--clean-publisher-objects' option to 'pg_createsubscriber' utility.
From
Peter Smith
Date:
On Tue, Mar 11, 2025 at 11:32 AM David G. Johnston <david.g.johnston@gmail.com> wrote: > > On Mon, Mar 10, 2025 at 5:00 PM Peter Smith <smithpb2250@gmail.com> wrote: >> >> Hi Shubham. >> >> Some review comments for patch v16-0001. >> >> ====== >> doc/src/sgml/ref/pg_createsubscriber.sgml >> >> 1. >> + <term><option>-c</option></term> >> + <term><option>--drop-all-publications</option></term> >> >> Is 'c' the best switch choice letter for this option? It doesn't seem >> intuitive, but unfortunately, I don't have any better ideas since d/D >> and p/P are already being used. > > > Agreed. Better to just not assign a short name to this. > > The description for the sgml docs needs to frame up this option's purpose. > > How exactly does one go about backing up a publication? You discuss the topic in the commit message but that definitelyseems user-facing. > > If we aren't expecting lots of publications maybe name them individually instead of logging "all publications"? Possiblyone info line each but even just comma separated would work. > > The name of this is shock-inducing. Admittedly, after pondering things, it is fairly obvious that only the target is goingto be affected, but there is a source database involved here as well. It is also unclear on whether it would happenbefore or after, which is less problematic since it would only impact failure modes anyway - when all is said and donewith this specified upon restart following the pg_resetwal the server will have no publications, right? > > Maybe: --drop-target-publications-first ? > > If we do want a letter either "X" or "Z" probably works for an English-speaking audience probably. X is how one denotesremoving something; and Z is a mnemonic for "Zap" which is a synonym for "Drop". "R" for "Remove". > > Can you briefly recap how this is different than the automatic behavior described in the existing Step 6? > "Drop publications on the target server that were replicated because they were created before the replication start location.It has no use on the subscriber." > > -R --remove-target-publications; I fine with answering the timing question to the description. > Hi David. This feature is all about removing some objects that were replicated during the streaming replication before pg_createsubscriber was run, but that are no longer needed/wanted afterwards. Although the current thread/patch is just for removing existing publications from the target server, in the future there might be some enhancements to remove other kinds of unwanted objects that were replicated to the target server by streaming replication -- e.g. remove subscriptions, or remove replication slots, or whatever. Unfortunately, we are spinning in circles a bit trying to come up with a good way to represent the option needed for this, while at the same time trying to be future-proof. I see 3 choices... ====== Choice 1. Generic option Implement a single boolean option to remove everything. In this case, the option would need to have some generic name like "--remove-target-existing-object". For now (current patch), it would be implemented to remove only existing publications. In the future, it could remove more things. But, there isn't much flexibility: IIUC this option can only remove either all or none. ~ Choice 2. Different options for removing different things (the current patch is like this) "--remove-target-publications" is what this patch is doing. In future, if we want to remove more things then we would need to add more options like "--remove-target-subscriptions", "--remove-target-replication-slots" etc. ~ Choice 3. Implement some option that has an argument saying what to delete Implement an option that takes some argument saying what objects to remove. Here, the current patch would be something like: "--remove-target-objects=publications". In future, this option could be enhanced to accept other values like "--remove-target-objects=publications,subscriptions" etc. ~~~ Do you have any thoughts on what kind of option is best here? ====== Kind Regards, Peter Smith. Fujitsu Australia
Re: Adding a '--clean-publisher-objects' option to 'pg_createsubscriber' utility.
From
"David G. Johnston"
Date:
On Tue, Mar 11, 2025 at 12:20 AM Peter Smith <smithpb2250@gmail.com> wrote:
Unfortunately, we are spinning in circles a bit trying to come up with
a good way to represent the option needed for this, while at the same
time trying to be future-proof. I see 3 choices...
======
Choice 1. Generic option
Implement a single boolean option to remove everything.
Do you have any thoughts on what kind of option is best here?
Option 1 by far. Though personally I'd rather do something like what pg_upgrade does and output a script with drop commands that can be executed via psql instead of having pg_createsubscriber execute said commands. I'd call it "pruning the target".
Any automated bash script I'd write would just do:
pg_createsubscriber ... --prune-target-script=prune.sql
psql --file prune.sql
And if I'm working interactively I'd want to spend the extra minute or so reviewing the commands in prune.sql to make sure I know what is going away. I can also edit said file, or use something like grep, if I want to limit what is dropped.
In particular any database that isn't turned into a logical replica would be added to this list; in addition to all the publication/subscription stuff that is likely broken at this point.
David J.
Re: Adding a '--clean-publisher-objects' option to 'pg_createsubscriber' utility.
From
Amit Kapila
Date:
On Tue, Mar 11, 2025 at 8:21 PM David G. Johnston <david.g.johnston@gmail.com> wrote: > > On Tue, Mar 11, 2025 at 12:20 AM Peter Smith <smithpb2250@gmail.com> wrote: >> >> >> Unfortunately, we are spinning in circles a bit trying to come up with >> a good way to represent the option needed for this, while at the same >> time trying to be future-proof. I see 3 choices... >> >> ====== >> >> Choice 1. Generic option >> >> Implement a single boolean option to remove everything. >> >> >> Do you have any thoughts on what kind of option is best here? >> > > Option 1 by far. > Among the current discussed options, I would prefer "--remove-all-publications", or "--drop-all-publications" without any shorthand notation. We don't need to add target as we already have options like "--socketdir", or "--config-file" that apply to the target server and are explained in the text. It is better to follow the same to keep the consistency. Also, the new subscriber can act as a publisher, so it won't be a good idea to remove the existing copied publications by default. Similarly, users may want to retain copied subscriptions if they want the new target server to keep receiving the changes from previous publishers. I feel giving individual options and then a remove-all-objects that pre-exist on the target server would be a good direction to improve this tool. > Though personally I'd rather do something like what pg_upgrade does and output a script with drop commands that can be executed via psql instead of having pg_createsubscriber execute said commands. I'd call it "pruning the target". > Yes, this is a good idea, and IIRC, we discussed in brief on this in the original thread where we developed this tool but left it as a future improvement. We should definitely go in this direction in the future, but let's improve this tool incrementally, otherwise, there is a risk of making no improvements. -- With Regards, Amit Kapila.
Re: Adding a '--clean-publisher-objects' option to 'pg_createsubscriber' utility.
From
Amit Kapila
Date:
On Tue, Mar 11, 2025 at 6:02 AM David G. Johnston <david.g.johnston@gmail.com> wrote: > > Can you briefly recap how this is different than the automatic behavior described in the existing Step 6? > "Drop publications on the target server that were replicated because they were created before the replication start location.It has no use on the subscriber." > These are the publications this tool has created on publisher to make standby a subscriber, but they have replicated to standby before the standby is converted to subscriber. As per my understanding, the patch gives an option to remove publications that are pre-existing (aka not created by this tool). -- With Regards, Amit Kapila.
Re: Adding a '--clean-publisher-objects' option to 'pg_createsubscriber' utility.
From
"David G. Johnston"
Date:
On Tuesday, March 11, 2025, Amit Kapila <amit.kapila16@gmail.com> wrote:
On Tue, Mar 11, 2025 at 8:21 PM David G. Johnston
<david.g.johnston@gmail.com> wrote:
>
> On Tue, Mar 11, 2025 at 12:20 AM Peter Smith <smithpb2250@gmail.com> wrote:
>>
>>
>> Unfortunately, we are spinning in circles a bit trying to come up with
>> a good way to represent the option needed for this, while at the same
>> time trying to be future-proof. I see 3 choices...
>>
>> ======
>>
>> Choice 1. Generic option
>>
>> Implement a single boolean option to remove everything.
>>
>>
>> Do you have any thoughts on what kind of option is best here?
>>
>
> Option 1 by far.
>
>
Though personally I'd rather do something like what pg_upgrade does
and output a script with drop commands that can be executed via psql
instead of having pg_createsubscriber execute said commands. I'd call
it "pruning the target".
>
Yes, this is a good idea, and IIRC, we discussed in brief on this in
the original thread where we developed this tool but left it as a
future improvement. We should definitely go in this direction in the
future, but let's improve this tool incrementally, otherwise, there is
a risk of making no improvements.
I’m against a UX that drops database objects selected by algorithm without informing the user what those objects are and giving them a chance to abort. The output file method makes that very simple to implement. I think an interactive command to accomplish that requirement would put commit more at risk. And a constant increase of options for each object type seems annoying given the alternative of simply doing them all as we get around to them and letting the file and the user remove the ones they wish to keep.
I’m honestly question how much value this provides - no improvement here seems like a viable choice. Let them issue the drop commands they desire manually. This doesn’t feel like something one should/would automate.
David J.
Re: Adding a '--clean-publisher-objects' option to 'pg_createsubscriber' utility.
From
"David G. Johnston"
Date:
On Tuesday, March 11, 2025, Amit Kapila <amit.kapila16@gmail.com> wrote:
On Tue, Mar 11, 2025 at 6:02 AM David G. Johnston
<david.g.johnston@gmail.com> wrote:
>
> Can you briefly recap how this is different than the automatic behavior described in the existing Step 6?
> "Drop publications on the target server that were replicated because they were created before the replication start location. It has no use on the subscriber."
>
These are the publications this tool has created on publisher to make
standby a subscriber, but they have replicated to standby before the
standby is converted to subscriber. As per my understanding, the patch
gives an option to remove publications that are pre-existing (aka not
created by this tool).
Thank you. I’ll factor that into my suggested changes patch for the reference page.
David J.
Re: Adding a '--clean-publisher-objects' option to 'pg_createsubscriber' utility.
From
Amit Kapila
Date:
On Wed, Mar 12, 2025 at 9:43 AM David G. Johnston <david.g.johnston@gmail.com> wrote: > > I’m honestly question how much value this provides - no improvement here seems like a viable choice. Let them issue thedrop commands they desire manually. This doesn’t feel like something one should/would automate. > There is a good chance of mistakenly dropping the required objects manually after the subscriber is converted. One can mistakenly drop the required publication unless they have some strict rule to do it immediately after the tool has converted standby to subscriber. Providing the commands via file is a way, but as a user, I would prefer to get the things done automatically via a tool instead of me doing it afterward unless there is no other way.. -- With Regards, Amit Kapila.
Adding a '--clean-publisher-objects' option to 'pg_createsubscriber' utility.
From
"David G. Johnston"
Date:
On Tuesday, March 11, 2025, Amit Kapila <amit.kapila16@gmail.com> wrote:
On Wed, Mar 12, 2025 at 9:43 AM David G. Johnston
<david.g.johnston@gmail.com> wrote:
>
> I’m honestly question how much value this provides - no improvement here seems like a viable choice. Let them issue the drop commands they desire manually. This doesn’t feel like something one should/would automate.
>
There is a good chance of mistakenly dropping the required objects
manually after the subscriber is converted. One can mistakenly drop
the required publication unless they have some strict rule to do it
immediately after the tool has converted standby to subscriber.
If you are referring to “step 6”, that bypasses this requirement because pg_createsubscriber created it, knows the exact name, and it is well defined why that specific publication should be removed.
Providing the commands via file is a way, but as a user, I would
prefer to get the things done automatically via a tool instead of me
doing it afterward unless there is no other way..
Given that the fallback position is to restore the physical standby and do the transformation again I suppose it isn’t too bad if we or the user messes up a run. But you can still get the tool to basically do it for you automatically by just blindly sending the through psql. The people who want the safety net have no option. I suspect the percentage of DBAs that would appreciate the option would be meaningful. After all, we felt it necessary to add a —dry-run option.
David J.
Re: Adding a '--clean-publisher-objects' option to 'pg_createsubscriber' utility.
From
Amit Kapila
Date:
On Wed, Mar 12, 2025 at 10:42 AM David G. Johnston <david.g.johnston@gmail.com> wrote: > > On Tuesday, March 11, 2025, Amit Kapila <amit.kapila16@gmail.com> wrote: >> >> On Wed, Mar 12, 2025 at 9:43 AM David G. Johnston >> <david.g.johnston@gmail.com> wrote: >> > >> > I’m honestly question how much value this provides - no improvement here seems like a viable choice. Let them issuethe drop commands they desire manually. This doesn’t feel like something one should/would automate. >> > >> >> There is a good chance of mistakenly dropping the required objects >> manually after the subscriber is converted. One can mistakenly drop >> the required publication unless they have some strict rule to do it >> immediately after the tool has converted standby to subscriber. > > > If you are referring to “step 6”, that bypasses this requirement because pg_createsubscriber created it, knows the exactname, and it is well defined why that specific publication should be removed. > I am not referring to step 6. I am saying that users could have created the publications by themselves after converting the standby to subscriber. It won't be convenient to remember those and distinguish them from the existing publications as there won't be any particular pattern. The point is that there is a chance of misuse or data loss with manual steps, which is not there if done via this option. >> >> Providing the commands via file is a way, but as a user, I would >> prefer to get the things done automatically via a tool instead of me >> doing it afterward unless there is no other way.. >> > > Given that the fallback position is to restore the physical standby and do the transformation again I suppose it isn’ttoo bad if we or the user messes up a run. > As mentioned above, there is always a chance of data loss if it is done manually, even if we restore to a backup of physical standby. > But you can still get the tool to basically do it for you automatically by just blindly sending the through psql. The people who want the safety net have no option. I suspect the percentage of DBAs that would appreciate the option would be meaningful. After all, we felt it necessary to add a —dry-run option. > This could be another option, but I feel an option with a tool is more meaningful than allowing users to do afterward steps. -- With Regards, Amit Kapila.
Re: Adding a '--clean-publisher-objects' option to 'pg_createsubscriber' utility.
From
"Euler Taveira"
Date:
On Wed, Mar 12, 2025, at 3:41 AM, Amit Kapila wrote:
This could be another option, but I feel an option with a tool is moremeaningful than allowing users to do afterward steps.
I wasn't paying much attention to this discussion. The thread title refers to a
general option to clean publisher objects which includes non specified objects.
I was expecting a general solution but it seems to include only publications.
Why? I envision in the future adding an option to publish only a set of tables.
Will this proposal remove tables that were not published and its dependent
objects (data types, functions, ...)? When we add the initial schema
synchronization for logical replication, will this proposal be aligned with it?
It is a mistake not to explore a general solution because you risk shooting
yourself in the foot. If we consider that (a) the start point is a standby
(physical copy) and (b) in most scenarios to setup a logical replica you use
pg_dump that dumps the publications by default, it seems these additional
objects will be expected by the user.
I have some questions, fixes and concerns about this patch.
+ <term><option>-c</option></term>
+ <term><option>--drop-all-publications</option></term>
-c has nothing to do with the long option. As Amit suggested in a previous
email, it is perfectly fine to not include a short option here.
static void setup_subscriber(struct LogicalRepInfo *dbinfo,
- const char *consistent_lsn);
+ const char *consistent_lsn,
+ bool drop_all_publications);
Isn't it better to add this extra option into LogicalRepInfo? It avoids
changing this function signature and the following one.
-static void drop_publication(PGconn *conn, struct LogicalRepInfo *dbinfo);
+static void drop_publication_by_name(PGconn *conn, const char *pubname,
+ struct LogicalRepInfo *dbinfo);
By the way, why did you rename the function? It still removes the publication
by name, although, the function name does not contain "_by_name". The proposed
signature also has redundant argument usage.
- drop_publication(conn, &dbinfos.dbinfo[i]);
+ drop_publication_by_name(conn, dbinfos.dbinfo[i].pubname,
+ &dbinfos.dbinfo[i]);
Even with this proposal, the following change is imprecise. It is still
removing publications (plural form). If you still want to include the fact that
pre existing publications will also be removed, say "Remove publications
(including pre existing ones)".
/*
- * Since the publication was created before the consistent LSN, it is
- * available on the subscriber when the physical replica is promoted.
- * Remove publications from the subscriber because it has no use.
+ * Since the publication was created before the consistent LSN, it
+ * remains on the subscriber even after the physical replica is
+ * promoted. Remove this publication from the subscriber because it
+ * has no use. Additionally, if requested, drop all pre-existing
+ * publications.
*/
-drop_publication(PGconn *conn, struct LogicalRepInfo *dbinfo)
+drop_publication_by_name(PGconn *conn, const char *pubname,
+ struct LogicalRepInfo *dbinfo)
Keep the current function name.
- char *pubname_esc;
-
- Assert(conn != NULL);
- pubname_esc = PQescapeIdentifier(conn, dbinfo->pubname, strlen(dbinfo->pubname));
+ char *pubname_esc = PQescapeIdentifier(conn, pubname, strlen(pubname));
Why did you remove the assert?
- PQfreemem(pubname_esc);
-
Why did you reallocate this function call?
+/*
+ * Check and drop existing publications on the subscriber.
+ */
+static void
+check_and_drop_existing_publications(PGconn *conn, struct LogicalRepInfo *dbinfo)
AFAICS there is no "check" in this function. I would go with
drop_all_publications().
-my $node_s = PostgreSQL::Test::Cluster->new('node_s');
-$node_s->init_from_backup($node_p, 'backup_1', has_streaming => 1);
-$node_s->append_conf(
+my $node_s1 = PostgreSQL::Test::Cluster->new('node_s1');
+$node_s1->init_from_backup($node_p, 'backup_1', has_streaming => 1);
+$node_s1->append_conf(
Why did you rename node_s? You just increased the number of hunks someone needs
to review. If you need a new standby, fine. Create another node: node_u.
+$node_s2->init_from_backup($node_p, 'backup_3', has_streaming => 1);
+$node_s2->append_conf(
+ 'postgresql.conf', qq[
+ primary_conninfo = '$pconnstr'
+ max_logical_replication_workers = 5
+ ]);
+$node_s2->set_standby_mode();
+$node_s2->start;
Do you really need to create another node just to confirm the current behavior?
I think you can inspect the logfile after a dry run mode to see if some message
pops up. This addition increases the pg_createsubscriber runtime from
Files=1, Tests=37, 10 wallclock secs ( 0.02 usr 0.00 sys + 1.41 cusr 1.67 csys = 3.10 CPU)
to
Files=1, Tests=43, 16 wallclock secs ( 0.03 usr 0.01 sys + 1.70 cusr 2.10 csys = 3.84 CPU)
That's unacceptable.
I'm still concerned about the fact that we are adding one option that is
specific and have to add one per object as soon as someone has another
proposal. We need to decide if we want multiple options to clean up objects in
the future or a general option that will be incrementally adding objects to
remove. Multiple options are more granular and can avoid breaking backward
compatibility (if you don't want) but can increase the list of options you need
to inform if you want to clean multiple object types. A single option is not
flexible and breaks backward compatibility but it does exactly what you want
with a few characters. In most scenarios, if you want to have a clean
subscriber, you will remove *all* possible objects, hence, a single option is
your choice.
Re: Adding a '--clean-publisher-objects' option to 'pg_createsubscriber' utility.
From
Amit Kapila
Date:
On Fri, Mar 14, 2025 at 3:06 AM Euler Taveira <euler@eulerto.com> wrote: > > On Wed, Mar 12, 2025, at 3:41 AM, Amit Kapila wrote: > > This could be another option, but I feel an option with a tool is more > meaningful than allowing users to do afterward steps. > > > I wasn't paying much attention to this discussion. The thread title refers to a > general option to clean publisher objects which includes non specified objects. > I was expecting a general solution but it seems to include only publications. > Why? I envision in the future adding an option to publish only a set of tables. > Will this proposal remove tables that were not published and its dependent > objects (data types, functions, ...)? > We can add new publications via tool or once the subscriber is created. This option and any other options are to remove objects that are no longer needed from the previous set up of standby. One is publications present which user may or may not need but it is difficult to distinguish the publications copied from primary before we have a subscriber. Now, the other things could be replication slots or probably even databases (if user consider subscriber to create subscriber from a specified set of databases). The idea to make this solution general is that we provide switches like the current one for different objects and then a common switch to remove all pre-existing objects (like --remove-all-existing-objects). I am not sure a generic switch like --remove-all-existing-objects is good enough because users may want to retain few pre-existing objects like subscriptions so that the new subscriber continure to get data from other publishers. > When we add the initial schema > synchronization for logical replication, will this proposal be aligned with it? > Can you please explain this a bit more to state what you have in mind for this? > It is a mistake not to explore a general solution because you risk shooting > yourself in the foot. If we consider that (a) the start point is a standby > (physical copy) and (b) in most scenarios to setup a logical replica you use > pg_dump that dumps the publications by default, it seems these additional > objects will be expected by the user. > It is possible and that is why we are giving it as an option rather than removing publications or other objects by default. > > I'm still concerned about the fact that we are adding one option that is > specific and have to add one per object as soon as someone has another > proposal. We need to decide if we want multiple options to clean up objects in > the future or a general option that will be incrementally adding objects to > remove. Multiple options are more granular and can avoid breaking backward > compatibility (if you don't want) but can increase the list of options you need > to inform if you want to clean multiple object types. A single option is not > flexible and breaks backward compatibility but it does exactly what you want > with a few characters. In most scenarios, if you want to have a clean > subscriber, you will remove *all* possible objects, hence, a single option is > your choice. > I agree that we need a single option, but I feel we need granular options as well to allow users to retain objects selectively. -- With Regards, Amit Kapila.
Re: Adding a '--clean-publisher-objects' option to 'pg_createsubscriber' utility.
From
"David G. Johnston"
Date:
On Tuesday, March 11, 2025, Peter Smith <smithpb2250@gmail.com> wrote:
Choice 3. Implement some option that has an argument saying what to delete
Implement an option that takes some argument saying what objects to remove.
Here, the current patch would be something like:
"--remove-target-objects=publications". In future, this option could
be enhanced to accept other values like
"--remove-target-objects=publications,subscriptions" etc.
I’m changing my vote to option 3. With a requirement that dropping is done post-recovery by the user via psql - we just provide the commands we would have executed in a script.
—prune-file=‘drop-commands.psql’
—prune=publications
—prune=subscriptions
Etc…
I’d rather do multiple specifications than commas separated. It matches what we already do with databases, which was done this way I suspect for the same reasons - length of the parameters value when we have:
Publications
Slots
Subscriptions
Databases
Roles
Tablespaces
Redefining All over the course of years as we decide to add object types is unappealing. Even if we expect the DBA to check the drop-commands.sql file before executing it. Which I’d still require them to do instead of providing CLI options to specify, e.g., slot names or database names to not drop if they want some subset of All.
David J.
RE: Adding a '--clean-publisher-objects' option to 'pg_createsubscriber' utility.
From
"Zhijie Hou (Fujitsu)"
Date:
On 2025/03/14 15:25:00 </o=ExchangeLabs/ou=Exchange Administrative Group (FYDIBOHF23SPDLT)/cn=Recipients/cn=3da34e28d9d240d4abadbbb549f9ff21-houzj.fnst>wrote: On Wed, Mar 12, 2025 at 12:13 PM David G. Johnston <david.g.johnston@gmail.com> wrote: > On Tuesday, March 11, 2025, Amit Kapila <amit.kapila16@gmail.com <mailto:amit.kapila16@gmail.com> > wrote: > > On Tue, Mar 11, 2025 at 8:21 PM David G. Johnston > > <david.g.johnston@gmail.com <mailto:david.g.johnston@gmail.com> > wrote: > > > > > > On Tue, Mar 11, 2025 at 12:20 AM Peter Smith <smithpb2250@gmail.com <mailto:smithpb2250@gmail.com> > wrote: > > >> > > >> > > >> Unfortunately, we are spinning in circles a bit trying to come up with > > >> a good way to represent the option needed for this, while at the same > > >> time trying to be future-proof. I see 3 choices... > > >> > > >> ====== > > >> > > >> Choice 1. Generic option > > >> > > >> Implement a single boolean option to remove everything. > > >> > > >> > > >> Do you have any thoughts on what kind of option is best here? > > >> > > > > > > Option 1 by far. > > > > > > > > > > Though personally I'd rather do something like what pg_upgrade does > > and output a script with drop commands that can be executed via psql > > instead of having pg_createsubscriber execute said commands. I'd call > > it "pruning the target". > > > > > > > Yes, this is a good idea, and IIRC, we discussed in brief on this in > > the original thread where we developed this tool but left it as a > > future improvement. We should definitely go in this direction in the > > future, but let's improve this tool incrementally, otherwise, there is > > a risk of making no improvements. > > I’m against a UX that drops database objects selected by algorithm without > informing the user what those objects are and giving them a chance to abort. > The output file method makes that very simple to implement. I think an > interactive command to accomplish that requirement would put commit more at > risk. And a constant increase of options for each object type seems annoying > given the alternative of simply doing them all as we get around to them and > letting the file and the user remove the ones they wish to keep. > > I’m honestly question how much value this provides - no improvement here seems > like a viable choice. Let them issue the drop commands they desire manually. > This doesn’t feel like something one should/would automate. For logical replication, there is a frequent need to automatically delete all objects (including publications) on replicas that are no longer needed. This requirement comes from a common use case in logical replication, which is to replicate only a subset of database objects. Personally, I am uncertain about the use case for retaining only some of the publications while dropping others. Besides, I am unsure if the rationale behind pg_upgrade's output script is applicable here. AFAIK, The scripts output by pg_upgrade, such as reindex_hash.sql and update_extensions.sql are to address major version differences. The script delete_old_cluster.sh is generated instead of directly removing the instance to provide an option to revert if any issues arise during the upgrade. I think the option proposed in this thread is not to handle the version mismatch between the publisher and the subscriber. The primary goal of pg_createsubscriber is simply to create a subscriber. Even if problems occur after dropping the publication, creating another subscriber using the existing publisher is straightforward and we can't get back to standby whether we remove publications or not. This concern is minimized since we do not delete publications by default (only when --drop-xx option is specified). Overall, I favor automatically deleting publications rather than offering a script for manual execution. Best Regards, Hou zj
Re: Adding a '--clean-publisher-objects' option to 'pg_createsubscriber' utility.
From
"David G. Johnston"
Date:
On Fri, Mar 14, 2025 at 12:26 AM Zhijie Hou (Fujitsu) <houzj.fnst@fujitsu.com> wrote:
For logical replication, there is a frequent need to automatically delete all
objects (including publications) on replicas that are no longer needed. This
requirement comes from a common use case in logical replication, which is to
replicate only a subset of database objects.
I don't see us implementing an API to provide an alternative to "ALL TABLES"...
Personally, I am uncertain about
the use case for retaining only some of the publications while dropping others.
To assume there is nothing between All or None seems unwise even if we cannot imagine what that may be.
Besides, I am unsure if the rationale behind pg_upgrade's output script is
applicable here.
Yeah, I probably shouldn't have mentioned it. I don't need it to explain or support my case.
I think the option proposed in this thread is not to handle the
version mismatch between the publisher and the subscriber.
Certainly not given that this requires a physical standby as a base and those are not cross-major-version capable.
Overall,
I favor automatically deleting publications rather than offering a script for
manual execution.
I'm done arguing the counter-point to this. I cannot give this a +1 unless the DBA is given a chance to review and edit the drop commands that we create. But as of now I'm in the minority and this has a committer willing to proceed without this capability. I take some comfort in that it seems at least if they use --dry-run they can see what objects will be dropped. But the editor capability seems more useful. There is always v19 and beyond; at least nothing here precludes adding such a feature in the future.
David J.
Re: Adding a '--clean-publisher-objects' option to 'pg_createsubscriber' utility.
From
"David G. Johnston"
Date:
On Mon, Mar 10, 2025 at 5:00 PM Peter Smith <smithpb2250@gmail.com> wrote:
6.
The test code remains difficult to review because I can't see the
forest for the trees due to the dozens of S->S1 node name changes.
These name changes are unrelated to the new feature so please separate
them into a different prerequisite patch so we can just focus on what
changes were made just for --drop-all-publications. I know you already
said you are working on it [1-#7], but multiple patch versions have
been posted since you said that.
I don't see the point of renaming S to S1.
I also don't get re-defining the existing S tests to include --drop-all-publications and then go and add a new test that excludes --drop-all-publications.
Just name the new subscriber D (or pick some other random letter, it isn't like we are encouraging meaningful variable names here) and have it test the behavior of --drop-all-publications. Given the existing design, the introduction of two new user publications on P initially shouldn't impact existing tests (and if it does, seeing that change in the existing tests would be a good thing). An extra check or two against S should suffice to prove non-deletion.
In short, this probably needn't touch any existing tests, just the shared setup.
Also, can we explain why it is important to use --verbose --verbose? Obviously it shows more information, that what repeating --verbose is defined to do. (Sure, that isn't new to this patch.) But why is it important that this specific command use it when none of the others do?
David J.
Re: Adding a '--clean-publisher-objects' option to 'pg_createsubscriber' utility.
From
Peter Smith
Date:
On Fri, Mar 14, 2025 at 5:39 PM David G. Johnston <david.g.johnston@gmail.com> wrote: > > On Tuesday, March 11, 2025, Peter Smith <smithpb2250@gmail.com> wrote: >> >> >> Choice 3. Implement some option that has an argument saying what to delete >> >> Implement an option that takes some argument saying what objects to remove. >> >> Here, the current patch would be something like: >> "--remove-target-objects=publications". In future, this option could >> be enhanced to accept other values like >> "--remove-target-objects=publications,subscriptions" etc. > > > I’m changing my vote to option 3. With a requirement that dropping is done post-recovery by the user via psql - we justprovide the commands we would have executed in a script. > > —prune-file=‘drop-commands.psql’ > —prune=publications > —prune=subscriptions > Etc… > > I’d rather do multiple specifications than commas separated. It matches what we already do with databases, which was donethis way I suspect for the same reasons - length of the parameters value when we have: > Publications > Slots > Subscriptions > Databases > Roles > Tablespaces > OK. Here are some examples... style #1: --prune=Publications --prune=Slots --prune=Subscriptions --prune=Databases --prune=Tablespaces versus style #2: --prune=Publications,Slots,Subscriptions,Databases,Tablespaces David, I understand you are saying that you prefer style #2 because of the potentially cumbersome length of the value part if there are many future object kinds (e.g. "Publications,Slots,Subscriptions,Databases,Tablespaces" in the above examples). Does anyone else here have any preference about these styles they wish to voice? > Redefining All over the course of years as we decide to add object types is unappealing. Even if we expect the DBA tocheck the drop-commands.sql file before executing it. Which I’d still require them to do instead of providing CLI optionsto specify, e.g., slot names or database names to not drop if they want some subset of All. ====== Kind Regards, Peter Smith. Fujitsu Australia
Re: Adding a '--clean-publisher-objects' option to 'pg_createsubscriber' utility.
From
Peter Smith
Date:
On Sat, Mar 15, 2025 at 4:52 PM Peter Smith <smithpb2250@gmail.com> wrote: > > On Fri, Mar 14, 2025 at 5:39 PM David G. Johnston > <david.g.johnston@gmail.com> wrote: > > > > On Tuesday, March 11, 2025, Peter Smith <smithpb2250@gmail.com> wrote: > >> > >> > >> Choice 3. Implement some option that has an argument saying what to delete > >> > >> Implement an option that takes some argument saying what objects to remove. > >> > >> Here, the current patch would be something like: > >> "--remove-target-objects=publications". In future, this option could > >> be enhanced to accept other values like > >> "--remove-target-objects=publications,subscriptions" etc. > > > > > > I’m changing my vote to option 3. With a requirement that dropping is done post-recovery by the user via psql - we justprovide the commands we would have executed in a script. > > > > —prune-file=‘drop-commands.psql’ > > —prune=publications > > —prune=subscriptions > > Etc… > > > > I’d rather do multiple specifications than commas separated. It matches what we already do with databases, which wasdone this way I suspect for the same reasons - length of the parameters value when we have: > > Publications > > Slots > > Subscriptions > > Databases > > Roles > > Tablespaces > > > (I'm sorry, my previous post included a confusing typo. Here is the correction) OK. Here are some examples... style #1: --prune=Publications --prune=Slots --prune=Subscriptions --prune=Databases --prune=Tablespaces versus style #2: --prune=Publications,Slots,Subscriptions,Databases,Tablespaces David, I understand you are saying that you prefer style #1 because of the potentially cumbersome length of the csv value part of style #2 if there are many future object kinds (e.g. "Publications,Slots,Subscriptions,Databases,Tablespaces" in the above examples). Does anyone else here have any preference about these styles they wish to voice? > > > Redefining All over the course of years as we decide to add object types is unappealing. Even if we expect the DBA tocheck the drop-commands.sql file before executing it. Which I’d still require them to do instead of providing CLI optionsto specify, e.g., slot names or database names to not drop if they want some subset of All. > ====== Kind Regards, Peter Smith. Fujitsu Australia
Re: Adding a '--clean-publisher-objects' option to 'pg_createsubscriber' utility.
From
Amit Kapila
Date:
On Sat, Mar 15, 2025 at 11:35 AM Peter Smith <smithpb2250@gmail.com> wrote: > > On Sat, Mar 15, 2025 at 4:52 PM Peter Smith <smithpb2250@gmail.com> wrote: > > > > On Fri, Mar 14, 2025 at 5:39 PM David G. Johnston > > <david.g.johnston@gmail.com> wrote: > > > > > > On Tuesday, March 11, 2025, Peter Smith <smithpb2250@gmail.com> wrote: > > >> > > >> > > >> Choice 3. Implement some option that has an argument saying what to delete > > >> > > >> Implement an option that takes some argument saying what objects to remove. > > >> > > >> Here, the current patch would be something like: > > >> "--remove-target-objects=publications". In future, this option could > > >> be enhanced to accept other values like > > >> "--remove-target-objects=publications,subscriptions" etc. > > > > > > > > > I’m changing my vote to option 3. With a requirement that dropping is done post-recovery by the user via psql - wejust provide the commands we would have executed in a script. > > > > > > —prune-file=‘drop-commands.psql’ > > > —prune=publications > > > —prune=subscriptions > > > Etc… > > > > > > I’d rather do multiple specifications than commas separated. It matches what we already do with databases, which wasdone this way I suspect for the same reasons - length of the parameters value when we have: > > > Publications > > > Slots > > > Subscriptions > > > Databases > > > Roles > > > Tablespaces > > > > > > > (I'm sorry, my previous post included a confusing typo. Here is the correction) > > OK. Here are some examples... > > style #1: > --prune=Publications --prune=Slots --prune=Subscriptions > --prune=Databases --prune=Tablespaces > > versus > > style #2: > --prune=Publications,Slots,Subscriptions,Databases,Tablespaces > > David, I understand you are saying that you prefer style #1 because of > the potentially cumbersome length of the csv value part of style #2 if > there are many future object kinds (e.g. > "Publications,Slots,Subscriptions,Databases,Tablespaces" in the above > examples). > Style-1 sounds reasonable to me, but how exactly we want to do. One idea is to have short and long switches like -r and --remove_exiting_object=publication. The user would be allowed to give multiple options like -r publications -r slots, etc. -- With Regards, Amit Kapila.
Re: Adding a '--clean-publisher-objects' option to 'pg_createsubscriber' utility.
From
"David G. Johnston"
Date:
On Friday, March 14, 2025, Amit Kapila <amit.kapila16@gmail.com> wrote:
On Sat, Mar 15, 2025 at 11:35 AM Peter Smith <smithpb2250@gmail.com> wrote:
>
> On Sat, Mar 15, 2025 at 4:52 PM Peter Smith <smithpb2250@gmail.com> wrote:
> >
> > On Fri, Mar 14, 2025 at 5:39 PM David G. Johnston
> > <david.g.johnston@gmail.com> wrote:
> > >
> > > On Tuesday, March 11, 2025, Peter Smith <smithpb2250@gmail.com> wrote:
> > >>
> > >>
> > >> Choice 3. Implement some option that has an argument saying what to delete
> > >>
> > >> Implement an option that takes some argument saying what objects to remove.
> > >>
> > >> Here, the current patch would be something like:
> > >> "--remove-target-objects=publications". In future, this option could
> > >> be enhanced to accept other values like
> > >> "--remove-target-objects=publications,subscriptions" etc.
> > >
> > >
> > > I’m changing my vote to option 3. With a requirement that dropping is done post-recovery by the user via psql - we just provide the commands we would have executed in a script.
> > >
> > > —prune-file=‘drop-commands.psql’
> > > —prune=publications
> > > —prune=subscriptions
> > > Etc…
> > >
> > > I’d rather do multiple specifications than commas separated. It matches what we already do with databases, which was done this way I suspect for the same reasons - length of the parameters value when we have:
> > > Publications
> > > Slots
> > > Subscriptions
> > > Databases
> > > Roles
> > > Tablespaces
> > >
> >
>
> (I'm sorry, my previous post included a confusing typo. Here is the correction)
>
> OK. Here are some examples...
>
> style #1:
> --prune=Publications --prune=Slots --prune=Subscriptions
> --prune=Databases --prune=Tablespaces
>
> versus
>
> style #2:
> --prune=Publications,Slots,Subscriptions,Databases, Tablespaces
>
> David, I understand you are saying that you prefer style #1 because of
> the potentially cumbersome length of the csv value part of style #2 if
> there are many future object kinds (e.g.
> "Publications,Slots,Subscriptions,Databases, Tablespaces" in the above
> examples).
>
Style-1 sounds reasonable to me, but how exactly we want to do. One
idea is to have short and long switches like -r and
--remove_exiting_object=publication. The user would be allowed to give
multiple options like -r publications -r slots, etc.
Either “existing” nor “object” are needed, a one-word long form suffices. Drop, remove, or prune. If we want a short form option we should choose Remove and use -r; both D and P are already taken.
So, I marginally prefer —prune with no short-form option; followed by —remove/-r
Communicating the semantic meaning of “prune” in the option name, we aren’t removing all objects of the given type, tips the balance for me. But that can just be communicated in the description so it isn’t a strong desire.
David J.
Re: Adding a '--clean-publisher-objects' option to 'pg_createsubscriber' utility.
From
Amit Kapila
Date:
On Sat, Mar 15, 2025 at 8:03 PM David G. Johnston <david.g.johnston@gmail.com> wrote: > > On Friday, March 14, 2025, Amit Kapila <amit.kapila16@gmail.com> wrote: >> >> >> Style-1 sounds reasonable to me, but how exactly we want to do. One >> idea is to have short and long switches like -r and >> --remove_exiting_object=publication. The user would be allowed to give >> multiple options like -r publications -r slots, etc. > > > Either “existing” nor “object” are needed, a one-word long form suffices. Drop, remove, or prune. If we want a shortform option we should choose Remove and use -r; both D and P are already taken. > > So, I marginally prefer —prune with no short-form option; followed by —remove/-r > I am inclined towards "--remove/-r" as that will be relatively more straightforward to follow for users. > Communicating the semantic meaning of “prune” in the option name, we aren’t removing all objects of the given type, tipsthe balance for me. But that can just be communicated in the description so it isn’t a strong desire. > BTW, with this option, we will be removing all the publications present on the subscriber because on standby there shouldn't be any more. But that may not be true for other objects, so we must communicate it via the description. -- With Regards, Amit Kapila.
Re: Adding a '--clean-publisher-objects' option to 'pg_createsubscriber' utility.
From
Shubham Khanna
Date:
On Mon, Mar 17, 2025 at 8:44 AM Amit Kapila <amit.kapila16@gmail.com> wrote: > > On Sat, Mar 15, 2025 at 8:03 PM David G. Johnston > <david.g.johnston@gmail.com> wrote: > > > > On Friday, March 14, 2025, Amit Kapila <amit.kapila16@gmail.com> wrote: > >> > >> > >> Style-1 sounds reasonable to me, but how exactly we want to do. One > >> idea is to have short and long switches like -r and > >> --remove_exiting_object=publication. The user would be allowed to give > >> multiple options like -r publications -r slots, etc. > > > > > > Either “existing” nor “object” are needed, a one-word long form suffices. Drop, remove, or prune. If we want a shortform option we should choose Remove and use -r; both D and P are already taken. > > > > So, I marginally prefer —prune with no short-form option; followed by —remove/-r > > > > I am inclined towards "--remove/-r" as that will be relatively more > straightforward to follow for users. > > > Communicating the semantic meaning of “prune” in the option name, we aren’t removing all objects of the given type, tipsthe balance for me. But that can just be communicated in the description so it isn’t a strong desire. > > > > BTW, with this option, we will be removing all the publications > present on the subscriber because on standby there shouldn't be any > more. But that may not be true for other objects, so we must > communicate it via the description. > > -- I have incorporated the "--remove/-r" parameter in the attached patch, as it seems more intuitive and straightforward for users. The attached patch contains the latest changes. Thanks and regards, Shubham Khanna,
Attachment
Re: Adding a '--clean-publisher-objects' option to 'pg_createsubscriber' utility.
From
"Euler Taveira"
Date:
On Mon, Mar 17, 2025, at 9:34 AM, Shubham Khanna wrote:
I have incorporated the "--remove/-r" parameter in the attached patch,as it seems more intuitive and straightforward for users.The attached patch contains the latest changes.
There were a lot of discussion around the single vs multiple options since my
last review [1] so I'm answering some of the questions here.
Due to the nature of removing multiple objects, I would say that a general
option that has a value and can be informed multiple times is the way to go. I
saw that the discussion led to this. Regarding the name, my preference is
--drop since we already have other binaries with similar options (pg_receivewal
and pg_recvlogical have --drop-slot).
The commit message needs some adjustments. There are redundant information (1st
and last paragraph).
+ <literal>publications</literal>. This option is useful when
+ transitioning from streaming replication to logical replication as
+ existing objects may no longer be needed. Before using this option,
Use physical replication. That's what we used in the current
pg_createsubscriber documentation and it is also the terminology referred in
the glossary (see Replication).
bool two_phase; /* enable-two-phase option */
+ uint32 remove_objects; /* flag to remove objects on subscriber */
uint32 -> bits32.
-static void setup_subscriber(struct LogicalRepInfo *dbinfo,
- const char *consistent_lsn);
+static void setup_subscriber(const char *consistent_lsn);
This is unrelated to this patch.
- * replication setup.
+ * replication setup. If 'remove' parameter is specified, existing publications
+ * on the subscriber will be dropped before creating new subscriptions.
*/
static void
-setup_subscriber(struct LogicalRepInfo *dbinfo, const char *consistent_lsn)
+setup_subscriber(const char *consistent_lsn)
There is no parameter 'remove' in this function. I understand that you want to
provide information about the remove option but it is not the right place to
add it. If for some reason you need to change or remove such parameter, it is
easier to left this comment because it is not near the place you are removing
some lines of code.
+ struct LogicalRepInfo *dbinfo = &dbinfos.dbinfo[i];
/* Connect to subscriber. */
- conn = connect_database(dbinfo[i].subconninfo, true);
+ conn = connect_database(dbinfo->subconninfo, true);
This new dbinfo pointer is just a way to increase your patch size without
improving readability.
- drop_publication(conn, &dbinfo[i]);
+ if (dbinfos.remove_objects & OBJECT_PUBLICATIONS)
+ drop_all_publications(conn, dbinfo);
+ else
+ drop_publication(conn, dbinfo, dbinfo->pubname);
At first glance, I didn't like this change. You inform dbinfo->pubname as a 3rd
parameter but the 2nd parameter is dbinfo. After reading
drop_all_publication(), I realized that's the cause for this change. Is there a
better way to do it?
+static void
+drop_all_publications(PGconn *conn, struct LogicalRepInfo *dbinfo)
+{
+ PGresult *res;
+
I would add an Assert(conn != NULL) here to follow the same pattern as the
other functions.
+ res = PQexec(conn, "SELECT pubname FROM pg_catalog.pg_publication;");
+ if (PQresultStatus(res) != PGRES_TUPLES_OK)
+ {
+ pg_log_error("could not obtain publication information: %s",
+ PQresultErrorMessage(res));
+ PQclear(res);
+ return;
+ }
Shouldn't it disconnect_database() here? See the pattern in other functions
that error out while querying the catalog.
+ SimpleStringListCell *cell;
+
+ for (cell = opt.remove_objects.head; cell; cell = cell->next)
+ {
You could use SimpleStringListCell in the for loop without a separate declaration.
+ pg_log_error("wrong remove object is specified");
+ pg_log_error_hint("Only \"publications\" can be removed.");
+ exit(1);
The main message sounds strange. Did you mean 'wrong object is specified'?
+ok($node_s->safe_psql($db1, "SELECT COUNT(*) = 2 FROM pg_publication"),
+ 'two publications created before --remove is run');
+
two pre-existing publications on subscriber ?
+is($node_s->safe_psql($db1, "SELECT COUNT(*) FROM pg_publication;"),
+ '0', 'all publications dropped after --remove is run');
+
all publications on subscriber have been removed ?
+ primary_conninfo = '$pconnstr'
+ max_logical_replication_workers = 5
Do you need to set m_l_r_w here?
+# --verbose is used twice to show more information.
This comment is superfluous. Remove it.
+# Confirm publications still remain after running 'pg_createsubscriber'
+is($node_u->safe_psql($db3, "SELECT COUNT(*) FROM pg_publication;"),
+ '2', 'all publications remain after running pg_createsubscriber');
I would remove the semicolon here because none of the SQL commands in this test
file includes it.
Files=1, Tests=43, 14 wallclock secs ( 0.04 usr 0.01 sys + 1.65 cusr 2.06 csys = 3.76 CPU)
Result: PASS
Do we really need this extra test? It increases the test duration by 4 seconds.
Again, that's still unacceptable for such an optional feature. Maybe you should
experiment my suggestion in the previous review. You don't need a real run to
prove that pg_createsubscriber is not removing unintended objects.
Re: Adding a '--clean-publisher-objects' option to 'pg_createsubscriber' utility.
From
"David G. Johnston"
Date:
On Mon, Mar 17, 2025 at 4:01 PM Euler Taveira <euler@eulerto.com> wrote:
On Mon, Mar 17, 2025, at 9:34 AM, Shubham Khanna wrote:I have incorporated the "--remove/-r" parameter in the attached patch,as it seems more intuitive and straightforward for users.The attached patch contains the latest changes.There were a lot of discussion around the single vs multiple options since mylast review [1] so I'm answering some of the questions here.
Regarding the name, my preference is--drop since we already have other binaries with similar options (pg_receivewaland pg_recvlogical have --drop-slot).
A short form seems desired here and we cannot use -d/-D. Also, the "act and quit" nature of the command-like options in those two client applications leads me to believe that this server application modifier-like option, which behaves differently than a simple "drop named object and return", should not have the same naming as those others.
We are not dropping named objects - the wording "Remove all given objects" is incorrect.
"Remove all objects of the specified type from specified databases on the target server."
"Multiple object types can be specified by writing multiple --remove switches." (accepting switches instead of options pending bulk change)
More changes of this sort are needed.
- drop_publication(conn, &dbinfo[i]);+ if (dbinfos.remove_objects & OBJECT_PUBLICATIONS)+ drop_all_publications(conn, dbinfo);+ else+ drop_publication(conn, dbinfo, dbinfo->pubname);At first glance, I didn't like this change. You inform dbinfo->pubname as a 3rdparameter but the 2nd parameter is dbinfo. After readingdrop_all_publication(), I realized that's the cause for this change. Is there abetter way to do it?
I had the same impression. I'm inclined to accept it as-is and let whoever writes the next --remove object type implementation deal with cleaning it up. This is clear enough when talking only about publications since whether you just remove the one or all of them the special one we created goes away.
+ pg_log_error("wrong remove object is specified");+ pg_log_error_hint("Only \"publications\" can be removed.");+ exit(1);The main message sounds strange. Did you mean 'wrong object is specified'?
Error: invalid object type "%s" specified for --remove
Hint: The valid options are: "publications"
Hint: The valid options are: "publications"
(yes, plural for a single option is "wrong", but extensible...)
If we want to just give the DBA the choice to say "all" now we could solve two annoyances at once.
The valid options are: "all", "publications"
Should we be accepting these case-insensitively?
David J.
Re: Adding a '--clean-publisher-objects' option to 'pg_createsubscriber' utility.
From
Shubham Khanna
Date:
On Tue, Mar 18, 2025 at 4:31 AM Euler Taveira <euler@eulerto.com> wrote: > > On Mon, Mar 17, 2025, at 9:34 AM, Shubham Khanna wrote: > > I have incorporated the "--remove/-r" parameter in the attached patch, > as it seems more intuitive and straightforward for users. > The attached patch contains the latest changes. > > > There were a lot of discussion around the single vs multiple options since my > last review [1] so I'm answering some of the questions here. > > Due to the nature of removing multiple objects, I would say that a general > option that has a value and can be informed multiple times is the way to go. I > saw that the discussion led to this. Regarding the name, my preference is > --drop since we already have other binaries with similar options (pg_receivewal > and pg_recvlogical have --drop-slot). > > The commit message needs some adjustments. There are redundant information (1st > and last paragraph). > Fixed. > + <literal>publications</literal>. This option is useful when > + transitioning from streaming replication to logical replication as > + existing objects may no longer be needed. Before using this option, > > Use physical replication. That's what we used in the current > pg_createsubscriber documentation and it is also the terminology referred in > the glossary (see Replication). > Fixed. > bool two_phase; /* enable-two-phase option */ > + uint32 remove_objects; /* flag to remove objects on subscriber */ > > uint32 -> bits32. > Fixed. > -static void setup_subscriber(struct LogicalRepInfo *dbinfo, > - const char *consistent_lsn); > +static void setup_subscriber(const char *consistent_lsn); > > This is unrelated to this patch. > Fixed. > - * replication setup. > + * replication setup. If 'remove' parameter is specified, existing publications > + * on the subscriber will be dropped before creating new subscriptions. > */ > static void > -setup_subscriber(struct LogicalRepInfo *dbinfo, const char *consistent_lsn) > +setup_subscriber(const char *consistent_lsn) > > There is no parameter 'remove' in this function. I understand that you want to > provide information about the remove option but it is not the right place to > add it. If for some reason you need to change or remove such parameter, it is > easier to left this comment because it is not near the place you are removing > some lines of code. > Fixed. > + struct LogicalRepInfo *dbinfo = &dbinfos.dbinfo[i]; > > /* Connect to subscriber. */ > - conn = connect_database(dbinfo[i].subconninfo, true); > + conn = connect_database(dbinfo->subconninfo, true); > > This new dbinfo pointer is just a way to increase your patch size without > improving readability. > Fixed. > - drop_publication(conn, &dbinfo[i]); > + if (dbinfos.remove_objects & OBJECT_PUBLICATIONS) > + drop_all_publications(conn, dbinfo); > + else > + drop_publication(conn, dbinfo, dbinfo->pubname); > > At first glance, I didn't like this change. You inform dbinfo->pubname as a 3rd > parameter but the 2nd parameter is dbinfo. After reading > drop_all_publication(), I realized that's the cause for this change. Is there a > better way to do it? > I see your point. The use of dbinfo->pubname as the third parameter while passing dbinfo as the second parameter does feel a bit inconsistent. However, since drop_all_publications() relies on dbinfo for context, this approach seemed necessary to keep the interface consistent with drop_publication(). Currently, I do not have a better alternative that maintains clarity and consistency, but I am open to suggestions if anyone has ideas to improve this. > +static void > +drop_all_publications(PGconn *conn, struct LogicalRepInfo *dbinfo) > +{ > + PGresult *res; > + > > I would add an Assert(conn != NULL) here to follow the same pattern as the > other functions. > Fixed. > + res = PQexec(conn, "SELECT pubname FROM pg_catalog.pg_publication;"); > + if (PQresultStatus(res) != PGRES_TUPLES_OK) > + { > + pg_log_error("could not obtain publication information: %s", > + PQresultErrorMessage(res)); > + PQclear(res); > + return; > + } > > Shouldn't it disconnect_database() here? See the pattern in other functions > that error out while querying the catalog. > Fixed. > + SimpleStringListCell *cell; > + > + for (cell = opt.remove_objects.head; cell; cell = cell->next) > + { > > You could use SimpleStringListCell in the for loop without a separate declaration. > Fixed. > + pg_log_error("wrong remove object is specified"); > + pg_log_error_hint("Only \"publications\" can be removed."); > + exit(1); > > The main message sounds strange. Did you mean 'wrong object is specified'? > Fixed. > +ok($node_s->safe_psql($db1, "SELECT COUNT(*) = 2 FROM pg_publication"), > + 'two publications created before --remove is run'); > + > > two pre-existing publications on subscriber ? > Fixed. > +is($node_s->safe_psql($db1, "SELECT COUNT(*) FROM pg_publication;"), > + '0', 'all publications dropped after --remove is run'); > + > > all publications on subscriber have been removed ? > Fixed. > + primary_conninfo = '$pconnstr' > + max_logical_replication_workers = 5 > > Do you need to set m_l_r_w here? > > +# --verbose is used twice to show more information. > > This comment is superfluous. Remove it. > > +# Confirm publications still remain after running 'pg_createsubscriber' > +is($node_u->safe_psql($db3, "SELECT COUNT(*) FROM pg_publication;"), > + '2', 'all publications remain after running pg_createsubscriber'); > > I would remove the semicolon here because none of the SQL commands in this test > file includes it. > > Files=1, Tests=43, 14 wallclock secs ( 0.04 usr 0.01 sys + 1.65 cusr 2.06 csys = 3.76 CPU) > Result: PASS > > Do we really need this extra test? It increases the test duration by 4 seconds. > Again, that's still unacceptable for such an optional feature. Maybe you should > experiment my suggestion in the previous review. You don't need a real run to > prove that pg_createsubscriber is not removing unintended objects. > > > [1] http://postgr.es/m/6f266ce2-38d4-433f-afc9-b47c48c17509@app.fastmail.com > > I have removed the additional test case and the new node (node_u) that was added for it. This should help reduce the overall test duration without compromising the coverage of the new feature. > -- The attached patch contains the suggested changes, Thanks and regards, Shubham Khanna.
Attachment
Re: Adding a '--clean-publisher-objects' option to 'pg_createsubscriber' utility.
From
Shubham Khanna
Date:
On Tue, Mar 18, 2025 at 5:48 AM David G. Johnston <david.g.johnston@gmail.com> wrote: > > On Mon, Mar 17, 2025 at 4:01 PM Euler Taveira <euler@eulerto.com> wrote: >> >> On Mon, Mar 17, 2025, at 9:34 AM, Shubham Khanna wrote: >> >> I have incorporated the "--remove/-r" parameter in the attached patch, >> as it seems more intuitive and straightforward for users. >> The attached patch contains the latest changes. >> >> >> There were a lot of discussion around the single vs multiple options since my >> last review [1] so I'm answering some of the questions here. >> > >> >> Regarding the name, my preference is >> --drop since we already have other binaries with similar options (pg_receivewal >> and pg_recvlogical have --drop-slot). > > > A short form seems desired here and we cannot use -d/-D. Also, the "act and quit" nature of the command-like options inthose two client applications leads me to believe that this server application modifier-like option, which behaves differentlythan a simple "drop named object and return", should not have the same naming as those others. > > We are not dropping named objects - the wording "Remove all given objects" is incorrect. > > "Remove all objects of the specified type from specified databases on the target server." > > "Multiple object types can be specified by writing multiple --remove switches." (accepting switches instead of optionspending bulk change) > > More changes of this sort are needed. > >> >> >> - drop_publication(conn, &dbinfo[i]); >> + if (dbinfos.remove_objects & OBJECT_PUBLICATIONS) >> + drop_all_publications(conn, dbinfo); >> + else >> + drop_publication(conn, dbinfo, dbinfo->pubname); >> >> At first glance, I didn't like this change. You inform dbinfo->pubname as a 3rd >> parameter but the 2nd parameter is dbinfo. After reading >> drop_all_publication(), I realized that's the cause for this change. Is there a >> better way to do it? > > > I had the same impression. I'm inclined to accept it as-is and let whoever writes the next --remove object type implementationdeal with cleaning it up. This is clear enough when talking only about publications since whether you justremove the one or all of them the special one we created goes away. > > >> + pg_log_error("wrong remove object is specified"); >> + pg_log_error_hint("Only \"publications\" can be removed."); >> + exit(1); >> The main message sounds strange. Did you mean 'wrong object is specified'? > > > Error: invalid object type "%s" specified for --remove > Hint: The valid options are: "publications" > > (yes, plural for a single option is "wrong", but extensible...) > > If we want to just give the DBA the choice to say "all" now we could solve two annoyances at once. > > The valid options are: "all", "publications" > > Should we be accepting these case-insensitively? > I have added validation for "all" to address both issues at once. Additionally, the option parsing is now case-insensitive for better usability. The attached patch at [1] contains the suggested changes. [1] - https://www.postgresql.org/message-id/CAHv8RjK8q%2BmWPWPh9K7CeH2tKF31vGn%2BoPV3qY7pdPCmtbjzkg%40mail.gmail.com Thanks and regards, Shubham Khanna.
Re: Adding a '--clean-publisher-objects' option to 'pg_createsubscriber' utility.
From
"David G. Johnston"
Date:
On Monday, March 17, 2025, Shubham Khanna <khannashubham1197@gmail.com> wrote:
I have added validation for "all" to address both issues at once.
Usage needs to be changed to refer to object types and we should try and specify which are valid there too.
The sgml docs need to mention “all” as a valid value and what it means (namely, if new object types are added and “all” is specified those new types will be removed as well). Suggest not using it in scripts.
There are still more places where “object type” is needed instead of just “object”. I’ll pin-point or fix tomorrow if you don’t beat me to them.
It would be good if we could get this to play nicely with —dry-run; maybe connecting to the source for the queries instead of the target. That would help alleviate my issue with the current auto-drop behavior.
David J.
Re: Adding a '--clean-publisher-objects' option to 'pg_createsubscriber' utility.
From
Amit Kapila
Date:
On Tue, Mar 18, 2025 at 12:07 PM David G. Johnston <david.g.johnston@gmail.com> wrote: > > On Monday, March 17, 2025, Shubham Khanna <khannashubham1197@gmail.com> wrote: >> >> >> I have added validation for "all" to address both issues at once. >> > > Usage needs to be changed to refer to object types and we should try and specify which are valid there too. > > The sgml docs need to mention “all” as a valid value and what it means (namely, if new object types are added and “all”is specified those new types will be removed as well). Suggest not using it in scripts. > Hmm, I think at this stage, we don't need to add the provision for 'all'. We can add it when more than one object 'publication' will be allowed to be removed. With Regards, Amit Kapila.
Re: Adding a '--clean-publisher-objects' option to 'pg_createsubscriber' utility.
From
Shubham Khanna
Date:
On Tue, Mar 18, 2025 at 12:07 PM David G. Johnston <david.g.johnston@gmail.com> wrote: > > On Monday, March 17, 2025, Shubham Khanna <khannashubham1197@gmail.com> wrote: >> >> >> I have added validation for "all" to address both issues at once. >> > > Usage needs to be changed to refer to object types and we should try and specify which are valid there too. > Fixed. > The sgml docs need to mention “all” as a valid value and what it means (namely, if new object types are added and “all”is specified those new types will be removed as well). Suggest not using it in scripts. > As suggested by Amit in [1], I have removed the provision for "all" as a valid option for --remove. This keeps the implementation focused on the current supported object type (publications). > There are still more places where “object type” is needed instead of just “object”. I’ll pin-point or fix tomorrow ifyou don’t beat me to them. > I identified and fixed a few instances where "object type" was needed instead of just "object." Please let me know if I missed any other occurrences. > It would be good if we could get this to play nicely with —dry-run; maybe connecting to the source for the queries insteadof the target. That would help alleviate my issue with the current auto-drop behavior. > In the v18 patch attached at [2], I have removed the second test that verified publications are not removed when --remove is not specified. Now, the test suite only verifies that pg_createsubscriber with --remove correctly removes publications from the subscriber node. This reduces redundancy while still validating the core functionality of the --remove option. IIUC, for testing with --dry-run, we can directly check the relevant stdout logs (e.g., "dropping publication 'test_pub1' ...") to verify the call without actually dropping the publications. However, IMO, using --dry-run alone would miss code coverage for the actual drop publication execution part. Please let me know if I misunderstood your point. I will update this accordingly once I have more clarity. The attached patch contains the suggested changes, [1] - https://www.postgresql.org/message-id/CAA4eK1JHD0fmyMkTe_y84gJ--WWPLVFo6kJMNxvFdcDs7nXjbQ%40mail.gmail.com [2] - https://www.postgresql.org/message-id/CAHv8RjK8q%2BmWPWPh9K7CeH2tKF31vGn%2BoPV3qY7pdPCmtbjzkg%40mail.gmail.com Thanks and regards, Shubham Khanna.
Attachment
Re: Adding a '--clean-publisher-objects' option to 'pg_createsubscriber' utility.
From
Amit Kapila
Date:
On Tue, Mar 18, 2025 at 4:01 PM Shubham Khanna <khannashubham1197@gmail.com> wrote: > > On Tue, Mar 18, 2025 at 12:07 PM David G. Johnston > <david.g.johnston@gmail.com> wrote: > > > > > It would be good if we could get this to play nicely with —dry-run; maybe connecting to the source for the queries insteadof the target. That would help alleviate my issue with the current auto-drop behavior. > > > ... > IIUC, for testing with --dry-run, we can directly check the relevant > stdout logs (e.g., "dropping publication 'test_pub1' ...") to verify > the call without actually dropping the publications. > However, IMO, using --dry-run alone would miss code coverage for the > actual drop publication execution part. > I don't understand --dry-run part of conversation here. As per existing code (or with the patch), we seem to be already printing the publications to be dropped in dry-run mode. * - drop_publication(conn, &dbinfo[i]); + if (dbinfos.remove_objects & OBJECT_PUBLICATIONS) + drop_all_publications(conn, &dbinfo[i]); + else + drop_publication(conn, &dbinfo[i], dbinfo[i].pubname); I was expecting a call like the existing function check_and_drop_existing_subscriptions(). The new function should check if the user requested to remove the publication then it should query the publisher, otherwise, just remove the one specified by dbinfo. Also, the core drop_publication() function should take the required parameters instead of dbinfo after this patch. That would simplify the code. -- With Regards, Amit Kapila.
Re: Adding a '--clean-publisher-objects' option to 'pg_createsubscriber' utility.
From
"David G. Johnston"
Date:
On Tue, Mar 18, 2025 at 4:47 AM Amit Kapila <amit.kapila16@gmail.com> wrote:
I don't understand --dry-run part of conversation here. As per
existing code (or with the patch), we seem to be already printing the
publications to be dropped in dry-run mode.
Sorry, that was me making a bad assumption rather than checking first.
I'm still bothered by the last paragraph of the commit message saying to backup these publications when the preceding one claims strongly and unconditionally that they are redundant and serve no purpose. See my doc changes below (the docs, whatever the form, should suffice for motivating this feature and trying to explain it again in the commit message is redundant).
#define OBJECTTYPE_PUBLICATIONS 0x1
- SimpleStringList remove_objects; /* list of object types to remove */
+ SimpleStringList objecttypes_to_remove; /* list of object types to remove */
- bits32 remove_objects; /* flag to remove objects on subscriber */
+bits32 objecttypes_to_remove; /* flags indicating which object types to remove on subscriber */
printf(_(" -r, --remove=OBJECTTYPE remove all objects of the specified type from specified\n"
" databases on the subscriber; accepts: publications\n"));
" databases on the subscriber; accepts: publications\n"));
+ * Drop the specified publication of the given database. s/of/in/ ? publication in the given database
* Drop all publications on the database. s/on/in ? publications in the given database
sgml:
How about this flow?
<para>
Remove all objects of the specified type from specified databases on the
target server. Multiple object types can be specified by using multiple
<option>--remove</option> switches.
target server. Multiple object types can be specified by using multiple
<option>--remove</option> switches.
</para>
<para>
publications: The "all tables" publications established for this subscriber are
always removed; specifying this object type causes all other publications
replicated from the source server to be dropped as well.
</para>
<para>
The objects selected to be dropped are individually logged and do show
up in a --dry-run. There is no opportunity to affect or stop the dropping
of the selected objects so consider taking a backup of them using pg_dump.
</para>
Just add more paragraphs next to "publications:" as we add more object types.
David J.
Re: Adding a '--clean-publisher-objects' option to 'pg_createsubscriber' utility.
From
Amit Kapila
Date:
On Wed, Mar 19, 2025 at 2:11 AM David G. Johnston <david.g.johnston@gmail.com> wrote: > > On Tue, Mar 18, 2025 at 4:47 AM Amit Kapila <amit.kapila16@gmail.com> wrote: >> >> I don't understand --dry-run part of conversation here. As per >> existing code (or with the patch), we seem to be already printing the >> publications to be dropped in dry-run mode. > > > Sorry, that was me making a bad assumption rather than checking first. > > > I'm still bothered by the last paragraph of the commit message saying to backup these publications when the preceding oneclaims strongly and unconditionally that they are redundant and serve no purpose. > Agreed. I suggest we remove that part of the paragraph (Users should back up any manually created publications before running this command. By default, publications are preserved to avoid unintended data loss.) from the commit message. -- With Regards, Amit Kapila.
Re: Adding a '--clean-publisher-objects' option to 'pg_createsubscriber' utility.
From
Shubham Khanna
Date:
On Tue, Mar 18, 2025 at 5:17 PM Amit Kapila <amit.kapila16@gmail.com> wrote: > > On Tue, Mar 18, 2025 at 4:01 PM Shubham Khanna > <khannashubham1197@gmail.com> wrote: > > > > On Tue, Mar 18, 2025 at 12:07 PM David G. Johnston > > <david.g.johnston@gmail.com> wrote: > > > > > > > > It would be good if we could get this to play nicely with —dry-run; maybe connecting to the source for the queriesinstead of the target. That would help alleviate my issue with the current auto-drop behavior. > > > > > > ... > > IIUC, for testing with --dry-run, we can directly check the relevant > > stdout logs (e.g., "dropping publication 'test_pub1' ...") to verify > > the call without actually dropping the publications. > > However, IMO, using --dry-run alone would miss code coverage for the > > actual drop publication execution part. > > > > I don't understand --dry-run part of conversation here. As per > existing code (or with the patch), we seem to be already printing the > publications to be dropped in dry-run mode. > > * > - drop_publication(conn, &dbinfo[i]); > + if (dbinfos.remove_objects & OBJECT_PUBLICATIONS) > + drop_all_publications(conn, &dbinfo[i]); > + else > + drop_publication(conn, &dbinfo[i], dbinfo[i].pubname); > > I was expecting a call like the existing function > check_and_drop_existing_subscriptions(). The new function should check > if the user requested to remove the publication then it should query > the publisher, otherwise, just remove the one specified by dbinfo. > Also, the core drop_publication() function should take the required > parameters instead of dbinfo after this patch. That would simplify the > code. > > -- I have updated the function drop_all_publications() to check_and_drop_publications(), which now checks if the user requested to remove the publication; if so, it queries the publisher. Otherwise, it removes the one specified by dbinfo. Additionally, the drop_publication() function now takes the required parameters directly instead of dbinfo, simplifying the code. The attached patch contains the suggested changes. Thanks and regards, Shubham Khanna.
Attachment
Re: Adding a '--clean-publisher-objects' option to 'pg_createsubscriber' utility.
From
Shubham Khanna
Date:
On Wed, Mar 19, 2025 at 2:11 AM David G. Johnston <david.g.johnston@gmail.com> wrote: > > On Tue, Mar 18, 2025 at 4:47 AM Amit Kapila <amit.kapila16@gmail.com> wrote: >> >> I don't understand --dry-run part of conversation here. As per >> existing code (or with the patch), we seem to be already printing the >> publications to be dropped in dry-run mode. > > > Sorry, that was me making a bad assumption rather than checking first. > > > I'm still bothered by the last paragraph of the commit message saying to backup these publications when the preceding oneclaims strongly and unconditionally that they are redundant and serve no purpose. See my doc changes below (the docs,whatever the form, should suffice for motivating this feature and trying to explain it again in the commit message isredundant). > Fixed. > #define OBJECTTYPE_PUBLICATIONS 0x1 > Fixed. > - SimpleStringList remove_objects; /* list of object types to remove */ > + SimpleStringList objecttypes_to_remove; /* list of object types to remove */ > Fixed. > - bits32 remove_objects; /* flag to remove objects on subscriber */ > +bits32 objecttypes_to_remove; /* flags indicating which object types to remove on subscriber */ > Fixed. > printf(_(" -r, --remove=OBJECTTYPE remove all objects of the specified type from specified\n" > " databases on the subscriber; accepts: publications\n")); > Fixed. > + * Drop the specified publication of the given database. s/of/in/ ? publication in the given database > Fixed. > * Drop all publications on the database. s/on/in ? publications in the given database > Fixed. > sgml: > How about this flow? > > <para> > Remove all objects of the specified type from specified databases on the > target server. Multiple object types can be specified by using multiple > <option>--remove</option> switches. > </para> > <para> > publications: The "all tables" publications established for this subscriber are > always removed; specifying this object type causes all other publications > replicated from the source server to be dropped as well. > </para> > <para> > The objects selected to be dropped are individually logged and do show > up in a --dry-run. There is no opportunity to affect or stop the dropping > of the selected objects so consider taking a backup of them using pg_dump. > </para> > > Just add more paragraphs next to "publications:" as we add more object types. > Fixed. The attached patch at [1] contains the suggested changes. [1] - https://www.postgresql.org/message-id/CAHv8RjJNE1ZvWhsgL54iPsJhomhcG%2B-SGPN8AnnwdLmWt6A44A%40mail.gmail.com Thanks and regards, Shubham Khanna.
Re: Adding a '--clean-publisher-objects' option to 'pg_createsubscriber' utility.
From
Shubham Khanna
Date:
On Wed, Mar 19, 2025 at 9:06 AM Amit Kapila <amit.kapila16@gmail.com> wrote: > > On Wed, Mar 19, 2025 at 2:11 AM David G. Johnston > <david.g.johnston@gmail.com> wrote: > > > > On Tue, Mar 18, 2025 at 4:47 AM Amit Kapila <amit.kapila16@gmail.com> wrote: > >> > >> I don't understand --dry-run part of conversation here. As per > >> existing code (or with the patch), we seem to be already printing the > >> publications to be dropped in dry-run mode. > > > > > > Sorry, that was me making a bad assumption rather than checking first. > > > > > > I'm still bothered by the last paragraph of the commit message saying to backup these publications when the precedingone claims strongly and unconditionally that they are redundant and serve no purpose. > > > > Agreed. I suggest we remove that part of the paragraph (Users should > back up any manually created publications before running this command. > By default, publications are preserved to avoid unintended data loss.) > from the commit message. > I have updated the commit message according to the suggestions. The part about backing up manually created publications has been removed to align with the statement that they are redundant and serve no purpose. The attached patch at [1] contains the suggested changes. [1] - https://www.postgresql.org/message-id/CAHv8RjJNE1ZvWhsgL54iPsJhomhcG%2B-SGPN8AnnwdLmWt6A44A%40mail.gmail.com Thanks and regards, Shubham Khanna.
Re: Adding a '--clean-publisher-objects' option to 'pg_createsubscriber' utility.
From
vignesh C
Date:
On Wed, 19 Mar 2025 at 10:39, Shubham Khanna <khannashubham1197@gmail.com> wrote: > > On Tue, Mar 18, 2025 at 5:17 PM Amit Kapila <amit.kapila16@gmail.com> wrote: > > > > On Tue, Mar 18, 2025 at 4:01 PM Shubham Khanna > > <khannashubham1197@gmail.com> wrote: > > > > > > On Tue, Mar 18, 2025 at 12:07 PM David G. Johnston > > > <david.g.johnston@gmail.com> wrote: > > > > > > > > > > > It would be good if we could get this to play nicely with —dry-run; maybe connecting to the source for the queriesinstead of the target. That would help alleviate my issue with the current auto-drop behavior. > > > > > > > > > ... > > > IIUC, for testing with --dry-run, we can directly check the relevant > > > stdout logs (e.g., "dropping publication 'test_pub1' ...") to verify > > > the call without actually dropping the publications. > > > However, IMO, using --dry-run alone would miss code coverage for the > > > actual drop publication execution part. > > > > > > > I don't understand --dry-run part of conversation here. As per > > existing code (or with the patch), we seem to be already printing the > > publications to be dropped in dry-run mode. > > > > * > > - drop_publication(conn, &dbinfo[i]); > > + if (dbinfos.remove_objects & OBJECT_PUBLICATIONS) > > + drop_all_publications(conn, &dbinfo[i]); > > + else > > + drop_publication(conn, &dbinfo[i], dbinfo[i].pubname); > > > > I was expecting a call like the existing function > > check_and_drop_existing_subscriptions(). The new function should check > > if the user requested to remove the publication then it should query > > the publisher, otherwise, just remove the one specified by dbinfo. > > Also, the core drop_publication() function should take the required > > parameters instead of dbinfo after this patch. That would simplify the > > code. > > > > -- > > I have updated the function drop_all_publications() to > check_and_drop_publications(), which now checks if the user requested > to remove the publication; if so, it queries the publisher. Otherwise, > it removes the one specified by dbinfo. Additionally, the > drop_publication() function now takes the required parameters directly > instead of dbinfo, simplifying the code. > > The attached patch contains the suggested changes. Few comments 1)How about we change the below: #define DEFAULT_SUB_PORT "50432" +#define OBJECTTYPE_PUBLICATIONS 0x1 to #define OBJECTTYPE_PUBLICATIONS 0x0001 That way it will not require to be changed when we add more remove options in the future. 2) Since this is a global variable, I felt this initialization is not required: + /* Verify the object types specified for removal from the subscriber */ + dbinfos.objecttypes_to_remove = 0x0; + if (opt.objecttypes_to_remove.head != NULL) 3) Since this is a single statement, braces is not required: + if (pg_strcasecmp(cell->val, "publications") == 0) + { + dbinfos.objecttypes_to_remove |= OBJECTTYPE_PUBLICATIONS; + } 4) Similarly here too: + { + simple_string_list_append(&opt.objecttypes_to_remove, optarg); + } I have attached the changes for the same, if you are ok with the changes kindly merge it. Regards, Vignesh
Attachment
Re: Adding a '--clean-publisher-objects' option to 'pg_createsubscriber' utility.
From
Shubham Khanna
Date:
On Wed, Mar 19, 2025 at 12:09 PM vignesh C <vignesh21@gmail.com> wrote: > > On Wed, 19 Mar 2025 at 10:39, Shubham Khanna > <khannashubham1197@gmail.com> wrote: > > > > On Tue, Mar 18, 2025 at 5:17 PM Amit Kapila <amit.kapila16@gmail.com> wrote: > > > > > > On Tue, Mar 18, 2025 at 4:01 PM Shubham Khanna > > > <khannashubham1197@gmail.com> wrote: > > > > > > > > On Tue, Mar 18, 2025 at 12:07 PM David G. Johnston > > > > <david.g.johnston@gmail.com> wrote: > > > > > > > > > > > > > > It would be good if we could get this to play nicely with —dry-run; maybe connecting to the source for the queriesinstead of the target. That would help alleviate my issue with the current auto-drop behavior. > > > > > > > > > > > > ... > > > > IIUC, for testing with --dry-run, we can directly check the relevant > > > > stdout logs (e.g., "dropping publication 'test_pub1' ...") to verify > > > > the call without actually dropping the publications. > > > > However, IMO, using --dry-run alone would miss code coverage for the > > > > actual drop publication execution part. > > > > > > > > > > I don't understand --dry-run part of conversation here. As per > > > existing code (or with the patch), we seem to be already printing the > > > publications to be dropped in dry-run mode. > > > > > > * > > > - drop_publication(conn, &dbinfo[i]); > > > + if (dbinfos.remove_objects & OBJECT_PUBLICATIONS) > > > + drop_all_publications(conn, &dbinfo[i]); > > > + else > > > + drop_publication(conn, &dbinfo[i], dbinfo[i].pubname); > > > > > > I was expecting a call like the existing function > > > check_and_drop_existing_subscriptions(). The new function should check > > > if the user requested to remove the publication then it should query > > > the publisher, otherwise, just remove the one specified by dbinfo. > > > Also, the core drop_publication() function should take the required > > > parameters instead of dbinfo after this patch. That would simplify the > > > code. > > > > > > -- > > > > I have updated the function drop_all_publications() to > > check_and_drop_publications(), which now checks if the user requested > > to remove the publication; if so, it queries the publisher. Otherwise, > > it removes the one specified by dbinfo. Additionally, the > > drop_publication() function now takes the required parameters directly > > instead of dbinfo, simplifying the code. > > > > The attached patch contains the suggested changes. > > Few comments > 1)How about we change the below: > #define DEFAULT_SUB_PORT "50432" > +#define OBJECTTYPE_PUBLICATIONS 0x1 > > to > #define OBJECTTYPE_PUBLICATIONS 0x0001 > > That way it will not require to be changed when we add more remove > options in the future. > > 2) Since this is a global variable, I felt this initialization is not required: > + /* Verify the object types specified for removal from the subscriber */ > + dbinfos.objecttypes_to_remove = 0x0; > + if (opt.objecttypes_to_remove.head != NULL) > > 3) Since this is a single statement, braces is not required: > + if (pg_strcasecmp(cell->val, "publications") == 0) > + { > + dbinfos.objecttypes_to_remove |= > OBJECTTYPE_PUBLICATIONS; > + } > > 4) Similarly here too: > + { > + > simple_string_list_append(&opt.objecttypes_to_remove, optarg); > + } > > I have attached the changes for the same, if you are ok with the > changes kindly merge it. > Thank you for the suggestions. I agree with the changes you have attached — defining OBJECTTYPE_PUBLICATIONS as 0x0001 makes future extensions easier, and removing unnecessary braces and redundant initialization simplifies the code. I have merged the changes and prepared the latest patch. The attached patch contains the suggested changes. Thanks and regards, Shubham Khanna.
Attachment
RE: Adding a '--clean-publisher-objects' option to 'pg_createsubscriber' utility.
From
"Hayato Kuroda (Fujitsu)"
Date:
Dear Shubham, One comment for the API. Currently the patch introduces new option '--remove', and the short term is '-r'. My suggestion is to use '-R' instead. Background ========== IIUC, we've planned to put outut-logs in some files on the directory [1]. This follows what pg_ugprade does. In pg_uprade, -r means "retain SQL and log files even after successful completion" [2]. Normally logs would be removed after the sucessful upgrade, but they would not with the option. If we use -r for removing objects in pg_createsubscriber, it would be conflicted with the short term of --retain. This is confusing for users. [1]: https://www.postgresql.org/message-id/60b45b8a-3047-4a21-ba2a-ddb15daa638f%40eisentraut.org [2]: https://www.postgresql.org/docs/devel/pgupgrade.html Best regards, Hayato Kuroda FUJITSU LIMITED
RE: Adding a '--clean-publisher-objects' option to 'pg_createsubscriber' utility.
From
"Zhijie Hou (Fujitsu)"
Date:
On Wed, Mar 19, 2025 at 2:55 PM Shubham Khanna wrote: > > I have merged the changes and prepared the latest patch. The attached > patch contains the suggested changes. Thanks for updating the patch. Here are few comments: 1. pg_log_error("object type \"%s\" is specified more than once for --remove", optarg); exit(1); Consider using pg_fatal for simplicity. 2. + /* Fetch all publication names */ + res = PQexec(conn, "SELECT pubname FROM pg_catalog.pg_publication;"); + if (PQresultStatus(res) != PGRES_TUPLES_OK) + { + pg_log_error("could not obtain publication information: %s", + PQresultErrorMessage(res)); + PQclear(res); + disconnect_database(conn, false); + return; + } I think we should exit here for consistency, as performed in similar cases. 3. + pg_log_info("dropped all publications in database \"%s\"", dbinfo->dbname); This message may be misleading if some publications were not dropped successfully, as drop_publication does not exit on a drop failure. 4. if (opt.remove_objects.head != NULL) { for (SimpleStringListCell *cell = opt.remove_objects.head; cell; cell = cell->next) { I think the first null test is redundant. I have attached a patch with the proposed changes. If you agree with these modifications, please merge them. Best Regards, Hou zj
Attachment
Re: Adding a '--clean-publisher-objects' option to 'pg_createsubscriber' utility.
From
Shubham Khanna
Date:
On Wed, Mar 19, 2025 at 12:44 PM Zhijie Hou (Fujitsu) <houzj.fnst@fujitsu.com> wrote: > > On Wed, Mar 19, 2025 at 2:55 PM Shubham Khanna wrote: > > > > I have merged the changes and prepared the latest patch. The attached > > patch contains the suggested changes. > > Thanks for updating the patch. Here are few comments: > Thank you for the suggestions. I agree with the changes you have attached. > 1. > pg_log_error("object type \"%s\" is specified more than once for --remove", optarg); > exit(1); > > Consider using pg_fatal for simplicity. > Replacing pg_log_error with pg_fatal makes the error handling more consistent. > > 2. > > + /* Fetch all publication names */ > + res = PQexec(conn, "SELECT pubname FROM pg_catalog.pg_publication;"); > + if (PQresultStatus(res) != PGRES_TUPLES_OK) > + { > + pg_log_error("could not obtain publication information: %s", > + PQresultErrorMessage(res)); > + PQclear(res); > + disconnect_database(conn, false); > + return; > + } > > I think we should exit here for consistency, as performed in similar cases. > Exiting on failure when fetching publication names improves consistency with similar cases. > > 3. > > + pg_log_info("dropped all publications in database \"%s\"", dbinfo->dbname); > > This message may be misleading if some publications were not dropped > successfully, as drop_publication does not exit on a drop failure. > Updating the log message to reflect partial drop failures avoids misleading information. > 4. > > if (opt.remove_objects.head != NULL) > { > for (SimpleStringListCell *cell = opt.remove_objects.head; cell; cell = cell->next) > { > > I think the first null test is redundant. > > I have attached a patch with the proposed changes. If you agree with these > modifications, please merge them. > Removing the redundant NULL check simplifies the logic. I have merged the changes and prepared the latest patch. The attached patch contains the suggested changes. Thanks and regards, Shubham Khanna.
Attachment
Re: Adding a '--clean-publisher-objects' option to 'pg_createsubscriber' utility.
From
Amit Kapila
Date:
On Wed, Mar 19, 2025 at 12:29 PM Hayato Kuroda (Fujitsu) <kuroda.hayato@fujitsu.com> wrote: > > One comment for the API. Currently the patch introduces new option '--remove', > and the short term is '-r'. My suggestion is to use '-R' instead. > > Background > ========== > IIUC, we've planned to put outut-logs in some files on the directory [1]. > This follows what pg_ugprade does. > > In pg_uprade, -r means "retain SQL and log files even after successful completion" [2]. > Normally logs would be removed after the sucessful upgrade, but they would not with > the option. > > If we use -r for removing objects in pg_createsubscriber, it would be conflicted > with the short term of --retain. This is confusing for users. > We have discussed in the past and in this thread as well to have a functionality like pg_upgrade to have LOG and or SQL files in which case we may need to have an option for --retain (-r) similar to pg_upgrade. So, using -R for this option sounds reasonable to me. -- With Regards, Amit Kapila.
Re: Adding a '--clean-publisher-objects' option to 'pg_createsubscriber' utility.
From
Shubham Khanna
Date:
On Wed, Mar 19, 2025 at 12:29 PM Hayato Kuroda (Fujitsu) <kuroda.hayato@fujitsu.com> wrote: > > Dear Shubham, > > One comment for the API. Currently the patch introduces new option '--remove', > and the short term is '-r'. My suggestion is to use '-R' instead. > > Background > ========== > IIUC, we've planned to put outut-logs in some files on the directory [1]. > This follows what pg_ugprade does. > > In pg_uprade, -r means "retain SQL and log files even after successful completion" [2]. > Normally logs would be removed after the sucessful upgrade, but they would not with > the option. > > If we use -r for removing objects in pg_createsubscriber, it would be conflicted > with the short term of --retain. This is confusing for users. > > [1]: https://www.postgresql.org/message-id/60b45b8a-3047-4a21-ba2a-ddb15daa638f%40eisentraut.org > [2]: https://www.postgresql.org/docs/devel/pgupgrade.html > Changed -r to -R based on the shared analysis to avoid conflict with the --retain option used in pg_upgrade and to maintain consistency across tools. The attached patch contains the suggested change. Thanks and regards, Shubham Khanna.
Attachment
Re: Adding a '--clean-publisher-objects' option to 'pg_createsubscriber' utility.
From
Shubham Khanna
Date:
On Wed, Mar 19, 2025 at 2:14 PM Amit Kapila <amit.kapila16@gmail.com> wrote: > > On Wed, Mar 19, 2025 at 12:29 PM Hayato Kuroda (Fujitsu) > <kuroda.hayato@fujitsu.com> wrote: > > > > One comment for the API. Currently the patch introduces new option '--remove', > > and the short term is '-r'. My suggestion is to use '-R' instead. > > > > Background > > ========== > > IIUC, we've planned to put outut-logs in some files on the directory [1]. > > This follows what pg_ugprade does. > > > > In pg_uprade, -r means "retain SQL and log files even after successful completion" [2]. > > Normally logs would be removed after the sucessful upgrade, but they would not with > > the option. > > > > If we use -r for removing objects in pg_createsubscriber, it would be conflicted > > with the short term of --retain. This is confusing for users. > > > > We have discussed in the past and in this thread as well to have a > functionality like pg_upgrade to have LOG and or SQL files in which > case we may need to have an option for --retain (-r) similar to > pg_upgrade. So, using -R for this option sounds reasonable to me. > > -- Fixed the mentioned issue by changing -r to -R to avoid conflict with the --retain option used in pg_upgrade. The attached patch at [1] contains the required change. [1] - https://www.postgresql.org/message-id/CAHv8RjKbYaCRrzJzmU6gU0jt85xJftzVV-Hu9rXrkb1yFzYTtA%40mail.gmail.com Thanks and regards, Shubham Khanna.
Re: Adding a '--clean-publisher-objects' option to 'pg_createsubscriber' utility.
From
vignesh C
Date:
On Wed, 19 Mar 2025 at 14:32, Shubham Khanna <khannashubham1197@gmail.com> wrote: > > > Changed -r to -R based on the shared analysis to avoid conflict with > the --retain option used in pg_upgrade and to maintain consistency > across tools. > > The attached patch contains the suggested change. Few suggestions: 1) I felt the comments from the function header of check_and_drop_publications and the comments of the caller should be interchanged. That will help in understanding the check_and_drop_publications function more easily: + * Since the publications were created before the consistent LSN, they + * remain on the subscriber even after the physical replica is + * promoted. Remove these publications from the subscriber because + * they have no use. Additionally, if requested, drop all pre-existing + * publications. */ - drop_publication(conn, &dbinfo[i]); + check_and_drop_publications(conn, &dbinfo[i]); +/* + * Check and drop the required publications in the given database. + */ +static void +check_and_drop_publications(PGconn *conn, struct LogicalRepInfo *dbinfo) 2) I was not sure if this line "Multiple object types can be specified by using multiple --removed" is required in documentation, as currently the only option supported is publications: + target server. Multiple object types can be specified by using multiple + <option>--remove</option> switches. + </para> I felt this should be added once at least one more object type removal is included. Regards, Vignesh
Re: Adding a '--clean-publisher-objects' option to 'pg_createsubscriber' utility.
From
Shubham Khanna
Date:
On Wed, Mar 19, 2025 at 3:15 PM vignesh C <vignesh21@gmail.com> wrote: > > On Wed, 19 Mar 2025 at 14:32, Shubham Khanna > <khannashubham1197@gmail.com> wrote: > > > > > > Changed -r to -R based on the shared analysis to avoid conflict with > > the --retain option used in pg_upgrade and to maintain consistency > > across tools. > > > > The attached patch contains the suggested change. > > Few suggestions: > 1) I felt the comments from the function header of > check_and_drop_publications and the comments of the caller should be > interchanged. That will help in understanding the > check_and_drop_publications function more easily: > + * Since the publications were created before the > consistent LSN, they > + * remain on the subscriber even after the physical replica is > + * promoted. Remove these publications from the > subscriber because > + * they have no use. Additionally, if requested, drop > all pre-existing > + * publications. > */ > - drop_publication(conn, &dbinfo[i]); > + check_and_drop_publications(conn, &dbinfo[i]); > > +/* > + * Check and drop the required publications in the given database. > + */ > +static void > +check_and_drop_publications(PGconn *conn, struct LogicalRepInfo *dbinfo) > Fixed. > 2) I was not sure if this line "Multiple object types can be specified > by using multiple --removed" is required in documentation, as > currently the only option supported is publications: > + target server. Multiple object types can be specified by using multiple > + <option>--remove</option> switches. > + </para> > > I felt this should be added once at least one more object type removal > is included. > Fixed. The attached patch contains the suggested changes. Thanks and regards, Shubham Khanna.
Attachment
Re: Adding a '--clean-publisher-objects' option to 'pg_createsubscriber' utility.
From
vignesh C
Date:
On Thu, 20 Mar 2025 at 09:06, Shubham Khanna <khannashubham1197@gmail.com> wrote: > > The attached patch contains the suggested changes. Couple of minor comments: 1) The second and third line comments can be in same line: +# Create user-defined publications, wait for streaming replication to sync them +# to the standby, then verify that '--remove' +# removes them. 2) This comment can be in a single line: +# In passing, also test the --enable-two-phase option and +# --remove option Regards, Vignesh
Re: Adding a '--clean-publisher-objects' option to 'pg_createsubscriber' utility.
From
Shubham Khanna
Date:
On Thu, Mar 20, 2025 at 9:37 AM vignesh C <vignesh21@gmail.com> wrote: > > On Thu, 20 Mar 2025 at 09:06, Shubham Khanna > <khannashubham1197@gmail.com> wrote: > > > > The attached patch contains the suggested changes. > > Couple of minor comments: > 1) The second and third line comments can be in same line: > +# Create user-defined publications, wait for streaming replication to sync them > +# to the standby, then verify that '--remove' > +# removes them. > Fixed. > 2) This comment can be in a single line: > +# In passing, also test the --enable-two-phase option and > +# --remove option > Fixed. The attached patch contains the suggested changes. Thanks and regards, Shubham Khanna.
Attachment
Re: Adding a '--clean-publisher-objects' option to 'pg_createsubscriber' utility.
From
Amit Kapila
Date:
On Thu, Mar 20, 2025 at 9:45 AM Shubham Khanna <khannashubham1197@gmail.com> wrote: > The patch had a bug in dry-run mode such that it was not emitting the drop-related command for publications created by the tool with the new option. I fixed that and pushed the patch. Thanks! -- With Regards, Amit Kapila.
Re: Adding a '--clean-publisher-objects' option to 'pg_createsubscriber' utility.
From
Peter Smith
Date:
Hi, I was away for the last few weeks when this feature got committed, so I missed the chance to post my comments earlier. It seems the final SGML docs are mostly from this [1] suggestion, but I think there are one or two more improvements that can be made to it: ====== 1. It is not at all obvious from the current Options syntax that -R/--remove must take a value. A <replaceable>objtype</replaceable> should be included here to address that. Also, putting the "publications" as a <listitem> renders the HTML better IMO, making it way easier to recognize that "publications" is an object type, and also making future object types easier to add. ~ 2. markup Use SGML <option> markup for --dry-run Use SGML <application> markup for pg_dump Use SGML <literal> markup for the specific object type value "publications" ~ 3. Instead of "all tables" publications, we can call these FOR ALL TABLES publications. That will be consistent with the Notes later on this page. ~ 4. "are individually logged and do show up in a --dry-run" I felt that "and do show up" is really just another way of saying "logged". So, maybe reword this to say "are individually logged, including during a --dry-run" ~~~ Please find attached a small patch that implements all these changes. ====== [1] https://www.postgresql.org/message-id/CAKFQuwbaYnSBc5fgHsoFLW_cUq2u466-3ZpkA%2Bu1Z%3D-medNgwg%40mail.gmail.com Kind Regards, Peter Smith. Fujitsu Australia
Attachment
Re: Adding a '--clean-publisher-objects' option to 'pg_createsubscriber' utility.
From
vignesh C
Date:
On Thu, 3 Apr 2025 at 09:07, Peter Smith <smithpb2250@gmail.com> wrote: > > Hi, > > I was away for the last few weeks when this feature got committed, so > I missed the chance to post my comments earlier. > > It seems the final SGML docs are mostly from this [1] suggestion, but > I think there are one or two more improvements that can be made to it: > > ====== > > 1. > It is not at all obvious from the current Options syntax that > -R/--remove must take a value. > A <replaceable>objtype</replaceable> should be included here to address that. > > Also, putting the "publications" as a <listitem> renders the HTML > better IMO, making it way easier to recognize that "publications" is > an object type, and also making future object types easier to add. > > ~ > > 2. markup > Use SGML <option> markup for --dry-run > Use SGML <application> markup for pg_dump > Use SGML <literal> markup for the specific object type value "publications" > > ~ > > 3. > Instead of "all tables" publications, we can call these FOR ALL TABLES > publications. That will be consistent with the Notes later on this > page. > > ~ > > 4. > "are individually logged and do show up in a --dry-run" > > I felt that "and do show up" is really just another way of saying "logged". > So, maybe reword this to say "are individually logged, including > during a --dry-run" > > ~~~ > > Please find attached a small patch that implements all these changes. Thanks for the patch, the changes look good to me. Regards, Vignesh
Re: Adding a '--clean-publisher-objects' option to 'pg_createsubscriber' utility.
From
Amit Kapila
Date:
On Thu, Apr 3, 2025 at 2:20 PM vignesh C <vignesh21@gmail.com> wrote: > > On Thu, 3 Apr 2025 at 09:07, Peter Smith <smithpb2250@gmail.com> wrote: > > > > Hi, > > > > I was away for the last few weeks when this feature got committed, so > > I missed the chance to post my comments earlier. > > > > It seems the final SGML docs are mostly from this [1] suggestion, but > > I think there are one or two more improvements that can be made to it: > > > > Thanks for the patch, the changes look good to me. > Pushed. -- With Regards, Amit Kapila.