Re: Enhance 'pg_createsubscriber' to retrieve databases automatically when no database is provided. - Mailing list pgsql-hackers
From | Shubham Khanna |
---|---|
Subject | Re: Enhance 'pg_createsubscriber' to retrieve databases automatically when no database is provided. |
Date | |
Msg-id | CAHv8RjKJ2kbmAM6b2M_fVD4OiO2hPK=OiMjG3CUUjNo6K8D0VA@mail.gmail.com Whole thread Raw |
In response to | Re: Enhance 'pg_createsubscriber' to retrieve databases automatically when no database is provided. (Peter Smith <smithpb2250@gmail.com>) |
List | pgsql-hackers |
On Mon, Feb 10, 2025 at 7:05 AM Peter Smith <smithpb2250@gmail.com> wrote: > > Some review comments for v4-0001. > > ====== > Commit message > > 1. > This patch enhances the 'pg_createsubscriber' utility by adding the > '--all-databases' option, which automatically fetches all non-template > databases on the source server (publisher) and creates corresponding > subscriptions on the target server (subscriber). This simplifies the process > of converting a physical standby to a logical subscriber, particularly > during upgrades. > > When '--all-databases' is used, the tool queries the source server for all > databases and attempts to create subscriptions on the target server for > databases with matching names. The tool auto-generates subscription,publication > and replication slot names using the format: > - Subscription: '<database>_sub' > - Publication: '<database>_pub' > - Replication slot: '<database>_slot' > > ~ > > There seems a lot of duplication in the above 2 paragraphs. > Duplication can be removed. > BTW, those generated formats are suspicious -- see a later comment below. > > SUGGESTION: > This patch enhances the 'pg_createsubscriber' utility by adding the > '--all-databases' option. When '--all-databases' is specified, the tool > queries the source server (publisher) for all databases and creates > subscriptions on the target server (subscriber) for databases with matching > names. This simplifies the process of converting a physical standby to a > logical subscriber, particularly during upgrades. > > The tool auto-generates subscription, publication and replication slot > names using the format: > ... > > ~~~ > > 2. > The patch ensures that conflicting options such as '--all-databases' and > '--database', '--publication', '--subscription', or '--replication-slot' cannot > be used together > > ~ > > That wording seems misleading because it makes it sound like > '--publication' and '--database' cannot be used together. > > SUGGESTION > The options '--database', '--publication', '--subscription', and > '--replication-slot' cannot be used when '--all-databases' is > specified. > > ====== > doc/src/sgml/ref/pg_createsubscriber.sgml > > 3. > What about the synopsis (??) > > v4 did not address my previous review comment [1, #5] about the synopsis > > > 5. > > The synopsis has no mention of --all-databases. > > > > I'm not sure of the best way to do it -- if it becomes too messy to > > combine everything then maybe having several is the way to go: > > > > e.g. > > > > pg_createsubscriber [option...] { -a | --all-databases } { -D | > > --pgdata }datadir { -P | --publisher-server }connstr > > pg_createsubscriber [option...] { -d | --database }dbname { -D | > > --pgdata }datadir { -P | --publisher-server }connstr > > > > ~~~ > > 4. > + Automatically fetch all non-template databases from the source server > + and create subscriptions for databases with the same names on the > + target server. > + If a database is present on the source but missing on the target, it is > + skipped or an error is raised. > + If a database exists on the target but not on the source, no > + subscription is created for it. > > "it is skipped or an error is raised" (??) So which is it? It cannot be both. > > ~~~ > > 5. > + Subscription names, publication names, and replication slot names are > + automatically generated using the format: > + <literal><replaceable>database</replaceable>_sub</literal> > + <literal><replaceable>database</replaceable>_pub</literal> and > + <literal><replaceable>database</replaceable>_slot</literal> > respectively. > + </para> > > This seems strange to me, because according to the documentation NOTES > section when --database is used and there is no > publication/subscription name etc then all the generated name format > looks like: > > ------ > If the --publication option is not specified, the publication has the > following name pattern: “pg_createsubscriber_%u_%x” (parameter: > database oid, random int). If the --replication-slot option is not > specified, the replication slot has the following name pattern: > “pg_createsubscriber_%u_%x” (parameters: database oid, random int). > ------ > > and > > ------ > If the --subscription option is not specified, the subscription has > the following name pattern: “pg_createsubscriber_%u_%x” (parameters: > database oid, random int). > ------ > > So, why are the generated names different for '--all-databases'? > Actually, I can't see any code even doing what the new documentation > is saying so is the documentation even telling the truth? The commit > message will also be impacted. > > ~~~ > > 6. > + <para> > + <option>--all-databases</option> cannot be used with > + <option>--publication</option>, <option>--subscription</option>, > + or <option>--replication-slot</option>. > + </para> > > What about "--database". > > ~~~ > > 7. > For all the other incompatible options you wrote: > > "Cannot be used together with <option>-a</option>." > > IMO it might be nioer to give the full name of the option. > e.g. "Cannot be used together with <option>--all-databases</option>." > > ====== > src/bin/pg_basebackup/pg_createsubscriber.c > > usage: > > 8. > + printf(_(" -a, --all-databases create subscriptions for > all matching databases on the target\n")); > > That seems ambiguous to me. How about something more like below to > clarify the subscription is created on the target. > > "for all source databases create subscriptions on the same target database" > > ~~~ > > fetch_source_databases: > > 9. > + const char *query = "SELECT datname FROM pg_database WHERE > datistemplate = false"; > > Do we need this variable? It seems simpler/better just to put the SQL in-line. > > ~~~ > > main: > > 10. > switch (c) > { > + case 'a': > + opt.all_databases = true; > + break; > + > > 10a. > Missing the check for duplicate option --all-databases. OTOH maybe you > don't care about this error, because probably it is harmless if you > just ignore it. > > ~ > > 10b. > Missing the check for --database, --publication, --subscription, > --replication-slot. > > (e.g. if user specified the --all-databases *after* those other options) > > ~~~ > > 11. > + if (opt.all_databases) > + { > + pg_log_error("--replslot cannot be used with --all-databases"); > + exit(1); > + } > > Where'd this name come from? AFAICT there is no such option as "--replslot" > > ~~~ > > 12. > - * If --database option is not provided, try to obtain the dbname from > - * the publisher conninfo. If dbname parameter is not available, error > - * out. > + * If neither --database nor --all-databases option is provided, try > + * to obtain the dbname from the publisher conninfo. If dbname > + * parameter is not available, error out. > > For this comment to make sense I think the previous comment (where > fetch_source_databases is called) needs to explain that when > --all-databases is defined then it is going to treat that internally > as the equivalent of a whole lot of user-specified --database options. > > ====== > .../t/040_pg_createsubscriber.pl > > 13. > + qr/cannot specify --publication or --replication-slot when using > --all-databases/, > + 'fail if --all-databases is used with publication and slot'); > + > > How are tests expecting this even passing? > > Searching the patch I cannot find any such error! > > ====== Fixed the given comments. The attached patch at [1] contains the suggested changes. [1] - https://www.postgresql.org/message-id/CAHv8Rj%2BmhYgh8XLFM8sN8J05z0ia%2BknfB1kP6kjbnB55H-b-mw%40mail.gmail.com Thanks and regards, Shubham Khanna.
pgsql-hackers by date: