Re: pg_upgrade and logical replication - Mailing list pgsql-hackers
From | vignesh C |
---|---|
Subject | Re: pg_upgrade and logical replication |
Date | |
Msg-id | CALDaNm2ssmSFs4bjpfxbkfUbPE=xFSGqxFoip87kF259FG=X2g@mail.gmail.com Whole thread Raw |
In response to | Re: pg_upgrade and logical replication (Peter Smith <smithpb2250@gmail.com>) |
Responses |
Re: pg_upgrade and logical replication
|
List | pgsql-hackers |
On Thu, 16 Nov 2023 at 07:45, Peter Smith <smithpb2250@gmail.com> wrote: > > Here are some review comments for patch v14-0001 > > ====== > src/backend/utils/adt/pg_upgrade_support.c > > 1. binary_upgrade_replorigin_advance > > + /* lock to prevent the replication origin from vanishing */ > + LockRelationOid(ReplicationOriginRelationId, RowExclusiveLock); > + originid = replorigin_by_name(originname, false); > > Use uppercase for the lock comment. Modified > ====== > src/bin/pg_upgrade/check.c > > 2. check_for_subscription_state > > > > + prep_status("Checking for subscription state"); > > > + > > > + snprintf(output_path, sizeof(output_path), "%s/%s", > > > + log_opts.basedir, > > > + "subscription_state.txt"); > > > > > > I felt this filename ought to be more like > > > 'subscriptions_with_bad_state.txt' because the current name looks like > > > a normal logfile with nothing to indicate that it is only for the > > > states of the "bad" subscriptions. > > > > I have kept the file name intentionally shorted as we noticed that > > when the upgrade of the publisher patch used a longer name there were > > some buildfarm failures because of longer names. > > OK, but how about some other short meaningful name like 'subs_invalid.txt'? > > I also thought "state" in the original name was misleading because > this file contains not only subscriptions with bad 'state' but also > subscriptions with missing 'origin'. Modified > ~~~ > > 3. check_new_cluster_logical_replication_slots > > int nslots_on_old; > int nslots_on_new; > + int nsubs_on_old = old_cluster.subscription_count; > > I felt it might be better to make both these quantities 'unsigned' to > make it more obvious that there are no special meanings for negative > numbers. I have used int itself as all others also use int like in case of logical slots. I tried making the changes, but the code was not consistent, so used int like that is used for others. > ~~~ > > 4. check_new_cluster_logical_replication_slots > > nslots_on_old = count_old_cluster_logical_slots(); > > ~ > > IMO the 'nsubs_on_old' should be coded the same as above. AFAICT, this > is the only code where you are interested in the number of > subscribers, and furthermore, it seems you only care about that count > in the *old* cluster. This means the current implementation of > get_subscription_count() seems more generic than it needs to be and > that results in more unnecessary patch code. (I will repeat this same > review comment in the other relevant places). > > SUGGESTION > nslots_on_old = count_old_cluster_logical_slots(); > nsubs_on_old = count_old_cluster_subscriptions(); Modified to keep it similar to logical slot implementation. > ~~~ > > 5. > + /* > + * Quick return if there are no logical slots and subscriptions to be > + * migrated. > + */ > + if (nslots_on_old == 0 && nsubs_on_old == 0) > return; > > /and subscriptions/and no subscriptions/ Modified > ~~~ > > 6. > - if (nslots_on_old > max_replication_slots) > + if (nslots_on_old && nslots_on_old > max_replication_slots) > pg_fatal("max_replication_slots (%d) must be greater than or equal > to the number of " > "logical replication slots (%d) on the old cluster", > max_replication_slots, nslots_on_old); > > Neither nslots_on_old nor max_replication_slots can be < 0, so I don't > see why the additional check is needed here. > AFAICT "if (nslots_on_old > max_replication_slots)" acheives the same > thing that you want. This part of code is changed now > ~~~ > > 7. > + if (nsubs_on_old && nsubs_on_old > max_replication_slots) > + pg_fatal("max_replication_slots (%d) must be greater than or equal > to the number of " > + "subscriptions (%d) on the old cluster", > + max_replication_slots, nsubs_on_old); > > Neither nsubs_on_old nor max_replication_slots can be < 0, so I don't > see why the additional check is needed here. > AFAICT "if (nsubs_on_old > max_replication_slots)" achieves the same > thing that you want. This part of code is changed now > ====== > src/bin/pg_upgrade/info.c > > 8. get_db_rel_and_slot_infos > > + if (cluster == &old_cluster) > + get_subscription_count(cluster); > + > > I felt this is unnecessary because you only want to know the > nsubs_on_old in one place and then only for the old cluster, so > calling this to set a generic attribute for the cluster is overkill. We need to do this here because when we do the validation of new cluster the old cluster will not be running. I have made the flow similar to logical slots now. > ~~~ > > 9. > +/* > + * Get the number of subscriptions in the old cluster. > + */ > +static void > +get_subscription_count(ClusterInfo *cluster) > +{ > + PGconn *conn; > + PGresult *res; > + > + if (GET_MAJOR_VERSION(cluster->major_version) < 1700) > + return; > + > + conn = connectToServer(cluster, "template1"); > + res = executeQueryOrDie(conn, > + "SELECT oid FROM pg_catalog.pg_subscription"); > + > + cluster->subscription_count = PQntuples(res); > + > + PQclear(res); > + PQfinish(conn); > +} > > 9a. > Currently, this is needed only for the old_cluster (like the function > comment implies), so the parameter is not required. > > Also, AFAICT this number is only needed in one place > (check_new_cluster_logical_replication_slots) so IMO it would be > better to make lots of changes to simplify this code: > - change the function name to be like the other one. e.g. > count_old_cluster_subscriptions() > - function to return unsigned > > SUGGESTION (something like this...) > > unsigned > count_old_cluster_subscriptions(void) > { > unsigned nsubs = 0; > > if (GET_MAJOR_VERSION(cluster->major_version) >= 1700) > { > PGconn *conn = connectToServer(&old_cluster, "template1"); > PGresult *res = executeQueryOrDie(conn, > "SELECT oid FROM pg_catalog.pg_subscription"); > nsubs = PQntuples(res); > PQclear(res); > PQfinish(conn); > } > > return nsubs; > } This function is not needed anymore, making the logic similar to logical slots. > ~ > > 9b. > This function is returning 0 (aka not assigning > cluster->subscription_count) for clusters before PG17. IIUC this is > effectively the same behaviour as count_old_cluster_logical_slots() > but probably it needs to be mentioned more in this function comment > why it is like this. This function is not needed anymore, making the logic similar to logical slots. > ====== > src/bin/pg_upgrade/pg_upgrade.h > > 10. > const char *tablespace_suffix; /* directory specification */ > + int subscription_count; /* number of subscriptions */ > } ClusterInfo; > > I felt this is not needed because you only need to know the > nsubs_on_old in one place, so you can just call the counting function > from there. Making this a generic attribute for the cluster seems > overkill. We need to do this here because when we do the validation of a new cluster the old cluster will not be running. I have made the flow similar to logical slots now. > ====== > src/bin/pg_upgrade/t/004_subscription.pl > > 11. TEST: Check that pg_upgrade is successful when the table is in init state. > > +$synced_query = > + "SELECT count(1) = 1 FROM pg_subscription_rel WHERE srsubstate = 'i'"; > +$old_sub1->poll_query_until('postgres', $synced_query) > + or die "Timed out while waiting for subscriber to synchronize data"; > > But it doesn't get to "synchronize data", so should that message say > more like "Timed out while waiting for the table to reach INIT state" Modified > ~ > > 12. > +command_ok( > + [ > + 'pg_upgrade', '--no-sync', '-d', $old_sub1->data_dir, > + '-D', $new_sub1->data_dir, '-b', $bindir, > + '-B', $bindir, '-s', $new_sub1->host, > + '-p', $old_sub1->port, '-P', $new_sub1->port, > + $mode, > + ], > + 'run of pg_upgrade --check for old instance when the subscription > tables are in ready state' > +); > > Should that message say "init state" instead of "ready state"? Modified > ~~~ > > 13. TEST: when the subscription's replication origin does not exist. > > +$old_sub2->safe_psql('postgres', > + "ALTER SUBSCRIPTION regress_sub2 disable"); > > /disable/DISABLE/ Modified > ~~~ > > 14. > +my $subid = $old_sub2->safe_psql('postgres', > + "SELECT oid FROM pg_subscription WHERE subname = 'regress_sub2'"); > +my $reporigin = 'pg_'.qq($subid); > +$old_sub2->safe_psql('postgres', > + "SELECT pg_replication_origin_drop('$reporigin')" > +); > > Maybe this part needs a comment to say the reason why the origin does > not exist -- it's because you found and explicitly dropped it. Modified The attached v15 version patch has the changes for the same. Regards, Vignesh
Attachment
pgsql-hackers by date: