Re: [PoC] pg_upgrade: allow to upgrade publisher node - Mailing list pgsql-hackers
From | Peter Smith |
---|---|
Subject | Re: [PoC] pg_upgrade: allow to upgrade publisher node |
Date | |
Msg-id | CAHut+Pt7Ri1r2Q7BQ3qwtGtDYFjvBOU8tUOUF-E41-jjxYSFTA@mail.gmail.com Whole thread Raw |
In response to | RE: [PoC] pg_upgrade: allow to upgrade publisher node ("Hayato Kuroda (Fujitsu)" <kuroda.hayato@fujitsu.com>) |
Responses |
RE: [PoC] pg_upgrade: allow to upgrade publisher node
|
List | pgsql-hackers |
Here are some review comments for v22-0002 ====== Commit Message 1. This commit allows nodes with logical replication slots to be upgraded. While reading information from the old cluster, a list of logical replication slots is newly extracted. At the later part of upgrading, pg_upgrade revisits the list and restores slots by using the pg_create_logical_replication_slots() on the new clushter. ~ 1a /is newly extracted/is fetched/ ~ 1b. /using the pg_create_logical_replication_slots()/executing pg_create_logical_replication_slots()/ ~ 1c. /clushter/cluster/ ~~~ 2. Note that it must be done after the final pg_resetwal command during the upgrade because pg_resetwal will remove WALs that are required by the slots. Due to the restriction, the timing of restoring replication slots is different from other objects. ~ 2a. /it must/slot restoration/ ~ 2b. /the restriction/this restriction/ ====== doc/src/sgml/ref/pgupgrade.sgml 3. + <para> + <application>pg_upgrade</application> attempts to migrate logical + replication slots. This helps avoid the need for manually defining the + same replication slot on the new publisher. + </para> /same replication slot/same replication slots/ ~~~ 4. + <para> + Before you start upgrading the publisher cluster, ensure that the + subscription is temporarily disabled, by executing + <link linkend="sql-altersubscription"><command>ALTER SUBSCRIPTION ... DISABLE</command></link>. + After the upgrade is complete, execute the + <command>ALTER SUBSCRIPTION ... CONNECTION</command> command to update the + connection string, and then re-enable the subscription. + </para> On the rendered page, it looks a bit strange that DISABLE has a link but COMMENTION does not have a link. ~~~ 5. + <para> + There are some prerequisites for <application>pg_upgrade</application> to + be able to upgrade the replication slots. If these are not met an error + will be reported. + </para> + + <itemizedlist> +1 to use all the itemizedlist changes that Amit suggested [1] in his attachment. ====== src/bin/pg_upgrade/check.c 6. +static void check_for_logical_replication_slots(ClusterInfo *new_cluster); IMO the arg name should not shadow a global with the same name. See other review comment for this function signature. ~~~ 7. + /* Extract a list of logical replication slots */ + get_logical_slot_infos(&old_cluster, live_check); But 'live_check' is never used? ~~~ 8. check_for_logical_replication_slots + +/* + * Verify the parameter settings necessary for creating logical replication + * slots. + */ +static void +check_for_logical_replication_slots(ClusterInfo *new_cluster) IMO the arg name should not shadow a global with the same name. If this is never going to be called with any param other than &new_cluster then probably it is better not even to pass have that argument at all. Just refer to the global new_cluster inside the function. You can't say that 'check_for_new_tablespace_dir' does it already so it must be OK -- I think that the existing function has the same issue and it also ought to be fixed to avoid shadowing! ~~~ 9. check_for_logical_replication_slots + /* logical replication slots can be migrated since PG17. */ + if (GET_MAJOR_VERSION(new_cluster->major_version) <= 1600) + return; IMO the code matches the comment better if you say < 1700 instead of <= 1600. ====== src/bin/pg_upgrade/function.c 10. get_loadable_libraries /* - * Fetch all libraries containing non-built-in C functions in this DB. + * Fetch all libraries containing non-built-in C functions or referred + * by logical replication slots in this DB. */ ress[dbnum] = executeQueryOrDie(conn, ~ /referred by/referred to by/ ====== src/bin/pg_upgrade/info.c 11. +/* + * get_logical_slot_infos() + * + * Higher level routine to generate LogicalSlotInfoArr for all databases. + */ +void +get_logical_slot_infos(ClusterInfo *cluster, bool live_check) +{ + int dbnum; + int slot_count = 0; + + if (cluster == &old_cluster) + pg_log(PG_VERBOSE, "\nsource databases:"); + else + pg_log(PG_VERBOSE, "\ntarget databases:"); + + for (dbnum = 0; dbnum < cluster->dbarr.ndbs; dbnum++) + { + DbInfo *pDbInfo = &cluster->dbarr.dbs[dbnum]; + + get_logical_slot_infos_per_db(cluster, pDbInfo); + slot_count += pDbInfo->slot_arr.nslots; + + if (log_opts.verbose) + { + pg_log(PG_VERBOSE, "Database: \"%s\"", pDbInfo->db_name); + print_slot_infos(&pDbInfo->slot_arr); + } + } +} + 11a. Now the variable 'slot_count' is no longer being returned so it seems redundant. ~ 11b. What is the 'live_check' parameter for? Nobody is using it. ~~~ 12. count_logical_slots +int +count_logical_slots(ClusterInfo *cluster) +{ + int dbnum; + int slotnum = 0; + + for (dbnum = 0; dbnum < cluster->dbarr.ndbs; dbnum++) + slotnum += cluster->dbarr.dbs[dbnum].slot_arr.nslots; + + return slotnum; +} IMO this variable should be called something like 'slot_count'. This is the same review comment also made in a previous review. (See [2] comment#12). ~~~ 13. print_slot_infos + +static void +print_slot_infos(LogicalSlotInfoArr *slot_arr) +{ + int slotnum; + + for (slotnum = 0; slotnum < slot_arr->nslots; slotnum++) + pg_log(PG_VERBOSE, "slotname: \"%s\", plugin: \"%s\", two_phase: %d", + slot_arr->slots[slotnum].slotname, + slot_arr->slots[slotnum].plugin, + slot_arr->slots[slotnum].two_phase); +} It might be nicer to introduce a variable, instead of all those array dereferences: LogicalSlotInfo *slot_info = &slot_arr->slots[slotnum]; ~~~ 14. + for (slotnum = 0; slotnum < slot_arr->nslots; slotnum++) + { + /* + * Constructs query for creating logical replication slots. + * + * XXX: For simplification, pg_create_logical_replication_slot() is + * used. Is it sufficient? + */ + appendPQExpBuffer(query, "SELECT pg_catalog.pg_create_logical_replication_slot("); + appendStringLiteralConn(query, slot_arr->slots[slotnum].slotname, + conn); + appendPQExpBuffer(query, ", "); + appendStringLiteralConn(query, slot_arr->slots[slotnum].plugin, + conn); + appendPQExpBuffer(query, ", false, %s);", + slot_arr->slots[slotnum].two_phase ? "true" : "false"); + + PQclear(executeQueryOrDie(conn, "%s", query->data)); + + resetPQExpBuffer(query); + } + + PQfinish(conn); + + destroyPQExpBuffer(query); + } + + end_progress_output(); + check_ok(); 14a Similar to the previous comment (#13). It might be nicer to introduce a variable, instead of all those array dereferences: LogicalSlotInfo *slot_info = &slot_arr->slots[slotnum]; ~ 14b. It was not clear to me why this command is not being built using executeQueryOrDie directly instead of using the query buffer. Is there some reason? ~ 14c. I think it would be cleaner to have a separate res variable like you used elsewhere: res = executeQueryOrDie(...) instead of doing PQclear(executeQueryOrDie(conn, "%s", query->data)); in one line ====== src/bin/pg_upgrade/pg_upgrade. 15. +void get_logical_slot_infos(ClusterInfo *cluster, bool live_check); I didn't see a reason for that 'live_check' parameter. ====== .../pg_upgrade/t/003_logical_replication_slots.pl 16. IMO this would be much easier to read if there were BIG comments between the actual TEST parts For example # ------------------------------ # TEST: Confirm pg_upgrade fails is new node wal_level is not 'logical' <preparation> <test> <cleanup> # ------------------------------ # TEST: Confirm pg_upgrade fails max_replication_slots on new node is too low <preparation> <test> <cleanup> # ------------------------------ # TEST: Successful upgrade <preparation> <test> <cleanup> ~~~ 17. +# Cause a failure at the start of pg_upgrade because wal_level is replica +command_fails( + [ + 'pg_upgrade', '--no-sync', + '-d', $old_publisher->data_dir, + '-D', $new_publisher->data_dir, + '-b', $bindir, + '-B', $bindir, + '-s', $new_publisher->host, + '-p', $old_publisher->port, + '-P', $new_publisher->port, + $mode, + ], + 'run of pg_upgrade of old node with wrong wal_level'); +ok( -d $new_publisher->data_dir . "/pg_upgrade_output.d", + "pg_upgrade_output.d/ not removed after pg_upgrade failure"); The message is ambiguous BEFORE 'run of pg_upgrade of old node with wrong wal_level' SUGGESTION 'run of pg_upgrade where the new node has the wrong wal_level' ~~~ 18. +# Create an unnecessary slot on old node +$old_publisher->start; +$old_publisher->safe_psql( + 'postgres', qq[ + SELECT pg_create_logical_replication_slot('test_slot2', 'test_decoding', false, true); +]); + +$old_publisher->stop; + +# Preparations for the subsequent test. max_replication_slots is set to +# smaller than existing slots on old node +$new_publisher->append_conf('postgresql.conf', "wal_level = 'logical'"); +$new_publisher->append_conf('postgresql.conf', "max_replication_slots = 1"); IMO the comment is misleading. It is not an "unnecessary slot", it is just a 2nd slot. And this is all part of the preparation for the next test so it should be under the other comment. For example SUGGESTION changes like this: # Preparations for the subsequent test. # 1. Create an unnecessary slot on the old node $old_publisher->start; $old_publisher->safe_psql( 'postgres', qq[ SELECT pg_create_logical_replication_slot('test_slot2', 'test_decoding', false, true); ]); $old_publisher->stop; # 2. max_replication_slots is set to smaller than the number of slots (2) present on the old node $new_publisher->append_conf('postgresql.conf', "max_replication_slots = 1"); # 3. new node wal_level is set correctly $new_publisher->append_conf('postgresql.conf', "wal_level = 'logical'"); ~~~ 19. +# Remove an unnecessary slot and consume WAL records +$old_publisher->start; +$old_publisher->safe_psql( + 'postgres', qq[ + SELECT pg_drop_replication_slot('test_slot2'); + SELECT count(*) FROM pg_logical_slot_get_changes('test_slot1', NULL, NULL) +]); +$old_publisher->stop; + This comment should say more like: # Preparations for the subsequent test. ~~~ 20. +# Actual run, pg_upgrade_output.d is removed at the end This comment should mention that "successful upgrade is expected" because all the other prerequisites are now satisfied. ~~~ 21. +$new_publisher->start; +my $result = $new_publisher->safe_psql('postgres', + "SELECT slot_name, two_phase FROM pg_replication_slots"); +is($result, qq(test_slot1|t), 'check the slot exists on new node'); Should there be a matching new_pulisher->stop;? ------ [1] https://www.postgresql.org/message-id/CAA4eK1%2BdT2g8gmerguNd_TA%3DXMnm00nLzuEJ_Sddw6Pj-bvKVQ%40mail.gmail.com [2] https://www.postgresql.org/message-id/TYAPR01MB586604802ABE42E11866762FF51BA%40TYAPR01MB5866.jpnprd01.prod.outlook.com Kind Regards, Peter Smith. Fujitsu Australia
pgsql-hackers by date: