RE: [PoC] pg_upgrade: allow to upgrade publisher node - Mailing list pgsql-hackers
From | Hayato Kuroda (Fujitsu) |
---|---|
Subject | RE: [PoC] pg_upgrade: allow to upgrade publisher node |
Date | |
Msg-id | TYAPR01MB5866A537AC9AD46E49345A78F5769@TYAPR01MB5866.jpnprd01.prod.outlook.com Whole thread Raw |
In response to | Re: [PoC] pg_upgrade: allow to upgrade publisher node (Peter Smith <smithpb2250@gmail.com>) |
Responses |
Re: [PoC] pg_upgrade: allow to upgrade publisher node
RE: [PoC] pg_upgrade: allow to upgrade publisher node |
List | pgsql-hackers |
Dear Peter, Thank you for reviewing! PSA new version. > > General. > > 1. pg_dump option is documented to the user. > > I'm not sure about exposing the new pg_dump > --logical-replication-slots-only option to the user. > > I thought this pg_dump option was intended only to be called > *internally* by the pg_upgrade. > But, this patch is also documenting the new option for the user (in > case they want to call it independently?) > > Maybe exposing it is OK, but if you do that then I thought perhaps > there should also be some additional pg_dump tests just for this > option (i.e. tested independently of the pg_upgrade) Right, I have written the document for the moment, but it should not If it is not exposed. Removed from the doc. > Commit message > > 2. > For pg_upgrade, when '--include-logical-replication-slots' is > specified, it executes > pg_dump with the new "--logical-replication-slots-only" option and > restores from the > dump. Apart from restoring schema, pg_resetwal must not be called > after restoring > replication slots. This is because the command discards WAL files and > starts from a > new segment, even if they are required by replication slots. This > leads to an ERROR: > "requested WAL segment XXX has already been removed". To avoid this, > replication slots > are restored at a different time than other objects, after running pg_resetwal. > > ~~ > > The "Apart from" sentence maybe could do with some rewording. I > noticed there is a code comment (below fragment) that says the same as > this, but more clearly. Maybe it is better to use that code-comment > wording in the comment message. > > + * XXX We cannot dump replication slots at the same time as the schema > + * dump because we need to separate the timing of restoring > + * replication slots and other objects. Replication slots, in > + * particular, should not be restored before executing the pg_resetwal > + * command because it will remove WALs that are required by the slots. Changed. > src/bin/pg_dump/pg_dump.c > > 3. main > > + if (dopt.logical_slots_only && !dopt.binary_upgrade) > + pg_fatal("options --logical-replication-slots-only requires option > --binary-upgrade"); > + > + if (dopt.logical_slots_only && dopt.dataOnly) > + pg_fatal("options --logical-replication-slots-only and > -a/--data-only cannot be used together"); > + if (dopt.logical_slots_only && dopt.schemaOnly) > + pg_fatal("options --logical-replication-slots-only and > -s/--schema-only cannot be used together"); > + > > Consider if it might be simpler to combine together all those > dopt.logical_slots_only checks. > > SUGGESTION > > if (dopt.logical_slots_only) > { > if (!dopt.binary_upgrade) > pg_fatal("options --logical-replication-slots-only requires > option --binary-upgrade"); > > if (dopt.dataOnly) > pg_fatal("options --logical-replication-slots-only and > -a/--data-only cannot be used together"); > if (dopt.schemaOnly) > pg_fatal("options --logical-replication-slots-only and > -s/--schema-only cannot be used together"); > } Right, fixed. > 4. getLogicalReplicationSlots > > + /* Check whether we should dump or not */ > + if (fout->remoteVersion < 160000 || !dopt->logical_slots_only) > + return; > > I'm not sure if this check is necessary. Given the way this function > is called, is it possible for this check to fail? Maybe that quick > exit would be better code as an Assert? I think the version check must be needed because it is not done yet. (Actually I'm not sure the restriction is needed, but now I will keep) About dopt->logical_slots_only, I agreed to remove that. > 5. dumpLogicalReplicationSlot > > +dumpLogicalReplicationSlot(Archive *fout, > + const LogicalReplicationSlotInfo *slotinfo) > +{ > + DumpOptions *dopt = fout->dopt; > + > + if (!dopt->logical_slots_only) > + return; > > (Similar to the previous comment). Is it even possible to arrive here > when dopt->logical_slots_only is false. Maybe that quick exit would be > better coded as an Assert? I think it is not possible, so changed to Assert(). > 6. > + PQExpBuffer query = createPQExpBuffer(); > + char *slotname = pg_strdup(slotinfo->dobj.name); > > I wondered if it was really necessary to strdup/free this slotname. > e.g. And if it is, then why don't you do this for the slotinfo->plugin > field? This was a debris for my testing. Removed. > src/bin/pg_upgrade/check.c > > 7. check_and_dump_old_cluster > > /* Extract a list of databases and tables from the old cluster */ > get_db_and_rel_infos(&old_cluster); > + get_logical_slot_infos(&old_cluster); > > Is it correct to associate this new call with that existing comment > about "databases and tables"? Added a comment. > 8. check_new_cluster > > @@ -188,6 +190,7 @@ void > check_new_cluster(void) > { > get_db_and_rel_infos(&new_cluster); > + get_logical_slot_infos(&new_cluster); > > check_new_cluster_is_empty(); > > @@ -210,6 +213,9 @@ check_new_cluster(void) > check_for_prepared_transactions(&new_cluster); > > check_for_new_tablespace_dir(&new_cluster); > + > + if (user_opts.include_logical_slots) > + check_for_parameter_settings(&new_cluster); > > Can the get_logical_slot_infos() be done later, guarded by that the > same condition if (user_opts.include_logical_slots)? Added. > 9. check_new_cluster_is_empty > > + * If --include-logical-replication-slots is required, check the > + * existing of slots > + */ > > Did you mean to say "check the existence of slots"? Yes, it is my typo. Fixed. > 10. check_for_parameter_settings > > + if (strcmp(wal_level, "logical") != 0) > + pg_fatal("wal_level must be \"logical\", but set to \"%s\"", > + wal_level); > > /but set to/but is set to/ Fixed. > src/bin/pg_upgrade/info.c > > 11. get_db_and_rel_infos > > + { > get_rel_infos(cluster, &cluster->dbarr.dbs[dbnum]); > > + /* > + * Additionally, slot_arr must be initialized because they will be > + * checked later. > + */ > + cluster->dbarr.dbs[dbnum].slot_arr.nslots = 0; > + cluster->dbarr.dbs[dbnum].slot_arr.slots = NULL; > + } > > 11a. > I think probably it would have been easier to just use 'pg_malloc0' > instead of 'pg_malloc' in the get_db_infos, then this code would not > be necessary. I was not sure whether it is OK to change like that because of the performance efficiency. But OK, fixed. > 11b. > BTW, shouldn't this function also be calling free_logical_slot_infos() > too? That will also have the same effect (initializing the slot_arr) > but without having to change anything else. > > ~~~ > > 12. get_logical_slot_infos > +/* > + * Higher level routine to generate LogicalSlotInfoArr for all databases. > + */ > +void > +get_logical_slot_infos(ClusterInfo *cluster) > > To be consistent with the other nearby function headers it should have > another line saying just get_logical_slot_infos(). Added. > 13. get_logical_slot_infos > > +void > +get_logical_slot_infos(ClusterInfo *cluster) > +{ > + int dbnum; > + > + for (dbnum = 0; dbnum < cluster->dbarr.ndbs; dbnum++) > + { > + if (cluster->dbarr.dbs[dbnum].slot_arr.slots) > + free_logical_slot_infos(&cluster->dbarr.dbs[dbnum].slot_arr); > + > + get_logical_slot_infos_per_db(cluster, &cluster->dbarr.dbs[dbnum]); > + } > + > + if (cluster == &old_cluster) > + pg_log(PG_VERBOSE, "\nsource databases:"); > + else > + pg_log(PG_VERBOSE, "\ntarget databases:"); > + > + if (log_opts.verbose) > + { > + for (dbnum = 0; dbnum < cluster->dbarr.ndbs; dbnum++) > + { > + pg_log(PG_VERBOSE, "Database: %s", cluster->dbarr.dbs[dbnum].db_name); > + print_slot_infos(&cluster->dbarr.dbs[dbnum].slot_arr); > + } > + } > +} > > I didn't see why there are 2 loops exactly the same. I think with some > minor refactoring these can both be done in the same loop can't they? The style follows get_db_and_rel_infos(), but... > SUGGESTION 1: > > if (cluster == &old_cluster) > pg_log(PG_VERBOSE, "\nsource databases:"); > else > pg_log(PG_VERBOSE, "\ntarget databases:"); > > for (dbnum = 0; dbnum < cluster->dbarr.ndbs; dbnum++) > { > if (cluster->dbarr.dbs[dbnum].slot_arr.slots) > free_logical_slot_infos(&cluster->dbarr.dbs[dbnum].slot_arr); > > get_logical_slot_infos_per_db(cluster, &cluster->dbarr.dbs[dbnum]); > > if (log_opts.verbose) > { > pg_log(PG_VERBOSE, "Database: %s", > cluster->dbarr.dbs[dbnum].db_name); > print_slot_infos(&cluster->dbarr.dbs[dbnum].slot_arr); > } > } > > ~ > > I expected it could be simplified further still by using some variables > > SUGGESTION 2: > > if (cluster == &old_cluster) > pg_log(PG_VERBOSE, "\nsource databases:"); > else > pg_log(PG_VERBOSE, "\ntarget databases:"); > > for (dbnum = 0; dbnum < cluster->dbarr.ndbs; dbnum++) > { > DbInfo *pDbInfo = &cluster->dbarr.dbs[dbnum]; > if (pDbInfo->slot_arr.slots) > free_logical_slot_infos(&pDbInfo->slot_arr); > > get_logical_slot_infos_per_db(cluster, pDbInfo); > > if (log_opts.verbose) > { > pg_log(PG_VERBOSE, "Database: %s", pDbInfo->db_name); > print_slot_infos(&pDbInfo->slot_arr); > } > } I chose SUGGESTION 2. > 14. get_logical_slot_infos_per_db > > + char query[QUERY_ALLOC]; > + > + query[0] = '\0'; /* initialize query string to empty */ > + > + snprintf(query + strlen(query), sizeof(query) - strlen(query), > + "SELECT slot_name, plugin, two_phase " > + "FROM pg_catalog.pg_replication_slots " > + "WHERE database = current_database() AND temporary = false " > + "AND wal_status IN ('reserved', 'extended');"); > > I didn't understand the purpose of those calls to 'strlen(query)' > since the string was initialised to empty-string immediately above. Removed. > 15. > +static void > +print_slot_infos(LogicalSlotInfoArr *slot_arr) > +{ > + int slotnum; > + > + for (slotnum = 0; slotnum < slot_arr->nslots; slotnum++) > + pg_log(PG_VERBOSE, "slotname: %s: plugin: %s: two_phase %d", > + slot_arr->slots[slotnum].slotname, > + slot_arr->slots[slotnum].plugin, > + slot_arr->slots[slotnum].two_phase); > +} > > IMO those colons don't make sense. > > BEFORE > "slotname: %s: plugin: %s: two_phase %d" > > SUGGESTION > "slotname: %s, plugin: %s, two_phase: %d" Fixed. I followed print_rel_infos() style, but I prefer yours. > src/bin/pg_upgrade/pg_upgrade.h > > 16. LogicalSlotInfo > > +typedef struct > +{ > + char *slotname; /* slot name */ > + char *plugin; /* plugin */ > + bool two_phase; /* Can the slot decode 2PC? */ > +} LogicalSlotInfo; > > The RelInfo had a comment for the typedef struct, so I think the > LogicalSlotInfo struct also should have a comment. Added. > 17. DbInfo > > RelInfoArr rel_arr; /* array of all user relinfos */ > + LogicalSlotInfoArr slot_arr; /* array of all logicalslotinfos */ > } DbInfo; > > Should the comment say "LogicalSlotInfo" instead of "logicalslotinfos"? Right, fixed. > .../t/003_logical_replication_slots.pl > > 18. RESULTS > > I run this by 'make check' in the src/bin/pg_upgrade folder. > > For some reason, the test does not work for me. The results I get are: > > # +++ tap check in src/bin/pg_upgrade +++ > t/001_basic.pl ...................... ok > t/002_pg_upgrade.pl ................. ok > t/003_logical_replication_slots.pl .. 3/? # Tests were run but no plan > was declared and done_testing() was not seen. > t/003_logical_replication_slots.pl .. Dubious, test returned 29 (wstat > 7424, 0x1d00) > All 4 subtests passed > > Test Summary Report > ------------------- > t/003_logical_replication_slots.pl (Wstat: 7424 Tests: 4 Failed: 0) > Non-zero exit status: 29 > Parse errors: No plan found in TAP output > Files=3, Tests=27, 128 wallclock secs ( 0.04 usr 0.01 sys + 18.02 > cusr 6.06 csys = 24.13 CPU) > Result: FAIL > make: *** [check] Error 1 > > ~ > > And the log file > (tmp_check/log/003_logical_replication_slots_old_node.log) shows the > following ERROR: > > 2023-05-09 12:19:25.330 AEST [32572] 003_logical_replication_slots.pl > LOG: statement: SELECT > pg_create_logical_replication_slot('test_slot', 'test_decoding', > false, true); > 2023-05-09 12:19:25.331 AEST [32572] 003_logical_replication_slots.pl > ERROR: could not access file "test_decoding": No such file or > directory > 2023-05-09 12:19:25.331 AEST [32572] 003_logical_replication_slots.pl > STATEMENT: SELECT pg_create_logical_replication_slot('test_slot', > 'test_decoding', false, true); > 2023-05-09 12:19:25.335 AEST [32564] LOG: received immediate shutdown > request > 2023-05-09 12:19:25.337 AEST [32564] LOG: database system is shut down > > ~ > > Is it a bug? Or, if I am doing something wrong please let me know how > to run the test. Good point. I could not find the problem because I used meson build system. When I used the traditional make, the ERROR could be reproduced. IIUC the problem was occurred the dependency between pg_upgrade and test_decoding was not set in the Makefile. Hence, I added a variable EXTRA_INSTALL to Makefile in order to clarify the dependency. This followed other directories like pg_basebackup. > 19. > +# Clean up > +rmtree($new_node->data_dir . "/pg_upgrade_output.d"); > +$new_node->append_conf('postgresql.conf', "wal_level = 'logical'"); > +$new_node->append_conf('postgresql.conf', "max_replication_slots = 0"); > > I think the last 2 lines are not "clean up". They are preparations for > the subsequent test, so maybe they should be commented as such. Right, it is a preparation for the next. Added a comment. > 20. > +# Clean up > +rmtree($new_node->data_dir . "/pg_upgrade_output.d"); > +$new_node->append_conf('postgresql.conf', "max_replication_slots = 10"); > > I think the last line is not "clean up". It is preparation for the > subsequent test, so maybe it should be commented as such. Added a comment. Best Regards, Hayato Kuroda FUJITSU LIMITED
Attachment
pgsql-hackers by date: