RE: [PoC] pg_upgrade: allow to upgrade publisher node - Mailing list pgsql-hackers
From | Hayato Kuroda (Fujitsu) |
---|---|
Subject | RE: [PoC] pg_upgrade: allow to upgrade publisher node |
Date | |
Msg-id | TYAPR01MB586644F209335F2B35A01243F5DFA@TYAPR01MB5866.jpnprd01.prod.outlook.com Whole thread Raw |
In response to | Re: [PoC] pg_upgrade: allow to upgrade publisher node (Amit Kapila <amit.kapila16@gmail.com>) |
Responses |
Re: [PoC] pg_upgrade: allow to upgrade publisher node
|
List | pgsql-hackers |
Dear Bharath, Amit, Thanks for reviewing! PSA new version. I addressed comments which have not been claimed. > On Mon, Oct 23, 2023 at 2:00 PM Bharath Rupireddy > <bharath.rupireddyforpostgres@gmail.com> wrote: > > > > On Mon, Oct 23, 2023 at 11:10 AM Hayato Kuroda (Fujitsu) > > <kuroda.hayato@fujitsu.com> wrote: > > > > > > Thank you for reviewing! PSA new version. > > > > > > 6. A nit: how about is_decodable_txn or is_decodable_change or some > > > > other instead of just a plain name processing_required? > > > > + /* Do we need to process any change in 'fast_forward' mode? */ > > > > + bool processing_required; > > > > > > I preferred current one. Because not only decodable txn, non-txn change and > > > empty transactions also be processed. > > > > Right. It's not the txn, but the change. processing_required seems too > > generic IMV. A nit: is_change_decodable or something? > > > > If we don't want to keep it generic then we should use something like > 'contains_decodable_change'. 'is_change_decodable' could have suited > here if we were checking a particular change. I kept the name for now. How does Bharath think? > > Thanks for the patch. Here are few comments on v56 patch: > > > > 1. > > + * > > + * Although this function is currently used only during pg_upgrade, there are > > + * no reasons to restrict it, so IsBinaryUpgrade is not checked here. > > > > This comment isn't required IMV, because anyone looking at the code > > and callsites can understand it. Removed. > > 2. A nit: IMV "This is a special purpose ..." statement seems redundant. > > + * > > + * This is a special purpose function to ensure that the given slot can be > > + * upgraded without data loss. > > > > How about > > > > Verify that the given replication slot has consumed all the WAL changes. > > If there's any decodable WAL record after the slot's > > confirmed_flush_lsn, the slot's consumer will lose that data after the > > slot is upgraded. > > Returns true if there are no decodable WAL records after the > > confirmed_flush_lsn. Otherwise false. > > > > Personally, I find the current comment succinct and clear. I kept current one. > > 3. > > + if (PG_ARGISNULL(0)) > > + elog(ERROR, "null argument to > > binary_upgrade_validate_wal_records is not allowed"); > > > > I can see the above style is referenced from > > binary_upgrade_create_empty_extension, but IMV the following looks > > better and latest (ereport is new style than elog) > > > > ereport(ERROR, > > (errcode(ERRCODE_INVALID_PARAMETER_VALUE), > > errmsg("replication slot name cannot be null"))); > > > > Do you have any theory for making elog to ereport? I am not completely > sure but as this and related function is used internally, so using > elog seems reasonable. Also, I find keeping it consistent with the > existing error message is also reasonable. We can change both later > together if we get a broader agreement. I kept current style. elog() was used here because I regarded it as "cannot happen" error. According to the doc [1], elog() is still used for the purpose. > > 4. The following comment seems frivolous, the code tells it all. > > Please remove the comment. > > + > > + /* No need to check this slot, seek to new one */ > > + continue; Removed. > > 5. A typo - s/gets/Gets > > + * gets the LogicalSlotInfos for all the logical replication slots of the Replaced. > > 6. An optimization in count_old_cluster_logical_slots(void): Turn > > slot_count to a function static variable so that the for loop isn't > > required every time because the slot count is prepared in > > get_old_cluster_logical_slot_infos only once and won't change later > > on. Do you see any problem with the following? This saves a few CPU > > cycles when there are large number of replication slots. > > { > > static int slot_count = 0; > > static bool first_time = true; > > > > if (first_time) > > { > > for (int dbnum = 0; dbnum < old_cluster.dbarr.ndbs; dbnum++) > > slot_count += old_cluster.dbarr.dbs[dbnum].slot_arr.nslots; > > > > first_time = false; > > } > > > > return slot_count; > > } > > > > This may not be a problem but this is also not a function that will be > used frequently. I am not sure if adding such code optimizations is > worth it. Not addressed. > > 7. A typo: s/slotname/slot name. "slot name" looks better in user > > visible messages. > > + pg_log(PG_VERBOSE, "slotname: \"%s\", plugin: \"%s\", > two_phase: %s", > > > > If we want to follow other parameters then we can even use slot_name. Changed to slot_name. Below part is replies for remained comments: >8. >+else >+{ >+ test_upgrade_from_pre_PG17($old_publisher, $new_publisher, >+ @pg_upgrade_cmd); >+} >Will this ever be tested in current TAP test framework? I mean, will >the TAP test framework allow testing upgrades from one PG version to >another PG version? Yes, the TAP tester allow to do cross-version upgrade. According to src/bin/pg_upgrade/TESTING file: ``` Testing an upgrade from a different PG version is also possible, and provides a more thorough test that pg_upgrade does what it's meant for. ``` Below commands are an example of the test. ``` # test PG9.5 -> patched HEAD $ oldinstall=/home/hayato/older/pg95 make check PROVE_TESTS='t/003_upgrade_logical_replication_slots.pl' ... # +++ tap check in src/bin/pg_upgrade +++ t/003_upgrade_logical_replication_slots.pl .. ok All tests successful. Files=1, Tests=3, 11 wallclock secs ( 0.03 usr 0.01 sys + 2.78 cusr 1.08 csys = 3.90 CPU) Result: PASS # grep the output and find an evidence that cross-version check was done $ cat tmp_check/log/regress_log_003_upgrade_logical_replication_slots | grep 'check the slot does not exist on new cluster' [05:14:22.322](0.139s) ok 3 - check the slot does not exist on new cluster ``` >9. A nit: Can single quotes around variable names in the comments be >removed just to be consistent? >+ * We also skip decoding in 'fast_forward' mode. This check must be last >+ /* Do we need to process any change in 'fast_forward' mode? */ Removed. Also, based on a comment [2], the upgrade function was renamed to 'binary_upgrade_logical_slot_has_caught_up'. [1]: https://www.postgresql.org/docs/devel/error-message-reporting.html [2]: https://www.postgresql.org/message-id/CAA4eK1%2BYZP3j1H4ChhzSR23k6MPryW-cgGstyvqbek2CMJoHRA%40mail.gmail.com Best Regards, Hayato Kuroda FUJITSU LIMITED
Attachment
pgsql-hackers by date: