RE: [PoC] pg_upgrade: allow to upgrade publisher node - Mailing list pgsql-hackers

From Hayato Kuroda (Fujitsu)
Subject RE: [PoC] pg_upgrade: allow to upgrade publisher node
Date
Msg-id TYAPR01MB586644F209335F2B35A01243F5DFA@TYAPR01MB5866.jpnprd01.prod.outlook.com
Whole thread Raw
In response to Re: [PoC] pg_upgrade: allow to upgrade publisher node  (Amit Kapila <amit.kapila16@gmail.com>)
Responses Re: [PoC] pg_upgrade: allow to upgrade publisher node
List pgsql-hackers
Dear Bharath, Amit,

Thanks for reviewing! PSA new version.
I addressed comments which have not been claimed.

> On Mon, Oct 23, 2023 at 2:00 PM Bharath Rupireddy
> <bharath.rupireddyforpostgres@gmail.com> wrote:
> >
> > On Mon, Oct 23, 2023 at 11:10 AM Hayato Kuroda (Fujitsu)
> > <kuroda.hayato@fujitsu.com> wrote:
> > >
> > > Thank you for reviewing! PSA new version.
> >
> > > > 6. A nit: how about is_decodable_txn or is_decodable_change or some
> > > > other instead of just a plain name processing_required?
> > > > +    /* Do we need to process any change in 'fast_forward' mode? */
> > > > +    bool        processing_required;
> > >
> > > I preferred current one. Because not only decodable txn, non-txn change and
> > > empty transactions also be processed.
> >
> > Right. It's not the txn, but the change. processing_required seems too
> > generic IMV. A nit: is_change_decodable or something?
> >
> 
> If we don't want to keep it generic then we should use something like
> 'contains_decodable_change'. 'is_change_decodable' could have suited
> here if we were checking a particular change.

I kept the name for now. How does Bharath think?

> > Thanks for the patch. Here are few comments on v56 patch:
> >
> > 1.
> > + *
> > + * Although this function is currently used only during pg_upgrade, there are
> > + * no reasons to restrict it, so IsBinaryUpgrade is not checked here.
> >
> > This comment isn't required IMV, because anyone looking at the code
> > and callsites can understand it.

Removed.

> > 2. A nit: IMV "This is a special purpose ..." statement seems redundant.
> > + *
> > + * This is a special purpose function to ensure that the given slot can be
> > + * upgraded without data loss.
> >
> > How about
> >
> > Verify that the given replication slot has consumed all the WAL changes.
> > If there's any decodable WAL record after the slot's
> > confirmed_flush_lsn, the slot's consumer will lose that data after the
> > slot is upgraded.
> > Returns true if there are no decodable WAL records after the
> > confirmed_flush_lsn. Otherwise false.
> >
> 
> Personally, I find the current comment succinct and clear.

I kept current one.

> > 3.
> > +    if (PG_ARGISNULL(0))
> > +        elog(ERROR, "null argument to
> > binary_upgrade_validate_wal_records is not allowed");
> >
> > I can see the above style is referenced from
> > binary_upgrade_create_empty_extension, but IMV the following looks
> > better and latest (ereport is new style than elog)
> >
> >         ereport(ERROR,
> >                 (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
> >                  errmsg("replication slot name cannot be null")));
> >
> 
> Do you have any theory for making elog to ereport? I am not completely
> sure but as this and related function is used internally, so using
> elog seems reasonable. Also, I find keeping it consistent with the
> existing error message is also reasonable. We can change both later
> together if we get a broader agreement.

I kept current style. elog() was used here because I regarded it as
"cannot happen" error. According to the doc [1], elog() is still used
for the purpose.

> > 4. The following comment seems frivolous, the code tells it all.
> > Please remove the comment.
> > +
> > +                /* No need to check this slot, seek to new one */
> > +                continue;

Removed.

> > 5. A typo - s/gets/Gets
> > + * gets the LogicalSlotInfos for all the logical replication slots of the

Replaced.

> > 6. An optimization in count_old_cluster_logical_slots(void): Turn
> > slot_count to a function static variable so that the for loop isn't
> > required every time because the slot count is prepared in
> > get_old_cluster_logical_slot_infos only once and won't change later
> > on. Do you see any problem with the following? This saves a few CPU
> > cycles when there are large number of replication slots.
> > {
> >     static int slot_count = 0;
> >     static bool first_time = true;
> >
> >     if (first_time)
> >     {
> >         for (int dbnum = 0; dbnum < old_cluster.dbarr.ndbs; dbnum++)
> >             slot_count += old_cluster.dbarr.dbs[dbnum].slot_arr.nslots;
> >
> >         first_time = false;
> >     }
> >
> >     return slot_count;
> > }
> >
> 
> This may not be a problem but this is also not a function that will be
> used frequently. I am not sure if adding such code optimizations is
> worth it.

Not addressed.

> > 7. A typo: s/slotname/slot name. "slot name" looks better in user
> > visible messages.
> > +        pg_log(PG_VERBOSE, "slotname: \"%s\", plugin: \"%s\",
> two_phase: %s",
> >
> 
> If we want to follow other parameters then we can even use slot_name.

Changed to slot_name.

Below part is replies for remained comments:

>8.
>+else
>+{
>+    test_upgrade_from_pre_PG17($old_publisher, $new_publisher,
>+        @pg_upgrade_cmd);
>+}
>Will this ever be tested in current TAP test framework? I mean, will
>the TAP test framework allow testing upgrades from one PG version to
>another PG version?

Yes, the TAP tester allow to do cross-version upgrade. According to
src/bin/pg_upgrade/TESTING file:

```
Testing an upgrade from a different PG version is also possible, and
provides a more thorough test that pg_upgrade does what it's meant for.
```

Below commands are an example of the test.

```
# test PG9.5 -> patched HEAD
$ oldinstall=/home/hayato/older/pg95 make check PROVE_TESTS='t/003_upgrade_logical_replication_slots.pl'
...
# +++ tap check in src/bin/pg_upgrade +++
t/003_upgrade_logical_replication_slots.pl .. ok   
All tests successful.
Files=1, Tests=3, 11 wallclock secs ( 0.03 usr  0.01 sys +  2.78 cusr  1.08 csys =  3.90 CPU)
Result: PASS

# grep the output and find an evidence that cross-version check was done
$ cat tmp_check/log/regress_log_003_upgrade_logical_replication_slots | grep 'check the slot does not exist on new
cluster'
[05:14:22.322](0.139s) ok 3 - check the slot does not exist on new cluster

```

>9. A nit: Can single quotes around variable names in the comments be
>removed just to be consistent?
>+     * We also skip decoding in 'fast_forward' mode. This check must be last
>+    /* Do we need to process any change in 'fast_forward' mode? */

Removed.

Also, based on a comment [2], the upgrade function was renamed to 
'binary_upgrade_logical_slot_has_caught_up'.

[1]: https://www.postgresql.org/docs/devel/error-message-reporting.html
[2]: https://www.postgresql.org/message-id/CAA4eK1%2BYZP3j1H4ChhzSR23k6MPryW-cgGstyvqbek2CMJoHRA%40mail.gmail.com

Best Regards,
Hayato Kuroda
FUJITSU LIMITED


Attachment

pgsql-hackers by date:

Previous
From: Kyotaro Horiguchi
Date:
Subject: Re: pg_ctl start may return 0 even if the postmaster has been already started on Windows
Next
From: Junwang Zhao
Date:
Subject: Re: Add trailing commas to enum definitions