RE: [PoC] pg_upgrade: allow to upgrade publisher node - Mailing list pgsql-hackers
From | Hayato Kuroda (Fujitsu) |
---|---|
Subject | RE: [PoC] pg_upgrade: allow to upgrade publisher node |
Date | |
Msg-id | TYAPR01MB5866A112A644DDF3D2E0D3E4F5E3A@TYAPR01MB5866.jpnprd01.prod.outlook.com Whole thread Raw |
In response to | Re: [PoC] pg_upgrade: allow to upgrade publisher node (Peter Smith <smithpb2250@gmail.com>) |
List | pgsql-hackers |
Dear Peter, Thanks for reviewing! New patch could be available in [1]. > 1. check_for_lost_slots > > + > +/* > + * Verify that all logical replication slots are usable. > + */ > +void > +check_for_lost_slots(ClusterInfo *cluster) > > 1a. > AFAIK we don't ever need to call this also for 'new_cluster'. So the > function should have no parameter and just access 'old_cluster' > directly. Actually I have asked in previous post, and I understood you like the style. Fixed. Also, get_logical_slot_infos() and count_logical_slots() are also called only for old_cluster, then removed the argument. > 1b. > Can't this be a static function now? Yeah, changed to static. > 2. > + for (i = 0; i < ntups; i++) > + pg_log(PG_WARNING, > + "\nWARNING: logical replication slot \"%s\" is in 'lost' state.", > + PQgetvalue(res, i, i_slotname)); > > Is it correct that this message also includes the word "WARNING"? > Other PG_WARNING messages don't do that. create_script_for_old_cluster_deletion() has the word and I followed that: ``` pg_log(PG_WARNING, "\nWARNING: new data directory should not be inside the old data directory, i.e. %s", old_cluster_pgdata); ``` > 3. check_for_confirmed_flush_lsn > > +/* > + * Verify that all logical replication slots consumed all WALs, except a > + * CHECKPOINT_SHUTDOWN record. > + */ > +static void > +check_for_confirmed_flush_lsn(ClusterInfo *cluster) > > AFAIK we don't ever need to call this also for 'new_cluster'. So the > function should have no parameter and just access 'old_cluster' > directly. Removed. > 4. > + for (i = 0; i < ntups; i++) > + pg_log(PG_WARNING, > + "\nWARNING: logical replication slot \"%s\" has not consumed WALs yet", > + PQgetvalue(res, i, i_slotname)); > > Is it correct that this message also includes the word "WARNING"? > Other PG_WARNING messages don't do that. See above reply, create_script_for_old_cluster_deletion() has that. > src/bin/pg_upgrade/controldata.c > > 5. get_control_data > > + else if ((p = strstr(bufin, "Latest checkpoint location:")) != NULL) > + { > + /* > + * Gather the latest checkpoint location if the cluster is PG17 > + * or later. This is used for upgrading logical replication > + * slots. > + */ > + if (GET_MAJOR_VERSION(cluster->major_version) >= 1700) > > But we are not "gathering" anything. It's just one LSN. I think this > ought to just say "Read the latest..." Changed. > 6. > + /* > + * The upper and lower part of LSN must be read separately > + * because it is reported in %X/%X format. > + */ > > /reported/stored as/ Changed. > src/bin/pg_upgrade/pg_upgrade.h > > 7. > +void check_for_lost_slots(ClusterInfo *cluster);\ > > Why is this needed here? Can't this be a static function? Removed. > .../t/003_logical_replication_slots.pl > > 8. > +# 2. Consume WAL records to avoid another type of upgrade failure. It will be > +# tested in subsequent cases. > +$old_publisher->safe_psql('postgres', > + "SELECT count(*) FROM pg_logical_slot_get_changes('test_slot1', NULL, > NULL);" > +); > > I wondered if that step really needed. Why will there be WAL records to consume? > > IIUC we haven't published anything yet. The primal reason was described in [2], the reply for comment 10. After creating 'test_slot1', another 'test_slot2' is also created, and the function generates the RUNNING_XLOG record. The backtrace is as follows: pg_create_logical_replication_slot create_logical_replication_slot CreateInitDecodingContext ReplicationSlotReserveWal LogStandbySnapshot LogCurrentRunningXacts XLogInsert(RM_STANDBY_ID, XLOG_RUNNING_XACTS); check_for_confirmed_flush_lsn() detects the record and raises FATAL error before checking GUC on new cluster. > 9. > +# ------------------------------ > +# TEST: Successful upgrade > + > +# Preparations for the subsequent test: > +# 1. Remove the remained slot > +$old_publisher->start; > +$old_publisher->safe_psql('postgres', > + "SELECT * FROM pg_drop_replication_slot('test_slot1');" > +); > > Should removal of the slot be done as part of the cleanup of the > previous test, instead of preparing for this one? Moved to cleanup part. > 10. > # 3. Disable the subscription once > $subscriber->safe_psql('postgres', "ALTER SUBSCRIPTION sub DISABLE"); > $old_publisher->stop; > > 10a. > What do you mean by "once"? I added the word because the subscription would be enabled again. But after considering more, I thought "Temporarily" seems better. Fixed. > 10b. > That old_publisher->stop; seems strangely placed. Why is it here? We must shut down the cluster before doing pg_upgrade. Isn't it same as line 124? ``` # 2. Generate extra WAL records. Because these WAL records do not get consumed # it will cause the upcoming pg_upgrade test to fail. $old_publisher->safe_psql('postgres', "CREATE TABLE tbl AS SELECT generate_series(1, 10) AS a;" ); $old_publisher->stop; ``` > 11. > # Check that the slot 'test_slot1' has migrated to the new cluster > $new_publisher->start; > my $result = $new_publisher->safe_psql('postgres', > "SELECT slot_name, two_phase FROM pg_replication_slots"); > is($result, qq(sub|t), 'check the slot exists on new cluster'); > > ~ > > That comment now seems wrong. That slot was previously removed, right? Yeah, it should be 'sub'. Changed. > 12. > # Update the connection > my $new_connstr = $new_publisher->connstr . ' dbname=postgres'; > $subscriber->safe_psql('postgres', > "ALTER SUBSCRIPTION sub CONNECTION '$new_connstr'"); > $subscriber->safe_psql('postgres', "ALTER SUBSCRIPTION sub ENABLE"); > > ~ > > Maybe better to combine both SQL. Combined. [1]: https://www.postgresql.org/message-id/TYAPR01MB5866D7677BAE6F66839570FCF5E3A%40TYAPR01MB5866.jpnprd01.prod.outlook.com [2]: https://www.postgresql.org/message-id/TYAPR01MB58668021BB233D129B466122F51CA%40TYAPR01MB5866.jpnprd01.prod.outlook.com Best Regards, Hayato Kuroda FUJITSU LIMITED
pgsql-hackers by date: