Re: [PoC] pg_upgrade: allow to upgrade publisher nodeHayato Kuroda (Fujitsu) - Mailing list pgsql-hackers

From vignesh C
Subject Re: [PoC] pg_upgrade: allow to upgrade publisher nodeHayato Kuroda (Fujitsu)
Date
Msg-id CALDaNm2RRpqfif3EqZ91uY-sPeOwEjYhAYxxuTyUodA2kHaDug@mail.gmail.com
Whole thread Raw
In response to RE: [PoC] pg_upgrade: allow to upgrade publisher nodeHayato Kuroda (Fujitsu)  ("Hayato Kuroda (Fujitsu)" <kuroda.hayato@fujitsu.com>)
Responses RE: [PoC] pg_upgrade: allow to upgrade publisher node
List pgsql-hackers
On Fri, 6 Oct 2023 at 18:30, Hayato Kuroda (Fujitsu)
<kuroda.hayato@fujitsu.com> wrote:
>
> Dear hackers,
>
> Based on comments, I revised my patch. PSA the file.
>
> >
> > > Today, I discussed this problem with Andres at PGConf NYC and he
> > > suggested as following. To verify, if there is any pending unexpected
> > > WAL after shutdown, we can have an API like
> > > pg_logical_replication_slot_advance() which will simply process
> > > records without actually sending anything downstream. In this new API,
> > > we will start with each slot's restart_lsn location and try to process
> > > till the end of WAL, if we encounter any WAL that needs to be
> > > processed (like we need to send the decoded WAL downstream) we can
> > > return a false indicating that there is an unexpected WAL. The reason
> > > to start with restart_lsn is that it is the location that we use to
> > > start scanning the WAL anyway.
>
> I implemented this by using decoding context. The binary upgrade function
> processes WALs from the confirmed_flush, and returns false if some meaningful
> changes are found.
>
> Internally, I added a new decoding mode - DECODING_MODE_SILENT - and used it.
> If the decoding context is in the mode, the output plugin is not loaded, but
> any WALs are decoded without skipping. Also, a new flag "did_process" is also
> added. This flag is set if wrappers for output plugin callbacks are called during
> the silent mode. The upgrading function checks both reorder buffer and the new
> flag because both (non-)transactional changes should be detected. If we only
> check reorder buffer, we miss the non-transactional one.
>
> fast_forward was changed as a variant of decoding mode.
>
> Currently the function is called for all the valid slot. If the approach seems
> good, we can refactor like Bharath said [1].
>
> >
> > > Then, we should also try to create slots before invoking pg_resetwal.
> > > The idea is that we can write a new binary mode function that will do
> > > exactly what pg_resetwal does to compute the next segment and use that
> > > location as a new location (restart_lsn) to create the slots in a new
> > > node. Then, pass it pg_resetwal by using the existing option '-l
> > > walfile'. As we don't have any API that takes restart_lsn as input, we
> > > can write a new API probably for binary mode to create slots that do
> > > take restart_lsn as input. This will ensure that there is no new WAL
> > > inserted by background processes between resetwal and the creation of
> > > slots.
>
> Based on that, I added another binary function binary_upgrade_create_logical_replication_slot().
> This function is similar to pg_create_logical_replication_slot(), but the
> restart_lsn and confirmed_flush are set to *next* WAL segment. The pointed
> filename is returned and it is passed to pg_resetwal command.
>
> One consideration is that pg_log_standby_snapshot() must be executed before
> slots consuming changes. New cluster does not have RUNNING_XACTS records so that
> decoding context on new cluster cannot be create a consistent snapshot as-is.
> This may lead to discard changes during the upcoming consuming event. To
> prevent it the function is called after the final pg_resetwal.

Few comments:
1)  Should we add binary upgrade check "CHECK_IS_BINARY_UPGRADE" for
this funcion too:
+binary_upgrade_create_logical_replication_slot(PG_FUNCTION_ARGS)
+{
+       Name            name = PG_GETARG_NAME(0);
+       Name            plugin = PG_GETARG_NAME(1);
+
+       /* Temporary slots is never handled in this function */
+       bool            two_phase = PG_GETARG_BOOL(2);

2) Generally we are specifying the slot name in this case, is slot
name null check required:
+Datum
+binary_upgrade_validate_wal_logical_end(PG_FUNCTION_ARGS)
+{
+       Name            slot_name;
+       XLogRecPtr      end_of_wal;
+       LogicalDecodingContext *ctx = NULL;
+       bool            has_record;
+
+       CHECK_IS_BINARY_UPGRADE;
+
+       /* Quick exit if the input is NULL */
+       if (PG_ARGISNULL(0))
+               PG_RETURN_BOOL(false);

3) Since this is similar to pg_create_logical_replication_slot, can we
add a comment saying any change in pg_create_logical_replication_slot
would also need the same check to be added in
binary_upgrade_create_logical_replication_slot:
+/*
+ * SQL function for creating a new logical replication slot.
+ *
+ * This function is almost same as pg_create_logical_replication_slot(), but
+ * this can specify the restart_lsn.
+ */
+Datum
+binary_upgrade_create_logical_replication_slot(PG_FUNCTION_ARGS)
+{
+       Name            name = PG_GETARG_NAME(0);
+       Name            plugin = PG_GETARG_NAME(1);
+
+       /* Temporary slots is never handled in this function */

4) Any conclusion on this try catch comment, do you want to add which
setting you want to revert in catch, if try/catch is not required we
can remove this comment:
+       ReplicationSlotAcquire(NameStr(*slot_name), true);
+
+       /* XXX: Is PG_TRY/CATCH needed around here? */
+
+       /*
+        * We use silent mode here to decode all changes without
outputting them,
+        * allowing us to detect all the records that could be sent downstream.
+        */

5) I felt these 2 comments can be combined as both are trying to say
the same thing:
+ * This is a special purpose function to ensure that there are no WAL records
+ * pending to be decoded after the given LSN.
+ *
+ * It is used to ensure that there is no pending WAL to be consumed for
+ * the logical slots.

6) I feel this memset is not required as we are initializing at the
beginning of function, if you want to keep the memset, the
initialization can be removed:
+       values[2] = CStringGetTextDatum(xlogfilename);
+
+       memset(nulls, 0, sizeof(nulls));
+
+       tuple = heap_form_tuple(tupdesc, values, nulls);

7) looks like a typo, "mu" should be "must":
+       /*
+        * Also, we mu execute pg_log_standby_snapshot() when logical
replication
+        * slots are migrated. Because RUNNING_XACTS record is
required to create
+        * a consistent snapshot.
+        */
+       if (count_old_cluster_logical_slots())
+               create_consistent_snapshot();

8) consitent should be consistent:
+/*
+ * Log the details of the current snapshot to the WAL, allowing the snapshot
+ * state to be reconstructed for logical decoding on the upgraded slots.
+ */
+static void
+create_consistent_snapshot(void)
+{
+       DbInfo     *old_db = &old_cluster.dbarr.dbs[0];
+       PGconn     *conn;
+
+       prep_status("Creating a consitent snapshot on new cluster");

Regards,
Vignesh



pgsql-hackers by date:

Previous
From: Heikki Linnakangas
Date:
Subject: Re: Checks in RegisterBackgroundWorker.()
Next
From: Alvaro Herrera
Date:
Subject: Re: Clean up some pg_dump tests