Re: [PoC] pg_upgrade: allow to upgrade publisher nodeHayato Kuroda (Fujitsu) - Mailing list pgsql-hackers

From Amit Kapila
Subject Re: [PoC] pg_upgrade: allow to upgrade publisher nodeHayato Kuroda (Fujitsu)
Date
Msg-id CAA4eK1LbeRft-P-zZK-uDZKYdTzr+X+r0qFb-S6Ngrd_qTOwMg@mail.gmail.com
Whole thread Raw
In response to RE: [PoC] pg_upgrade: allow to upgrade publisher nodeHayato Kuroda (Fujitsu)  ("Hayato Kuroda (Fujitsu)" <kuroda.hayato@fujitsu.com>)
Responses Re: [PoC] pg_upgrade: allow to upgrade publisher node
RE: [PoC] pg_upgrade: allow to upgrade publisher node
List pgsql-hackers
On Fri, Oct 6, 2023 at 6:30 PM Hayato Kuroda (Fujitsu)
<kuroda.hayato@fujitsu.com> wrote:
>
> Based on comments, I revised my patch. PSA the file.
>
> >
> > > Today, I discussed this problem with Andres at PGConf NYC and he
> > > suggested as following. To verify, if there is any pending unexpected
> > > WAL after shutdown, we can have an API like
> > > pg_logical_replication_slot_advance() which will simply process
> > > records without actually sending anything downstream. In this new API,
> > > we will start with each slot's restart_lsn location and try to process
> > > till the end of WAL, if we encounter any WAL that needs to be
> > > processed (like we need to send the decoded WAL downstream) we can
> > > return a false indicating that there is an unexpected WAL. The reason
> > > to start with restart_lsn is that it is the location that we use to
> > > start scanning the WAL anyway.
>
> I implemented this by using decoding context. The binary upgrade function
> processes WALs from the confirmed_flush, and returns false if some meaningful
> changes are found.
>
> Internally, I added a new decoding mode - DECODING_MODE_SILENT - and used it.
> If the decoding context is in the mode, the output plugin is not loaded, but
> any WALs are decoded without skipping.
>

I think it may be okay not to load the output plugin as we are not
going to process any record in this case but is that the only reason
or you have something else in mind as well?

> Also, a new flag "did_process" is also
> added. This flag is set if wrappers for output plugin callbacks are called during
> the silent mode.
>

Isn't it sufficient to add a test for silent mode in
begin/stream_start/begin_prepare kind of APIs and set
ctx->did_process? In all other APIs, we can assert that did_process
shouldn't be set and we never reach there when decoding mode is
silent.

> The upgrading function checks both reorder buffer and the new
> flag because both (non-)transactional changes should be detected. If we only
> check reorder buffer, we miss the non-transactional one.
>

+ /* Check whether the meaningful change was found */
+ found = (ctx->reorder->by_txn_last_xid != InvalidTransactionId ||
+ ctx->did_process);

Are you talking about this check in the patch? If so, can you please
explain when does the first check help?

> fast_forward was changed as a variant of decoding mode.
>
> Currently the function is called for all the valid slot. If the approach seems
> good, we can refactor like Bharath said [1].
>
> >
> > > Then, we should also try to create slots before invoking pg_resetwal.
> > > The idea is that we can write a new binary mode function that will do
> > > exactly what pg_resetwal does to compute the next segment and use that
> > > location as a new location (restart_lsn) to create the slots in a new
> > > node. Then, pass it pg_resetwal by using the existing option '-l
> > > walfile'. As we don't have any API that takes restart_lsn as input, we
> > > can write a new API probably for binary mode to create slots that do
> > > take restart_lsn as input. This will ensure that there is no new WAL
> > > inserted by background processes between resetwal and the creation of
> > > slots.
>
> Based on that, I added another binary function binary_upgrade_create_logical_replication_slot().
> This function is similar to pg_create_logical_replication_slot(), but the
> restart_lsn and confirmed_flush are set to *next* WAL segment. The pointed
> filename is returned and it is passed to pg_resetwal command.
>

I am not sure if it is a good idea that a
binary_upgrade_create_logical_replication_slot() API does the logfile
name calculation.

> One consideration is that pg_log_standby_snapshot() must be executed before
> slots consuming changes. New cluster does not have RUNNING_XACTS records so that
> decoding context on new cluster cannot be create a consistent snapshot as-is.
> This may lead to discard changes during the upcoming consuming event. To
> prevent it the function is called after the final pg_resetwal.
>
> How do you think?
>

+ /*
+ * Also, we mu execute pg_log_standby_snapshot() when logical replication
+ * slots are migrated. Because RUNNING_XACTS record is required to create
+ * a consistent snapshot.
+ */
+ if (count_old_cluster_logical_slots())
+ create_consistent_snapshot();

We shouldn't do this separately. Instead
binary_upgrade_create_logical_replication_slot() should ensure that
corresponding WAL is reserved similar to what we do in
ReplicationSlotReserveWal() and then similarly invoke
LogStandbySnapshot() to ensure that we have enough information to
start.

Few minor comments:
==================
1. The commit message and other comments like atop
get_old_cluster_logical_slot_infos() needs to be adjusted as per
recent changes.
2.
@@ -1268,7 +1346,11 @@ stream_start_cb_wrapper(ReorderBuffer *cache,
ReorderBufferTXN *txn,
  LogicalErrorCallbackState state;
  ErrorContextCallback errcallback;

- Assert(!ctx->fast_forward);
+ /*
+ * In silent mode all the two-phase callbacks are not set so that the
+ * wrapper should not be called.
+ */
+ Assert(ctx->decoding_mode == DECODING_MODE_NORMAL);

This and other similar comments doesn't seems to be consistent as the
function name and comments are not matching.

With Regards,
Amit Kapila.



pgsql-hackers by date:

Previous
From: Andres Freund
Date:
Subject: Re: remaining sql/json patches
Next
From: Bohdan Mart
Date:
Subject: Re: Where can I find the doxyfile?