Re: [PoC] pg_upgrade: allow to upgrade publisher nodeHayato Kuroda (Fujitsu) - Mailing list pgsql-hackers
From | vignesh C |
---|---|
Subject | Re: [PoC] pg_upgrade: allow to upgrade publisher nodeHayato Kuroda (Fujitsu) |
Date | |
Msg-id | CALDaNm2RRpqfif3EqZ91uY-sPeOwEjYhAYxxuTyUodA2kHaDug@mail.gmail.com Whole thread Raw |
In response to |
RE: [PoC] pg_upgrade: allow to upgrade publisher nodeHayato Kuroda (Fujitsu) |
Responses |
RE: [PoC] pg_upgrade: allow to upgrade publisher node
|
List | pgsql-hackers |
On Fri, 6 Oct 2023 at 18:30, Hayato Kuroda (Fujitsu) <kuroda.hayato@fujitsu.com> wrote: > > Dear hackers, > > Based on comments, I revised my patch. PSA the file. > > > > > > Today, I discussed this problem with Andres at PGConf NYC and he > > > suggested as following. To verify, if there is any pending unexpected > > > WAL after shutdown, we can have an API like > > > pg_logical_replication_slot_advance() which will simply process > > > records without actually sending anything downstream. In this new API, > > > we will start with each slot's restart_lsn location and try to process > > > till the end of WAL, if we encounter any WAL that needs to be > > > processed (like we need to send the decoded WAL downstream) we can > > > return a false indicating that there is an unexpected WAL. The reason > > > to start with restart_lsn is that it is the location that we use to > > > start scanning the WAL anyway. > > I implemented this by using decoding context. The binary upgrade function > processes WALs from the confirmed_flush, and returns false if some meaningful > changes are found. > > Internally, I added a new decoding mode - DECODING_MODE_SILENT - and used it. > If the decoding context is in the mode, the output plugin is not loaded, but > any WALs are decoded without skipping. Also, a new flag "did_process" is also > added. This flag is set if wrappers for output plugin callbacks are called during > the silent mode. The upgrading function checks both reorder buffer and the new > flag because both (non-)transactional changes should be detected. If we only > check reorder buffer, we miss the non-transactional one. > > fast_forward was changed as a variant of decoding mode. > > Currently the function is called for all the valid slot. If the approach seems > good, we can refactor like Bharath said [1]. > > > > > > Then, we should also try to create slots before invoking pg_resetwal. > > > The idea is that we can write a new binary mode function that will do > > > exactly what pg_resetwal does to compute the next segment and use that > > > location as a new location (restart_lsn) to create the slots in a new > > > node. Then, pass it pg_resetwal by using the existing option '-l > > > walfile'. As we don't have any API that takes restart_lsn as input, we > > > can write a new API probably for binary mode to create slots that do > > > take restart_lsn as input. This will ensure that there is no new WAL > > > inserted by background processes between resetwal and the creation of > > > slots. > > Based on that, I added another binary function binary_upgrade_create_logical_replication_slot(). > This function is similar to pg_create_logical_replication_slot(), but the > restart_lsn and confirmed_flush are set to *next* WAL segment. The pointed > filename is returned and it is passed to pg_resetwal command. > > One consideration is that pg_log_standby_snapshot() must be executed before > slots consuming changes. New cluster does not have RUNNING_XACTS records so that > decoding context on new cluster cannot be create a consistent snapshot as-is. > This may lead to discard changes during the upcoming consuming event. To > prevent it the function is called after the final pg_resetwal. Few comments: 1) Should we add binary upgrade check "CHECK_IS_BINARY_UPGRADE" for this funcion too: +binary_upgrade_create_logical_replication_slot(PG_FUNCTION_ARGS) +{ + Name name = PG_GETARG_NAME(0); + Name plugin = PG_GETARG_NAME(1); + + /* Temporary slots is never handled in this function */ + bool two_phase = PG_GETARG_BOOL(2); 2) Generally we are specifying the slot name in this case, is slot name null check required: +Datum +binary_upgrade_validate_wal_logical_end(PG_FUNCTION_ARGS) +{ + Name slot_name; + XLogRecPtr end_of_wal; + LogicalDecodingContext *ctx = NULL; + bool has_record; + + CHECK_IS_BINARY_UPGRADE; + + /* Quick exit if the input is NULL */ + if (PG_ARGISNULL(0)) + PG_RETURN_BOOL(false); 3) Since this is similar to pg_create_logical_replication_slot, can we add a comment saying any change in pg_create_logical_replication_slot would also need the same check to be added in binary_upgrade_create_logical_replication_slot: +/* + * SQL function for creating a new logical replication slot. + * + * This function is almost same as pg_create_logical_replication_slot(), but + * this can specify the restart_lsn. + */ +Datum +binary_upgrade_create_logical_replication_slot(PG_FUNCTION_ARGS) +{ + Name name = PG_GETARG_NAME(0); + Name plugin = PG_GETARG_NAME(1); + + /* Temporary slots is never handled in this function */ 4) Any conclusion on this try catch comment, do you want to add which setting you want to revert in catch, if try/catch is not required we can remove this comment: + ReplicationSlotAcquire(NameStr(*slot_name), true); + + /* XXX: Is PG_TRY/CATCH needed around here? */ + + /* + * We use silent mode here to decode all changes without outputting them, + * allowing us to detect all the records that could be sent downstream. + */ 5) I felt these 2 comments can be combined as both are trying to say the same thing: + * This is a special purpose function to ensure that there are no WAL records + * pending to be decoded after the given LSN. + * + * It is used to ensure that there is no pending WAL to be consumed for + * the logical slots. 6) I feel this memset is not required as we are initializing at the beginning of function, if you want to keep the memset, the initialization can be removed: + values[2] = CStringGetTextDatum(xlogfilename); + + memset(nulls, 0, sizeof(nulls)); + + tuple = heap_form_tuple(tupdesc, values, nulls); 7) looks like a typo, "mu" should be "must": + /* + * Also, we mu execute pg_log_standby_snapshot() when logical replication + * slots are migrated. Because RUNNING_XACTS record is required to create + * a consistent snapshot. + */ + if (count_old_cluster_logical_slots()) + create_consistent_snapshot(); 8) consitent should be consistent: +/* + * Log the details of the current snapshot to the WAL, allowing the snapshot + * state to be reconstructed for logical decoding on the upgraded slots. + */ +static void +create_consistent_snapshot(void) +{ + DbInfo *old_db = &old_cluster.dbarr.dbs[0]; + PGconn *conn; + + prep_status("Creating a consitent snapshot on new cluster"); Regards, Vignesh
pgsql-hackers by date: