RE: Slow catchup of 2PC (twophase) transactions on replica in LR - Mailing list pgsql-hackers
From | Hayato Kuroda (Fujitsu) |
---|---|
Subject | RE: Slow catchup of 2PC (twophase) transactions on replica in LR |
Date | |
Msg-id | OSBPR01MB2552D3618A0BAB99ECDDD589F5E62@OSBPR01MB2552.jpnprd01.prod.outlook.com Whole thread Raw |
In response to | Re: Slow catchup of 2PC (twophase) transactions on replica in LR (Peter Smith <smithpb2250@gmail.com>) |
List | pgsql-hackers |
Dear Peter, > ====== > Commit message > > 1. > IIUC there is quite a lot of subtlety and details about why the slot > option needs to be changed only when altering "true" to "false", but > not when altering "false" to "true". > > It also should explain why PreventInTransactionBlock is only needed > when altering two_phase "true" to "false". > > Please include a commit message to describe all those tricky details. Added. > ====== > src/backend/commands/subscriptioncmds.c > > 2. AlterSubscription > > - PreventInTransactionBlock(isTopLevel, "ALTER SUBSCRIPTION ... SET > (two_phase)"); > + if (!opts.twophase) > + PreventInTransactionBlock(isTopLevel, > + "ALTER SUBSCRIPTION ... SET (two_phase = off)"); > > IMO this needs a comment to explain why PreventInTransactionBlock is > only needed when changing the 'two_phase' option from on to off. Added. Thoutht? > 3. AlterSubscription > > /* > * Try to acquire the connection necessary for altering slot. > * > * This has to be at the end because otherwise if there is an error while > * doing the database operations we won't be able to rollback altered > * slot. > */ > if (replaces[Anum_pg_subscription_subfailover - 1] || > replaces[Anum_pg_subscription_subtwophasestate - 1]) > { > bool must_use_password; > char *err; > WalReceiverConn *wrconn; > bool failover_needs_to_be_updated; > bool two_phase_needs_to_be_updated; > > /* Load the library providing us libpq calls. */ > load_file("libpqwalreceiver", false); > > /* Try to connect to the publisher. */ > must_use_password = sub->passwordrequired && !sub->ownersuperuser; > wrconn = walrcv_connect(sub->conninfo, true, true, must_use_password, > sub->name, &err); > if (!wrconn) > ereport(ERROR, > (errcode(ERRCODE_CONNECTION_FAILURE), > errmsg("could not connect to the publisher: %s", err))); > > /* > * Consider which slot option must be altered. > * > * We must alter the failover option whenever subfailover is updated. > * Two_phase, however, is altered only when changing true to false. > */ > failover_needs_to_be_updated = > replaces[Anum_pg_subscription_subfailover - 1]; > two_phase_needs_to_be_updated = > (replaces[Anum_pg_subscription_subtwophasestate - 1] && > !opts.twophase); > > PG_TRY(); > { > if (two_phase_needs_to_be_updated || failover_needs_to_be_updated) > walrcv_alter_slot(wrconn, sub->slotname, > failover_needs_to_be_updated ? &opts.failover : NULL, > two_phase_needs_to_be_updated ? &opts.twophase : NULL); > } > PG_FINALLY(); > { > walrcv_disconnect(wrconn); > } > PG_END_TRY(); > } > > 3a. > The block comment "Consider which slot option must be altered..." says > WHEN those options need to be updated, but it doesn't say WHY. e.g. > why only update the 'two_phase' when it is being disabled but not when > it is being enabled? In other words, I think there needs to be more > background/reason details given in this comment. > > ~~~ > > 3b. > Can't those 2 new variable assignments be done up-front and guard this > entire "if-block" instead of the current replaces[] guarding it? Then > the code is somewhat simplified. > > SUGGESTION: > /* > * <improved comment here to explain these variables> > */ > update_failover = replaces[Anum_pg_subscription_subfailover - 1]; > update_two_phase = (replaces[Anum_pg_subscription_subtwophasestate - > 1] && !opts.twophase); > > /* > * Try to acquire the connection necessary for altering slot. > * > * This has to be at the end because otherwise if there is an error while > * doing the database operations we won't be able to rollback altered > * slot. > */ > if (update_failover || update_two_phase) > { > ... > > /* Load the library providing us libpq calls. */ > load_file("libpqwalreceiver", false); > > /* Try to connect to the publisher. */ > must_use_password = sub->passwordrequired && !sub->ownersuperuser; > wrconn = walrcv_connect(sub->conninfo, true, true, > must_use_password, sub->name, &err); > if (!wrconn) > ereport(ERROR, ...); > > PG_TRY(); > { > walrcv_alter_slot(wrconn, sub->slotname, > update_failover ? &opts.failover : NULL, > update_two_phase ? &opts.twophase : NULL); > } > PG_FINALLY(); > { > walrcv_disconnect(wrconn); > } > PG_END_TRY(); > } Two variables were added and comments were updated. > .../libpqwalreceiver/libpqwalreceiver.c > > 4. > + appendStringInfo(&cmd, "ALTER_REPLICATION_SLOT %s ( ", > + quote_identifier(slotname)); > + > + if (failover) > + appendStringInfo(&cmd, "FAILOVER %s ", > + (*failover) ? "true" : "false"); > + > + if (two_phase) > + appendStringInfo(&cmd, "TWO_PHASE %s%s ", > + (*two_phase) ? "true" : "false", > + failover ? ", " : ""); > + > + appendStringInfoString(&cmd, ");"); > > 4a. > IIUC the comma logic here was broken in v7 when you swapped the order. > Anyway, IMO it will be better NOT to try combining that comma logic > with the existing appendStringInfo. Doing it separately is both easier > and less error-prone. > > Furthermore, the parentheses like "(*two_phase)" instead of just > "*two_phase" seemed a bit overkill. > > SUGGESTION: > + if (failover) > + appendStringInfo(&cmd, "FAILOVER %s", > + *failover ? "true" : "false"); > + > + if (failover && two_phase) > + appendStringInfo(&cmd, ", "); > + > + if (two_phase) > + appendStringInfo(&cmd, "TWO_PHASE %s", > + *two_phase ? "true" : "false"); > + > + appendStringInfoString(&cmd, " );"); Fixed. > 4b. > Like I said above, IMO the current separator logic in v7 is broken. So > it is a bit concerning the tests all passed anyway. How did that > happen? I think this indicates that there needs to be an additional > test scenario where both 'failover' and 'two_phase' get altered at the > same time so this code gets exercised properly. Right, it was added. > ====== > src/test/subscription/t/099_twophase_added.pl > > 5. > +# Define pre-existing tables on both nodes > > Why say they are "pre-existing"? They are not pre-existing because you > are creating them right here! Removed the word. > 6. > +###### > +# Check the case that prepared transactions exist on publisher node > +###### > > I think this needs a slightly more detailed comment. > > SUGGESTION (this is just an example, but you can surely improve it) > > # Check the case that prepared transactions exist on the publisher node. > # > # Since two_phase is "off", then normally this PREPARE will do nothing until > # the COMMIT PREPARED, but in this test, we toggle the two_phase to "on" again > # before the COMMIT PREPARED happens. Changed with adjustments. > 7. > Maybe this test case needs a few more one-line comments for each of > the sub-steps. e.g.: > > # prepare a transaction to insert some rows to the table > > # verify the prepared tx is not yet replicated to the subscriber > (because 'two_phase = off') > > # toggle the two_phase to 'on' *before* the COMMIT PREPARED > > # verify the inserted rows got replicated ok Modified like yours, but changed based on the suggestion by Grammarly. > 8. > IIUC this test will behave the same even if you DON'T do the toggle > 'two_phase = on'. So I wonder is there something more you can do to > test this scenario more convincingly? I found an indicator. When the apply starts, it outputs the current status of two_phase option. I added wait_for_log() to ensure below appeared. Thought? ``` ereport(DEBUG1, (errmsg_internal("logical replication apply worker for subscription \"%s\" two_phase is %s", MySubscription->name, MySubscription->twophasestate == LOGICALREP_TWOPHASE_STATE_DISABLED ? "DISABLED" : MySubscription->twophasestate == LOGICALREP_TWOPHASE_STATE_PENDING ? "PENDING" : MySubscription->twophasestate == LOGICALREP_TWOPHASE_STATE_ENABLED ? "ENABLED" : "?"))); ``` Best Regards, Hayato Kuroda FUJITSU LIMITED https://www.fujitsu.com/
pgsql-hackers by date: