Re: Slow catchup of 2PC (twophase) transactions on replica in LR - Mailing list pgsql-hackers
From | Peter Smith |
---|---|
Subject | Re: Slow catchup of 2PC (twophase) transactions on replica in LR |
Date | |
Msg-id | CAHut+Ptu_w_UCGR-5DbenA+y7wRiA8QPi_ZP=CCJ3SGdTn-==g@mail.gmail.com Whole thread Raw |
In response to | RE: Slow catchup of 2PC (twophase) transactions on replica in LR ("Hayato Kuroda (Fujitsu)" <kuroda.hayato@fujitsu.com>) |
Responses |
RE: Slow catchup of 2PC (twophase) transactions on replica in LR
|
List | pgsql-hackers |
Hi, Here are some review comments for v7-0002 ====== Commit message 1. IIUC there is quite a lot of subtlety and details about why the slot option needs to be changed only when altering "true" to "false", but not when altering "false" to "true". It also should explain why PreventInTransactionBlock is only needed when altering two_phase "true" to "false". Please include a commit message to describe all those tricky details. ====== src/backend/commands/subscriptioncmds.c 2. AlterSubscription - PreventInTransactionBlock(isTopLevel, "ALTER SUBSCRIPTION ... SET (two_phase)"); + if (!opts.twophase) + PreventInTransactionBlock(isTopLevel, + "ALTER SUBSCRIPTION ... SET (two_phase = off)"); IMO this needs a comment to explain why PreventInTransactionBlock is only needed when changing the 'two_phase' option from on to off. ~~~ 3. AlterSubscription /* * Try to acquire the connection necessary for altering slot. * * This has to be at the end because otherwise if there is an error while * doing the database operations we won't be able to rollback altered * slot. */ if (replaces[Anum_pg_subscription_subfailover - 1] || replaces[Anum_pg_subscription_subtwophasestate - 1]) { bool must_use_password; char *err; WalReceiverConn *wrconn; bool failover_needs_to_be_updated; bool two_phase_needs_to_be_updated; /* Load the library providing us libpq calls. */ load_file("libpqwalreceiver", false); /* Try to connect to the publisher. */ must_use_password = sub->passwordrequired && !sub->ownersuperuser; wrconn = walrcv_connect(sub->conninfo, true, true, must_use_password, sub->name, &err); if (!wrconn) ereport(ERROR, (errcode(ERRCODE_CONNECTION_FAILURE), errmsg("could not connect to the publisher: %s", err))); /* * Consider which slot option must be altered. * * We must alter the failover option whenever subfailover is updated. * Two_phase, however, is altered only when changing true to false. */ failover_needs_to_be_updated = replaces[Anum_pg_subscription_subfailover - 1]; two_phase_needs_to_be_updated = (replaces[Anum_pg_subscription_subtwophasestate - 1] && !opts.twophase); PG_TRY(); { if (two_phase_needs_to_be_updated || failover_needs_to_be_updated) walrcv_alter_slot(wrconn, sub->slotname, failover_needs_to_be_updated ? &opts.failover : NULL, two_phase_needs_to_be_updated ? &opts.twophase : NULL); } PG_FINALLY(); { walrcv_disconnect(wrconn); } PG_END_TRY(); } 3a. The block comment "Consider which slot option must be altered..." says WHEN those options need to be updated, but it doesn't say WHY. e.g. why only update the 'two_phase' when it is being disabled but not when it is being enabled? In other words, I think there needs to be more background/reason details given in this comment. ~~~ 3b. Can't those 2 new variable assignments be done up-front and guard this entire "if-block" instead of the current replaces[] guarding it? Then the code is somewhat simplified. SUGGESTION: /* * <improved comment here to explain these variables> */ update_failover = replaces[Anum_pg_subscription_subfailover - 1]; update_two_phase = (replaces[Anum_pg_subscription_subtwophasestate - 1] && !opts.twophase); /* * Try to acquire the connection necessary for altering slot. * * This has to be at the end because otherwise if there is an error while * doing the database operations we won't be able to rollback altered * slot. */ if (update_failover || update_two_phase) { ... /* Load the library providing us libpq calls. */ load_file("libpqwalreceiver", false); /* Try to connect to the publisher. */ must_use_password = sub->passwordrequired && !sub->ownersuperuser; wrconn = walrcv_connect(sub->conninfo, true, true, must_use_password, sub->name, &err); if (!wrconn) ereport(ERROR, ...); PG_TRY(); { walrcv_alter_slot(wrconn, sub->slotname, update_failover ? &opts.failover : NULL, update_two_phase ? &opts.twophase : NULL); } PG_FINALLY(); { walrcv_disconnect(wrconn); } PG_END_TRY(); } ====== .../libpqwalreceiver/libpqwalreceiver.c 4. + appendStringInfo(&cmd, "ALTER_REPLICATION_SLOT %s ( ", + quote_identifier(slotname)); + + if (failover) + appendStringInfo(&cmd, "FAILOVER %s ", + (*failover) ? "true" : "false"); + + if (two_phase) + appendStringInfo(&cmd, "TWO_PHASE %s%s ", + (*two_phase) ? "true" : "false", + failover ? ", " : ""); + + appendStringInfoString(&cmd, ");"); 4a. IIUC the comma logic here was broken in v7 when you swapped the order. Anyway, IMO it will be better NOT to try combining that comma logic with the existing appendStringInfo. Doing it separately is both easier and less error-prone. Furthermore, the parentheses like "(*two_phase)" instead of just "*two_phase" seemed a bit overkill. SUGGESTION: + if (failover) + appendStringInfo(&cmd, "FAILOVER %s", + *failover ? "true" : "false"); + + if (failover && two_phase) + appendStringInfo(&cmd, ", "); + + if (two_phase) + appendStringInfo(&cmd, "TWO_PHASE %s", + *two_phase ? "true" : "false"); + + appendStringInfoString(&cmd, " );"); ~~ 4b. Like I said above, IMO the current separator logic in v7 is broken. So it is a bit concerning the tests all passed anyway. How did that happen? I think this indicates that there needs to be an additional test scenario where both 'failover' and 'two_phase' get altered at the same time so this code gets exercised properly. ====== src/test/subscription/t/099_twophase_added.pl 5. +# Define pre-existing tables on both nodes Why say they are "pre-existing"? They are not pre-existing because you are creating them right here! ~~~ 6. +###### +# Check the case that prepared transactions exist on publisher node +###### I think this needs a slightly more detailed comment. SUGGESTION (this is just an example, but you can surely improve it) # Check the case that prepared transactions exist on the publisher node. # # Since two_phase is "off", then normally this PREPARE will do nothing until # the COMMIT PREPARED, but in this test, we toggle the two_phase to "on" again # before the COMMIT PREPARED happens. ~~~ 7. Maybe this test case needs a few more one-line comments for each of the sub-steps. e.g.: # prepare a transaction to insert some rows to the table # verify the prepared tx is not yet replicated to the subscriber (because 'two_phase = off') # toggle the two_phase to 'on' *before* the COMMIT PREPARED # verify the inserted rows got replicated ok ~~~ 8. IIUC this test will behave the same even if you DON'T do the toggle 'two_phase = on'. So I wonder is there something more you can do to test this scenario more convincingly? ====== Kind Regards, Peter Smith Fujitsu Australia
pgsql-hackers by date: