Re: Single transaction in the tablesync worker? - Mailing list pgsql-hackers
From: Amit Kapila
Subject: Re: Single transaction in the tablesync worker?
Msg-id: CAA4eK1JhpuwujrV6ABMmZ3jXfW37ssZnJ3fikrY7rRdvoEmu_g@mail.gmail.com
In response to: Re: Single transaction in the tablesync worker? (Peter Smith <smithpb2250@gmail.com>)
List: pgsql-hackers
On Sat, Jan 23, 2021 at 4:55 AM Peter Smith <smithpb2250@gmail.com> wrote:
>
> PSA the v18 patch for the Tablesync Solution1.
>

Few comments:
=============
1.
- * So the state progression is always: INIT -> DATASYNC -> SYNCWAIT ->
- * CATCHUP -> SYNCDONE -> READY.
+ * So the state progression is always: INIT -> DATASYNC ->
+ * (sync worker FINISHEDCOPY) -> SYNCWAIT -> CATCHUP -> SYNCDONE -> READY.

I don't think we need to be specific here that the sync worker sets the
FINISHEDCOPY state.

2.
@@ -98,11 +102,16 @@
 #include "miscadmin.h"
 #include "parser/parse_relation.h"
 #include "pgstat.h"
+#include "postmaster/interrupt.h"
 #include "replication/logicallauncher.h"
 #include "replication/logicalrelation.h"
+#include "replication/logicalworker.h"
 #include "replication/walreceiver.h"
 #include "replication/worker_internal.h"
+#include "replication/slot.h"

I don't think the above includes are required. They seem to be a remnant
of the previous approach.

3.
 process_syncing_tables_for_sync(XLogRecPtr current_lsn)
 {
-	Assert(IsTransactionState());
+	bool	sync_done = false;

 	SpinLockAcquire(&MyLogicalRepWorker->relmutex);
+	sync_done = MyLogicalRepWorker->relstate == SUBREL_STATE_CATCHUP &&
+		current_lsn >= MyLogicalRepWorker->relstate_lsn;
+	SpinLockRelease(&MyLogicalRepWorker->relmutex);

-	if (MyLogicalRepWorker->relstate == SUBREL_STATE_CATCHUP &&
-		current_lsn >= MyLogicalRepWorker->relstate_lsn)
+	if (sync_done)
 	{
 		TimeLineID	tli;

+		/*
+		 * Change state to SYNCDONE.
+		 */
+		SpinLockAcquire(&MyLogicalRepWorker->relmutex);

Why do we need these changes? If you have done this for code-readability
purposes then we can consider it as a separate patch, because I don't see
why these changes are required w.r.t. this patch.

4.
-	/*
-	 * To build a slot name for the sync work, we are limited to NAMEDATALEN -
-	 * 1 characters.  We cut the original slot name to NAMEDATALEN - 28 chars
-	 * and append _%u_sync_%u (1 + 10 + 6 + 10 + '\0').  (It's actually the
-	 * NAMEDATALEN on the remote that matters, but this scheme will also work
-	 * reasonably if that is different.)
-	 */
-	StaticAssertStmt(NAMEDATALEN >= 32, "NAMEDATALEN too small"); /* for sanity */
-	slotname = psprintf("%.*s_%u_sync_%u",
-						NAMEDATALEN - 28,
-						MySubscription->slotname,
-						MySubscription->oid,
-						MyLogicalRepWorker->relid);
+	/* Calculate the name of the tablesync slot. */
+	slotname = ReplicationSlotNameForTablesync(
+											   MySubscription->oid,
+											   MyLogicalRepWorker->relid);

What is the reason for changing the slot name calculation? If there is a
particular reason, then we can add a comment to indicate why we can't
include the subscription's slot name in this calculation.

5.
  This is WAL
+ * logged for for the purpose of recovery. Locks are to prevent the
+ * replication origin from vanishing while advancing.

/for for/for

6.
+	/* Remove the tablesync's origin tracking if exists. */
+	snprintf(originname, sizeof(originname), "pg_%u_%u", subid, relid);
+	originid = replorigin_by_name(originname, true);
+	if (originid != InvalidRepOriginId)
+	{
+		elog(DEBUG1, "DropSubscription: dropping origin tracking for \"%s\"", originname);

I don't think we need this DEBUG1 message here or in
AlterSubscription_refresh. It is fine to print this information for
background workers like the apply worker, but I am not sure we need it
here. DropSubscription drops the apply worker's origin without such a
DEBUG message, so I guess we don't need it for tablesync origins either.

7. Have you tested with the new patch the scenario where we crash after
FINISHEDCOPY and before SYNCDONE? Is it able to pick up the replication
using the new temporary slot? Here, we need to test the case where, during
the catchup phase, we have received a few commits and then the tablesync
worker crashes or errors out. Basically, check whether the replication is
continued from the same point. I understand that this can only be tested
by adding some logs and we might not be able to write a test for it.

--
With Regards,
Amit Kapila.