Re: Single transaction in the tablesync worker? - Mailing list pgsql-hackers

From Amit Kapila
Subject Re: Single transaction in the tablesync worker?
Date
Msg-id CAA4eK1LGxuB_RTfZ2HLJT76wv=FLV6UPqT+FWkiDg61rvQkkmQ@mail.gmail.com
Whole thread Raw
In response to Re: Single transaction in the tablesync worker?  (Peter Smith <smithpb2250@gmail.com>)
Responses Re: Single transaction in the tablesync worker?  (Amit Kapila <amit.kapila16@gmail.com>)
Re: Single transaction in the tablesync worker?  (Peter Smith <smithpb2250@gmail.com>)
Re: Single transaction in the tablesync worker?  (Ajin Cherian <itsajin@gmail.com>)
Re: Single transaction in the tablesync worker?  (Peter Smith <smithpb2250@gmail.com>)
List pgsql-hackers
On Tue, Jan 19, 2021 at 2:32 PM Peter Smith <smithpb2250@gmail.com> wrote:
>
> Hi Amit.
>
> PSA the v17 patch for the Tablesync Solution1.
>

Thanks for the updated patch. Below are few comments:
1. Why are we changing the scope of PG_TRY in DropSubscription()?
Also, it might be better to keep the replication slot drop part as it
is.

2.
- *    - Tablesync worker finishes the copy and sets table state to SYNCWAIT;
- * waits for state change.
+ *    - Tablesync worker does initial table copy; there is a
FINISHEDCOPY state to
+ * indicate when the copy phase has completed, so if the worker crashes
+ * before reaching SYNCDONE the copy will not be re-attempted.

In the last line, shouldn't the state be FINISHEDCOPY instead of SYNCDONE?

3.
+void
+tablesync_cleanup_at_interrupt(void)
+{
+ bool drop_slot_needed;
+ char originname[NAMEDATALEN] = {0};
+ RepOriginId originid;
+ TimeLineID tli;
+ Oid subid = MySubscription->oid;
+ Oid relid = MyLogicalRepWorker->relid;
+
+ elog(DEBUG1,
+ "tablesync_cleanup_at_interrupt for relid = %d",
+ MyLogicalRepWorker->relid);

The function name and message makes it sound like that we drop slot
and origin at any interrupt. Isn't it better to name it as
tablesync_cleanup_at_shutdown()?

4.
+ drop_slot_needed =
+ wrconn != NULL &&
+ MyLogicalRepWorker->relstate != SUBREL_STATE_SYNCDONE &&
+ MyLogicalRepWorker->relstate != SUBREL_STATE_READY;
+
+ if (drop_slot_needed)
+ {
+ char syncslotname[NAMEDATALEN] = {0};
+ bool missing_ok = true; /* no ERROR if slot is missing. */

I think we can avoid using missing_ok and drop_slot_needed variables.

5. Can we drop the origin along with the slot in
process_syncing_tables_for_sync() instead of
process_syncing_tables_for_apply()? I think this is possible because
of the other changes you made in origin.c. Also, if possible, we can
try to use the same code to drop the slot and origin in
tablesync_cleanup_at_interrupt and process_syncing_tables_for_sync.

6.
+ if (MyLogicalRepWorker->relstate == SUBREL_STATE_FINISHEDCOPY)
+ {
+ /*
+ * The COPY phase was previously done, but tablesync then crashed/etc
+ * before it was able to finish normally.
+ */

There seems to be a typo (crashed/etc) in the above comment.

7.
+# check for occurrence of the expected error
+poll_output_until("replication slot \"$slotname\" already exists")
+    or die "no error stop for the pre-existing origin";

In this test, isn't it better to check for datasync state like below?
004_sync.pl has some other similar test.
my $started_query = "SELECT srsubstate = 'd' FROM pg_subscription_rel;";
$node_subscriber->poll_query_until('postgres', $started_query)
  or die "Timed out while waiting for subscriber to start sync";

Is there a reason why we can't use the existing way to check for
failure in this case?

-- 
With Regards,
Amit Kapila.



pgsql-hackers by date:

Previous
From: Heikki Linnakangas
Date:
Subject: Re: ResourceOwner refactoring
Next
From: Peter Eisentraut
Date:
Subject: Re: Printing LSN made easy