Re: Single transaction in the tablesync worker? - Mailing list pgsql-hackers

From Amit Kapila
Subject Re: Single transaction in the tablesync worker?
Date
Msg-id CAA4eK1LLWo8X8OFLuupW8iCYY+FnMLNBqm4XtKJV_YCuL02rQQ@mail.gmail.com
Whole thread Raw
In response to Re: Single transaction in the tablesync worker?  (Amit Kapila <amit.kapila16@gmail.com>)
Responses Re: Single transaction in the tablesync worker?
List pgsql-hackers
On Sat, Dec 19, 2020 at 12:10 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
>
> On Fri, Dec 18, 2020 at 6:41 PM Peter Smith <smithpb2250@gmail.com> wrote:
> >
>
> I understand why you are trying to create this patch atop logical
> decoding of 2PC patch but I think it is better to create this as an
> independent patch and then use it to test 2PC problem. Also, please
> explain what kind of testing you did to ensure that it works properly
> after the table sync worker restarts after the crash.
>

Few other comments:
==================
1.
* FIXME 3 - Crashed tablesync workers may also have remaining slots
because I don't think
+ * such workers are even iterated by this loop, and nobody else is
removing them.
+ */
+ if (slotname)
+ {

The above FIXME is not clear to me. Actually, the crashed workers
should restart, finish their work, and drop the slots. So not sure
what exactly this FIXME refers to?

2.
DropSubscription()
{
..
ReplicationSlotDropAtPubNode(
+ NULL,
+ conninfo, /* use conninfo to make a new connection. */
+ subname,
+ syncslotname);
..
}

With the above call, it will form a connection with the publisher and
drop the required slots. I think we need to save the connection info
so that we don't need to connect/disconnect for each slot to be
dropped. Later in this function, we again connect and drop the apply
worker slot. I think we should connect just once drop the apply and
table sync slots if any.

3.
ReplicationSlotDropAtPubNode(WalReceiverConn *wrconn_given, char
*conninfo, char *subname, char *slotname)
{
..
+ PG_TRY();
..
+ PG_CATCH();
+ {
+ /* NOP. Just gobble any ERROR. */
+ }
+ PG_END_TRY();

Why are we suppressing the error instead of handling it the error in
the same way as we do while dropping the apply worker slot in
DropSubscription?

4.
@@ -139,6 +141,28 @@ finish_sync_worker(void)
  get_rel_name(MyLogicalRepWorker->relid))));
  CommitTransactionCommand();

+ /*
+ * Cleanup the tablesync slot.
+ */
+ {
+ extern void ReplicationSlotDropAtPubNode(
+ WalReceiverConn *wrconn_given, char *conninfo, char *subname, char *slotname);

This is not how we export functions at other places?

-- 
With Regards,
Amit Kapila.



pgsql-hackers by date:

Previous
From: Tom Lane
Date:
Subject: Re: \gsetenv
Next
From: Bharath Rupireddy
Date:
Subject: Re: Fail Fast In CTAS/CMV If Relation Already Exists To Avoid Unnecessary Rewrite, Planning Costs