Re: [PATCH] Reuse Workers and Replication Slots during Logical Replication - Mailing list pgsql-hackers

From Melih Mutlu
Subject Re: [PATCH] Reuse Workers and Replication Slots during Logical Replication
Date
Msg-id CAGPVpCT6wG=Eqd2Jc20yO9xNp4S3_XvDPinz2JbaLTtb7M5jHg@mail.gmail.com
Whole thread Raw
In response to RE: [PATCH] Reuse Workers and Replication Slots during Logical Replication  ("wangw.fnst@fujitsu.com" <wangw.fnst@fujitsu.com>)
Responses Re: [PATCH] Reuse Workers and Replication Slots during Logical Replication
RE: [PATCH] Reuse Workers and Replication Slots during Logical Replication
List pgsql-hackers
Hi,

wangw.fnst@fujitsu.com <wangw.fnst@fujitsu.com>, 31 Oca 2023 Sal, 13:40 tarihinde şunu yazdı:
Sorry, I forgot to add the link to the email. Please refer to [1].

[1] - https://www.postgresql.org/message-id/OSZPR01MB631013C833C98E826B3CFCB9FDC69%40OSZPR01MB6310.jpnprd01.prod.outlook.com

Thanks for pointing out this review. I somehow skipped that, sorry.

Please see attached patches.

shiy.fnst@fujitsu.com <shiy.fnst@fujitsu.com>, 17 Oca 2023 Sal, 10:46 tarihinde şunu yazdı:
On Wed, Jan 11, 2023 4:31 PM Melih Mutlu <m.melihmutlu@gmail.com> wrote:
0001 patch
============
1. walsender.c
+       /* Create a tuple to send consisten WAL location */

"consisten" should be "consistent" I think.

Done.
 
2. logical.c
+       if (need_full_snapshot)
+       {
+               LWLockAcquire(ProcArrayLock, LW_EXCLUSIVE);
+
+               SpinLockAcquire(&slot->mutex);
+               slot->effective_catalog_xmin = xmin_horizon;
+               slot->data.catalog_xmin = xmin_horizon;
+               slot->effective_xmin = xmin_horizon;
+               SpinLockRelease(&slot->mutex);
+
+               xmin_horizon = GetOldestSafeDecodingTransactionId(!need_full_snapshot);
+               ReplicationSlotsComputeRequiredXmin(true);
+
+               LWLockRelease(ProcArrayLock);
+       }

It seems that we should first get the safe decoding xid, then inform the slot
machinery about the new limit, right? Otherwise the limit will be
InvalidTransactionId and that seems inconsistent with the comment.

You're right. Moved that call before assigning xmin_horizon.
 
3. doc/src/sgml/protocol.sgml
+       is used in the currenct transaction. This command is currently only supported
+       for logical replication.
+       slots.

We don't need to put "slots" in a new line.

Done.
 
0002 patch
============
1.
In pg_subscription_rel.h, I think the type of "srrelslotname" can be changed to
NameData, see "subslotname" in pg_subscription.h.

2.
+                                * Find the logical replication sync worker if exists store
+                                * the slot number for dropping associated replication slots
+                                * later.

Should we add comma after "if exists"?

Done. 

3.
+       PG_FINALLY();
+       {
+               pfree(cmd.data);
+       }
+       PG_END_TRY();
+       \
+               return tablelist;
+}

Do we need the backslash?

Removed it.
 
4.
+       /*
+        * Advance to the LSN got from walrcv_create_slot. This is WAL
+        * logged for the purpose of recovery. Locks are to prevent the
+        * replication origin from vanishing while advancing.

"walrcv_create_slot" should be changed to
"walrcv_create_slot/walrcv_slot_snapshot" I think.

Right, done.
 
 
5.
+                       /* Replication drop might still exist. Try to drop */
+                       replorigin_drop_by_name(originname, true, false);

Should "Replication drop" be "Replication origin"?

Done.
 
6.
I saw an assertion failure in the following case, could you please look into it?
The backtrace is attached.

-- pub
CREATE TABLE tbl1 (a int, b text);
CREATE TABLE tbl2 (a int primary key, b text);
create publication pub for table tbl1, tbl2;
insert into tbl1 values (1, 'a');
insert into tbl1 values (1, 'a');

-- sub
CREATE TABLE tbl1 (a int primary key, b text);
CREATE TABLE tbl2 (a int primary key, b text);
create subscription sub connection 'dbname=postgres port=5432' publication pub;

Subscriber log:
2023-01-17 14:47:10.054 CST [1980841] LOG:  logical replication apply worker for subscription "sub" has started
2023-01-17 14:47:10.060 CST [1980843] LOG:  logical replication table synchronization worker for subscription "sub", table "tbl1" has started
2023-01-17 14:47:10.070 CST [1980845] LOG:  logical replication table synchronization worker for subscription "sub", table "tbl2" has started
2023-01-17 14:47:10.073 CST [1980843] ERROR:  duplicate key value violates unique constraint "tbl1_pkey"
2023-01-17 14:47:10.073 CST [1980843] DETAIL:  Key (a)=(1) already exists.
2023-01-17 14:47:10.073 CST [1980843] CONTEXT:  COPY tbl1, line 2
2023-01-17 14:47:10.074 CST [1980821] LOG:  background worker "logical replication worker" (PID 1980843) exited with exit code 1
2023-01-17 14:47:10.083 CST [1980845] LOG:  logical replication table synchronization worker for subscription "sub", table "tbl2" has finished
2023-01-17 14:47:10.083 CST [1980845] LOG:  logical replication table synchronization worker for subscription "sub" has moved to sync table "tbl1".
TRAP: failed Assert("node != InvalidRepOriginId"), File: "origin.c", Line: 892, PID: 1980845

I'm not able to reproduce this yet. Will look into it further.

Thanks,
--
Melih Mutlu
Microsoft
Attachment

pgsql-hackers by date:

Previous
From: Antonin Houska
Date:
Subject: Re: Privileges on PUBLICATION
Next
From: Melih Mutlu
Date:
Subject: Re: [PATCH] Reuse Workers and Replication Slots during Logical Replication