Re: [PATCH] Reuse Workers and Replication Slots during Logical Replication - Mailing list pgsql-hackers

From Melih Mutlu
Subject Re: [PATCH] Reuse Workers and Replication Slots during Logical Replication
Msg-id CAGPVpCTuwTwAh8V8EcaKyea+RTk32CWUVX5Der13jrgk8wB5_Q@mail.gmail.com
In response to Re: [PATCH] Reuse Workers and Replication Slots during Logical Replication  (Amit Kapila <amit.kapila16@gmail.com>)
Responses Re: [PATCH] Reuse Workers and Replication Slots during Logical Replication
List pgsql-hackers
Hi,

On Wed, 2 Aug 2023 at 12:01, Amit Kapila <amit.kapila16@gmail.com> wrote:
> I think we are getting the error (ERROR:  could not find logical
> decoding starting point) because we wouldn't have waited for WAL to
> become available before reading it. It could happen due to the
> following code:
> WalSndWaitForWal()
> {
>     ...
>     if (streamingDoneReceiving && streamingDoneSending &&
>         !pq_is_send_pending())
>         break;
>     ..
> }
>
> Now, it seems that in 0003 patch, instead of resetting flags
> streamingDoneSending, and streamingDoneReceiving before start
> replication, we should reset before create logical slots because we
> need to read the WAL during that time as well to find the consistent
> point.

Thanks for the suggestion, Amit. I had been looking into this recently and couldn't figure out the cause until now.
I quickly made the fix in 0003, and it seems to have resolved the "could not find logical decoding starting point" errors.
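
To illustrate the reasoning, here is a minimal toy model, not the actual walsender code. ToySender, toy_wait_for_wal and toy_create_slot are hypothetical names made up for this sketch, and WAL is reduced to a plain counter. It only shows why stale streamingDoneSending/streamingDoneReceiving flags make the wait loop exit before the needed WAL is available, and why resetting them before slot creation avoids that.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy stand-in for walsender state; NOT the real PostgreSQL structures. */
typedef struct ToySender
{
    bool streamingDoneSending;   /* may be left over from a previous stream */
    bool streamingDoneReceiving; /* may be left over from a previous stream */
    bool send_pending;           /* stand-in for pq_is_send_pending() */
    int  wal_available;          /* how much WAL can be read right now */
} ToySender;

/*
 * Simplified wait loop. Each iteration models one wakeup in which one more
 * unit of WAL becomes available. Returns the amount of WAL available when
 * the loop exits.
 */
int
toy_wait_for_wal(ToySender *s, int target)
{
    assert(target >= 0);

    while (s->wal_available < target)
    {
        /* Early exit modeled on the snippet quoted above. */
        if (s->streamingDoneReceiving && s->streamingDoneSending &&
            !s->send_pending)
            break;

        s->wal_available++;      /* pretend new WAL was flushed */
    }
    return s->wal_available;
}

/*
 * Hypothetical slot creation: it needs WAL up to 'target' to find a
 * consistent point. 'reset_first' models resetting the flags before
 * creating the slot instead of only before starting replication.
 * Returns true if enough WAL was read.
 */
bool
toy_create_slot(ToySender *s, int target, bool reset_first)
{
    if (reset_first)
    {
        s->streamingDoneSending = false;
        s->streamingDoneReceiving = false;
    }
    return toy_wait_for_wal(s, target) >= target;
}
```

With both flags still set from an earlier stream, toy_create_slot() without the reset returns false because the loop breaks before any WAL arrives. With the reset it waits until the target is reached.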

On Tue, 1 Aug 2023 at 09:32, vignesh C <vignesh21@gmail.com> wrote:
> I agree that the "no copy in progress" issue has nothing to do with
> the 0001 patch. This issue is present with the 0002 patch.
> In the case when the tablesync worker has to apply the transactions
> after the table is synced, the tablesync worker sends the feedback of
> writepos, applypos and flushpos, which results in a "No copy in progress"
> error as the stream has ended already. Fixed it by exiting the
> streaming loop if the tablesync worker is done with the
> synchronization. The attached 0004 patch has the changes for the same.
> The rest of the v22 patches are the same as those posted by Melih
> in the earlier mail.

Thanks for the fix. I placed it into 0002 with a slight change as follows: 

-    send_feedback(last_received, false, false);
+    if (!MyLogicalRepWorker->relsync_completed)
+        send_feedback(last_received, false, false);
 
IMHO relsync_completed means essentially the same thing as streaming_done, which is why I preferred checking that flag over adding another goto statement. Does that make sense to you as well?
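
As a rough illustration of the guard, here is a small self-contained sketch. It is a toy model under stated assumptions, not the patch itself: ToyWorker, stream_open, toy_send_feedback, toy_finish_sync and toy_loop_step are hypothetical, and only the relsync_completed flag name comes from the change above. The "No copy in progress" error is modeled as a simple error counter.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy worker state; every field except relsync_completed is hypothetical. */
typedef struct ToyWorker
{
    bool relsync_completed;  /* set once the table sync is done */
    bool stream_open;        /* false after the stream has ended */
    int  feedback_sent;      /* feedback messages delivered */
    int  errors;             /* models "No copy in progress" failures */
} ToyWorker;

/* Sending feedback on an ended stream is modeled as an error. */
void
toy_send_feedback(ToyWorker *w)
{
    if (!w->stream_open)
    {
        w->errors++;
        return;
    }
    w->feedback_sent++;
}

/* Models the point where the sync finishes and the stream ends. */
void
toy_finish_sync(ToyWorker *w)
{
    w->relsync_completed = true;
    w->stream_open = false;
}

/*
 * One pass through the tail of the apply loop. With 'guarded' set, feedback
 * is skipped once relsync_completed is true, as in the diff above.
 */
void
toy_loop_step(ToyWorker *w, bool guarded)
{
    assert(w->errors >= 0);

    if (!guarded || !w->relsync_completed)
        toy_send_feedback(w);
}
```

In this model, an unguarded step after toy_finish_sync() bumps the error counter, while the guarded step leaves it at zero, which is the behavior the flag check is meant to give without an extra goto.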

Thanks,
--
Melih Mutlu
Microsoft