Re: Seeking Clarification on Logical Replication Start LSN - Mailing list pgsql-hackers

From Amit Kapila
Subject Re: Seeking Clarification on Logical Replication Start LSN
Date
Msg-id CAA4eK1K3vjj7o-x8AbDvrF_O989enWcb_2_aWorBWj1+_dYx5w@mail.gmail.com
Whole thread Raw
In response to Seeking Clarification on Logical Replication Start LSN  (Pradeep Kumar <spradeepkumar29@gmail.com>)
List pgsql-hackers
On Tue, Feb 27, 2024 at 5:56 PM Pradeep Kumar <spradeepkumar29@gmail.com> wrote:
>
> Dear Postgres Community,
>
> I hope this email finds you well. I am reaching out to seek clarification on an issue I am encountering with logical
replicationin PostgreSQL. 
>
> My specific question pertains to determining the appropriate LSN (Log Sequence Number) from which to start logical
replication.Allow me to provide detailed context for better understanding: 
>
> During the process of performing a parallel pg_basebackup, I concurrently execute DML queries. As part of the
pg_basebackupcommand, I utilize the option create-slot to create a replication slot. Subsequently, upon completion of
thebase backup, I initiate logical replication using the restart_lsn obtained during the execution of the pg_basebackup
command.My intention is to ensure consistency between two database clusters. 
>
> However, I am encountering errors during this process. Specifically, I receive the following error message on the
sourceside: 
>
> """
> 2024-02-27 16:20:09.271 IST [2838457] ERROR: duplicate key value violates unique constraint "table_15_36_pkey"
> 2024-02-27 16:20:09.271 IST [2838457] DETAIL: Key (col_1, col_2)=(23, 2024-02-27 15:14:24.332557) already exists.
> 2024-02-27 16:20:09.272 IST [2834967] LOG: background worker "logical replication worker" (PID 2838457) exited with
exitcode 1 
> Upon analysis, it appears that the errors stem from starting the logical replication with an incorrect LSN, one that
hasalready been applied to the target side, leading to duplicate key conflicts. 
> """
>
> In light of this issue, I seek guidance on determining the appropriate LSN from which to commence logical
replication.
>
> To further clarify my problem:
>
> 1)I have a source machine and a target machine.
> 2) I perform a backup from the source to the target using pg_basebackup.
> 3) Prior to initiating the base backup, I create logical replication slots on the source machine.
> 4) During the execution of pg_basebackup, DML queries are executed, and I aim to replicate this data on the target
machine.
> 5) My dilemma lies in determining the correct LSN to begin the logical replication process.
>

I think the reason of the problem you are seeing is pg_basebackup also
includes the WAL generated during backup if you specify -X method. See
[1]. Now, as you have created a logical slot before starting backup,
data duplication is possible. I don't see a very straightforward way
but you might be able to achieve your desired purpose if somehow
identify the last WAL location copied in backup and use that as your
starting point for logical replication.

[1] - https://www.postgresql.org/docs/devel/app-pgbasebackup.html

--
With Regards,
Amit Kapila.



pgsql-hackers by date:

Previous
From: Amit Kapila
Date:
Subject: Re: Regardign RecentFlushPtr in WalSndWaitForWal()
Next
From: Nikita Malakhov
Date:
Subject: Re: Shared detoast Datum proposal