Re: Logical replication is missing block of rows when sending initial sync? - Mailing list pgsql-bugs

From Tomas Vondra
Subject Re: Logical replication is missing block of rows when sending initial sync?
Date
Msg-id a266b461-e340-10c6-d511-40f9edb37d28@enterprisedb.com
In response to Re: Logical replication is missing block of rows when sending initial sync?  (hubert depesz lubaczewski <depesz@depesz.com>)
Responses Re: Logical replication is missing block of rows when sending initial sync?  (hubert depesz lubaczewski <depesz@depesz.com>)
List pgsql-bugs

On 11/3/23 13:04, hubert depesz lubaczewski wrote:
> On Fri, Nov 03, 2023 at 09:09:12AM +0530, Amit Kapila wrote:
>> On Thu, Nov 2, 2023 at 4:53 PM hubert depesz lubaczewski
>> <depesz@depesz.com> wrote:
>>>
>>> On Thu, Nov 02, 2023 at 10:17:13AM +0900, Kyotaro Horiguchi wrote:
>>>> At Mon, 30 Oct 2023 07:10:35 +0000, "Hayato Kuroda (Fujitsu)" <kuroda.hayato@fujitsu.com> wrote in
>>>>> I've tried, but I could not reproduce the failure. PSA the script what I did.
>>>>
>>>> I'm not well-versed in the details of logical replication, but does
>>>> logical replication inherently operate in such a way that it fully
>>>> maintains relationships between tables? If not, isn't it possible that
>>>> the issue in question is not about missing referenced data, but merely
>>>> a temporary delay?
>>>
>>> The problem is that data that appeared *later* was visible on the
>>> subscriber. Data that came earlier was visible too. Just some block of
>>> data got, for some reason, skipped.
>>>
>>
>> Quite strange. I think to narrow down such a problem, the first thing
>> to figure out is whether the data is skipped by initial sync or later
>> replication. To find that out, you can check remote_lsn value in
>> pg_replication_origin_status for the origin used in the initial sync
>> once the relation reaches the 'ready' state. Then, you can try to see
>> on the publisher side using pg_waldump whether the missing rows exist
>> before the value of remote_lsn or after it. That can help us to narrow
>> down the problem and could give us some clues for the next steps.
> 
> I will be prepping another set of clusters to upgrade soon, will try to
> get some more data. The window to work on the bad data isn't long,
> though.
> 

I think it'd be interesting to know:

1) The commit LSN of the transactions that inserted the missing rows (i.e.
for their xmin).

2) Are there other changes from these transactions that *got* replicated
correctly?

3) LSNs used for the tablesync slot, catchup etc. I believe those are in
the server log.
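
Once those LSNs are collected, comparing them by eye is error-prone, because
an LSN is printed as two hex numbers (high 32 bits / low 32 bits) and plain
string comparison gives wrong answers. Below is a minimal sketch of a helper
for that comparison. The function names and the sample values are made up for
illustration, and treating a commit LSN equal to remote_lsn as belonging to
the initial sync is an assumption, not something confirmed in this thread.

```python
def parse_lsn(text: str) -> int:
    """Convert an LSN printed as 'XXXXXXXX/YYYYYYYY' into a 64-bit integer."""
    high, low = text.strip().split("/")
    return (int(high, 16) << 32) | int(low, 16)


def classify_commit(commit_lsn: str, remote_lsn: str) -> str:
    """Guess which phase should have delivered a commit.

    remote_lsn is the value read from pg_replication_origin_status for the
    tablesync origin once the relation is 'ready'. The boundary handling
    (<=) is an assumption made for this sketch.
    """
    if parse_lsn(commit_lsn) <= parse_lsn(remote_lsn):
        return "initial sync"
    return "later replication"


if __name__ == "__main__":
    # Hypothetical values, not taken from the affected clusters.
    remote_lsn = "10/0"
    for commit_lsn in ("9/FFFFFFF0", "10/00000028"):
        print(commit_lsn, "->", classify_commit(commit_lsn, remote_lsn))
```

Note that "9/FFFFFFF0" sorts after "10/0" as a string, but is the earlier
position in WAL, which is why the values are converted to integers first.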

regards

-- 
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


