On Fri, Nov 03, 2023 at 09:09:12AM +0530, Amit Kapila wrote:
> On Thu, Nov 2, 2023 at 4:53 PM hubert depesz lubaczewski
> <depesz@depesz.com> wrote:
> >
> > On Thu, Nov 02, 2023 at 10:17:13AM +0900, Kyotaro Horiguchi wrote:
> > > At Mon, 30 Oct 2023 07:10:35 +0000, "Hayato Kuroda (Fujitsu)" <kuroda.hayato@fujitsu.com> wrote in
> > > > I've tried, but I could not reproduce the failure. PSA the script I used.
> > >
> > > I'm not well-versed in the details of logical replication, but does
> > > logical replication inherently operate in such a way that it fully
> > > maintains relationships between tables? If not, isn't it possible that
> > > the issue in question is not about missing referenced data, but merely
> > > a temporary delay?
> >
> > The problem is that data that appeared *later* was visible on the
> > subscriber. Data that came earlier was visible too. Just some block of
> > data got, for some reason, skipped.
> >
>
> Quite strange. I think to narrow down such a problem, the first thing
> to figure out is whether the data is skipped by initial sync or later
> replication. To find that out, you can check remote_lsn value in
> pg_replication_origin_status for the origin used in the initial sync
> once the relation reaches the 'ready' state. Then, you can try to see
> on the publisher side using pg_waldump whether the missing rows exist
> before the value of remote_lsn or after it. That can help us to narrow
> down the problem and could give us some clues for the next steps.

I will be prepping another set of clusters to upgrade soon, will try to
get some more data. The window to work on the bad data isn't long,
though.
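
For reference, the comparison suggested above could be sketched roughly
as below. This is only a sketch under assumptions: the helper names
parse_lsn and classify are made up for illustration, the remote_lsn is
assumed to have been read by hand from pg_replication_origin_status on
the subscriber, and the LSN of the missing row's WAL record is assumed
to have been read by hand from pg_waldump output on the publisher.
Treating an LSN equal to remote_lsn as "initial sync" is also an
assumption.

```python
# Sketch: decide on which side of remote_lsn a WAL record falls.
#
# Assumed inputs (both obtained manually, not by this script):
#   remote_lsn  - from the subscriber, e.g.
#                 SELECT external_id, remote_lsn
#                   FROM pg_replication_origin_status;
#   record_lsn  - LSN of the WAL record for a missing row, as shown by
#                 pg_waldump on the publisher.


def parse_lsn(text: str) -> int:
    """Convert a pg_lsn string such as '16/B374D848' to an integer.

    A pg_lsn is printed as two hexadecimal numbers separated by a
    slash: the high 32 bits and the low 32 bits of a 64-bit position.
    """
    hi, lo = text.strip().split("/")
    return (int(hi, 16) << 32) | int(lo, 16)


def classify(record_lsn: str, remote_lsn: str) -> str:
    """Say which phase should have delivered the record.

    Records at or before remote_lsn are attributed to the initial
    sync, records after it to later replication.
    """
    if parse_lsn(record_lsn) <= parse_lsn(remote_lsn):
        return "initial sync"
    return "later replication"
```

With this, classify("16/B374D848", "16/B3750000") gives "initial sync".
The LSNs have to be compared as numbers, not as strings: "9/FF" sorts
after "10/0" as text, but is the earlier position.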

Best regards,
depesz