Re: WAL accumulating, Logical Replication pg 13 - Mailing list pgsql-general

From Willy-Bas Loos
Subject Re: WAL accumulating, Logical Replication pg 13
Msg-id CAHnozTgA6oztDwcQsr0nREOBNFKF4f6Gouja7pAYpXn158mC4Q@mail.gmail.com
In response to Re: WAL accumulating, Logical Replication pg 13  (Tomas Pospisek <tpo2@sourcepole.ch>)
Responses Re: WAL accumulating, Logical Replication pg 13  (Willy-Bas Loos <willybas@gmail.com>)
List pgsql-general
Hi, I was going to follow up on this one, sorry for the long silence.
The replication is working fine now, and I have no idea what the problem was. Not cool.
If I find out, I will let you know.

On Mon, May 31, 2021 at 6:06 PM Tomas Pospisek <tpo2@sourcepole.ch> wrote:
Hi Willy-Bas Loos,

On 31.05.21 17:32, Willy-Bas Loos wrote:
>
>
> On Mon, May 31, 2021 at 4:24 PM Vijaykumar Jain
> <vijaykumarjain.github@gmail.com> wrote:
>
>     So I got it all wrong it seems :)
>
> Thank you for taking the time to help me!
>
>     You upgraded to pg13 fine, but while on pg13 you have issues with
>     logical replication?
>
> Yes, the upgrade went fine. So here are some details:
> I already had londiste running on postgres 9.3, but londiste wouldn't
> run on Debian 10.
> So I first set up the new server with Debian 9 and postgres 9.6, and I
> started replicating with londiste from 9.3 to 9.6.
> When all was ready, I stopped the replication to the 9.6 server and
> deleted all londiste & pgq content with drop schema cascade.
> Then I upgraded the server to Debian 10. Then I used pg_upgrade to
> upgrade from postgres 9.6 to 13. (PostGIS versions were kept compatible).
> Then I added logical replication and a third server as a subscriber.
>
> I was going to write that replication is working fine (since the table
> contains a lot of data and there are no conflicts in the log), but it
> turns out that it isn't.
> The subscriber is behind, and it looks like there hasn't been any
> incoming data after the initial data synchronization.
> So at least now I know that the WAL is being retained for a reason. The
> connection is working properly (via psql anyway).

I may once have had a similar problem, caused either by some ports that
were needed for replication being firewalled off, or by the master having
the wrong IP address for the old master (now standby server), or some such.

There was absolutely no word about the problem anywhere in any log. I was
just seeing the new postgres master not starting up after hours and
hours of waiting after a failover. I somehow found out about the
required port being blocked (I don't remember how; maybe by seeing the
unanswered SYNs in tcpdump? Or via ufw log entries?).

> I will also look into how to diagnose this from the system tables, e.g.
> subtracting LSNs to get some quantitative measure for the lag.
>
>
>
>     There is a path in the postgresql source user subscription folder
>     iirc which covers various logical replication scenarios.
>     That may help you just in case.
>
> OK, so comments in the source code you mean?
>
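
On subtracting LSNs to quantify the lag: a minimal sketch in Python,
assuming the two values were read by hand, e.g. pg_current_wal_lsn() on
the publisher and confirmed_flush_lsn from pg_replication_slots. The
function names parse_lsn and lsn_lag_bytes are made up for illustration,
and the sample LSNs are not from this thread. On the server itself,
pg_wal_lsn_diff() should give the same number directly.

```python
def parse_lsn(lsn: str) -> int:
    """Convert a textual LSN such as '16/B374D848' to a byte position.

    An LSN is printed as two hexadecimal numbers separated by a slash:
    the high 32 bits and the low 32 bits of a 64-bit WAL position.
    """
    high, low = lsn.strip().split("/")
    return (int(high, 16) << 32) | int(low, 16)


def lsn_lag_bytes(current_lsn: str, confirmed_lsn: str) -> int:
    """Bytes of WAL between the publisher's position and what the
    subscriber has confirmed (hypothetical helper, not a PostgreSQL API)."""
    return parse_lsn(current_lsn) - parse_lsn(confirmed_lsn)


if __name__ == "__main__":
    # Made-up example values, for illustration only.
    lag = lsn_lag_bytes("16/B374D848", "16/B3000000")
    print(f"subscriber is {lag} bytes ({lag / 1024 / 1024:.1f} MiB) behind")
```

A lag that keeps growing while the publisher takes writes would match
the symptom described above (WAL retained, no data arriving after the
initial sync).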



--
Willy-Bas Loos
