Thread: Logically replicated table has no visible rows

Logically replicated table has no visible rows

From
Jeff Ross
Date:
Hello,

We have a logically replicated table on RDS that is 39 G in size on both 
the publisher (10.21) and the subscriber (12.8).

The replication slots on the publisher are all marked as active and the 
lsns are current so no lag.

Other tables on the subscriber side are also identical in size and have 
no problem with queries against them.

We did a vacuum full on the subscriber table and the size dropped to a 
couple hundred MBs and is growing but still has no visible rows.

Now on the publisher I see a temporary replication slot so I'm guessing 
that in reality replication had not finished and that's why the 
load_events table was full sized but had no visible rows?

At this point I guess my questions are does my hunch that replication 
hadn't completely finished is the root cause of what we saw make sense?

I've always used both the lsns and the lack of temporary replication 
slots to tell when logical replication had finished--is there a more 
authoritative way to do so?

Thanks,

Jeff



Re: Logically replicated table has no visible rows

From
Jeff Ross
Date:
On 5/31/22 11:46 AM, Jeff Ross wrote:
> Hello,
>
> We have a logically replicated table on RDS that is 39 G in size on 
> both the publisher (10.21) and the subscriber (12.8).
>
> The replication slots on the publisher are all marked as active and 
> the lsns are current so no lag.
>
> Other tables on the subscriber side are also identical in size and 
> have no problem with queries against them.
>
> We did a vacuum full on the subscriber table and the size dropped to a 
> couple hundred MBs and is growing but still has no visible rows.
>
> Now on the publisher I see a temporary replication slot so I'm 
> guessing that in reality replication had not finished and that's why 
> the load_events table was full sized but had no visible rows?
>
> At this point I guess my questions are does my hunch that replication 
> hadn't completely finished is the root cause of what we saw make sense?
>
> I've always used both the lsns and the lack of temporary replication 
> slots to tell when logical replication had finished--is there a more 
> authoritative way to do so?
>
> Thanks,
>
> Jeff
>
>

As a follow up for the archives...

It became apparent the longer we looked that logical replication in fact 
had not finished and indeed never did finish.  On both an EC2 server and 
an RDS server we were missing at least one table that by table size 
appeared to be identical to the publisher's table and yet had no visible 
rows.  On the publisher I would occasionally see temporary replication 
slots but they would go away in a few minutes.

I then dropped the subscription on the RDS server and immediately 
restarted it with (copy_data = False).  As expected we immediately saw 
rows being inserted into that massive table but it still showed only 
those rows.  I have a "backfill" process to sync up logically replicated 
tables and that ran but very, very slowly.  Finally while syncing that 
big table it stalled completely.  Network monitoring showed about 
100Kb/s traffic to the RDS server.  That is not a typo.

Our client spun up another RDS server, imported all the settings from 
the previous one and I setup the database and imported the schemas.  As 
soon as we started replication we were seeing network transfers in the 
45Mb/s range.  Replication finished with all tables intact in under 5 
hours for 420G.

We changed nothing on our side so whatever was causing the glacial data 
transfer was on AWS side.

Jeff