[MASSMAIL]Logical replication failure modes - Mailing list pgsql-hackers

From	Philip Warner
Subject	[MASSMAIL]Logical replication failure modes
Date	March 29, 2024 11:43:26
Msg-id	7bcd1f6b2697e13ac70177ccccfdc4df@rhyme.com.au
Responses	Re: Logical replication failure modes Re: Logical replication failure modes
List	pgsql-hackers

I am trying to discover the causes of occasional data loss in logical replication; it is VERY rare and happens once every few weeks or months.

Our setup is a source DB running in Docker on an AWS cloud server. The source database is stored on local disks on that server.

The replication target is a Kubernetes pod running on an AWS instance with an attached persistent AWS disk. The disk mounting is managed by Kubernetes. Periodically this pod is deleted and restarted in an orderly way, and the persistent disk stores the database.

What we are seeing is that, *very* occasionally, records in the more active tables are not replicated.

Sometimes we have a backlog of several GB of data due to missing fields in the target, network outages, etc.

I am also seeing signs that some triggers are not being applied (in the same time frame): i.e. data *is* inserted, but the triggers that summarize that data are not summarizing some rows, and the dates on those non-summarized rows correspond to the dates on unrelated missing rows in other tables.

This all leads me to suspect that there might be missing transactions, or non-applied transactions. But it is further complicated by the fact that there is a second target database that *does* have all the missing records.
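
Since the second target is complete, one way to pin down the affected time windows is to diff primary keys between the two targets and group the gaps by day. This is only a minimal sketch, assuming the keys and dates have already been fetched from each target as (primary_key, date) tuples; the function name, the column names in the comment, and the sample data are all hypothetical:

```python
from collections import Counter
from datetime import date


def missing_by_day(complete_rows, suspect_rows):
    """Count, per day, keys present in complete_rows but absent from suspect_rows.

    Each row is a (primary_key, created_date) tuple, for example the result of
    a query such as SELECT id, created_at::date FROM some_table
    (hypothetical table and column names) run against each target.
    """
    suspect_keys = {pk for pk, _ in suspect_rows}
    return Counter(day for pk, day in complete_rows if pk not in suspect_keys)


# Hypothetical sample data: the complete target has key 2, the suspect one does not.
good = [(1, date(2024, 3, 1)), (2, date(2024, 3, 1)), (3, date(2024, 3, 2))]
bad = [(1, date(2024, 3, 1)), (3, date(2024, 3, 2))]

for day, count in sorted(missing_by_day(good, bad).items()):
    print(day, count)
```

If the days with gaps line up across unrelated tables, that would point at whole transactions rather than individual rows.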

Any insights or avenues of exploration would be very welcome!
