Home > mailing lists

How can you find out what point logical replication is at? - Mailing list pgsql-general

From	John Ashmead
Subject	How can you find out what point logical replication is at?
Date	August 3, 2020 17:38:42
Msg-id	06C3100D-2C47-4707-B426-14D00A95260B@ashmeadsoftware.com Whole thread Raw
In response to	Re: Apparent missed query optimization with self-join and inner grouping (Tom Lane <tgl@sss.pgh.pa.us>)
Responses	Re: How can you find out what point logical replication is at? -- or weird, slow, infinite loop
List	pgsql-general

Tree view

I have logical replication setup from a factory in Zhuhai China to a data warehouse in New Jersey. We are using postgresql 10.13 on both sides, on Redhat Linux 7.6.

The logical replication has been in “catchup” mode for several days now, stuck at a specific LSN (9EF/89ADF7E0). The slave side seems to be chugging along, generating lots of WAL files — but not actually getting any new records in.

The logical replication is being used to move some fairly large files: averaging about 1 MB but with a max up to about 250 MB. (I think I want to fix that longer term, but that’s not a quick fix.)

My working hypothesis is that we are stuck on a long transaction: that we can’t get some abnormally large blob over before we drop the line. In this case fixing the connection should fix the problem.

Is there a way I can see what is going on? The slave is working hard, but what is it working on?

And are there any suggestions on how to handle this?

I could restart the logical replication with “copy_data = false”, then fill in the holes by hand. But I would rather not!

Thanks in advance!

John

PS. Output of pg_stat_replication & pg_stat_subscription on master & slave respectively. (Some proprietary information X’d out)

select * from pg_stat_replication;

-[ RECORD 1 ]----+------------------------------

pid | 42451

usesysid | 10

usename | postgres

application_name | china_to_nj_sub

client_addr | XXX.XXX.XXX.XXX

client_hostname |

client_port | 54300

backend_start | 2020-08-03 09:07:07.257454-04

backend_xmin | 16574498

state | catchup

sent_lsn | 9EF/89ADF7E0

write_lsn | 9EF/89ADF7E0

flush_lsn | 9EF/89ADF7E0

replay_lsn | 9EF/89ADF7E0

write_lag |

flush_lag |

replay_lag |

sync_priority | 0

sync_state | async

select * from pg_stat_subscription;

-[ RECORD 1 ]---------+------------------------------

subid | 7222699

subname | china_to_nj_sub

pid | 14764

relid |

received_lsn | 9EF/89ADF7E0

last_msg_send_time | 2020-08-03 10:15:48.644575-04

last_msg_receipt_time | 2020-08-03 10:14:57.247993-04

latest_end_lsn | 9EF/89ADF7E0

latest_end_time | 2020-08-03 09:30:57.974223-04

John Ashmead

139 Montrose Avenue

Rosemont, PA, 19010-1508

(610) 527 9560

mobile (610) 247 2323

john.ashmead@ashmeadsoftware.com

Attachment

smime.p7s

pgsql-general by date:

From: David Rowley
Date: 03 August 2020, 13:34:29
Subject: Re: How to get the name of the current database in C function/extention inside a trigger?

From: Ben Chobot
Date: 03 August 2020, 17:41:44
Subject: Re: 12.3 replicas falling over during WAL redo

How can you find out what point logical replication is at? - Mailing list pgsql-general

Attachment

Previous

Next