Logical replication is missing block of rows when sending initial sync? - Mailing list pgsql-bugs
From | hubert depesz lubaczewski |
---|---|
Subject | Logical replication is missing block of rows when sending initial sync? |
Date | |
Msg-id | ZTu8GTDajCkZVjMs@depesz.com Whole thread Raw |
Responses |
RE: Logical replication is missing block of rows when sending initial sync?
|
List | pgsql-bugs |
Hi, we have situation where we use native logical replication to copy data from pg12 (on ubuntu bionic) to pg14 (on ubuntu focal). Generally it mostly works. The process is like this: 1. make all tables in destination, with *just* primary keys. no other indexes, constraints 2. setup logical replication, adding tables in batches by 40, adding new batch to publication only when all tables are in 'r' state 3. once all tables are replicating, add indexes, fkeys, trgiggers and the like on recipient. Every so often we got errors when adding fkeys. Usually I didn't have much time to investigate, but today had a bit. Unfortunatley the replica is gone (I had to re-set the replication). But when I had it there I noticed that a block of rows from one of the tables was missing. I am not entirely sure about the size of the block, but I check 20 rows with consecutive ids, and they were all missing from focal side: #v+ =# select ctid, xmin, id, created_at, updated_at from c33s16831.subs where id between 19733130 and 19733150 order by 1; ctid │ xmin │ id │ created_at │ updated_at ─────────────┼────────────┼──────────┼────────────────────────────┼──────────────────────────── (486337,57) │ 2668461379 │ 19733130 │ 2023-10-25 19:11:41.926204 │ 2023-10-25 19:11:41.926204 (486337,58) │ 2668461379 │ 19733131 │ 2023-10-25 19:11:41.926204 │ 2023-10-25 19:11:41.926204 (486337,59) │ 2668461379 │ 19733132 │ 2023-10-25 19:11:41.926204 │ 2023-10-25 19:11:41.926204 (486337,60) │ 2668461379 │ 19733133 │ 2023-10-25 19:11:41.926204 │ 2023-10-25 19:11:41.926204 (486337,61) │ 2668461379 │ 19733134 │ 2023-10-25 19:11:41.926204 │ 2023-10-25 19:11:41.926204 (486337,62) │ 2668461379 │ 19733135 │ 2023-10-25 19:11:41.926204 │ 2023-10-25 19:11:41.926204 (486337,63) │ 2668461379 │ 19733136 │ 2023-10-25 19:11:41.926204 │ 2023-10-25 19:11:41.926204 (486337,64) │ 2668461379 │ 19733137 │ 2023-10-25 19:11:41.926204 │ 2023-10-25 19:11:41.926204 (486337,65) │ 2668461379 │ 19733138 │ 2023-10-25 19:11:41.926204 │ 2023-10-25 19:11:41.926204 (486337,66) │ 2668461379 │ 19733139 │ 2023-10-25 19:11:41.926204 │ 2023-10-25 19:11:41.926204 (486337,67) │ 2668461379 │ 19733140 │ 2023-10-25 19:11:41.926204 │ 2023-10-25 19:11:41.926204 (486340,38) │ 2668691763 │ 19733147 │ 2023-10-25 19:19:06.299059 │ 2023-10-25 19:20:07.360778 (486340,58) │ 2668695225 │ 19733146 │ 2023-10-25 19:19:06.299059 │ 2023-10-25 19:20:15.227267 (486340,61) │ 2668711141 │ 19733144 │ 2023-10-25 19:19:06.299059 │ 2023-10-25 19:20:43.499749 (486340,62) │ 2668717802 │ 19733149 │ 2023-10-25 19:19:06.299059 │ 2023-10-25 19:20:49.900798 (486340,63) │ 2668723876 │ 19733141 │ 2023-10-25 19:19:06.299059 │ 2023-10-25 19:21:00.749445 (486340,64) │ 2668753420 │ 19733142 │ 2023-10-25 19:19:06.299059 │ 2023-10-25 19:22:02.140332 (486340,66) │ 2668761678 │ 19733150 │ 2023-10-25 19:19:06.299059 │ 2023-10-25 19:22:19.140231 (486341,9) │ 2668726256 │ 19733145 │ 2023-10-25 19:19:06.299059 │ 2023-10-25 19:21:06.845051 (486341,17) │ 2668748364 │ 19733148 │ 2023-10-25 19:19:06.299059 │ 2023-10-25 19:21:50.246899 (486341,18) │ 2668750613 │ 19733143 │ 2023-10-25 19:19:06.299059 │ 2023-10-25 19:21:55.863125 (21 rows) #v- Last vacuum (from autovacuum) was ~ 3 weeks ago, so it isn't a factor. I don't know when exactly this batch was added to publication, but I found that COPY that sent initial sync of the data waslogged at: #v+ 2023-10-25 18:48:43.982 UTC,"upguser","dbname",12995,"10.1.207.7:44956",65396265.32c3,6,"COPY",2023-10-25 18:45:57 UTC,74/160580191,0,LOG,00000,"duration:166355.331 ms statement: COPY c33s16831.subs TO STDOUT",,,,,,,,,"pg_443910492_sync_443582288_7292150788357777097" #v- Unfortunately I don't have a way to repeat the problem. Just noticed that on some dbs that we're upgrading the problem is more common, and on some - it generally doesn't happen. So, I don't have high hopes re: finding/fixing the issue, but figured I'll let you know that there is such situation, perhaps it will strike someone as something relating to something that could be the problem... Best regards, depesz
pgsql-bugs by date: