Thread: Need help debugging slow logical replication

Need help debugging slow logical replication

From
"sunyucong@gmail.com"
Date:
Hi there,

I am using PG 14.14 on both the primary and secondary DBs on AWS, set
up with logical replication, and I'm having trouble with huge
replication lag.

My setup is as follows:

P1 - physical - P1-R
  | (logical)
P2 - physical - P2-R


The lag between P1 & P1-R and between P2 & P2-R is minimal, less than
10 seconds, but I'm seeing several hours of lag between P1 & P2. We
currently have 1 pub/sub that covers about 100 tables.

Here is the output from P1 showing a lag of at least 80GB (and it keeps
growing):

> SELECT now() AS CURRENT_TIME,
       slot_name,
       active,
       active_pid,
       confirmed_flush_lsn,
       pg_size_pretty(pg_wal_lsn_diff(pg_current_wal_lsn(), confirmed_flush_lsn)) AS diff_size,
       pg_wal_lsn_diff(pg_current_wal_lsn(), confirmed_flush_lsn) AS diff_bytes
FROM pg_replication_slots
WHERE slot_type = 'logical';
-[ RECORD 1 ]-------+---------------------------
current_time        | 2023-02-07 23:26:00.733+00
slot_name           | upgrade
active              | t
active_pid          | 5180
confirmed_flush_lsn | 26B09/8C08C610
diff_size           | 81 GB
diff_bytes          | 86573472240

Here is what pg_stat_replication shows. Note that the write_lag is
very high; we previously had to set wal_sender_timeout to 0, otherwise
the logical replication worker kept exiting and failing.

> select * from pg_stat_replication;
-[ RECORD 1 ]----+------------------------------
pid              | xxx
usesysid         | xxx
usename          | dev
application_name | upgrade_target
client_addr      | 10.xxx
client_hostname  |
client_port      | 27404
backend_start    | 2023-02-07 23:02:39.228572+00
backend_xmin     | 2909692747
state            | catchup
sent_lsn         | 26B09/8C08C610
write_lsn        | 26B09/840514C0
flush_lsn        | 26B09/840514C0
replay_lsn       | 26B09/840514C0
write_lag        | 00:07:03.60362
flush_lag        | 00:07:03.60362
replay_lag       | 00:07:03.60362
sync_priority    | 0
sync_state       | async
reply_time       | 2023-02-07 23:22:08.245066+00

So the problem seems to be that P2 takes a very long time to apply a
chunk. Now, my question is why? From AWS Performance Insights, it
looks like there is only one process running on P2, and it is 100% on
CPU. I don't really understand why it is CPU bound, and any help in
further debugging is much appreciated.

Cheers.

Attachment

Re: Need help debugging slow logical replication

From
Justin
Date:

On Tue, Feb 7, 2023 at 6:38 PM sunyucong@gmail.com <sunyucong@gmail.com> wrote:
> The lag between P1 & P1-R and between P2 & P2-R is minimal, less than
> 10 seconds, but I'm seeing several hours of lag between P1 & P2. We
> currently have 1 pub/sub that covers about 100 tables. [...]

Hi Sir,

What is the replica identity being used on the tables? Are any of the tables using REPLICA IDENTITY FULL?

How many tables are being replicated? 
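
Something along these lines should show it (just a sketch; replace
'upgrade_pub' with your actual publication name):

-- Decode pg_class.relreplident for every table in the publication.
SELECT pt.schemaname, pt.tablename,
       CASE c.relreplident
            WHEN 'd' THEN 'default (primary key)'
            WHEN 'i' THEN 'index'
            WHEN 'f' THEN 'full'
            WHEN 'n' THEN 'nothing'
       END AS replica_identity
FROM pg_publication_tables pt
JOIN pg_namespace n ON n.nspname = pt.schemaname
JOIN pg_class c ON c.relnamespace = n.oid AND c.relname = pt.tablename
WHERE pt.pubname = 'upgrade_pub';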

Re: Need help debugging slow logical replication

From
"sunyucong@gmail.com"
Date:
Hi Justin:

- I checked: I have 2 tables using replica identity FULL, but one
table is empty and the other has only 1 row
- 7 tables are using an index
- overall I have ~100 tables in the publication. But I do find that
some tables don't have a PK and are using replica identity DEFAULT. I'm
a bit confused, since the docs say DEFAULT means it uses the PK; does
that mean those tables fall back to FULL? (The query I used to find
them is below.)
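
This is roughly the query I used to find the published tables without a
primary key ('upgrade_pub' stands in for our publication name):

-- Published tables that have no primary key constraint at all.
SELECT pt.schemaname, pt.tablename
FROM pg_publication_tables pt
JOIN pg_namespace n ON n.nspname = pt.schemaname
JOIN pg_class c ON c.relnamespace = n.oid AND c.relname = pt.tablename
WHERE pt.pubname = 'upgrade_pub'
  AND NOT EXISTS (SELECT 1 FROM pg_constraint con
                  WHERE con.conrelid = c.oid AND con.contype = 'p');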

Cheers.

On Tue, Feb 7, 2023 at 4:20 PM Justin <zzzzz.graf@gmail.com> wrote:
> Hi Sir,
>
> What is the replica identity being used on the tables? Are any of the tables using REPLICA IDENTITY FULL?
>
> How many tables are being replicated?



Re: Need help debugging slow logical replication

From
Justin
Date:


On Tue, Feb 7, 2023 at 8:07 PM sunyucong@gmail.com <sunyucong@gmail.com> wrote:
> But I do find that some tables don't have a PK and are using replica
> identity DEFAULT. I'm a bit confused, since the docs say DEFAULT means
> it uses the PK; does that mean those tables fall back to FULL? [...]

Setting REPLICA IDENTITY FULL causes full table scans on the subscriber
for UPDATEs and DELETEs. REPLICA IDENTITY FULL should only be used on
INSERT-only tables or on tables that see very few updates and deletes;
otherwise it will make replaying the WAL very slow.

REPLICA IDENTITY DEFAULT only uses the primary key. If the publication
includes tables that don't have one, the subscriber will stop when
replaying the WAL, throwing an error because it doesn't know how to
apply the UPDATE/DELETE.

Logical replication is most likely broken at this point.

I suggest stopping logical replication, correcting the tables that don't
have qualifying indexes by creating the necessary indexes and avoiding
REPLICA IDENTITY FULL, and then restarting logical replication from the
beginning.
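
A sketch of the sort of correction I mean, with made-up table and
column names:

-- Either give the table a primary key (takes an exclusive lock briefly),
ALTER TABLE some_table ADD PRIMARY KEY (id);
-- or build a unique index concurrently and point the replica identity at
-- it (the index must be unique, non-partial, and on NOT NULL columns).
CREATE UNIQUE INDEX CONCURRENTLY some_table_id_idx ON some_table (id);
ALTER TABLE some_table REPLICA IDENTITY USING INDEX some_table_id_idx;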


Re: Need help debugging slow logical replication

From
"sunyucong@gmail.com"
Date:
Hi Justin, thanks for the response!

> REPLICA IDENTITY DEFAULT only uses the primary key. If the publication
> includes tables that don't have one, the subscriber will stop when
> replaying the WAL, throwing an error because it doesn't know how to
> apply the UPDATE/DELETE.

But I don't see any errors being thrown in the postgresql logs. Should
I be seeing it complain there? Or is postgresql falling back to replica
identity FULL here?

However, I checked that table and it doesn't seem to be making any
progress at all, so I suspect you are right that it is the problem.

> Logical replication is most likely broken at this point.
>
> I suggest stopping logical replication, correcting the tables that don't
> have qualifying indexes by creating the necessary indexes and avoiding
> REPLICA IDENTITY FULL, and then restarting logical replication from the
> beginning.


Re: Need help debugging slow logical replication

From
"sunyucong@gmail.com"
Date:
CCing the mailing list again in case someone else has an idea how to
debug.

Here is what I see in pg_locks on the subscriber at all times:

As you can see, it mostly concerns the following 3 tables: accounts,
ledger, and pending_ledger_fees. I have tried ANALYZE, VACUUM, etc. on
them; none of it helped much.

I do see in the log that these tables are getting autovacuumed *very*
frequently; is that a problem for logical replication?

2023-02-08 04:59:47 UTC::@:[13626]:LOG:  automatic vacuum of table
"dev.public.accounts": index scans: 0
pages: 0 removed, 71437 remain, 0 skipped due to pins, 11465 skipped frozen
tuples: 0 removed, 3590987 remain, 3522495 are dead but not yet
removable, oldest xmin: 2893907681
index scan bypassed: 4 pages from table (0.01% of total) have 5 dead
item identifiers
I/O timings: read: 0.000 ms, write: 0.000 ms
avg read rate: 0.000 MB/s, avg write rate: 0.000 MB/s
buffer usage: 120107 hits, 0 misses, 0 dirtied
WAL usage: 1 records, 0 full page images, 708 bytes
system usage: CPU: user: 0.23 s, system: 0.00 s, elapsed: 1.01 s
2023-02-08 04:59:47 UTC::@:[13626]:LOG:  automatic vacuum of table
"dev.public.pending_fee_ledger": index scans: 0
pages: 0 removed, 199818 remain, 0 skipped due to pins, 117769 skipped frozen
tuples: 0 removed, 3670095 remain, 3520000 are dead but not yet
removable, oldest xmin: 2893907681
index scan not needed: 0 pages from table (0.00% of total) had 0 dead
item identifiers removed
I/O timings: read: 0.000 ms, write: 0.000 ms
avg read rate: 0.000 MB/s, avg write rate: 0.000 MB/s
buffer usage: 164212 hits, 0 misses, 0 dirtied
WAL usage: 0 records, 0 full page images, 0 bytes
system usage: CPU: user: 0.27 s, system: 0.00 s, elapsed: 0.27 s
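
For reference, this is what I have been sampling on the subscriber to
see whether the apply worker advances at all between runs:

-- Subscriber-side view of the apply worker's progress.
SELECT subname, received_lsn, latest_end_lsn, last_msg_receipt_time
FROM pg_stat_subscription;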

Thanks

Attachment

Re: Need help debugging slow logical replication

From
"sunyucong@gmail.com"
Date:
OK, so I think I've got some new insight by using test_decoding to
peek into the changes being replicated. Here is what I think is
happening:

- there is a background job generating huge transactions (10K changes
each), one after another, that basically do a DELETE on 1 table, an
INSERT on 1 table, and an UPDATE on 1 table. Since this is a test
environment, these are the only transactions being accumulated (see the
peek sketch after this list)
- since there is only 1 publication, this worker on the primary
basically spends all its CPU decoding the large transaction and then
sending it to the subscriber to execute
   - memory usage in the logical decoding worker may be an issue, as it
might have spilled to disk during decoding
   - the worker on the subscriber basically spends all its time applying
the change
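
This is roughly how I peeked: a throwaway slot using the test_decoding
plugin (the slot name is arbitrary, and it only sees WAL written after
the slot is created):

-- Create a temporary peek slot, look at up to 100 pending changes
-- without consuming them, then drop it so it doesn't hold back WAL.
SELECT pg_create_logical_replication_slot('peek_slot', 'test_decoding');
SELECT * FROM pg_logical_slot_peek_changes('peek_slot', NULL, 100);
SELECT pg_drop_replication_slot('peek_slot');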

- I suspect keep-alives are not sent while a transaction is being
applied, and the transaction apply time exceeds wal_sender_timeout
(30 seconds); that's why we had to change that setting before.
- I think utilizing the new streaming feature for logical replication
in PG 14 will improve performance here (see the sketch after this list).
- I think the other thing that would help is to temporarily drop the
index on the subscriber to accelerate applying, but it will need to be
rebuilt once we're close to caught up.
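
A sketch of what I plan to try (the subscription name and the memory
value are placeholders; on RDS the second change goes through the
parameter group rather than ALTER SYSTEM):

-- On the subscriber: stream large in-progress transactions to the apply
-- worker instead of waiting for the whole transaction at commit.
ALTER SUBSCRIPTION upgrade_sub SET (streaming = on);
-- On the publisher: give logical decoding more memory before it spills
-- decoded changes to disk (default is 64MB).
ALTER SYSTEM SET logical_decoding_work_mem = '512MB';
SELECT pg_reload_conf();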

BTW, just to clarify: replica identity is not an issue for the table
that only receives INSERTs. I tried creating a replica identity for
that table, and it didn't change anything recorded in the WAL (as
expected); if anything it actually decreased performance.

What's puzzling to me is why logical apply is so much slower than
physical apply (and also, apparently, physical streaming never times
out!).
