I’m looking for insight into a behavior we observed in a PostgreSQL physical replication setup.
Environment:
PostgreSQL version: 15.14
DB size: 282 GB
Environment: AWS EC2
PR = primary
HA = synchronous standby
DP = asynchronous standby
DP used a physical replication slot
hot_standby_feedback = on (on DP)
Observed behavior:
DP fell behind PR by about 400 GB of replication lag
There were no user queries running on DP
During this period, query performance on PR degraded and an application backlog built up
After removing DP from replication, PR performance improved gradually over about 1 to 2 hours, not immediately
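For context, lag of this magnitude can be quantified on PR using the standard `pg_stat_replication` view (this is an illustrative query, not necessarily the exact one we ran at the time):

```sql
-- On PR: byte lag of each connected standby relative to the current WAL position.
SELECT application_name,
       state,
       sync_state,
       pg_size_pretty(pg_wal_lsn_diff(pg_current_wal_lsn(), replay_lsn)) AS replay_lag
FROM pg_stat_replication;
```

In our case DP showed roughly 400 GB of replay lag by this measure.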
Why this is confusing:
DP was async, so this does not appear to be synchronous commit wait
There were no active queries on DP at the time we checked
The delayed recovery on PR makes me suspect that cleanup on PR had been held back for some time, leading to dead-tuple accumulation, bloat, and an autovacuum backlog, and that removing DP merely allowed PR to work through that accumulated debt gradually.
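My working theory hinges on the slot's xmin. A sketch of the check I have in mind, using only standard catalog views (column names as documented for PostgreSQL 15):

```sql
-- On PR: does any replication slot pin an xmin far in the past?
-- With hot_standby_feedback = on, a physical slot's xmin reflects the
-- standby's oldest snapshot, and VACUUM on the primary must honor it.
SELECT slot_name,
       slot_type,
       active,
       xmin,
       age(xmin)   AS xmin_age_in_xids,
       restart_lsn
FROM pg_replication_slots;
```

A large `age(xmin)` on DP's slot at the time of the incident would, as I understand it, mean VACUUM could not remove any tuple deleted after that transaction, regardless of whether DP was running queries at the moment we looked.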
My questions:
In an async physical standby setup, can a lagging standby with a physical slot and hot_standby_feedback=on still hold back VACUUM cleanup on the primary even when no queries are currently running on the standby?
Can an old or stale slot xmin on the primary explain this kind of behavior?
Does the 1–2 hour gradual recovery after removing DP point more toward cleanup debt / dead tuple buildup / bloat on PR, toward WAL retention / storage pressure, or a combination of the two?
What PR-side evidence would best confirm the root cause after the fact? For example:
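To make the question concrete, these are the sorts of after-the-fact checks I have in mind, all against standard PostgreSQL statistics and catalog views (illustrative sketches, not a definitive forensic procedure):

```sql
-- 1. Dead-tuple buildup and stalled autovacuum, per table:
SELECT relname, n_dead_tup, n_live_tup, last_autovacuum, last_vacuum
FROM pg_stat_user_tables
ORDER BY n_dead_tup DESC
LIMIT 20;

-- 2. Database-wide vacuum debt via frozen-xid age:
SELECT datname, age(datfrozenxid) AS frozen_xid_age
FROM pg_database
ORDER BY 2 DESC;

-- 3. WAL retained on behalf of slots (storage pressure):
SELECT slot_name,
       pg_size_pretty(pg_wal_lsn_diff(pg_current_wal_lsn(), restart_lsn)) AS retained_wal
FROM pg_replication_slots;
```

Beyond these, would autovacuum log entries (log_autovacuum_min_duration) showing "dead but not yet removable" tuple counts from the incident window be the strongest confirmation that the slot's xmin was the blocker?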