Re: Add logical_decoding_spill_limit to cap spill file disk usage per slot - Mailing list pgsql-hackers
| From | Bharath Rupireddy |
|---|---|
| Subject | Re: Add logical_decoding_spill_limit to cap spill file disk usage per slot |
| Date | |
| Msg-id | CALj2ACVc3iYLOkC36VJwoXyVZmGcb0WEMKoc478q+xdRG+2BtA@mail.gmail.com |
| In response to | Add logical_decoding_spill_limit to cap spill file disk usage per slot (shawn wang <shawn.wang.pg@gmail.com>) |
| Responses | Re: Add logical_decoding_spill_limit to cap spill file disk usage per slot |
| List | pgsql-hackers |
Hi,

On Mon, Mar 23, 2026 at 6:20 AM shawn wang <shawn.wang.pg@gmail.com> wrote:
>
> Hi hackers,

Thank you for proposing this new feature.

> == Motivation ==
>
> We operate a fleet of PostgreSQL instances with logical replication. On several occasions, we have experienced production incidents where logical decoding spill files (pg_replslot/<slot>/xid-*.spill) grew uncontrollably — consuming tens of gigabytes and eventually filling up the data disk. This caused the entire instance to go read-only, impacting not just replication but all write workloads.
>
> The typical scenario is a large transaction (e.g. bulk data load or a long-running DDL) combined with a subscriber that is either slow or temporarily disconnected. The reorder buffer exceeds logical_decoding_work_mem and starts spilling, but there is no upper bound on how much can be spilled. The only backstop today is the OS returning ENOSPC, at which point the damage is already done.

Having a lot of spill files also increases crash/recovery times. However, files spilling to disk, causing no-space-left-on-disk issues and leading to downtime, applies to WAL files, historical catalog snapshot files, subtransaction overflow files, CLOG (and all the subsystems backed by the SLRU data structure), etc. - basically any Postgres subsystem writing files to disk. I'm a bit worried that we may end up solving disk space issues, which IMHO are outside of the database's scope, in the database. Others may have different opinions though.

How common is this issue? Could you please add a test case to the proposed patch that, without this feature, would otherwise hit the described issue?

Having said that, were alternatives considered, such as disabling subscriptions when they are seen occupying too much disk space?

> We looked for existing protections:
>
> max_slot_wal_keep_size: limits WAL retention, but does not affect spill files at all.
> logical_decoding_work_mem: controls *when* spilling starts, but not *how much* can be spilled.
> There is no existing GUC, patch, or commitfest entry that addresses spill file disk quota.

Interesting!

> The "Report reorder buffer size" patch (CF #6053, by Ashutosh Bapat) improves observability of reorder buffer state, which is complementary — but observability alone cannot prevent disk-full incidents.

With the proposed reorder buffer stats above, would it be possible to have a monitoring solution (an extension or a tool) to disable subscriptions and notify the admin? Would something like this work?

> == Proposed solution ==
>
> The attached patch adds a new GUC:
> logical_decoding_spill_limit (integer, unit kB, default 0)
>
> When set to a positive value, it limits the total size of on-disk spill files per replication slot. Key design points:
>
> Tracking: We add two new fields:
> - ReorderBuffer.spillBytesOnDisk — current total on-disk spill size for this slot (unlike spillBytes, which is a cumulative statistics counter, this is a live gauge).
> - ReorderBufferTXN.serialized_size — per-transaction on-disk size, so we can accurately decrement the global counter during cleanup.
>
> Increment: In ReorderBufferSerializeChange(), after a successful write(), both counters are incremented by the size written.
>
> Decrement: In ReorderBufferRestoreCleanup(), when spill files are unlinked, the global counter is decremented by the transaction's serialized_size.
>
> Enforcement: In ReorderBufferCheckMemoryLimit(), before calling ReorderBufferSerializeTXN(), we check:
>
>     if (spillBytesOnDisk + txn->size > spill_limit)
>         ereport(ERROR, ...)
>
> This is only checked on the spill-to-disk path — not on the streaming path (which involves no disk I/O).
>
> Behavior on limit exceeded: An ERROR is raised with ERRCODE_CONFIGURATION_LIMIT_EXCEEDED. The walsender exits, but the slot's restart_lsn and confirmed_flush are preserved.
> The subscriber can reconnect after the DBA:
>
> increases logical_decoding_spill_limit, or
> increases logical_decoding_work_mem (to reduce spilling), or
> switches to a streaming-capable output plugin (which avoids spilling entirely).

When logical_decoding_spill_limit is exceeded, ERRORing out in the walsender is even more problematic, right? The replication slot would be inactive, causing bloat, preventing tuple freezing, and letting WAL files grow until the system eventually hits disk-space issues - it is like "we avoided disk space issues for one subsystem, but introduced them for another". This looks a bit problematic IMHO. Others may have different opinions though.

--
Bharath Rupireddy
Amazon Web Services: https://aws.amazon.com