Re: POC: enable logical decoding when wal_level = 'replica' without a server restart - Mailing list pgsql-hackers
| From | Matthias van de Meent |
|---|---|
| Subject | Re: POC: enable logical decoding when wal_level = 'replica' without a server restart |
| Date | |
| Msg-id | CAEze2Wg9fWze9dA3GssLVP_TZNV0DqdNq2Td8XZ5XHJtqA1SDw@mail.gmail.com |
| In response to | Re: POC: enable logical decoding when wal_level = 'replica' without a server restart (Amit Kapila <amit.kapila16@gmail.com>) |
| List | pgsql-hackers |
On Thu, 8 Jan 2026 at 06:16, Amit Kapila <amit.kapila16@gmail.com> wrote:
> > > NB. I'm not opposed to changing wal_level in a running cluster, and I
> > > do think that the current xact+checkpoint -based approach to selecting
> > > the local effective_wal_level is fine, as well as standby picking up
> > > the primary's current setting; it's the trigger condition for the
> > > decision to change effective_wal_level that I have problems with.
> > >
> >
> > Thank you for the comments.
> >
> > I understand the concern that users with the REPLICATION privilege can
> > now effectively control wal_level, potentially increasing system-wide
> > overhead. While the REPLICATION privilege already implies a high
> > degree of trust as we allow it to take a basebackup and create a
> > physical slot etc., I agree that this feature might elevate that power
> > further, and we may need a mechanism to address this.
> >
>
> If we allow taking the entire physical data via the REPLICATION
> privilege, then the user must already be highly privileged. Such a
> user is already allowed to read every byte of data in the database via
> physical streaming. Now, such a user influencing wal_level to be
> changed from 'replica' to 'logical' is of lesser harm.

I don't think the harm of changing wal_level from 'replica' to
'logical' is decreased, because the harm is in the distributed
performance impact, not the access to data.

A physical replication slot does not (need to) impact the write
performance of other backends if it's sufficiently partitioned from
other workloads (not configured for syncrep, etc.). wal_level=logical,
however, cannot be partitioned from write workloads: it adds a
non-negotiable overhead to the write workloads of other backends, as
they now need to track more data (identity columns) and must write
more WAL.

> I agree that it
> can lead to some non-malicious impact, like disk space (due to
> increased WAL volume), and extra CPU consumption due to extra WAL
> volume.
> But I think REPLICATION privilege can already lead to extra
> CPU consumption due to wal_sender activity, and even disk space by not
> letting the slot advance, which can even crash the system.
>
> Since these users already have the power to access all data and cause
> a Denial of Service (DoS) via disk exhaustion, the ability to
> "upgrade" WAL logging from replica to logical can be seen as an
> incremental addition to an already highly trusted role. I think we can
> update the documentation of the REPLICATION privilege.

Replication slots that keep WAL from being recycled can be monitored
(and therefore, likely acted on) before the relevant problem (OOD,
i.e. running out of disk space) occurs, which is not the case with the
current effective_wal_level implementation. One moment your TPS is
normal, the next moment it drops because a role with REPLICATION added
a logical slot, and you'll have to drop that slot and wait for a
checkpoint to revert to replica. The difference here is the reaction
time available before it starts impacting transactions.

> >
> > To address your concerns, I have come up with the following ideas:
> >
>
> I feel, If an administrator does not want to allow logical decoding,
> they can set max_replication_slots to a value that only covers their
> known physical replicas. So, they can still control the additional CPU
> consumption if they are worried that it can cause harm. The other
> possibility is to have a separate GUC for logical slots such as
> max_logical_replication_slots. So, still, an administrator can keep
> control.

As mentioned by Ashutosh, it's not strange to configure
max_replication_slots with some leeway, e.g. to allow for new
permanent replicas to be added, or to make scheduled failovers less
painful by being able to pre-provision the new secondary replica ahead
of time.
max_logical_replication_slots could work as an extension to this, but
it feels like putting the cart before the horse: instead of not
allowing REPLICATION users to effect a change in effective_wal_level,
you now don't allow them to create replication slots, which then has
the side effect of not triggering effective_wal_level=logical.

I would personally prefer a wal_level=dynamic or such, which could be
put between replica and logical.

Kind regards,

Matthias van de Meent
Databricks (https://www.databricks.com)
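
As a footnote to the monitoring point above, here is a minimal sketch of what "monitor the slots that keep WAL from being recycled" could look like. It assumes a primary (pg_current_wal_lsn() errors out during recovery) and that the rows are fetched with whatever client library is in use. The SlotRetention class, the slots_over_budget helper, the sample rows and the 1 GiB budget are hypothetical names and values made up for illustration; only the pg_replication_slots view and the two LSN functions in the query are PostgreSQL's own.

```python
from dataclasses import dataclass

# Intended to be run on a primary; restart_lsn is NULL for slots that
# have never reserved WAL, so those are filtered out.
SLOT_RETENTION_SQL = """
SELECT slot_name,
       slot_type,
       active,
       pg_wal_lsn_diff(pg_current_wal_lsn(), restart_lsn) AS retained_bytes
FROM pg_replication_slots
WHERE restart_lsn IS NOT NULL
"""


@dataclass(frozen=True)
class SlotRetention:
    """One row of SLOT_RETENTION_SQL (hypothetical container type)."""
    slot_name: str
    slot_type: str        # 'physical' or 'logical'
    active: bool
    retained_bytes: int   # WAL bytes between restart_lsn and the current LSN


def slots_over_budget(rows, budget_bytes):
    """Return the slots retaining more WAL than budget_bytes, largest first."""
    over = [row for row in rows if row.retained_bytes > budget_bytes]
    return sorted(over, key=lambda row: row.retained_bytes, reverse=True)


if __name__ == "__main__":
    # Made-up sample rows standing in for a real query result.
    sample = [
        SlotRetention("standby_1", "physical", True, 64 * 1024**2),
        SlotRetention("forgotten_slot", "logical", False, 5 * 1024**3),
    ]
    for slot in slots_over_budget(sample, budget_bytes=1024**3):
        print(f"{slot.slot_name} ({slot.slot_type}) retains {slot.retained_bytes} bytes")
```

A check like this gives an operator time to react because retained WAL grows gradually. There is no comparable leading indicator for the switch to effective_wal_level=logical, which is the asymmetry argued above.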