Re: POC: enable logical decoding when wal_level = 'replica' without a server restart - Mailing list pgsql-hackers
| From | Masahiko Sawada |
|---|---|
| Subject | Re: POC: enable logical decoding when wal_level = 'replica' without a server restart |
| Date | |
| Msg-id | CAD21AoAacO_wg7HOuOV8B_NbN+AGrXouQoVX6JUUZowdszKQcA@mail.gmail.com Whole thread Raw |
| In response to | Re: POC: enable logical decoding when wal_level = 'replica' without a server restart (Amit Kapila <amit.kapila16@gmail.com>) |
| Responses |
Re: POC: enable logical decoding when wal_level = 'replica' without a server restart
|
| List | pgsql-hackers |
On Mon, Dec 8, 2025 at 8:27 PM Amit Kapila <amit.kapila16@gmail.com> wrote: > > On Tue, Dec 9, 2025 at 12:22 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote: > > > > On Mon, Dec 8, 2025 at 3:30 AM Amit Kapila <amit.kapila16@gmail.com> wrote: > > > > > > On Mon, Dec 8, 2025 at 1:06 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote: > > > > > > > > On Fri, Dec 5, 2025 at 4:56 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote: > > > > > > > > > > > > > > > > > Hmm, ISTM that the root cause is that we don't synchronously update > > > > > > the XLogLogicalInfo cache on the standby. A backend on the standby who > > > > > > started when logical decoding is disabled keeps holding > > > > > > XLogLogicalInfo = false until the promotion. So even if the startup > > > > > > process enables logical decoding when applying the STATUS_CHANGE > > > > > > record, logical decoding is enabled on the standby but the backend > > > > > > process would start logical decoding with XLogLogicalInfo = false, > > > > > > resulting in RelationIsAccessibleInLogicalDecoding() returns false. I > > > > > > missed the fact that we check XLogLogicalInfoActive() even read paths. > > > > > > Will fix it. > > > > > > > > > > > > > > > > I noticed that the XLogLogicalInfoXactCache doesn't work as expected > > > > > even on the primary; because the process keep using the > > > > > XLogLogicalInfoXactCache in a transaction, sending > > > > > PROCSIGNAL_BARRIER_UPDATE_XLOG_LOGICAL_INFO with a wait doesn't > > > > > actually ensure that all processes are writing logical-WAL records > > > > > before enabling logical decoding. XLogLogicalInfoXactCache could be > > > > > cached even read paths without XID assignment, for example when HOT > > > > > pruning happens while reading a table. So, if a process has been > > > > > executing only read queries while logical decoding is disabled, it has > > > > > XLogLogicalInfoXactCache=0 and can get an XID after logical decoding > > > > > is enabled, writing non-logical WAL records. Such read-only > > > > > transactions don't appear in xl_running_xacts so we cannot wait for > > > > > them to finish during logical decoding initialization. > > > > > > > > > > I think there are two possible solutions to resolve this issue: > > > > > > > > > > 1. Revert the XLogLogicalInfoXactCache idea. While we can simplify the > > > > > code, I'm concerned that it could be problematic if > > > > > XLogLogicalInfoActive() could return different results whenever it's > > > > > called. For instance, we should at least remove the assertion in > > > > > ExecuteTruncateGuts(), but we could face a similar issue in the > > > > > future. > > > > > > > > I'm concerned that even this idea could be problematic without waiting > > > > for all running transactions to finish before enabling logical > > > > decoding, depending on how we write WAL records. If > > > > XLogLogicalInfoActive() could return a different value within the same > > > > command, we could write a WAL record mixiting logical-information and > > > > non-logical information, because some functions use > > > > XLogLogicalInfoActive() several times before writing a one WAL record. > > > > Further, if there is a path where a transaction decides whether to > > > > include logical information based on XLogLogicalInfoActive() *before* > > > > getting an XID, the transaction might not appear in xl_running_xacts > > > > record, resulting in that logical decoding that started before the > > > > transaction decodes such WAL records in the CONSISTENT snapbuild > > > > state. > > > > > > > > > > Are you imagining a case with following steps that could lead to a > > > problem: (a) backend determine whether to include logical_info in WAL > > > via XLogLogicalInfoActive() and the answer is false, (b) assign > > > transaction_id, (c) got a signal to update XLogLogicalInfo to enable > > > logical_info and the backend updates it, (d) backend still writes WAL > > > without logical_info because it uses the information determined in > > > (a), (e) logical decoding won't wait for such a transaction before > > > reaching consistent state as by that time xid was not assigned, (f) > > > decoding results are unpredictable because the WAL doesn't have > > > logical_info? > > > > Exactly. In addition to such potential problems, such WAL records look > > incomplete and bogus, so the validation check could fail if we have in > > the future. > > > > IIUC a similar thing happens even if XLogLogicalInfoActive() returns a > > consistent result within the same transaction, if we don't wait for > > all transactions to finish; > > > > 1. a backend determines not to include logical-info in WAL as > > XLogLogicalInfoActive() returns false. > > 2. logical decoding is enabled and the global signal barrier is sent. > > 3. the backend doesn't update its cache as it's within the transaction > > (will be updated at the next transaction). > > 4. logical decoding started and won't wait for such a transaction > > before reaching a consistent state as the transaction doesn't have an > > XID yet. > > 5. the backend gets an XID and writes WAL records (without logical-info). > > 6. logical decoding decodes such WAL records. > > > > As far as I checked the source code, there does not seem to be such > > code paths but it might be introduced in the future. > > > > I think if we don't have transaction-level cache, then chances are > less that such a code path gets introduced even in the future. Why do you think the chances are less without the transaction-level cache? Did you mean to say "if we have the transaction-level cache"? > I think > it is not a good idea to rely on the old value of > XLogLogicalInfoActive(), especially after this patch, when its return > value can change dynamically. We can document/comment such a > precaution, and if in the future there really is a need or argument to > allow such cases, then we can introduce wait during slot_creation. > > Having said that, I see your point about introducing a wait having > merits w.r.t. making the code more future-proof, but it is also > possible that such a need never arises, and we unnecessarily let users > pay the price by waiting for all concurrent transactions to finish. > For example, won't we need to consider the possibility of walsender > timeout because of this wait, say for tablesync case where we create > the logical slot? Or similarly connection_timeout? And even if we are > okay based on the theory that the same can happen when we wait for the > snapshot to reach a consistent state during decoding, we need some > logic as we have in SnapBuildWaitSnapshot(). Right. Given that such coding practice is uncommon and inadvisable, it seems a reasonable implementation that we have the transaction-level cache and don't wait for read-only queries before enabling logical decoding. I've verified if XLogLogicalInfoActive() is called without XID assignment in a transaction state (and not a vacuum process), there doesn't seem to be. It would be great if we could have some assertions to detect such cases but I could not see a reasonable way. I've attached the updated patch that includes the following changes: - addressed review comments. - changed the transaction-level cache implementation to avoid caching the value multi-level (i.e., XLogLogicalInfo and XLogLogicalInfoXactCache). - fixed an assertion failure in a case where there are concurrent activations. Regards, -- Masahiko Sawada Amazon Web Services: https://aws.amazon.com
Attachment
pgsql-hackers by date: