Re: POC: enable logical decoding when wal_level = 'replica' without a server restart - Mailing list pgsql-hackers
From | Amit Kapila |
---|---|
Subject | Re: POC: enable logical decoding when wal_level = 'replica' without a server restart |
Date | |
Msg-id | CAA4eK1JVNbb-OT1PO=iOFG1KA__Q83n8cLZoDjF2yA1rZyvCnA@mail.gmail.com |
In response to | Re: POC: enable logical decoding when wal_level = 'replica' without a server restart (Masahiko Sawada <sawada.mshk@gmail.com>) |
List | pgsql-hackers |
On Wed, Sep 17, 2025 at 10:24 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
>
> On Wed, Sep 17, 2025 at 4:19 AM Amit Kapila <amit.kapila16@gmail.com> wrote:
> >
> > On Tue, Sep 16, 2025 at 11:49 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
> > >
> > > On Tue, Sep 16, 2025 at 1:30 AM Amit Kapila <amit.kapila16@gmail.com> wrote:
> > > >
> > > > When user is dropping a temporary slot, we should disable the
> > > > decoding. The lazy behaviour should be for ERROR or session_exit
> > > > cases.
> > >
> > > I think it might be worth discussing whether to use lazy behavior in
> > > all cases.
> > >
> >
> > Agreed.
> >
> > > There are several advantages:
> > >
> > > - It mitigates the risk of connection timeouts during a logical slot
> > > drop or a subscription drop.
> > > - In scenarios involving frequent creation and deletion of logical
> > > slots (such as during initial data synchronization), it could
> > > potentially avoid the issue of a frequent switch on and off.
> > >
> > > On the other hand, drawbacks are:
> > >
> > > - users would have to wait for effective_wal_level to get decreased to
> > > 'replica' somehow.
> > > - makes the checkpointer more busy in addition to its checkpointing job.
> > > - it could take a longer time to disable logical decoding if the
> > > checkpoint is busy with a checkpointing job.
> > >
> >
> > This last point in drawback could hurt performance of systems for a
> > longer time when that was really not required. It should be okay to
> > use lazy behavior in all cases when we can do that in a predictable
> > time.
>
> Agreed.
>
> If we use the lazy behavior in ERROR or session_exit cases, we would
> have these drawbacks anyway. But assuming it won't happen frequently
> in practice, we can live with that.
>
> > The other background process to consider doing lazy processing
> > is the launcher whose role is to launch apply workers for subscription
> > and maintain a conflict_slot (if required). Now, because disabling
> > logical_info could also take longer time in worst cases, the
> > launcher's own tasks can become unpredictable. Also, if tomorrow, we
> > decide to support dynamically changing wal_level from minimal to some
> > upper level, the launcher won't be the appropriate process.
>
> Right. Also, we don't launch the launcher process when
> max_logical_replication_workers == 0. It should be >0 on the
> subscriber but might not be on the publisher.
>
> >
> > The other idea could be to have a new auxiliary process to disable
> > logical_info lazily. It is arguable if we just have a separate process
> > for this purpose but we have previously discussed some other tasks for
> > such a process like removal of old_serialized_snapshots and
> > old_logical_rewrite_map files. See [1]. If we agree to have a
> > separate process for this purpose then disabling logical_info in all
> > cases sounds okay to me.
>
> Yeah, the custodian worker would be one solution. But please refer to
> subsequent discussions[1][2];
>

I think Tom's idea of spawning the worker on an as-needed basis has some
use here. For example, during drop_slot, we can launch the worker to
complete this task and then exit, which mitigates the risk of a
connection_timeout in the drop subscription case. However, we can
consider such ideas as iterative improvements as well.

> there might not be other tasks to
> delegate to the custodian worker than this logical decoding
> deactivation, and it might be not optimal to have a single worker that
> is responsible for all custodian works. Actually we've discussed a
> similar idea on this thread and I drafted a patch[3] that utilizes
> bgworkers to do internal tasks in the background in a
> one-task-per-one-worker manner.
>
> It requires more discussion anyway if we want to go with this
> direction. I think we can start with using lazy behavior in ERROR or
> session_exit cases (assuming it won't happen frequently in practice),
> and consider using lazy behavior other cases if it's really
> preferable.
>

Fair enough. So, let's proceed with this plan (use lazy behavior in
ERROR and session_exit cases) and see how it works. BTW, we also need to
consider the ERROR case where the slot is dropped but we then fail to
disable logical_info because of some unrelated ERROR.

--
With Regards,
Amit Kapila.
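
To make the agreed plan concrete, here is a minimal standalone sketch in C of the decision being discussed: disable immediately on an explicit drop, and defer to a background pass on ERROR or session exit. It is not taken from the patch. Every name in it (`LogicalDecodingState`, `release_logical_slot`, `process_pending_disable`, the `simulate_disable_error` flag) is hypothetical, and falling back to the lazy path when the immediate disable fails is only one assumed way to handle the last ERROR case raised above.

```c
#include <stdbool.h>

/* All names here are hypothetical; none come from the PostgreSQL source. */
typedef enum SlotReleaseReason
{
    SLOT_DROP_EXPLICIT,         /* user ran an explicit slot drop */
    SLOT_RELEASE_ERROR,         /* slot released during ERROR cleanup */
    SLOT_RELEASE_SESSION_EXIT   /* temporary slot released at session exit */
} SlotReleaseReason;

typedef struct LogicalDecodingState
{
    int  logical_slot_count;     /* number of existing logical slots */
    bool logical_info_enabled;   /* is logical WAL info being written? */
    bool disable_pending;        /* lazy disable requested */
    bool simulate_disable_error; /* test hook: make the disable step fail */
} LogicalDecodingState;

/* Try to disable logical info right now; returns false on failure. */
static bool
try_disable_now(LogicalDecodingState *st)
{
    if (st->simulate_disable_error)
        return false;
    st->logical_info_enabled = false;
    return true;
}

/* Called when a logical slot goes away, for whatever reason. */
static void
release_logical_slot(LogicalDecodingState *st, SlotReleaseReason reason)
{
    if (st->logical_slot_count > 0)
        st->logical_slot_count--;
    if (st->logical_slot_count > 0)
        return;                 /* other logical slots still need the info */

    /* Explicit drop: disable eagerly. Assumed fallback: go lazy on failure. */
    if (reason == SLOT_DROP_EXPLICIT && try_disable_now(st))
        return;

    /* ERROR, session exit, or failed eager disable: leave it for later. */
    st->disable_pending = true;
}

/* Run periodically by whichever background process owns the lazy path. */
static void
process_pending_disable(LogicalDecodingState *st)
{
    if (!st->disable_pending)
        return;
    if (st->logical_slot_count > 0)
    {
        st->disable_pending = false;    /* a slot was created meanwhile */
        return;
    }
    if (try_disable_now(st))
        st->disable_pending = false;    /* otherwise retry on the next pass */
}
```

In this sketch the pending flag is rechecked against the slot count before disabling, so a slot created between the release and the background pass keeps logical info enabled.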