Re: ReplicationSlotRelease() crashes when the instance is in the single user mode - Mailing list pgsql-hackers
From | Paul A Jungwirth |
---|---|
Subject | Re: ReplicationSlotRelease() crashes when the instance is in the single user mode |
Date | |
Msg-id | CA+renyWi+bbD8M8CDAaOMGbUrXptKLCK8zQWEhXy00PJuJqNuQ@mail.gmail.com Whole thread Raw |
In response to | RE: ReplicationSlotRelease() crashes when the instance is in the single user mode ("Hayato Kuroda (Fujitsu)" <kuroda.hayato@fujitsu.com>) |
List | pgsql-hackers |
Mutaamba (cc'd) and I reviewed this patch together. To summarize the patch and thread so far: The patch adds a new function, CheckSlotIsInSingleUserMode. If true then we raise an error. Otherwise we would trip an assert in ReplicationSlotRelease requiring the slot to have an active_pid, which is never set in single-user mode. We do want to support pg_drop_replication_slot in single-user mode, but not other replication slot functions. In particular, advancing the slot may call non-core code, which seems risky in single-user mode. The patch does not apply. The attached v5 fixes it with a small update to the context for the slot.h hunk. The commit message needs some explanation. Why are we doing this? What does the patch do? What alternatives did we consider? The current error message seems reasonable. The patch has no docs. If we plan to forbid some functions in single-user mode, we should document which ones (e.g. in func.sgml). There are no tests. Do we have any tests at all for single-user mode? The only one I see is in recovery/t/017_shm.pl. What if we had tests that ran the regress suite in single-user mode? Basically instead of psql we run postgres --single? Do we expect a lot of it would fail? Is it small enough the test could maintain a diff that expects those failures? I'm not saying we should do that as part of this patch, but is there any interest in that? Since single-user mode is most often used when things are broken, it's a harsh place to hit a bug. The patch lacks source comments. Something on CheckSlotIsInSingleUserMode explaining why would be good. In ReplicationSlotRelease, why only assert that `slot->active_pid != 0`? Why not assert that it's MyProcPid (even outside single-user mode)? Can one backend really release a slot while another is using it? Can you drop it? Maybe you can copy it? Are we calling CheckSlotIsInSingleUserMode from everywhere that needs it? We tried to check all the functions that call ReplicationSlotCreate, ReplicationSlotRelease, and update_synced_slots_inactive_since (since they all have these asserts). To call out a few: The pg_replication_origin_* functions don't call the Assert-ing functions. binary_upgrade_logical_slot_has_caught_up - Not available from the command line. WalSndErrorCleanup - Probably not used in single-user mode? We also see that PostgresMain calls ReplicationSlotRelease from its error handler. Presumably single-user mode would be executing PostgresSingleUserMain instead, but still perhaps we should call CheckSlotIsInSingleUserMode here just as a sanity-check? On Thu, Feb 27, 2025 at 9:42 PM Hayato Kuroda (Fujitsu) <kuroda.hayato@fujitsu.com> wrote: > > I understand that we may not have a clear use case for this to work in > > single-user mode. But how will we define the boundary in similar > > cases? I mean, we should have some rule for such exposed functions, > > and it should be followed uniformly. Now, if one needs a bigger or > > complex change to make the function work in single-user mode, then it > > is easy to take an exception from what is currently being followed in > > the code. However, if the change is as trivial as you proposed in the > > first email, why not go with that and make this function work in > > single-user mode? > > Hmm, the opinion about this is completely opposite with other reviewers. I want > to ask them again how you feel. I also added Tom who pointed out in the initial > thread. > > Question: how you feel to restrict SQL functions for slot during the single-user > mode? Nobody has considered use cases for it; we do not have concrete theory and > similar cases. And needed band-aid to work seems very small. No one has replied yet, but I vote for forbidding these functions. I can't articulate a full theory for which functions we restrict in single-user mode, and I think we should permit as much as possible. But any theory would weigh usefulness against risk. Here no one has found a use-case for several years, and executing user-supplied code in an unusual, higher-risk scenario seems like a good reason to be cautious. Also without tests for single-user mode, I'm extra wary. If single-user support is desired by someone, they could submit a patch. Yours, -- Paul ~{:-) pj@illuminatedcomputing.com
Attachment
pgsql-hackers by date: