Re: pg_ctl start may return 0 even if the postmaster has been already started on Windows - Mailing list pgsql-hackers
| From | Bryan Green |
|---|---|
| Subject | Re: pg_ctl start may return 0 even if the postmaster has been already started on Windows |
| Date | |
| Msg-id | 07d00685-26b1-461d-99bf-7e973236e875@gmail.com Whole thread Raw |
| In response to | Re: pg_ctl start may return 0 even if the postmaster has been already started on Windows (Bryan Green <dbryan.green@gmail.com>) |
| List | pgsql-hackers |
On 10/23/2025 11:02 PM, Bryan Green wrote: > On 3/26/2025 3:18 AM, vignesh C wrote: >> On Sat, 20 Jul 2024 at 00:03, Robert Haas <robertmhaas@gmail.com> wrote: >>> >>> On Tue, Jul 16, 2024 at 8:04 AM Yasir <yasir.hussain.shah@gmail.com> >>> wrote: >>>> Please ignore the above 4 lines in my review. See my comments in blue. >>> >>> OK, so I think it's unclear what the next steps are for this patch. >>> >>> 1. On June 3rd, Michael Paquier said that Tom Lane proposed that, >>> after doing what the patch currently does, we could simplify some >>> other stuff. The details are unclear, and Tom hasn't commented. >>> >>> 2. On June 29th, Noah Misch proposed a platform-independent way of >>> solving the problem. >>> >>> 3. On July 12th, Sutou Kouhei proposed using CreateProcess() to start >>> the postmaster instead of cmd.exe. >>> >>> 4. On July 16th, Yasir Shah said that he tested the patch and found >>> that the problem only exists in v17, not any prior release, which is >>> contrary to my understanding of the situation. He also proposed a >>> minor tweak to the patch itself. >>> >>> So, as I see it, we have three possible ways forward here. First, we >>> could stick with the current patch, possibly with further work as per >>> [1] or adjustments as per [4]. Second, we could abandon the current >>> approach and adopt Noah's proposal in [2]. Third, we could possibly >>> abandon the current approach and adopt Sutou's proposal in [3]. I say >>> "possibly" because I can't personally assess whether this approach is >>> feasible. >> >> Thank you very much, Robert, for summarizing this. If anyone has >> suggestions on which approach might work best, please share them to >> help move this discussion forward. >> >> Regards, >> Vignesh >> >> > Hello everyone, > > I've spent the last few days implementing Sutou-san's CreateProcess() > proposal as a proof-of-concept. With all three approaches now available > for comparison, I wanted to share my assessment and recommendation. > > The core issue, as we've established, is that cmd.exe intermediation > prevents pg_ctl from getting the real postgres.exe PID, creating race > conditions where we can't distinguish a newly-started postmaster from a > pre-existing one. This causes pg_ctl start to incorrectly report success > when attempting to start an already-running cluster. > > Horiguchi-san's process tree walking patch works and is ready to ship. > It has real merit as a minimal-change solution. However, it keeps > cmd.exe and adds process enumeration complexity to work around that > decision. We're fixing symptoms rather than the root cause. > > Noah's token proposal is architecturally elegant and platform- > independent. My concern is that it solves a Windows-specific problem by > adding complexity to all platforms. Making other platforms adopt token- > passing infrastructure for Windows's problems feels wrong. It also > requires postmaster changes for what is fundamentally a pg_ctl issue, > raising compatibility questions. > > The CreateProcess() approach eliminates cmd.exe entirely and gives us > the actual postgres.exe PID directly. I've implemented this in the > attached patch (about 250 lines of Windows-specific code). The > implementation uses CreateProcessAsUser() with a restricted security > token to handle the case where pg_ctl runs with elevated privileges—this > drops Administrator group membership and dangerous privileges before > launching the postmaster, matching the behavior of the existing > CreateRestrictedProcess() function. > > For I/O redirection, we use Windows's native handle-based APIs. When a > log file is specified, we open it with CreateFile() and pass it through > STARTUPINFO. When there's no log file (interactive use and postgres -C > queries), we inherit standard handles. One detail worth noting: we wait > 2 seconds for no-log-file launches to distinguish postgres -C (which > exits immediately) from actual server startup. This is long enough to > catch quick exits even on loaded CI systems without impacting test suite > performance. > > The implementation passes all regression tests on Windows, including CI > environments with elevated privileges. It handles all the problem > scenarios correctly: normal startup, detecting already-running clusters, > postgres -C queries, pg_upgrade with multiple instances, and concurrent > start attempts. > > I prefer the CreateProcess() approach. It fixes the root cause, not the > symptoms. We get the real PID immediately with no process tree walking > or PID verification complexity. It's a Windows-specific solution to a > Windows-specific problem. > > If the CreateProcess() approach is deemed unacceptable, I would support > Horiguchi-san's process tree walking patch as a pragmatic fallback. It > fixes the bug and is tested. However, I believe we'd be choosing a > workaround over a proper fix and adding maintenance burden we don't need. > > I'm opposed to the token approach for the reasons stated above—it's > architecturally appealing but touches all platforms for a Windows-only > problem. > > With this implementation complete, we now have all three options on the > table for evaluation. I believe the CreateProcess() approach is > technically sound and the right path forward, but I'm open to discussion > and happy to address any concerns about the implementation. > > Regardless of which way we choose to go, I'll happily help review and > test the solution. > > Best regards, > BG I realized the patch had an issue after sending. Here is the updated patch. Again, this patch is just for an example of Sutou-san's suggestion.
Attachment
pgsql-hackers by date: