Re: pg_ctl start may return 0 even if the postmaster has been already started on Windows - Mailing list pgsql-hackers
| From | Bryan Green |
|---|---|
| Subject | Re: pg_ctl start may return 0 even if the postmaster has been already started on Windows |
| Date | |
| Msg-id | e4025275-0f97-4a3e-b107-a85e60ccf0f7@gmail.com Whole thread Raw |
| In response to | Re: pg_ctl start may return 0 even if the postmaster has been already started on Windows (vignesh C <vignesh21@gmail.com>) |
| Responses |
Re: pg_ctl start may return 0 even if the postmaster has been already started on Windows
|
| List | pgsql-hackers |
On 3/26/2025 3:18 AM, vignesh C wrote: > On Sat, 20 Jul 2024 at 00:03, Robert Haas <robertmhaas@gmail.com> wrote: >> >> On Tue, Jul 16, 2024 at 8:04 AM Yasir <yasir.hussain.shah@gmail.com> wrote: >>> Please ignore the above 4 lines in my review. See my comments in blue. >> >> OK, so I think it's unclear what the next steps are for this patch. >> >> 1. On June 3rd, Michael Paquier said that Tom Lane proposed that, >> after doing what the patch currently does, we could simplify some >> other stuff. The details are unclear, and Tom hasn't commented. >> >> 2. On June 29th, Noah Misch proposed a platform-independent way of >> solving the problem. >> >> 3. On July 12th, Sutou Kouhei proposed using CreateProcess() to start >> the postmaster instead of cmd.exe. >> >> 4. On July 16th, Yasir Shah said that he tested the patch and found >> that the problem only exists in v17, not any prior release, which is >> contrary to my understanding of the situation. He also proposed a >> minor tweak to the patch itself. >> >> So, as I see it, we have three possible ways forward here. First, we >> could stick with the current patch, possibly with further work as per >> [1] or adjustments as per [4]. Second, we could abandon the current >> approach and adopt Noah's proposal in [2]. Third, we could possibly >> abandon the current approach and adopt Sutou's proposal in [3]. I say >> "possibly" because I can't personally assess whether this approach is >> feasible. > > Thank you very much, Robert, for summarizing this. If anyone has > suggestions on which approach might work best, please share them to > help move this discussion forward. > > Regards, > Vignesh > > Hello everyone, I've spent the last few days implementing Sutou-san's CreateProcess() proposal as a proof-of-concept. With all three approaches now available for comparison, I wanted to share my assessment and recommendation. The core issue, as we've established, is that cmd.exe intermediation prevents pg_ctl from getting the real postgres.exe PID, creating race conditions where we can't distinguish a newly-started postmaster from a pre-existing one. This causes pg_ctl start to incorrectly report success when attempting to start an already-running cluster. Horiguchi-san's process tree walking patch works and is ready to ship. It has real merit as a minimal-change solution. However, it keeps cmd.exe and adds process enumeration complexity to work around that decision. We're fixing symptoms rather than the root cause. Noah's token proposal is architecturally elegant and platform-independent. My concern is that it solves a Windows-specific problem by adding complexity to all platforms. Making other platforms adopt token-passing infrastructure for Windows's problems feels wrong. It also requires postmaster changes for what is fundamentally a pg_ctl issue, raising compatibility questions. The CreateProcess() approach eliminates cmd.exe entirely and gives us the actual postgres.exe PID directly. I've implemented this in the attached patch (about 250 lines of Windows-specific code). The implementation uses CreateProcessAsUser() with a restricted security token to handle the case where pg_ctl runs with elevated privileges—this drops Administrator group membership and dangerous privileges before launching the postmaster, matching the behavior of the existing CreateRestrictedProcess() function. For I/O redirection, we use Windows's native handle-based APIs. When a log file is specified, we open it with CreateFile() and pass it through STARTUPINFO. When there's no log file (interactive use and postgres -C queries), we inherit standard handles. One detail worth noting: we wait 2 seconds for no-log-file launches to distinguish postgres -C (which exits immediately) from actual server startup. This is long enough to catch quick exits even on loaded CI systems without impacting test suite performance. The implementation passes all regression tests on Windows, including CI environments with elevated privileges. It handles all the problem scenarios correctly: normal startup, detecting already-running clusters, postgres -C queries, pg_upgrade with multiple instances, and concurrent start attempts. I prefer the CreateProcess() approach. It fixes the root cause, not the symptoms. We get the real PID immediately with no process tree walking or PID verification complexity. It's a Windows-specific solution to a Windows-specific problem. If the CreateProcess() approach is deemed unacceptable, I would support Horiguchi-san's process tree walking patch as a pragmatic fallback. It fixes the bug and is tested. However, I believe we'd be choosing a workaround over a proper fix and adding maintenance burden we don't need. I'm opposed to the token approach for the reasons stated above—it's architecturally appealing but touches all platforms for a Windows-only problem. With this implementation complete, we now have all three options on the table for evaluation. I believe the CreateProcess() approach is technically sound and the right path forward, but I'm open to discussion and happy to address any concerns about the implementation. Regardless of which way we choose to go, I'll happily help review and test the solution. Best regards, BG
Attachment
pgsql-hackers by date: