Re: pg_ctl start may return 0 even if the postmaster has been already started on Windows - Mailing list pgsql-hackers

From Bryan Green
Subject Re: pg_ctl start may return 0 even if the postmaster has been already started on Windows
Date
Msg-id 07d00685-26b1-461d-99bf-7e973236e875@gmail.com
Whole thread Raw
In response to Re: pg_ctl start may return 0 even if the postmaster has been already started on Windows  (Bryan Green <dbryan.green@gmail.com>)
List pgsql-hackers
On 10/23/2025 11:02 PM, Bryan Green wrote:
> On 3/26/2025 3:18 AM, vignesh C wrote:
>> On Sat, 20 Jul 2024 at 00:03, Robert Haas <robertmhaas@gmail.com> wrote:
>>>
>>> On Tue, Jul 16, 2024 at 8:04 AM Yasir <yasir.hussain.shah@gmail.com> 
>>> wrote:
>>>> Please ignore the above 4 lines in my review. See my comments in blue.
>>>
>>> OK, so I think it's unclear what the next steps are for this patch.
>>>
>>> 1. On June 3rd, Michael Paquier said that Tom Lane proposed that,
>>> after doing what the patch currently does, we could simplify some
>>> other stuff. The details are unclear, and Tom hasn't commented.
>>>
>>> 2. On June 29th, Noah Misch proposed a platform-independent way of
>>> solving the problem.
>>>
>>> 3. On July 12th, Sutou Kouhei proposed using CreateProcess() to start
>>> the postmaster instead of cmd.exe.
>>>
>>> 4. On July 16th, Yasir Shah said that he tested the patch and found
>>> that the problem only exists in v17, not any prior release, which is
>>> contrary to my understanding of the situation. He also proposed a
>>> minor tweak to the patch itself.
>>>
>>> So, as I see it, we have three possible ways forward here. First, we
>>> could stick with the current patch, possibly with further work as per
>>> [1] or adjustments as per [4]. Second, we could abandon the current
>>> approach and adopt Noah's proposal in [2]. Third, we could possibly
>>> abandon the current approach and adopt Sutou's proposal in [3]. I say
>>> "possibly" because I can't personally assess whether this approach is
>>> feasible.
>>
>> Thank you very much, Robert, for summarizing this. If anyone has
>> suggestions on which approach might work best, please share them to
>> help move this discussion forward.
>>
>> Regards,
>> Vignesh
>>
>>
> Hello everyone,
> 
> I've spent the last few days implementing Sutou-san's CreateProcess() 
> proposal as a proof-of-concept. With all three approaches now available 
> for comparison, I wanted to share my assessment and recommendation.
> 
> The core issue, as we've established, is that cmd.exe intermediation 
> prevents pg_ctl from getting the real postgres.exe PID, creating race 
> conditions where we can't distinguish a newly-started postmaster from a 
> pre-existing one. This causes pg_ctl start to incorrectly report success 
> when attempting to start an already-running cluster.
> 
> Horiguchi-san's process tree walking patch works and is ready to ship. 
> It has real merit as a minimal-change solution. However, it keeps 
> cmd.exe and adds process enumeration complexity to work around that 
> decision. We're fixing symptoms rather than the root cause.
> 
> Noah's token proposal is architecturally elegant and platform- 
> independent. My concern is that it solves a Windows-specific problem by 
> adding complexity to all platforms. Making other platforms adopt token- 
> passing infrastructure for Windows's problems feels wrong. It also 
> requires postmaster changes for what is fundamentally a pg_ctl issue, 
> raising compatibility questions.
> 
> The CreateProcess() approach eliminates cmd.exe entirely and gives us 
> the actual postgres.exe PID directly. I've implemented this in the 
> attached patch (about 250 lines of Windows-specific code). The 
> implementation uses CreateProcessAsUser() with a restricted security 
> token to handle the case where pg_ctl runs with elevated privileges—this 
> drops Administrator group membership and dangerous privileges before 
> launching the postmaster, matching the behavior of the existing 
> CreateRestrictedProcess() function.
> 
> For I/O redirection, we use Windows's native handle-based APIs. When a 
> log file is specified, we open it with CreateFile() and pass it through 
> STARTUPINFO. When there's no log file (interactive use and postgres -C 
> queries), we inherit standard handles. One detail worth noting: we wait 
> 2 seconds for no-log-file launches to distinguish postgres -C (which 
> exits immediately) from actual server startup. This is long enough to 
> catch quick exits even on loaded CI systems without impacting test suite 
> performance.
> 
> The implementation passes all regression tests on Windows, including CI 
> environments with elevated privileges. It handles all the problem 
> scenarios correctly: normal startup, detecting already-running clusters, 
> postgres -C queries, pg_upgrade with multiple instances, and concurrent 
> start attempts.
> 
> I prefer the CreateProcess() approach. It fixes the root cause, not the 
> symptoms. We get the real PID immediately with no process tree walking 
> or PID verification complexity. It's a Windows-specific solution to a 
> Windows-specific problem.
> 
> If the CreateProcess() approach is deemed unacceptable, I would support 
> Horiguchi-san's process tree walking patch as a pragmatic fallback. It 
> fixes the bug and is tested. However, I believe we'd be choosing a 
> workaround over a proper fix and adding maintenance burden we don't need.
> 
> I'm opposed to the token approach for the reasons stated above—it's 
> architecturally appealing but touches all platforms for a Windows-only 
> problem.
> 
> With this implementation complete, we now have all three options on the 
> table for evaluation. I believe the CreateProcess() approach is 
> technically sound and the right path forward, but I'm open to discussion 
> and happy to address any concerns about the implementation.
> 
> Regardless of which way we choose to go, I'll happily help review and 
> test the solution.
> 
> Best regards,
> BG

I realized the patch had an issue after sending.  Here is the updated 
patch. Again, this patch is just for an example of Sutou-san's suggestion.
Attachment

pgsql-hackers by date:

Previous
From: Michael Paquier
Date:
Subject: Re: Avoid handle leak (src/bin/pg_ctl/pg_ctl.c)
Next
From: Michael Paquier
Date:
Subject: Re: Avoid resource leak (src/test/regress/pg_regress.c)