Re: pg_ctl start may return 0 even if the postmaster has been already started on Windows - Mailing list pgsql-hackers
| From | Kyotaro Horiguchi |
|---|---|
| Subject | Re: pg_ctl start may return 0 even if the postmaster has been already started on Windows |
| Date | |
| Msg-id | 20240111.173322.1809044112677090191.horikyota.ntt@gmail.com Whole thread Raw |
| In response to | Re: pg_ctl start may return 0 even if the postmaster has been already started on Windows (Michael Paquier <michael@paquier.xyz>) |
| Responses |
Re: pg_ctl start may return 0 even if the postmaster has been already started on Windows
|
| List | pgsql-hackers |
Thanks for restarting this thread.
At Tue, 9 Jan 2024 09:40:23 +0900, Michael Paquier <michael@paquier.xyz> wrote in
> On Fri, Jan 05, 2024 at 02:58:55PM -0500, Robert Haas wrote:
> > I'm not a Windows expert, but my guess is that 0001 is a very good
> > idea. I hope someone who is a Windows expert will comment on that.
>
> I am +1 on 0001. It is just something we've never anticipated when
> these wrappers around cmd in pg_ctl were written.
Thanks for committing it.
> > 0002 seems problematic to me. One potential issue is that it would
> > break if someone renamed postgres.exe to something else -- although
> > that's probably not really a serious problem.
>
> We do a find_other_exec_or_die() on "postgres" with what could be a
> custom execution path. So we're actually sure that the binary will be
> there in the start path, no? I don't like much the hardcoded
> dependency to .exe here.
The patch doesn't care of the path for postgres.exe. If you are referring to the code you cited below, it's for another
reason.I'll describe that there.
> > A bigger issue is that
> > it seems like it would break if someone used pg_ctl to start several
> > instances in different data directories on the same machine. If I'm
> > understanding correctly, that currently mostly works, and this would
> > break it.
>
> Not having the guarantee that a single shell_pid is associated to a
> single postgres.exe would be a problem. Now the patch includes this
> code:
> + if (ppe.th32ParentProcessID == shell_pid &&
> + strcmp("postgres.exe", ppe.szExeFile) == 0)
> + {
> + if (pm_pid != ppe.th32ProcessID && pm_pid != 0)
> + multiple_children = true;
> + pm_pid = ppe.th32ProcessID;
> + }
>
> Which is basically giving this guarantee? multiple_children should
> never happen once the autorun part is removed. Is that right?
The patch indeed ensures the relationship between the parent
pg_ctl.exe and postgres.exe. However, for some reason, in my Windows
11 environment with the /D option specified, I observed that another
cmd.exe is spawned as the second child process of the parent
cmd.exe. This is why there is a need to verify the executable file
name. I have no idea how the second cmd.exe is being spawned.
> + * The launcher shell might start other cmd.exe instances or programs
> + * besides postgres.exe. Veryfying the program file name is essential.
>
> With the autorun part of cmd.exe removed, what's still relevant here?
Yes, if the strcmp() is commented out, multiple_children sometimes
becomes true..
> s/Veryfying/Verifying/.
Oops!
> Perhaps 0002 should make more efforts in documenting things like
> th32ProcessID and th32ParentProcessID.
Is it correct to understand that you are requesting changes as follows?
--- a/src/bin/pg_ctl/pg_ctl.c
+++ b/src/bin/pg_ctl/pg_ctl.c
@@ -1995,11 +1995,14 @@ pgwin32_find_postmaster_pid(pid_t shell_pid)
*
* Check for duplicate processes to ensure reliability.
*
- * The launcher shell might start other cmd.exe instances or programs
- * besides postgres.exe. Verifying the program file name is essential.
- *
- * The launcher shell process isn't checked in this function. It will be
- * checked by the caller.
+ * The ppe entry to be examined is identified by th32ParentProcessID, which
+ * should correspond to the cmd.exe process that executes the postgres.exe
+ * binary. Additionally, th32ProcessID in the same entry should be the PID
+ * of the launched postgres.exe. However, even though we have launched the
+ * parent cmd.exe with the /D option specified, it is sometimes observed
+ * that another cmd.exe is launched for unknown reasons. Therefore, it is
+ * crucial to verify the program file name to avoid returning the wrong
+ * PID.
*/
regards.
--
Kyotaro Horiguchi
NTT Open Source Software Center
Attachment
pgsql-hackers by date: