Re: System shutdown signal on Windows (was Re: [GENERAL]) - Mailing list pgsql-hackers
From | Krystian Bigaj |
---|---|
Subject | Re: System shutdown signal on Windows (was Re: [GENERAL]) |
Date | |
Msg-id | CAN=kAeGOEmfh_+vdg+2oD=9KhWzGn4NNqfZNdHoQ_2QOsHhuLQ@mail.gmail.com Whole thread Raw |
In response to | System shutdown signal on Windows (was Re: [GENERAL]) (Tom Lane <tgl@sss.pgh.pa.us>) |
List | pgsql-hackers |
On 23 July 2014 22:16, Tom Lane <tgl@sss.pgh.pa.us> wrote:
Krystian Bigaj <krystian.bigaj@gmail.com> writes:
> - when pg_console_handler receives CTRL_SHUTDOWN_EVENT from OS, then it
> calls pg_queue_signal(SIGINT).
> Problems:
> - when OS is in shutdown path, then it sends CTRL_SHUTDOWN_EVENT, and *all*
> Postgres processes (main and sub/forked) will call pg_queue_signal(SIGINT)
> - so main and sub processes will start to shutdown independently? Can this
> have any bad consequences?
Hm. We ought to have that sending SIGTERM instead, so as to mimic the
situation when Unix "init" is trying to shut down the system. It might be
that SIGINT will more or less work, but the postmaster logic is designed
to work with global SIGTERM as being the clean-up-ASAP trigger. As an
example, backends servicing remote applications (which will *not* have
already gotten killed) would not exit in response to SIGINT.
> I think that CTRL_SHUTDOWN_EVENT should be removed from pg_console_handler,
That does not sound like a good idea, at least not if Windows has the
same behavior as "init" does of proceeding to hard kills after some
grace period.
regards, tom lane
However form what I see is that when Postgress is running by pg_ctl from service, then it will receive SIGINT (independently and in general in unspecified order)
- *each* postgres.exe process will queue itself SIGINT (because of CTRL_SHUTDOWN_EVENT),
- pg_ctl will send SIGINT to main postmaster process (and possibly it will pass that command to sub-processes)
So there are two independent paths where SIGINT are sent, and pg_ctl doesn't have really a control when postgres.exe receives SIGINT.
This CTRL_SHUTDOWN_EVENT is not used when postgres.exe is run on *user session* - so removing it won't change anything.
I see only two cases where CTRL_SHUTDOWN_EVENT might be need (all of there where postgres.exe is run on service session):
- postgres.exe run by pg_ctl.exe, but pg_ctl service process was terminated/killed, and then system was shutdown
- someone starts postgres.exe from own service, but doesn't send SIGINT/SIGTERM command to postgres.exe on service system shutdown (but he must for service stop)
As I previously wrote, I have workaround for it, so if you think that this change would break compatibility and don't want to change it, then I'm really fine with it.
However I've probably found something with pg_ctl.c regarding shutdown and maybe that suspicious postgres.exe process termination on Windows.
1) shutdownEvent is signaled in pgwin32_ServiceMain by SERVICE_CONTROL_STOP/SERVICE_CONTROL_SHUTDOWN in pgwin32_ServiceHandler
There is dwWaitHint = 10000.
2)
...
/* Wait for quit... */
ret = WaitForMultipleObjects(2, shutdownHandles, FALSE, INFINITE);
pgwin32_SetServiceStatus(SERVICE_STOP_PENDING);
switch (ret)
{
case WAIT_OBJECT_0: /* shutdown event */
kill(postmasterPID, SIGINT);
/*
* Increment the checkpoint and try again Abort after 12
* checkpoints as the postmaster has probably hung
*/
while (WaitForSingleObject(postmasterProcess, 5000) == WAIT_TIMEOUT && status.dwCheckPoint < 12)
status.dwCheckPoint++; <---- missing call to pgwin32_SetServiceStatus(SERVICE_STOP_PENDING) or SetServiceStatus(hStatus, (LPSERVICE_STATUS) &status);
break;
...
There is incremented dwCheckPoint every 5000ms, but that status is not updated (missing pgwin32_SetServiceStatus/SetServiceStatus), so SCM after 10s (dwWaitHint = 10000) will not receive incremented dwCheckPoint, and it's allowed to kill that process (because this service didn't respond with dwWaitHint). It kills pg_ctl.exe, but when all services stops, then it simply terminates other remaining processes - in this case postgres.exe
dwWaitHint:
...
If the amount of time specified by dwWaitHint passes, and dwCheckPoint has not been incremented or dwCurrentState has not changed, the service control manager or service control program can assume that an error has occurred and the service should be stopped.
...
So if main postgres.exe (run by pg_ctl.exe service process) won't shutdown in 10s (for any reason) it might be terminated/killed by Windows/SCM.
Best regards,
Krystian Bigaj
Krystian Bigaj
pgsql-hackers by date: