Thread: [bug fix] Produce a crash dump before main() on Windows
Hello, postgres.exe on Windows doesn't output a crash dump when it crashes before main() is called. The attached patch fixes this. I'd like this to be back-patched. I'll add this to the next CF. The original problem happened on our customer's production system. Their application sometimes failed to connect to thedatabase. That was because postgres.exe crashed due to access violation (exception code C0000005). But there was nocrash dump, so we had difficulty in finding the cause. The frequency was low -- about ten times during half a year. What caused the access violation was Symantec's antivirus software. It seems that sysfer.dll of the software interceptsregistry access, during C runtime library initialization, before main() is called. So, the direct cause of thisproblem is not PostgreSQL. On the other hand, it's PostgreSQL's problem that we can't get the crash dump, which makes the investigation difficult. The cause is that postmaster calls SetErrorMode() to disable the outputing of crash dumps by WER (Windows Error Reporting). This error mode is inherited from postmaster to its children. If a crash happens before the child sets up theexception handler, no crash dump is produced. Regards Takayuki Tsunakawa
Attachment
On Fri, Feb 16, 2018 at 8:28 AM, Tsunakawa, Takayuki <tsunakawa.takay@jp.fujitsu.com> wrote:
Hello,
postgres.exe on Windows doesn't output a crash dump when it crashes before main() is called. The attached patch fixes this. I'd like this to be back-patched. I'll add this to the next CF.
The original problem happened on our customer's production system. Their application sometimes failed to connect to the database. That was because postgres.exe crashed due to access violation (exception code C0000005). But there was no crash dump, so we had difficulty in finding the cause. The frequency was low -- about ten times during half a year.
What caused the access violation was Symantec's antivirus software. It seems that sysfer.dll of the software intercepts registry access, during C runtime library initialization, before main() is called. So, the direct cause of this problem is not PostgreSQL.
On the other hand, it's PostgreSQL's problem that we can't get the crash dump, which makes the investigation difficult. The cause is that postmaster calls SetErrorMode() to disable the outputing of crash dumps by WER (Windows Error Reporting). This error mode is inherited from postmaster to its children. If a crash happens before the child sets up the exception handler, no crash dump is produced.
The original call to SetErrorMode() was put in there to make sure we didn't show a popup message which would then make everything freeze (see very old commit 27bff7502f04ee01237ed3f5a997748ae43d3a81). Doesn't this turn that back on, so that if you are not actually there to monitor something you can end up with stuck processes and exactly the issues we had before that one?
It might still be useful in debugging, but in that case we might need to make it a configurable switch?
On 20 February 2018 at 21:47, Magnus Hagander <magnus@hagander.net> wrote:
-- On Fri, Feb 16, 2018 at 8:28 AM, Tsunakawa, Takayuki <tsunakawa.takay@jp.fujitsu.com> wrote: Hello,
postgres.exe on Windows doesn't output a crash dump when it crashes before main() is called. The attached patch fixes this. I'd like this to be back-patched. I'll add this to the next CF.
The original problem happened on our customer's production system. Their application sometimes failed to connect to the database. That was because postgres.exe crashed due to access violation (exception code C0000005). But there was no crash dump, so we had difficulty in finding the cause. The frequency was low -- about ten times during half a year.
What caused the access violation was Symantec's antivirus software. It seems that sysfer.dll of the software intercepts registry access, during C runtime library initialization, before main() is called. So, the direct cause of this problem is not PostgreSQL.
On the other hand, it's PostgreSQL's problem that we can't get the crash dump, which makes the investigation difficult. The cause is that postmaster calls SetErrorMode() to disable the outputing of crash dumps by WER (Windows Error Reporting). This error mode is inherited from postmaster to its children. If a crash happens before the child sets up the exception handler, no crash dump is produced.The original call to SetErrorMode() was put in there to make sure we didn't show a popup message which would then make everything freeze (see very old commit 27bff7502f04ee01237ed3f5a997748ae43d3a81). Doesn't this turn that back on, so that if you are not actually there to monitor something you can end up with stuck processes and exactly the issues we had before that one?
Ha, I just went digging for the same.
We should not disable WER when running as a service (no UI access), it will not display an interactive dialog.
I'm not convinced we should disable it at all personally. Things have come a long way from drwatson.exe . Disabling WER makes it hard to debug postgres by installing Visual Studio Debugger as the hander (I always wondered why that didn't work!) and is generally just painful. It prevents us from collecting data via Microsoft about crashes, should we wish to do so. And who runs Pg on windows except as a service?!
So I'm all for just removing that.
On 20 February 2018 at 22:18, Craig Ringer <craig@2ndquadrant.com> wrote:
So I'm all for just removing that.
On Tue, Feb 20, 2018 at 3:18 PM, Craig Ringer <craig@2ndquadrant.com> wrote:
On 20 February 2018 at 21:47, Magnus Hagander <magnus@hagander.net> wrote:On Fri, Feb 16, 2018 at 8:28 AM, Tsunakawa, Takayuki <tsunakawa.takay@jp.fujitsu.com> wrote: Hello,
postgres.exe on Windows doesn't output a crash dump when it crashes before main() is called. The attached patch fixes this. I'd like this to be back-patched. I'll add this to the next CF.
The original problem happened on our customer's production system. Their application sometimes failed to connect to the database. That was because postgres.exe crashed due to access violation (exception code C0000005). But there was no crash dump, so we had difficulty in finding the cause. The frequency was low -- about ten times during half a year.
What caused the access violation was Symantec's antivirus software. It seems that sysfer.dll of the software intercepts registry access, during C runtime library initialization, before main() is called. So, the direct cause of this problem is not PostgreSQL.
On the other hand, it's PostgreSQL's problem that we can't get the crash dump, which makes the investigation difficult. The cause is that postmaster calls SetErrorMode() to disable the outputing of crash dumps by WER (Windows Error Reporting). This error mode is inherited from postmaster to its children. If a crash happens before the child sets up the exception handler, no crash dump is produced.The original call to SetErrorMode() was put in there to make sure we didn't show a popup message which would then make everything freeze (see very old commit 27bff7502f04ee01237ed3f5a997748ae43d3a81). Doesn't this turn that back on, so that if you are not actually there to monitor something you can end up with stuck processes and exactly the issues we had before that one? Ha, I just went digging for the same.We should not disable WER when running as a service (no UI access), it will not display an interactive dialog.I'm not convinced we should disable it at all personally. Things have come a long way from drwatson.exe . Disabling WER makes it hard to debug postgres by installing Visual Studio Debugger as the hander (I always wondered why that didn't work!) and is generally just painful. It prevents us from collecting data via Microsoft about crashes, should we wish to do so. And who runs Pg on windows except as a service?!
I've seen a number of usecases where apps start it alongside the app instead of as a service. I'm not sure how recent those apps are though, and I'm not sure it's better than using a service in the first place (but it does let you install things without being an admin).
We really shouldn't *break* that scenario for people. But making it work well for the service usecase should definitely be the priority.
On 20 February 2018 at 23:43, Magnus Hagander <magnus@hagander.net> wrote:
I've seen a number of usecases where apps start it alongside the app instead of as a service. I'm not sure how recent those apps are though, and I'm not sure it's better than using a service in the first place (but it does let you install things without being an admin).We really shouldn't *break* that scenario for people. But making it work well for the service usecase should definitely be the priority.
Good point and agreed.
The patch proposed here means that early crashes will invoke WER. If we're going to allow WER we should probably just do so unconditionally.
I suggest changing this to a command line flag or environment variable test that suppresses Pg's default disabling of WER. A GUC probably doesn't make sense; it's too niche, and too early.
I'd be in favour of leaving WER on when we find out we're in a noninteractive service too, but that'd be a separate patch for pg11+ only.
From: Craig Ringer [mailto:craig@2ndquadrant.com] > The patch proposed here means that early crashes will invoke WER. If we're > going to allow WER we should probably just do so unconditionally. > > I'd be in favour of leaving WER on when we find out we're in a noninteractive > service too, but that'd be a separate patch for pg11+ only. As for PG11+, I agree that we want to always leave WER on. That is, call SetErrorMode(SEM_FAILCRITICALERRORS) but not specifySEM_NOGPFAULTERRORBOX. The problem with the current specification of PostgreSQL is that the user can only get crashdumps in a fixed folder $PGDATA\crashdumps. That location is bad because the crash dumps will be backed up togetherwith the database cluster without the user noticing it. What's worse, the crash dumps are large. With WER, theuser can control the location and size of crash dumps. > I suggest changing this to a command line flag or environment variable test > that suppresses Pg's default disabling of WER. A GUC probably doesn't make > sense; it's too niche, and too early. As for the past major releases, it's burdensome for the user to have to specify a new flag or an environment variable whenhe has to get crash dumps to investigate a rare crash before main(). It's not necessary to disable WER for crashes aftermain() is called, because PG installs an exception handler at the beginning of main(), which works fine. So can wego with the current patch? Another idea to add to the current patch is to move the call to SetErrorMode() to the below function, which is called firstin main(). How about this? void pgwin32_install_crashdump_handler(void) { SetUnhandledExceptionFilter(crashDumpHandler); } Regards Takayuki Tsunakawa
From: Tsunakawa, Takayuki [mailto:tsunakawa.takay@jp.fujitsu.com] > Another idea to add to the current patch is to move the call to SetErrorMode() > to the below function, which is called first in main(). How about this? > > void > pgwin32_install_crashdump_handler(void) > { > SetUnhandledExceptionFilter(crashDumpHandler); > } I moved SetErrorMode() to the beginning of main(). It should be placed before any code which could crash. The current locationis a bit late: in fact, write_stderr() crashed when WSAStartup() failed. Regards Takayuki Tsunakawa
Attachment
On Thu, Mar 1, 2018 at 4:14 PM Tsunakawa, Takayuki <tsunakawa.takay@jp.fujitsu.com> wrote:
From: Tsunakawa, Takayuki [mailto:tsunakawa.takay@jp.fujitsu.com]
> Another idea to add to the current patch is to move the call to SetErrorMode()
> to the below function, which is called first in main(). How about this?
>
> void
> pgwin32_install_crashdump_handler(void)
> {
> SetUnhandledExceptionFilter(crashDumpHandler);
> }
I moved SetErrorMode() to the beginning of main(). It should be placed before any code which could crash. The current location is a bit late: in fact, write_stderr() crashed when WSAStartup() failed.
I reviewed patch and it works as per the subject, but I am not able to verify the actual
bug that is reported in the upthread. The moving of setErrorMode() call to the start
of the main function can handle all the cases that can lead to a crash with no popup.
The reset of setErrorMode(0) before start of any process can generate the coredump
because of any issue before it reaches the main() function, but this change can lead
to stop the postmaster restart process, if no one present to observe the scenario?
Doesn't this change can generate backward compatibility problems to particular users?
I don't have any other comments with the current patch.
Regards,
Haribabu Kommi
Fujitsu Australia
On 26 February 2018 at 12:06, Tsunakawa, Takayuki <tsunakawa.takay@jp.fujitsu.com> wrote:
-- From: Craig Ringer [mailto:craig@2ndquadrant.com]
> The patch proposed here means that early crashes will invoke WER. If we're
> going to allow WER we should probably just do so unconditionally.
>
> I'd be in favour of leaving WER on when we find out we're in a noninteractive
> service too, but that'd be a separate patch for pg11+ only.
As for PG11+, I agree that we want to always leave WER on. That is, call SetErrorMode(SEM_FAILCRITICALERRORS) but not specify SEM_NOGPFAULTERRORBOX. The problem with the current specification of PostgreSQL is that the user can only get crash dumps in a fixed folder $PGDATA\crashdumps. That location is bad because the crash dumps will be backed up together with the database cluster without the user noticing it. What's worse, the crash dumps are large. With WER, the user can control the location and size of crash dumps.
Yeah, that's quite old and dates back to when Windows didn't offer much if any control over WER in services.
At Wed, 18 Jul 2018 11:12:06 +0800, Craig Ringer <craig@2ndquadrant.com> wrote in <CAMsr+YHv0KfWhA+Z=UVydpvLQ-QyLaidBqpHxQ=YqTPiDGG6dg@mail.gmail.com> > On 26 February 2018 at 12:06, Tsunakawa, Takayuki < > tsunakawa.takay@jp.fujitsu.com> wrote: > > > From: Craig Ringer [mailto:craig@2ndquadrant.com] > > > The patch proposed here means that early crashes will invoke WER. If > > we're > > > going to allow WER we should probably just do so unconditionally. > > > > > > I'd be in favour of leaving WER on when we find out we're in a > > noninteractive > > > service too, but that'd be a separate patch for pg11+ only. > > > > As for PG11+, I agree that we want to always leave WER on. That is, call > > SetErrorMode(SEM_FAILCRITICALERRORS) but not specify > > SEM_NOGPFAULTERRORBOX. The problem with the current specification of > > PostgreSQL is that the user can only get crash dumps in a fixed folder > > $PGDATA\crashdumps. That location is bad because the crash dumps will be > > backed up together with the database cluster without the user noticing it. > > What's worse, the crash dumps are large. With WER, the user can control > > the location and size of crash dumps. > > > > Yeah, that's quite old and dates back to when Windows didn't offer much if > any control over WER in services. Yeah. If we want to take a crash dump, we cannot have auto-restart. Since it is inevitable what we can do for this would be adding a new knob for that, which cannot be turned on together with restart_after_crash...? regards. -- Kyotaro Horiguchi NTT Open Source Software Center
From: Haribabu Kommi [mailto:kommi.haribabu@gmail.com] > I reviewed patch and it works as per the subject, but I am not able to verify > the actual > bug that is reported in the upthread. The moving of setErrorMode() call > to the start > of the main function can handle all the cases that can lead to a crash with > no popup. Thank you for reviewing the patch. Yeah, I don't know reproduce the problem too, because some bug of Symantec AntiVirusrandomly caused the crash before main()... > The reset of setErrorMode(0) before start of any process can generate the > coredump > because of any issue before it reaches the main() function, but this change > can lead > to stop the postmaster restart process, if no one present to observe the > scenario? > Doesn't this change can generate backward compatibility problems to > particular users? Sorry, I couldn't get it. Could you elaborate on it? I couldn't understand how this change has an adverse effect on thepostmaster restart process (restart_after_crash = on) and backward compatibility. Regards Takayuki Tsunakawa
On Wed, Jul 18, 2018 at 06:09:57AM +0000, Tsunakawa, Takayuki wrote: > From: Haribabu Kommi [mailto:kommi.haribabu@gmail.com] >> I reviewed patch and it works as per the subject, but I am not able to verify >> the actual >> bug that is reported in the upthread. The moving of setErrorMode() call >> to the start >> of the main function can handle all the cases that can lead to a crash with >> no popup. > > Thank you for reviewing the patch. Yeah, I don't know reproduce the > problem too, because some bug of Symantec AntiVirus randomly caused > the crash before main()... A suggestion that I have here would be to patch manually the postmaster with any kind of code which would make it to crash before main() is called. Anything is fine as long as a core dump is produced, right? -- Michael
Attachment
On 18 July 2018 at 14:34, Michael Paquier <michael@paquier.xyz> wrote:
On Wed, Jul 18, 2018 at 06:09:57AM +0000, Tsunakawa, Takayuki wrote:
> From: Haribabu Kommi [mailto:kommi.haribabu@gmail.com]
>> I reviewed patch and it works as per the subject, but I am not able to verify
>> the actual
>> bug that is reported in the upthread. The moving of setErrorMode() call
>> to the start
>> of the main function can handle all the cases that can lead to a crash with
>> no popup.
>
> Thank you for reviewing the patch. Yeah, I don't know reproduce the
> problem too, because some bug of Symantec AntiVirus randomly caused
> the crash before main()...
A suggestion that I have here would be to patch manually the postmaster
with any kind of code which would make it to crash before main() is
called. Anything is fine as long as a core dump is produced, right?
When I was doing the original crashdump support I wrote a crashme extension that deliberately dereferenced a null pointer.
You'd do something similar in the postmaster for the testing.
On Wed, Jul 18, 2018 at 4:10 PM Tsunakawa, Takayuki <tsunakawa.takay@jp.fujitsu.com> wrote:
From: Haribabu Kommi [mailto:kommi.haribabu@gmail.com]
> I reviewed patch and it works as per the subject, but I am not able to verify
> the actual
> bug that is reported in the upthread. The moving of setErrorMode() call
> to the start
> of the main function can handle all the cases that can lead to a crash with
> no popup.
Thank you for reviewing the patch. Yeah, I don't know reproduce the problem too, because some bug of Symantec AntiVirus randomly caused the crash before main()...
> The reset of setErrorMode(0) before start of any process can generate the
> coredump
> because of any issue before it reaches the main() function, but this change
> can lead
> to stop the postmaster restart process, if no one present to observe the
> scenario?
> Doesn't this change can generate backward compatibility problems to
> particular users?
Sorry, I couldn't get it. Could you elaborate on it? I couldn't understand how this change has an adverse effect on the postmaster restart process (restart_after_crash = on) and backward compatibility.
Sorry about not providing more details. It is just my doubt, because
I am not able to reproduce the problem to test it after the fix. May be
I can give a try by modifying the source code to get the crash.
My point is, With this patch, in case if the postgres crashses
before reaching main(), does it generate a popup?
If the answer to the above question is yes, then if the popup is
not attended by the DBA, I feel the postmaster is not able to
restart all the processes because it is waiting for the crash process
to close. Am I Correct?
Regards,
Haribabu Kommi
Fujitsu Australia
From: Haribabu Kommi [mailto:kommi.haribabu@gmail.com] > May be I can give a try by modifying the source code to get the crash. Thank you, that would be great if you could come up with a good way! > My point is, With this patch, in case if the postgres crashses > before reaching main(), does it generate a popup? > > If the answer to the above question is yes, then if the popup is > not attended by the DBA, I feel the postmaster is not able to > restart all the processes because it is waiting for the crash process > to close. Am I Correct? IIRC, the pop up doesn't appear under Windows service. If you start the database server with pg_ctl start on the commandprompt, the pop up will appear, which I think is not bad. Regards Takayuki Tsunakawa
On Wed, Jul 18, 2018 at 6:39 PM Tsunakawa, Takayuki <tsunakawa.takay@jp.fujitsu.com> wrote:
From: Haribabu Kommi [mailto:kommi.haribabu@gmail.com]
> May be I can give a try by modifying the source code to get the crash.
Thank you, that would be great if you could come up with a good way!
+ if (argc > 1 && strncmp(argv[1], "--fork", 6) == 0)
+ {
+ char *ptr = NULL;
+
+ printf ("%s",*ptr);
+ }
+
I added the above code as first execution code in main() function and I am able to
reproduce the WER popup.
> My point is, With this patch, in case if the postgres crashses
> before reaching main(), does it generate a popup?
>
> If the answer to the above question is yes, then if the popup is
> not attended by the DBA, I feel the postmaster is not able to
> restart all the processes because it is waiting for the crash process
> to close. Am I Correct?
IIRC, the pop up doesn't appear under Windows service. If you start the database server with pg_ctl start on the command prompt, the pop up will appear, which I think is not bad.
I also think that scenarios of starting the server from command prompt may be less, so this patch can be useful
to find out the issue by the starting the server from command prompt.
I marked the patch as "ready for committer".
Regards,
Haribabu Kommi
Fujitsu Australia
On Wed, Jul 18, 2018 at 4:38 AM, Tsunakawa, Takayuki <tsunakawa.takay@jp.fujitsu.com> wrote: > IIRC, the pop up doesn't appear under Windows service. If you start the database server with pg_ctl start on the commandprompt, the pop up will appear, which I think is not bad. Hmm. I think that might be bad. What makes you think it isn't? -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
From: Robert Haas [mailto:robertmhaas@gmail.com] > On Wed, Jul 18, 2018 at 4:38 AM, Tsunakawa, Takayuki > <tsunakawa.takay@jp.fujitsu.com> wrote: > > IIRC, the pop up doesn't appear under Windows service. If you start the > database server with pg_ctl start on the command prompt, the pop up will > appear, which I think is not bad. > > Hmm. I think that might be bad. What makes you think it isn't? First, as Hari-san said, starting the server with "pg_ctl start" on the command prompt does not feel like production usebut like test environment, and that usage seems mostly interactive. Second, the current PostgreSQL is not exempt fromthe same problem: if postmaster or standalone backend crashes before main(), the pop up would appear. Regards Takayuki Tsunakawa
On Mon, Feb 26, 2018 at 04:06:48AM +0000, Tsunakawa, Takayuki wrote: > As for PG11+, I agree that we want to always leave WER on. That is, > call SetErrorMode(SEM_FAILCRITICALERRORS) but not specify > SEM_NOGPFAULTERRORBOX. The problem with the current specification of > PostgreSQL is that the user can only get crash dumps in a fixed folder > $PGDATA\crashdumps. That location is bad because the crash dumps will > be backed up together with the database cluster without the user > noticing it. What's worse, the crash dumps are large. With WER, the > user can control the location and size of crash dumps. I have not looked by myself at the problems around WER and such so I don't have a clear opinion, but including crash dumps within backups is *bad*. We have a set of filtering rules in basebackup.c and pg_rewind since v11, we could make use of it and remove all contents from crashdumps/ from what gets backed up or rewound. excludeDirContents creates empty folders when those are listed, so we'd want to not create an empty folder in Windows backups, which makes a bit more more difficult. -- Michael
Attachment
On Mon, Jul 23, 2018 at 01:31:57AM +0000, Tsunakawa, Takayuki wrote: > First, as Hari-san said, starting the server with "pg_ctl start" on > the command prompt does not feel like production use but like test > environment, and that usage seems mostly interactive. Second, the > current PostgreSQL is not exempt from the same problem: if postmaster > or standalone backend crashes before main(), the pop up would appear. When you use pg_ctl start within the command prompt, then closing the prompt session also causes Postgres to stop immediately. Would it be a problem? Another question I have is that I have seen Postgres fail to start because of missing dependent libraries, which happened after the postmaster startup as this was reported in the logs, could it be a problem with this patch? If the system waits for an operator to close this pop-up window, then it becomes harder to check automatically such failures and relies on the test application to timeout, if it is wise enough to do, which is not always the case.. -- Michael
Attachment
From: Michael Paquier [mailto:michael@paquier.xyz] > When you use pg_ctl start within the command prompt, then closing the prompt > session also causes Postgres to stop immediately. Would it be a problem? No. But that makes me think more of the startup on command prompt as non-production usage. > Another question I have is that I have seen Postgres fail to start because > of missing dependent libraries, which happened after the postmaster startup > as this was reported in the logs, could it be a problem with this patch? > If the system waits for an operator to close this pop-up window, then it > becomes harder to check automatically such failures and relies on the test > application to timeout, if it is wise enough to do, which is not always > the case.. I guess that is due to some missing files related to the libraries listed in shared_preload_library. If so, no, becausethis patch relates to failure before main(). Regards Takayuki Tsunakawa
On Mon, Jul 23, 2018 at 08:16:52AM +0000, Tsunakawa, Takayuki wrote: > I guess that is due to some missing files related to the libraries > listed in shared_preload_library. If so, no, because this patch > relates to failure before main(). No, I really mean a library dependency failure. For example, imagine that Postgres is compiled on Windows dynamically, and that it depends on libxml2.dll, which is itself compiled dynamically. Then imagine, in a custom build echosystem, that a folk comes in and adds lz support to libxml2 on Windows. If Postgres still consumes libxml2 but does not add in its PATH a version of lz, then a backend in need of libxml2 would fail to load, causing Postgres to not start properly. True, painful, story. What is ticking me is if the popup window stuff discussed on this thread could be a problem in the detection of such dependency errors, as with the product I am thinking about, Postgres is not running as a service, but kicked by another thing which is a service, and monitors the postmaster. -- Michael
Attachment
On Mon, Jul 23, 2018 at 12:56 PM, Michael Paquier <michael@paquier.xyz> wrote: > On Mon, Jul 23, 2018 at 01:31:57AM +0000, Tsunakawa, Takayuki wrote: >> First, as Hari-san said, starting the server with "pg_ctl start" on >> the command prompt does not feel like production use but like test >> environment, and that usage seems mostly interactive. Second, the >> current PostgreSQL is not exempt from the same problem: if postmaster >> or standalone backend crashes before main(), the pop up would appear. > > When you use pg_ctl start within the command prompt, then closing the > prompt session also causes Postgres to stop immediately. Would it be a > problem? > > Another question I have is that I have seen Postgres fail to start > because of missing dependent libraries, which happened after the > postmaster startup as this was reported in the logs, could it be a > problem with this patch? > It doesn't appear so, because we have SetErrorMode(SEM_FAILCRITICALERRORS | SEM_NOOPENFILEERRORBOX); in dlopen. -- With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com
From: Michael Paquier [mailto:michael@paquier.xyz] > No, I really mean a library dependency failure. For example, imagine that > Postgres is compiled on Windows dynamically, and that it depends on > libxml2.dll, which is itself compiled dynamically. Then imagine, in a > custom build echosystem, that a folk comes in and adds lz support to > libxml2 on Windows. If Postgres still consumes libxml2 but does not add > in its PATH a version of lz, then a backend in need of libxml2 would fail > to load, causing Postgres to not start properly. True, painful, story. I see, that's surely painful. But the DLL in use cannot be overwritten on Windows. So, I assume the following: 1. postmaster loads libxml2.dll without LZ in folder A. 2. Someone adds libxml2.dll with LZ in folder B. folder B is ahead of folder A in postgres's PATH. 3. Some user tries to connect to a database, creating a new child postgres process, which fails to load libxml2.dll in folderB. > What is ticking me is if the popup window stuff discussed on this thread > could be a problem in the detection of such dependency errors, as with the > product I am thinking about, Postgres is not running as a service, but kicked > by another thing which is a service, and monitors the postmaster. I understood you are talking about a case where some (server) application uses PostgreSQL internally. That application runsas a Windows service, but PostgreSQL itself doesn't on its own. The application starts PostgreSQL by running pg_ctlstart. In that case, postgres runs under service. I confirmed it with the following test program. This code is extracted frompgwin32_is_service() in PostgreSQL. -------------------------------------------------- #include <windows.h> #include <stdio.h> int main(void) { BOOL IsMember; PSID ServiceSid; PSID LocalSystemSid; SID_IDENTIFIER_AUTHORITY NtAuthority = {SECURITY_NT_AUTHORITY}; FILE *fp; SetErrorMode(0); /* Check for service group membership */ if (!AllocateAndInitializeSid(&NtAuthority, 1, SECURITY_SERVICE_RID, 0, 0, 0, 0, 0, 0, 0, &ServiceSid)) { fprintf(stderr, "could not get SID for service group: error code %lu\n", GetLastError()); return 1; } if (!CheckTokenMembership(NULL, ServiceSid, &IsMember)) { fprintf(stderr, "could not check access token membership: error code %lu\n", GetLastError()); FreeSid(ServiceSid); return 1; } FreeSid(ServiceSid); fp = fopen("d:\\a.txt", "a"); if (IsMember) fprintf(fp, "is under service\n"); else fprintf(fp, "is not under service\n"); return 0; } -------------------------------------------------- You can build the above program with: cl chksvc.c advapi32.lib Regards Takayuki Tsunakawa
On Thu, Jul 26, 2018 at 3:52 PM Tsunakawa, Takayuki <tsunakawa.takay@jp.fujitsu.com> wrote:
From: Michael Paquier [mailto:michael@paquier.xyz]
> No, I really mean a library dependency failure. For example, imagine that
> Postgres is compiled on Windows dynamically, and that it depends on
> libxml2.dll, which is itself compiled dynamically. Then imagine, in a
> custom build echosystem, that a folk comes in and adds lz support to
> libxml2 on Windows. If Postgres still consumes libxml2 but does not add
> in its PATH a version of lz, then a backend in need of libxml2 would fail
> to load, causing Postgres to not start properly. True, painful, story.
I see, that's surely painful. But the DLL in use cannot be overwritten on Windows. So, I assume the following:
1. postmaster loads libxml2.dll without LZ in folder A.
2. Someone adds libxml2.dll with LZ in folder B. folder B is ahead of folder A in postgres's PATH.
3. Some user tries to connect to a database, creating a new child postgres process, which fails to load libxml2.dll in folder B.
I doubt that the above scenario is also possible in windows, Once the process has started, it may not receive the new
environmental variable changes that are done. I am not sure though.
> What is ticking me is if the popup window stuff discussed on this thread
> could be a problem in the detection of such dependency errors, as with the
> product I am thinking about, Postgres is not running as a service, but kicked
> by another thing which is a service, and monitors the postmaster.
I understood you are talking about a case where some (server) application uses PostgreSQL internally. That application runs as a Windows service, but PostgreSQL itself doesn't on its own. The application starts PostgreSQL by running pg_ctl start.
In that case, postgres runs under service. I confirmed it with the following test program. This code is extracted from pgwin32_is_service() in PostgreSQL.
--------------------------------------------------
#include <windows.h>
#include <stdio.h>
int
main(void)
{
BOOL IsMember;
PSID ServiceSid;
PSID LocalSystemSid;
SID_IDENTIFIER_AUTHORITY NtAuthority = {SECURITY_NT_AUTHORITY};
FILE *fp;
SetErrorMode(0);
/* Check for service group membership */
if (!AllocateAndInitializeSid(&NtAuthority, 1,
SECURITY_SERVICE_RID, 0, 0, 0, 0, 0, 0, 0,
&ServiceSid))
{
fprintf(stderr, "could not get SID for service group: error code %lu\n",
GetLastError());
return 1;
}
if (!CheckTokenMembership(NULL, ServiceSid, &IsMember))
{
fprintf(stderr, "could not check access token membership: error code %lu\n",
GetLastError());
FreeSid(ServiceSid);
return 1;
}
FreeSid(ServiceSid);
fp = fopen("d:\\a.txt", "a");
if (IsMember)
fprintf(fp, "is under service\n");
else
fprintf(fp, "is not under service\n");
return 0;
}
--------------------------------------------------
You can build the above program with:
cl chksvc.c advapi32.lib
Thanks for confirmation of that PostgreSQL runs as service.
Based on the following details, we can decide whether this fix is required or not.
1. Starting of Postgres server using pg_ctl without service is of production use or not?
2. Without this fix, how difficult is the problem to find out that something is preventing the
server to start? In case if it is easy to find out, may be better to provide some troubleshoot
guide for windows users can help.
I am in favor of doc fix if it easy to find the problem instead of assuming the user usage.
Regards,
Haribabu Kommi
Fujitsu Australia
Hello. At Tue, 6 Nov 2018 15:53:37 +1100, Haribabu Kommi <kommi.haribabu@gmail.com> wrote in <CAJrrPGcxZi4Z_SttnuUvYOaw+SAdk7+cJgYpuf7ao43vuJLH2w@mail.gmail.com> > Thanks for confirmation of that PostgreSQL runs as service. > > Based on the following details, we can decide whether this fix is required > or not. > 1. Starting of Postgres server using pg_ctl without service is of > production use or not? > 2. Without this fix, how difficult is the problem to find out that > something is preventing the > server to start? In case if it is easy to find out, may be better to > provide some troubleshoot > guide for windows users can help. > > I am in favor of doc fix if it easy to find the problem instead of assuming > the user usage. I don't have an idea about which is better behavior, but does this work for you? https://docs.microsoft.com/en-us/windows/desktop/wer/collecting-user-mode-dumps > These dumps are configured and controlled independently of the > rest of the WER infrastructure. You can make use of the local > dump collection even if WER is disabled or if the user cancels > WER reporting. The local dump can be different than the dump sent > to Microsoft. I found that symantec offers this in the steps for error reporting of its products. regards. -- Kyotaro Horiguchi NTT Open Source Software Center
On Mon, Mar 4, 2019 at 3:23 PM Kyotaro HORIGUCHI <horiguchi.kyotaro@lab.ntt.co.jp> wrote:
Hello.
At Tue, 6 Nov 2018 15:53:37 +1100, Haribabu Kommi <kommi.haribabu@gmail.com> wrote in <CAJrrPGcxZi4Z_SttnuUvYOaw+SAdk7+cJgYpuf7ao43vuJLH2w@mail.gmail.com>
> Thanks for confirmation of that PostgreSQL runs as service.
>
> Based on the following details, we can decide whether this fix is required
> or not.
> 1. Starting of Postgres server using pg_ctl without service is of
> production use or not?
> 2. Without this fix, how difficult is the problem to find out that
> something is preventing the
> server to start? In case if it is easy to find out, may be better to
> provide some troubleshoot
> guide for windows users can help.
>
> I am in favor of doc fix if it easy to find the problem instead of assuming
> the user usage.
I don't have an idea about which is better behavior, but does
this work for you?
https://docs.microsoft.com/en-us/windows/desktop/wer/collecting-user-mode-dumps
No, this option is not generating local dumps for postgresql, but this option is useful to
generate dumps for the applications that don't have a specific WER reporting.
> These dumps are configured and controlled independently of the
> rest of the WER infrastructure. You can make use of the local
> dump collection even if WER is disabled or if the user cancels
> WER reporting. The local dump can be different than the dump sent
> to Microsoft.
I found that symantec offers this in the steps for error
reporting of its products.
PostgreSQL server doesn't start and there is no core file available.
Regards,
Haribabu Kommi
Fujitsu Australia
Hello. At Tue, 5 Mar 2019 16:45:53 +1100, Haribabu Kommi <kommi.haribabu@gmail.com> wrote in <CAJrrPGdO0pXW55EDW6NAz=-F=bNtEdJpjQJDKert0A3-zUmdew@mail.gmail.com> > > I don't have an idea about which is better behavior, but does > > this work for you? > > > > > > https://docs.microsoft.com/en-us/windows/desktop/wer/collecting-user-mode-dumps > > > > No, this option is not generating local dumps for postgresql, but this > option is useful to > generate dumps for the applications that don't have a specific WER > reporting. Mmm. Right. It just turn on WER for specific progarms. It donesn't override SetErrorMode().. Sorry for the noise. > Adding some doc changes for the users to refer what can be the issue > windows if the > PostgreSQL server doesn't start and there is no core file available. regards. -- Kyotaro Horiguchi NTT Open Source Software Center
On Tue, Nov 6, 2018 at 10:24 AM Haribabu Kommi <kommi.haribabu@gmail.com> wrote: > > On Thu, Jul 26, 2018 at 3:52 PM Tsunakawa, Takayuki <tsunakawa.takay@jp.fujitsu.com> wrote: > > > Thanks for confirmation of that PostgreSQL runs as service. > > Based on the following details, we can decide whether this fix is required or not. > 1. Starting of Postgres server using pg_ctl without service is of production use or not? > 2. Without this fix, how difficult is the problem to find out that something is preventing the > server to start? In case if it is easy to find out, may be better to provide some troubleshoot > guide for windows users can help. > > I am in favor of doc fix if it easy to find the problem instead of assuming the user usage. > Tsunakawa/Haribabu - By reading this thread briefly, it seems we need some more inputs from other developers on whether to fix this or not, so ideally the status of this patch should be 'Needs Review'. Why it is in 'Waiting on Author' state? If something is required from Author, then do we expect to see the updated patch in the next few days? -- With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com
From: Amit Kapila [mailto:amit.kapila16@gmail.com] > Tsunakawa/Haribabu - By reading this thread briefly, it seems we need > some more inputs from other developers on whether to fix this or not, > so ideally the status of this patch should be 'Needs Review'. Why it > is in 'Waiting on Author' state? If something is required from > Author, then do we expect to see the updated patch in the next few > days? Thank you for paying attention to this. I think the patch is good, but someone else may have a different solution. So Imarked it as needs review. Regards Takayuki Tsunakawa
At Mon, 1 Jul 2019 00:04:47 +0000, "Tsunakawa, Takayuki" <tsunakawa.takay@jp.fujitsu.com> wrote in <0A3221C70F24FB45833433255569204D1FC6515F@G01JPEXMBYT05> > From: Amit Kapila [mailto:amit.kapila16@gmail.com] > > Tsunakawa/Haribabu - By reading this thread briefly, it seems we need > > some more inputs from other developers on whether to fix this or not, > > so ideally the status of this patch should be 'Needs Review'. Why it > > is in 'Waiting on Author' state? If something is required from > > Author, then do we expect to see the updated patch in the next few > > days? > > Thank you for paying attention to this. I think the patch is good, but someone else may have a different solution. SoI marked it as needs review. We are obliged to assume that we won't have the desired behavior without detecting whether running as a service or not. My investigation convinced me that there is no way for a process to detect wheter it is running as a service (except the process directly called from system (aka entry function)). In other words, only pg_ctl knows that and other processes doesn't have a clue for that. The processes other than postmaster can receive that information via backend variables. But the postmaster has no way to get the information from pg_ctl other than command line parameter, environment variable or filesystem (or PIPE?). If we see the complexity meets the benefit, we can use, say, command line parameter, WER dialog can be shown when server is started in console but the parameter being specified, but I don't think it is a problem. Thoughts? regards. -- Kyotaro Horiguchi NTT Open Source Software Center
Kyotaro Horiguchi <horikyota.ntt@gmail.com> writes: > We are obliged to assume that we won't have the desired behavior > without detecting whether running as a service or not. > My investigation convinced me that there is no way for a process > to detect wheter it is running as a service (except the process > directly called from system (aka entry function)). In other > words, only pg_ctl knows that and other processes doesn't have a > clue for that. The processes other than postmaster can receive > that information via backend variables. But the postmaster has no > way to get the information from pg_ctl other than command line > parameter, environment variable or filesystem (or PIPE?). > If we see the complexity meets the benefit, we can use, say, > command line parameter, WER dialog can be shown when server is > started in console but the parameter being specified, but I don't > think it is a problem. Not being a Windows user, I don't have much to say about the big question of whether disabling WER is still a good idea or not. But I will say that in my experience, behavioral differences between Postgres started manually and Postgres started as a daemon are bad. So I think going out of our way to make the cases behave differently on Windows is probably not a good plan. regards, tom lane
On 2019-Jul-23, Tom Lane wrote: > Kyotaro Horiguchi <horikyota.ntt@gmail.com> writes: > > My investigation convinced me that there is no way for a process > > to detect wheter it is running as a service (except the process > > directly called from system (aka entry function)). Maybe we can try calling pgwin32_is_service()? > But I will say that in my experience, behavioral differences between > Postgres started manually and Postgres started as a daemon are bad. > So I think going out of our way to make the cases behave differently > on Windows is probably not a good plan. We already have such a difference in Windows -- elog.c uses it extensively to use the eventlog to print if a service, console print if not. Given this thread's OP, ISTM failing to do likewise for this case makes debugging problems difficult for no good reason. -- Álvaro Herrera https://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Hello. On Wed, Jul 24, 2019 at 2:31 AM Alvaro Herrera <alvherre@2ndquadrant.com> wrote: > > On 2019-Jul-23, Tom Lane wrote: > > > Kyotaro Horiguchi <horikyota.ntt@gmail.com> writes: > > > > My investigation convinced me that there is no way for a process > > > to detect wheter it is running as a service (except the process > > > directly called from system (aka entry function)). > > Maybe we can try calling pgwin32_is_service()? Ah, it might be viable. (I'm not sure though.) I didn't thought that some process property we are enforcing is available. > > But I will say that in my experience, behavioral differences between > > Postgres started manually and Postgres started as a daemon are bad. > > So I think going out of our way to make the cases behave differently > > on Windows is probably not a good plan. > > We already have such a difference in Windows -- elog.c uses it > extensively to use the eventlog to print if a service, console print if > not. > > Given this thread's OP, ISTM failing to do likewise for this case makes > debugging problems difficult for no good reason. The difference of internal behavior is added to unify the external (apparenet) behavior. It prevents WER dialog from showing while running on console. Service doesn't show a dialog even if WER is enabled. The remaining issue is we cannot obtain a dump when running on console without being blocked by a dialog, but Windows just doesn't allow it.. (Might be possible if we ignore Windows 7? 8? and earlier) reagards -- Kyotaro Horiguchi NTT Open Source Software Center
On 2019-Jul-24, Kyotaro Horiguchi wrote: > Hello. > > On Wed, Jul 24, 2019 at 2:31 AM Alvaro Herrera <alvherre@2ndquadrant.com> wrote: > > > > On 2019-Jul-23, Tom Lane wrote: > > > > > Kyotaro Horiguchi <horikyota.ntt@gmail.com> writes: > > > > > > My investigation convinced me that there is no way for a process > > > > to detect wheter it is running as a service (except the process > > > > directly called from system (aka entry function)). > > > > Maybe we can try calling pgwin32_is_service()? > > Ah, it might be viable. (I'm not sure though.) I didn't thought that > some process property we are enforcing is available. Well, ereport() relies on this pretty extensively, so it seems okay to rely on it for this too. > The difference of internal behavior is added to unify the external > (apparenet) behavior. It prevents WER dialog from showing while > running on console. Service doesn't show a dialog even if WER is > enabled. Yeah, that seems to be what we want (namely, not to get the server process blocked waiting for the user to click on the dialog box). > The remaining issue is we cannot obtain a dump when running > on console without being blocked by a dialog, but Windows just doesn't > allow it.. (Might be possible if we ignore Windows 7? 8? and earlier) I don't know what a "dump" is in this context, but in the errbacktrace thread I linked to a Stackoverflow question where they mentioned the API to obtain a stack backtrace for a process in Windows. Maybe logging that much info is sufficient for most cases. -- Álvaro Herrera https://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: [bug fix] Produce a crash dump before main() on Windows
From
Alvaro Herrera from 2ndQuadrant
Date:
On 2019-Jul-01, Tsunakawa, Takayuki wrote: > From: Amit Kapila [mailto:amit.kapila16@gmail.com] > > Tsunakawa/Haribabu - By reading this thread briefly, it seems we need > > some more inputs from other developers on whether to fix this or not, > > so ideally the status of this patch should be 'Needs Review'. Why it > > is in 'Waiting on Author' state? If something is required from > > Author, then do we expect to see the updated patch in the next few > > days? > > Thank you for paying attention to this. I think the patch is good, > but someone else may have a different solution. So I marked it as > needs review. Tsunakawa-san, there's been some discussion downthread. Could you please submit an updated version of the patch? I would like to get it pushed before beta4 next week, if possible. Thanks -- Álvaro Herrera https://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Alvaro Herrera from 2ndQuadrant <alvherre@alvh.no-ip.org> writes: > Tsunakawa-san, there's been some discussion downthread. Could you > please submit an updated version of the patch? I would like to get it > pushed before beta4 next week, if possible. [ confused... ] I haven't been paying attention to this thread, but is it really something we'd push into v12 at this point? regards, tom lane
On Thu, Sep 05, 2019 at 11:49:03PM -0400, Tom Lane wrote: > Alvaro Herrera from 2ndQuadrant <alvherre@alvh.no-ip.org> writes: >> Tsunakawa-san, there's been some discussion downthread. Could you >> please submit an updated version of the patch? I would like to get it >> pushed before beta4 next week, if possible. > > [ confused... ] I haven't been paying attention to this thread, > but is it really something we'd push into v12 at this point? The last patch submitted is here: https://www.postgresql.org/message-id/0A3221C70F24FB45833433255569204D1F8ECF73@G01JPEXMBYT05 And based on the code paths it touches I would recommend to not play with REL_12_STABLE at this stage. -- Michael
Attachment
From: Michael Paquier [mailto:michael@paquier.xyz] > The last patch submitted is here: > https://www.postgresql.org/message-id/0A3221C70F24FB45833433255569204D > 1F8ECF73@G01JPEXMBYT05 > And based on the code paths it touches I would recommend to not play with > REL_12_STABLE at this stage. I'm reading the thread to see what I should do... Anyway, I don't think we should rush to include this fix in PG 12, too. But at the same time, I don't think it would be dangerous to put in PG 12. Regards Takayuki Tsunakawa
Re: [bug fix] Produce a crash dump before main() on Windows
From
Alvaro Herrera from 2ndQuadrant
Date:
On 2019-Sep-05, Tom Lane wrote: > Alvaro Herrera from 2ndQuadrant <alvherre@alvh.no-ip.org> writes: > > Tsunakawa-san, there's been some discussion downthread. Could you > > please submit an updated version of the patch? I would like to get it > > pushed before beta4 next week, if possible. > > [ confused... ] I haven't been paying attention to this thread, > but is it really something we'd push into v12 at this point? Well, IMV this is a backpatchable, localized bug fix. I was thinking it would be better to get some coverage in the upcoming beta, where it can be fixed without too much harm if there are problems with it, before backpatching further. But if even that is considered too dangerous, I guess we could put it in master for a while, and consider a backpatch later. -- Álvaro Herrera https://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Alvaro Herrera from 2ndQuadrant <alvherre@alvh.no-ip.org> writes: > On 2019-Sep-05, Tom Lane wrote: >> [ confused... ] I haven't been paying attention to this thread, >> but is it really something we'd push into v12 at this point? > Well, IMV this is a backpatchable, localized bug fix. I dunno. This thread is approaching two years old, and a quick review shows few signs that we actually have any consensus on making behavioral changes here. If there is any consensus, it's that the SetErrorMode calls should depend on checking pgwin32_is_service(); but we haven't even seen a patch that does that, let alone tested it. I think we're way too close to beta4 wrap to be considering pushing such a patch the moment it appears. BTW, I also violently dislike taking Windows-specific stuff out of startup_hacks where it belongs and putting it into main() where it doesn't. I think the entire Windows bit in front of get_progname should migrate into startup_hacks. Surely the odds of failure inside get_progname are not worth worrying about --- they seem significantly less than the odds of failure inside pgwin32_is_service(), for instance. regards, tom lane
From: Tom Lane [mailto:tgl@sss.pgh.pa.us] > Alvaro Herrera from 2ndQuadrant <alvherre@alvh.no-ip.org> writes: > > Well, IMV this is a backpatchable, localized bug fix. > > I dunno. This thread is approaching two years old, and a quick > review shows few signs that we actually have any consensus on > making behavioral changes here. If there is any consensus, > it's that the SetErrorMode calls should depend on checking > pgwin32_is_service(); but we haven't even seen a patch that > does that, let alone tested it. I think we're way too close > to beta4 wrap to be considering pushing such a patch the moment > it appears. We don't have to call pgwin32_is_service() to determine whether we call SetErrorMode() in postmaster.c because: * The dialog box doesn't appear under Windows service, whether PostgreSQL itself starts as a service or another (server)application runs as a service and does "pg_ctl start." * It's OK for the dialog box to appear when the user runs "pg_ctl start" on the command prompt. That usage is interactive,and should not be for production use (postgres processes vanish when the user mistakenly presses Ctrl+C, or closesthe command prompt). Even now, the dialog box pops up if postmaster crashes before main(). > BTW, I also violently dislike taking Windows-specific stuff out of > startup_hacks where it belongs and putting it into main() where > it doesn't. I think the entire Windows bit in front of get_progname > should migrate into startup_hacks. Surely the odds of failure > inside get_progname are not worth worrying about --- they seem > significantly less than the odds of failure inside > pgwin32_is_service(), for instance. Agreed. Fixed. I changed the CF status to "need review." Regards Takayuki Tsunakawa
Attachment
Re: [bug fix] Produce a crash dump before main() on Windows
From
Alvaro Herrera from 2ndQuadrant
Date:
On 2019-Sep-10, Tsunakawa, Takayuki wrote: > Agreed. Fixed. I changed the CF status to "need review." I have changed it again to "ready for committer". We could really use help from a windows-enabled committer on this patch, I think. -- Álvaro Herrera https://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On Tue, Sep 10, 2019 at 06:44:48AM +0000, Tsunakawa, Takayuki wrote: > We don't have to call pgwin32_is_service() to determine whether we > call SetErrorMode() in postmaster.c because: > > * The dialog box doesn't appear under Windows service, whether > * PostgreSQL itself starts as a service or another (server) > * application runs as a service and does "pg_ctl start." > > * It's OK for the dialog box to appear when the user runs "pg_ctl > * start" on the command prompt. That usage is interactive, and > * should not be for production use (postgres processes vanish when > * the user mistakenly presses Ctrl+C, or closes the command prompt). > * Even now, the dialog box pops up if postmaster crashes before > * main(). Reading the thread again from scratch, I am under the impression that we do not have an actual consensus on this patch :) Doesn't your first bullet point actually contradict the second one? Your second point looks sensible from a usage point of view (aka show a box when pg_ctl start fails from the command prompt because somebody typed the command manually), however I am really worried about the first point. Imagine an application which relies on Postgres, still does *not* start it as a service but uses "pg_ctl start" automatically. This could be triggered as part of another service startup which calls say system(), or as another script. Wouldn't that cause potentially a freeze? I am also not sure that I quite understand why a popup would never show up if a different service startup triggers pg_ctl start by itself that fails. -- Michael
Attachment
From: Michael Paquier [mailto:michael@paquier.xyz] > Imagine an application which relies on Postgres, still does *not* start > it as a service but uses "pg_ctl start" > automatically. This could be triggered as part of another service startup > which calls say system(), or as another script. Wouldn't that cause > potentially a freeze? Do you mean by "another service startup": another service (Windows service) -> an application (not a Windows service) -> pg_ctl start Then pg_ctl runs under the Windows service environment, and the popup won't appear. > I am also not sure that I quite understand why a > popup would never show up if a different service startup triggers pg_ctl > start by itself that fails. I experimented it with a sample program that I showed in this thread. A program invoked by a Windows services also runsin the service. Regards Takayuki Tsunakawa
On Wed, Sep 11, 2019 at 04:15:24AM +0000, Tsunakawa, Takayuki wrote: > From: Michael Paquier [mailto:michael@paquier.xyz] >> Imagine an application which relies on Postgres, still does *not* start >> it as a service but uses "pg_ctl start" >> automatically. This could be triggered as part of another service startup >> which calls say system(), or as another script. Wouldn't that cause >> potentially a freeze? > > Do you mean by "another service startup": > > another service (Windows service) > -> an application (not a Windows service) > -> pg_ctl start > > Then pg_ctl runs under the Windows service environment, and the > popup won't appear. Yes, I meant a pg_ctl start command invoked as par of another service startup in the way you describe it here. >> I am also not sure that I quite understand why a >> popup would never show up if a different service startup triggers pg_ctl >> start by itself that fails. > > I experimented it with a sample program that I showed in this > thread. A program invoked by a Windows services also runs in the > service. Okay, so this means that the service state of the parent is inherited and that we can rely on that. I was not sure about that. Got it now. Now, there are cases where it is possible to start an app without using a service though, no? For example using the task manager it is possible to trigger an app when the OS starts. What happens in the case where we have an app which starts Postgres using "pg_ctl start" as part of its startup process but it fails and pops an error window, freezing the whole? Can this happen? Sorry to sound noisy, but I am rather scared of the potential consequences of this change without a GUC to control it as Craig mentioned at the beginning of the thread. Perhaps that's a niche case, but it could be annoying for some users. -- Michael
Attachment
On Wed, Sep 11, 2019 at 01:55:45PM +0900, Michael Paquier wrote: > Okay, so this means that the service state of the parent is inherited > and that we can rely on that. I was not sure about that. Got it > now. Now, there are cases where it is possible to start an app > without using a service though, no? For example using the task > manager it is possible to trigger an app when the OS starts. What > happens in the case where we have an app which starts Postgres using > "pg_ctl start" as part of its startup process but it fails and pops an > error window, freezing the whole? Can this happen? Sorry to sound > noisy, but I am rather scared of the potential consequences of this > change without a GUC to control it as Craig mentioned at the beginning > of the thread. Perhaps that's a niche case, but it could be annoying > for some users. We still have a patch registered in the CF for that thread, and it seems to me that those comments have not been answered yet. Is that overthinking? Thoughts? -- Michael
Attachment
On Wed, 18 Jul 2018 at 12:10, Kyotaro HORIGUCHI <horiguchi.kyotaro@lab.ntt.co.jp> wrote:
At Wed, 18 Jul 2018 11:12:06 +0800, Craig Ringer <craig@2ndquadrant.com> wrote in <CAMsr+YHv0KfWhA+Z=UVydpvLQ-QyLaidBqpHxQ=YqTPiDGG6dg@mail.gmail.com>
> On 26 February 2018 at 12:06, Tsunakawa, Takayuki <
> tsunakawa.takay@jp.fujitsu.com> wrote:
>
> > From: Craig Ringer [mailto:craig@2ndquadrant.com]
> > > The patch proposed here means that early crashes will invoke WER. If
> > we're
> > > going to allow WER we should probably just do so unconditionally.
> > >
> > > I'd be in favour of leaving WER on when we find out we're in a
> > noninteractive
> > > service too, but that'd be a separate patch for pg11+ only.
> >
> > As for PG11+, I agree that we want to always leave WER on. That is, call
> > SetErrorMode(SEM_FAILCRITICALERRORS) but not specify
> > SEM_NOGPFAULTERRORBOX. The problem with the current specification of
> > PostgreSQL is that the user can only get crash dumps in a fixed folder
> > $PGDATA\crashdumps. That location is bad because the crash dumps will be
> > backed up together with the database cluster without the user noticing it.
> > What's worse, the crash dumps are large. With WER, the user can control
> > the location and size of crash dumps.
> >
>
> Yeah, that's quite old and dates back to when Windows didn't offer much if
> any control over WER in services.
Yeah. If we want to take a crash dump, we cannot have
auto-restart. Since it is inevitable what we can do for this
would be adding a new knob for that, which cannot be turned on
together with restart_after_crash...?
Why?
Mind you, I don't much care about restart_after_crash, I think it's thoroughly obsolete. Windows has been capable of restarting failed services forever, and systemd does so too. There's little reason to have postgres try to do its own self-recovery now, and I prefer to disable it so the postmaster can cleanly exit and get a fresh new launch.
On Mon, 23 Jul 2018 at 16:45, Michael Paquier <michael@paquier.xyz> wrote:
On Mon, Jul 23, 2018 at 08:16:52AM +0000, Tsunakawa, Takayuki wrote:
> I guess that is due to some missing files related to the libraries
> listed in shared_preload_library. If so, no, because this patch
> relates to failure before main().
No, I really mean a library dependency failure. For example, imagine
that Postgres is compiled on Windows dynamically, and that it depends on
libxml2.dll, which is itself compiled dynamically. Then imagine, in a
custom build echosystem, that a folk comes in and adds lz support to
libxml2 on Windows. If Postgres still consumes libxml2 but does not add
in its PATH a version of lz, then a backend in need of libxml2 would
fail to load, causing Postgres to not start properly. True, painful,
story.
What's super fun about this is the error message. Guess which error Windows will emit? (Some paraphrasing for exact wording):
* "There was a problem starting C:\Program Files\PostgreSQL\11\bin\postgres.exe: The specified module could not be found."
* "There was a problem starting C:\Program Files\PostgreSQL\11\bin\postgres.exe: module "libxml2.dll" could not be found."
* "The program can't start because "libxml2.dll" is missing from your computer."
* "The program "C:\Program Files\PostgreSQL\11\bin\postgres.exe" can't start because an error was encountered when loading required library "C:\Program Files\PostgreSQL\11\lib\libxml.dll": libxml.dll requires "liblz.dll" but it could not be found."
Hint: not the last one.
It'll at best complain about libxml.dll being missing, despite it being very obviously present.
I had to go hunt around with Dependency Walker to figure out the actual missing DLL the last time I had to deal with this.
On Sun, Nov 10, 2019 at 06:03:08PM +0800, Craig Ringer wrote: > I had to go hunt around with Dependency Walker to figure out the actual > missing DLL the last time I had to deal with this. Ahah. Yes, That's exactly what I used a couple of years back on that. These errors are hard to track and find. -- Michael
Attachment
Michael Paquier <michael@paquier.xyz> writes: > On Sun, Nov 10, 2019 at 06:03:08PM +0800, Craig Ringer wrote: >> I had to go hunt around with Dependency Walker to figure out the actual >> missing DLL the last time I had to deal with this. > Ahah. Yes, That's exactly what I used a couple of years back on that. > These errors are hard to track and find. I feel the pain on this --- indirect dependencies are not well reported on any platform that I deal with :-(. Still, I'm having a hard time convincing myself that this patch is a good idea. It seems very likely that it will break some startup scenarios, and the incremental benefit in error reporting seems marginal. regards, tom lane
On Sun, Nov 10, 2019 at 11:24:34AM -0500, Tom Lane wrote: > I feel the pain on this --- indirect dependencies are not well reported > on any platform that I deal with :-(. Still, I'm having a hard time > convincing myself that this patch is a good idea. It seems very likely > that it will break some startup scenarios, and the incremental benefit > in error reporting seems marginal. Thanks for your input, Tom. This matches with my opinion about this thread and this patch. -- Michael
Attachment
On Mon, Nov 11, 2019 at 09:02:01AM +0900, Michael Paquier wrote: > Thanks for your input, Tom. This matches with my opinion about this > thread and this patch. And switched the CF entry of the patch as rejected. -- Michael