Thread: [bug fix] Produce a crash dump before main() on Windows

[bug fix] Produce a crash dump before main() on Windows

From
"Tsunakawa, Takayuki"
Date:
Hello,

postgres.exe on Windows doesn't output a crash dump when it crashes before main() is called.  The attached patch fixes
this. I'd like this to be back-patched.  I'll add this to the next CF.
 

The original problem happened on our customer's production system.  Their application sometimes failed to connect to
thedatabase.  That was because postgres.exe crashed due to access violation (exception code C0000005).  But there was
nocrash dump, so we had difficulty in finding the cause.  The frequency was low -- about ten times during half a year.
 

What caused the access violation was Symantec's antivirus software.  It seems that sysfer.dll of the software
interceptsregistry access, during C runtime library initialization,  before main() is called.  So, the direct cause of
thisproblem is not PostgreSQL.
 

On the other hand, it's PostgreSQL's problem that we can't get the crash dump, which makes the investigation difficult.
The cause is that postmaster calls SetErrorMode() to disable the outputing of crash dumps by WER (Windows Error
Reporting). This error mode is inherited from postmaster to its children.  If a crash happens before the child sets up
theexception handler, no crash dump is produced.
 


Regards
Takayuki Tsunakawa


Attachment

Re: [bug fix] Produce a crash dump before main() on Windows

From
Magnus Hagander
Date:


On Fri, Feb 16, 2018 at 8:28 AM, Tsunakawa, Takayuki <tsunakawa.takay@jp.fujitsu.com> wrote:
Hello,

postgres.exe on Windows doesn't output a crash dump when it crashes before main() is called.  The attached patch fixes this.  I'd like this to be back-patched.  I'll add this to the next CF.

The original problem happened on our customer's production system.  Their application sometimes failed to connect to the database.  That was because postgres.exe crashed due to access violation (exception code C0000005).  But there was no crash dump, so we had difficulty in finding the cause.  The frequency was low -- about ten times during half a year.

What caused the access violation was Symantec's antivirus software.  It seems that sysfer.dll of the software intercepts registry access, during C runtime library initialization,  before main() is called.  So, the direct cause of this problem is not PostgreSQL.

On the other hand, it's PostgreSQL's problem that we can't get the crash dump, which makes the investigation difficult.  The cause is that postmaster calls SetErrorMode() to disable the outputing of crash dumps by WER (Windows Error Reporting).  This error mode is inherited from postmaster to its children.  If a crash happens before the child sets up the exception handler, no crash dump is produced.

The original call to SetErrorMode() was put in there to make sure we didn't show a popup message which would then make everything freeze (see very old commit 27bff7502f04ee01237ed3f5a997748ae43d3a81). Doesn't this turn that back on, so that if you are not actually there to monitor something you can end up with stuck processes and exactly the issues we had before that one?

It might still be useful in debugging, but in that case we might need to make it a configurable switch?

--

Re: [bug fix] Produce a crash dump before main() on Windows

From
Craig Ringer
Date:
On 20 February 2018 at 21:47, Magnus Hagander <magnus@hagander.net> wrote:


On Fri, Feb 16, 2018 at 8:28 AM, Tsunakawa, Takayuki <tsunakawa.takay@jp.fujitsu.com> wrote:
Hello,

postgres.exe on Windows doesn't output a crash dump when it crashes before main() is called.  The attached patch fixes this.  I'd like this to be back-patched.  I'll add this to the next CF.

The original problem happened on our customer's production system.  Their application sometimes failed to connect to the database.  That was because postgres.exe crashed due to access violation (exception code C0000005).  But there was no crash dump, so we had difficulty in finding the cause.  The frequency was low -- about ten times during half a year.

What caused the access violation was Symantec's antivirus software.  It seems that sysfer.dll of the software intercepts registry access, during C runtime library initialization,  before main() is called.  So, the direct cause of this problem is not PostgreSQL.

On the other hand, it's PostgreSQL's problem that we can't get the crash dump, which makes the investigation difficult.  The cause is that postmaster calls SetErrorMode() to disable the outputing of crash dumps by WER (Windows Error Reporting).  This error mode is inherited from postmaster to its children.  If a crash happens before the child sets up the exception handler, no crash dump is produced.

The original call to SetErrorMode() was put in there to make sure we didn't show a popup message which would then make everything freeze (see very old commit 27bff7502f04ee01237ed3f5a997748ae43d3a81). Doesn't this turn that back on, so that if you are not actually there to monitor something you can end up with stuck processes and exactly the issues we had before that one?

Ha, I just went digging for the same.

We should not disable WER when running as a service (no UI access), it will not display an interactive dialog.

I'm not convinced we should disable it at all personally. Things have come a long way from drwatson.exe . Disabling WER makes it hard to debug postgres by installing Visual Studio Debugger as the hander (I always wondered why that didn't work!) and is generally just painful. It prevents us from collecting data via Microsoft about crashes, should we wish to do so. And who runs Pg on windows except as a service?!

So I'm all for just removing that.
 
--
 Craig Ringer                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services

Re: [bug fix] Produce a crash dump before main() on Windows

From
Craig Ringer
Date:
On 20 February 2018 at 22:18, Craig Ringer <craig@2ndquadrant.com> wrote:
 
So I'm all for just removing that.


... but just to be clear, about -1000 on backpatching any such thing. At most, a new GUC that defaults to the current behaviour. But I think it's pretty niche really.

--
 Craig Ringer                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services

Re: [bug fix] Produce a crash dump before main() on Windows

From
Magnus Hagander
Date:


On Tue, Feb 20, 2018 at 3:18 PM, Craig Ringer <craig@2ndquadrant.com> wrote:
On 20 February 2018 at 21:47, Magnus Hagander <magnus@hagander.net> wrote:


On Fri, Feb 16, 2018 at 8:28 AM, Tsunakawa, Takayuki <tsunakawa.takay@jp.fujitsu.com> wrote:
Hello,

postgres.exe on Windows doesn't output a crash dump when it crashes before main() is called.  The attached patch fixes this.  I'd like this to be back-patched.  I'll add this to the next CF.

The original problem happened on our customer's production system.  Their application sometimes failed to connect to the database.  That was because postgres.exe crashed due to access violation (exception code C0000005).  But there was no crash dump, so we had difficulty in finding the cause.  The frequency was low -- about ten times during half a year.

What caused the access violation was Symantec's antivirus software.  It seems that sysfer.dll of the software intercepts registry access, during C runtime library initialization,  before main() is called.  So, the direct cause of this problem is not PostgreSQL.

On the other hand, it's PostgreSQL's problem that we can't get the crash dump, which makes the investigation difficult.  The cause is that postmaster calls SetErrorMode() to disable the outputing of crash dumps by WER (Windows Error Reporting).  This error mode is inherited from postmaster to its children.  If a crash happens before the child sets up the exception handler, no crash dump is produced.

The original call to SetErrorMode() was put in there to make sure we didn't show a popup message which would then make everything freeze (see very old commit 27bff7502f04ee01237ed3f5a997748ae43d3a81). Doesn't this turn that back on, so that if you are not actually there to monitor something you can end up with stuck processes and exactly the issues we had before that one?

Ha, I just went digging for the same.

We should not disable WER when running as a service (no UI access), it will not display an interactive dialog.

I'm not convinced we should disable it at all personally. Things have come a long way from drwatson.exe . Disabling WER makes it hard to debug postgres by installing Visual Studio Debugger as the hander (I always wondered why that didn't work!) and is generally just painful. It prevents us from collecting data via Microsoft about crashes, should we wish to do so. And who runs Pg on windows except as a service?!


I've seen a number of usecases where apps start it alongside the app instead of as a service. I'm not sure how recent those apps are though, and I'm not sure it's better than using a service in the first place (but it does let you install things without being an admin).

We really shouldn't *break* that scenario for people. But making it work well for the service usecase should definitely be the priority.

--

Re: [bug fix] Produce a crash dump before main() on Windows

From
Craig Ringer
Date:
On 20 February 2018 at 23:43, Magnus Hagander <magnus@hagander.net> wrote:

I've seen a number of usecases where apps start it alongside the app instead of as a service. I'm not sure how recent those apps are though, and I'm not sure it's better than using a service in the first place (but it does let you install things without being an admin).

We really shouldn't *break* that scenario for people. But making it work well for the service usecase should definitely be the priority.

Good point and agreed.

The patch proposed here means that early crashes will invoke WER. If we're going to allow WER we should probably just do so unconditionally.

I suggest changing this to a command line flag or environment variable test that suppresses Pg's default disabling of WER. A GUC probably doesn't make sense; it's too niche, and too early.

I'd be in favour of leaving WER on when we find out we're in a noninteractive service too, but that'd be a separate patch for pg11+ only.

--
 Craig Ringer                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services

RE: [bug fix] Produce a crash dump before main() on Windows

From
"Tsunakawa, Takayuki"
Date:
From: Craig Ringer [mailto:craig@2ndquadrant.com]
> The patch proposed here means that early crashes will invoke WER. If we're
> going to allow WER we should probably just do so unconditionally.
> 
> I'd be in favour of leaving WER on when we find out we're in a noninteractive
> service too, but that'd be a separate patch for pg11+ only.

As for PG11+, I agree that we want to always leave WER on.  That is, call SetErrorMode(SEM_FAILCRITICALERRORS) but not
specifySEM_NOGPFAULTERRORBOX.  The problem with the current specification of PostgreSQL is that the user can only get
crashdumps in a fixed folder $PGDATA\crashdumps.  That location is bad because the crash dumps will be backed up
togetherwith the database cluster without the user noticing it.  What's worse, the crash dumps are large.  With WER,
theuser can control the location and size of crash dumps.
 

> I suggest changing this to a command line flag or environment variable test
> that suppresses Pg's default disabling of WER. A GUC probably doesn't make
> sense; it's too niche, and too early.

As for the past major releases, it's burdensome for the user to have to specify a new flag or an environment variable
whenhe has to get crash dumps to investigate a rare crash before main().  It's not necessary to disable WER for crashes
aftermain() is called, because PG installs an exception handler at the beginning of main(), which works fine.  So can
wego with the current patch?
 

Another idea to add to the current patch is to move the call to SetErrorMode() to the below function, which is called
firstin main().  How about this?
 

void
pgwin32_install_crashdump_handler(void)
{
    SetUnhandledExceptionFilter(crashDumpHandler);
}

Regards
Takayuki Tsunakawa




RE: [bug fix] Produce a crash dump before main() on Windows

From
"Tsunakawa, Takayuki"
Date:
From: Tsunakawa, Takayuki [mailto:tsunakawa.takay@jp.fujitsu.com]
> Another idea to add to the current patch is to move the call to SetErrorMode()
> to the below function, which is called first in main().  How about this?
> 
> void
> pgwin32_install_crashdump_handler(void)
> {
>     SetUnhandledExceptionFilter(crashDumpHandler);
> }

I moved SetErrorMode() to the beginning of main().  It should be placed before any code which could crash.  The current
locationis a bit late: in fact, write_stderr() crashed when WSAStartup() failed.
 

Regards
Takayuki Tsunakawa


Attachment

Re: [bug fix] Produce a crash dump before main() on Windows

From
Haribabu Kommi
Date:

On Thu, Mar 1, 2018 at 4:14 PM Tsunakawa, Takayuki <tsunakawa.takay@jp.fujitsu.com> wrote:
From: Tsunakawa, Takayuki [mailto:tsunakawa.takay@jp.fujitsu.com]
> Another idea to add to the current patch is to move the call to SetErrorMode()
> to the below function, which is called first in main().  How about this?
>
> void
> pgwin32_install_crashdump_handler(void)
> {
>       SetUnhandledExceptionFilter(crashDumpHandler);
> }

I moved SetErrorMode() to the beginning of main().  It should be placed before any code which could crash.  The current location is a bit late: in fact, write_stderr() crashed when WSAStartup() failed.

I reviewed patch and it works as per the subject, but I am not able to verify the actual
bug that is reported in the upthread. The moving of setErrorMode() call to the start
of the main function can handle all the cases that can lead to a crash with no popup.

The reset of setErrorMode(0) before start of any process can generate the coredump
because of any issue before it reaches the main() function, but this change can lead
to stop the postmaster restart process, if no one present to observe the scenario?
Doesn't this change can generate backward compatibility problems to particular users?

I don't have any other comments with the current patch.

Regards,
Haribabu Kommi
Fujitsu Australia

Re: [bug fix] Produce a crash dump before main() on Windows

From
Craig Ringer
Date:
On 26 February 2018 at 12:06, Tsunakawa, Takayuki <tsunakawa.takay@jp.fujitsu.com> wrote:
From: Craig Ringer [mailto:craig@2ndquadrant.com]
> The patch proposed here means that early crashes will invoke WER. If we're
> going to allow WER we should probably just do so unconditionally.
>
> I'd be in favour of leaving WER on when we find out we're in a noninteractive
> service too, but that'd be a separate patch for pg11+ only.

As for PG11+, I agree that we want to always leave WER on.  That is, call SetErrorMode(SEM_FAILCRITICALERRORS) but not specify SEM_NOGPFAULTERRORBOX.  The problem with the current specification of PostgreSQL is that the user can only get crash dumps in a fixed folder $PGDATA\crashdumps.  That location is bad because the crash dumps will be backed up together with the database cluster without the user noticing it.  What's worse, the crash dumps are large.  With WER, the user can control the location and size of crash dumps.

Yeah, that's quite old and dates back to when Windows didn't offer much if any control over WER in services.
 
--
 Craig Ringer                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services

Re: [bug fix] Produce a crash dump before main() on Windows

From
Kyotaro HORIGUCHI
Date:
At Wed, 18 Jul 2018 11:12:06 +0800, Craig Ringer <craig@2ndquadrant.com> wrote in
<CAMsr+YHv0KfWhA+Z=UVydpvLQ-QyLaidBqpHxQ=YqTPiDGG6dg@mail.gmail.com>
> On 26 February 2018 at 12:06, Tsunakawa, Takayuki <
> tsunakawa.takay@jp.fujitsu.com> wrote:
> 
> > From: Craig Ringer [mailto:craig@2ndquadrant.com]
> > > The patch proposed here means that early crashes will invoke WER. If
> > we're
> > > going to allow WER we should probably just do so unconditionally.
> > >
> > > I'd be in favour of leaving WER on when we find out we're in a
> > noninteractive
> > > service too, but that'd be a separate patch for pg11+ only.
> >
> > As for PG11+, I agree that we want to always leave WER on.  That is, call
> > SetErrorMode(SEM_FAILCRITICALERRORS) but not specify
> > SEM_NOGPFAULTERRORBOX.  The problem with the current specification of
> > PostgreSQL is that the user can only get crash dumps in a fixed folder
> > $PGDATA\crashdumps.  That location is bad because the crash dumps will be
> > backed up together with the database cluster without the user noticing it.
> > What's worse, the crash dumps are large.  With WER, the user can control
> > the location and size of crash dumps.
> >
> 
> Yeah, that's quite old and dates back to when Windows didn't offer much if
> any control over WER in services.

Yeah. If we want to take a crash dump, we cannot have
auto-restart. Since it is inevitable what we can do for this
would be adding a new knob for that, which cannot be turned on
together with restart_after_crash...?

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center



RE: [bug fix] Produce a crash dump before main() on Windows

From
"Tsunakawa, Takayuki"
Date:
From: Haribabu Kommi [mailto:kommi.haribabu@gmail.com]
> I reviewed patch and it works as per the subject, but I am not able to verify
> the actual
> bug that is reported in the upthread. The moving of setErrorMode() call
> to the start
> of the main function can handle all the cases that can lead to a crash with
> no popup.

Thank you for reviewing the patch.  Yeah, I don't know reproduce the problem too, because some bug of Symantec
AntiVirusrandomly caused the crash before main()...
 


> The reset of setErrorMode(0) before start of any process can generate the
> coredump
> because of any issue before it reaches the main() function, but this change
> can lead
> to stop the postmaster restart process, if no one present to observe the
> scenario?
> Doesn't this change can generate backward compatibility problems to
> particular users?

Sorry, I couldn't get it.  Could you elaborate on it?  I couldn't understand how this change has an adverse effect on
thepostmaster restart process (restart_after_crash = on) and backward compatibility.
 


Regards
Takayuki Tsunakawa


Re: [bug fix] Produce a crash dump before main() on Windows

From
Michael Paquier
Date:
On Wed, Jul 18, 2018 at 06:09:57AM +0000, Tsunakawa, Takayuki wrote:
> From: Haribabu Kommi [mailto:kommi.haribabu@gmail.com]
>> I reviewed patch and it works as per the subject, but I am not able to verify
>> the actual
>> bug that is reported in the upthread. The moving of setErrorMode() call
>> to the start
>> of the main function can handle all the cases that can lead to a crash with
>> no popup.
>
> Thank you for reviewing the patch.  Yeah, I don't know reproduce the
> problem too, because some bug of Symantec AntiVirus randomly caused
> the crash before main()...

A suggestion that I have here would be to patch manually the postmaster
with any kind of code which would make it to crash before main() is
called.  Anything is fine as long as a core dump is produced, right?
--
Michael

Attachment

Re: [bug fix] Produce a crash dump before main() on Windows

From
Craig Ringer
Date:
On 18 July 2018 at 14:34, Michael Paquier <michael@paquier.xyz> wrote:
On Wed, Jul 18, 2018 at 06:09:57AM +0000, Tsunakawa, Takayuki wrote:
> From: Haribabu Kommi [mailto:kommi.haribabu@gmail.com]
>> I reviewed patch and it works as per the subject, but I am not able to verify
>> the actual
>> bug that is reported in the upthread. The moving of setErrorMode() call
>> to the start
>> of the main function can handle all the cases that can lead to a crash with
>> no popup.
>
> Thank you for reviewing the patch.  Yeah, I don't know reproduce the
> problem too, because some bug of Symantec AntiVirus randomly caused
> the crash before main()...

A suggestion that I have here would be to patch manually the postmaster
with any kind of code which would make it to crash before main() is
called.  Anything is fine as long as a core dump is produced, right?

 
When I was  doing the original crashdump support I wrote a crashme extension that deliberately dereferenced a null pointer.

You'd do something similar in the postmaster for the testing.

--
 Craig Ringer                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services

Re: [bug fix] Produce a crash dump before main() on Windows

From
Haribabu Kommi
Date:

On Wed, Jul 18, 2018 at 4:10 PM Tsunakawa, Takayuki <tsunakawa.takay@jp.fujitsu.com> wrote:
From: Haribabu Kommi [mailto:kommi.haribabu@gmail.com]
> I reviewed patch and it works as per the subject, but I am not able to verify
> the actual
> bug that is reported in the upthread. The moving of setErrorMode() call
> to the start
> of the main function can handle all the cases that can lead to a crash with
> no popup.

Thank you for reviewing the patch.  Yeah, I don't know reproduce the problem too, because some bug of Symantec AntiVirus randomly caused the crash before main()...


> The reset of setErrorMode(0) before start of any process can generate the
> coredump
> because of any issue before it reaches the main() function, but this change
> can lead
> to stop the postmaster restart process, if no one present to observe the
> scenario?
> Doesn't this change can generate backward compatibility problems to
> particular users?

Sorry, I couldn't get it.  Could you elaborate on it?  I couldn't understand how this change has an adverse effect on the postmaster restart process (restart_after_crash = on) and backward compatibility.

Sorry about not providing more details. It is just my doubt, because
I am not able to reproduce the problem to test it after the fix. May be
I can give a try by modifying the source code to get the crash.

My point is, With this patch, in case if the postgres crashses
before reaching main(), does it generate a popup?

If the answer to the above question is yes, then if the popup is
not attended by the DBA, I feel the postmaster is not able to
restart all the processes because it is waiting for the crash process
to close. Am I Correct?

Regards,
Haribabu Kommi
Fujitsu Australia

RE: [bug fix] Produce a crash dump before main() on Windows

From
"Tsunakawa, Takayuki"
Date:
From: Haribabu Kommi [mailto:kommi.haribabu@gmail.com]
> May be I can give a try by modifying the source code to get the crash.


Thank you, that would be great if you could come up with a good way!

> My point is, With this patch, in case if the postgres crashses
> before reaching main(), does it generate a popup?
> 
> If the answer to the above question is yes, then if the popup is
> not attended by the DBA, I feel the postmaster is not able to
> restart all the processes because it is waiting for the crash process
> to close. Am I Correct?

IIRC, the pop up doesn't appear under Windows service.  If you start the database server with pg_ctl start on the
commandprompt, the pop up will appear, which I think is not bad.
 


Regards
Takayuki Tsunakawa


Re: [bug fix] Produce a crash dump before main() on Windows

From
Haribabu Kommi
Date:

On Wed, Jul 18, 2018 at 6:39 PM Tsunakawa, Takayuki <tsunakawa.takay@jp.fujitsu.com> wrote:
From: Haribabu Kommi [mailto:kommi.haribabu@gmail.com]
> May be I can give a try by modifying the source code to get the crash.


Thank you, that would be great if you could come up with a good way!

+       if (argc > 1 && strncmp(argv[1], "--fork", 6) == 0)
+       {
+               char *ptr = NULL;
+
+               printf ("%s",*ptr);
+       }
+

I added the above code as first execution code in main() function and I am able to
reproduce the WER popup.

 
> My point is, With this patch, in case if the postgres crashses
> before reaching main(), does it generate a popup?
>
> If the answer to the above question is yes, then if the popup is
> not attended by the DBA, I feel the postmaster is not able to
> restart all the processes because it is waiting for the crash process
> to close. Am I Correct?

IIRC, the pop up doesn't appear under Windows service.  If you start the database server with pg_ctl start on the command prompt, the pop up will appear, which I think is not bad.

Yes, the popup doesn't appear when it is started as service and appear when it is started from command prompt.
I also think that scenarios of starting the server from command prompt may be less, so this patch can be useful
to find out the issue by the starting the server from command prompt. 

I marked the patch as "ready for committer".

Regards,
Haribabu Kommi
Fujitsu Australia

Re: [bug fix] Produce a crash dump before main() on Windows

From
Robert Haas
Date:
On Wed, Jul 18, 2018 at 4:38 AM, Tsunakawa, Takayuki
<tsunakawa.takay@jp.fujitsu.com> wrote:
> IIRC, the pop up doesn't appear under Windows service.  If you start the database server with pg_ctl start on the
commandprompt, the pop up will appear, which I think is not bad.
 

Hmm.  I think that might be bad.  What makes you think it isn't?

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


RE: [bug fix] Produce a crash dump before main() on Windows

From
"Tsunakawa, Takayuki"
Date:
From: Robert Haas [mailto:robertmhaas@gmail.com]
> On Wed, Jul 18, 2018 at 4:38 AM, Tsunakawa, Takayuki
> <tsunakawa.takay@jp.fujitsu.com> wrote:
> > IIRC, the pop up doesn't appear under Windows service.  If you start the
> database server with pg_ctl start on the command prompt, the pop up will
> appear, which I think is not bad.
> 
> Hmm.  I think that might be bad.  What makes you think it isn't?

First, as Hari-san said, starting the server with "pg_ctl start" on the command prompt does not feel like production
usebut like test environment,  and that usage seems mostly interactive.  Second, the current PostgreSQL is not exempt
fromthe same problem: if postmaster or standalone backend crashes before main(), the pop up would appear.
 


Regards
Takayuki Tsunakawa


Re: [bug fix] Produce a crash dump before main() on Windows

From
Michael Paquier
Date:
On Mon, Feb 26, 2018 at 04:06:48AM +0000, Tsunakawa, Takayuki wrote:
> As for PG11+, I agree that we want to always leave WER on.  That is,
> call SetErrorMode(SEM_FAILCRITICALERRORS) but not specify
> SEM_NOGPFAULTERRORBOX.  The problem with the current specification of
> PostgreSQL is that the user can only get crash dumps in a fixed folder
> $PGDATA\crashdumps.  That location is bad because the crash dumps will
> be backed up together with the database cluster without the user
> noticing it.  What's worse, the crash dumps are large.  With WER, the
> user can control the location and size of crash dumps.

I have not looked by myself at the problems around WER and such so I
don't have a clear opinion, but including crash dumps within backups is
*bad*.  We have a set of filtering rules in basebackup.c and pg_rewind
since v11, we could make use of it and remove all contents from
crashdumps/ from what gets backed up or rewound.  excludeDirContents
creates empty folders when those are listed, so we'd want to not create
an empty folder in Windows backups, which makes a bit more more
difficult.
--
Michael

Attachment

Re: [bug fix] Produce a crash dump before main() on Windows

From
Michael Paquier
Date:
On Mon, Jul 23, 2018 at 01:31:57AM +0000, Tsunakawa, Takayuki wrote:
> First, as Hari-san said, starting the server with "pg_ctl start" on
> the command prompt does not feel like production use but like test
> environment,  and that usage seems mostly interactive.  Second, the
> current PostgreSQL is not exempt from the same problem: if postmaster
> or standalone backend crashes before main(), the pop up would appear.

When you use pg_ctl start within the command prompt, then closing the
prompt session also causes Postgres to stop immediately.  Would it be a
problem?

Another question I have is that I have seen Postgres fail to start
because of missing dependent libraries, which happened after the
postmaster startup as this was reported in the logs, could it be a
problem with this patch?  If the system waits for an operator to close
this pop-up window, then it becomes harder to check automatically such
failures and relies on the test application to timeout, if it is wise
enough to do, which is not always the case..
--
Michael

Attachment

RE: [bug fix] Produce a crash dump before main() on Windows

From
"Tsunakawa, Takayuki"
Date:
From: Michael Paquier [mailto:michael@paquier.xyz]
> When you use pg_ctl start within the command prompt, then closing the prompt
> session also causes Postgres to stop immediately.  Would it be a problem?

No.  But that makes me think more of the startup on command prompt as non-production usage.


> Another question I have is that I have seen Postgres fail to start because
> of missing dependent libraries, which happened after the postmaster startup
> as this was reported in the logs, could it be a problem with this patch?
> If the system waits for an operator to close this pop-up window, then it
> becomes harder to check automatically such failures and relies on the test
> application to timeout, if it is wise enough to do, which is not always
> the case..

I guess that is due to some missing files related to the libraries listed in shared_preload_library.  If so, no,
becausethis patch relates to failure before main().
 


Regards
Takayuki Tsunakawa





Re: [bug fix] Produce a crash dump before main() on Windows

From
Michael Paquier
Date:
On Mon, Jul 23, 2018 at 08:16:52AM +0000, Tsunakawa, Takayuki wrote:
> I guess that is due to some missing files related to the libraries
> listed in shared_preload_library.  If so, no, because this patch
> relates to failure before main().

No, I really mean a library dependency failure.  For example, imagine
that Postgres is compiled on Windows dynamically, and that it depends on
libxml2.dll, which is itself compiled dynamically.  Then imagine, in a
custom build echosystem, that a folk comes in and adds lz support to
libxml2 on Windows.  If Postgres still consumes libxml2 but does not add
in its PATH a version of lz, then a backend in need of libxml2 would
fail to load, causing Postgres to not start properly.  True, painful,
story.

What is ticking me is if the popup window stuff discussed on this thread
could be a problem in the detection of such dependency errors, as with
the product I am thinking about, Postgres is not running as a service,
but kicked by another thing which is a service, and monitors the
postmaster.
--
Michael

Attachment

Re: [bug fix] Produce a crash dump before main() on Windows

From
Amit Kapila
Date:
On Mon, Jul 23, 2018 at 12:56 PM, Michael Paquier <michael@paquier.xyz> wrote:
> On Mon, Jul 23, 2018 at 01:31:57AM +0000, Tsunakawa, Takayuki wrote:
>> First, as Hari-san said, starting the server with "pg_ctl start" on
>> the command prompt does not feel like production use but like test
>> environment,  and that usage seems mostly interactive.  Second, the
>> current PostgreSQL is not exempt from the same problem: if postmaster
>> or standalone backend crashes before main(), the pop up would appear.
>
> When you use pg_ctl start within the command prompt, then closing the
> prompt session also causes Postgres to stop immediately.  Would it be a
> problem?
>
> Another question I have is that I have seen Postgres fail to start
> because of missing dependent libraries, which happened after the
> postmaster startup as this was reported in the logs, could it be a
> problem with this patch?
>

It doesn't appear so, because we have
SetErrorMode(SEM_FAILCRITICALERRORS | SEM_NOOPENFILEERRORBOX); in
dlopen.

-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com


RE: [bug fix] Produce a crash dump before main() on Windows

From
"Tsunakawa, Takayuki"
Date:
From: Michael Paquier [mailto:michael@paquier.xyz]
> No, I really mean a library dependency failure.  For example, imagine that
> Postgres is compiled on Windows dynamically, and that it depends on
> libxml2.dll, which is itself compiled dynamically.  Then imagine, in a
> custom build echosystem, that a folk comes in and adds lz support to
> libxml2 on Windows.  If Postgres still consumes libxml2 but does not add
> in its PATH a version of lz, then a backend in need of libxml2 would fail
> to load, causing Postgres to not start properly.  True, painful, story.

I see, that's surely painful.  But the DLL in use cannot be overwritten on Windows.  So, I assume the following:

1. postmaster loads libxml2.dll without LZ in folder A.
2. Someone adds libxml2.dll with LZ in folder B.  folder B is ahead of folder A in postgres's PATH.
3. Some user tries to connect to a database, creating a new child postgres process, which fails to load libxml2.dll in
folderB.
 




> What is ticking me is if the popup window stuff discussed on this thread
> could be a problem in the detection of such dependency errors, as with the
> product I am thinking about, Postgres is not running as a service, but kicked
> by another thing which is a service, and monitors the postmaster.

I understood you are talking about a case where some (server) application uses PostgreSQL internally.  That application
runsas a Windows service, but PostgreSQL itself doesn't on its own.  The application starts PostgreSQL by running
pg_ctlstart.
 

In that case, postgres runs under service.  I confirmed it with the following test program.  This code is extracted
frompgwin32_is_service() in PostgreSQL.
 

--------------------------------------------------
#include <windows.h>
#include <stdio.h>

int
main(void)
{
    BOOL        IsMember;
    PSID        ServiceSid;
    PSID        LocalSystemSid;
    SID_IDENTIFIER_AUTHORITY NtAuthority = {SECURITY_NT_AUTHORITY};
    FILE *fp;

    SetErrorMode(0);

    /* Check for service group membership */
    if (!AllocateAndInitializeSid(&NtAuthority, 1,
                                  SECURITY_SERVICE_RID, 0, 0, 0, 0, 0, 0, 0,
                                  &ServiceSid))
    {
        fprintf(stderr, "could not get SID for service group: error code %lu\n",
                GetLastError());
        return 1;
    }

    if (!CheckTokenMembership(NULL, ServiceSid, &IsMember))
    {
        fprintf(stderr, "could not check access token membership: error code %lu\n",
                GetLastError());
        FreeSid(ServiceSid);
        return 1;
    }
    FreeSid(ServiceSid);

    fp = fopen("d:\\a.txt", "a");
    if (IsMember)
        fprintf(fp, "is under service\n");
    else
        fprintf(fp, "is not under service\n");

    return 0;
}
--------------------------------------------------

You can build the above program with:
  cl chksvc.c advapi32.lib


Regards
Takayuki Tsunakawa





Re: [bug fix] Produce a crash dump before main() on Windows

From
Haribabu Kommi
Date:
On Thu, Jul 26, 2018 at 3:52 PM Tsunakawa, Takayuki <tsunakawa.takay@jp.fujitsu.com> wrote:
From: Michael Paquier [mailto:michael@paquier.xyz]
> No, I really mean a library dependency failure.  For example, imagine that
> Postgres is compiled on Windows dynamically, and that it depends on
> libxml2.dll, which is itself compiled dynamically.  Then imagine, in a
> custom build echosystem, that a folk comes in and adds lz support to
> libxml2 on Windows.  If Postgres still consumes libxml2 but does not add
> in its PATH a version of lz, then a backend in need of libxml2 would fail
> to load, causing Postgres to not start properly.  True, painful, story.

I see, that's surely painful.  But the DLL in use cannot be overwritten on Windows.  So, I assume the following:

1. postmaster loads libxml2.dll without LZ in folder A.
2. Someone adds libxml2.dll with LZ in folder B.  folder B is ahead of folder A in postgres's PATH.
3. Some user tries to connect to a database, creating a new child postgres process, which fails to load libxml2.dll in folder B.

I doubt that the above scenario is also possible in windows,  Once the process has started, it may not receive the new
environmental variable changes that are done. I am not sure though.

 
> What is ticking me is if the popup window stuff discussed on this thread
> could be a problem in the detection of such dependency errors, as with the
> product I am thinking about, Postgres is not running as a service, but kicked
> by another thing which is a service, and monitors the postmaster.

I understood you are talking about a case where some (server) application uses PostgreSQL internally.  That application runs as a Windows service, but PostgreSQL itself doesn't on its own.  The application starts PostgreSQL by running pg_ctl start.

In that case, postgres runs under service.  I confirmed it with the following test program.  This code is extracted from pgwin32_is_service() in PostgreSQL.

--------------------------------------------------
#include <windows.h>
#include <stdio.h>

int
main(void)
{
        BOOL            IsMember;
        PSID            ServiceSid;
        PSID            LocalSystemSid;
        SID_IDENTIFIER_AUTHORITY NtAuthority = {SECURITY_NT_AUTHORITY};
        FILE *fp;

        SetErrorMode(0);

        /* Check for service group membership */
        if (!AllocateAndInitializeSid(&NtAuthority, 1,
                                                                  SECURITY_SERVICE_RID, 0, 0, 0, 0, 0, 0, 0,
                                                                  &ServiceSid))
        {
                fprintf(stderr, "could not get SID for service group: error code %lu\n",
                                GetLastError());
                return 1;
        }

        if (!CheckTokenMembership(NULL, ServiceSid, &IsMember))
        {
                fprintf(stderr, "could not check access token membership: error code %lu\n",
                                GetLastError());
                FreeSid(ServiceSid);
                return 1;
        }
        FreeSid(ServiceSid);

        fp = fopen("d:\\a.txt", "a");
        if (IsMember)
                fprintf(fp, "is under service\n");
        else
                fprintf(fp, "is not under service\n");

        return 0;
}
--------------------------------------------------

You can build the above program with:
  cl chksvc.c advapi32.lib

Thanks for confirmation of that PostgreSQL runs as service.

Based on the following details, we can decide whether this fix is required or not.
1. Starting of Postgres server using pg_ctl without service is of production use or not?
2. Without this fix, how difficult is the problem to find out that something is preventing the
server to start? In case if it is easy to find out, may be better to provide some troubleshoot
guide for windows users can help.

I am in favor of doc fix if it easy to find the problem instead of assuming the user usage.

Regards,
Haribabu Kommi
Fujitsu Australia

Re: [bug fix] Produce a crash dump before main() on Windows

From
Kyotaro HORIGUCHI
Date:
Hello.

At Tue, 6 Nov 2018 15:53:37 +1100, Haribabu Kommi <kommi.haribabu@gmail.com> wrote in
<CAJrrPGcxZi4Z_SttnuUvYOaw+SAdk7+cJgYpuf7ao43vuJLH2w@mail.gmail.com>
> Thanks for confirmation of that PostgreSQL runs as service.
> 
> Based on the following details, we can decide whether this fix is required
> or not.
> 1. Starting of Postgres server using pg_ctl without service is of
> production use or not?
> 2. Without this fix, how difficult is the problem to find out that
> something is preventing the
> server to start? In case if it is easy to find out, may be better to
> provide some troubleshoot
> guide for windows users can help.
> 
> I am in favor of doc fix if it easy to find the problem instead of assuming
> the user usage.

I don't have an idea about which is better behavior, but does
this work for you?

https://docs.microsoft.com/en-us/windows/desktop/wer/collecting-user-mode-dumps

> These dumps are configured and controlled independently of the
> rest of the WER infrastructure. You can make use of the local
> dump collection even if WER is disabled or if the user cancels
> WER reporting. The local dump can be different than the dump sent
> to Microsoft.

I found that symantec offers this in the steps for error
reporting of its products.

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center



Re: [bug fix] Produce a crash dump before main() on Windows

From
Haribabu Kommi
Date:

On Mon, Mar 4, 2019 at 3:23 PM Kyotaro HORIGUCHI <horiguchi.kyotaro@lab.ntt.co.jp> wrote:
Hello.

At Tue, 6 Nov 2018 15:53:37 +1100, Haribabu Kommi <kommi.haribabu@gmail.com> wrote in <CAJrrPGcxZi4Z_SttnuUvYOaw+SAdk7+cJgYpuf7ao43vuJLH2w@mail.gmail.com>
> Thanks for confirmation of that PostgreSQL runs as service.
>
> Based on the following details, we can decide whether this fix is required
> or not.
> 1. Starting of Postgres server using pg_ctl without service is of
> production use or not?
> 2. Without this fix, how difficult is the problem to find out that
> something is preventing the
> server to start? In case if it is easy to find out, may be better to
> provide some troubleshoot
> guide for windows users can help.
>
> I am in favor of doc fix if it easy to find the problem instead of assuming
> the user usage.

I don't have an idea about which is better behavior, but does
this work for you?

https://docs.microsoft.com/en-us/windows/desktop/wer/collecting-user-mode-dumps

No, this option is not generating local dumps for postgresql, but this option is useful to
generate dumps for the applications that don't have a specific WER reporting.

 
> These dumps are configured and controlled independently of the
> rest of the WER infrastructure. You can make use of the local
> dump collection even if WER is disabled or if the user cancels
> WER reporting. The local dump can be different than the dump sent
> to Microsoft.

I found that symantec offers this in the steps for error
reporting of its products.

Adding some doc changes for the users to refer what can be the issue windows if the
PostgreSQL server doesn't start and there is no core file available.

Regards,
Haribabu Kommi
Fujitsu Australia

Re: [bug fix] Produce a crash dump before main() on Windows

From
Kyotaro HORIGUCHI
Date:
Hello.

At Tue, 5 Mar 2019 16:45:53 +1100, Haribabu Kommi <kommi.haribabu@gmail.com> wrote in
<CAJrrPGdO0pXW55EDW6NAz=-F=bNtEdJpjQJDKert0A3-zUmdew@mail.gmail.com>
> > I don't have an idea about which is better behavior, but does
> > this work for you?
> >
> >
> > https://docs.microsoft.com/en-us/windows/desktop/wer/collecting-user-mode-dumps
> >
> 
> No, this option is not generating local dumps for postgresql, but this
> option is useful to
> generate dumps for the applications that don't have a specific WER
> reporting.

Mmm. Right. It just turn on WER for specific progarms. It
donesn't override SetErrorMode().. Sorry for the noise.


> Adding some doc changes for the users to refer what can be the issue
> windows if the
> PostgreSQL server doesn't start and there is no core file available.

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center



Re: [bug fix] Produce a crash dump before main() on Windows

From
Amit Kapila
Date:
On Tue, Nov 6, 2018 at 10:24 AM Haribabu Kommi <kommi.haribabu@gmail.com> wrote:
>
> On Thu, Jul 26, 2018 at 3:52 PM Tsunakawa, Takayuki <tsunakawa.takay@jp.fujitsu.com> wrote:
>
>
> Thanks for confirmation of that PostgreSQL runs as service.
>
> Based on the following details, we can decide whether this fix is required or not.
> 1. Starting of Postgres server using pg_ctl without service is of production use or not?
> 2. Without this fix, how difficult is the problem to find out that something is preventing the
> server to start? In case if it is easy to find out, may be better to provide some troubleshoot
> guide for windows users can help.
>
> I am in favor of doc fix if it easy to find the problem instead of assuming the user usage.
>

Tsunakawa/Haribabu - By reading this thread briefly, it seems we need
some more inputs from other developers on whether to fix this or not,
so ideally the status of this patch should be 'Needs Review'.  Why it
is in 'Waiting on Author' state?  If something is required from
Author, then do we expect to see the updated patch in the next few
days?

-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com



RE: [bug fix] Produce a crash dump before main() on Windows

From
"Tsunakawa, Takayuki"
Date:
From: Amit Kapila [mailto:amit.kapila16@gmail.com]
> Tsunakawa/Haribabu - By reading this thread briefly, it seems we need
> some more inputs from other developers on whether to fix this or not,
> so ideally the status of this patch should be 'Needs Review'.  Why it
> is in 'Waiting on Author' state?  If something is required from
> Author, then do we expect to see the updated patch in the next few
> days?

Thank you for paying attention to this.  I think the patch is good, but someone else may have a different solution.  So
Imarked it as needs review.
 


Regards
Takayuki Tsunakawa



Re: [bug fix] Produce a crash dump before main() on Windows

From
Kyotaro Horiguchi
Date:
At Mon, 1 Jul 2019 00:04:47 +0000, "Tsunakawa, Takayuki" <tsunakawa.takay@jp.fujitsu.com> wrote in
<0A3221C70F24FB45833433255569204D1FC6515F@G01JPEXMBYT05>
> From: Amit Kapila [mailto:amit.kapila16@gmail.com]
> > Tsunakawa/Haribabu - By reading this thread briefly, it seems we need
> > some more inputs from other developers on whether to fix this or not,
> > so ideally the status of this patch should be 'Needs Review'.  Why it
> > is in 'Waiting on Author' state?  If something is required from
> > Author, then do we expect to see the updated patch in the next few
> > days?
> 
> Thank you for paying attention to this.  I think the patch is good, but someone else may have a different solution.
SoI marked it as needs review.
 

We are obliged to assume that we won't have the desired behavior
without detecting whether running as a service or not.

My investigation convinced me that there is no way for a process
to detect wheter it is running as a service (except the process
directly called from system (aka entry function)). In other
words, only pg_ctl knows that and other processes doesn't have a
clue for that. The processes other than postmaster can receive
that information via backend variables. But the postmaster has no
way to get the information from pg_ctl other than command line
parameter, environment variable or filesystem (or PIPE?).

If we see the complexity meets the benefit, we can use, say,
command line parameter, WER dialog can be shown when server is
started in console but the parameter being specified, but I don't
think it is a problem.

Thoughts?

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center



Re: [bug fix] Produce a crash dump before main() on Windows

From
Tom Lane
Date:
Kyotaro Horiguchi <horikyota.ntt@gmail.com> writes:
> We are obliged to assume that we won't have the desired behavior
> without detecting whether running as a service or not.

> My investigation convinced me that there is no way for a process
> to detect wheter it is running as a service (except the process
> directly called from system (aka entry function)). In other
> words, only pg_ctl knows that and other processes doesn't have a
> clue for that. The processes other than postmaster can receive
> that information via backend variables. But the postmaster has no
> way to get the information from pg_ctl other than command line
> parameter, environment variable or filesystem (or PIPE?).

> If we see the complexity meets the benefit, we can use, say,
> command line parameter, WER dialog can be shown when server is
> started in console but the parameter being specified, but I don't
> think it is a problem.

Not being a Windows user, I don't have much to say about the big
question of whether disabling WER is still a good idea or not.  But
I will say that in my experience, behavioral differences between
Postgres started manually and Postgres started as a daemon are bad.
So I think going out of our way to make the cases behave differently
on Windows is probably not a good plan.

            regards, tom lane



Re: [bug fix] Produce a crash dump before main() on Windows

From
Alvaro Herrera
Date:
On 2019-Jul-23, Tom Lane wrote:

> Kyotaro Horiguchi <horikyota.ntt@gmail.com> writes:

> > My investigation convinced me that there is no way for a process
> > to detect wheter it is running as a service (except the process
> > directly called from system (aka entry function)).

Maybe we can try calling pgwin32_is_service()?

> But I will say that in my experience, behavioral differences between
> Postgres started manually and Postgres started as a daemon are bad.
> So I think going out of our way to make the cases behave differently
> on Windows is probably not a good plan.

We already have such a difference in Windows -- elog.c uses it
extensively to use the eventlog to print if a service, console print if
not.

Given this thread's OP, ISTM failing to do likewise for this case makes
debugging problems difficult for no good reason.

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: [bug fix] Produce a crash dump before main() on Windows

From
Kyotaro Horiguchi
Date:
Hello.

On Wed, Jul 24, 2019 at 2:31 AM Alvaro Herrera <alvherre@2ndquadrant.com> wrote:
>
> On 2019-Jul-23, Tom Lane wrote:
>
> > Kyotaro Horiguchi <horikyota.ntt@gmail.com> writes:
>
> > > My investigation convinced me that there is no way for a process
> > > to detect wheter it is running as a service (except the process
> > > directly called from system (aka entry function)).
>
> Maybe we can try calling pgwin32_is_service()?

Ah, it might be viable. (I'm not sure though.) I didn't thought that
some process property we are enforcing is available.

> > But I will say that in my experience, behavioral differences between
> > Postgres started manually and Postgres started as a daemon are bad.
> > So I think going out of our way to make the cases behave differently
> > on Windows is probably not a good plan.
>
> We already have such a difference in Windows -- elog.c uses it
> extensively to use the eventlog to print if a service, console print if
> not.
>
> Given this thread's OP, ISTM failing to do likewise for this case makes
> debugging problems difficult for no good reason.

The difference of internal behavior is added to unify the external
(apparenet) behavior. It prevents WER dialog from showing while
running on console. Service doesn't show a dialog even if WER is
enabled. The remaining issue is we cannot obtain a dump when running
on console without being blocked by a dialog, but Windows just doesn't
allow it.. (Might be possible if we ignore Windows 7? 8? and earlier)

reagards
-- 
Kyotaro Horiguchi
NTT Open Source Software Center



Re: [bug fix] Produce a crash dump before main() on Windows

From
Alvaro Herrera
Date:
On 2019-Jul-24, Kyotaro Horiguchi wrote:

> Hello.
> 
> On Wed, Jul 24, 2019 at 2:31 AM Alvaro Herrera <alvherre@2ndquadrant.com> wrote:
> >
> > On 2019-Jul-23, Tom Lane wrote:
> >
> > > Kyotaro Horiguchi <horikyota.ntt@gmail.com> writes:
> >
> > > > My investigation convinced me that there is no way for a process
> > > > to detect wheter it is running as a service (except the process
> > > > directly called from system (aka entry function)).
> >
> > Maybe we can try calling pgwin32_is_service()?
> 
> Ah, it might be viable. (I'm not sure though.) I didn't thought that
> some process property we are enforcing is available.

Well, ereport() relies on this pretty extensively, so it seems okay to
rely on it for this too.

> The difference of internal behavior is added to unify the external
> (apparenet) behavior. It prevents WER dialog from showing while
> running on console. Service doesn't show a dialog even if WER is
> enabled.

Yeah, that seems to be what we want (namely, not to get the server
process blocked waiting for the user to click on the dialog box).

> The remaining issue is we cannot obtain a dump when running
> on console without being blocked by a dialog, but Windows just doesn't
> allow it.. (Might be possible if we ignore Windows 7? 8? and earlier)

I don't know what a "dump" is in this context, but in the errbacktrace
thread I linked to a Stackoverflow question where they mentioned the API
to obtain a stack backtrace for a process in Windows.  Maybe logging
that much info is sufficient for most cases.

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: [bug fix] Produce a crash dump before main() on Windows

From
Alvaro Herrera from 2ndQuadrant
Date:
On 2019-Jul-01, Tsunakawa, Takayuki wrote:

> From: Amit Kapila [mailto:amit.kapila16@gmail.com]
> > Tsunakawa/Haribabu - By reading this thread briefly, it seems we need
> > some more inputs from other developers on whether to fix this or not,
> > so ideally the status of this patch should be 'Needs Review'.  Why it
> > is in 'Waiting on Author' state?  If something is required from
> > Author, then do we expect to see the updated patch in the next few
> > days?
> 
> Thank you for paying attention to this.  I think the patch is good,
> but someone else may have a different solution.  So I marked it as
> needs review.

Tsunakawa-san, there's been some discussion downthread.  Could you
please submit an updated version of the patch?  I would like to get it
pushed before beta4 next week, if possible.

Thanks

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: [bug fix] Produce a crash dump before main() on Windows

From
Tom Lane
Date:
Alvaro Herrera from 2ndQuadrant <alvherre@alvh.no-ip.org> writes:
> Tsunakawa-san, there's been some discussion downthread.  Could you
> please submit an updated version of the patch?  I would like to get it
> pushed before beta4 next week, if possible.

[ confused... ]  I haven't been paying attention to this thread,
but is it really something we'd push into v12 at this point?

            regards, tom lane



Re: [bug fix] Produce a crash dump before main() on Windows

From
Michael Paquier
Date:
On Thu, Sep 05, 2019 at 11:49:03PM -0400, Tom Lane wrote:
> Alvaro Herrera from 2ndQuadrant <alvherre@alvh.no-ip.org> writes:
>> Tsunakawa-san, there's been some discussion downthread.  Could you
>> please submit an updated version of the patch?  I would like to get it
>> pushed before beta4 next week, if possible.
>
> [ confused... ]  I haven't been paying attention to this thread,
> but is it really something we'd push into v12 at this point?

The last patch submitted is here:
https://www.postgresql.org/message-id/0A3221C70F24FB45833433255569204D1F8ECF73@G01JPEXMBYT05
And based on the code paths it touches I would recommend to not play
with REL_12_STABLE at this stage.
--
Michael

Attachment

RE: [bug fix] Produce a crash dump before main() on Windows

From
"Tsunakawa, Takayuki"
Date:
From: Michael Paquier [mailto:michael@paquier.xyz]
> The last patch submitted is here:
> https://www.postgresql.org/message-id/0A3221C70F24FB45833433255569204D
> 1F8ECF73@G01JPEXMBYT05
> And based on the code paths it touches I would recommend to not play with
> REL_12_STABLE at this stage.

I'm reading the thread to see what I should do...  Anyway, I don't think we should rush to include this fix in PG 12,
too. But at the same time, I don't think it would be dangerous to put in PG 12.
 


Regards
Takayuki Tsunakawa




Re: [bug fix] Produce a crash dump before main() on Windows

From
Alvaro Herrera from 2ndQuadrant
Date:
On 2019-Sep-05, Tom Lane wrote:

> Alvaro Herrera from 2ndQuadrant <alvherre@alvh.no-ip.org> writes:
> > Tsunakawa-san, there's been some discussion downthread.  Could you
> > please submit an updated version of the patch?  I would like to get it
> > pushed before beta4 next week, if possible.
> 
> [ confused... ]  I haven't been paying attention to this thread,
> but is it really something we'd push into v12 at this point?

Well, IMV this is a backpatchable, localized bug fix.  I was thinking it
would be better to get some coverage in the upcoming beta, where it can
be fixed without too much harm if there are problems with it, before
backpatching further.  But if even that is considered too dangerous, I
guess we could put it in master for a while, and consider a backpatch
later.

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: [bug fix] Produce a crash dump before main() on Windows

From
Tom Lane
Date:
Alvaro Herrera from 2ndQuadrant <alvherre@alvh.no-ip.org> writes:
> On 2019-Sep-05, Tom Lane wrote:
>> [ confused... ]  I haven't been paying attention to this thread,
>> but is it really something we'd push into v12 at this point?

> Well, IMV this is a backpatchable, localized bug fix.

I dunno.  This thread is approaching two years old, and a quick
review shows few signs that we actually have any consensus on
making behavioral changes here.  If there is any consensus,
it's that the SetErrorMode calls should depend on checking
pgwin32_is_service(); but we haven't even seen a patch that
does that, let alone tested it.  I think we're way too close
to beta4 wrap to be considering pushing such a patch the moment
it appears.

BTW, I also violently dislike taking Windows-specific stuff out of
startup_hacks where it belongs and putting it into main() where
it doesn't.  I think the entire Windows bit in front of get_progname
should migrate into startup_hacks.  Surely the odds of failure
inside get_progname are not worth worrying about --- they seem
significantly less than the odds of failure inside
pgwin32_is_service(), for instance.

            regards, tom lane



RE: [bug fix] Produce a crash dump before main() on Windows

From
"Tsunakawa, Takayuki"
Date:
From: Tom Lane [mailto:tgl@sss.pgh.pa.us]
> Alvaro Herrera from 2ndQuadrant <alvherre@alvh.no-ip.org> writes:
> > Well, IMV this is a backpatchable, localized bug fix.
> 
> I dunno.  This thread is approaching two years old, and a quick
> review shows few signs that we actually have any consensus on
> making behavioral changes here.  If there is any consensus,
> it's that the SetErrorMode calls should depend on checking
> pgwin32_is_service(); but we haven't even seen a patch that
> does that, let alone tested it.  I think we're way too close
> to beta4 wrap to be considering pushing such a patch the moment
> it appears.

We don't have to call pgwin32_is_service() to determine whether we call SetErrorMode() in postmaster.c because:

* The dialog box doesn't appear under Windows service, whether PostgreSQL itself starts as a service or another
(server)application runs as a service and does "pg_ctl start."
 

* It's OK for the dialog box to appear when the user runs "pg_ctl start" on the command prompt.  That usage is
interactive,and should not be for production use (postgres processes vanish when the user mistakenly presses Ctrl+C, or
closesthe command prompt).  Even now, the dialog box pops up if postmaster crashes before main().
 


> BTW, I also violently dislike taking Windows-specific stuff out of
> startup_hacks where it belongs and putting it into main() where
> it doesn't.  I think the entire Windows bit in front of get_progname
> should migrate into startup_hacks.  Surely the odds of failure
> inside get_progname are not worth worrying about --- they seem
> significantly less than the odds of failure inside
> pgwin32_is_service(), for instance.

Agreed.  Fixed.  I changed the CF status to "need review."


Regards
Takayuki Tsunakawa


Attachment

Re: [bug fix] Produce a crash dump before main() on Windows

From
Alvaro Herrera from 2ndQuadrant
Date:
On 2019-Sep-10, Tsunakawa, Takayuki wrote:

> Agreed.  Fixed.  I changed the CF status to "need review."

I have changed it again to "ready for committer".  We could really use
help from a windows-enabled committer on this patch, I think.

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: [bug fix] Produce a crash dump before main() on Windows

From
Michael Paquier
Date:
On Tue, Sep 10, 2019 at 06:44:48AM +0000, Tsunakawa, Takayuki wrote:
> We don't have to call pgwin32_is_service() to determine whether we
> call SetErrorMode() in postmaster.c because:
>
> * The dialog box doesn't appear under Windows service, whether
> * PostgreSQL itself starts as a service or another (server)
> * application runs as a service and does "pg_ctl start."
>
> * It's OK for the dialog box to appear when the user runs "pg_ctl
> * start" on the command prompt.  That usage is interactive, and
> * should not be for production use (postgres processes vanish when
> * the user mistakenly presses Ctrl+C, or closes the command prompt).
> * Even now, the dialog box pops up if postmaster crashes before
> * main().

Reading the thread again from scratch, I am under the impression that
we do not have an actual consensus on this patch :)

Doesn't your first bullet point actually contradict the second one?
Your second point looks sensible from a usage point of view (aka show
a box when pg_ctl start fails from the command prompt because somebody
typed the command manually), however I am really worried about the
first point.  Imagine an application which relies on Postgres, still
does *not* start it as a service but uses "pg_ctl start"
automatically.  This could be triggered as part of another service
startup which calls say system(), or as another script.  Wouldn't that
cause potentially a freeze?  I am also not sure that I quite
understand why a popup would never show up if a different service
startup triggers pg_ctl start by itself that fails.
--
Michael

Attachment

RE: [bug fix] Produce a crash dump before main() on Windows

From
"Tsunakawa, Takayuki"
Date:
From: Michael Paquier [mailto:michael@paquier.xyz]
> Imagine an application which relies on Postgres, still does *not* start
> it as a service but uses "pg_ctl start"
> automatically.  This could be triggered as part of another service startup
> which calls say system(), or as another script.  Wouldn't that cause
> potentially a freeze?  

Do you mean by "another service startup":

another service (Windows service)
-> an application (not a Windows service)
  -> pg_ctl start

Then pg_ctl runs under the Windows service environment, and the popup won't appear.


> I am also not sure that I quite understand why a
> popup would never show up if a different service startup triggers pg_ctl
> start by itself that fails.

I experimented it with a sample program that I showed in this thread.  A program invoked by a Windows services also
runsin the service.
 


Regards
Takayuki Tsunakawa






Re: [bug fix] Produce a crash dump before main() on Windows

From
Michael Paquier
Date:
On Wed, Sep 11, 2019 at 04:15:24AM +0000, Tsunakawa, Takayuki wrote:
> From: Michael Paquier [mailto:michael@paquier.xyz]
>> Imagine an application which relies on Postgres, still does *not* start
>> it as a service but uses "pg_ctl start"
>> automatically.  This could be triggered as part of another service startup
>> which calls say system(), or as another script.  Wouldn't that cause
>> potentially a freeze?
>
> Do you mean by "another service startup":
>
> another service (Windows service)
> -> an application (not a Windows service)
>   -> pg_ctl start
>
> Then pg_ctl runs under the Windows service environment, and the
> popup won't appear.

Yes, I meant a pg_ctl start command invoked as par of another service
startup in the way you describe it here.

>> I am also not sure that I quite understand why a
>> popup would never show up if a different service startup triggers pg_ctl
>> start by itself that fails.
>
> I experimented it with a sample program that I showed in this
> thread.  A program invoked by a Windows services also runs in the
> service.

Okay, so this means that the service state of the parent is inherited
and that we can rely on that.  I was not sure about that.  Got it
now.  Now, there are cases where it is possible to start an app
without using a service though, no?  For example using the task
manager it is possible to trigger an app when the OS starts.  What
happens in the case where we have an app which starts Postgres using
"pg_ctl start" as part of its startup process but it fails and pops an
error window, freezing the whole?  Can this happen?  Sorry to sound
noisy, but I am rather scared of the potential consequences of this
change without a GUC to control it as Craig mentioned at the beginning
of the thread.  Perhaps that's a niche case, but it could be annoying
for some users.
--
Michael

Attachment

Re: [bug fix] Produce a crash dump before main() on Windows

From
Michael Paquier
Date:
On Wed, Sep 11, 2019 at 01:55:45PM +0900, Michael Paquier wrote:
> Okay, so this means that the service state of the parent is inherited
> and that we can rely on that.  I was not sure about that.  Got it
> now.  Now, there are cases where it is possible to start an app
> without using a service though, no?  For example using the task
> manager it is possible to trigger an app when the OS starts.  What
> happens in the case where we have an app which starts Postgres using
> "pg_ctl start" as part of its startup process but it fails and pops an
> error window, freezing the whole?  Can this happen?  Sorry to sound
> noisy, but I am rather scared of the potential consequences of this
> change without a GUC to control it as Craig mentioned at the beginning
> of the thread.  Perhaps that's a niche case, but it could be annoying
> for some users.

We still have a patch registered in the CF for that thread, and it
seems to me that those comments have not been answered yet.  Is that
overthinking?  Thoughts?
--
Michael

Attachment

Re: [bug fix] Produce a crash dump before main() on Windows

From
Craig Ringer
Date:
On Wed, 18 Jul 2018 at 12:10, Kyotaro HORIGUCHI <horiguchi.kyotaro@lab.ntt.co.jp> wrote:
At Wed, 18 Jul 2018 11:12:06 +0800, Craig Ringer <craig@2ndquadrant.com> wrote in <CAMsr+YHv0KfWhA+Z=UVydpvLQ-QyLaidBqpHxQ=YqTPiDGG6dg@mail.gmail.com>
> On 26 February 2018 at 12:06, Tsunakawa, Takayuki <
> tsunakawa.takay@jp.fujitsu.com> wrote:
>
> > From: Craig Ringer [mailto:craig@2ndquadrant.com]
> > > The patch proposed here means that early crashes will invoke WER. If
> > we're
> > > going to allow WER we should probably just do so unconditionally.
> > >
> > > I'd be in favour of leaving WER on when we find out we're in a
> > noninteractive
> > > service too, but that'd be a separate patch for pg11+ only.
> >
> > As for PG11+, I agree that we want to always leave WER on.  That is, call
> > SetErrorMode(SEM_FAILCRITICALERRORS) but not specify
> > SEM_NOGPFAULTERRORBOX.  The problem with the current specification of
> > PostgreSQL is that the user can only get crash dumps in a fixed folder
> > $PGDATA\crashdumps.  That location is bad because the crash dumps will be
> > backed up together with the database cluster without the user noticing it.
> > What's worse, the crash dumps are large.  With WER, the user can control
> > the location and size of crash dumps.
> >
>
> Yeah, that's quite old and dates back to when Windows didn't offer much if
> any control over WER in services.

Yeah. If we want to take a crash dump, we cannot have
auto-restart. Since it is inevitable what we can do for this
would be adding a new knob for that, which cannot be turned on
together with restart_after_crash...?

Why?

Mind you, I don't much care about restart_after_crash, I think it's thoroughly obsolete. Windows has been capable of restarting failed services forever, and systemd does so too. There's little reason to have postgres try to do its own self-recovery now, and I prefer to disable it so the postmaster can cleanly exit and get a fresh new launch.

--
 Craig Ringer                   http://www.2ndQuadrant.com/
 2ndQuadrant - PostgreSQL Solutions for the Enterprise

Re: [bug fix] Produce a crash dump before main() on Windows

From
Craig Ringer
Date:
On Mon, 23 Jul 2018 at 16:45, Michael Paquier <michael@paquier.xyz> wrote:
On Mon, Jul 23, 2018 at 08:16:52AM +0000, Tsunakawa, Takayuki wrote:
> I guess that is due to some missing files related to the libraries
> listed in shared_preload_library.  If so, no, because this patch
> relates to failure before main().

No, I really mean a library dependency failure.  For example, imagine
that Postgres is compiled on Windows dynamically, and that it depends on
libxml2.dll, which is itself compiled dynamically.  Then imagine, in a
custom build echosystem, that a folk comes in and adds lz support to
libxml2 on Windows.  If Postgres still consumes libxml2 but does not add
in its PATH a version of lz, then a backend in need of libxml2 would
fail to load, causing Postgres to not start properly.  True, painful,
story.

What's super fun about this is the error message. Guess which error Windows will emit? (Some paraphrasing for exact wording):

* "There was a problem starting C:\Program Files\PostgreSQL\11\bin\postgres.exe: The specified module could not be found."

* "There was a problem starting C:\Program Files\PostgreSQL\11\bin\postgres.exe: module "libxml2.dll" could not be found."
 
* "The program can't start because "libxml2.dll" is missing from your computer."

* "The program "C:\Program Files\PostgreSQL\11\bin\postgres.exe" can't start because an error was encountered when loading required library "C:\Program Files\PostgreSQL\11\lib\libxml.dll": libxml.dll requires "liblz.dll" but it could not be found."

Hint: not the last one.

It'll at best complain about libxml.dll being missing, despite it being very obviously present.

I had to go hunt around with Dependency Walker to figure out the actual missing DLL the last time I had to deal with this.

--
 Craig Ringer                   http://www.2ndQuadrant.com/
 2ndQuadrant - PostgreSQL Solutions for the Enterprise

Re: [bug fix] Produce a crash dump before main() on Windows

From
Michael Paquier
Date:
On Sun, Nov 10, 2019 at 06:03:08PM +0800, Craig Ringer wrote:
> I had to go hunt around with Dependency Walker to figure out the actual
> missing DLL the last time I had to deal with this.

Ahah.  Yes, That's exactly what I used a couple of years back on that.
These errors are hard to track and find.
--
Michael

Attachment

Re: [bug fix] Produce a crash dump before main() on Windows

From
Tom Lane
Date:
Michael Paquier <michael@paquier.xyz> writes:
> On Sun, Nov 10, 2019 at 06:03:08PM +0800, Craig Ringer wrote:
>> I had to go hunt around with Dependency Walker to figure out the actual
>> missing DLL the last time I had to deal with this.

> Ahah.  Yes, That's exactly what I used a couple of years back on that.
> These errors are hard to track and find.

I feel the pain on this --- indirect dependencies are not well reported
on any platform that I deal with :-(.  Still, I'm having a hard time
convincing myself that this patch is a good idea.  It seems very likely
that it will break some startup scenarios, and the incremental benefit
in error reporting seems marginal.

            regards, tom lane



Re: [bug fix] Produce a crash dump before main() on Windows

From
Michael Paquier
Date:
On Sun, Nov 10, 2019 at 11:24:34AM -0500, Tom Lane wrote:
> I feel the pain on this --- indirect dependencies are not well reported
> on any platform that I deal with :-(.  Still, I'm having a hard time
> convincing myself that this patch is a good idea.  It seems very likely
> that it will break some startup scenarios, and the incremental benefit
> in error reporting seems marginal.

Thanks for your input, Tom.  This matches with my opinion about this
thread and this patch.
--
Michael

Attachment

Re: [bug fix] Produce a crash dump before main() on Windows

From
Michael Paquier
Date:
On Mon, Nov 11, 2019 at 09:02:01AM +0900, Michael Paquier wrote:
> Thanks for your input, Tom.  This matches with my opinion about this
> thread and this patch.

And switched the CF entry of the patch as rejected.
--
Michael

Attachment