Windows crash / abort handling - Mailing list pgsql-hackers

From Andres Freund
Subject Windows crash / abort handling
Msg-id 20211005193033.tg4pqswgvu3hcolm@alap3.anarazel.de
Responses Re: Windows crash / abort handling  (Craig Ringer <craig.ringer@enterprisedb.com>)
Re: Windows crash / abort handling  (Andrew Dunstan <andrew@dunslane.net>)
List pgsql-hackers
Hi,

As threatened in [1]... For CI, originally in the AIO project but now more
generally, I wanted to get Windows backtraces as part of CI. I was also
confused about why Visual Studio's "just in time debugging" (i.e. a window
popping up offering to debug a process when it crashes) didn't work with
postgres.

My first attempt was to try to use the existing crashdump stuff in
pgwin32_install_crashdump_handler(). That's not really quite what I want,
because it only handles postmaster rather than any binary, but I thought it'd
be a good start. But outside of toy situations it didn't work for me.

A bunch of debugging later I figured out that the reason neither the
SetUnhandledExceptionFilter() nor JIT debugging works is that the
SEM_NOGPFAULTERRORBOX in the
  SetErrorMode(SEM_FAILCRITICALERRORS | SEM_NOGPFAULTERRORBOX);
we do in startup_hacks() prevents the paths dealing with crashes from being
reached.

The SEM_NOGPFAULTERRORBOX hails from:

commit 27bff7502f04ee01237ed3f5a997748ae43d3a81
Author: Bruce Momjian <bruce@momjian.us>
Date:   2006-06-12 16:17:20 +0000

    Prevent Win32 from displaying a popup box on backend crash.  Instead let
    the postmaster deal with it.

    Magnus Hagander


I actually see error popups despite SEM_NOGPFAULTERRORBOX, at least for paths
reaching abort() (and thus our assertions).

The reason for abort() error boxes not being suppressed appears to be that in
debug mode a separate facility is responsible for that: [2], [3]

"The default behavior is to print the message. _CALL_REPORTFAULT, if set,
specifies that a Watson crash dump is generated and reported when abort is
called. By default, crash dump reporting is enabled in non-DEBUG builds."

We apparently need _set_abort_behavior(_CALL_REPORTFAULT) to have abort()
behave the same between debug and release builds. [4]
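
On the two parameters asked about in [4]: going by the Microsoft documentation
in [3], the first argument holds the new flag values and the second is a mask
selecting which flag bits are changed; the previous flags are returned. The
sketch below is only a portable model of that reading, not the CRT's code. The
names model_set_abort_behavior and MODEL_* are hypothetical, and the constants
are assumed to mirror _WRITE_ABORT_MSG (0x1) and _CALL_REPORTFAULT (0x2).

```c
#include <assert.h>
#include <stddef.h>

/* Assumed to mirror _WRITE_ABORT_MSG and _CALL_REPORTFAULT from <stdlib.h>. */
#define MODEL_WRITE_ABORT_MSG	0x1u
#define MODEL_CALL_REPORTFAULT	0x2u

/*
 * Hypothetical model of _set_abort_behavior(flags, mask): only the bits
 * selected by 'mask' are replaced by the corresponding bits of 'flags'.
 * Returns the previous flag word, as the documentation describes.
 */
static unsigned int
model_set_abort_behavior(unsigned int *state,
						 unsigned int flags, unsigned int mask)
{
	unsigned int old;

	assert(state != NULL);
	old = *state;
	*state = (old & ~mask) | (flags & mask);
	return old;
}
```

Under this reading, enabling crash reporting in every build type would be
_set_abort_behavior(_CALL_REPORTFAULT, _CALL_REPORTFAULT), and passing 0 with
a mask of _WRITE_ABORT_MSG would silence only the abort message.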


To prevent the error popups we appear to at least need to call
_CrtSetReportMode(). The docs say:

  If you do not call _CrtSetReportMode to define the output destination of
  messages, then the following defaults are in effect:

      Assertion failures and errors are directed to a debug message window.

We can configure it so that this output goes to stderr, by calling
    _CrtSetReportMode(_CRT_ASSERT, _CRTDBG_MODE_FILE | _CRTDBG_MODE_DEBUG);
    _CrtSetReportFile(_CRT_ASSERT, _CRTDBG_FILE_STDERR);
(and the same for _CRT_ERROR and perhaps _CRT_WARN)
which removes the default _CRTDBG_MODE_WNDW.

It's possible that we'd need to do more than this, but this was sufficient to
get crash reports for segfaults and abort() in both assert and release builds,
without seeing an error popup.
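
A minimal sketch of the redirection described above, applied to all three CRT
report types. The helper name redirect_crt_reports_to_stderr and its return
value are hypothetical, and including _CRT_WARN is an assumption (the text only
says "perhaps"). Everything CRT-specific is guarded by _MSC_VER, so the helper
compiles to a no-op on other toolchains.

```c
#ifdef _MSC_VER
#include <crtdbg.h>
#endif

/*
 * Hypothetical helper: send CRT debug reports to stderr (and the debugger
 * output) instead of a message window.  Returns the number of report types
 * reconfigured, which is 0 when not built with MSVC.
 */
static int
redirect_crt_reports_to_stderr(void)
{
	int			configured = 0;

#ifdef _MSC_VER
	const int	report_types[] = {_CRT_WARN, _CRT_ERROR, _CRT_ASSERT};
	int			i;

	for (i = 0; i < 3; i++)
	{
		/* Replaces the default _CRTDBG_MODE_WNDW; -1 signals failure. */
		if (_CrtSetReportMode(report_types[i],
							  _CRTDBG_MODE_FILE | _CRTDBG_MODE_DEBUG) == -1)
			continue;
		_CrtSetReportFile(report_types[i], _CRTDBG_FILE_STDERR);
		configured++;
	}
#endif

	return configured;
}
```

Such a helper would presumably be called once, early in process startup, in
every binary that should report this way. Where exactly (for instance next to
the SetErrorMode() call in startup_hacks()) is an assumption, not something
settled above.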


To actually get the crash reports I ended up doing the following on the OS
level [5]:

    Set-ItemProperty -Path 'HKLM:\SOFTWARE\Microsoft\Windows NT\CurrentVersion\AeDebug' -Name 'Debugger' -Value '\"C:\Windows Kits\10\Debuggers\x64\cdb.exe\" -p %ld -e %ld -g -kqm -c \".lines -e; .symfix+ ;.logappend c:\cirrus\crashlog.txt; !peb; ~*kP ; .logclose ; q \"' ; `

    New-ItemProperty -Path 'HKLM:\SOFTWARE\Microsoft\Windows NT\CurrentVersion\AeDebug' -Name 'Auto' -Value 1 -PropertyType DWord ; `

    Get-ItemProperty -Path 'HKLM:\SOFTWARE\Microsoft\Windows NT\CurrentVersion\AeDebug' -Name Debugger; `

This requires 'cdb' to be present, which is included in the Windows 10 SDK (or
the SDK for other OS versions; it doesn't appear to have changed much).
Whenever there's an unhandled crash, cdb.exe is invoked with the parameters
above and appends the crash report to crashlog.txt.

Alternatively we can generate "minidumps" [6], but that doesn't appear to be more
helpful for CI purposes at least - all we'd do is to create a backtrace using
the same tool. But it might be helpful for local development, to e.g. analyze
crashes in more detail.

The above ends up dumping all crashes into a single file, but that can
probably be improved. But cdb is so gnarly that I wanted to stop looking once
I got this far...


Andrew, I wonder if something like this could make sense for Windows BF animals?


Greetings,

Andres Freund

[1] https://postgr.es/m/20211001222752.wrz7erzh4cajvgp6%40alap3.anarazel.de
[2] https://docs.microsoft.com/en-us/cpp/c-runtime-library/reference/crtsetreportmode?view=msvc-160
[3] https://docs.microsoft.com/en-us/cpp/c-runtime-library/reference/set-abort-behavior?view=msvc-160
[4] If anybody can explain to me what the two different parameters to
    _set_abort_behavior() do, I'd be all ears
[5] https://docs.microsoft.com/en-us/windows/win32/debug/configuring-automatic-debugging
[6] https://docs.microsoft.com/en-us/windows/win32/wer/wer-settings


