Re: Windows crash / abort handling - Mailing list pgsql-hackers

From Craig Ringer
Subject Re: Windows crash / abort handling
Date
Msg-id CAGRY4nz_rN-5fD+AY4F4te1sNGOAoYYYP6VD6CKRSob2eMZJuA@mail.gmail.com
In response to Windows crash / abort handling  (Andres Freund <andres@anarazel.de>)
Responses Re: Windows crash / abort handling  (Andres Freund <andres@anarazel.de>)
List pgsql-hackers


On Wed, 6 Oct 2021, 03:30 Andres Freund, <andres@anarazel.de> wrote:
Hi,


My first attempt was to try to use the existing crashdump stuff in
pgwin32_install_crashdump_handler(). That's not really quite what I want,
because it only handles postmaster rather than any binary, but I thought it'd
be a good start. But outside of toy situations it didn't work for me.

Odd. It has usually worked for me, and it's definitely not limited to the postmaster. But it will fall down on OOM, a smashed stack, and other states where in-process self-debugging is likely to fail.

A bunch of debugging later I figured out that the reason neither the
SetUnhandledExceptionFilter() nor JIT debugging works is that the
SEM_NOGPFAULTERRORBOX in the
  SetErrorMode(SEM_FAILCRITICALERRORS | SEM_NOGPFAULTERRORBOX);
we do in startup_hacks() prevents the paths dealing with crashes from being
reached.

Right.

I patch this out when working on Windows because it's a real pain.

I keep meaning to propose that we remove this functionality entirely. It's obsolete. It was introduced back in the days when DrWatson.exe ("Windows Error Reporting") used to launch an interactive prompt asking the user what to do when a process crashed. This would block the crashed process from exiting, making everything grind to a halt until the user interacted with the UI. Even for a service process.

Not fun on a headless or remote server.

These days Windows handles all this a lot more sensibly, and blocking crash reporting is quite obsolete and unhelpful.

I'd like to just remove it.

If we can't do that I'd like to at least make it optional.
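
A minimal sketch of what "optional" could look like, assuming a hypothetical environment variable PG_ENABLE_CRASH_REPORTING and hypothetical helper names (none of these exist in PostgreSQL today). It keeps SEM_FAILCRITICALERRORS unconditionally and only adds SEM_NOGPFAULTERRORBOX when crash reporting has not been requested. The flag values are repeated for non-Windows builds only so that the flag logic can be compiled and checked anywhere.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

#ifdef _WIN32
#include <windows.h>
#else
/* Values as defined in the Windows SDK headers, repeated here only so
 * the flag logic compiles off Windows. */
#define SEM_FAILCRITICALERRORS 0x0001u
#define SEM_NOGPFAULTERRORBOX  0x0002u
#endif

/* Hypothetical environment variable; not an existing PostgreSQL setting. */
#define CRASH_REPORTING_ENV "PG_ENABLE_CRASH_REPORTING"

/*
 * Compute the mode to pass to SetErrorMode().  When crash reporting is
 * allowed, SEM_NOGPFAULTERRORBOX is left out so that unhandled-exception
 * filters and JIT debuggers can be reached.
 */
unsigned int
choose_error_mode(bool allow_crash_reporting)
{
    unsigned int mode = SEM_FAILCRITICALERRORS;

    if (!allow_crash_reporting)
        mode |= SEM_NOGPFAULTERRORBOX;
    return mode;
}

/* True if the hypothetical environment variable is set to "1". */
bool
crash_reporting_requested(void)
{
    const char *val = getenv(CRASH_REPORTING_ENV);

    return val != NULL && strcmp(val, "1") == 0;
}

/*
 * Hypothetical stand-in for the SetErrorMode() call in startup_hacks().
 * Returns the mode that was chosen; only calls SetErrorMode() on Windows.
 */
unsigned int
apply_error_mode(void)
{
    unsigned int mode = choose_error_mode(crash_reporting_requested());

    /* SEM_FAILCRITICALERRORS is always kept. */
    assert((mode & SEM_FAILCRITICALERRORS) != 0);
#ifdef _WIN32
    SetErrorMode(mode);
#endif
    return mode;
}
```

With the variable unset the behaviour matches the current call, so the default would not change. A GUC or command-line switch might be a better fit than an environment variable; the variable is just the simplest thing to show at a point this early in startup.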

Alternatively we can generate "minidumps" [6], but that doesn't appear to be more
helpful for CI purposes at least - all we'd do is to create a backtrace using
the same tool. But it might be helpful for local development, to e.g. analyze
crashes in more detail.

They're immensely helpful when a backtrace isn't enough, but backtraces are certainly the first step for CI use.
