Re: AssertLog instead of Assert in some places - Mailing list pgsql-hackers

From Andres Freund
Subject Re: AssertLog instead of Assert in some places
Date
Msg-id 20230811210447.jiwyob7nnaa3kjxq@awork3.anarazel.de
Whole thread Raw
In response to Re: AssertLog instead of Assert in some places  (Peter Geoghegan <pg@bowt.ie>)
Responses Re: AssertLog instead of Assert in some places
List pgsql-hackers
Hi,

On 2023-08-11 13:19:34 -0700, Peter Geoghegan wrote:
> On Fri, Aug 11, 2023 at 12:26 PM Andres Freund <andres@anarazel.de> wrote:
> > > For example, dealing with core dumps left behind by the regression
> > > tests can be annoying.
> >
> > Hm. I don't have a significant problem with that. But I can see it being
> > problematic. Unfortunately, short of preventing core dumps from happening,
> > I don't think we really can do much about that - whatever is running the tests
> > shouldn't have privileges to change system wide settings about where core
> > dumps end up etc.
>
> I was unclear. I wasn't talking about managing core dumps. I was
> talking about using core dumps to get a simple backtrace, that just
> gives me some very basic information. I probably shouldn't have even
> mentioned core dumps, because what I'm really concerned about is the
> workflow around getting very basic information about assertion
> failures. Not around core dumps per se.
>
> The inconsistent approach needed to get a simple, useful backtrace for
> assertion failures (along with other basic information associated with
> the failure) is a real problem. Particularly when running the tests.
> Most individual assertion failures that I see are for code that I'm
> practically editing in real time. So shaving cycles here really
> matters.

Ah, yes. Agreed, obviously.


> For one thing the symbol mangling that we have around the built-in
> backtraces makes them significantly less useful. I really hope that
> your libbacktrace patch gets committed soon, since that looks like it
> would be a nice quality of life improvement, all on its own.

I've been hacking a further on it:
https://github.com/anarazel/postgres/tree/backtrace

Haven't yet posted a new version. Doing it properly for fronted tools has some
dependencies on threading stuff. I'm hoping Thomas' patch for that will go in.


Now it also intercepts segfaults and prints
backtraces for them, if that's possible (it requires libbacktrace to be async
signal safe, which it isn't on all platforms).

Where supported, a crash (distinguishing from assertion failures) will now
report something like:

process with pid: 2900527 received signal: SIGSEGV, si_code: 1, si_addr: 0xdeadbeef
    [0x5628ec45212f] pg_fatalsig_handler+0x20f: ../../../../home/andres/src/postgresql/src/common/pg_backtrace.c:615
    [0x7fc4b743650f] [unknown]
    [0x5628ec14897c] check_root+0x19c (inlined): ../../../../home/andres/src/postgresql/src/backend/main/main.c:393
    [0x5628ec14897c] main+0x19c: ../../../../home/andres/src/postgresql/src/backend/main/main.c:183

after I added
    *(volatile int*)0xdeadbeef = 1;
to check_root().


For signals sent by users, it'd show the pid of the process sending the signal
on most OSs. I really would like some generalized infrastructure for that, so
that we can report for things like query cancellations.


As the patch stands, the only OS that doesn't yet have useful "backtrace on
crash" support with that is windows, as libbacktrace unfortunately isn't
signal safe on windows. But it'd still provide useful backtraces on
Assert()s. I haven't yet figured out whether/when it's required to be signal
safe on windows though - crashes are intercepted by
SetUnhandledExceptionFilter() - I don't understand the precise constraints of
that. Plenty people seem to generate backtraces on crashes on windows using
that, without concerns for signal safety like things.


Currently Macos CI doesn't use libbacktrace, but as it turns out
backtrace_symbols() on windows is a heck of a lot better than on glibc
platforms.  CI for windows with visual studio doesn't have libbacktrace
installed yet (and has the aforementioned signal safety issue), I think it's
installed for windows w/ mingw.


> It would also be great if the tests spit out information about
> assertion failures that was reasonably complete (backtrace without any
> mangling, query text included, other basic context), reliably and
> uniformly -- it shouldn't matter if it's from TAP or pg_regress test
> SQL scripts.

Hm. What other basic context are you thinking of? Pid is obvious. I guess
backend type could be useful too, but normally be inferred from the stack
trace pretty easily. Application name could be useful to know which test
caused the crash.

I'm a bit wary about trying to print things like query text - what if that
string points to memory not terminated by a \0? I guess we could use an
approach similar to pgstat_get_crashed_backend_activity().

One issue with reporting crashes from signal handlers is that the obvious way
to make that signal safe (lots of small writes) leads to the potential for
interspersed lines.  It's probably worth having a statically sized buffer that
will commonly be large enough to print a whole backtrace. When too small, it
should include the pid at the start of every "chunk".


> Which kind of test happened to be involved is usually not interesting to me
> here (even the query text won't usually be interesting), so being forced to
> think about it slows me down quite a lot.

Interesting - I quite often end up spending time trying to dig out which query
from what sql file triggered a crash, so I can try to trigger it in
isolation. I often wished the server knew the source line associated with the
query. Enough that I pondered ways to have psql transport that knowledge to the
the server.

Greetings,

Andres Freund



pgsql-hackers by date:

Previous
From: Peter Geoghegan
Date:
Subject: Re: run pgindent on a regular basis / scripted manner
Next
From: Andres Freund
Date:
Subject: Re: run pgindent on a regular basis / scripted manner