Re: AssertLog instead of Assert in some places - Mailing list pgsql-hackers
From | Andres Freund |
---|---|
Subject | Re: AssertLog instead of Assert in some places |
Date | |
Msg-id | 20230811210447.jiwyob7nnaa3kjxq@awork3.anarazel.de Whole thread Raw |
In response to | Re: AssertLog instead of Assert in some places (Peter Geoghegan <pg@bowt.ie>) |
Responses |
Re: AssertLog instead of Assert in some places
|
List | pgsql-hackers |
Hi, On 2023-08-11 13:19:34 -0700, Peter Geoghegan wrote: > On Fri, Aug 11, 2023 at 12:26 PM Andres Freund <andres@anarazel.de> wrote: > > > For example, dealing with core dumps left behind by the regression > > > tests can be annoying. > > > > Hm. I don't have a significant problem with that. But I can see it being > > problematic. Unfortunately, short of preventing core dumps from happening, > > I don't think we really can do much about that - whatever is running the tests > > shouldn't have privileges to change system wide settings about where core > > dumps end up etc. > > I was unclear. I wasn't talking about managing core dumps. I was > talking about using core dumps to get a simple backtrace, that just > gives me some very basic information. I probably shouldn't have even > mentioned core dumps, because what I'm really concerned about is the > workflow around getting very basic information about assertion > failures. Not around core dumps per se. > > The inconsistent approach needed to get a simple, useful backtrace for > assertion failures (along with other basic information associated with > the failure) is a real problem. Particularly when running the tests. > Most individual assertion failures that I see are for code that I'm > practically editing in real time. So shaving cycles here really > matters. Ah, yes. Agreed, obviously. > For one thing the symbol mangling that we have around the built-in > backtraces makes them significantly less useful. I really hope that > your libbacktrace patch gets committed soon, since that looks like it > would be a nice quality of life improvement, all on its own. I've been hacking a further on it: https://github.com/anarazel/postgres/tree/backtrace Haven't yet posted a new version. Doing it properly for fronted tools has some dependencies on threading stuff. I'm hoping Thomas' patch for that will go in. Now it also intercepts segfaults and prints backtraces for them, if that's possible (it requires libbacktrace to be async signal safe, which it isn't on all platforms). Where supported, a crash (distinguishing from assertion failures) will now report something like: process with pid: 2900527 received signal: SIGSEGV, si_code: 1, si_addr: 0xdeadbeef [0x5628ec45212f] pg_fatalsig_handler+0x20f: ../../../../home/andres/src/postgresql/src/common/pg_backtrace.c:615 [0x7fc4b743650f] [unknown] [0x5628ec14897c] check_root+0x19c (inlined): ../../../../home/andres/src/postgresql/src/backend/main/main.c:393 [0x5628ec14897c] main+0x19c: ../../../../home/andres/src/postgresql/src/backend/main/main.c:183 after I added *(volatile int*)0xdeadbeef = 1; to check_root(). For signals sent by users, it'd show the pid of the process sending the signal on most OSs. I really would like some generalized infrastructure for that, so that we can report for things like query cancellations. As the patch stands, the only OS that doesn't yet have useful "backtrace on crash" support with that is windows, as libbacktrace unfortunately isn't signal safe on windows. But it'd still provide useful backtraces on Assert()s. I haven't yet figured out whether/when it's required to be signal safe on windows though - crashes are intercepted by SetUnhandledExceptionFilter() - I don't understand the precise constraints of that. Plenty people seem to generate backtraces on crashes on windows using that, without concerns for signal safety like things. Currently Macos CI doesn't use libbacktrace, but as it turns out backtrace_symbols() on windows is a heck of a lot better than on glibc platforms. CI for windows with visual studio doesn't have libbacktrace installed yet (and has the aforementioned signal safety issue), I think it's installed for windows w/ mingw. > It would also be great if the tests spit out information about > assertion failures that was reasonably complete (backtrace without any > mangling, query text included, other basic context), reliably and > uniformly -- it shouldn't matter if it's from TAP or pg_regress test > SQL scripts. Hm. What other basic context are you thinking of? Pid is obvious. I guess backend type could be useful too, but normally be inferred from the stack trace pretty easily. Application name could be useful to know which test caused the crash. I'm a bit wary about trying to print things like query text - what if that string points to memory not terminated by a \0? I guess we could use an approach similar to pgstat_get_crashed_backend_activity(). One issue with reporting crashes from signal handlers is that the obvious way to make that signal safe (lots of small writes) leads to the potential for interspersed lines. It's probably worth having a statically sized buffer that will commonly be large enough to print a whole backtrace. When too small, it should include the pid at the start of every "chunk". > Which kind of test happened to be involved is usually not interesting to me > here (even the query text won't usually be interesting), so being forced to > think about it slows me down quite a lot. Interesting - I quite often end up spending time trying to dig out which query from what sql file triggered a crash, so I can try to trigger it in isolation. I often wished the server knew the source line associated with the query. Enough that I pondered ways to have psql transport that knowledge to the the server. Greetings, Andres Freund
pgsql-hackers by date: