Re: AssertLog instead of Assert in some places - Mailing list pgsql-hackers
From: Peter Geoghegan
Subject: Re: AssertLog instead of Assert in some places
Date:
Msg-id: CAH2-Wzmqs8BAweWoneYJzSjFDci=3UTr+dAg2WrPV-vP5DHF3Q@mail.gmail.com
In response to: Re: AssertLog instead of Assert in some places (Andres Freund <andres@anarazel.de>)
List: pgsql-hackers
On Fri, Aug 11, 2023 at 2:04 PM Andres Freund <andres@anarazel.de> wrote:
> Where supported, a crash (distinguishing from assertion failures) will now
> report something like:
>
> process with pid: 2900527 received signal: SIGSEGV, si_code: 1, si_addr: 0xdeadbeef
> [0x5628ec45212f] pg_fatalsig_handler+0x20f: ../../../../home/andres/src/postgresql/src/common/pg_backtrace.c:615
> [0x7fc4b743650f] [unknown]
> [0x5628ec14897c] check_root+0x19c (inlined): ../../../../home/andres/src/postgresql/src/backend/main/main.c:393
> [0x5628ec14897c] main+0x19c: ../../../../home/andres/src/postgresql/src/backend/main/main.c:183
>
> after I added
>   *(volatile int*)0xdeadbeef = 1;
> to check_root().

It'll be like living in the future!

> For signals sent by users, it'd show the pid of the process sending the
> signal on most OSs. I really would like some generalized infrastructure for
> that, so that we can report for things like query cancellations.

That sounds great.

> > It would also be great if the tests spit out information about
> > assertion failures that was reasonably complete (backtrace without any
> > mangling, query text included, other basic context), reliably and
> > uniformly -- it shouldn't matter if it's from TAP or pg_regress test
> > SQL scripts.
>
> Hm. What other basic context are you thinking of? Pid is obvious. I guess
> backend type could be useful too, but normally be inferred from the stack
> trace pretty easily. Application name could be useful to know which test
> caused the crash.
>
> I'm a bit wary about trying to print things like query text - what if that
> string points to memory not terminated by a \0? I guess we could use an
> approach similar to pgstat_get_crashed_backend_activity().

I agree that being less verbose by default is good. On second thought even
query text isn't all that important.
> One issue with reporting crashes from signal handlers is that the obvious
> way to make that signal safe (lots of small writes) leads to the potential
> for interspersed lines. It's probably worth having a statically sized buffer
> that will commonly be large enough to print a whole backtrace. When too
> small, it should include the pid at the start of every "chunk".

Good idea.

> > Which kind of test happened to be involved is usually not interesting to
> > me here (even the query text won't usually be interesting), so being
> > forced to think about it slows me down quite a lot.
>
> Interesting - I quite often end up spending time trying to dig out which
> query from what sql file triggered a crash, so I can try to trigger it in
> isolation. I often wished the server knew the source line associated with
> the query. Enough that I pondered ways to have psql transport that knowledge
> to the server.

I actually do plenty of that too. My overall point was this: there is likely
some kind of Pareto principle here. That should guide the sorts of scenarios
we optimize for.

If you actually benchmarked where I spend time while writing code, minute to
minute, I bet it would show that most of the individual debug-compile cycles
were triggered by issues that had a fairly simple and immediate cause --
cases where I improve one small thing, and then rerun the tests, which show
an assertion failure in nearby code. As soon as I see very basic details I
immediately think "duh, of course" in these cases, at which point I'll come
up with a likely-good fix in seconds. And then I'll rinse and repeat. My fix
might just work (at least to the extent that all tests now pass), but it also
might lead to another assertion failure of the same general nature.

There are also lots of cases where I really do have to think about recreating
the details from the test in order to truly understand what's going on, of
course.
But there are still way, way more individual "duh, of course" assertion
failures in practice. Those are where productivity wins are still possible,
because the bottleneck isn't just that I have an incomplete mental model that
I need to work to expand.

Perhaps my experiences here aren't universal. But it seems like they might be
roughly the same as everybody else's who works on Postgres? Assuming that
they are, then the information that is output should be optimized for the
"duh, of course" scenarios. Not to an absurd degree, mind you. But the output
shouldn't be too verbose.

Ideally there'd be a still fairly straightforward way of getting extra
information, for the cases where debugging is likely to take a few minutes
and require real focus. The extra work in those other cases is relatively
insignificant, because the "startup costs" are relatively large -- a little
extra indirection (though only a little) can't hurt too much.

--
Peter Geoghegan