On Fri, Aug 11, 2023 at 12:26 PM Andres Freund <andres@anarazel.de> wrote:
> > For example, dealing with core dumps left behind by the regression
> > tests can be annoying.
>
> Hm. I don't have a significant problem with that. But I can see it being
> problematic. Unfortunately, short of preventing core dumps from happening,
> I don't think we really can do much about that - whatever is running the tests
> shouldn't have privileges to change system wide settings about where core
> dumps end up etc.
I was unclear. I wasn't talking about managing core dumps; I was
talking about using core dumps to get a simple backtrace that just
gives me some very basic information. I probably shouldn't have even
mentioned core dumps, because what I'm really concerned about is the
workflow around getting very basic information about assertion
failures, not core dumps per se.
The inconsistent approach needed to get a simple, useful backtrace for
assertion failures (along with other basic information associated with
the failure) is a real problem, particularly when running the tests.
Most individual assertion failures that I see are for code that I'm
practically editing in real time. So shaving cycles here really
matters.
For one thing, the symbol mangling that we have around the built-in
backtraces makes them significantly less useful. I really hope that
your libbacktrace patch gets committed soon, since that looks like it
would be a nice quality of life improvement, all on its own.
It would also be great if the tests spit out reasonably complete
information about assertion failures (a backtrace without any
mangling, the query text, other basic context), reliably and
uniformly -- it shouldn't matter whether it comes from TAP or
pg_regress test SQL scripts. Which kind of test happened to be
involved is usually not interesting to me here (even the query text
usually isn't), so being forced to think about it slows me down quite
a lot.
> > Don't you also hate it when there's a regression.diffs that just shows 20k
> > lines of subtractions? Perhaps you don't -- perhaps your custom setup makes
> > it quick and easy to get relevant information about what actually went
> > wrong.
>
> I do really hate that. At the very least we should switch to using
> restart-after-crash by default, not start new tests once the server has
> crashed, and do a waitpid(postmaster, WNOHANG) after each failing test to
> see whether the test failed because the backend died.
+1
--
Peter Geoghegan