Thread: segfault tied to "IS JSON predicate" commit
I find that if I run the following test against a standard debug build on HEAD, my local installation reliably segfaults: $ meson test --setup running --suite test_rls_hooks-running Attached is a "bt full" run from gdb against a core dump. The query "EXPLAIN (costs off) SELECT * FROM rls_test_permissive;" runs when the backend segfaults. The top frame of the back trace is suggestive of a use-after-free: #0 copyObjectImpl (from=0x7f7f7f7f7f7f7f7e) at copyfuncs.c:187 187 switch (nodeTag(from)) ... "git bisect" suggests that the problem began at commit 6ee30209, "SQL/JSON: support the IS JSON predicate". It's a bit surprising that the bug reproduces when I run a standard test, and yet we appear to have a bug that's about 2 weeks old. There may be something unusual about my system that will turn out to be relevant -- though there is nothing particularly exotic about this machine. My repro doesn't rely on concurrent execution, or timing, or anything like that -- it's quite reliable. -- Peter Geoghegan
Attachment
On Thu, Apr 13, 2023 at 09:14:01PM -0700, Peter Geoghegan wrote: > I find that if I run the following test against a standard debug build > on HEAD, my local installation reliably segfaults: > > $ meson test --setup running --suite test_rls_hooks-running > > Attached is a "bt full" run from gdb against a core dump. The query > "EXPLAIN (costs off) SELECT * FROM rls_test_permissive;" runs when the > backend segfaults. > > The top frame of the back trace is suggestive of a use-after-free: > > #0 copyObjectImpl (from=0x7f7f7f7f7f7f7f7e) at copyfuncs.c:187 > 187 switch (nodeTag(from)) > ... > > "git bisect" suggests that the problem began at commit 6ee30209, > "SQL/JSON: support the IS JSON predicate". > > It's a bit surprising that the bug reproduces when I run a standard > test, and yet we appear to have a bug that's about 2 weeks old. There > may be something unusual about my system that will turn out to be > relevant -- though there is nothing particularly exotic about this > machine. My repro doesn't rely on concurrent execution, or timing, or > anything like that -- it's quite reliable. I was able to reproduce this yesterday but not today. I think what happened is that you (and I) are in the habbit of running "meson test tmp_install" to compile new binaries and install them into ./tmp_install, and then run a server out from there. But nowadays there's also "meson test install_test_files". I'm not sure what combination of things are out of sync, but I suspect you forgot one of 0) compile *and* install the new binaries; or 1) restart the running postmaster; or, 2) install the new shared library ("test files"). I saw the crash again when I did this: time ninja time meson test tmp_install install_test_files regress/regress # does not recompile, BTW ./tmp_install/usr/local/pgsql/bin/postgres -D ./testrun/regress/regress/tmp_check/data -p 5678 -c autovacuum=no& git checkout HEAD~222 time meson test tmp_install install_test_files time PGPORT=5678 meson test --setup running test_rls_hooks-running/regress In this case, I'm not sure if there's anything to blame meson for; the issue is running server, which surely has different structures since last month. -- Justin
On Sat, Apr 15, 2023 at 2:46 PM Justin Pryzby <pryzby@telsasoft.com> wrote: > I think what happened is that you (and I) are in the habbit of running > "meson test tmp_install" to compile new binaries and install them into > ./tmp_install, and then run a server out from there. That's not my habit; this is running against a server that was installed into a dedicated install directory. Though I agree that an issue with the environment seems likely. > But nowadays > there's also "meson test install_test_files". That only applies with "--setup tmp_install", which is the default test setup, and the one that you must be using implicitly. But I'm using "--setup running" for this. More concretely, the tmp_install setup has the tests you say are requirements: $ meson test --setup tmp_install --list | grep install postgresql:setup / tmp_install postgresql:setup / install_test_files But not the running setup: $ meson test --setup running --list | grep install | wc -l 0 I'm aware of the requirement around specifying "--suite tmp_install ..." right before "... --suite what_i_really_want_to_test" is specified. However, I can't see how it could be my fault for forgetting that, since it's structurally impossible to specify "--suite tmp_install" when using "--setup running". I was using the setup that gives you behavior that's approximately equivalent to "make installcheck" (namely "--setup running"), so naturally this would have been impossible. Let's review: * There are two possible --setup modes. I didn't use the default (which is "--setup tmp_install") here. Rather, I used "--setup running", which is kinda like "make installcheck". * There is a test suite named "setup", though it's only available with "--setup tmp_install", the default setup. (This is not to be confused with the meson-specific notion of a --setup.) * The "setup" suite happens to contain an individual test called "tmp_install" (as well as one called "install_test_files") * I cannot possibly have forgotten this, since asking for it with "--setup running" just doesn't work. Let's demonstrate what I mean. The following does not and cannot work, so I cannot have forgotten to do it in any practical sense: $ meson test --setup running postgresql:setup / tmp_install ninja: no work to do. No suitable tests defined. Such an incantation can only be expected to work with --setup tmp_install, the default. So this version does work: $ meson test --setup tmp_install postgresql:setup / tmp_install **SNIP** 1/1 postgresql:setup / tmp_install OK 0.72s **SNIP** Not confusing at all! -- Peter Geoghegan
On Sat, Apr 15, 2023 at 4:11 PM Peter Geoghegan <pg@bowt.ie> wrote: > $ meson test --setup tmp_install --list | grep install > postgresql:setup / tmp_install > postgresql:setup / install_test_files > > But not the running setup: > > $ meson test --setup running --list | grep install | wc -l > 0 There is a concrete problem here: commit b6a0d469ca ("meson: Prevent installation of test files during main install") overlooked "--setup running". It did not add a way for the setup to run "postgresql:setup / install_test_files" (or perhaps something very similar). The segfault must have been caused by unwitting use of a leftover ancient test_rls_hooks.so from before commit b6a0d469ca. My old stale .so must have continued to work for a little while, before it broke. Now that I've fully deleted my install directory, I can see a clear problem, which is much less mysterious than the segfault. Namely, the following doesn't still work: $ meson test --setup running --suite test_rls_hooks-running This time it's not a segfault, though -- it's due to the .so being unavailable. Adding "--suite setup" fixes nothing, since I'm using "--setup running"; while the "--suite running" tests will actually run and install the .so, they won't install it into the installation directory I'm actually using (only into a tmp_install directory). While I was wrong to implicate commit 6ee30209 (the IS JSON commit) at first, there is a bug here. A bug in b6a0d469ca. ISTM that b6a0d469ca has created an unmet need for a "--suite setup-running", which is analogous to "--suite setup" but works with "--setup running". That way there'd at least be a "postgresql:setup-running / install_test_files" test that could be used here, like so: $ meson test --setup running --suite setup-running --suite test_rls_hooks-running But...maybe it would be better to find a way to install the stuff from "postgresql:setup / install_test_files" in a less kludgy, more standard kind of way? I see that the commit message from b6a0d469ca says "there is no way to set up up the build system to install extra things only when told". Is that *really* the case? -- Peter Geoghegan
On Sat, Apr 15, 2023 at 5:15 PM Peter Geoghegan <pg@bowt.ie> wrote: > ISTM that b6a0d469ca has created an unmet need for a "--suite > setup-running", which is analogous to "--suite setup" but works with > "--setup running". That way there'd at least be a > "postgresql:setup-running / install_test_files" test that could be > used here, like so: > > $ meson test --setup running --suite setup-running --suite > test_rls_hooks-running > > But...maybe it would be better to find a way to install the stuff from > "postgresql:setup / install_test_files" in a less kludgy, more > standard kind of way? I see that the commit message from b6a0d469ca > says "there is no way to set up up the build system to install extra > things only when told". Is that *really* the case? I see that CI deals with this problem using this kludge on FreeBSD, which tests "--setup running": meson test $MTEST_ARGS --quiet --suite setup export LD_LIBRARY_PATH="$(pwd)/build/tmp_install/usr/local/pgsql/lib/:$LD_LIBRARY_PATH" That's why CI never failed due to commit b6a0d469ca. This doesn't seem like something that should become standard operating procedure. Not that it is right now, mind you. This isn't documented anywhere, even though "--setup running" is documented (albeit lightly) in the sgml docs. -- Peter Geoghegan