Thread: segfault tied to "IS JSON predicate" commit

segfault tied to "IS JSON predicate" commit

From
Peter Geoghegan
Date:
I find that if I run the following test against a standard debug build
on HEAD, my local installation reliably segfaults:

$ meson test --setup running --suite test_rls_hooks-running

Attached is a "bt full" run from gdb against a core dump. The query
"EXPLAIN (costs off) SELECT * FROM rls_test_permissive;" runs when the
backend segfaults.

The top frame of the back trace is suggestive of a use-after-free:

#0  copyObjectImpl (from=0x7f7f7f7f7f7f7f7e) at copyfuncs.c:187
187 switch (nodeTag(from))
...

"git bisect" suggests that the problem began at commit 6ee30209,
"SQL/JSON: support the IS JSON predicate".

It's a bit surprising that the bug reproduces when I run a standard
test, and yet we appear to have a bug that's about 2 weeks old.  There
may be something unusual about my system that will turn out to be
relevant -- though there is nothing particularly exotic about this
machine. My repro doesn't rely on concurrent execution, or timing, or
anything like that -- it's quite reliable.

-- 
Peter Geoghegan

Attachment

Re: segfault tied to "IS JSON predicate" commit

From
Justin Pryzby
Date:
On Thu, Apr 13, 2023 at 09:14:01PM -0700, Peter Geoghegan wrote:
> I find that if I run the following test against a standard debug build
> on HEAD, my local installation reliably segfaults:
> 
> $ meson test --setup running --suite test_rls_hooks-running
> 
> Attached is a "bt full" run from gdb against a core dump. The query
> "EXPLAIN (costs off) SELECT * FROM rls_test_permissive;" runs when the
> backend segfaults.
> 
> The top frame of the back trace is suggestive of a use-after-free:
> 
> #0  copyObjectImpl (from=0x7f7f7f7f7f7f7f7e) at copyfuncs.c:187
> 187 switch (nodeTag(from))
> ...
> 
> "git bisect" suggests that the problem began at commit 6ee30209,
> "SQL/JSON: support the IS JSON predicate".
> 
> It's a bit surprising that the bug reproduces when I run a standard
> test, and yet we appear to have a bug that's about 2 weeks old.  There
> may be something unusual about my system that will turn out to be
> relevant -- though there is nothing particularly exotic about this
> machine. My repro doesn't rely on concurrent execution, or timing, or
> anything like that -- it's quite reliable.

I was able to reproduce this yesterday but not today.

I think what happened is that you (and I) are in the habbit of running
"meson test tmp_install" to compile new binaries and install them into
./tmp_install, and then run a server out from there.  But nowadays
there's also "meson test install_test_files".  I'm not sure what
combination of things are out of sync, but I suspect you forgot one of
0) compile *and* install the new binaries; or 1) restart the running
postmaster; or, 2) install the new shared library ("test files").

I saw the crash again when I did this:

time ninja
time meson test tmp_install install_test_files regress/regress # does not recompile, BTW
./tmp_install/usr/local/pgsql/bin/postgres -D ./testrun/regress/regress/tmp_check/data -p 5678 -c autovacuum=no&
git checkout HEAD~222
time meson test tmp_install install_test_files
time PGPORT=5678 meson test --setup running test_rls_hooks-running/regress

In this case, I'm not sure if there's anything to blame meson for; the
issue is running server, which surely has different structures since
last month.

-- 
Justin



Re: segfault tied to "IS JSON predicate" commit

From
Peter Geoghegan
Date:
On Sat, Apr 15, 2023 at 2:46 PM Justin Pryzby <pryzby@telsasoft.com> wrote:
> I think what happened is that you (and I) are in the habbit of running
> "meson test tmp_install" to compile new binaries and install them into
> ./tmp_install, and then run a server out from there.

That's not my habit; this is running against a server that was
installed into a dedicated install directory. Though I agree that an
issue with the environment seems likely.

> But nowadays
> there's also "meson test install_test_files".

That only applies with "--setup tmp_install", which is the default
test setup, and the one that you must be using implicitly. But I'm
using "--setup running" for this.

More concretely, the tmp_install setup has the tests you say are requirements:

$ meson test --setup tmp_install --list | grep install
postgresql:setup / tmp_install
postgresql:setup / install_test_files

But not the running setup:

$ meson test --setup running --list | grep install | wc -l
0

I'm aware of the requirement around specifying "--suite tmp_install
..." right before "... --suite what_i_really_want_to_test" is
specified. However, I can't see how it could be my fault for
forgetting that, since it's structurally impossible to specify
"--suite tmp_install" when using "--setup running". I was using the
setup that gives you behavior that's approximately equivalent to "make
installcheck" (namely "--setup running"), so naturally this would have
been impossible.

Let's review:

* There are two possible --setup modes. I didn't use the default
(which is "--setup tmp_install") here. Rather, I used "--setup
running", which is kinda like "make installcheck".

* There is a test suite named "setup", though it's only available with
"--setup tmp_install", the default setup. (This is not to be confused
with the meson-specific notion of a --setup.)

* The "setup" suite happens to contain an individual test called
"tmp_install"  (as well as one called "install_test_files")

* I cannot possibly have forgotten this, since asking for it with
"--setup running" just doesn't work.

Let's demonstrate what I mean. The following does not and cannot
work, so I cannot have forgotten to do it in any practical sense:

$ meson test --setup running postgresql:setup / tmp_install
ninja: no work to do.
No suitable tests defined.

Such an incantation can only be expected to work with --setup tmp_install, the
default. So this version does work:

$ meson test --setup tmp_install postgresql:setup / tmp_install
**SNIP**
1/1 postgresql:setup / tmp_install        OK               0.72s
**SNIP**

Not confusing at all!

--
Peter Geoghegan



Re: segfault tied to "IS JSON predicate" commit

From
Peter Geoghegan
Date:
On Sat, Apr 15, 2023 at 4:11 PM Peter Geoghegan <pg@bowt.ie> wrote:
> $ meson test --setup tmp_install --list | grep install
> postgresql:setup / tmp_install
> postgresql:setup / install_test_files
>
> But not the running setup:
>
> $ meson test --setup running --list | grep install | wc -l
> 0

There is a concrete problem here: commit b6a0d469ca ("meson: Prevent
installation of test files during main install") overlooked "--setup
running". It did not add a way for the setup to run "postgresql:setup
/ install_test_files" (or perhaps something very similar).

The segfault must have been caused by unwitting use of a leftover
ancient test_rls_hooks.so from before commit b6a0d469ca. My old stale
.so must have continued to work for a little while, before it broke.
Now that I've fully deleted my install directory, I can see a clear
problem, which is much less mysterious than the segfault. Namely, the
following doesn't still work:

$ meson test --setup running --suite test_rls_hooks-running

This time it's not a segfault, though -- it's due to the .so being
unavailable. Adding "--suite setup" fixes nothing, since I'm using
"--setup running"; while the "--suite running" tests will actually run
and install the .so, they won't install it into the installation
directory I'm actually using (only into a tmp_install directory).
While I was wrong to implicate commit 6ee30209 (the IS JSON commit) at
first, there is a bug here. A bug in b6a0d469ca.

ISTM that b6a0d469ca has created an unmet need for a "--suite
setup-running", which is analogous to "--suite setup" but works with
"--setup running". That way there'd at least be a
"postgresql:setup-running / install_test_files" test that could be
used here, like so:

$ meson test --setup running --suite setup-running --suite
test_rls_hooks-running

But...maybe it would be better to find a way to install the stuff from
"postgresql:setup / install_test_files" in a less kludgy, more
standard kind of way? I see that the commit message from b6a0d469ca
says "there is no way to set up up the build system to install extra
things only when told". Is that *really* the case?

--
Peter Geoghegan



Re: segfault tied to "IS JSON predicate" commit

From
Peter Geoghegan
Date:
On Sat, Apr 15, 2023 at 5:15 PM Peter Geoghegan <pg@bowt.ie> wrote:
> ISTM that b6a0d469ca has created an unmet need for a "--suite
> setup-running", which is analogous to "--suite setup" but works with
> "--setup running". That way there'd at least be a
> "postgresql:setup-running / install_test_files" test that could be
> used here, like so:
>
> $ meson test --setup running --suite setup-running --suite
> test_rls_hooks-running
>
> But...maybe it would be better to find a way to install the stuff from
> "postgresql:setup / install_test_files" in a less kludgy, more
> standard kind of way? I see that the commit message from b6a0d469ca
> says "there is no way to set up up the build system to install extra
> things only when told". Is that *really* the case?

I see that CI deals with this problem using this kludge on FreeBSD,
which tests "--setup running":

      meson test $MTEST_ARGS --quiet --suite setup
      export LD_LIBRARY_PATH="$(pwd)/build/tmp_install/usr/local/pgsql/lib/:$LD_LIBRARY_PATH"

That's why CI never failed due to commit b6a0d469ca.

This doesn't seem like something that should become standard operating
procedure. Not that it is right now, mind you. This isn't documented
anywhere, even though "--setup running" is documented (albeit lightly)
in the sgml docs.

--
Peter Geoghegan