Re: [PATCH] Detect escape of ErrorContextCallback stack pointers (and from PG_TRY() ) - Mailing list pgsql-hackers

From Craig Ringer
Subject Re: [PATCH] Detect escape of ErrorContextCallback stack pointers (and from PG_TRY() )
Date
Msg-id CAMsr+YHX_6=ZPLygVyOyZAA6uScHgTAsT6bn3h7YUPRNcDMXBw@mail.gmail.com
Whole thread Raw
In response to Re: [PATCH] Detect escape of ErrorContextCallback stack pointers (and from PG_TRY() )  (Tom Lane <tgl@sss.pgh.pa.us>)
List pgsql-hackers
On Tue, 8 Sep 2020 at 10:56, Tom Lane <tgl@sss.pgh.pa.us> wrote:

So that's good.  However, a similar experiment with returning from inside
a PG_TRY does *not* produce a warning.  I suspect maybe the compiler
throws up its hands when it sees sigsetjmp?

I find that odd myself, given that in PG_TRY() we:

    PG_exception_stack = &_local_sigjmp_buf;

and a return within that block should count as a pointer to _local_sigjmp_buf escaping. Yet I confirm that I don't get a warning on clang-analyzer 10.0.0 (fedora 32) in my builds either. Even with -O0.

TL;DR:
-------

It's not due to sigsetjmp() at all though; see https://github.com/ringerc/scrapcode/blob/master/c/clang_return_stack_checks/mask_local_escape.c for part of what's going on.

I haven't worked out why sometimes an unrelated stack escape gets hidden though.

Attached patches demonstrate tests and are DEFINITELY NOT proposals for core!

SUMMARY
-----

I found per test details below (links and demo patches supplied, patches NOT proposed for inclusion in pg) that scan-build seems to be able to track a local escape via simple indirection such as copying the address to a local first, but cannot track it when it is assigned to a struct member in a global pointer if that global pointer is then replaced, even if it's restored immediately.


This produces no warnings from scan-build but leaks a pointer to an auto-variable on the stack, then dereferences it to read garbage. I deliberately clobber the stack to ensure that it's predictable garbage, so we get:

$ make run_mask_local_escape
./mask_local_escape
&g = 0x7fff1b31e6bc
&g has escaped: 0x7fff1b31e6bc
value is: 0x7f7f7f7f

I think we're using a similar pattern for PG_error_stack, and that's why we don't get any warnings.

In this case valgrind detects the issue when the bogus data is actually read. Run "valgrind ./mask_local_escape". But I'm pretty sure I've seen other cases where it does not.

So I'm wondering if the explicit __attribute__((callback(cbfn))) approach has some merits after all, for cassert builds. Assuming we care about this error, since it's a bit of an academic exercise at this point. I just needed to understand what was going on and why there was no warning.


DETAILS OF TESTS
-----

If I wrap the sigjmp_buf in a struct defined in Pg itself and allow that to escape I see the same behaviour, so it's not specific to sigjmp_buf.

I tried adding a dummy guard variable to PG_TRY() and PG_END_TRY() solely for the purpose of tracking this sort of escape. But I notice that a warning from scan-build is only emitted in PG_CATCH() or PG_FINALLY(), never from inside the PG_TRY() body. I'd expect to see a warning from either branch, as it's escaping either way.

scan-build will raise other warnings from within the PG_CATCH() block, such as if I introduce a

+       int foo = 0;
+       fprintf(stdout, "%d", 10/foo);

so it's not that scan-build as a whole is failing to take the sigsetjmp() == 0  branch. It raises *other* warnings. This can also be verified by replacing sigsetjmp() with a dummy static inline function that always takes one branch or the other.

I landed up writing the experiment code at https://github.com/ringerc/scrapcode/blob/master/c/clang_return_stack_checks/ to explore the behaviour some more.

The file sigjmp_escape.c explores various combinations of guard variables and early returns. The file sigjmp_escape_hdr.c and its companions sigjmp_escape_hdr_try.{c,h} check whether indirection via separate header and compilation unit makes any difference (it didn't).

Initially I found that in my PG_TRY()-alike test code I got a warning like:

sigjmp_escape_hdr.c:36:4: warning: Address of stack memory associated with local variable '_local_sigjmp_buf' is still referred to by the global variable 'TEST_exception_stack' upon returning to the caller.  This will be a dangling reference
                        return;
                        ^~~~~~

just like I want to see from inside a PG_TRY() block.

But with some further testing I noticed that it seems the checker may not notice escapes where there's even the simplest level of indirection, and also that one candidate escape may mask another. I'm not totally sure yet.

In the sigjmp_escape.c test, if I build it with both inner and outer guards (-DUSE_INNER_GUARDS), scan-build will only complain about the inner guard pointer g_branch_1 escaping. It will not complain about the stack pointer to the outer guard escaping, nor about the possibility of should_return_early causing the sigjmp_buf to escape into the global sigjmp_buf * jbuf. Yet running

    ./sigjmp_escape 1 1

shows that both can occur.

This still doesn't explain how it *also* misses the escape of &b into jbuf. Is it losing track of the unrelated escape when it stops tracking the first one?

The test case sigjmp_escape_hdr doesn't attempt to use the guards, and it reports the sigjmp_buf escape fine as shown above.

If sigjmp_escape.c is built with -DUSE_INNER_GUARDS -DPOP_ONLY_INNER_GUARDS then it raises no warnings at all, BUT has both the sigjmp_buf leak AND a guard variable stack escape as seen by:

   ./sigjmp_escape_inner_pop 1 1

We've seen above that it loses the outer guard pointer escape because it doesn't track any sort of indirection of assignments.

What's a bit surprising is that it *also* fails to complain about the escape of the sigjmp_buf pointer &b into the global variable jbuf in this case, even though it successfully detects the escape with other combinations of guard uses. It's like it can't track multiple things at once properly.

RELATED: detecting return from PG_FINALLY()
--------

Note that even if scan-build did detect the escape of the sigjmp_buf, returning from PG_FINALLY() is a coding error that we cannot detect this way since we've restored PG_errror_stack so there's no local auto ptr to escape. To do that we'd need our own stack that we pushed in PG_TRY() and didn't pop until PG_END_TRY(), like in the demo patch attached. That works, but adds another local to each PG_TRY() that's otherwise useless. If we really felt like it we could use it to produce a poor-man's backtrace through each PG_TRY() at the cost of one sizeof(void) per PG_TRY() + some constdata storage, but that seems like a solution looking for a problem.

Also, AFAICS it's actually harmless, if ugly, to return from PG_CATCH(). It should still be discouraged.


--
 Craig Ringer                   http://www.2ndQuadrant.com/
 2ndQuadrant - PostgreSQL Solutions for the Enterprise
Attachment

pgsql-hackers by date:

Previous
From: "tsunakawa.takay@fujitsu.com"
Date:
Subject: RE: Inconsistent Japanese name order in v13 contributors list
Next
From: Etsuro Fujita
Date:
Subject: Re: Inconsistent Japanese name order in v13 contributors list