Thread: Bug in WaitForBackgroundWorkerShutdown() [REL9_5_STABLE]

Bug in WaitForBackgroundWorkerShutdown() [REL9_5_STABLE]

From
Dmitry Ivanov
Date:
Recently I've encountered a strange crash while calling elog(ERROR, "...")
after the WaitForBackgroundWorkerShutdown() function. It turns out that
_returns_ inside the PG_TRY()..PG_CATCH() block are *highly* undesirable,
since they leave PG_exception_stack pointing to a local struct in a dead
frame, which is an obvious UB. I've attached a patch which fixes this behavior
in the aforementioned function, but there might be more errors like that
elsewhere.

--
Dmitry Ivanov
Postgres Professional: http://www.postgrespro.com
Russian Postgres Company
Attachment

Re: Bug in WaitForBackgroundWorkerShutdown() [REL9_5_STABLE]

From
Dmitry Ivanov
Date:
> Recently I've encountered a strange crash while calling elog(ERROR, "...")
> after the WaitForBackgroundWorkerShutdown() function. It turns out that
> _returns_ inside the PG_TRY()..PG_CATCH() block are *highly* undesirable,
> since they leave PG_exception_stack pointing to a local struct in a dead
> frame, which is an obvious UB. I've attached a patch which fixes this
> behavior in the aforementioned function, but there might be more errors
> like that elsewhere.

Forgot some curly braces, my bad. v2 attached.

--
Dmitry Ivanov
Postgres Professional: http://www.postgrespro.com
Russian Postgres Company
Attachment

Re: Bug in WaitForBackgroundWorkerShutdown()[REL9_5_STABLE]

From
Yury Zhuravlev
Date:
Dmitry Ivanov wrote:
>> Recently I've encountered a strange crash while calling elog(ERROR, "...")
>> after the WaitForBackgroundWorkerShutdown() function. It turns out that
>> _returns_ inside the PG_TRY()..PG_CATCH() block are *highly* undesirable,
>> since they leave PG_exception_stack pointing to a local struct in a dead
>> frame, which is an obvious UB. I've attached a patch which fixes this
>> behavior in the aforementioned function, but there might be more errors
>> like that elsewhere.
>
> Forgot some curly braces, my bad. v2 attached.
>

Good catch. But in 9.6 it has been fixed by
db0f6cad4884bd4c835156d3a720d9a79dbd63a9 commit.


--
Yury Zhuravlev
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company



Re: Bug in WaitForBackgroundWorkerShutdown()[REL9_5_STABLE]

From
Tom Lane
Date:
Yury Zhuravlev <u.zhuravlev@postgrespro.ru> writes:
> Dmitry Ivanov wrote:
>>> Recently I've encountered a strange crash while calling elog(ERROR, "...")
>>> after the WaitForBackgroundWorkerShutdown() function. It turns out that
>>> _returns_ inside the PG_TRY()..PG_CATCH() block are *highly* undesirable,
>>> since they leave PG_exception_stack pointing to a local struct in a dead
>>> frame, which is an obvious UB. I've attached a patch which fixes this
>>> behavior in the aforementioned function, but there might be more errors
>>> like that elsewhere.

> Good catch. But in 9.6 it has been fixed by
> db0f6cad4884bd4c835156d3a720d9a79dbd63a9 commit.

Only accidentally, I'd say.  I chose to push this into HEAD as well,
so that WaitForBackgroundWorkerShutdown would look more like
WaitForBackgroundWorkerStartup, and would not have a risk of future
breakage if someone adds code after the for-loop.

I was pretty worried about the "more errors like that" point, because
we recently fixed another similar mistake in plpython, cf 1d2fe56e4.
However, a search using the attached quick-hack script to look for
"return" etc between PG_TRY and PG_CATCH did not find any other
instances.

It'd be nice to have some automatic way of catching future mistakes
like this, but I'm not sure about a reliable way to do that.
One idea is to add an Assert to PG_CATCH:

 #define PG_TRY()  \
     do { \
         sigjmp_buf *save_exception_stack = PG_exception_stack; \
         ErrorContextCallback *save_context_stack = error_context_stack; \
         sigjmp_buf local_sigjmp_buf; \
         if (sigsetjmp(local_sigjmp_buf, 0) == 0) \
         { \
             PG_exception_stack = &local_sigjmp_buf

 #define PG_CATCH()    \
+            Assert(PG_exception_stack == &local_sigjmp_buf); \
         } \
         else \
         { \
             PG_exception_stack = save_exception_stack; \
             error_context_stack = save_context_stack

but there are a couple of disadvantages to this.  One is that you don't
find out about a problem unless the bad code path is actually exercised.
I'm afraid that the most tempting places to do something like this would
be low-probability error cases, and thus that the Assert would do little
to catch them.  (Case in point: the postmaster-death path fixed in this
patch would surely never have been caught by testing.)  The other and
really worse problem is that when the Assert does fire, it's nowhere near
the scene of the crime, so while it may tell you there's a problem it
does not give much help in locating the bug.

Anyone have a better idea?

            regards, tom lane

#!/bin/sh

# search DIR for PG_TRY blocks containing WORD
# return, break, continue, goto are good things to look for

dir="$1"
word="$2"

for f in `find "$dir" -type f -name '*.c'`
do
  sed -n -e '/PG_TRY()/,/PG_CATCH()/ { /'"$word"'/ =; /'"$word"'/ p }' $f | \
  sed -e '/^[0-9]\+$/ { s|^[0-9]\+$|'"$f"': &: |; N }'
done

exit 0