Re: Bug in WaitForBackgroundWorkerShutdown()[REL9_5_STABLE] - Mailing list pgsql-hackers

From Tom Lane
Subject Re: Bug in WaitForBackgroundWorkerShutdown()[REL9_5_STABLE]
Date
Msg-id 8848.1470341965@sss.pgh.pa.us
Whole thread Raw
In response to Re: Bug in WaitForBackgroundWorkerShutdown()[REL9_5_STABLE]  (Yury Zhuravlev <u.zhuravlev@postgrespro.ru>)
List pgsql-hackers
Yury Zhuravlev <u.zhuravlev@postgrespro.ru> writes:
> Dmitry Ivanov wrote:
>>> Recently I've encountered a strange crash while calling elog(ERROR, "...")
>>> after the WaitForBackgroundWorkerShutdown() function. It turns out that
>>> _returns_ inside the PG_TRY()..PG_CATCH() block are *highly* undesirable,
>>> since they leave PG_exception_stack pointing to a local struct in a dead
>>> frame, which is an obvious UB. I've attached a patch which fixes this
>>> behavior in the aforementioned function, but there might be more errors
>>> like that elsewhere.

> Good catch. But in 9.6 it has been fixed by
> db0f6cad4884bd4c835156d3a720d9a79dbd63a9 commit.

Only accidentally, I'd say.  I chose to push this into HEAD as well,
so that WaitForBackgroundWorkerShutdown would look more like
WaitForBackgroundWorkerStartup, and would not have a risk of future
breakage if someone adds code after the for-loop.

I was pretty worried about the "more errors like that" point, because
we recently fixed another similar mistake in plpython, cf 1d2fe56e4.
However, a search using the attached quick-hack script to look for
"return" etc between PG_TRY and PG_CATCH did not find any other
instances.

It'd be nice to have some automatic way of catching future mistakes
like this, but I'm not sure about a reliable way to do that.
One idea is to add an Assert to PG_CATCH:

 #define PG_TRY()  \
     do { \
         sigjmp_buf *save_exception_stack = PG_exception_stack; \
         ErrorContextCallback *save_context_stack = error_context_stack; \
         sigjmp_buf local_sigjmp_buf; \
         if (sigsetjmp(local_sigjmp_buf, 0) == 0) \
         { \
             PG_exception_stack = &local_sigjmp_buf

 #define PG_CATCH()    \
+            Assert(PG_exception_stack == &local_sigjmp_buf); \
         } \
         else \
         { \
             PG_exception_stack = save_exception_stack; \
             error_context_stack = save_context_stack

but there are a couple of disadvantages to this.  One is that you don't
find out about a problem unless the bad code path is actually exercised.
I'm afraid that the most tempting places to do something like this would
be low-probability error cases, and thus that the Assert would do little
to catch them.  (Case in point: the postmaster-death path fixed in this
patch would surely never have been caught by testing.)  The other and
really worse problem is that when the Assert does fire, it's nowhere near
the scene of the crime, so while it may tell you there's a problem it
does not give much help in locating the bug.

Anyone have a better idea?

            regards, tom lane

#!/bin/sh

# search DIR for PG_TRY blocks containing WORD
# return, break, continue, goto are good things to look for

dir="$1"
word="$2"

for f in `find "$dir" -type f -name '*.c'`
do
  sed -n -e '/PG_TRY()/,/PG_CATCH()/ { /'"$word"'/ =; /'"$word"'/ p }' $f | \
  sed -e '/^[0-9]\+$/ { s|^[0-9]\+$|'"$f"': &: |; N }'
done

exit 0

pgsql-hackers by date:

Previous
From: Peter Eisentraut
Date:
Subject: Re: Keeping CURRENT_DATE and similar constructs in original format
Next
From: Tom Lane
Date:
Subject: Re: New version numbering practices