Re: [PATCH] Detect escape of ErrorContextCallback stack pointers (and from PG_TRY() ) - Mailing list pgsql-hackers

From Craig Ringer
Subject Re: [PATCH] Detect escape of ErrorContextCallback stack pointers (and from PG_TRY() )
Date
Msg-id CAMsr+YF_Tqcq5zrv3emZ_gvZnzur4aE9f4sj8Pndct-a5h1XYg@mail.gmail.com
Whole thread Raw
In response to Re: [PATCH] Detect escape of ErrorContextCallback stack pointers (and from PG_TRY() )  (Tom Lane <tgl@sss.pgh.pa.us>)
Responses Re: [PATCH] Detect escape of ErrorContextCallback stack pointers (and from PG_TRY() )  (Craig Ringer <craig@2ndquadrant.com>)
List pgsql-hackers
On Thu, 3 Sep 2020 at 22:28, Tom Lane <tgl@sss.pgh.pa.us> wrote:
Craig Ringer <craig@2ndquadrant.com> writes:
> The attached patch series adds support for detecting coding errors where a
> stack-allocated ErrorContextCallback is not popped from the
> error_context_stack before the variable leaves scope.

So my immediate thoughts about this are

(1) It's mighty invasive for what it accomplishes.  AFAIK we have
had few of this class of bug, and so I'm not excited about touching
every callback use in the tree to add defenses.  It's also not great
that future code additions won't be protected unless they remember
to add these magic annotations.  The PG_TRY application is better since
it's wrapped into the existing macro.

I agree with you there. I'd actually like to change how we set up errcontext callbacks anyway, as I think they're ugly, verbose and error-prone.

See patch 6 for a RFC on that.

With regards to PG_TRY() you may note that patch 5 adds a similar check to detect escapes from PG_TRY() / PG_CATCH() bodies.

(2) I don't like that this is adding runtime overhead to try to detect
such bugs.

Right. Only in cassert builds, though.

Frankly I'd be happy enough even having it as something I can turn on when I wanted it. I've had a heck of a time tracking down errcontext escapes in the past. I wrote it for an extension, then figured I'd submit it to core and see if anyone wanted it.

Even if we don't adopt it in PostgreSQL, now it's out there to help out others debugging similar issues.

(3) I think it's a complete failure that such a bug will only be
detected if the faulty code path is actually taken.  I think it's
quite likely that any such bugs that are lurking are in "can't
happen", or at least hard-to-hit, corner cases --- if they were in
routinely tested paths, we'd have noticed them by now.  So it'd be far
more helpful if we had a static-analysis way to detect such issues.

Thinking about (3), I wonder whether there's a way to instruct Coverity
to detect such errors.

I think calling it a "complete failure" is ridiculous. By the same rationale, we shouldn't bother with Assert(...) at all. But I agree that it's definitely far from as good as having a statically verifiable check would be, and this *should* be something we can statically verify.

I actually had a pretty good look around for static analysis options to see if I could find anything that might help us out before I landed up with this approach.

The sticking point is the API we use. By using auto stack variables (and doing so in pure C where they cannot have destructors) and using direct assignments to globals, we cannot use the majority of static analysis annotations since they tend to be aimed at function-based APIs. And for some reason most static analysis tools appear to be poor at discovering escapes of pointers to stack variables leaving scope, at least unless you use annotated functions that record ownership transfers. Which we can't because ... direct assignment.

It's one of the reasons I want to wrap the errcontext API per patch 6 in the above series. The current API is extremely verbose, hard to validate and check, and difficult to apply static analysis to.

If we had error context setup/teardown macros we could implement them using static inline functions and annotate them as appropriate for the target toolchain and available static analysis tools.

So what do you think of patch 6?

I'll try tweaking the updated API to add annotateable functions and see if that helps static analysis tools detect issues, then report back.

I personally think it'd be well worth adopting a wrapped API and simultaneously renaming ErrorContextCallback to ErrorContextCallbackData to ensure that code must be updated to compile with the changes. It'd be simple to provide a backwards compatibility header that extension authors can copy into their trees so they can use the new API when building against old postgres, greatly reducing the impact on extension authors. (We do that sort of thing constantly in pglogical and BDR).

--
 Craig Ringer                   http://www.2ndQuadrant.com/
 2ndQuadrant - PostgreSQL Solutions for the Enterprise

pgsql-hackers by date:

Previous
From: Tom Lane
Date:
Subject: Re: Fix for configure error in 9.5/9.6 on macOS 11.0 Big Sur
Next
From: Pavel Stehule
Date:
Subject: Re: Compatible defaults for LEAD/LAG