Re: Improve pg_re_throw: check if sigjmp_buf is valid and report error - Mailing list pgsql-hackers

From Xiaoran Wang
Subject Re: Improve pg_re_throw: check if sigjmp_buf is valid and report error
Date
Msg-id CAGjhLkNa_bru6HGjQ3o33eXZLR4nkQJLrHfqtwjMuY2NjZKfoA@mail.gmail.com
Whole thread Raw
In response to Re: Improve pg_re_throw: check if sigjmp_buf is valid and report error  (Tom Lane <tgl@sss.pgh.pa.us>)
Responses Re: Improve pg_re_throw: check if sigjmp_buf is valid and report error
Re: Improve pg_re_throw: check if sigjmp_buf is valid and report error
List pgsql-hackers


Tom Lane <tgl@sss.pgh.pa.us> 于2024年8月19日周一 22:12写道:
Robert Haas <robertmhaas@gmail.com> writes:
> On Mon, Aug 19, 2024 at 2:17 AM Xiaoran Wang <fanfuxiaoran@gmail.com> wrote:
>> If the code in PG_TRY contains any non local control flow other than
>> ereport(ERROR) like goto, break etc., the PG_CATCH or PG_END_TRY cannot
>> be called, then the PG_exception_stack will point to the memory whose
>> stack frame has been released. So after that, when the pg_re_throw
>> called, __longjmp() will crash and report Segmentation fault error.
>>
>> Addition to sigjmp_buf, add another field 'int magic' which is next to
>> the sigjum_buf in the local stack frame memory. The magic's value is always
>> 'PG_exception_magic 0x12345678'. And in 'pg_re_throw' routine, check if
>> the magic's value is still '0x12345678', if not, that means the memory
>> where the 'PG_exception_stack' points to has been released, and the 'sigbuf'
>> must be invalid.

> This is an interesting idea. I suspect if we do this we want a
> different magic number and a different error message than what you
> chose here, but those are minor details.

I would suggest just adding an Assert; I doubt this check would be
useful in production.

Agree, an Assert is enough. 

> I'm not sure how reliable this would be at actually finding problems,
> though.

Yeah, that's the big problem.  I don't have any confidence at all
that this would detect misuse.  It'd require that the old stack
frame gets overwritten, which might not happen for a long time,
and it'd require that somebody eventually do a longjmp, which again
might not happen for a long time --- and when those did happen, the
error would be detected in someplace far away from the actual bug,
with little evidence remaining to help you localize it.

Exactly, it cannot tell you which PG_TRY left the invalid sigjmp_buf,
but to implement that is easy I think, recording the line num maybe.

I think this is still useful, at least  it tell you that the error is due to the
PG_TRY.

Also, as soon as some outer level of PG_TRY is exited in the proper
way, the dangling stack pointer will be fixed up.  That means there's
a fairly narrow time frame in which the overwrite and longjmp must
happen for this to catch a bug.


Yes, if the outer level PG_TRY call pg_re_throw instead of ereport, 
the dangling stack pointer will be fixed up.  It's would be great to fix that
up in any case. But I have no idea how to implement that now.

In pg_re_throw, if we could use '_local_sigjmp_buf' instead of the
global var PG_exception_stack, that would be great. We don't
need to worry about the invalid sigjum_buf.

So on the whole I doubt this'd be terribly useful in this form;
and I don't like the amount of code churn required.

> Personally, I don't think I've ever made this particular mistake. I
> think a much more common usage error is exiting the catch-block
> without either throwing an error or rolling back a subtransaction. But
> YMMV, of course.

We have had multiple instances of code "return"ing out of a PG_TRY,
so I fully agree that some better way to detect that would be good.
But maybe we ought to think about static analysis for that.

                        regards, tom lane


--
Best regards !
Xiaoran Wang

pgsql-hackers by date:

Previous
From: Peter Smith
Date:
Subject: Re: PG docs - Sequence CYCLE clause
Next
From: jian he
Date:
Subject: Re: documentation structure