Re: Is RecoveryConflictInterrupt() entirely safe in a signal handler? - Mailing list pgsql-hackers
From | Noah Misch |
---|---|
Subject | Re: Is RecoveryConflictInterrupt() entirely safe in a signal handler? |
Date | |
Msg-id | 20221231053602.GB1565918@rfd.leadboat.com Whole thread Raw |
In response to | Re: Is RecoveryConflictInterrupt() entirely safe in a signal handler? (Thomas Munro <thomas.munro@gmail.com>) |
Responses |
Re: Is RecoveryConflictInterrupt() entirely safe in a signal handler?
|
List | pgsql-hackers |
On Sat, Dec 31, 2022 at 10:06:53AM +1300, Thomas Munro wrote: > On Thu, Dec 29, 2022 at 9:40 PM Noah Misch <noah@leadboat.com> wrote: > > On Tue, Jul 26, 2022 at 07:22:52PM -0400, Tom Lane wrote: > > > Thomas Munro <thomas.munro@gmail.com> writes: > > > > ... The regular expression machinery is capable of > > > > consuming a lot of CPU, and does CANCEL_REQUESTED(nfa->v->re) > > > > frequently to avoid getting stuck. With the patch as it stands, that > > > > would never be true. > > > > > > Surely that can't be too hard to fix. We might have to refactor > > > the code around QueryCancelPending a little bit so that callers > > > can ask "do we need a query cancel now?" without actually triggering > > > a longjmp ... but why would that be problematic? > > > > It could work. The problems are like those of making code safe to run in a > > signal handler. You can use e.g. snprintf in rcancelrequested(), but you > > still can't use palloc() or ereport(). I see at least these strategies: > > > > 1. Accept that recovery conflict checks run after a regex call completes. > > 2. Have rcancelrequested() return true unconditionally if we need a conflict > > check. If there's no actual conflict, restart the regex. > > 3. Have rcancelrequested() run the conflict check, including elog-using > > PostgreSQL code. On longjmp(), accept the leak of regex mallocs. > > 4. Have rcancelrequested() run the conflict check, including elog-using > > PostgreSQL code. On longjmp(), escalate to FATAL. > > 5. Write the conflict check code to dutifully avoid longjmp(). > > 6. Convert src/backend/regex to use palloc, so longjmp() is fine. > > Thanks! I appreciate the help getting unstuck here. I'd thought > about some of these but not all. I also considered a couple more: > > 7. Do a CFI() in a try/catch if INTERRUPTS_PENDING_CONDITION() is > true, and copy the error somewhere to be re-thrown later after the > regexp code exits with REG_CANCEL. > 8. Do a CFI() in a try/catch if INTERRUPTS_PENDING_CONDITION() is > true, and call a new regexp function that will free everything before > re-throwing. > > After Tom's response I spent some time trying to figure out how to > make a SOFT_CHECK_FOR_INTERRUPTS(), which would return a value to > indicate that it would like to throw. I think it would need to re-arm > various flags and introduce a programming rule for all interrupt > processing routines that if they fired once under a soft check they > must fire again later under a non-soft check. That all seems a bit > complicated, and a general mechanism like that seemed like overkill > for a single user, which led me to idea #7. > > Idea #8 is a realisation that twisting oneself into a pretzel to avoid > having to change the regexp code or its REG_CANCEL control flow may be > a bit silly. If the only thing it really needs to do is free some > memory, maybe the regexp module should provide a function that frees > everything that is safe to call from our rcancelrequested callback, so > we can do so before we longjmp back to Kansas. Then the REG_CANCEL > code paths would be effectively unreachable in PostgreSQL. I don't > know if it's better or worse than your idea #6, "use palloc instead, > it already has garbage collection, duh", but it's a different take on > the same realisation that this is just about free(). PG_TRY() isn't free, so it's nice that (6) doesn't add one. If (6) fails in some not-yet-revealed way, (8) could get more relevant. > I guess idea #6 must be pretty easy to try: just point that MALLOC() > macro to palloc(), and do a plain old CFI() in rcancelrequested(). > Why do you suggest #3 as an interim measure? No strong reason. I think I suggested it because it's a strict subset of (6), but I didn't think through in detail. (I've never modified src/backend/regex and have barely read its code, for whatever that's worth.) > Do we fear that palloc() might hurt regexp performance? Nah. I don't recall any place in PostgreSQL where performance is an argument for raw malloc() calls. > > Incidentally, the affected test > > contains comment "# DROP TABLE containing block which standby has in a pinned > > buffer". The standby holds no pin at that moment; the LOCK TABLE pins system > > catalog pages, but it drops every pin it acquires. > > Oh, I guess the comment is just wrong? There are earlier sections > concerned with buffer pins, but the section "RECOVERY CONFLICT 3" is > about locks. Yes.
pgsql-hackers by date: